feat(senseaudio): BYOK chat with image + video generation tools (#2065)

* feat(senseaudio): BYOK chat with image + video generation tools

Adds SenseAudio as a first-class BYOK chat protocol and wires the daemon's
chat proxy with a tool loop so BYOK users can generate images and videos
without dropping to a CLI agent.
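
A minimal sketch of the tool loop described above, under stated assumptions. Messages follow the OpenAI chat shape, and the upstream call is injected as a `complete` function. `runToolLoop`, `ChatMessage`, and the `maxRounds` cap are hypothetical names, not the daemon's actual API.

```typescript
// Hypothetical OpenAI-style shapes; the real proxy types may differ.
interface ToolCall {
  id: string;
  function: { name: string; arguments: string };
}

interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string | null;
  tool_calls?: ToolCall[];
  tool_call_id?: string;
}

type ToolResult = { ok: boolean; url?: string; error?: string };
type CompletionFn = (messages: ChatMessage[]) => Promise<ChatMessage>;
type ToolExecutor = (args: Record<string, unknown>) => Promise<ToolResult>;

// Call the model, execute any tool_calls, feed each result back as a
// `role: 'tool'` message, and re-issue the completion until the model
// answers without tool calls (or the hypothetical round cap is hit).
export async function runToolLoop(
  messages: ChatMessage[],
  complete: CompletionFn,
  tools: Record<string, ToolExecutor>,
  maxRounds = 4,
): Promise<ChatMessage> {
  const history = [...messages];
  for (let round = 0; round < maxRounds; round++) {
    const reply = await complete(history);
    history.push(reply);
    if (!reply.tool_calls || reply.tool_calls.length === 0) return reply;
    for (const call of reply.tool_calls) {
      const exec = tools[call.function.name];
      let result: ToolResult;
      if (!exec) {
        result = { ok: false, error: `unknown tool ${call.function.name}` };
      } else {
        try {
          result = await exec(JSON.parse(call.function.arguments || '{}'));
        } catch (err) {
          // Bad JSON arguments or a throwing executor become a tool error
          // the model can read and react to.
          result = { ok: false, error: err instanceof Error ? err.message : String(err) };
        }
      }
      history.push({ role: 'tool', tool_call_id: call.id, content: JSON.stringify(result) });
    }
  }
  return { role: 'assistant', content: 'tool loop stopped after too many rounds' };
}
```

Injecting `complete` keeps the loop testable without a network. The round cap is an added assumption so a model that keeps requesting tools cannot spin forever.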

- BYOK protocol: new senseaudio tab + /api/proxy/senseaudio/stream route +
  connection-test + provider-models discovery (OpenAI-compatible wire)
- Tool loop: generate_image (synchronous /v1/image/sync) and generate_video
  (async /v1/video/create + 5s polling /v1/video/status, 10-min ceiling,
  periodic progress log every 30s)
- Settings dropdown + chat-composer dropdown for the BYOK image model
  default; generate_image's model enum lets the LLM override per call
- Seed-on-success: a successful BYOK chat call idempotently mirrors the
  key into media-config (preserves env-resolved + already-stored keys)
- Generated artifacts land in <projectsRoot>/<projectId>/ so FileViewer,
  DesignFilesPanel, and project export pick them up automatically;
  legacy /api/byok-image/:id route kept for old conversation links
- Markdown renderer learns ![alt](url) image syntax with a scheme
  allowlist (http(s) / data:image/ / blob: / relative paths)
- i18n key settings.byokImageModel across all 19 locales
- 3 SenseAudio image models registered (2.0, 1.0, doubao-seedream-5.0);
  1 video model (doubao-seedance-2.0)
- Tests: byok-tools (29), media-senseaudio-image (8), media-config seed
  (7), proxy-routes (47), markdown image rendering (8)
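
The create-then-poll flow can be sketched as follows, assuming a `getStatus` callback that wraps GET /v1/video/status. `pollJob`, `JobStatus`, and `PollOptions` are illustrative names, not the daemon's code. The values in the comments mirror the numbers above: a 5 s interval, 120 polls, and a progress line every 6 polls.

```typescript
// Hypothetical status shape, modelled on the pending / processing /
// completed / failed lifecycle the video endpoint reports.
type JobStatus =
  | { status: 'pending' | 'processing' }
  | { status: 'completed'; video_url: string }
  | { status: 'failed'; error_message?: string };

interface PollOptions {
  intervalMs: number; // 5000 in production
  maxPolls: number; // 120 x 5 s = 10-minute ceiling
  logEvery: number; // 6 polls, roughly 30 s between progress lines
  log?: (line: string) => void;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function pollJob(
  getStatus: () => Promise<JobStatus>,
  opts: PollOptions,
): Promise<{ ok: true; url: string } | { ok: false; error: string }> {
  for (let attempt = 1; attempt <= opts.maxPolls; attempt++) {
    await sleep(opts.intervalMs);
    const s = await getStatus();
    if (s.status === 'completed') return { ok: true, url: s.video_url };
    if (s.status === 'failed') return { ok: false, error: s.error_message || 'video job failed' };
    // Emit a progress line every `logEvery` polls, not on every poll.
    if (attempt % opts.logEvery === 0) {
      opts.log?.(`video job still ${s.status} after ${attempt} polls`);
    }
  }
  return { ok: false, error: `video job timed out after ${opts.maxPolls} polls` };
}
```

Tests can pass `intervalMs: 0` to exercise the same control flow instantly. The daemon's test-only `videoPollIntervalMs` override serves that role.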

* fix(senseaudio): unblock image gen + design file preview switching

- SenseAudio /v1/image/sync rejected the previous size mapping with
  `参数错误:size` ("parameter error: size"; 1664x936, 936x1664, 1280x960, and 960x1280 are not in
  the gateway's accepted set). Switched to standard HD / SD sizes that
  every aspect bucket can hit: 1024×1024, 1280×720, 720×1280,
  1024×768, 768×1024. Kept the byok-tools and media.ts tables in sync
  so the BYOK chat tool and the CLI agent path both stop failing on
  non-square aspects.

- DesignFilesPanel's <DfPreview> was missing a key prop, so React
  reused the same iframe DOM node when the user picked a different
  file — the src prop changed but the iframe never navigated. Added
  key={previewFile.name} so the previous preview unmounts cleanly.

- Updated byok-tools + media-senseaudio-image tests for the new size
  expectations.
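
Below is a sketch of the size lookup. It assumes a hypothetical single shared module that both the BYOK chat tool and the CLI media path import, instead of two hand-synced copies. `SENSEAUDIO_ASPECT_TO_SIZE` and `sizeForAspect` are illustrative names. The own-property check prevents inherited keys such as `toString` from being treated as valid aspects.

```typescript
// One table both callers could import, so the accepted sizes cannot drift.
export const SENSEAUDIO_ASPECT_TO_SIZE = {
  '1:1': '1024x1024',
  '16:9': '1280x720',
  '9:16': '720x1280',
  '4:3': '1024x768',
  '3:4': '768x1024',
} as const;

export type SenseAudioAspect = keyof typeof SENSEAUDIO_ASPECT_TO_SIZE;

// Unknown, missing, or inherited keys fall back to the square size.
export function sizeForAspect(raw: unknown): string {
  if (typeof raw === 'string' && Object.prototype.hasOwnProperty.call(SENSEAUDIO_ASPECT_TO_SIZE, raw)) {
    return SENSEAUDIO_ASPECT_TO_SIZE[raw as SenseAudioAspect];
  }
  return SENSEAUDIO_ASPECT_TO_SIZE['1:1'];
}
```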

* docs(senseaudio): clear stale provider hint + update README

- Settings → Media → SenseAudio: clear the auto-promoted
  "Image · TTS · 70+ voices · clone" hint; the provider label alone is
  enough now that the BYOK chat surface covers image + video tooling.
- README: list the new senseaudio (and missing ollama) proxy routes so
  the BYOK section reflects what the daemon actually serves, and
  mention the generate_image / generate_video chat tools that ship
  with the SenseAudio path.

* fix(senseaudio): address PR #2065 review feedback

Three non-blocking review notes from @PerishCode on PR #2065:

1. Drop the dead /api/byok-image/:id route. The PR description claimed
   it was "legacy fallback for old chat history" but that storage
   layout never existed on main, so the route can only ever 400 or
   404 — never 200. Removed the handler, the isSafeByokImageId
   export, the unused createReadStream / stat / path / Request /
   Response imports, and the two byok-image regression tests.

2. Add rejectProxyPluginContext guard to the senseaudio proxy
   handler so it matches the invariant the other five proxy paths
   already enforce (plugin runs must go through /api/runs for
   snapshot pinning). Extended the existing "API fallback rejects
   plugin runs" describe to also cover /api/proxy/senseaudio/stream
   with the 409 PLUGIN_REQUIRES_DAEMON expectation.

3. Wrap the secondary image / video downloads (the URLs the
   SenseAudio gateway hands back in /v1/image/sync .url and
   /v1/video/status .video_url) in validateBaseUrlResolved so a
   malicious gateway can't point us at 169.254.169.254 (AWS / Azure
   metadata) or RFC1918 hosts via the response payload. Also passed
   `redirect: 'error'` on both fetches to match the SSRF posture
   the primary proxy fetch already uses. The new
   assertExternalAssetUrl helper lives next to executeGenerateImage
   so future tool downloads can reuse it.
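
The following sketch shows what the private-range part of such a guard might look like. It is an assumption-laden stand-in, not `validateBaseUrlResolved`:
- It covers IPv4 only.
- It uses a hand-picked CIDR list (RFC1918, link-local, CGNAT, multicast, reserved).
- It leaves loopback unblocked, because the README allows loopback LLM providers.

`isBlockedIPv4` is a hypothetical name.

```typescript
import { isIPv4 } from 'node:net';

// CIDR ranges this sketch refuses for asset downloads.
const BLOCKED_V4: Array<[string, number]> = [
  ['10.0.0.0', 8], // RFC1918
  ['172.16.0.0', 12], // RFC1918
  ['192.168.0.0', 16], // RFC1918
  ['169.254.0.0', 16], // link-local, includes cloud metadata at 169.254.169.254
  ['100.64.0.0', 10], // CGNAT
  ['0.0.0.0', 8], // "this network"
  ['224.0.0.0', 4], // multicast
  ['240.0.0.0', 4], // reserved (includes broadcast)
];

// Dotted quad to an unsigned integer; plain arithmetic avoids the sign
// issues of 32-bit bitwise operators in JavaScript.
function toUint32(ip: string): number {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Non-IPv4 input (hostnames, IPv6) returns false here, so a real guard
// must resolve names and handle IPv6 separately.
export function isBlockedIPv4(ip: string): boolean {
  if (!isIPv4(ip)) return false;
  const value = toUint32(ip);
  return BLOCKED_V4.some(([base, bits]) => {
    const start = toUint32(base);
    return value >= start && value < start + 2 ** (32 - bits);
  });
}
```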

Tests: 120/120 daemon tests pass; guard + typecheck green.

* fix(senseaudio): mirror SSRF guard onto renderSenseAudioImage CLI path

Follow-up to 01b1260a — the chat-tool fix in byok-tools.ts wasn't
mirrored onto the parallel renderSenseAudioImage path in media.ts.
Same attacker-controllable shape (gateway-returned `data.url`),
same one-line fix.

- Hoist assertExternalAssetUrl from byok-tools.ts into
  connectionTest.ts next to validateBaseUrlResolved so both call
  sites (the BYOK chat tool loop AND the CLI agent media dispatcher)
  share one helper. Made the error strings provider-agnostic so a
  future caller doesn't get a misleading "senseaudio" attribution
  for a Volcengine / Grok / etc. download.
- renderSenseAudioImage now runs the response url through
  assertExternalAssetUrl before fetching bytes, and passes
  redirect: 'error' to block a 3xx hop into private space.
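
The shared helper enforces an ordering: validate the returned URL first, then fetch with `redirect: 'error'`. That ordering can be sketched with injected dependencies. `checkExternalAssetUrl` and `downloadAsset` are hypothetical, as are their parameters. The real `assertExternalAssetUrl` signature in connectionTest.ts may differ. The `{ ok, error }` result shape follows the tool executors' convention, and the error strings name no provider.

```typescript
type Check = { ok: true } | { ok: false; error: string };

// Validate a gateway-returned asset URL before any bytes are fetched.
// Resolver and blocklist predicate are injected so the helper serves
// every provider and stays unit-testable.
export async function checkExternalAssetUrl(
  raw: string,
  resolve: (host: string) => Promise<string[]>,
  isBlocked: (ip: string) => boolean,
): Promise<Check> {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return { ok: false, error: 'asset url is not a valid URL' };
  }
  if (url.protocol !== 'https:' && url.protocol !== 'http:') {
    return { ok: false, error: `asset url scheme ${url.protocol} not allowed` };
  }
  const addresses = await resolve(url.hostname);
  if (addresses.length === 0) return { ok: false, error: 'asset host did not resolve' };
  if (addresses.some(isBlocked)) {
    return { ok: false, error: 'asset url resolves to a private or reserved address' };
  }
  return { ok: true };
}

// Download only after the check passes; `redirect: 'error'` makes fetch
// reject on a 3xx instead of following it into private address space.
export async function downloadAsset(
  raw: string,
  check: (raw: string) => Promise<Check>,
  fetchImpl: typeof fetch = fetch,
): Promise<{ ok: true; bytes: Buffer } | { ok: false; error: string }> {
  const verdict = await check(raw);
  if (!verdict.ok) return verdict;
  try {
    const resp = await fetchImpl(raw, { redirect: 'error' });
    if (!resp.ok) return { ok: false, error: `asset download ${resp.status}` };
    return { ok: true, bytes: Buffer.from(await resp.arrayBuffer()) };
  } catch (err) {
    return { ok: false, error: `asset download failed: ${err instanceof Error ? err.message : String(err)}` };
  }
}
```

In production, the resolver could wrap Node's `dns.promises.lookup(host, { all: true })`. Resolving and fetching separately still leaves a DNS-rebinding window between the two lookups. This sketch therefore narrows the attack surface rather than closing it.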

Scope intentionally limited to the senseaudio path PerishCode
flagged; the other unguarded fetch(entry.url) call sites in
media.ts (OpenAI / Volcengine / Grok / Nano-Banana) are pre-existing
patterns and belong in a separate follow-up if the daemon wants
defense-in-depth across every provider.

Tests: 127/127 daemon tests pass; guard + typecheck green.

---------

Co-authored-by: unknown <mazeliang@sensetime.com>
mzl163 2026-05-19 23:14:56 +08:00 committed by GitHub
parent 431a5e2d79
commit 210b94069a
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
52 changed files with 3305 additions and 55 deletions

@@ -63,7 +63,7 @@ OD stands on four open-source shoulders:
| | What you get |
|---|---|
| **Coding-agent CLIs (16)** | Claude Code · Codex CLI · Devin for Terminal · Cursor Agent · Gemini CLI · OpenCode · Qwen Code · Qoder CLI · GitHub Copilot CLI · Hermes (ACP) · Kimi CLI (ACP) · Pi (RPC) · Kiro CLI (ACP) · Kilo (ACP) · Mistral Vibe CLI (ACP) · DeepSeek TUI — auto-detected on `PATH`, swap with one click |
-| **BYOK fallback** | Protocol-specific API proxy at `/api/proxy/{anthropic,openai,azure,google}/stream` — paste `baseUrl` + `apiKey` + `model`, choose Anthropic / OpenAI / Azure OpenAI / Google Gemini, and the daemon normalizes SSE back to the same chat stream. Internal-IP/SSRF blocked at the daemon edge. |
+| **BYOK fallback** | Protocol-specific API proxy at `/api/proxy/{anthropic,openai,azure,google,ollama,senseaudio}/stream` — paste `baseUrl` + `apiKey` + `model`, choose Anthropic / OpenAI / Azure OpenAI / Google Gemini / Ollama Cloud / SenseAudio, and the daemon normalizes SSE back to the same chat stream. SenseAudio chat additionally exposes `generate_image` and `generate_video` tools so the model can write rendered artifacts straight into the active project's folder. Internal-IP/SSRF blocked at the daemon edge. |
| **Design systems built-in** | **129** — 2 hand-authored starters + 70 product systems (Linear, Stripe, Vercel, Airbnb, Tesla, Notion, Anthropic, Apple, Cursor, Supabase, Figma, Xiaohongshu, …) from [`awesome-design-md`][acd2], plus 57 design skills from [`awesome-design-skills`][ads] added directly under `design-systems/` |
| **Skills built-in** | **31** — 27 in `prototype` mode (web-prototype, saas-landing, dashboard, mobile-app, gamified-app, social-carousel, magazine-poster, dating-web, sprite-animation, motion-frames, critique, tweaks, wireframe-sketch, pm-spec, eng-runbook, finance-report, hr-onboarding, invoice, kanban-board, team-okrs, …) + 4 in `deck` mode (`guizang-ppt` · `simple-deck` · `replit-deck` · `weekly-update`). Grouped in the picker by `scenario`: design / marketing / operation / engineering / product / finance / hr / sale / personal. |
| **Media generation** | Image · video · audio surfaces ship alongside the design loop. **gpt-image-2** (Azure / OpenAI) for posters, avatars, infographics, illustrated maps · **Seedance 2.0** (ByteDance) for cinematic 15s text-to-video and image-to-video · **HyperFrames** ([heygen-com/hyperframes](https://github.com/heygen-com/hyperframes)) for HTML→MP4 motion graphics (product reveals, kinetic typography, data charts, social overlays, logo outros). **93** ready-to-replicate prompts gallery — 43 gpt-image-2 + 39 Seedance + 11 HyperFrames — under [`prompt-templates/`](prompt-templates/), with preview thumbnails and source attribution. Same chat surface as code; outputs a real `.mp4` / `.png` chip into the project workspace. |
@@ -304,7 +304,7 @@ Every layer is composable. Every layer is a file you can edit. Read [`apps/daemo
| Frontend | Next.js 16 App Router + React 18 + TypeScript, Vercel-deployable |
| Daemon | Node 24 · Express · SSE streaming · `better-sqlite3`; tables: `projects` · `conversations` · `messages` · `tabs` · `templates` |
| Agent transport | `child_process.spawn`; typed-event parsers for `claude-stream-json` (Claude Code), `qoder-stream-json` (Qoder CLI), `copilot-stream-json` (Copilot), `json-event-stream` per-CLI parsers (Codex / Gemini / OpenCode / Cursor Agent), `acp-json-rpc` (Devin / Hermes / Kimi / Kiro / Kilo / Mistral Vibe via Agent Client Protocol), `pi-rpc` (Pi via stdio JSON-RPC), `plain` (Qwen Code / DeepSeek TUI) |
-| BYOK proxy | `POST /api/proxy/{anthropic,openai,azure,google}/stream` → provider-specific upstream APIs, normalized `delta/end/error` SSE; allows loopback local LLM providers, rejects non-loopback private/link-local/CGNAT/multicast/reserved hosts, and disables upstream redirects at the daemon edge |
+| BYOK proxy | `POST /api/proxy/{anthropic,openai,azure,google,ollama,senseaudio}/stream` → provider-specific upstream APIs, normalized `delta/end/error` SSE; allows loopback local LLM providers, rejects non-loopback private/link-local/CGNAT/multicast/reserved hosts, and disables upstream redirects at the daemon edge |
| Storage | Plain files in `.od/projects/<id>/` + SQLite at `.od/app.sqlite` + credentials at `.od/media-config.json` (gitignored, auto-created). `OD_DATA_DIR=<dir>` relocates all daemon data (used for test isolation and read-only-install setups); `OD_MEDIA_CONFIG_DIR=<dir>` further narrows the override to just `media-config.json` for setups that want to keep API keys outside the data dir |
| Preview | Sandboxed iframe via `srcdoc` + per-skill `<artifact>` parser ([`apps/web/src/artifacts/parser.ts`](apps/web/src/artifacts/parser.ts)) |
| Export | HTML (inline assets) · PDF (browser print, deck-aware) · PPTX (agent-driven via skill) · ZIP (archiver) · Markdown |
@@ -872,7 +872,7 @@ Pattern is the same as the rest: pick a template, edit the brief, send. The agen
The chat / artifact loop gets the spotlight, but a handful of less-visible capabilities are already wired and worth knowing before you compare OD to anything else:
- **Claude Design ZIP import.** Drop an export from claude.ai onto the welcome dialog. `POST /api/import/claude-design` extracts it into a real `.od/projects/<id>/`, opens the entry file as a tab, and stages a continue-where-Anthropic-left-off prompt for your local agent. No re-prompting, no "ask the model to re-create what we just had". ([`apps/daemon/src/server.ts`](apps/daemon/src/server.ts) — `/api/import/claude-design`)
-- **Multi-provider BYOK proxy.** `POST /api/proxy/{anthropic,openai,azure,google}/stream` takes `{ baseUrl, apiKey, model, messages }`, builds the provider-specific upstream request, normalizes SSE chunks into `delta/end/error`, and allows loopback local LLM providers while rejecting non-loopback private, link-local, CGNAT, multicast, reserved, and redirect targets to head off SSRF. OpenAI-compatible covers OpenAI, Azure AI Foundry `/openai/v1`, DeepSeek, Groq, MiMo, OpenRouter, Ollama, LM Studio, and self-hosted vLLM; Azure OpenAI adds deployment URL + `api-version`; Google uses Gemini `:streamGenerateContent`.
+- **Multi-provider BYOK proxy.** `POST /api/proxy/{anthropic,openai,azure,google,ollama,senseaudio}/stream` takes `{ baseUrl, apiKey, model, messages }`, builds the provider-specific upstream request, normalizes SSE chunks into `delta/end/error`, and allows loopback local LLM providers while rejecting non-loopback private, link-local, CGNAT, multicast, reserved, and redirect targets to head off SSRF. OpenAI-compatible covers OpenAI, Azure AI Foundry `/openai/v1`, DeepSeek, Groq, MiMo, OpenRouter, Ollama, LM Studio, and self-hosted vLLM; Azure OpenAI adds deployment URL + `api-version`; Google uses Gemini `:streamGenerateContent`.
- **User-saved templates.** Once you like a render, `POST /api/templates` snapshots the HTML + metadata into the SQLite `templates` table. The next project picks it from a "your templates" row in the picker — same surface as the shipped 31, but yours.
- **Tab persistence.** Every project remembers its open files and active tab in the `tabs` table. Reopen the project tomorrow and the workspace looks exactly the way you left it.
- **Artifact lint API.** `POST /api/artifacts/lint` runs structural checks on a generated artifact (broken `<artifact>` framing, missing required side files, stale palette tokens) and returns findings the agent can read back into its next turn. The five-dim self-critique uses this to ground its score in real evidence, not vibes.
@@ -974,7 +974,7 @@ Long-form provenance write-up — what we take from each, what we deliberately d
- [x] Web app + chat + question form + 5-direction picker + todo progress + sandboxed preview
- [x] 31 skills + 72 design systems + 5 visual directions + 5 device frames
- [x] SQLite-backed projects · conversations · messages · tabs · templates
-- [x] Multi-provider BYOK proxy (`/api/proxy/{anthropic,openai,azure,google}/stream`) with SSRF guard
+- [x] Multi-provider BYOK proxy (`/api/proxy/{anthropic,openai,azure,google,ollama,senseaudio}/stream`) with SSRF guard
- [x] Claude Design ZIP import (`/api/import/claude-design`)
- [x] Sidecar protocol + Electron desktop with IPC automation (STATUS / EVAL / SCREENSHOT / CONSOLE / CLICK / SHUTDOWN)
- [x] Artifact lint API + 5-dim self-critique pre-emit gate
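
The BYOK proxy rows describe normalizing upstream SSE into `delta/end/error`. The sketch below illustrates that normalization for the OpenAI-compatible wire. The event shape and `normalizeOpenAISse` are assumptions, not the daemon's actual types. The function also expects whole `data:` lines; a real stream reader must buffer partial lines across network chunks.

```typescript
// Hypothetical normalized event shape; the daemon's SSE payloads may differ.
type NormalizedEvent =
  | { type: 'delta'; text: string }
  | { type: 'end' }
  | { type: 'error'; message: string };

// Convert the `data:` lines of an OpenAI-compatible chat-completions
// stream into normalized events. `[DONE]` ends the stream; an `error`
// object or unparseable JSON becomes an error event.
export function normalizeOpenAISse(chunk: string): NormalizedEvent[] {
  const events: NormalizedEvent[] = [];
  for (const line of chunk.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('data:')) continue; // skip comments, event:, blank lines
    const payload = trimmed.slice('data:'.length).trim();
    if (payload === '[DONE]') {
      events.push({ type: 'end' });
      continue;
    }
    try {
      const json = JSON.parse(payload) as {
        choices?: Array<{ delta?: { content?: string | null } }>;
        error?: { message?: string };
      };
      if (json.error) {
        events.push({ type: 'error', message: json.error.message ?? 'upstream error' });
        continue;
      }
      const text = json.choices?.[0]?.delta?.content;
      if (typeof text === 'string' && text.length > 0) events.push({ type: 'delta', text });
    } catch {
      events.push({ type: 'error', message: 'malformed upstream SSE payload' });
    }
  }
  return events;
}
```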

@@ -0,0 +1,598 @@
// Tool definitions and executors exposed to BYOK chat sessions.
//
// Why this file exists: the BYOK chat proxy (e.g. /api/proxy/senseaudio/stream)
// is a thin pass-through that doesn't have the agent-runtime scaffolding the
// CLI agents (Claude Code / Codex / ...) carry. To let users ask their BYOK
// chat to "draw me a cat" and get an actual rendered PNG back, the daemon
// injects an OpenAI-shaped `tools` definition into the upstream completion
// request, then loops on the model's tool_calls: execute → feed the result
// back as a `role: 'tool'` message → re-issue the completion. The chat surface
// stays the same; the tool dispatch happens entirely daemon-side.
//
// Today we ship two tools: `generate_image`, backed by SenseAudio's
// synchronous /v1/image/sync endpoint, and `generate_video`, backed by the
// asynchronous /v1/video/create + /v1/video/status pair. Both reuse the
// API key the BYOK chat session already authenticates against SenseAudio
// with. Additional tools (TTS, research) can be added here as the BYOK
// surface expands.
import path from 'node:path';
import { writeFile } from 'node:fs/promises';
import { randomBytes } from 'node:crypto';
import { assertExternalAssetUrl } from './connectionTest.js';
import { resolveProviderConfig } from './media-config.js';
import { IMAGE_MODELS } from './media-models.js';
import { ensureProject } from './projects.js';
// SenseAudio image model allowlist — derived from the shared media-models
// registry so adding a new SenseAudio image model in one place (media-models)
// auto-extends the BYOK tool param enum, the Settings dropdown, and the
// daemon-side validation. No drift, no hand-maintained constant.
export const BYOK_SENSEAUDIO_IMAGE_MODELS: readonly string[] = IMAGE_MODELS
.filter((m) => m.provider === 'senseaudio')
.map((m) => m.id);
// Default falls back to the first entry from the registry (today
// `senseaudio-image-2.0-260319` — the multi-aspect latest). Kept as a
// computed constant so re-ordering the registry rotates the default
// without code edits here.
export const BYOK_SENSEAUDIO_DEFAULT_IMAGE_MODEL =
BYOK_SENSEAUDIO_IMAGE_MODELS[0] ?? 'senseaudio-image-2.0-260319';
export function isSenseAudioImageModel(value: unknown): value is string {
return typeof value === 'string' && BYOK_SENSEAUDIO_IMAGE_MODELS.includes(value);
}
const SENSEAUDIO_DEFAULT_BASE_URL = 'https://api.senseaudio.cn';
const PROMPT_MAX_LENGTH = 2000;
// SenseAudio video — the API only documents one model today, so the
// wire id is a const. The chat tool's `generate_video` param surface
// (prompt, aspect_ratio, duration, resolution, generate_audio) covers
// every knob the doubao-seedance gateway accepts.
const SENSEAUDIO_VIDEO_MODEL = 'doubao-seedance-2-0-260128';
const SENSEAUDIO_VIDEO_ASPECT_RATIOS = ['16:9', '9:16', '4:3', '3:4', '1:1'] as const;
const SENSEAUDIO_VIDEO_RESOLUTIONS = ['480p', '720p', '1080p'] as const;
const SENSEAUDIO_VIDEO_DURATION_MIN = 4;
const SENSEAUDIO_VIDEO_DURATION_MAX = 15;
const SENSEAUDIO_VIDEO_DURATION_DEFAULT = 5;
// Polling: SenseAudio docs recommend 5–10 s intervals; we pick 5 s and
// cap total attempts so a stuck job can't pin the chat stream forever.
// 120 attempts × 5 s = 10 min ceiling, which covers the real-world
// doubao-seedance latency range (1080p + audio jobs frequently spend
// 3–8 min on the gateway). An earlier 5-min cap timed out otherwise
// valid jobs; a longer ceiling makes the chat surface feel stuck.
const SENSEAUDIO_VIDEO_POLL_INTERVAL_MS_DEFAULT = 5000;
const SENSEAUDIO_VIDEO_MAX_POLLS = 120;
// Periodic progress log every N polls so a long-running job emits some
// signal to the daemon log — without flooding it with one line per
// 5 s. 6 polls = ~30 s between progress lines.
const SENSEAUDIO_VIDEO_PROGRESS_LOG_EVERY = 6;
// SenseAudio's image gateway rejects non-standard pixel sizes with a 400
// `参数错误:size` ("parameter error: size"; verified against logs from a failed call on
// 2026-05-16). We stick to common 16-multiple HD / SD sizes that the
// gateway is known to accept: 1024×1024 for square, 1280×720 / 720×1280
// for widescreen / portrait, 1024×768 / 768×1024 for the 4:3 family.
// The table is duplicated in renderSenseAudioImage (media.ts) for the
// CLI-agent path so both surfaces stay in sync.
const ASPECT_TO_SIZE: Record<string, string> = {
'1:1': '1024x1024',
'16:9': '1280x720',
'9:16': '720x1280',
'4:3': '1024x768',
'3:4': '768x1024',
};
/**
* OpenAI-compatible tool definitions for image and video generation.
* Injected into the upstream `tools` array on every
* /api/proxy/senseaudio/stream request so the LLM can decide on its own
* when to call them. The image tool's description deliberately tells the
* model to embed the returned URL in markdown: the chat UI already renders
* markdown images inline, so no client-side wiring is required for the
* bytes to show up.
*/
export const BYOK_SENSEAUDIO_TOOLS = [
{
type: 'function' as const,
function: {
name: 'generate_image',
description:
'Generate an image from a text prompt using SenseAudio image models. Returns a URL pointing to the rendered PNG. After this tool succeeds, embed the URL in your reply with markdown image syntax — ![alt](url) — so the user sees the image inline. Use this whenever the user asks to draw, create, generate, design, or illustrate something visual.',
parameters: {
type: 'object',
properties: {
prompt: {
type: 'string',
description:
'Detailed visual description of the image (Chinese or English are both fine). Include subject, style, lighting, composition. Maximum 2000 characters.',
},
aspect_ratio: {
type: 'string',
enum: ['1:1', '16:9', '9:16', '4:3', '3:4'],
description:
'Output aspect ratio. 1:1 for square avatars and product shots, 16:9 for hero banners, 9:16 for vertical phone posters, 4:3 for editorial covers, 3:4 for posters. Defaults to 1:1 when omitted.',
},
model: {
type: 'string',
enum: [...BYOK_SENSEAUDIO_IMAGE_MODELS],
description:
'Optional model override. Omit this to use the user-configured default from Settings (or the SenseAudio 2.0 multi-aspect model when unset). Choose senseaudio-image-2.0-260319 for multi-aspect generation, senseaudio-image-1.0-260319 for standard sizes, or doubao-seedream-5-0-260128 for high-resolution output through the ByteDance Seedream gateway. The user explicitly picked a default in their Settings — only override when the user asks for a different style/resolution.',
},
},
required: ['prompt'],
},
},
},
{
type: 'function' as const,
function: {
name: 'generate_video',
description:
'Generate a short video (4–15 seconds) from a text prompt using SenseAudio\'s ByteDance Seedance gateway. This is an asynchronous call that can take 30 s to a few minutes; the daemon polls the job for you, so the user just sees the chat waiting. After this tool succeeds, embed the returned URL in your reply as a markdown link, e.g. `[▶ Play video](url)`, because the chat\'s markdown renderer does not currently render `<video>` tags inline. Use this whenever the user asks for a video, clip, animation, or motion graphic.',
parameters: {
type: 'object',
properties: {
prompt: {
type: 'string',
description:
'Detailed motion description of the video. Include subject, action / camera move / scene transitions, style, lighting. Chinese or English. Maximum 2000 characters.',
},
aspect_ratio: {
type: 'string',
enum: [...SENSEAUDIO_VIDEO_ASPECT_RATIOS],
description:
'Output aspect ratio. 16:9 for cinematic, 9:16 for vertical (phone / TikTok), 1:1 for social square, 4:3 / 3:4 for editorial. Defaults to 16:9.',
},
duration: {
type: 'integer',
minimum: SENSEAUDIO_VIDEO_DURATION_MIN,
maximum: SENSEAUDIO_VIDEO_DURATION_MAX,
description:
`Video length in seconds (integer). Allowed range ${SENSEAUDIO_VIDEO_DURATION_MIN}–${SENSEAUDIO_VIDEO_DURATION_MAX}; defaults to ${SENSEAUDIO_VIDEO_DURATION_DEFAULT}. Shorter durations finish faster.`,
},
resolution: {
type: 'string',
enum: [...SENSEAUDIO_VIDEO_RESOLUTIONS],
description:
'Output resolution. 480p (fastest), 720p (default, balanced), 1080p (best quality, slowest). Pick 1080p only when the user explicitly asks for high resolution.',
},
generate_audio: {
type: 'boolean',
description:
'Whether the model also synthesises an audio track for the clip (background sound, ambience). Defaults to false to keep generation fast; flip to true when the user asks for sound, music, or a "video with audio".',
},
},
required: ['prompt'],
},
},
},
];
/**
* Runtime context the BYOK tool executor needs. Passed by the chat
* route on every call so the tool layer stays free of global state and
* can be unit-tested with a temp directory.
*/
export interface BYOKToolContext {
/** Daemon project root used to look up media-config when the chat
* session key is missing. */
projectRoot: string;
/** Daemon's PROJECTS_DIR (the `<projectRoot>/.od/projects/` folder
* that holds per-project file trees). Generated images land in
* `<projectsRoot>/<projectId>/byok-<id>.png` so the project's
* FileViewer / DesignFilesPanel discover them automatically and
* the file travels with the project on export, archive, rename. */
projectsRoot: string;
/** Active project id from the chat surface. Required: the BYOK
* chat always runs inside a project, so the tool dispatch refuses
* to fire without one rather than dump bytes into a global cache.
* Validated upstream via `isSafeId`. */
projectId: string;
/** The BYOK chat session's API key, the first credential we try. Bypasses
* the media-config indirection so the same key the user just pasted
* for chat is the same key the image call uses. */
upstreamApiKey: string;
/** The BYOK chat session's base URL (may be a custom gateway). Falls
* back to api.senseaudio.cn. */
upstreamBaseUrl?: string;
/** Default image model the user picked in BYOK Settings, used when the
* LLM didn't pass `model` in tool args. Validated upstream: anything
* outside `BYOK_SENSEAUDIO_IMAGE_MODELS` is dropped so a stale
* client-side config can't smuggle an unregistered model id through.
* Falls back to `BYOK_SENSEAUDIO_DEFAULT_IMAGE_MODEL` (the registry's
* first SenseAudio image entry) when missing. */
defaultImageModel?: string;
/** Test-only override for the video polling interval (ms). Production
* uses 5 s (SenseAudio's recommendation); tests pass small values
* (e.g. 1 ms) to keep the suite fast without changing the polling
* semantics. */
videoPollIntervalMs?: number;
}
export interface ImageToolResult {
ok: boolean;
/** Daemon-served URL on success. */
url?: string;
/** Short human-readable failure reason. Stuffed into the `tool` role
* reply so the LLM can apologize / retry. */
error?: string;
}
function sanitizeAspectRatio(raw: unknown): string {
if (typeof raw !== 'string') return '1:1';
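// Own-property check so inherited keys like `toString` can't pass as an aspect.
return Object.prototype.hasOwnProperty.call(ASPECT_TO_SIZE, raw) ? raw : '1:1';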
}
/**
* Execute the `generate_image` tool. Calls SenseAudio /v1/image/sync,
* downloads the rendered bytes, writes them to
* <projectsRoot>/<projectId>/byok-<id>.png, and returns a relative URL
* served through the project file route. Pure async: the caller is
* responsible for emitting any SSE events (e.g. "tool result ready").
*
* Failure modes return `{ok: false, error}` rather than throwing so the
* caller can feed the message back to the LLM as a tool_result; that
* lets the model apologize / suggest a retry instead of the chat
* silently stopping.
*/
export async function executeGenerateImage(
args: { prompt?: unknown; aspect_ratio?: unknown; model?: unknown },
ctx: BYOKToolContext,
): Promise<ImageToolResult> {
const promptRaw = typeof args.prompt === 'string' ? args.prompt.trim() : '';
if (!promptRaw) return { ok: false, error: 'prompt is required' };
const prompt =
promptRaw.length > PROMPT_MAX_LENGTH
? promptRaw.slice(0, PROMPT_MAX_LENGTH)
: promptRaw;
const aspect = sanitizeAspectRatio(args.aspect_ratio);
const size = ASPECT_TO_SIZE[aspect];
// Model resolution order — LLM args > user's Settings default > registry
// default. The allowlist guards every step so a hallucinated or stale id
// can never reach the senseaudio /v1/image/sync wire — the catalogue is
// the source of truth.
const senseAudioImageModel = isSenseAudioImageModel(args.model)
? args.model
: isSenseAudioImageModel(ctx.defaultImageModel)
? ctx.defaultImageModel
: BYOK_SENSEAUDIO_DEFAULT_IMAGE_MODEL;
// Resolve the project folder up front. ensureProject runs
// `isSafeId` internally, so an attacker who somehow bypassed the
// chat-routes guard and slipped `../escape` into projectId fails
// here before we make any upstream call. The returned `dir` is
// reused at writeFile time below.
let dir: string;
try {
dir = await ensureProject(ctx.projectsRoot, ctx.projectId);
} catch (err) {
return {
ok: false,
error: `invalid projectId for image storage: ${err instanceof Error ? err.message : String(err)}`,
};
}
// Prefer the BYOK session's key (what the user is actively using).
// Fall back to media-config (env var > stored) so a user who set
// OD_SENSEAUDIO_API_KEY but forgot to fill the chat panel still
// gets a working tool call.
let apiKey = ctx.upstreamApiKey;
let baseUrl = ctx.upstreamBaseUrl || SENSEAUDIO_DEFAULT_BASE_URL;
if (!apiKey) {
const resolved = await resolveProviderConfig(ctx.projectRoot, 'senseaudio');
apiKey = resolved.apiKey || '';
if (resolved.baseUrl) baseUrl = resolved.baseUrl;
}
if (!apiKey) {
return { ok: false, error: 'no SenseAudio API key available' };
}
const trimmedBase = baseUrl.replace(/\/+$/, '');
let imageUrl: string;
try {
const resp = await fetch(`${trimmedBase}/v1/image/sync`, {
method: 'POST',
headers: {
authorization: `Bearer ${apiKey}`,
'content-type': 'application/json',
},
body: JSON.stringify({
model: senseAudioImageModel,
prompt,
size,
}),
});
if (!resp.ok) {
const text = await resp.text().catch(() => '');
return {
ok: false,
error: `senseaudio image ${resp.status}: ${text.slice(0, 240)}`,
};
}
const data = (await resp.json()) as {
url?: string;
error_message?: string;
base_resp?: { status_code?: number; status_msg?: string };
};
if (data?.base_resp && data.base_resp.status_code !== 0) {
return {
ok: false,
error: `senseaudio image api error ${data.base_resp.status_code}: ${data.base_resp.status_msg || 'unknown'}`,
};
}
if (typeof data?.error_message === 'string' && data.error_message) {
return { ok: false, error: `senseaudio image: ${data.error_message}` };
}
if (typeof data?.url !== 'string' || !data.url) {
return { ok: false, error: 'senseaudio image response missing url' };
}
imageUrl = data.url;
} catch (err) {
return {
ok: false,
error: err instanceof Error ? err.message : String(err),
};
}
const imageUrlCheck = await assertExternalAssetUrl(imageUrl);
if (!imageUrlCheck.ok) return { ok: false, error: imageUrlCheck.error };
let bytes: Buffer;
try {
const imgResp = await fetch(imageUrl, { redirect: 'error' });
if (!imgResp.ok) {
return { ok: false, error: `image download ${imgResp.status}` };
}
bytes = Buffer.from(await imgResp.arrayBuffer());
} catch (err) {
return {
ok: false,
error: `image download failed: ${err instanceof Error ? err.message : String(err)}`,
};
}
if (bytes.length === 0) {
return { ok: false, error: 'image download returned zero bytes' };
}
// Persist into the active project's folder. `dir` was resolved up
// front via ensureProject — no DB write, no metadata side-effects —
// and the resulting path slots straight into the existing project
// file plumbing: listFiles enumerates it for the FileViewer,
// readProjectFile serves it via GET /api/projects/<id>/files/<filename>,
// and project archive / export pick it up automatically because it
// lives under the project's own directory.
//
// Filename pattern `byok-<timestamp>-<random>.png` keeps tool
// outputs distinguishable from user uploads at a glance while
// staying url-safe.
const id = `${Date.now().toString(36)}-${randomBytes(4).toString('hex')}`;
const filename = `byok-${id}.png`;
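// Disk errors (full volume, permissions) are reported like every other
// failure mode instead of throwing out of the tool executor.
try {
await writeFile(path.join(dir, filename), bytes);
} catch (err) {
return {
ok: false,
error: `image write failed: ${err instanceof Error ? err.message : String(err)}`,
};
}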
// Return a relative URL through the project file serving route. The
// web's Next.js rewrites `/api/:path*` to the daemon (see
// apps/web/next.config.ts), so the chat UI loads the image
// same-origin — satisfying the strict CSP (`img-src 'self' data:
// blob:`) without any CORS plumbing.
return {
ok: true,
url: `/api/projects/${encodeURIComponent(ctx.projectId)}/files/${filename}`,
};
}
function sanitizeVideoAspectRatio(raw: unknown): (typeof SENSEAUDIO_VIDEO_ASPECT_RATIOS)[number] {
if (typeof raw !== 'string') return '16:9';
return (SENSEAUDIO_VIDEO_ASPECT_RATIOS as readonly string[]).includes(raw)
? (raw as (typeof SENSEAUDIO_VIDEO_ASPECT_RATIOS)[number])
: '16:9';
}
function sanitizeVideoResolution(raw: unknown): (typeof SENSEAUDIO_VIDEO_RESOLUTIONS)[number] {
if (typeof raw !== 'string') return '720p';
return (SENSEAUDIO_VIDEO_RESOLUTIONS as readonly string[]).includes(raw)
? (raw as (typeof SENSEAUDIO_VIDEO_RESOLUTIONS)[number])
: '720p';
}
function sanitizeVideoDuration(raw: unknown): number {
if (typeof raw !== 'number' || !Number.isFinite(raw)) return SENSEAUDIO_VIDEO_DURATION_DEFAULT;
const rounded = Math.round(raw);
if (rounded < SENSEAUDIO_VIDEO_DURATION_MIN) return SENSEAUDIO_VIDEO_DURATION_MIN;
if (rounded > SENSEAUDIO_VIDEO_DURATION_MAX) return SENSEAUDIO_VIDEO_DURATION_MAX;
return rounded;
}
const sleep = (ms: number): Promise<void> =>
new Promise((resolve) => setTimeout(resolve, ms));
/**
* Execute the `generate_video` tool. SenseAudio's video API is
* asynchronous-only: POST /v1/video/create returns a task_id, then
* GET /v1/video/status?id=<task_id> reports `pending` / `processing`
* until the job settles as `completed` (with `video_url`) or `failed`
* (with `error_message`).
* We poll every `videoPollIntervalMs` (default 5 s) and bail after
* `SENSEAUDIO_VIDEO_MAX_POLLS` so a stuck upstream can't pin the
* chat stream forever.
*
* The chat tool waits for the whole loop, so the daemon's outbound
* SSE response from /api/proxy/senseaudio/stream stays open for the
* duration. That's intentional: the next chat turn cannot begin
* until we have a URL to feed back into the tool_result.
*/
export async function executeGenerateVideo(
args: {
prompt?: unknown;
aspect_ratio?: unknown;
duration?: unknown;
resolution?: unknown;
generate_audio?: unknown;
},
ctx: BYOKToolContext,
): Promise<ImageToolResult> {
const promptRaw = typeof args.prompt === 'string' ? args.prompt.trim() : '';
if (!promptRaw) return { ok: false, error: 'prompt is required' };
const prompt =
promptRaw.length > PROMPT_MAX_LENGTH
? promptRaw.slice(0, PROMPT_MAX_LENGTH)
: promptRaw;
const ratio = sanitizeVideoAspectRatio(args.aspect_ratio);
const resolution = sanitizeVideoResolution(args.resolution);
const duration = sanitizeVideoDuration(args.duration);
const generateAudio = args.generate_audio === true;
let dir: string;
try {
dir = await ensureProject(ctx.projectsRoot, ctx.projectId);
} catch (err) {
return {
ok: false,
error: `invalid projectId for video storage: ${err instanceof Error ? err.message : String(err)}`,
};
}
let apiKey = ctx.upstreamApiKey;
let baseUrl = ctx.upstreamBaseUrl || SENSEAUDIO_DEFAULT_BASE_URL;
if (!apiKey) {
const resolved = await resolveProviderConfig(ctx.projectRoot, 'senseaudio');
apiKey = resolved.apiKey || '';
if (resolved.baseUrl) baseUrl = resolved.baseUrl;
}
if (!apiKey) {
return { ok: false, error: 'no SenseAudio API key available' };
}
const trimmedBase = baseUrl.replace(/\/+$/, '');
// Step 1: POST /v1/video/create → task_id.
let taskId: string;
try {
const resp = await fetch(`${trimmedBase}/v1/video/create`, {
method: 'POST',
headers: {
authorization: `Bearer ${apiKey}`,
'content-type': 'application/json',
},
body: JSON.stringify({
model: SENSEAUDIO_VIDEO_MODEL,
content: [{ type: 'text', text: prompt }],
duration,
resolution,
ratio,
provider_specific: { generate_audio: generateAudio },
}),
});
if (!resp.ok) {
const text = await resp.text().catch(() => '');
return {
ok: false,
error: `senseaudio video create ${resp.status}: ${text.slice(0, 240)}`,
};
}
const data = (await resp.json()) as { task_id?: string };
if (typeof data?.task_id !== 'string' || !data.task_id) {
return { ok: false, error: 'senseaudio video create response missing task_id' };
}
taskId = data.task_id;
} catch (err) {
return {
ok: false,
error: err instanceof Error ? err.message : String(err),
};
}
// Step 2: poll /v1/video/status until completed / failed / timeout.
const pollIntervalMs = ctx.videoPollIntervalMs ?? SENSEAUDIO_VIDEO_POLL_INTERVAL_MS_DEFAULT;
let videoUrl = '';
for (let attempt = 0; attempt < SENSEAUDIO_VIDEO_MAX_POLLS; attempt++) {
await sleep(pollIntervalMs);
let statusResp: Response;
try {
statusResp = await fetch(
`${trimmedBase}/v1/video/status?id=${encodeURIComponent(taskId)}`,
{
method: 'GET',
headers: { authorization: `Bearer ${apiKey}` },
},
);
} catch (err) {
return {
ok: false,
error: `senseaudio video poll failed: ${err instanceof Error ? err.message : String(err)}`,
};
}
if (!statusResp.ok) {
const text = await statusResp.text().catch(() => '');
return {
ok: false,
error: `senseaudio video status ${statusResp.status}: ${text.slice(0, 240)}`,
};
}
const data = (await statusResp.json()) as {
status?: string;
progress?: number;
video_url?: string;
error_message?: string;
};
if (data?.status === 'completed') {
if (typeof data.video_url !== 'string' || !data.video_url) {
return { ok: false, error: 'senseaudio video status completed but missing video_url' };
}
videoUrl = data.video_url;
break;
}
if (data?.status === 'failed') {
return {
ok: false,
error: `senseaudio video failed: ${data.error_message || 'unknown reason'}`,
};
}
// pending / processing — continue polling. Emit a periodic log line
// so a stuck job surfaces in the daemon log instead of silently
// burning attempts.
if ((attempt + 1) % SENSEAUDIO_VIDEO_PROGRESS_LOG_EVERY === 0) {
const pct = typeof data.progress === 'number' ? data.progress : '?';
console.log(
`[proxy:senseaudio] generate_video poll ${attempt + 1}/${SENSEAUDIO_VIDEO_MAX_POLLS} task=${taskId} status=${data.status ?? 'unknown'} progress=${pct}`,
);
}
}
if (!videoUrl) {
return {
ok: false,
error: `senseaudio video timed out after ${SENSEAUDIO_VIDEO_MAX_POLLS} polls`,
};
}
// Step 3: download the mp4 bytes and persist them into the project folder.
// Re-validate the returned URL with assertExternalAssetUrl (which wraps
// validateBaseUrlResolved) so a malicious gateway can't point us at
// 169.254.169.254 (the AWS / Azure metadata service) or RFC1918 hosts via
// the response payload.
const videoUrlCheck = await assertExternalAssetUrl(videoUrl);
if (!videoUrlCheck.ok) return { ok: false, error: videoUrlCheck.error };
let bytes: Buffer;
try {
const videoResp = await fetch(videoUrl, { redirect: 'error' });
if (!videoResp.ok) {
return { ok: false, error: `video download ${videoResp.status}` };
}
bytes = Buffer.from(await videoResp.arrayBuffer());
} catch (err) {
return {
ok: false,
error: `video download failed: ${err instanceof Error ? err.message : String(err)}`,
};
}
if (bytes.length === 0) {
return { ok: false, error: 'video download returned zero bytes' };
}
const id = `${Date.now().toString(36)}-${randomBytes(4).toString('hex')}`;
const filename = `byok-video-${id}.mp4`;
await writeFile(path.join(dir, filename), bytes);
return {
ok: true,
url: `/api/projects/${encodeURIComponent(ctx.projectId)}/files/${filename}`,
};
}
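The three `sanitizeVideo*` helpers above share one pattern: accept a value from an allowlist or fall back to a default, and round-and-clamp a number into a range. A minimal sketch of that pattern as two generic helpers, using only standard TypeScript. The constant values are hypothetical placeholders, since the real `SENSEAUDIO_VIDEO_*` values are defined outside this diff.

```typescript
// Hypothetical stand-ins for the SENSEAUDIO_VIDEO_* constants.
const ASPECT_RATIOS = ['16:9', '9:16', '1:1'] as const;
const DURATION_MIN = 4;
const DURATION_MAX = 12;
const DURATION_DEFAULT = 5;

type AspectRatio = (typeof ASPECT_RATIOS)[number];

// Return `raw` narrowed to the literal union when it is allowed, else `fallback`.
function pickAllowed<T extends string>(
  allowed: readonly T[],
  raw: unknown,
  fallback: T,
): T {
  return typeof raw === 'string' && (allowed as readonly string[]).includes(raw)
    ? (raw as T)
    : fallback;
}

// Round to an integer and clamp into [min, max]; non-numbers, NaN and
// Infinity fall back to the default, as sanitizeVideoDuration does.
function clampInt(raw: unknown, min: number, max: number, fallback: number): number {
  if (typeof raw !== 'number' || !Number.isFinite(raw)) return fallback;
  return Math.min(max, Math.max(min, Math.round(raw)));
}

const ratio: AspectRatio = pickAllowed(ASPECT_RATIOS, '21:9', '16:9'); // '16:9'
const duration = clampInt(30.4, DURATION_MIN, DURATION_MAX, DURATION_DEFAULT); // 12
```

Passing the allowlist as a `readonly` tuple lets the compiler infer the literal union as `T`, which is what the hand-written `(typeof X)[number]` casts in the original achieve per helper.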

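The polling bounds in `executeGenerateVideo` are easiest to reason about as wall-clock budgets. The commit message quotes 5 s polling, a 10-minute ceiling, and a progress log every 30 s, which works out to 10 × 60 / 5 = 120 polls and one log line every 30 / 5 = 6 polls. The sketch below derives those counts and runs a generic poll loop with injected `check` and `sleep` functions; `pollUntilDone`, `PollStatus`, and the derived constants are illustrative names, not this module's exports.

```typescript
// Budgets quoted in the commit message; the real SENSEAUDIO_VIDEO_* constants
// are not shown in this diff.
const POLL_INTERVAL_MS = 5_000;
const MAX_WAIT_MS = 10 * 60_000;
const LOG_EVERY_MS = 30_000;
const MAX_POLLS = Math.ceil(MAX_WAIT_MS / POLL_INTERVAL_MS);
const LOG_EVERY = Math.max(1, Math.round(LOG_EVERY_MS / POLL_INTERVAL_MS));

type PollStatus =
  | { status: 'pending' | 'processing'; progress?: number }
  | { status: 'completed'; video_url: string }
  | { status: 'failed'; error_message?: string };

type PollOutcome = { ok: true; url: string } | { ok: false; error: string };

// Sleep, check, and stop on a terminal status or after `maxPolls` attempts.
// `check` and `sleep` are injected so the loop runs without a network.
async function pollUntilDone(
  check: () => Promise<PollStatus>,
  sleep: (ms: number) => Promise<void>,
  intervalMs: number,
  maxPolls: number,
  onProgress?: (attempt: number, s: PollStatus) => void,
): Promise<PollOutcome> {
  for (let attempt = 0; attempt < maxPolls; attempt++) {
    await sleep(intervalMs);
    const s = await check();
    if (s.status === 'completed') return { ok: true, url: s.video_url };
    if (s.status === 'failed') {
      return { ok: false, error: s.error_message || 'unknown reason' };
    }
    // Still pending/processing: report every LOG_EVERY attempts.
    if (onProgress && (attempt + 1) % LOG_EVERY === 0) onProgress(attempt + 1, s);
  }
  return { ok: false, error: `timed out after ${maxPolls} polls` };
}
```

Because the ceiling is a poll count rather than a deadline, overriding `videoPollIntervalMs` also changes the effective wall-clock ceiling.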
@@ -1,13 +1,22 @@
import type { Express } from 'express';
import type { RouteDeps } from './server-context.js';
import { newInsertId } from './analytics.js';
import { seedProviderIfMissing } from './media-config.js';
import {
BYOK_SENSEAUDIO_TOOLS,
executeGenerateImage,
executeGenerateVideo,
isSenseAudioImageModel,
type BYOKToolContext,
} from './byok-tools.js';
import { isSafeId as isSafeProjectId } from './projects.js';
import {
agentIdToTracking,
projectKindToTracking,
} from '@open-design/contracts/analytics';
import { validateBaseUrlResolved } from './connectionTest.js';
export interface RegisterChatRoutesDeps extends RouteDeps<'db' | 'design' | 'http' | 'chat' | 'agents' | 'critique' | 'validation' | 'lifecycle' | 'paths'> {}
// Invariant: a chat assistant message row reflects its run's terminal state
// even when the web client never persists the cancel/finish itself (refresh
@@ -310,13 +319,13 @@ export function registerChatRoutes(app: Express, ctx: RegisterChatRoutesDeps) {
const protocol = body.protocol;
if (
typeof protocol !== 'string' ||
!['anthropic', 'openai', 'azure', 'google', 'ollama', 'senseaudio'].includes(protocol)
) {
return sendApiError(
res,
400,
'BAD_REQUEST',
'protocol must be one of anthropic|openai|azure|google|ollama|senseaudio',
);
}
if (
@@ -371,13 +380,13 @@ export function registerChatRoutes(app: Express, ctx: RegisterChatRoutesDeps) {
const protocol = body.protocol;
if (
typeof protocol !== 'string' ||
!['anthropic', 'openai', 'azure', 'google', 'ollama', 'senseaudio'].includes(protocol)
) {
return sendApiError(
res,
400,
'BAD_REQUEST',
'protocol must be one of anthropic|openai|azure|google|ollama|senseaudio',
);
}
if (
@@ -1172,4 +1181,354 @@ export function registerChatRoutes(app: Express, ctx: RegisterChatRoutesDeps) {
}
});
// SenseAudio chat completions. Wire-compatible with OpenAI (POST
// /v1/chat/completions, Bearer auth, SSE `data: {...}` + `data: [DONE]`)
// plus a daemon-side tool loop: the handler injects an OpenAI
// `tools` array on every upstream request and, when the model
// responds with a `tool_calls` finish_reason, executes the call
// locally, appends the assistant + tool messages to the conversation,
// and re-issues the completion. This is how BYOK chat, which has
// no agent-runtime scaffolding, gets image- and video-generation parity with
// the CLI agent path. Loop is bounded by MAX_BYOK_TOOL_LOOPS so a
// misbehaving model can't pin the daemon in an infinite tool dance.
const MAX_BYOK_TOOL_LOOPS = 3;
type AccumulatedToolCall = { id: string; name: string; arguments: string };
type TurnResult =
| { kind: 'text_end' }
| { kind: 'error' }
| {
kind: 'tool_calls';
assistantMessage: any;
toolCalls: Array<{ id: string; type: 'function'; function: { name: string; arguments: string } }>;
};
app.post('/api/proxy/senseaudio/stream', async (req, res) => {
const proxyBody = req.body || {};
if (rejectProxyPluginContext(proxyBody, res)) return;
const {
baseUrl,
apiKey,
model,
systemPrompt,
messages,
maxTokens,
projectId,
byokImageModel,
} = proxyBody;
if (!apiKey || !model) {
return sendApiError(
res,
400,
'BAD_REQUEST',
'apiKey and model are required',
);
}
// projectId is required because the BYOK generate_image / generate_video
// tools write into the active project's folder; without one we'd have to fall
// back to a daemon-global cache that orphans the file. The web
// client always passes project.id from ProjectView, so a missing
// value means the request did not come through the chat surface.
if (typeof projectId !== 'string' || !isSafeProjectId(projectId)) {
return sendApiError(
res,
400,
'BAD_REQUEST',
'projectId is required and must be a safe identifier',
);
}
const effectiveBaseUrl = baseUrl || 'https://api.senseaudio.cn';
const validated = await validateExternalApiBaseUrl(effectiveBaseUrl);
if (validated.error) {
return sendApiError(
res,
validated.forbidden ? 403 : 400,
validated.forbidden ? 'FORBIDDEN' : 'BAD_REQUEST',
validated.error,
);
}
const url = appendVersionedApiPath(effectiveBaseUrl, '/chat/completions');
console.log(
`[proxy:senseaudio] ${req.method} ${validated.parsed?.hostname ?? '?'} model=${model} project=${projectId}`,
);
const workingMessages: any[] = Array.isArray(messages) ? [...messages] : [];
if (typeof systemPrompt === 'string' && systemPrompt) {
workingMessages.unshift({ role: 'system', content: systemPrompt });
}
// Tool execution context — built once per request. The image tool
// writes into `<projectsRoot>/<projectId>/byok-<id>.png` and returns
// a relative URL via `/api/projects/:id/files/:filename`. The web's
// Next.js rewrites `/api/:path*` to the daemon, so the chat UI
// loads images same-origin through the standard project file
// route — no CSP / CORS exceptions needed.
// User-configured BYOK default image model. Drop silently if the
// client sent an id outside the SenseAudio registry — the tool
// will fall back to the registry default and the LLM can still
// override per-call via the tool's `model` arg.
const validDefaultImageModel = isSenseAudioImageModel(byokImageModel)
? byokImageModel
: undefined;
const toolCtx: BYOKToolContext = {
projectRoot: ctx.paths.PROJECT_ROOT,
projectsRoot: ctx.paths.PROJECTS_DIR,
projectId,
upstreamApiKey: apiKey,
upstreamBaseUrl: effectiveBaseUrl,
// Spread-conditional because tsconfig's exactOptionalPropertyTypes
// forbids `field: undefined` on an optional slot. The byok-tools
// executor reads `ctx.defaultImageModel` with `isSenseAudioImageModel`
// anyway, so a missing key and an undefined value behave the same.
...(validDefaultImageModel
? { defaultImageModel: validDefaultImageModel }
: {}),
};
// Run one round-trip: POST to upstream, stream text deltas to the
// client as they arrive, accumulate any tool_call deltas. Returns
// a typed result describing what to do next (loop on tool calls,
// close the stream, or bail on error). Closures capture all the
// SSE helpers from registerChatRoutes.
const runSenseAudioTurn = async (
sse: any,
messagesForTurn: any[],
): Promise<TurnResult> => {
const payload: any = {
model,
messages: messagesForTurn,
max_tokens:
typeof maxTokens === 'number' && maxTokens > 0 ? maxTokens : 8192,
stream: true,
tools: BYOK_SENSEAUDIO_TOOLS,
tool_choice: 'auto',
};
const response = await fetch(url, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
Authorization: `Bearer ${apiKey}`,
},
body: JSON.stringify(payload),
redirect: 'error',
});
if (!response.ok) {
const errorText = await response.text();
console.error(
`[proxy:senseaudio] upstream error: ${response.status} ${redactAuthTokens(errorText)}`,
);
sendProxyError(sse, `Upstream error: ${response.status}`, {
code: proxyErrorCode(response.status),
details: errorText,
retryable: response.status === 429 || response.status >= 500,
});
return { kind: 'error' };
}
const accum: Record<number, AccumulatedToolCall> = {};
let finishReason = '';
let providerError = '';
await streamUpstreamSse(response, ({ payload, data }: any) => {
if (payload === '[DONE]') return true;
if (!data) return false;
const streamErr = extractStreamErrorMessage(data);
if (streamErr) {
providerError = streamErr;
return true;
}
const choices = (data as any).choices;
if (!Array.isArray(choices) || choices.length === 0) return false;
const choice = choices[0] || {};
const delta = choice.delta || {};
// Text content streams to the client unchanged. Tool turns and
// text turns can both share this path — the OpenAI protocol
// never emits text+tool_calls in the same chunk, but it can
// emit text before / after a tool_call in the same turn, and
// we want the user to see whatever the model decided to say.
if (typeof delta.content === 'string' && delta.content) {
sse.send('delta', { delta: delta.content });
}
// Tool call deltas stream as fragments — `id` arrives once at
// the start, `function.name` once at the start, and
// `function.arguments` accumulates a chunked JSON string we
// have to concatenate. Parallel calls use the `index` field to
// distinguish slots. Default to 0 when omitted (older models).
if (Array.isArray(delta.tool_calls)) {
for (const tc of delta.tool_calls) {
const idx = typeof tc?.index === 'number' ? tc.index : 0;
if (!accum[idx]) {
accum[idx] = { id: '', name: '', arguments: '' };
}
const slot = accum[idx];
if (typeof tc.id === 'string' && tc.id) slot.id = tc.id;
if (typeof tc.function?.name === 'string' && tc.function.name) {
slot.name = tc.function.name;
}
if (typeof tc.function?.arguments === 'string') {
slot.arguments += tc.function.arguments;
}
}
}
if (typeof choice.finish_reason === 'string' && choice.finish_reason) {
finishReason = choice.finish_reason;
}
return false;
});
if (providerError) {
sendProxyError(sse, `Provider error: ${providerError}`, {
details: providerError,
});
return { kind: 'error' };
}
if (finishReason === 'tool_calls' && Object.keys(accum).length > 0) {
const indices = Object.keys(accum)
.map(Number)
.sort((a, b) => a - b);
const toolCalls = indices.map((i) => ({
id: accum[i]!.id || `call_${i}`,
type: 'function' as const,
function: {
name: accum[i]!.name,
arguments: accum[i]!.arguments,
},
}));
return {
kind: 'tool_calls',
assistantMessage: {
role: 'assistant',
content: null,
tool_calls: toolCalls,
},
toolCalls,
};
}
return { kind: 'text_end' };
};
const executeOneTool = async (call: {
id: string;
function: { name: string; arguments: string };
}): Promise<{ ok: boolean; url?: string; error?: string; kind?: 'image' | 'video' }> => {
const fnName = call?.function?.name ?? '';
if (fnName !== 'generate_image' && fnName !== 'generate_video') {
return {
ok: false,
error: `unknown tool: ${fnName || 'unnamed'}`,
};
}
let args: any = {};
try {
args = JSON.parse(call.function.arguments || '{}');
} catch {
return { ok: false, error: 'tool arguments were not valid JSON' };
}
if (fnName === 'generate_image') {
const result = await executeGenerateImage(args, toolCtx);
return { ...result, kind: 'image' };
}
// generate_video: slower than images, async with polling, and bounded by
// SENSEAUDIO_VIDEO_MAX_POLLS (a 10-minute ceiling at the default 5 s interval).
const result = await executeGenerateVideo(args, toolCtx);
return { ...result, kind: 'video' };
};
const sse = createSseResponse(res);
sse.send('start', { model });
// SenseAudio's gateway issues one API key that works for both
// /v1/chat/completions and the image / TTS surfaces. Mirror the
// BYOK key into media-config so the CLI agent path (`od media
// generate`) picks it up automatically — fire-and-forget; the
// chat stream must not block on the disk write. seedProviderIfMissing
// is idempotent and preserves env-var-resolved keys.
seedProviderIfMissing(ctx.paths.PROJECT_ROOT, 'senseaudio', {
apiKey,
baseUrl: effectiveBaseUrl,
})
.then((seeded) => {
if (seeded) {
console.log(
'[proxy:senseaudio] seeded media-config.senseaudio from BYOK key',
);
}
})
.catch((err: unknown) => {
console.warn(
`[proxy:senseaudio] seed media-config failed: ${
err instanceof Error ? err.message : String(err)
}`,
);
});
try {
for (let loop = 0; loop < MAX_BYOK_TOOL_LOOPS; loop++) {
const turn = await runSenseAudioTurn(sse, workingMessages);
if (turn.kind === 'error') return sse.end();
if (turn.kind === 'text_end') {
sse.send('end', {});
return sse.end();
}
// turn.kind === 'tool_calls'
workingMessages.push(turn.assistantMessage);
for (const call of turn.toolCalls) {
const result = await executeOneTool(call);
// The tool result is delivered to the model as a `tool` role
// message — a structured payload the model can interpret. We
// also surface a daemon-side log line so a user reporting "no
// image showed up" can grep for the call id. The kind field
// distinguishes image vs video so the daemon picks the right
// embedding hint for the model (markdown image syntax for
// PNG, markdown link for MP4 since the chat renderer doesn't
// currently render <video> tags).
const toolName = call?.function?.name ?? 'unknown';
if (result.ok) {
console.log(
`[proxy:senseaudio] ${toolName} OK: ${call.id} → ${result.url}`,
);
} else {
console.warn(
`[proxy:senseaudio] ${toolName} FAILED: ${call.id} → ${result.error}`,
);
}
const content = result.ok
? result.kind === 'video'
? `Video generated successfully. URL: ${result.url}. Reply to the user with a clickable markdown link, e.g. [▶ Play video](${result.url}). Do NOT use markdown image syntax — the chat renderer does not embed <video> tags.`
: `Image generated successfully. URL: ${result.url}. Reply to the user with: ![generated image](${result.url})`
: result.kind === 'video'
? `Video generation failed: ${result.error}. Apologize briefly and suggest a retry with a more specific prompt or a shorter duration.`
: `Image generation failed: ${result.error}. Apologize briefly and suggest a retry with a more specific prompt.`;
workingMessages.push({
role: 'tool',
tool_call_id: call.id,
content,
});
}
}
// Tool loop exhausted — the model still wants to call tools but we
// refuse a 4th round. Close the stream gracefully; the last text
// delta the model emitted (if any) is already on the wire.
console.warn(
`[proxy:senseaudio] tool loop bounded at MAX_BYOK_TOOL_LOOPS=${MAX_BYOK_TOOL_LOOPS}`,
);
sse.send('end', {});
return sse.end();
} catch (err: any) {
console.error(`[proxy:senseaudio] internal error: ${err.message}`);
sendProxyError(sse, err.message, { code: 'INTERNAL_ERROR' });
sse.end();
}
});
}
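The streaming handler rebuilds tool calls from fragments: `id` and `function.name` arrive once, `function.arguments` arrives as string pieces, and `index` separates parallel calls. A minimal, network-free sketch of that fold, assuming OpenAI-style `delta.tool_calls` entries; `accumulateToolCalls` is a hypothetical helper that mirrors the `accum` logic in `runSenseAudioTurn`.

```typescript
type ToolCallDelta = {
  index?: number;
  id?: string;
  function?: { name?: string; arguments?: string };
};

type WireToolCall = {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
};

// Fold streamed tool_call fragments into one slot per `index`: id and name
// arrive once, while `arguments` arrives as string pieces to concatenate.
function accumulateToolCalls(chunks: ToolCallDelta[][]): WireToolCall[] {
  const slots = new Map<number, { id: string; name: string; arguments: string }>();
  for (const deltas of chunks) {
    for (const tc of deltas) {
      const idx = typeof tc.index === 'number' ? tc.index : 0; // default slot
      let slot = slots.get(idx);
      if (!slot) {
        slot = { id: '', name: '', arguments: '' };
        slots.set(idx, slot);
      }
      if (tc.id) slot.id = tc.id;
      if (tc.function?.name) slot.name = tc.function.name;
      if (typeof tc.function?.arguments === 'string') {
        slot.arguments += tc.function.arguments;
      }
    }
  }
  // Emit in index order, synthesising an id when the provider omitted one.
  return Array.from(slots.entries())
    .sort(([a], [b]) => a - b)
    .map(([i, s]) => ({
      id: s.id || `call_${i}`,
      type: 'function' as const,
      function: { name: s.name, arguments: s.arguments },
    }));
}
```

The arguments string is only valid JSON once every fragment has arrived, which is why the route parses it in `executeOneTool` after the stream ends rather than per chunk.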

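The tool loop in the route alternates upstream turns with local tool execution until the model answers in text or `MAX_BYOK_TOOL_LOOPS` is reached. A simplified sketch with the upstream call and the tool executors injected as functions; `runToolLoop`, `Turn`, and the flattened `ToolCall` shape (the real wire format nests `name` / `arguments` under `function`) are assumptions for illustration.

```typescript
// Simplified shapes; the OpenAI wire format nests name/arguments under `function`.
type ToolCall = { id: string; name: string; arguments: string };
type Message = {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string | null;
  tool_calls?: ToolCall[];
  tool_call_id?: string;
};
type Turn = { kind: 'text'; text: string } | { kind: 'tool_calls'; calls: ToolCall[] };

// `runTurn` stands in for one upstream streaming round trip and `runTool`
// for executeGenerateImage / executeGenerateVideo.
async function runToolLoop(
  messages: Message[],
  runTurn: (msgs: Message[]) => Promise<Turn>,
  runTool: (call: ToolCall) => Promise<string>,
  maxLoops: number,
): Promise<{ finished: boolean; loops: number }> {
  for (let loop = 0; loop < maxLoops; loop++) {
    const turn = await runTurn(messages);
    if (turn.kind === 'text') return { finished: true, loops: loop + 1 };
    // Record the assistant's request, then answer every call by id so the
    // next turn sees a complete assistant/tool exchange.
    messages.push({ role: 'assistant', content: null, tool_calls: turn.calls });
    for (const call of turn.calls) {
      const content = await runTool(call);
      messages.push({ role: 'tool', tool_call_id: call.id, content });
    }
  }
  // Bound reached: the caller closes the stream without another turn.
  return { finished: false, loops: maxLoops };
}
```

One consequence of bounding the loop this way, also visible in the route: when the bound is hit, the tool results appended during the final round are never sent back upstream, so the model gets no chance to describe them.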
@@ -119,6 +119,41 @@ export async function validateBaseUrlResolved(
return sync;
}
/**
* SSRF guard for asset URLs handed back inside a successful API
* response, typically a `data.url` or `data.video_url` that points
* at the gateway's CDN, but is attacker-controllable when the
* upstream gateway is compromised or misconfigured. Routes the URL
* through `validateBaseUrlResolved` (DNS-resolve, then reject loopback,
* RFC1918, link-local, CGNAT, and metadata-service IPs) and returns a
* discriminated union so callers don't have to repeat the
* `validated.error || !validated.parsed` plumbing.
*
* Two callers today:
* - `byok-tools.ts` for the chat-tool image/video downloads
* - `media.ts` `renderSenseAudioImage` for the CLI agent path
* Both hand the URL straight to `fetch(...)` next, so pair this
* guard with `redirect: 'error'` on the fetch to also block a
* 3xx hop into private space.
*/
export async function assertExternalAssetUrl(
rawUrl: string,
): Promise<{ ok: true } | { ok: false; error: string }> {
if (typeof rawUrl !== 'string' || !rawUrl) {
return { ok: false, error: 'empty download url' };
}
const validated = await validateBaseUrlResolved(rawUrl);
if (validated.error || !validated.parsed) {
return {
ok: false,
error: validated.forbidden
? `blocked download url (${validated.error ?? 'internal address'})`
: `invalid download url: ${validated.error ?? 'unknown reason'}`,
};
}
return { ok: true };
}
// Aggressive but not punitive — happy paths usually return in under 2 s.
// Override with OD_CONNECTION_TEST_PROVIDER_TIMEOUT_MS for slow networks
// or distant providers; invalid values fall back to the default.
@@ -315,10 +350,10 @@ function inspectProviderCompletion(
const obj = data && typeof data === 'object' ? data as Record<string, unknown> : null;
if (!obj) return { valid: false };
if (protocol === 'openai' || protocol === 'azure' || protocol === 'senseaudio') {
const responseModel = typeof obj.model === 'string' ? obj.model : '';
if (
(protocol === 'openai' || protocol === 'senseaudio') &&
enforceResponseModel &&
responseModel &&
requestedModel &&
@@ -518,6 +553,12 @@ function buildProviderCall(input: ProviderTestRequest): ProviderCallShape {
},
};
case 'openai':
case 'senseaudio':
// SenseAudio is wire-compatible with OpenAI (POST /v1/chat/completions,
// Bearer auth, identical body + response shape), so the connection
// smoke test reuses the same call shape. The SenseAudio base-URL default
// is applied earlier, in chat-routes; this layer assumes the caller passed
// a concrete URL via the BYOK form.
return {
url: appendVersionedApiPath(baseUrl, '/chat/completions'),
headers: {

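`assertExternalAssetUrl` delegates the actual address checks to `validateBaseUrlResolved`, whose body is not part of this diff. As an illustration of the range membership such a guard needs, here is a minimal IPv4-only sketch covering the ranges named in the `assertExternalAssetUrl` comment. It assumes the host has already been resolved to dotted-quad addresses (in Node, for example, via `dns.promises.lookup(host, { all: true })`), ignores IPv6, and is not the daemon's implementation.

```typescript
// Parse a dotted-quad IPv4 string into an unsigned 32-bit value, or null.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split('.');
  if (parts.length !== 4) return null;
  let n = 0;
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p)) return null;
    const octet = Number(p);
    if (octet > 255) return null;
    n = n * 256 + octet;
  }
  return n;
}

// [network, prefix length] pairs for the blocked ranges.
const BLOCKED_V4: ReadonlyArray<readonly [string, number]> = [
  ['127.0.0.0', 8], // loopback
  ['10.0.0.0', 8], // RFC 1918
  ['172.16.0.0', 12], // RFC 1918
  ['192.168.0.0', 16], // RFC 1918
  ['169.254.0.0', 16], // link-local, including the 169.254.169.254 metadata IP
  ['100.64.0.0', 10], // CGNAT (RFC 6598)
  ['0.0.0.0', 8], // "this network"; an extra range, not named in the comment
];

// True when the address is unparsable (fail closed) or inside a blocked range.
function isBlockedIpv4(ip: string): boolean {
  const addr = ipv4ToInt(ip);
  if (addr === null) return true;
  return BLOCKED_V4.some(([base, bits]) => {
    const net = ipv4ToInt(base);
    if (net === null) return false;
    // Compare network prefixes with division, avoiding JS's signed 32-bit
    // bitwise operators.
    const blockSize = 2 ** (32 - bits);
    return Math.floor(addr / blockSize) === Math.floor(net / blockSize);
  });
}
```

A real guard would apply this to every resolved address, not just the first, and would still need the `redirect: 'error'` pairing on the download fetch to cover a 3xx hop.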
@@ -521,3 +521,53 @@ export async function writeConfig(projectRoot: string, body: unknown) {
await writeStored(projectRoot, next);
return readMaskedConfig(projectRoot);
}
/**
* Idempotent "seed if empty" write for a single provider slot. The chat
* proxy uses this to mirror a BYOK key into media-config so the agent's
* image / TTS path picks up the same credential without the user having
* to paste it twice. Strict rules:
* * No-op when an apiKey is ALREADY stored for `providerId` (the user
* may have configured Media independently and we never overwrite).
* * No-op when an env-var key resolves for `providerId` (env wins
* regardless of disk state, so seeding would be invisible).
* * No-op when the incoming `apiKey` is empty (we only seed values
* the chat layer has just verified upstream).
* * Otherwise merge `{ [providerId]: entry }` into the existing
* provider map and persist. All other provider slots and aliases
* are preserved byte-for-byte.
*
* Returns `true` when a write happened (caller can log), `false` when
* the call was a no-op. Errors are surfaced; the caller decides
* whether to swallow them (fire-and-forget) or propagate.
*/
export async function seedProviderIfMissing(
projectRoot: string,
providerId: string,
entry: { apiKey?: string; baseUrl?: string; model?: string },
): Promise<boolean> {
if (!PROVIDER_IDS.includes(providerId)) return false;
const apiKey = entry.apiKey?.trim() ?? '';
if (!apiKey) return false;
// Env var wins at resolution time, so seeding when env is set would
// be invisible to the user. Skip to avoid confusing on-disk state.
if (readEnvKey(providerId)) return false;
const prior = await readStored(projectRoot);
const priorApiKey =
typeof prior[providerId]?.apiKey === 'string' && prior[providerId].apiKey.trim()
? prior[providerId].apiKey.trim()
: '';
if (priorApiKey) return false;
const baseUrl = entry.baseUrl?.trim() ?? '';
const model = entry.model?.trim() ?? '';
const next: ProviderMap = { ...prior };
next[providerId] = {
apiKey,
...(baseUrl ? { baseUrl } : {}),
...(model ? { model } : {}),
};
await writeStored(projectRoot, next);
return true;
}

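`seedProviderIfMissing` is essentially a pure decision followed by a write. Separating the decision makes its no-op rules easy to unit-test without touching disk. A sketch under that assumption: `planSeed` and the map types are hypothetical, `envKey` stands in for `readEnvKey(providerId)`, and the `PROVIDER_IDS` allowlist check is left out.

```typescript
type ProviderEntry = { apiKey?: string; baseUrl?: string; model?: string };
type ProviderMap = Record<string, ProviderEntry | undefined>;

// Returns the map to persist, or null when the seed should be a no-op.
function planSeed(
  stored: ProviderMap,
  providerId: string,
  entry: ProviderEntry,
  envKey: string,
): ProviderMap | null {
  const apiKey = entry.apiKey?.trim() ?? '';
  if (!apiKey) return null; // nothing verified to seed
  if (envKey) return null; // env wins at resolution time
  if (stored[providerId]?.apiKey?.trim()) return null; // never overwrite
  const baseUrl = entry.baseUrl?.trim() ?? '';
  const model = entry.model?.trim() ?? '';
  return {
    ...stored, // other provider slots are carried over unchanged
    [providerId]: {
      apiKey,
      ...(baseUrl ? { baseUrl } : {}),
      ...(model ? { model } : {}),
    },
  };
}
```

The conditional spreads keep empty `baseUrl` / `model` keys out of the stored entry, matching the `exactOptionalPropertyTypes`-friendly style used elsewhere in this change.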
@@ -60,7 +60,7 @@ export const MEDIA_PROVIDERS: MediaProvider[] = [
{
id: 'senseaudio',
label: 'SenseAudio',
hint: '',
integrated: true,
defaultBaseUrl: 'https://api.senseaudio.cn',
docsUrl: 'https://docs.senseaudio.cn',
@@ -80,6 +80,10 @@ export const IMAGE_MODELS: MediaModel[] = [
{ id: 'doubao-seedream-3-0-t2i-250415', label: 'seedream-3.0', hint: 'ByteDance · Doubao image', provider: 'volcengine', caps: ['t2i'] },
{ id: 'doubao-seededit-3-0-i2i-250628', label: 'seededit-3.0', hint: 'ByteDance · image edit', provider: 'volcengine', caps: ['i2i'] },
{ id: 'senseaudio-image-2.0-260319', label: 'senseaudio-image-2.0', hint: 'SenseAudio · multi-aspect, latest', provider: 'senseaudio', caps: ['t2i', 'i2i'] },
{ id: 'senseaudio-image-1.0-260319', label: 'senseaudio-image-1.0', hint: 'SenseAudio · standard', provider: 'senseaudio', caps: ['t2i', 'i2i'] },
{ id: 'doubao-seedream-5-0-260128', label: 'seedream-5.0', hint: 'SenseAudio · ByteDance Seedream 5.0 hi-res', provider: 'senseaudio', caps: ['t2i', 'i2i'] },
{ id: 'grok-imagine-image', label: 'grok-imagine-image', hint: 'xAI · 2K text-to-image', provider: 'grok', caps: ['t2i'] },
{ id: 'gemini-3.1-flash-image-preview', label: 'nano-banana-2', hint: 'Nano Banana · text-to-image', provider: 'nanobanana', caps: ['t2i'] },

@@ -57,6 +57,7 @@ import {
findProvider,
modelsForSurface,
} from './media-models.js';
import { assertExternalAssetUrl } from './connectionTest.js';
import { resolveModelAlias, resolveProviderConfig } from './media-config.js';
import {
ensureProject,
@@ -559,6 +560,11 @@ export async function generateMedia(args: {
bytes = result.bytes;
providerNote = result.providerNote;
suggestedExt = result.suggestedExt;
} else if (def.provider === 'senseaudio' && surface === 'image') {
const result = await renderSenseAudioImage(ctx, credentials);
bytes = result.bytes;
providerNote = result.providerNote;
suggestedExt = result.suggestedExt;
} else if (def.provider === 'fishaudio' && surface === 'audio') {
const result = await renderFishAudioTTS(ctx, credentials);
bytes = result.bytes;
@@ -2243,6 +2249,131 @@ async function renderSenseAudioTTS(ctx: MediaContext, credentials: ProviderConfi
};
}
// ---------------------------------------------------------------------------
// Provider: SenseAudio image — POST /v1/image/sync (synchronous text-to-image).
//
// Docs: https://docs.senseaudio.cn/guides/image/overview
// * Models: senseaudio-image-2.0-260319 (multi-aspect), senseaudio-image-1.0-260319
// (standard), doubao-seedream-5-0-260128 (hi-res). The wire `model` field
// accepts the catalog id directly so no alias map is needed.
// * Body: { model, prompt (≤2000 chars), size (WxH, required when no
// reference), reference (URL or data URI, optional), seed (optional int) }.
// * Response: { url: string } pointing at the rendered PNG; we fetch it
// once to materialise bytes the dispatcher can write to disk.
// * Auth: Authorization: Bearer <API_KEY>; shares the senseaudio provider
// slot with the TTS path (OD_SENSEAUDIO_API_KEY / SENSEAUDIO_API_KEY).
// We default to the /sync endpoint because the chat runtime already streams
// progress and a single round-trip keeps the dispatcher contract identical
// to OpenAI / Volcengine image. Switching to /v1/image/async + GET
// /v1/image/pending is a future option if the upstream model latency
// outgrows the daemon's request timeout.
// ---------------------------------------------------------------------------
const SENSEAUDIO_IMAGE_PROMPT_LIMIT = 2000;
// SenseAudio's image gateway rejects non-standard pixel sizes with a 400
// `参数错误:size` ("parameter error: size"). Keep this table in sync with
// byok-tools.ts's ASPECT_TO_SIZE; both paths hit the same /v1/image/sync endpoint.
function senseAudioImageSize(aspect?: string): string {
if (aspect === '16:9') return '1280x720';
if (aspect === '9:16') return '720x1280';
if (aspect === '4:3') return '1024x768';
if (aspect === '3:4') return '768x1024';
return '1024x1024';
}
async function renderSenseAudioImage(ctx: MediaContext, credentials: ProviderConfig): Promise<RenderResult> {
if (!credentials.apiKey) {
throw new Error(
'no SenseAudio API key — configure it in Settings or set OD_SENSEAUDIO_API_KEY',
);
}
const baseUrl = (credentials.baseUrl || SENSEAUDIO_DEFAULT_BASE_URL).replace(
/\/+$/,
'',
);
const promptRaw = (ctx.prompt && ctx.prompt.trim()) || 'A high-quality reference image.';
// SenseAudio rejects >2000-char prompts with a 4xx; trim defensively so a
// verbose agent plan doesn't dead-end the generation. Text past the limit
// is dropped silently; providerNote does not record the truncation.
const prompt =
promptRaw.length > SENSEAUDIO_IMAGE_PROMPT_LIMIT
? promptRaw.slice(0, SENSEAUDIO_IMAGE_PROMPT_LIMIT)
: promptRaw;
const size = senseAudioImageSize(ctx.aspect);
const reference = ctx.imageRef?.dataUrl;
const body: Record<string, unknown> = {
model: ctx.wireModel,
prompt,
size,
};
if (reference) {
// When a reference image is supplied the API documents `size` as
// optional; we still send it so the output dimensions stay
// deterministic across t2i / i2i runs of the same project.
body.reference = reference;
}
const resp = await fetch(`${baseUrl}/v1/image/sync`, {
method: 'POST',
headers: {
authorization: `Bearer ${credentials.apiKey}`,
'content-type': 'application/json',
},
body: JSON.stringify(body),
});
const respText = await resp.text();
if (!resp.ok) {
throw new Error(`senseaudio image ${resp.status}: ${truncate(respText, 240)}`);
}
let data: any;
try {
data = JSON.parse(respText);
} catch {
throw new Error(`senseaudio image non-JSON: ${truncate(respText, 200)}`);
}
// Mirror the TTS base_resp envelope check: HTTP 200 can still encode an
// upstream logical failure. The image API uses the same shape on the
// failure path documented for /v1/image/pending (status=failed +
// error_message), so surface either source verbatim.
if (data?.base_resp && data.base_resp.status_code !== 0) {
throw new Error(
`senseaudio image api error ${data.base_resp.status_code}: ${data.base_resp.status_msg || 'unknown'}`,
);
}
if (typeof data?.error_message === 'string' && data.error_message) {
throw new Error(`senseaudio image api error: ${data.error_message}`);
}
const url = typeof data?.url === 'string' ? data.url : '';
if (!url) {
throw new Error('senseaudio image response missing url');
}
// Mirror the chat-tool SSRF guard (byok-tools.ts): the gateway-returned
// `url` is attacker-controllable inside a successful response, so DNS-
// resolve it through validateBaseUrlResolved and refuse loopback /
// RFC1918 / metadata-service hosts. Pair with `redirect: 'error'` so a
// 3xx hop into private space is also blocked.
const urlCheck = await assertExternalAssetUrl(url);
if (!urlCheck.ok) {
throw new Error(`senseaudio image ${urlCheck.error}`);
}
const imgResp = await fetch(url, { redirect: 'error' });
if (!imgResp.ok) {
throw new Error(`senseaudio image fetch ${imgResp.status}`);
}
const bytes = Buffer.from(await imgResp.arrayBuffer());
if (bytes.length === 0) {
throw new Error('senseaudio image fetch returned zero bytes');
}
return {
bytes,
providerNote: `senseaudio/${ctx.wireModel} · ${size}${reference ? ' · i2i' : ''} · ${bytes.length} bytes`,
suggestedExt: '.png',
};
}
// ---------------------------------------------------------------------------
// Provider: FishAudio — Speech-1.x family text-to-speech (synchronous).
//

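`renderSenseAudioImage` treats an HTTP 200 as provisional and checks the JSON envelope before trusting `url`. The same checks, in the same order, pulled into a synchronous function so they can be exercised on canned response bodies; `extractImageUrl` and the `SenseAudioImageResponse` type are illustrative, and the envelope fields are only those the media.ts code reads.

```typescript
// Envelope fields renderSenseAudioImage inspects; anything else is ignored.
type SenseAudioImageResponse = {
  url?: unknown;
  error_message?: unknown;
  base_resp?: { status_code?: number; status_msg?: string };
};

// Returns the asset URL or throws: non-JSON, then base_resp, then
// error_message, then a missing/empty url.
function extractImageUrl(respText: string): string {
  let data: SenseAudioImageResponse;
  try {
    data = JSON.parse(respText) as SenseAudioImageResponse;
  } catch {
    throw new Error(`senseaudio image non-JSON: ${respText.slice(0, 200)}`);
  }
  if (data?.base_resp && data.base_resp.status_code !== 0) {
    throw new Error(
      `senseaudio image api error ${data.base_resp.status_code}: ${data.base_resp.status_msg || 'unknown'}`,
    );
  }
  if (typeof data?.error_message === 'string' && data.error_message) {
    throw new Error(`senseaudio image api error: ${data.error_message}`);
  }
  if (typeof data?.url !== 'string' || !data.url) {
    throw new Error('senseaudio image response missing url');
  }
  return data.url;
}
```

The returned URL still has to pass `assertExternalAssetUrl` before it is fetched; this function only decides whether the gateway reported success.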
@@ -142,6 +142,15 @@ const PROVIDER_DEFAULTS = {
model: 'gemma3:4b',
baseUrl: 'https://ollama.com',
},
// SenseAudio's chat API is OpenAI-compatible (POST /v1/chat/completions,
// Bearer auth), so the extractor falls through to callOpenAI with this
// base URL and the user's SenseAudio API key. The default model is the
// small/fast variant so auto-pick stays cheap; users can swap in
// senseaudio-s2 or any gateway model via the picker.
senseaudio: {
model: 'senseaudio-s2-flash',
baseUrl: 'https://api.senseaudio.cn',
},
};
// Map an explicit override provider to the env var the daemon should
@@ -169,6 +178,13 @@ function envKeyFor(provider) {
if (provider === 'ollama') {
return process.env.OLLAMA_API_KEY?.trim() || '';
}
if (provider === 'senseaudio') {
return (
process.env.OD_SENSEAUDIO_API_KEY?.trim()
|| process.env.SENSEAUDIO_API_KEY?.trim()
|| ''
);
}
return '';
}
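The `senseaudio` branch of `envKeyFor` prefers `OD_SENSEAUDIO_API_KEY`, then falls back to `SENSEAUDIO_API_KEY`. Because the `||` chain runs after `trim()`, a whitespace-only value counts as unset. The sketch below makes that precedence testable by injecting the environment. `resolveSenseAudioEnvKey` is a hypothetical name, not part of the daemon.

```typescript
// Hypothetical, env-injectable version of the senseaudio branch of
// envKeyFor: the OD_-prefixed variable wins, whitespace-only values count
// as unset, and the result is '' when neither variable is usable.
type Env = Record<string, string | undefined>;

const SENSEAUDIO_KEY_VARS = ['OD_SENSEAUDIO_API_KEY', 'SENSEAUDIO_API_KEY'] as const;

export function resolveSenseAudioEnvKey(env: Env = process.env): string {
  for (const name of SENSEAUDIO_KEY_VARS) {
    const value = env[name]?.trim();
    if (value) return value;
  }
  return '';
}
```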

@@ -149,7 +149,9 @@ function extractGoogleModels(data: unknown): ProviderModelOption[] {
}
function providerModelsUrl(protocol: ConnectionTestProtocol, baseUrl: string, apiKey: string): string {
if (protocol === 'openai') return appendVersionedApiPath(baseUrl, '/models');
if (protocol === 'openai' || protocol === 'senseaudio') {
return appendVersionedApiPath(baseUrl, '/models');
}
if (protocol === 'anthropic') {
const url = new URL(appendVersionedApiPath(baseUrl, '/models'));
url.searchParams.set('limit', '1000');
@@ -167,7 +169,9 @@ function providerModelsHeaders(
protocol: ConnectionTestProtocol,
apiKey: string,
): Record<string, string> {
if (protocol === 'openai') return { authorization: `Bearer ${apiKey}` };
if (protocol === 'openai' || protocol === 'senseaudio') {
return { authorization: `Bearer ${apiKey}` };
}
if (protocol === 'anthropic') {
return {
'x-api-key': apiKey,
@@ -178,7 +182,9 @@ function providerModelsHeaders(
}
function extractModels(protocol: ConnectionTestProtocol, data: unknown): ProviderModelOption[] {
if (protocol === 'openai') return extractOpenAiModels(data);
// SenseAudio's /v1/models response follows the OpenAI envelope
// (`{ data: [{ id, ... }] }`), so the same extractor handles both.
if (protocol === 'openai' || protocol === 'senseaudio') return extractOpenAiModels(data);
if (protocol === 'anthropic') return extractAnthropicModels(data);
if (protocol === 'google') return extractGoogleModels(data);
return [];
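With this change, `senseaudio` takes the same model-discovery path as `openai`: `GET <base>/v1/models` with a Bearer header, parsed from the `{ data: [{ id, ... }] }` envelope. The bodies of `appendVersionedApiPath` and `ProviderModelOption` are not in this diff. The sketch therefore substitutes a hypothetical `joinV1` (assumed to append `/v1` only when it is missing) and a hypothetical `ModelOption` shape.

```typescript
// Sketch of the OpenAI-compatible model-discovery path that senseaudio now
// shares with 'openai'. joinV1 and ModelOption are hypothetical stand-ins for
// appendVersionedApiPath and ProviderModelOption, whose bodies are not shown.
type Protocol = 'openai' | 'senseaudio' | 'anthropic' | 'google';
interface ModelOption { id: string; label: string }

const OPENAI_WIRE: ReadonlySet<Protocol> = new Set<Protocol>(['openai', 'senseaudio']);

function joinV1(baseUrl: string, suffix: string): string {
  const trimmed = baseUrl.replace(/\/+$/, '');
  // Assumption: add /v1 only when the base URL is not already versioned.
  const versioned = /\/v\d+$/.test(trimmed) ? trimmed : `${trimmed}/v1`;
  return `${versioned}${suffix}`;
}

export function modelsRequest(protocol: Protocol, baseUrl: string, apiKey: string) {
  if (!OPENAI_WIRE.has(protocol)) throw new Error(`not sketched: ${protocol}`);
  return {
    url: joinV1(baseUrl, '/models'),
    headers: { authorization: `Bearer ${apiKey}` },
  };
}

// Parses the OpenAI list envelope `{ data: [{ id, ... }] }`, skipping
// malformed entries instead of throwing.
export function extractOpenAiStyleModels(data: unknown): ModelOption[] {
  const list = (data as { data?: unknown } | null)?.data;
  if (!Array.isArray(list)) return [];
  const out: ModelOption[] = [];
  for (const item of list) {
    const id = (item as { id?: unknown } | null)?.id;
    if (typeof id === 'string' && id.length > 0) out.push({ id, label: id });
  }
  return out;
}
```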

@@ -10859,6 +10859,7 @@ export async function startServer({
db,
design,
http: httpDeps,
paths: pathDeps,
chat: { startChatRun, submitToolResultToRun },
agents: agentDeps,
critique: critiqueDeps,

@@ -0,0 +1,686 @@
import { mkdir, mkdtemp, readFile, rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import path from 'node:path';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import {
BYOK_SENSEAUDIO_TOOLS,
executeGenerateImage,
executeGenerateVideo,
} from '../src/byok-tools.js';
describe('BYOK_SENSEAUDIO_TOOLS', () => {
it('exports an OpenAI-shaped generate_image tool definition', () => {
const tool = BYOK_SENSEAUDIO_TOOLS.find(
(t) => t.function.name === 'generate_image',
);
expect(tool).toBeDefined();
expect(tool!.type).toBe('function');
expect(tool!.function.parameters.required).toEqual(['prompt']);
expect(tool!.function.parameters.properties.aspect_ratio.enum).toEqual([
'1:1',
'16:9',
'9:16',
'4:3',
'3:4',
]);
});
it('exposes both generate_image and generate_video tools', () => {
const names = BYOK_SENSEAUDIO_TOOLS.map((t) => t.function.name).sort();
expect(names).toEqual(['generate_image', 'generate_video']);
});
});
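The `generate_image` tool advertises five aspect ratios. The `executeGenerateImage` tests that follow pin down how arguments are normalized:

- `16:9` maps to `1280x720`, and an unknown ratio falls back to the 1:1 size.
- An out-of-allowlist `args.model` is ignored in favour of `ctx.defaultImageModel`, then the first registry entry.

The sketch below encodes those rules. The size table and model ids come from the PR description and tests, while `sizeForAspect` and `pickImageModel` are hypothetical names.

```typescript
// Sketch of the argument normalization the image tests exercise. The table
// values come from the PR description; the helper names are hypothetical.
const SIZE_BY_ASPECT: Readonly<Record<string, string>> = {
  '1:1': '1024x1024',
  '16:9': '1280x720',
  '9:16': '720x1280',
  '4:3': '1024x768',
  '3:4': '768x1024',
};

// Registry order matters: the first entry is the last-resort default.
const SENSEAUDIO_IMAGE_MODELS = [
  'senseaudio-image-2.0-260319',
  'senseaudio-image-1.0-260319',
  'doubao-seedream-5-0-260128',
] as const;

export function sizeForAspect(aspect: unknown): string {
  // hasOwnProperty guards against inherited keys such as '__proto__'.
  return typeof aspect === 'string' && Object.prototype.hasOwnProperty.call(SIZE_BY_ASPECT, aspect)
    ? SIZE_BY_ASPECT[aspect]
    : SIZE_BY_ASPECT['1:1'];
}

// args.model wins if allowlisted, then ctx.defaultImageModel, then registry[0].
export function pickImageModel(requested: unknown, ctxDefault: unknown): string {
  const allowed = new Set<string>(SENSEAUDIO_IMAGE_MODELS);
  for (const candidate of [requested, ctxDefault]) {
    if (typeof candidate === 'string' && allowed.has(candidate)) return candidate;
  }
  return SENSEAUDIO_IMAGE_MODELS[0];
}
```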
describe('executeGenerateImage', () => {
let root: string;
let projectsRoot: string;
const PROJECT_ID = 'test-project';
const realFetch = globalThis.fetch;
beforeEach(async () => {
root = await mkdtemp(path.join(tmpdir(), 'od-byok-tools-'));
projectsRoot = path.join(root, 'projects');
});
afterEach(async () => {
globalThis.fetch = realFetch;
vi.unstubAllGlobals();
await rm(root, { recursive: true, force: true });
});
const baseCtx = () => ({
projectRoot: root,
projectsRoot,
projectId: PROJECT_ID,
upstreamApiKey: 'sa-byok-key',
upstreamBaseUrl: 'https://api.senseaudio.cn',
});
it('calls /v1/image/sync, downloads the URL, persists bytes, and returns a daemon URL', async () => {
const pngBytes = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const url = String(input);
if (url === 'https://api.senseaudio.cn/v1/image/sync') {
expect(init?.method).toBe('POST');
expect(init?.headers).toMatchObject({
authorization: 'Bearer sa-byok-key',
'content-type': 'application/json',
});
expect(JSON.parse(String(init?.body))).toEqual({
model: 'senseaudio-image-2.0-260319',
prompt: 'a tabby cat playing with yarn',
size: '1024x1024',
});
return new Response(
JSON.stringify({
url: 'https://cdn.example.test/generated/cat.png',
base_resp: { status_code: 0, status_msg: 'success' },
}),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
if (url === 'https://cdn.example.test/generated/cat.png') {
return new Response(pngBytes, {
status: 200,
headers: { 'content-type': 'image/png' },
});
}
throw new Error(`unexpected fetch: ${url}`);
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage(
{ prompt: 'a tabby cat playing with yarn' },
baseCtx(),
);
expect(result.ok).toBe(true);
// Returns a relative URL through the project file route so the
// chat UI loads same-origin via Next.js's /api/:path* rewrite,
// satisfying the strict CSP `img-src 'self'`. Path component is
// url-encoded so unusual (but isSafeId-passing) project ids don't
// break the URL.
expect(result.url).toMatch(
new RegExp(`^/api/projects/${PROJECT_ID}/files/byok-[a-z0-9-]+\\.png$`),
);
expect(fetchMock).toHaveBeenCalledTimes(2);
// Persisted file lives inside the project folder where listFiles /
// readProjectFile / archive plumbing will all discover it.
const filename = result.url!.split('/').pop()!;
const onDisk = await readFile(path.join(projectsRoot, PROJECT_ID, filename));
expect(onDisk.equals(pngBytes)).toBe(true);
});
it('honours args.model when the LLM picks a SenseAudio image model', async () => {
const pngBytes = Buffer.from([0x89, 0x50]);
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const url = String(input);
if (url.endsWith('/v1/image/sync')) {
expect(JSON.parse(String(init?.body)).model).toBe('doubao-seedream-5-0-260128');
return new Response(
JSON.stringify({ url: 'https://cdn.example.test/hi.png' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
return new Response(pngBytes, { status: 200 });
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage(
{ prompt: 'wallpaper', model: 'doubao-seedream-5-0-260128' },
baseCtx(),
);
expect(result.ok).toBe(true);
});
it('falls back to ctx.defaultImageModel when args.model is missing', async () => {
const pngBytes = Buffer.from([0x89, 0x50]);
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const url = String(input);
if (url.endsWith('/v1/image/sync')) {
expect(JSON.parse(String(init?.body)).model).toBe('senseaudio-image-1.0-260319');
return new Response(
JSON.stringify({ url: 'https://cdn.example.test/std.png' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
return new Response(pngBytes, { status: 200 });
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage(
{ prompt: 'standard' },
{ ...baseCtx(), defaultImageModel: 'senseaudio-image-1.0-260319' },
);
expect(result.ok).toBe(true);
});
it('ignores args.model when it is not in the SenseAudio allowlist', async () => {
const pngBytes = Buffer.from([0x89, 0x50]);
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const url = String(input);
if (url.endsWith('/v1/image/sync')) {
// Falls through to ctx.defaultImageModel (registry-valid).
expect(JSON.parse(String(init?.body)).model).toBe('senseaudio-image-1.0-260319');
return new Response(
JSON.stringify({ url: 'https://cdn.example.test/x.png' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
return new Response(pngBytes, { status: 200 });
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage(
{ prompt: 'spoofed', model: 'evil-model-id' },
{ ...baseCtx(), defaultImageModel: 'senseaudio-image-1.0-260319' },
);
expect(result.ok).toBe(true);
});
it('falls back to registry default when both args.model and ctx.defaultImageModel are missing/invalid', async () => {
const pngBytes = Buffer.from([0x89, 0x50]);
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const url = String(input);
if (url.endsWith('/v1/image/sync')) {
// Registry default is the first SenseAudio entry — 2.0 today.
expect(JSON.parse(String(init?.body)).model).toBe('senseaudio-image-2.0-260319');
return new Response(
JSON.stringify({ url: 'https://cdn.example.test/d.png' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
return new Response(pngBytes, { status: 200 });
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage(
{ prompt: 'no model anywhere' },
{ ...baseCtx(), defaultImageModel: 'also-bogus' },
);
expect(result.ok).toBe(true);
});
it('rejects unsafe projectId before any upstream call', async () => {
const fetchMock = vi.fn();
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage(
{ prompt: 'x' },
{ ...baseCtx(), projectId: '../escape' },
);
expect(result.ok).toBe(false);
expect(result.error).toMatch(/invalid projectId/);
// ensureProject runs up front so the unsafe id is caught BEFORE
// any senseaudio upstream call goes out — no token spent, no
// attempt to write outside the project tree.
expect(fetchMock).not.toHaveBeenCalled();
});
it('maps aspect_ratio to the SenseAudio size string', async () => {
const pngBytes = Buffer.from([0x89, 0x50]);
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const url = String(input);
if (url.endsWith('/v1/image/sync')) {
expect(JSON.parse(String(init?.body)).size).toBe('1280x720');
return new Response(
JSON.stringify({ url: 'https://cdn.example.test/wide.png' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
return new Response(pngBytes, { status: 200 });
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage(
{ prompt: 'widescreen banner', aspect_ratio: '16:9' },
baseCtx(),
);
expect(result.ok).toBe(true);
});
it('falls back to 1:1 for unknown aspect_ratio values', async () => {
const pngBytes = Buffer.from([0x89, 0x50]);
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const url = String(input);
if (url.endsWith('/v1/image/sync')) {
expect(JSON.parse(String(init?.body)).size).toBe('1024x1024');
return new Response(
JSON.stringify({ url: 'https://cdn.example.test/square.png' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
return new Response(pngBytes, { status: 200 });
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage(
{ prompt: 'square thing', aspect_ratio: 'something-else' },
baseCtx(),
);
expect(result.ok).toBe(true);
});
it('returns { ok: false } on missing prompt', async () => {
const fetchMock = vi.fn();
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage({}, baseCtx());
expect(result).toEqual({ ok: false, error: 'prompt is required' });
expect(fetchMock).not.toHaveBeenCalled();
});
it('returns { ok: false } when no API key is available', async () => {
const fetchMock = vi.fn();
vi.stubGlobal('fetch', fetchMock);
const ctx = { ...baseCtx(), upstreamApiKey: '' };
const result = await executeGenerateImage({ prompt: 'whatever' }, ctx);
expect(result.ok).toBe(false);
expect(result.error).toMatch(/no SenseAudio API key/);
expect(fetchMock).not.toHaveBeenCalled();
});
it('surfaces HTTP failures with status code and truncated body', async () => {
const fetchMock = vi.fn(async () =>
new Response('unauthorized', {
status: 401,
headers: { 'content-type': 'text/plain' },
}),
);
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage({ prompt: 'x' }, baseCtx());
expect(result.ok).toBe(false);
expect(result.error).toMatch(/senseaudio image 401/);
});
it('surfaces error_message envelope verbatim', async () => {
const fetchMock = vi.fn(async () =>
new Response(
JSON.stringify({ error_message: 'sensitive_content_blocked' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
),
);
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage({ prompt: 'x' }, baseCtx());
expect(result.ok).toBe(false);
expect(result.error).toMatch(/sensitive_content_blocked/);
});
it('surfaces base_resp non-zero status_code', async () => {
const fetchMock = vi.fn(async () =>
new Response(
JSON.stringify({
base_resp: { status_code: 1004, status_msg: 'quota exhausted' },
}),
{ status: 200, headers: { 'content-type': 'application/json' } },
),
);
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage({ prompt: 'x' }, baseCtx());
expect(result.ok).toBe(false);
expect(result.error).toMatch(/api error 1004/);
expect(result.error).toMatch(/quota exhausted/);
});
it('returns { ok: false } when upstream returns no url', async () => {
const fetchMock = vi.fn(async () =>
new Response(
JSON.stringify({ base_resp: { status_code: 0, status_msg: 'ok' } }),
{ status: 200, headers: { 'content-type': 'application/json' } },
),
);
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage({ prompt: 'x' }, baseCtx());
expect(result.ok).toBe(false);
expect(result.error).toMatch(/missing url/);
});
it('returns { ok: false } when the image download fails', async () => {
const fetchMock = vi.fn(async (input: unknown) => {
const url = String(input);
if (url.endsWith('/v1/image/sync')) {
return new Response(
JSON.stringify({ url: 'https://cdn.example.test/will-404.png' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
return new Response('not found', { status: 404 });
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage({ prompt: 'x' }, baseCtx());
expect(result.ok).toBe(false);
expect(result.error).toMatch(/image download 404/);
});
});
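The failure-path tests above treat a `/v1/image/sync` reply as a small decision procedure, checked in this order:

1. A non-2xx status becomes `senseaudio image <status>` plus a truncated body.
2. An `error_message` envelope is surfaced verbatim.
3. A non-zero `base_resp.status_code` becomes `api error <code>: <status_msg>`.
4. A 200 reply without a `url` is an error.

A pure sketch of that ordering follows. `interpretImageSync` is hypothetical, and the 200-character truncation limit is an assumption.

```typescript
type SyncResult = { ok: true; url: string } | { ok: false; error: string };

interface SyncEnvelope {
  url?: unknown;
  error_message?: unknown;
  base_resp?: { status_code?: unknown; status_msg?: unknown };
}

// Hypothetical pure helper: classifies one /v1/image/sync reply in the same
// order the tests check. The 200-char body truncation is an assumption.
export function interpretImageSync(status: number, bodyText: string): SyncResult {
  if (status < 200 || status >= 300) {
    return { ok: false, error: `senseaudio image ${status}: ${bodyText.slice(0, 200)}` };
  }
  let data: SyncEnvelope | null;
  try {
    data = JSON.parse(bodyText);
  } catch {
    return { ok: false, error: 'senseaudio image: response was not JSON' };
  }
  if (typeof data?.error_message === 'string' && data.error_message) {
    return { ok: false, error: `senseaudio image: ${data.error_message}` };
  }
  const code = data?.base_resp?.status_code;
  if (typeof code === 'number' && code !== 0) {
    const msg = String(data?.base_resp?.status_msg ?? '');
    return { ok: false, error: `senseaudio image api error ${code}: ${msg}` };
  }
  const url = typeof data?.url === 'string' ? data.url : '';
  return url ? { ok: true, url } : { ok: false, error: 'senseaudio image response missing url' };
}
```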
describe('BYOK_SENSEAUDIO_TOOLS — video', () => {
it('exposes a generate_video tool definition with the documented param surface', () => {
const video = BYOK_SENSEAUDIO_TOOLS.find(
(t) => t.function.name === 'generate_video',
);
expect(video).toBeDefined();
const props = video!.function.parameters.properties as Record<string, any>;
expect(video!.function.parameters.required).toEqual(['prompt']);
expect(props.aspect_ratio.enum).toEqual(['16:9', '9:16', '4:3', '3:4', '1:1']);
expect(props.resolution.enum).toEqual(['480p', '720p', '1080p']);
expect(props.duration).toMatchObject({ type: 'integer', minimum: 4, maximum: 15 });
expect(props.generate_audio.type).toBe('boolean');
});
});
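The `generate_video` schema bounds `duration` to 4-15 and enumerates `aspect_ratio` and `resolution`. The `executeGenerateVideo` tests that follow show the defaults and fallbacks used when the create body is built:

- Defaults: 5 s, `720p`, `16:9`, and `generate_audio: false`.
- Out-of-range durations are clamped.
- Non-enum values fall back to the defaults.

The sketch below builds that body. `buildVideoCreateBody` and `pick` are hypothetical names, and rounding fractional durations is an assumption.

```typescript
const VIDEO_RATIOS = ['16:9', '9:16', '4:3', '3:4', '1:1'] as const;
const VIDEO_RESOLUTIONS = ['480p', '720p', '1080p'] as const;

interface VideoArgs {
  prompt?: unknown;
  aspect_ratio?: unknown;
  resolution?: unknown;
  duration?: unknown;
  generate_audio?: unknown;
}

// Returns value when it is one of the allowed literals, else the fallback.
function pick<T extends string>(allowed: readonly T[], value: unknown, fallback: T): T {
  return (allowed as readonly unknown[]).includes(value) ? (value as T) : fallback;
}

// Hypothetical builder for the /v1/video/create body shape seen in the tests.
export function buildVideoCreateBody(args: VideoArgs, model = 'doubao-seedance-2-0-260128') {
  const prompt = typeof args.prompt === 'string' ? args.prompt : '';
  if (!prompt.trim()) throw new Error('prompt is required');
  const requested =
    typeof args.duration === 'number' && Number.isFinite(args.duration)
      ? Math.round(args.duration) // assumption: fractional seconds are rounded
      : 5;
  return {
    model,
    content: [{ type: 'text', text: prompt }],
    duration: Math.min(15, Math.max(4, requested)), // clamp into 4..15
    resolution: pick(VIDEO_RESOLUTIONS, args.resolution, '720p'),
    ratio: pick(VIDEO_RATIOS, args.aspect_ratio, '16:9'),
    provider_specific: { generate_audio: args.generate_audio === true },
  };
}
```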
describe('executeGenerateVideo', () => {
let root: string;
let projectsRoot: string;
const PROJECT_ID = 'test-project';
const realFetch = globalThis.fetch;
beforeEach(async () => {
root = await mkdtemp(path.join(tmpdir(), 'od-byok-video-'));
projectsRoot = path.join(root, 'projects');
});
afterEach(async () => {
globalThis.fetch = realFetch;
vi.unstubAllGlobals();
await rm(root, { recursive: true, force: true });
});
const baseCtx = () => ({
projectRoot: root,
projectsRoot,
projectId: PROJECT_ID,
upstreamApiKey: 'sa-byok-key',
upstreamBaseUrl: 'https://api.senseaudio.cn',
// Keep tests fast — 1 ms between polls instead of the production 5 s.
videoPollIntervalMs: 1,
});
it('creates, polls until completed, downloads, and writes the mp4 into the project folder', async () => {
const mp4Bytes = Buffer.from([0x00, 0x00, 0x00, 0x18, 0x66, 0x74, 0x79, 0x70]);
let pollCount = 0;
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const url = String(input);
if (url === 'https://api.senseaudio.cn/v1/video/create') {
expect(init?.method).toBe('POST');
expect(init?.headers).toMatchObject({
authorization: 'Bearer sa-byok-key',
'content-type': 'application/json',
});
const body = JSON.parse(String(init?.body));
expect(body).toEqual({
model: 'doubao-seedance-2-0-260128',
content: [{ type: 'text', text: 'a sunset over the ocean' }],
duration: 8,
resolution: '1080p',
ratio: '16:9',
provider_specific: { generate_audio: true },
});
return new Response(
JSON.stringify({ task_id: 'task-abc' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
if (url.startsWith('https://api.senseaudio.cn/v1/video/status?id=task-abc')) {
pollCount++;
if (pollCount === 1) {
return new Response(
JSON.stringify({ status: 'pending', progress: 0 }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
if (pollCount === 2) {
return new Response(
JSON.stringify({ status: 'processing', progress: 50 }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
return new Response(
JSON.stringify({
status: 'completed',
progress: 100,
video_url: 'https://cdn.example.test/video/done.mp4',
duration: 8,
}),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
if (url === 'https://cdn.example.test/video/done.mp4') {
return new Response(mp4Bytes, {
status: 200,
headers: { 'content-type': 'video/mp4' },
});
}
throw new Error(`unexpected fetch: ${url}`);
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateVideo(
{
prompt: 'a sunset over the ocean',
aspect_ratio: '16:9',
duration: 8,
resolution: '1080p',
generate_audio: true,
},
baseCtx(),
);
expect(result.ok).toBe(true);
expect(result.url).toMatch(
new RegExp(`^/api/projects/${PROJECT_ID}/files/byok-video-[a-z0-9-]+\\.mp4$`),
);
// 1× create + 3× poll + 1× download = 5 fetches total.
expect(fetchMock).toHaveBeenCalledTimes(5);
expect(pollCount).toBe(3);
const filename = result.url!.split('/').pop()!;
const onDisk = await readFile(path.join(projectsRoot, PROJECT_ID, filename));
expect(onDisk.equals(mp4Bytes)).toBe(true);
});
it('defaults duration / resolution / aspect when caller omits them', async () => {
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const url = String(input);
if (url.endsWith('/v1/video/create')) {
const body = JSON.parse(String(init?.body));
expect(body).toMatchObject({
duration: 5,
resolution: '720p',
ratio: '16:9',
provider_specific: { generate_audio: false },
});
return new Response(
JSON.stringify({ task_id: 'task-defaults' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
if (url.startsWith('https://api.senseaudio.cn/v1/video/status')) {
return new Response(
JSON.stringify({
status: 'completed',
video_url: 'https://cdn.example.test/video/d.mp4',
}),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
return new Response(Buffer.from([0x01]), { status: 200 });
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateVideo({ prompt: 'minimal' }, baseCtx());
expect(result.ok).toBe(true);
});
it('clamps duration outside the 4-15 range and rejects non-enum aspect_ratio / resolution', async () => {
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const url = String(input);
if (url.endsWith('/v1/video/create')) {
const body = JSON.parse(String(init?.body));
// 99 → clamped to 15; 'octagonal' → falls back to '16:9';
// '8k' → falls back to '720p'.
expect(body).toMatchObject({
duration: 15,
resolution: '720p',
ratio: '16:9',
});
return new Response(
JSON.stringify({ task_id: 'task-clamp' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
if (url.startsWith('https://api.senseaudio.cn/v1/video/status')) {
return new Response(
JSON.stringify({
status: 'completed',
video_url: 'https://cdn.example.test/clamp.mp4',
}),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
return new Response(Buffer.from([0x02]), { status: 200 });
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateVideo(
{
prompt: 'overflow',
duration: 99,
aspect_ratio: 'octagonal',
resolution: '8k',
},
baseCtx(),
);
expect(result.ok).toBe(true);
});
it('surfaces a failed status as a tool error so the model can apologize', async () => {
const fetchMock = vi.fn(async (input: unknown) => {
const url = String(input);
if (url.endsWith('/v1/video/create')) {
return new Response(
JSON.stringify({ task_id: 'task-fail' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
if (url.startsWith('https://api.senseaudio.cn/v1/video/status')) {
return new Response(
JSON.stringify({
status: 'failed',
error_message: 'sensitive_content_blocked',
}),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
throw new Error(`unexpected fetch: ${url}`);
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateVideo(
{ prompt: 'blocked content' },
baseCtx(),
);
expect(result.ok).toBe(false);
expect(result.error).toMatch(/senseaudio video failed/);
expect(result.error).toMatch(/sensitive_content_blocked/);
});
it('times out after SENSEAUDIO_VIDEO_MAX_POLLS polls when the job stays pending', async () => {
const fetchMock = vi.fn(async (input: unknown) => {
const url = String(input);
if (url.endsWith('/v1/video/create')) {
return new Response(
JSON.stringify({ task_id: 'task-stuck' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
if (url.startsWith('https://api.senseaudio.cn/v1/video/status')) {
return new Response(
JSON.stringify({ status: 'pending', progress: 0 }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
throw new Error(`unexpected fetch: ${url}`);
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateVideo(
{ prompt: 'stuck job' },
baseCtx(),
);
expect(result.ok).toBe(false);
expect(result.error).toMatch(/timed out/);
// 1× create + 120× poll = 121 fetches (10-min ceiling at 5 s
// intervals — kept generous because doubao-seedance frequently
// spends 3-8 min on the gateway for 1080p+audio jobs).
expect(fetchMock).toHaveBeenCalledTimes(121);
}, 30_000);
it('returns a tool error when create response is missing task_id', async () => {
const fetchMock = vi.fn(async () =>
new Response('{"oops": true}', {
status: 200,
headers: { 'content-type': 'application/json' },
}),
);
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateVideo({ prompt: 'x' }, baseCtx());
expect(result.ok).toBe(false);
expect(result.error).toMatch(/missing task_id/);
});
it('returns a tool error when create call returns non-2xx', async () => {
const fetchMock = vi.fn(async () =>
new Response('unauthorized', {
status: 401,
headers: { 'content-type': 'text/plain' },
}),
);
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateVideo({ prompt: 'x' }, baseCtx());
expect(result.ok).toBe(false);
expect(result.error).toMatch(/senseaudio video create 401/);
});
it('rejects an unsafe projectId before any upstream call', async () => {
const fetchMock = vi.fn();
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateVideo(
{ prompt: 'x' },
{ ...baseCtx(), projectId: '../escape' },
);
expect(result.ok).toBe(false);
expect(result.error).toMatch(/invalid projectId/);
expect(fetchMock).not.toHaveBeenCalled();
});
it('rejects empty prompt before any upstream call', async () => {
const fetchMock = vi.fn();
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateVideo({}, baseCtx());
expect(result.ok).toBe(false);
expect(result.error).toMatch(/prompt is required/);
expect(fetchMock).not.toHaveBeenCalled();
});
});
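The video tests above describe a create-then-poll loop:

- The status endpoint reports `pending` / `processing` until it reaches `completed` (with `video_url`) or `failed` (with `error_message`).
- Polls are spaced by an injectable interval (5 s in production) and capped at 120, a 10-minute ceiling.

The sketch below shows that loop with the status fetch injected so it can run without a network. `pollVideoTask` and its parameters are hypothetical names. Sleeping only between polls, not after the last one, is an assumption.

```typescript
// fetchStatus stands in for GET /v1/video/status?id=<task_id>.
interface VideoStatus {
  status: string;
  progress?: number;
  video_url?: string;
  error_message?: string;
}

type PollOutcome =
  | { ok: true; videoUrl: string; polls: number }
  | { ok: false; error: string; polls: number };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function pollVideoTask(
  fetchStatus: () => Promise<VideoStatus>,
  intervalMs = 5_000,
  maxPolls = 120, // 120 x 5 s = 10-minute ceiling
): Promise<PollOutcome> {
  for (let polls = 1; polls <= maxPolls; polls++) {
    const s = await fetchStatus();
    if (s.status === 'completed') {
      return s.video_url
        ? { ok: true, videoUrl: s.video_url, polls }
        : { ok: false, error: 'senseaudio video completed without video_url', polls };
    }
    if (s.status === 'failed') {
      return { ok: false, error: `senseaudio video failed: ${s.error_message ?? 'unknown error'}`, polls };
    }
    // pending / processing: wait before the next check, but not after the last one.
    if (polls < maxPolls) await sleep(intervalMs);
  }
  return { ok: false, error: `senseaudio video timed out after ${maxPolls} polls`, polls: maxPolls };
}
```

Passing a 1 ms interval, as the tests do with `videoPollIntervalMs: 1`, keeps the 120-poll timeout case fast.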

@@ -8,6 +8,7 @@ import {
readMaskedConfig,
resolveModelAlias,
resolveProviderConfig,
seedProviderIfMissing,
writeConfig,
} from '../src/media-config.js';
@@ -868,3 +869,159 @@ describe('media-config model alias resolution (issue #1277)', () => {
).toBe('doubao-seedream-5-0');
});
});
describe('seedProviderIfMissing', () => {
let projectRoot: string;
const SENSEAUDIO_ENV_KEYS = ['OD_SENSEAUDIO_API_KEY', 'SENSEAUDIO_API_KEY'];
const originalEnv = Object.fromEntries(
SENSEAUDIO_ENV_KEYS.map((key) => [key, process.env[key]]),
);
const originalMediaConfigDir = process.env.OD_MEDIA_CONFIG_DIR;
const originalDataDir = process.env.OD_DATA_DIR;
beforeEach(async () => {
projectRoot = await mkdtemp(path.join(tmpdir(), 'od-media-seed-'));
for (const key of SENSEAUDIO_ENV_KEYS) {
delete process.env[key];
}
delete process.env.OD_MEDIA_CONFIG_DIR;
delete process.env.OD_DATA_DIR;
});
afterEach(async () => {
for (const key of SENSEAUDIO_ENV_KEYS) {
if (originalEnv[key] == null) {
delete process.env[key];
} else {
process.env[key] = originalEnv[key];
}
}
if (originalMediaConfigDir == null) {
delete process.env.OD_MEDIA_CONFIG_DIR;
} else {
process.env.OD_MEDIA_CONFIG_DIR = originalMediaConfigDir;
}
if (originalDataDir == null) {
delete process.env.OD_DATA_DIR;
} else {
process.env.OD_DATA_DIR = originalDataDir;
}
await rm(projectRoot, { recursive: true, force: true });
});
async function writeStored(data: unknown) {
const file = path.join(projectRoot, '.od', 'media-config.json');
await mkdir(path.dirname(file), { recursive: true });
await writeFile(file, JSON.stringify(data), 'utf8');
}
async function readStoredJson(): Promise<unknown> {
const file = path.join(projectRoot, '.od', 'media-config.json');
const raw = await readFile(file, 'utf8');
return JSON.parse(raw);
}
it('writes a fresh entry when the slot is empty', async () => {
const wrote = await seedProviderIfMissing(projectRoot, 'senseaudio', {
apiKey: 'sa-test-key',
baseUrl: 'https://api.senseaudio.cn',
});
expect(wrote).toBe(true);
const stored = await readStoredJson();
expect(stored).toEqual({
providers: {
senseaudio: {
apiKey: 'sa-test-key',
baseUrl: 'https://api.senseaudio.cn',
},
},
});
});
it('no-ops and preserves the stored key when one is already configured', async () => {
await writeStored({
providers: {
senseaudio: { apiKey: 'pre-existing-key', baseUrl: 'https://existing.example' },
},
});
const wrote = await seedProviderIfMissing(projectRoot, 'senseaudio', {
apiKey: 'newer-byok-key',
baseUrl: 'https://api.senseaudio.cn',
});
expect(wrote).toBe(false);
const stored = (await readStoredJson()) as { providers: Record<string, unknown> };
expect(stored.providers.senseaudio).toEqual({
apiKey: 'pre-existing-key',
baseUrl: 'https://existing.example',
});
});
it('preserves every other provider and aliases when seeding', async () => {
await writeStored({
providers: {
openai: { apiKey: 'sk-openai', baseUrl: 'https://api.openai.com/v1' },
volcengine: { apiKey: 'ark-key', baseUrl: 'https://ark.cn-beijing.volces.com/api/v3' },
},
aliases: { 'doubao-seedream-3-0-t2i-250415': 'doubao-seedream-5-0' },
});
const wrote = await seedProviderIfMissing(projectRoot, 'senseaudio', {
apiKey: 'sa-new',
});
expect(wrote).toBe(true);
const stored = (await readStoredJson()) as {
providers: Record<string, unknown>;
aliases: Record<string, string>;
};
expect(stored.providers.openai).toEqual({
apiKey: 'sk-openai',
baseUrl: 'https://api.openai.com/v1',
});
expect(stored.providers.volcengine).toEqual({
apiKey: 'ark-key',
baseUrl: 'https://ark.cn-beijing.volces.com/api/v3',
});
expect(stored.providers.senseaudio).toEqual({ apiKey: 'sa-new' });
expect(stored.aliases).toEqual({
'doubao-seedream-3-0-t2i-250415': 'doubao-seedream-5-0',
});
});
it('no-ops when an env var resolves a key for the provider', async () => {
process.env.OD_SENSEAUDIO_API_KEY = 'env-key';
const wrote = await seedProviderIfMissing(projectRoot, 'senseaudio', {
apiKey: 'sa-byok-key',
baseUrl: 'https://api.senseaudio.cn',
});
expect(wrote).toBe(false);
await expect(readStoredJson()).rejects.toThrow();
});
it('no-ops on empty apiKey', async () => {
const wrote = await seedProviderIfMissing(projectRoot, 'senseaudio', {
apiKey: '',
baseUrl: 'https://api.senseaudio.cn',
});
expect(wrote).toBe(false);
await expect(readStoredJson()).rejects.toThrow();
});
it('no-ops for unknown provider ids', async () => {
const wrote = await seedProviderIfMissing(projectRoot, 'not-a-provider', {
apiKey: 'whatever',
});
expect(wrote).toBe(false);
await expect(readStoredJson()).rejects.toThrow();
});
it('resolves the seeded key through resolveProviderConfig', async () => {
await seedProviderIfMissing(projectRoot, 'senseaudio', {
apiKey: 'sa-final',
baseUrl: 'https://api.senseaudio.cn',
});
const resolved = await resolveProviderConfig(projectRoot, 'senseaudio');
expect(resolved).toEqual({
apiKey: 'sa-final',
baseUrl: 'https://api.senseaudio.cn',
});
});
});
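The `seedProviderIfMissing` tests above fix the seed-on-success rules:

- Write only when the provider id is known, the API key is non-empty, no env var already resolves a key, and no key is stored yet.
- Preserve every other provider and the `aliases` map.
- Omit `baseUrl` when none is given.

The sketch below is a pure, in-memory version of those rules. File I/O is left out, and the `knownProviders` / `envKeyFor` injection, the type names, and treating a whitespace-only key as empty are all assumptions.

```typescript
interface ProviderEntry { apiKey?: string; baseUrl?: string }
interface MediaConfig { providers?: Record<string, ProviderEntry>; aliases?: Record<string, string> }

// Hypothetical pure version of the seeding decision; the real function also
// reads and writes .od/media-config.json.
export function seedProvider(
  config: MediaConfig,
  providerId: string,
  entry: ProviderEntry,
  opts: { knownProviders: ReadonlySet<string>; envKeyFor: (id: string) => string },
): { wrote: boolean; next: MediaConfig } {
  const apiKey = entry.apiKey ?? '';
  const existing = config.providers?.[providerId]?.apiKey?.trim();
  const skip =
    !opts.knownProviders.has(providerId) ||  // unknown provider id
    !apiKey.trim() ||                        // nothing to seed (assumed: blank = empty)
    Boolean(opts.envKeyFor(providerId)) ||   // env var already resolves a key
    Boolean(existing);                       // never overwrite a stored key
  if (skip) return { wrote: false, next: config };
  const seeded: ProviderEntry = { apiKey };
  if (entry.baseUrl) seeded.baseUrl = entry.baseUrl;
  // Spread-copy so every other provider and the aliases map survive untouched.
  return {
    wrote: true,
    next: { ...config, providers: { ...config.providers, [providerId]: seeded } },
  };
}
```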

@@ -0,0 +1,305 @@
import { mkdir, mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import path from 'node:path';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import { generateMedia } from '../src/media.js';
const TEST_SENSEAUDIO_BASE_URL = 'https://senseaudio-gateway.example.test';
const TEST_IMAGE_URL = 'https://cdn.example.test/generated/abc.png';
const TEST_IMAGE_BYTES = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0x00, 0x01]);
function buildOkResponse(url = TEST_IMAGE_URL) {
return new Response(
JSON.stringify({ url, base_resp: { status_code: 0, status_msg: 'success' } }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
function buildImageFetchResponse(bytes: Buffer) {
return new Response(bytes, {
status: 200,
headers: { 'content-type': 'image/png' },
});
}
describe('senseaudio image generation', () => {
let root: string;
let projectRoot: string;
let projectsRoot: string;
const realFetch = globalThis.fetch;
const originalMediaConfigDir = process.env.OD_MEDIA_CONFIG_DIR;
const originalDataDir = process.env.OD_DATA_DIR;
beforeEach(async () => {
root = await mkdtemp(path.join(tmpdir(), 'od-senseaudio-image-'));
projectRoot = path.join(root, 'project-root');
projectsRoot = path.join(projectRoot, '.od', 'projects');
await mkdir(projectsRoot, { recursive: true });
delete process.env.OD_MEDIA_CONFIG_DIR;
delete process.env.OD_DATA_DIR;
delete process.env.OD_SENSEAUDIO_API_KEY;
delete process.env.SENSEAUDIO_API_KEY;
});
afterEach(async () => {
globalThis.fetch = realFetch;
if (originalMediaConfigDir == null) {
delete process.env.OD_MEDIA_CONFIG_DIR;
} else {
process.env.OD_MEDIA_CONFIG_DIR = originalMediaConfigDir;
}
if (originalDataDir == null) {
delete process.env.OD_DATA_DIR;
} else {
process.env.OD_DATA_DIR = originalDataDir;
}
delete process.env.OD_SENSEAUDIO_API_KEY;
delete process.env.SENSEAUDIO_API_KEY;
await rm(root, { recursive: true, force: true });
});
async function writeConfig(data: unknown) {
const file = path.join(projectRoot, '.od', 'media-config.json');
await mkdir(path.dirname(file), { recursive: true });
await writeFile(file, JSON.stringify(data), 'utf8');
}
it('renders a SenseAudio image with the documented sync defaults', async () => {
await writeConfig({
providers: {
senseaudio: {
apiKey: 'sense-test-key',
baseUrl: TEST_SENSEAUDIO_BASE_URL,
},
},
});
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const urlStr = String(input);
if (urlStr === `${TEST_SENSEAUDIO_BASE_URL}/v1/image/sync`) {
expect(init?.method).toBe('POST');
expect(init?.headers).toMatchObject({
authorization: 'Bearer sense-test-key',
'content-type': 'application/json',
});
expect(JSON.parse(String(init?.body))).toEqual({
model: 'senseaudio-image-2.0-260319',
prompt: 'A magazine-style hero poster.',
size: '1024x1024',
});
return buildOkResponse();
}
if (urlStr === TEST_IMAGE_URL) {
return buildImageFetchResponse(TEST_IMAGE_BYTES);
}
throw new Error(`unexpected fetch: ${urlStr}`);
});
vi.stubGlobal('fetch', fetchMock);
const result = await generateMedia({
projectRoot,
projectsRoot,
projectId: 'project-1',
surface: 'image',
model: 'senseaudio-image-2.0-260319',
prompt: 'A magazine-style hero poster.',
output: 'sa-hero.png',
});
expect(fetchMock).toHaveBeenCalledTimes(2);
expect(result.providerId).toBe('senseaudio');
expect(result.providerNote).toContain('senseaudio/senseaudio-image-2.0-260319');
expect(result.providerNote).toContain('1024x1024');
const bytes = await readFile(path.join(projectsRoot, 'project-1', 'sa-hero.png'));
expect(bytes.equals(TEST_IMAGE_BYTES)).toBe(true);
});
it('maps aspect ratios to the SenseAudio size strings', async () => {
await writeConfig({
providers: {
senseaudio: { apiKey: 'sense-test-key', baseUrl: TEST_SENSEAUDIO_BASE_URL },
},
});
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const urlStr = String(input);
if (urlStr === `${TEST_SENSEAUDIO_BASE_URL}/v1/image/sync`) {
expect(JSON.parse(String(init?.body)).size).toBe('1280x720');
return buildOkResponse();
}
return buildImageFetchResponse(TEST_IMAGE_BYTES);
});
vi.stubGlobal('fetch', fetchMock);
await generateMedia({
projectRoot,
projectsRoot,
projectId: 'project-1',
surface: 'image',
model: 'senseaudio-image-1.0-260319',
aspect: '16:9',
prompt: 'Widescreen banner.',
output: 'sa-banner.png',
});
expect(fetchMock).toHaveBeenCalledTimes(2);
});
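The two aspect tests pin only part of the size table. A call with no `aspect` sends `1024x1024`, and `16:9` sends `1280x720`. Below is a minimal lookup that satisfies both. It assumes the five gateway sizes listed in the follow-up commit message, and it assumes `9:16`, `4:3`, and `3:4` map to the matching portrait and SD buckets. `senseAudioSizeFor` and the table are hypothetical names, not the PR's code:

```typescript
// Hypothetical table. Only the default ('1:1') and '16:9' rows are confirmed
// by the tests; the other rows assume the bucket names used in the commit notes.
const SENSEAUDIO_SIZES = new Map<string, string>([
  ["1:1", "1024x1024"],
  ["16:9", "1280x720"],
  ["9:16", "720x1280"],
  ["4:3", "1024x768"], // assumed bucket
  ["3:4", "768x1024"], // assumed bucket
]);

// Returns the `size` value for the /v1/image/sync body. Missing or unknown
// aspect ratios fall back to the square default so the request stays within
// the gateway's accepted set.
function senseAudioSizeFor(aspect?: string): string {
  return (aspect && SENSEAUDIO_SIZES.get(aspect)) || "1024x1024";
}
```

Falling back to the square size for unknown ratios avoids forwarding a value that the gateway would reject with `参数错误:size` (parameter error: size).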
it('falls back to the canonical base URL when none is configured', async () => {
await writeConfig({
providers: {
senseaudio: { apiKey: 'sense-test-key' },
},
});
const fetchMock = vi.fn(async (input: unknown) => {
const urlStr = String(input);
if (urlStr === 'https://api.senseaudio.cn/v1/image/sync') {
return buildOkResponse();
}
return buildImageFetchResponse(TEST_IMAGE_BYTES);
});
vi.stubGlobal('fetch', fetchMock);
await generateMedia({
projectRoot,
projectsRoot,
projectId: 'project-1',
surface: 'image',
model: 'doubao-seedream-5-0-260128',
prompt: 'Default base url.',
output: 'sa-default-base.png',
});
expect(fetchMock).toHaveBeenCalledTimes(2);
});
it('reads the API key from OD_SENSEAUDIO_API_KEY when storage is empty', async () => {
process.env.OD_SENSEAUDIO_API_KEY = 'env-sense-key';
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
if (String(input).endsWith('/v1/image/sync')) {
expect(init?.headers).toMatchObject({ authorization: 'Bearer env-sense-key' });
return buildOkResponse();
}
return buildImageFetchResponse(TEST_IMAGE_BYTES);
});
vi.stubGlobal('fetch', fetchMock);
await generateMedia({
projectRoot,
projectsRoot,
projectId: 'project-1',
surface: 'image',
model: 'senseaudio-image-2.0-260319',
prompt: 'Env-only key.',
output: 'sa-env.png',
});
expect(fetchMock).toHaveBeenCalledTimes(2);
});
it('errors when no API key is configured', async () => {
const fetchMock = vi.fn();
vi.stubGlobal('fetch', fetchMock);
await expect(
generateMedia({
projectRoot,
projectsRoot,
projectId: 'project-1',
surface: 'image',
model: 'senseaudio-image-2.0-260319',
prompt: 'Should fail.',
output: 'sa-no-key.png',
}),
).rejects.toThrow(/no SenseAudio API key/);
expect(fetchMock).not.toHaveBeenCalled();
});
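Read together, the base-URL, env-key, and no-key tests imply a resolution order. A stored `apiKey` wins, `OD_SENSEAUDIO_API_KEY` is the fallback, and a missing key throws before any `fetch` is made. The base URL falls back to `https://api.senseaudio.cn`. The sketch below follows that order. `resolveSenseAudio` is a hypothetical helper, and trimming the trailing slash is an assumption:

```typescript
interface SenseAudioProviderConfig {
  apiKey?: string;
  baseUrl?: string;
}

const DEFAULT_SENSEAUDIO_BASE_URL = "https://api.senseaudio.cn";

// Resolves credentials before any network call so a missing key fails fast.
function resolveSenseAudio(
  stored: SenseAudioProviderConfig | undefined,
  env: Record<string, string | undefined>,
): { apiKey: string; baseUrl: string } {
  // Stored key first, then the environment variable.
  const apiKey = stored?.apiKey?.trim() || env["OD_SENSEAUDIO_API_KEY"]?.trim();
  if (!apiKey) {
    // Wording is illustrative; the test only requires /no SenseAudio API key/.
    throw new Error("no SenseAudio API key configured (Settings or OD_SENSEAUDIO_API_KEY)");
  }
  // Assumed: strip trailing slashes so `${baseUrl}/v1/image/sync` never has "//".
  const baseUrl = (stored?.baseUrl?.trim() || DEFAULT_SENSEAUDIO_BASE_URL).replace(/\/+$/, "");
  return { apiKey, baseUrl };
}
```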
it('surfaces HTTP-level failures with the status code and truncated body', async () => {
await writeConfig({
providers: {
senseaudio: { apiKey: 'sense-test-key', baseUrl: TEST_SENSEAUDIO_BASE_URL },
},
});
const fetchMock = vi.fn(async () =>
new Response('unauthorized', {
status: 401,
headers: { 'content-type': 'text/plain' },
}),
);
vi.stubGlobal('fetch', fetchMock);
await expect(
generateMedia({
projectRoot,
projectsRoot,
projectId: 'project-1',
surface: 'image',
model: 'senseaudio-image-2.0-260319',
prompt: 'Bad auth.',
output: 'sa-401.png',
}),
).rejects.toThrow('senseaudio image 401: unauthorized');
});
it('surfaces upstream error_message verbatim when the body reports failure', async () => {
await writeConfig({
providers: {
senseaudio: { apiKey: 'sense-test-key', baseUrl: TEST_SENSEAUDIO_BASE_URL },
},
});
const fetchMock = vi.fn(async () =>
new Response(
JSON.stringify({ error_message: 'sensitive_content_blocked' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
),
);
vi.stubGlobal('fetch', fetchMock);
await expect(
generateMedia({
projectRoot,
projectsRoot,
projectId: 'project-1',
surface: 'image',
model: 'senseaudio-image-2.0-260319',
prompt: 'Blocked.',
output: 'sa-blocked.png',
}),
).rejects.toThrow('senseaudio image api error: sensitive_content_blocked');
});
it('errors when the response body is missing the image url', async () => {
await writeConfig({
providers: {
senseaudio: { apiKey: 'sense-test-key', baseUrl: TEST_SENSEAUDIO_BASE_URL },
},
});
const fetchMock = vi.fn(async () =>
new Response(
JSON.stringify({ base_resp: { status_code: 0, status_msg: 'success' } }),
{ status: 200, headers: { 'content-type': 'application/json' } },
),
);
vi.stubGlobal('fetch', fetchMock);
await expect(
generateMedia({
projectRoot,
projectsRoot,
projectId: 'project-1',
surface: 'image',
model: 'senseaudio-image-2.0-260319',
prompt: 'Missing url.',
output: 'sa-missing-url.png',
}),
).rejects.toThrow('senseaudio image response missing url');
});
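The three failure tests fix the error string for each way `/v1/image/sync` can fail: a non-2xx status, a 200 that carries `error_message`, and a 200 with no `url`. Below is a minimal parser that produces those messages. It assumes a 200-character cap on the echoed body, because the real truncation length is not visible in this diff. `readSenseAudioImageUrl` is a hypothetical name:

```typescript
// Assumed cap on how much of an error body is echoed back.
const MAX_ERROR_BODY_CHARS = 200;

// Returns the hosted image URL, or throws with the message shapes the tests expect.
async function readSenseAudioImageUrl(res: Response): Promise<string> {
  if (!res.ok) {
    const text = await res.text().catch(() => "");
    throw new Error(`senseaudio image ${res.status}: ${text.slice(0, MAX_ERROR_BODY_CHARS)}`);
  }
  const json = (await res.json()) as { url?: unknown; error_message?: unknown };
  // The gateway can answer 200 and still report a failure in the body.
  if (typeof json.error_message === "string" && json.error_message) {
    throw new Error(`senseaudio image api error: ${json.error_message}`);
  }
  if (typeof json.url !== "string" || !json.url) {
    throw new Error("senseaudio image response missing url");
  }
  return json.url;
}
```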
});

@ -523,6 +523,497 @@ describe('API proxy routes', () => {
expect(upstreamInit?.redirect).toBe('error');
});
it('streams delta + end for SenseAudio chat completions', async () => {
const fetchMock = vi.fn((input: FetchInput, init?: FetchInit) => {
const url = String(input);
if (url.startsWith(baseUrl)) return realFetch(input, init);
return Promise.resolve(sseResponse([
'data: {"choices":[{"delta":{"content":"sense"}}]}',
'',
'data: [DONE]',
'',
].join('\n')));
});
vi.stubGlobal('fetch', fetchMock);
const res = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
baseUrl: 'https://api.senseaudio.cn',
apiKey: 'sa-test',
projectId: 'test-project',
model: 'senseaudio-s2',
messages: [{ role: 'user', content: 'hello' }],
}),
});
await expect(res.text()).resolves.toContain('event: delta\ndata: {"delta":"sense"}');
expect(fetchMock).toHaveBeenCalledWith(
'https://api.senseaudio.cn/v1/chat/completions',
expect.objectContaining({
headers: expect.objectContaining({ Authorization: 'Bearer sa-test' }),
redirect: 'error',
}),
);
});
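The first proxy test expects each upstream OpenAI-style `data:` chunk to be sent on to the client as an `event: delta` frame. Below is that translation for a single line. It assumes the daemon skips blank lines, `[DONE]`, and chunks that carry no text. The `event: end` frame is left out because its payload is not shown here. `toClientDeltaFrame` is hypothetical:

```typescript
// Parses JSON without throwing; malformed lines are treated as "no content".
function parseJson(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return undefined;
  }
}

// Translates one upstream SSE line into a client frame, or returns null when
// the line carries no text (blank lines, [DONE], tool-call-only deltas).
function toClientDeltaFrame(line: string): string | null {
  if (!line.startsWith("data:")) return null;
  const payload = line.slice("data:".length).trim();
  if (payload === "" || payload === "[DONE]") return null;
  const chunk = parseJson(payload) as
    | { choices?: Array<{ delta?: { content?: unknown } }> }
    | undefined;
  const content = chunk?.choices?.[0]?.delta?.content;
  if (typeof content !== "string" || content === "") return null;
  return `event: delta\ndata: ${JSON.stringify({ delta: content })}\n\n`;
}
```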
it('defaults SenseAudio base URL to api.senseaudio.cn when caller omits it', async () => {
const fetchMock = vi.fn((input: FetchInput, init?: FetchInit) => {
const url = String(input);
if (url.startsWith(baseUrl)) return realFetch(input, init);
return Promise.resolve(sseResponse('data: [DONE]\n\n'));
});
vi.stubGlobal('fetch', fetchMock);
await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
apiKey: 'sa-test',
projectId: 'test-project',
model: 'senseaudio-s2',
messages: [{ role: 'user', content: 'hi' }],
}),
});
expect(String(fetchMock.mock.calls[0]![0])).toBe(
'https://api.senseaudio.cn/v1/chat/completions',
);
});
it('rejects SenseAudio requests that omit apiKey or model', async () => {
const fetchMock = vi.fn();
vi.stubGlobal('fetch', fetchMock);
const missingKey = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
model: 'senseaudio-s2',
messages: [{ role: 'user', content: 'hi' }],
}),
});
expect(missingKey.status).toBe(400);
const missingModel = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
apiKey: 'sa-test',
messages: [{ role: 'user', content: 'hi' }],
}),
});
expect(missingModel.status).toBe(400);
expect(fetchMock).not.toHaveBeenCalled();
});
it('disables upstream redirects for senseaudio proxy requests', async () => {
const fetchMock = vi.fn((input: FetchInput, init?: FetchInit) => {
const url = String(input);
if (url.startsWith(baseUrl)) return realFetch(input, init);
return Promise.resolve(sseResponse('data: [DONE]\n\n'));
});
vi.stubGlobal('fetch', fetchMock);
await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
baseUrl: 'https://api.senseaudio.cn',
apiKey: 'sa-test',
projectId: 'test-project',
model: 'model-one',
messages: [{ role: 'user', content: 'hi' }],
}),
});
const upstreamCall = fetchMock.mock.calls.find(([input]) =>
!String(input).startsWith(baseUrl),
);
expect(upstreamCall).toBeDefined();
const upstreamInit = upstreamCall![1] as FetchInit;
expect(upstreamInit?.redirect).toBe('error');
});
it('injects generate_image tool definition on every SenseAudio request', async () => {
const fetchMock = vi.fn((input: FetchInput, init?: FetchInit) => {
const url = String(input);
if (url.startsWith(baseUrl)) return realFetch(input, init);
return Promise.resolve(sseResponse([
'data: {"choices":[{"delta":{"content":"ok"}}]}',
'',
'data: [DONE]',
'',
].join('\n')));
});
vi.stubGlobal('fetch', fetchMock);
await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
baseUrl: 'https://api.senseaudio.cn',
apiKey: 'sa-test',
projectId: 'test-project',
model: 'senseaudio-s2',
messages: [{ role: 'user', content: 'hi' }],
}),
});
const upstreamCall = fetchMock.mock.calls.find(([input]) =>
!String(input).startsWith(baseUrl),
);
expect(upstreamCall).toBeDefined();
const body = JSON.parse(String((upstreamCall![1] as FetchInit)?.body));
expect(body.tool_choice).toBe('auto');
expect(Array.isArray(body.tools)).toBe(true);
expect(body.tools[0]).toMatchObject({
type: 'function',
function: { name: 'generate_image' },
});
});
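According to the PR description, `generate_image` exposes a `model` enum so the LLM can override the default on each call. Below is one way to inject the tool into an outgoing chat body in the shape this test checks: `tool_choice: "auto"` and a first tool named `generate_image`. The model ids come from the media tests earlier in this PR. The parameter schema beyond `prompt` and `model`, the description text, and the helper name `withImageTool` are assumptions:

```typescript
// Ids taken from the media tests in this PR; the registry itself is not shown.
const SENSEAUDIO_IMAGE_MODEL_IDS = [
  "senseaudio-image-2.0-260319",
  "senseaudio-image-1.0-260319",
  "doubao-seedream-5-0-260128",
] as const;

interface ChatRequestBody {
  model: string;
  messages: unknown[];
  tools?: unknown[];
  tool_choice?: "auto";
}

// Prepends the generate_image tool. `defaultModel` is the composer/Settings
// choice; unknown ids fall back to the first registry entry.
function withImageTool(body: ChatRequestBody, defaultModel?: string): ChatRequestBody {
  const known = (SENSEAUDIO_IMAGE_MODEL_IDS as readonly string[]).includes(defaultModel ?? "");
  const model: string = known && defaultModel ? defaultModel : SENSEAUDIO_IMAGE_MODEL_IDS[0];
  const generateImage = {
    type: "function",
    function: {
      name: "generate_image",
      description: `Generate an image from a text prompt. Default model: ${model}.`,
      parameters: {
        type: "object",
        properties: {
          prompt: { type: "string", description: "What to draw." },
          model: { type: "string", enum: [...SENSEAUDIO_IMAGE_MODEL_IDS] },
        },
        required: ["prompt"],
      },
    },
  };
  return { ...body, tools: [generateImage, ...(body.tools ?? [])], tool_choice: "auto" };
}
```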
it('runs the BYOK image tool loop end-to-end', async () => {
const pngBytes = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0x00, 0x01]);
const upstreamChatBodies: any[] = [];
let chatCallIndex = 0;
const fetchMock = vi.fn(async (input: FetchInput, init?: FetchInit) => {
const url = String(input);
if (url.startsWith(baseUrl)) return realFetch(input, init);
// SenseAudio image generation
if (url === 'https://api.senseaudio.cn/v1/image/sync') {
return new Response(
JSON.stringify({
url: 'https://cdn.example.test/cat.png',
base_resp: { status_code: 0, status_msg: 'success' },
}),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
// Image bytes download (initiated by the tool, not via the proxy)
if (url === 'https://cdn.example.test/cat.png') {
return new Response(pngBytes, {
status: 200,
headers: { 'content-type': 'image/png' },
});
}
// Upstream chat completions — capture bodies, return different SSE per call
if (url === 'https://api.senseaudio.cn/v1/chat/completions') {
upstreamChatBodies.push(JSON.parse(String(init?.body || '{}')));
chatCallIndex++;
if (chatCallIndex === 1) {
// First turn: model decides to call generate_image
return sseResponse([
'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":null,"tool_calls":[{"index":0,"id":"call_abc","type":"function","function":{"name":"generate_image","arguments":"{\\"prompt\\":\\"a cat\\"}"}}]},"finish_reason":null}]}',
'',
'data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}',
'',
'data: [DONE]',
'',
].join('\n'));
}
// Second turn: model summarises with image embedded in markdown
return sseResponse([
'data: {"choices":[{"index":0,"delta":{"content":"Here is your cat: "}}]}',
'',
'data: {"choices":[{"index":0,"delta":{"content":"![cat](generated)"}}]}',
'',
'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
'',
'data: [DONE]',
'',
].join('\n'));
}
throw new Error(`unexpected fetch: ${url}`);
});
vi.stubGlobal('fetch', fetchMock);
const res = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
baseUrl: 'https://api.senseaudio.cn',
apiKey: 'sa-test',
projectId: 'test-project',
model: 'senseaudio-s2',
messages: [{ role: 'user', content: 'draw a cat' }],
}),
});
expect(res.status).toBe(200);
const body = await res.text();
// Final assistant text streams through to the client
expect(body).toContain('event: delta');
expect(body).toContain('Here is your cat');
expect(body).toContain('![cat](generated)');
expect(body).toContain('event: end');
// Two upstream chat-completion calls happened (the tool loop ran exactly once)
expect(upstreamChatBodies).toHaveLength(2);
// Second upstream call includes assistant{tool_calls} + tool{result}
const secondMessages = upstreamChatBodies[1].messages;
expect(secondMessages).toHaveLength(3);
expect(secondMessages[0]).toEqual({ role: 'user', content: 'draw a cat' });
expect(secondMessages[1]).toMatchObject({
role: 'assistant',
content: null,
tool_calls: [
{
id: 'call_abc',
type: 'function',
function: {
name: 'generate_image',
arguments: '{"prompt":"a cat"}',
},
},
],
});
expect(secondMessages[2]).toMatchObject({
role: 'tool',
tool_call_id: 'call_abc',
content: expect.stringMatching(
/Image generated successfully\. URL: \/api\/projects\/test-project\/files\/byok-[a-z0-9-]+\.png/,
),
});
});
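The second upstream body must carry an `assistant` message with the complete `tool_calls` array, followed by one `tool` message per call. In OpenAI-style streaming, a tool call's `arguments` can arrive split across several chunks keyed by `index`. This test sends it in one piece, but the proxy still has to merge fragments before it can build those messages. Below are both steps, with hypothetical helper names:

```typescript
interface ToolCallDelta {
  index: number;
  id?: string;
  type?: "function";
  function?: { name?: string; arguments?: string };
}

interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

// Merges streamed fragments: id and name arrive once, arguments are concatenated.
function accumulateToolCalls(deltas: ToolCallDelta[]): ToolCall[] {
  const byIndex = new Map<number, ToolCall>();
  for (const d of deltas) {
    const call: ToolCall =
      byIndex.get(d.index) ?? { id: "", type: "function", function: { name: "", arguments: "" } };
    if (d.id) call.id = d.id;
    if (d.function?.name) call.function.name = d.function.name;
    if (d.function?.arguments) call.function.arguments += d.function.arguments;
    byIndex.set(d.index, call);
  }
  return Array.from(byIndex.entries())
    .sort((a, b) => a[0] - b[0])
    .map((entry) => entry[1]);
}

// Messages appended to the history before the next upstream turn.
function followUpMessages(calls: ToolCall[], results: Map<string, string>): Array<Record<string, unknown>> {
  return [
    { role: "assistant", content: null, tool_calls: calls },
    ...calls.map((c) => ({ role: "tool", tool_call_id: c.id, content: results.get(c.id) ?? "" })),
  ];
}
```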
it('feeds a tool error message back to the model when generate_image fails', async () => {
const upstreamChatBodies: any[] = [];
let chatCallIndex = 0;
const fetchMock = vi.fn(async (input: FetchInput, init?: FetchInit) => {
const url = String(input);
if (url.startsWith(baseUrl)) return realFetch(input, init);
if (url === 'https://api.senseaudio.cn/v1/image/sync') {
return new Response(
JSON.stringify({ error_message: 'sensitive_content_blocked' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
if (url === 'https://api.senseaudio.cn/v1/chat/completions') {
upstreamChatBodies.push(JSON.parse(String(init?.body || '{}')));
chatCallIndex++;
if (chatCallIndex === 1) {
return sseResponse([
'data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_err","type":"function","function":{"name":"generate_image","arguments":"{\\"prompt\\":\\"...\\"}"}}]},"finish_reason":null}]}',
'',
'data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}',
'',
'data: [DONE]',
'',
].join('\n'));
}
return sseResponse([
'data: {"choices":[{"index":0,"delta":{"content":"Sorry, that one was blocked."}}]}',
'',
'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
'',
'data: [DONE]',
'',
].join('\n'));
}
throw new Error(`unexpected fetch: ${url}`);
});
vi.stubGlobal('fetch', fetchMock);
const res = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
baseUrl: 'https://api.senseaudio.cn',
apiKey: 'sa-test',
projectId: 'test-project',
model: 'senseaudio-s2',
messages: [{ role: 'user', content: 'draw something blocked' }],
}),
});
expect(res.status).toBe(200);
const body = await res.text();
expect(body).toContain('Sorry, that one was blocked');
expect(upstreamChatBodies).toHaveLength(2);
const toolMsg = upstreamChatBodies[1].messages[2];
expect(toolMsg.role).toBe('tool');
expect(toolMsg.tool_call_id).toBe('call_err');
expect(toolMsg.content).toMatch(/Image generation failed/);
expect(toolMsg.content).toMatch(/sensitive_content_blocked/);
});
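The success and failure tests pin the start of the text the model receives in the `tool` message. Below is a sketch of that formatting. The `Image generation failed: <error>` layout is an assumption that goes beyond the two substrings the test matches, and `imageToolMessage` is a hypothetical name:

```typescript
type ImageToolOutcome = { ok: true; url: string } | { ok: false; error: string };

// Text placed in the `tool` message. The success prefix matches the wording
// asserted in the end-to-end test; the failure layout is assumed.
function imageToolMessage(outcome: ImageToolOutcome): string {
  return outcome.ok
    ? `Image generated successfully. URL: ${outcome.url}`
    : `Image generation failed: ${outcome.error}`;
}
```

Returning the failure as text, rather than aborting the stream, is what lets the model reply with something like "Sorry, that one was blocked."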
it('bounds the BYOK tool loop at MAX_BYOK_TOOL_LOOPS=3', async () => {
let chatCallIndex = 0;
const fetchMock = vi.fn(async (input: FetchInput, init?: FetchInit) => {
const url = String(input);
if (url.startsWith(baseUrl)) return realFetch(input, init);
if (url === 'https://api.senseaudio.cn/v1/image/sync') {
return new Response(
JSON.stringify({ url: 'https://cdn.example.test/x.png' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
if (url === 'https://cdn.example.test/x.png') {
return new Response(Buffer.from([0x89, 0x50]), { status: 200 });
}
if (url === 'https://api.senseaudio.cn/v1/chat/completions') {
chatCallIndex++;
// Always return tool_calls — the model never returns text
return sseResponse([
`data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_${chatCallIndex}","type":"function","function":{"name":"generate_image","arguments":"{\\"prompt\\":\\"x\\"}"}}]},"finish_reason":null}]}`,
'',
'data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}',
'',
'data: [DONE]',
'',
].join('\n'));
}
throw new Error(`unexpected fetch: ${url}`);
});
vi.stubGlobal('fetch', fetchMock);
const res = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
baseUrl: 'https://api.senseaudio.cn',
apiKey: 'sa-test',
projectId: 'test-project',
model: 'senseaudio-s2',
messages: [{ role: 'user', content: 'infinite' }],
}),
});
expect(res.status).toBe(200);
const body = await res.text();
expect(body).toContain('event: end');
// Loop ran exactly MAX_BYOK_TOOL_LOOPS times before bailing.
expect(chatCallIndex).toBe(3);
});
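The cap test only asserts that the model is called `MAX_BYOK_TOOL_LOOPS` (3) times and that the stream still ends. Below is a loop driver with that bound. It is written against injected `callModel` / `runTool` functions so it runs without the network. Dropping the tool calls requested on the final turn is an assumption; the real proxy may execute them before closing the stream:

```typescript
// Cap taken from the test name; everything else here is illustrative.
const MAX_BYOK_TOOL_LOOPS = 3;

interface ModelTurn {
  text: string;
  toolCalls: Array<{ id: string; name: string; args: string }>;
}

type CallModel = (messages: unknown[]) => Promise<ModelTurn>;
type RunTool = (name: string, args: string) => Promise<string>;

// Calls the model, runs requested tools, feeds results back, and repeats until
// the model answers without tool calls or the cap is reached.
async function runByokToolLoop(
  callModel: CallModel,
  runTool: RunTool,
  history: unknown[],
): Promise<{ text: string; turns: number }> {
  const messages: unknown[] = [...history];
  let text = "";
  let turns = 0;
  while (turns < MAX_BYOK_TOOL_LOOPS) {
    turns++;
    const turn = await callModel(messages);
    text += turn.text;
    // Stop on a plain answer, or on the last allowed turn (no later turn would
    // read any further tool results).
    if (turn.toolCalls.length === 0 || turns === MAX_BYOK_TOOL_LOOPS) break;
    messages.push({
      role: "assistant",
      content: null,
      tool_calls: turn.toolCalls.map((c) => ({
        id: c.id,
        type: "function",
        function: { name: c.name, arguments: c.args },
      })),
    });
    for (const call of turn.toolCalls) {
      let content: string;
      try {
        content = await runTool(call.name, call.args);
      } catch (err) {
        // Failures go back to the model as text so it can explain them.
        content = `Tool ${call.name} failed: ${err instanceof Error ? err.message : String(err)}`;
      }
      messages.push({ role: "tool", tool_call_id: call.id, content });
    }
  }
  return { text, turns };
}
```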
it('writes the generated image into the project folder and serves it via /api/projects/:id/files/*', async () => {
const pngBytes = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0x42, 0x59]);
let capturedUrl: string | undefined;
const fetchMock = vi.fn(async (input: FetchInput, init?: FetchInit) => {
const url = String(input);
if (url.startsWith(baseUrl)) return realFetch(input, init);
if (url === 'https://api.senseaudio.cn/v1/image/sync') {
return new Response(
JSON.stringify({ url: 'https://cdn.example.test/served.png' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
if (url === 'https://cdn.example.test/served.png') {
return new Response(pngBytes, { status: 200 });
}
if (url === 'https://api.senseaudio.cn/v1/chat/completions') {
const body = JSON.parse(String(init?.body || '{}'));
// Capture the URL the tool produced; it appears in the second turn's tool message.
const toolMsg = body.messages?.find((m: any) => m.role === 'tool');
if (toolMsg) {
const match = /URL: (\/api\/projects\/[A-Za-z0-9._-]+\/files\/byok-[a-z0-9-]+\.png)/.exec(toolMsg.content);
if (match) capturedUrl = match[1];
}
const isToolTurn = !toolMsg;
if (isToolTurn) {
return sseResponse([
'data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_serve","type":"function","function":{"name":"generate_image","arguments":"{\\"prompt\\":\\"s\\"}"}}]},"finish_reason":null}]}',
'',
'data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}',
'',
'data: [DONE]',
'',
].join('\n'));
}
return sseResponse([
'data: {"choices":[{"index":0,"delta":{"content":"done"}}]}',
'',
'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
'',
'data: [DONE]',
'',
].join('\n'));
}
throw new Error(`unexpected fetch: ${url}`);
});
vi.stubGlobal('fetch', fetchMock);
const proxyRes = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
baseUrl: 'https://api.senseaudio.cn',
apiKey: 'sa-test',
projectId: 'test-project',
model: 'senseaudio-s2',
messages: [{ role: 'user', content: 'gen' }],
}),
});
// Drain the SSE body so the tool loop fully completes before we assert.
await proxyRes.text();
expect(capturedUrl).toBeDefined();
// The URL the tool emits is relative: in production it resolves
// same-origin through the Next.js rewrite; in this test it hits the
// test server directly. Fetching the captured URL through the standard
// project file route and checking the bytes proves both halves:
// (1) the image landed in <projectsRoot>/<projectId>/ as expected
// (so listFiles / FileViewer / archive will find it), and
// (2) /api/projects/:id/files/* serves it without needing any
// byok-specific route.
const imgRes = await realFetch(`${baseUrl}${capturedUrl!}`);
expect(imgRes.status).toBe(200);
expect(imgRes.headers.get('content-type')).toMatch(/^image\/png/);
const served = Buffer.from(await imgRes.arrayBuffer());
expect(served.equals(pngBytes)).toBe(true);
});
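This test relies on the tool writing `byok-<id>.png` under `<projectsRoot>/<projectId>/` and returning a relative `/api/projects/<projectId>/files/<name>` URL. Below, both are computed from one name. The sketch assumes a random UUID as the id, which satisfies the `byok-[a-z0-9-]+\.png` pattern the test matches. `byokImageTarget` is hypothetical:

```typescript
import { randomUUID } from "node:crypto";
import * as path from "node:path";

// Picks a collision-resistant file name and returns both the on-disk path and
// the same-origin URL served by the existing project file route.
function byokImageTarget(projectsRoot: string, projectId: string, ext = "png") {
  const name = `byok-${randomUUID()}.${ext}`;
  return {
    name,
    filePath: path.join(projectsRoot, projectId, name),
    url: `/api/projects/${encodeURIComponent(projectId)}/files/${name}`,
  };
}
```

Writing into the project folder, rather than a side store, is what lets FileViewer, DesignFilesPanel, and export pick the file up without a dedicated route.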
it('rejects senseaudio chat requests without a projectId', async () => {
const fetchMock = vi.fn();
vi.stubGlobal('fetch', fetchMock);
const res = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
apiKey: 'sa-test',
model: 'senseaudio-s2',
messages: [{ role: 'user', content: 'hi' }],
// no projectId — should 400
}),
});
expect(res.status).toBe(400);
expect(fetchMock).not.toHaveBeenCalled();
});
it('rejects senseaudio chat requests with an unsafe projectId', async () => {
const fetchMock = vi.fn();
vi.stubGlobal('fetch', fetchMock);
const res = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
apiKey: 'sa-test',
model: 'senseaudio-s2',
projectId: '../etc/passwd',
messages: [{ role: 'user', content: 'hi' }],
}),
});
expect(res.status).toBe(400);
expect(fetchMock).not.toHaveBeenCalled();
});
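The validation tests (missing `apiKey` or `model`, missing `projectId`, and `../etc/passwd`) all expect a 400 before any upstream fetch. Below is one such guard. It assumes a project id must be a single path segment matching the `[A-Za-z0-9._-]+` pattern that the file-serving test uses, excluding `.` and `..`. Both function names are hypothetical:

```typescript
interface SenseAudioStreamRequest {
  apiKey?: unknown;
  model?: unknown;
  projectId?: unknown;
}

// Assumed rule: one path segment, so joining it under projectsRoot cannot escape.
function isSafeProjectId(id: unknown): id is string {
  return typeof id === "string" && /^[A-Za-z0-9._-]+$/.test(id) && id !== "." && id !== "..";
}

// Returns a message for a 400 response, or null when the body is usable.
function validateSenseAudioStreamRequest(body: SenseAudioStreamRequest): string | null {
  if (typeof body.apiKey !== "string" || !body.apiKey) return "apiKey is required";
  if (typeof body.model !== "string" || !body.model) return "model is required";
  if (!isSafeProjectId(body.projectId)) return "projectId is missing or unsafe";
  return null;
}
```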
// Plan §3.A4 / spec §11.8 (e2e-7): the API-fallback proxy paths must
// never carry plugin context. The web sidecar's fallback mode bypasses
// the daemon snapshot bus, so any pluginId / appliedPluginSnapshotId in
@ -534,6 +1025,7 @@ describe('API proxy routes', () => {
'/api/proxy/openai/stream',
'/api/proxy/azure/stream',
'/api/proxy/google/stream',
'/api/proxy/senseaudio/stream',
];
for (const path of proxies) {

@ -14,6 +14,7 @@ import {
trackStudioClickChatComposer,
trackStudioViewChatPanel,
} from '../analytics/events';
import { IMAGE_MODELS } from "../media/models";
import { projectRawUrl, uploadProjectFiles, openFolderDialog, fetchConnectors } from "../providers/registry";
import { patchProject } from "../state/projects";
import { fetchMcpServers } from "../state/mcp";
@ -126,6 +127,14 @@ interface Props {
researchAvailable?: boolean;
projectMetadata?: ProjectMetadata;
onProjectMetadataChange?: (metadata: ProjectMetadata) => void;
// SenseAudio BYOK image-model picker shown above the textarea. Hidden
// when the active chat protocol is anything other than 'senseaudio',
// so the composer stays clean for every other BYOK tab. The state
// owner is ProjectView (per-session, reset on refresh); ChatComposer
// is a fully controlled select.
byokApiProtocol?: AppConfig['apiProtocol'];
byokImageModel?: string;
onChangeByokImageModel?: (model: string) => void;
currentSkillId?: string | null;
onProjectSkillChange?: (skillId: string | null) => void;
// Set when the project was created with a plugin already pinned
@ -188,6 +197,9 @@ export const ChatComposer = forwardRef<ChatComposerHandle, Props>(
researchAvailable = false,
projectMetadata,
onProjectMetadataChange,
byokApiProtocol,
byokImageModel,
onChangeByokImageModel,
currentSkillId = null,
onProjectSkillChange,
pinnedPluginId = null,
@ -1186,6 +1198,53 @@ export const ChatComposer = forwardRef<ChatComposerHandle, Props>(
t={t}
/>
) : null}
{byokApiProtocol === 'senseaudio' && onChangeByokImageModel ? (
<div
className="composer-byok-image-model"
data-testid="composer-byok-image-model"
style={{
display: 'flex',
alignItems: 'center',
gap: 8,
padding: '4px 8px',
fontSize: 12,
color: 'var(--text-muted, #888)',
}}
>
<Icon name="image" size={13} />
<label
htmlFor="composer-byok-image-model-select"
style={{ flexShrink: 0 }}
>
{t('settings.byokImageModel')}
</label>
<select
id="composer-byok-image-model-select"
value={byokImageModel ?? ''}
onChange={(e) => onChangeByokImageModel(e.target.value)}
style={{
background: 'transparent',
border: '1px solid var(--border, #444)',
borderRadius: 4,
padding: '2px 6px',
color: 'inherit',
fontSize: 12,
}}
>
<option value="">
{(IMAGE_MODELS.find((m) => m.provider === 'senseaudio')?.label
?? 'senseaudio-image-2.0') + ' (default)'}
</option>
{IMAGE_MODELS.filter((m) => m.provider === 'senseaudio').map(
(m) => (
<option key={m.id} value={m.id}>
{m.label}
</option>
),
)}
</select>
</div>
) : null}
{/*
Spec §8.4 context bar above the composer input. The
section now behaves as a pure context bar: it renders the

@ -279,6 +279,12 @@ interface Props {
// message" without forcing a separate side widget.
activePluginSnapshot?: AppliedPluginSnapshot | null;
onCollapse?: () => void;
// SenseAudio BYOK only — wired straight through to ChatComposer for the
// in-composer image-model picker. Active protocol is read so the picker
// hides when the user is on any other BYOK tab (azure / openai / …).
byokApiProtocol?: AppConfig['apiProtocol'];
byokImageModel?: string;
onChangeByokImageModel?: (model: string) => void;
}
type Tab = 'chat' | 'comments';
@ -327,6 +333,9 @@ export function ChatPane({
activePluginSnapshot,
skills = [],
onCollapse,
byokApiProtocol,
byokImageModel,
onChangeByokImageModel,
}: Props) {
const t = useT();
const logRef = useRef<HTMLDivElement | null>(null);
@ -872,6 +881,9 @@ export function ChatPane({
researchAvailable={researchAvailable}
projectMetadata={projectMetadata}
onProjectMetadataChange={onProjectMetadataChange}
byokApiProtocol={byokApiProtocol}
byokImageModel={byokImageModel}
onChangeByokImageModel={onChangeByokImageModel}
currentSkillId={currentSkillId}
onProjectSkillChange={onProjectSkillChange}
pinnedPluginId={activePluginSnapshot?.pluginId ?? null}

@ -1192,7 +1192,14 @@ export function DesignFilesPanel({
</div>
</div>
{preview && previewFile ? (
// Key on the file name so React unmounts the previous DfPreview
// (and its iframe / image element) when the user clicks a
// different file. Without this, React diffing reuses the same
// iframe DOM node and the browser keeps showing the first
// file's contents — only the `src` prop changes but the iframe
// never actually navigates.
<DfPreview
key={previewFile.name}
projectId={projectId}
file={previewFile}
onOpen={() => onOpenFile(previewFile.name)}

@ -486,6 +486,15 @@ export function ProjectView({
const [liveArtifacts, setLiveArtifacts] = useState<LiveArtifactSummary[]>([]);
const [liveArtifactEvents, setLiveArtifactEvents] = useState<LiveArtifactEventItem[]>([]);
const [workspaceFocused, setWorkspaceFocused] = useState(false);
// Per-session override for the BYOK SenseAudio chat's generate_image
// tool. Seeded once from Settings (config.byokImageModel) so the
// composer dropdown opens on the user's chosen default; subsequent
// selections live only in this component's state — page refresh /
// project switch resets to the Settings default. Persistent defaults
// live in Settings → BYOK → SenseAudio → Image generation model.
const [byokImageModelOverride, setByokImageModelOverride] = useState<string>(
config.byokImageModel ?? '',
);
// `closed` → no surface; `review` → read-only saved-state panel with a
// preview + reopen-to-edit action (#1822); `edit` → the textarea editor.
const [instructionsMode, setInstructionsMode] = useState<'closed' | 'review' | 'edit'>('closed');
@ -2202,6 +2211,13 @@ export function ProjectView({
});
},
onError: handlers.onError,
}, {
projectId: project.id,
// SenseAudio BYOK chat reads this to pre-fill the tool param's
// default model. Prefer the live composer override; fall back
// to the Settings default when the composer dropdown is on
// "use default". Other protocols ignore unknown body fields.
byokImageModel: byokImageModelOverride || config.byokImageModel,
});
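The image model is resolved in two layers. The web client sends `byokImageModelOverride || config.byokImageModel`, and the Settings comment says an empty value resolves to the registry default on the daemon (`senseaudio-image-2.0-260319` today). Below, both layers are collapsed into one function for illustration. The name is hypothetical:

```typescript
// Registry default as stated in the Settings comment; it may change upstream.
const REGISTRY_DEFAULT_IMAGE_MODEL = "senseaudio-image-2.0-260319";

// Composer override, then the Settings default, then the daemon's registry default.
function effectiveImageModel(override: string, settingsDefault?: string): string {
  return override || settingsDefault || REGISTRY_DEFAULT_IMAGE_MODEL;
}
```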
}
},
@ -3375,6 +3391,9 @@ export function ProjectView({
onTogglePet={onTogglePet}
onOpenPetSettings={onOpenPetSettings}
researchAvailable={config.mode === 'daemon'}
byokApiProtocol={config.apiProtocol}
byokImageModel={byokImageModelOverride}
onChangeByokImageModel={setByokImageModelOverride}
projectMetadata={project.metadata}
onProjectMetadataChange={(metadata) => {
onProjectChange({ ...project, metadata });

@ -68,7 +68,7 @@ import type {
import { testAgent, testApiProvider } from '../providers/connection-test';
import { fetchProviderModels } from '../providers/provider-models';
import { fetchConnectors, fetchDesignTemplates } from '../providers/registry';
import { IMAGE_MODELS, MEDIA_PROVIDERS } from '../media/models';
import { XaiOAuthControl } from './XaiOAuthControl';
import type { MediaProvider } from '../media/models';
import { Toast } from './Toast';
@ -444,6 +444,7 @@ function currentApiProtocolConfig(config: AppConfig): ApiProtocolConfig {
model: config.model,
apiVersion: config.apiVersion ?? '',
apiProviderBaseUrl: config.apiProviderBaseUrl ?? null,
byokImageModel: config.byokImageModel ?? '',
};
}
@ -460,6 +461,11 @@ function applyApiProtocolConfig(
model: apiConfig.model,
apiProviderBaseUrl: apiConfig.apiProviderBaseUrl ?? null,
apiVersion: protocol === 'azure' ? (apiConfig.apiVersion ?? '') : '',
// byokImageModel is SenseAudio-only — flipping to another BYOK tab
// shouldn't carry a SenseAudio image-model choice into, say, the
// OpenAI form. Mirrors the apiVersion guarding above.
byokImageModel:
protocol === 'senseaudio' ? (apiConfig.byokImageModel ?? '') : '',
};
}
@ -2683,6 +2689,34 @@ export function SettingsDialog({
/>
</label>
) : null}
{apiProtocol === 'senseaudio' ? (
<label className="field">
<span className="field-label">{t('settings.byokImageModel')}</span>
<select
value={cfg.byokImageModel ?? ''}
onChange={(e) =>
updateApiConfig({ byokImageModel: e.target.value })
}
>
{/* Default-empty option resolves to the registry default
on the daemon side (senseaudio-image-2.0-260319 today).
Listing it explicitly lets the picker show what the
unconfigured state actually means. */}
<option value="">
{IMAGE_MODELS.find((m) => m.provider === 'senseaudio')?.label
?? 'senseaudio-image-2.0'}
{' (default)'}
</option>
{IMAGE_MODELS.filter((m) => m.provider === 'senseaudio').map(
(m) => (
<option key={m.id} value={m.id}>
{m.label}
</option>
),
)}
</select>
</label>
) : null}
<p className="hint">{t('settings.apiHint')}</p>
</section>
)}
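The composer picker and this Settings picker each build the same option list inline: an empty "(default)" entry labeled with the first SenseAudio model, followed by every SenseAudio entry from `IMAGE_MODELS`. A shared pure helper is one way to keep the two in sync. The sketch assumes only the `id`, `label`, and `provider` fields the JSX already reads, and `senseAudioImageOptions` is a hypothetical name:

```typescript
interface ImageModelEntry {
  id: string;
  label: string;
  provider: string;
}

interface PickerOption {
  value: string;
  label: string;
}

// Empty "use default" entry first, then every SenseAudio model.
function senseAudioImageOptions(
  models: readonly ImageModelEntry[],
  fallbackLabel = "senseaudio-image-2.0",
): PickerOption[] {
  const senseaudio = models.filter((m) => m.provider === "senseaudio");
  // Same label the inline JSX derives via IMAGE_MODELS.find(...).
  const defaultLabel = `${senseaudio[0]?.label ?? fallbackLabel} (default)`;
  return [
    { value: "", label: defaultLabel },
    ...senseaudio.map((m) => ({ value: m.id, label: m.label })),
  ];
}
```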

@ -202,6 +202,7 @@ export const ar: Dict = {
'settings.azureDeploymentModelHint':
'في Azure OpenAI، يُستخدم هذا الحقل كاسم النشر في /openai/deployments/<model>. أدخل اسم النشر الذي أنشأته في Azure.',
'settings.apiVersion': 'إصدار API',
'settings.byokImageModel': 'نموذج إنشاء الصور',
'settings.maxTokens': 'أقصى عدد من الرموز (اختياري)',
'settings.maxTokensHint':
'الحد الأقصى لطول الاستجابة. لكل نموذج قيمة افتراضية؛ اتركها فارغة لاستخدامها، أو أدخل رقماً للتجاوز.',

@ -202,6 +202,7 @@ export const de: Dict = {
'settings.azureDeploymentModelHint':
'Für Azure OpenAI wird dieses Feld als Deployment-Name in /openai/deployments/<model> verwendet. Geben Sie den in Azure angelegten Deployment-Namen ein.',
'settings.apiVersion': 'API-Version',
'settings.byokImageModel': 'Bilderzeugungsmodell',
'settings.maxTokens': 'Max. Tokens (optional)',
'settings.maxTokensHint':
'Obergrenze für die Antwortlänge. Jedes Modell hat einen abgestimmten Standardwert (im Platzhalter sichtbar); leer lassen, um ihn zu verwenden, oder eine Zahl eingeben, um ihn zu überschreiben.',

@ -227,6 +227,7 @@ export const en: Dict = {
'settings.azureModelFetchHint':
'For Azure OpenAI, enter the deployment name you created in Azure. Automatic deployment discovery is not available from this BYOK endpoint.',
'settings.apiVersion': 'API version',
'settings.byokImageModel': 'Image generation model',
'settings.maxTokens': 'Max tokens (optional)',
'settings.maxTokensHint':
'Cap on the response length. Each model has a tuned default (shown as a placeholder); leave blank to use it, or enter a number to override.',

@ -202,6 +202,7 @@ export const esES: Dict = {
'settings.azureDeploymentModelHint':
'Para Azure OpenAI, este campo se usa como nombre del despliegue en /openai/deployments/<model>. Introduce el nombre del despliegue que creaste en Azure.',
'settings.apiVersion': 'Versión de API',
'settings.byokImageModel': 'Modelo de generación de imágenes',
'settings.maxTokens': 'Tokens máx. (opcional)',
'settings.maxTokensHint':
'Tope para la longitud de la respuesta. Cada modelo tiene un valor por defecto ajustado (visible en el placeholder); déjalo vacío para usarlo o introduce un número para anularlo.',

@ -202,6 +202,7 @@ export const fa: Dict = {
'settings.azureDeploymentModelHint':
'در Azure OpenAI، این فیلد به عنوان نام استقرار در /openai/deployments/<model> استفاده می‌شود. نام استقراری را که در Azure ساخته‌اید وارد کنید.',
'settings.apiVersion': 'نسخه API',
'settings.byokImageModel': 'مدل تولید تصویر',
'settings.maxTokens': 'حداکثر توکن (اختیاری)',
'settings.maxTokensHint':
'سقف طول پاسخ. هر مدل مقدار پیش‌فرض تنظیم‌شدهٔ خود را دارد (در placeholder نمایش داده می‌شود)؛ برای استفاده از آن خالی بگذارید، یا برای جایگزینی، عددی وارد کنید.',

@ -202,6 +202,7 @@ export const fr: Dict = {
'settings.azureDeploymentModelHint':
'Pour Azure OpenAI, ce champ est utilisé comme nom du déploiement dans /openai/deployments/<model>. Saisissez le nom du déploiement créé dans Azure.',
'settings.apiVersion': 'Version API',
'settings.byokImageModel': "Modèle de génération d'images",
'settings.maxTokens': 'Tokens max (optionnel)',
'settings.maxTokensHint':
'Limite de la longueur de réponse. Chaque modèle a une valeur par défaut (affichée à titre indicatif) ; laissez vide pour l\'utiliser, ou entrez un nombre pour la remplacer.',

@ -202,6 +202,7 @@ export const hu: Dict = {
'settings.azureDeploymentModelHint':
'Azure OpenAI esetén ez a mező a /openai/deployments/<model> deployment neveként szerepel. Add meg az Azure-ban létrehozott deployment nevét.',
'settings.apiVersion': 'API-verzió',
'settings.byokImageModel': 'Képgenerálási modell',
'settings.maxTokens': 'Max tokenek (opcionális)',
'settings.maxTokensHint':
'A válasz hosszának felső határa. Minden modellnek van hangolt alapértelmezése (placeholderként látható); hagyd üresen az alkalmazásához, vagy adj meg számot a felülíráshoz.',

@ -202,6 +202,7 @@ export const id: Dict = {
'settings.azureDeploymentModelHint':
'Untuk Azure OpenAI, field ini digunakan sebagai nama deployment di /openai/deployments/<model>. Masukkan nama deployment yang kamu buat di Azure.',
'settings.apiVersion': 'Versi API',
'settings.byokImageModel': 'Model pembuatan gambar',
'settings.maxTokens': 'Token maks (opsional)',
'settings.maxTokensHint':
'Batas panjang respons. Setiap model punya default sendiri; kosongkan untuk memakainya, atau isi angka untuk menimpa.',

@ -199,6 +199,7 @@ export const it: Dict = {
'settings.azureDeploymentModelHint':
'Per Azure OpenAI, questo campo viene utilizzato come nome del deployment in /openai/deployments/<model>. Inserisci il nome del deployment creato in Azure.',
'settings.apiVersion': 'Versione API',
'settings.byokImageModel': 'Modello di generazione immagini',
'settings.maxTokens': 'Token massimi (opzionale)',
'settings.maxTokensHint':
'Limite della lunghezza della risposta. Ogni modello ha un valore predefinito (mostrato nel placeholder); lascia vuoto per usarlo, o inserisci un numero per sostituirlo.',

@ -202,6 +202,7 @@ export const ja: Dict = {
'settings.azureDeploymentModelHint':
'Azure OpenAI では、このフィールドが /openai/deployments/<model> のデプロイ名として使われます。Azure で作成したデプロイ名を入力してください。',
'settings.apiVersion': 'API バージョン',
'settings.byokImageModel': '画像生成モデル',
'settings.maxTokens': '最大トークン(任意)',
'settings.maxTokensHint':
'応答長の上限。各モデルにチューニング済みのデフォルト値があります(プレースホルダーに表示)。空のままにすればそれを使用し、数値を入力すれば上書きされます。',

@ -205,6 +205,7 @@ export const ko: Dict = {
'settings.azureDeploymentModelHint':
'Azure OpenAI에서는 이 필드가 /openai/deployments/<model>의 배포 이름으로 사용됩니다. Azure에서 만든 배포 이름을 입력하세요.',
'settings.apiVersion': 'API 버전',
'settings.byokImageModel': '이미지 생성 모델',
'settings.apiHint': '요청은 로컬 daemon 프록시를 통해 설정한 Base URL로 전송됩니다. 키는 이 브라우저에만 저장되며 제공자 요청과 함께 전송됩니다.',
'settings.skipForNow': '지금은 건너뛰기',
'settings.getStarted': '시작하기',

@ -202,6 +202,7 @@ export const pl: Dict = {
'settings.azureDeploymentModelHint':
'Dla Azure OpenAI to pole jest używane jako nazwa wdrożenia w /openai/deployments/<model>. Wpisz nazwę wdrożenia utworzonego w Azure.',
'settings.apiVersion': 'Wersja API',
'settings.byokImageModel': 'Model generowania obrazów',
'settings.maxTokens': 'Maks. liczba tokenów (opcjonalnie)',
'settings.maxTokensHint':
'Limit długości odpowiedzi. Każdy model ma dostrojony domyślny limit (widoczny jako placeholder); pozostaw puste, aby go użyć, lub wpisz liczbę.',

@ -202,6 +202,7 @@ export const ptBR: Dict = {
'settings.azureDeploymentModelHint':
'No Azure OpenAI, este campo é usado como nome do deployment em /openai/deployments/<model>. Informe o nome do deployment criado no Azure.',
'settings.apiVersion': 'Versão da API',
'settings.byokImageModel': 'Modelo de geração de imagens',
'settings.maxTokens': 'Tokens máx. (opcional)',
'settings.maxTokensHint':
'Limite para o comprimento da resposta. Cada modelo tem um valor padrão ajustado (visível no placeholder); deixe em branco para usá-lo ou insira um número para substituí-lo.',

@ -202,6 +202,7 @@ export const ru: Dict = {
'settings.azureDeploymentModelHint':
'Для Azure OpenAI это поле используется как имя развертывания в /openai/deployments/<model>. Укажите имя развертывания, созданного в Azure.',
'settings.apiVersion': 'Версия API',
'settings.byokImageModel': 'Модель генерации изображений',
'settings.maxTokens': 'Макс. токенов (опционально)',
'settings.maxTokensHint':
'Ограничение длины ответа. У каждой модели свой настроенный дефолт (виден в плейсхолдере); оставьте поле пустым, чтобы использовать его, или введите число, чтобы переопределить.',

@ -198,6 +198,7 @@ export const th: Dict = {
'settings.azureDeploymentModel': 'ชื่อ Deployment',
'settings.azureDeploymentModelHint': 'สำหรับ Azure OpenAI ฟิลด์นี้ใช้เป็นชื่อ Deployment ใน /openai/deployments/<model> ป้อนชื่อ Deployment ที่คุณสร้างใน Azure',
'settings.apiVersion': 'เวอร์ชัน API',
'settings.byokImageModel': 'โมเดลสร้างภาพ',
'settings.maxTokens': 'Max tokens (เลือกได้)',
'settings.maxTokensHint': 'ขีดจำกัดความยาวในการตอบกลับ',
'settings.apiHint': 'คำสั่งจะถูกส่งผ่าน local daemon proxy ไปยัง base URL ที่คุณตั้งไว้ API Key จะถูกเก็บในเบราว์เซอร์นี้เท่านั้น',

@ -202,6 +202,7 @@ export const tr: Dict = {
'settings.azureDeploymentModelHint':
'Azure OpenAI için bu alan /openai/deployments/<model> içindeki dağıtım adı olarak kullanılır. Azure\'da oluşturduğunuz dağıtım adını girin.',
'settings.apiVersion': 'API sürümü',
'settings.byokImageModel': 'Görüntü oluşturma modeli',
'settings.maxTokens': 'Maks. token (isteğe bağlı)',
'settings.maxTokensHint':
'Yanıt uzunluğu sınırı. Her modelin ayarlanmış bir varsayılanı vardır (yer tutucuda görünür); kullanmak için boş bırakın, üzerine yazmak için bir sayı girin.',

@ -203,6 +203,7 @@ export const uk: Dict = {
'settings.azureDeploymentModelHint':
'Для Azure OpenAI це поле використовується як назва розгортання в /openai/deployments/<model>. Введіть назву розгортання, створену в Azure.',
'settings.apiVersion': 'Версія API',
'settings.byokImageModel': 'Модель генерації зображень',
'settings.maxTokens': 'Макс. токенів (необов\'язково)',
'settings.maxTokensHint':
'Обмеження на довжину відповіді. Кожна модель має налаштоване значення за замовчуванням (показано в заповнювачі); залиште поле порожнім, щоб використовувати його, або введіть число, щоб перевизначити.',

@ -227,6 +227,7 @@ export const zhCN: Dict = {
'settings.azureModelFetchHint':
'对于 Azure OpenAI，请填写你在 Azure 中创建的部署名称。当前 BYOK 端点无法自动发现 deployment。',
'settings.apiVersion': 'API 版本',
'settings.byokImageModel': '图片生成模型',
'settings.maxTokens': '最大 tokens（可选）',
'settings.maxTokensHint':
'响应长度上限。每个 model 有调优过的默认值(在 placeholder 里显示),留空即使用,输入数字则覆盖。',

@ -201,6 +201,7 @@ export const zhTW: Dict = {
'settings.azureDeploymentModelHint':
'對於 Azure OpenAI，此欄位會作為 /openai/deployments/<model> 中的部署名稱使用。請填入你在 Azure 中建立的部署名稱。',
'settings.apiVersion': 'API 版本',
'settings.byokImageModel': '圖片生成模型',
'settings.maxTokens': '最大 tokens（可選）',
'settings.maxTokensHint':
'回應長度上限。每個 model 有調過的預設值(在 placeholder 顯示),留空即使用,輸入數字則覆蓋。',

@ -252,6 +252,7 @@ export interface Dict {
'settings.azureDeploymentModelHint': string;
'settings.azureModelFetchHint': string;
'settings.apiVersion': string;
'settings.byokImageModel': string;
'settings.apiHint': string;
'settings.skipForNow': string;
'settings.getStarted': string;
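Declaring `'settings.byokImageModel'` on `Dict` is what turns this into a 19-locale change: every locale object typed as `Dict` fails to compile until it supplies the key. A minimal sketch of that guarantee, plus a runtime check for dictionaries that skip the type checker (for example, ones parsed from JSON). `DictSketch`, `enSketch`, `missingKeys`, and the sample strings are hypothetical, not the project's code.

```typescript
// Hypothetical two-key slice of the real Dict interface.
interface DictSketch {
  'settings.apiVersion': string;
  'settings.byokImageModel': string;
}

// Compile-time completeness: deleting either key here is a type error.
const enSketch: DictSketch = {
  'settings.apiVersion': 'API version',
  'settings.byokImageModel': 'Image generation model',
};

// Runtime counterpart for untyped dictionaries: returns every key of
// `reference` that `candidate` lacks or leaves as an empty string.
function missingKeys(reference: object, candidate: object): string[] {
  const values = candidate as Record<string, unknown>;
  return Object.keys(reference).filter(
    (key) => typeof values[key] !== 'string' || values[key] === '',
  );
}

// A locale that was never updated for the new key.
const partialLocale = { 'settings.apiVersion': 'Versión de API' };
console.log(missingKeys(enSketch, partialLocale)); // logs the one missing key
```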

@ -234,7 +234,7 @@ export const MEDIA_PROVIDERS: MediaProvider[] = [
{
id: 'senseaudio',
label: 'SenseAudio',
hint: 'TTS · 70+ system voices · clone',
hint: '',
integrated: true,
defaultBaseUrl: 'https://api.senseaudio.cn',
docsUrl: 'https://docs.senseaudio.cn',
@ -344,6 +344,29 @@ export const IMAGE_MODELS: MediaModel[] = [
caps: ['i2i'],
},
// SenseAudio — synchronous /v1/image/sync, Bearer auth, reference URL or data URI.
{
id: 'senseaudio-image-2.0-260319',
label: 'senseaudio-image-2.0',
hint: 'SenseAudio · multi-aspect, latest',
provider: 'senseaudio',
caps: ['t2i', 'i2i'],
},
{
id: 'senseaudio-image-1.0-260319',
label: 'senseaudio-image-1.0',
hint: 'SenseAudio · standard',
provider: 'senseaudio',
caps: ['t2i', 'i2i'],
},
{
id: 'doubao-seedream-5-0-260128',
label: 'seedream-5.0',
hint: 'SenseAudio · ByteDance Seedream 5.0 hi-res',
provider: 'senseaudio',
caps: ['t2i', 'i2i'],
},
// xAI Grok Imagine — text-to-image (1k/2k, 11+ aspect ratios).
{
id: 'grok-imagine-image',

@ -11,10 +11,12 @@ import Anthropic from '@anthropic-ai/sdk';
import { effectiveMaxTokens } from '../state/maxTokens';
import type { AppConfig, ChatMessage } from '../types';
import { streamMessageAnthropicProxy } from './anthropic-compatible';
import type { ProxyContext } from './api-proxy';
import { streamMessageAzure } from './azure-compatible';
import { streamMessageGoogle } from './google-compatible';
import { streamMessageOllama } from './ollama-compatible';
import { isOpenAICompatible, streamMessageOpenAI } from './openai-compatible';
import { streamMessageSenseAudio } from './senseaudio-compatible';
// Re-export for convenience
export { isOpenAICompatible } from './openai-compatible';
@ -39,6 +41,12 @@ export async function streamMessage(
history: ChatMessage[],
signal: AbortSignal,
handlers: StreamHandlers,
// Only the senseaudio branch reads `context.projectId` today (so the
// daemon-side `generate_image` tool can write into the active
// project's folder). Other branches accept and ignore — keeping the
// signature uniform means the single call site in ProjectView passes
// the same shape regardless of protocol.
context?: ProxyContext,
): Promise<void> {
// Prefer the explicit Settings protocol; keep the legacy heuristic as a
// fallback for configs saved before apiProtocol existed.
@ -51,6 +59,9 @@ export async function streamMessage(
if (cfg.apiProtocol === 'google') {
return streamMessageGoogle(cfg, system, history, signal, handlers);
}
if (cfg.apiProtocol === 'senseaudio') {
return streamMessageSenseAudio(cfg, system, history, signal, handlers, context);
}
if (cfg.apiProtocol === 'openai' || (!cfg.apiProtocol && isOpenAICompatible(cfg.model, cfg.baseUrl))) {
return streamMessageOpenAI(cfg, system, history, signal, handlers);
}
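The dispatch order above means an explicit `cfg.apiProtocol` always wins, and the legacy heuristic only runs for configs saved before `apiProtocol` existed; `inferApiProtocol` in the config state additionally maps a `senseaudio.cn` host to `'senseaudio'`. A minimal sketch of that precedence, assuming a caller-supplied stand-in for `isOpenAICompatible` (its real logic is not part of this diff). `resolveProtocol`, `DispatchConfig`, and `guess` are hypothetical; matching on the parsed hostname is this sketch's choice, whereas the diff uses a substring check (`normalized.includes(...)`).

```typescript
type ApiProtocol = 'anthropic' | 'openai' | 'azure' | 'google' | 'ollama' | 'senseaudio';

// Hypothetical slice of AppConfig: only the fields dispatch looks at.
interface DispatchConfig {
  apiProtocol?: ApiProtocol;
  baseUrl: string;
  model: string;
}

// Stand-in for the legacy heuristic (isOpenAICompatible in the real code).
type LegacyHeuristic = (model: string, baseUrl: string) => ApiProtocol;

function resolveProtocol(cfg: DispatchConfig, legacy: LegacyHeuristic): ApiProtocol {
  // 1. An explicit Settings protocol always wins.
  if (cfg.apiProtocol) return cfg.apiProtocol;

  // 2. Host-based inference for configs saved before apiProtocol existed.
  let host = '';
  try {
    host = new URL(cfg.baseUrl).hostname.toLowerCase();
  } catch {
    // Empty or malformed base URL: fall through to the heuristic.
  }
  if (host === 'ollama.com' || host.endsWith('.ollama.com')) return 'ollama';
  if (host === 'senseaudio.cn' || host.endsWith('.senseaudio.cn')) return 'senseaudio';

  // 3. Last resort: the legacy model/URL heuristic.
  return legacy(cfg.model, cfg.baseUrl);
}

// Hypothetical heuristic, for illustration only.
const guess: LegacyHeuristic = (model) => (model.startsWith('claude') ? 'anthropic' : 'openai');

console.log(resolveProtocol({ baseUrl: 'https://api.senseaudio.cn', model: 'senseaudio-s2' }, guess)); // logs "senseaudio"
```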

@ -3,6 +3,22 @@ import type { AppConfig, ChatMessage } from '../types';
import type { StreamHandlers } from './anthropic';
import { parseSseFrame } from './sse';
/**
* Optional per-request context that some protocols thread into the
* proxy body. Today only the senseaudio proxy reads these fields:
* - `projectId` lets the `generate_image` tool write into the active
* project's folder instead of a daemon-global cache.
* - `byokImageModel` is the user's BYOK Settings default for the
* image tool. The LLM can still override per-call via the tool's
* `model` arg; this is just the fallback when it omits one.
* Other protocols ignore unknown body fields, so callers are free to
* pass this for every protocol.
*/
export interface ProxyContext {
projectId?: string;
byokImageModel?: string;
}
export async function streamProxyEndpoint(
endpoint: string,
cfg: AppConfig,
@ -10,6 +26,7 @@ export async function streamProxyEndpoint(
history: ChatMessage[],
signal: AbortSignal,
handlers: StreamHandlers,
context?: ProxyContext,
): Promise<void> {
if (!cfg.apiKey) {
handlers.onError(new Error('Missing API key — open Settings and paste one in.'));
@ -30,6 +47,10 @@ export async function streamProxyEndpoint(
messages: history.map((m) => ({ role: m.role, content: m.content })),
maxTokens: effectiveMaxTokens(cfg),
apiVersion: cfg.apiVersion,
...(context?.projectId ? { projectId: context.projectId } : {}),
...(context?.byokImageModel
? { byokImageModel: context.byokImageModel }
: {}),
}),
signal,
});
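The two conditional spreads add `projectId` and `byokImageModel` only when they are truthy, so an empty Settings value never reaches the daemon as `""` (`JSON.stringify` would already omit keys whose value is `undefined`). A minimal sketch of the same pattern; `buildProxyBody` and `ProxyBodySketch` are hypothetical and cover only `messages` plus the two context fields.

```typescript
interface ProxyContext {
  projectId?: string;
  byokImageModel?: string;
}

interface ChatTurn {
  role: string;
  content: string;
}

// Hypothetical subset of the proxy request body.
interface ProxyBodySketch {
  messages: ChatTurn[];
  projectId?: string;
  byokImageModel?: string;
}

function buildProxyBody(messages: ChatTurn[], context?: ProxyContext): ProxyBodySketch {
  return {
    messages,
    // Spreading `{}` when the value is missing or '' leaves the key out
    // entirely, so the daemon can treat "absent" as "use its own default".
    ...(context?.projectId ? { projectId: context.projectId } : {}),
    ...(context?.byokImageModel ? { byokImageModel: context.byokImageModel } : {}),
  };
}

const body = buildProxyBody([{ role: 'user', content: 'draw a cat' }], { projectId: 'demo' });
console.log(JSON.stringify(body)); // no byokImageModel key in the output
```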

@ -0,0 +1,33 @@
/**
* SenseAudio chat completions provider. Wire-compatible with OpenAI
* (POST /v1/chat/completions, Bearer auth, SSE delta frames + [DONE]),
* so the only thing that differs from streamMessageOpenAI is the
* daemon proxy endpoint. Keeping a dedicated client makes the picker
* tab → daemon log line → upstream call chain readable end-to-end and
* leaves room for SenseAudio-specific divergence in the future.
*
* Routes through the daemon proxy to avoid browser CORS issues.
* BYOK: the key stays on the user's machine.
*/
import type { AppConfig, ChatMessage } from '../types';
import type { StreamHandlers } from './anthropic';
import { streamProxyEndpoint, type ProxyContext } from './api-proxy';
export async function streamMessageSenseAudio(
cfg: AppConfig,
system: string,
history: ChatMessage[],
signal: AbortSignal,
handlers: StreamHandlers,
context?: ProxyContext,
): Promise<void> {
return streamProxyEndpoint(
'/api/proxy/senseaudio/stream',
cfg,
system,
history,
signal,
handlers,
context,
);
}
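"Wire-compatible with OpenAI" describes the upstream stream: `data:` lines carrying JSON chunks whose text sits in `choices[0].delta.content`, terminated by `data: [DONE]`. A minimal sketch of consuming that format, assuming `\n`-separated lines and one JSON object per `data:` line. `collectDeltas` is hypothetical: the client above delegates framing to `streamProxyEndpoint` and `parseSseFrame`, and this sketch makes no claim about how the daemon re-frames events for the browser.

```typescript
// Minimal shape of an OpenAI-style streaming chunk (only the field we read).
interface ChatChunk {
  choices?: { delta?: { content?: string } }[];
}

// Concatenates delta text from a buffered SSE body; `done` reports
// whether the [DONE] sentinel was seen.
function collectDeltas(sseText: string): { text: string; done: boolean } {
  let text = '';
  // SSE events are separated by a blank line.
  for (const frame of sseText.split('\n\n')) {
    for (const line of frame.split('\n')) {
      if (!line.startsWith('data:')) continue; // skip comments / event: lines
      const payload = line.slice('data:'.length).trim();
      if (payload === '[DONE]') return { text, done: true };
      if (payload === '') continue;
      const chunk = JSON.parse(payload) as ChatChunk;
      text += chunk.choices?.[0]?.delta?.content ?? '';
    }
  }
  return { text, done: false };
}

const sample =
  'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\n' +
  'data: {"choices":[{"delta":{"content":"lo"}}]}\n\n' +
  'data: [DONE]\n\n';
console.log(collectDeltas(sample)); // text "Hello", done true
```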

@ -262,6 +262,24 @@ function renderBlock(block: Block, key: number): ReactNode {
return null;
}
// Allowed schemes / forms for image `src` attributes. The BYOK chat
// tool loop emits relative URLs like `/api/byok-image/<id>.png` which
// the web's Next.js rewrites proxy to the daemon — that's the common
// case. data: + blob: cover inline / generated images. http(s):// is
// allowed so a model can reference public images. Anything else
// (javascript:, file:, vbscript:, …) is rejected so a hallucinated
// or adversarial URL cannot exfiltrate or execute.
function isSafeMarkdownImageSrc(src: string): boolean {
if (!src) return false;
if (src.startsWith('/') && !src.startsWith('//')) return true;
return (
src.startsWith('http://')
|| src.startsWith('https://')
|| src.startsWith('data:image/')
|| src.startsWith('blob:')
);
}
// Inline pass: tokenize into runs of `code`, **bold**, *italic*, links,
// and plain text. We walk the string with a regex that matches whichever
// delimiter shows up next; everything between delimiters becomes a text
@ -270,14 +288,19 @@ function renderInline(text: string): ReactNode {
const out: ReactNode[] = [];
// Order matters:
// 1. inline code first so its contents are not re-tokenized as bold/italic.
// 2. explicit `[text](url)` markdown links before bare URL autolink so the
// 2. image syntax `![alt](url)` BEFORE the link branch. Both share
// `[…](…)` and the image is only distinguished by the leading `!`;
// letting the link branch win would render `[alt](url)` as a text
// link with `!` stranded as a sibling text node and the user would
// see the link copy but never the image.
// 3. explicit `[text](url)` markdown links before bare URL autolink so the
// autolink does not greedily swallow the closing paren.
// 3. bare http(s) URL autolink BEFORE italic markers — chat output often
// 4. bare http(s) URL autolink BEFORE italic markers — chat output often
// contains OAuth-style links with `_type=` / `_id=` query params, and
// leaving italic to win turns the URL into an italic-fragmented mess.
// 4. bold (**a** / __a__) before italic (*a* / _a_).
// 5. bold (**a** / __a__) before italic (*a* / _a_).
const re =
/(`[^`]+`)|\[([^\]]+)\]\(([^)\s]+)\)|(https?:\/\/[^\s)<>]+)|(\*\*[^*]+\*\*)|(__[^_]+__)|(\*[^*\n]+\*)|(_[^_\n]+_)/g;
/(`[^`]+`)|!\[([^\]]*)\]\(([^)\s]+)\)|\[([^\]]+)\]\(([^)\s]+)\)|(https?:\/\/[^\s)<>]+)|(\*\*[^*]+\*\*)|(__[^_]+__)|(\*[^*\n]+\*)|(_[^_\n]+_)/g;
let lastIndex = 0;
let m: RegExpExecArray | null;
let key = 0;
@ -291,40 +314,61 @@ function renderInline(text: string): ReactNode {
{m[1].slice(1, -1)}
</code>,
);
} else if (m[2] && m[3]) {
} else if (m[3] !== undefined) {
// Image: m[2] = alt (may be empty), m[3] = src
const src = m[3];
const alt = m[2] || '';
if (isSafeMarkdownImageSrc(src)) {
out.push(
<img
key={key++}
className="md-image"
src={src}
alt={alt}
loading="lazy"
referrerPolicy="no-referrer"
style={{ maxWidth: '100%', height: 'auto', borderRadius: 6 }}
/>,
);
} else {
// Unsafe scheme — drop the image tag but keep the alt text so
// the user sees what the model meant to show.
pushText(out, alt, key++);
}
} else if (m[4] && m[5]) {
out.push(
<a
key={key++}
className="md-link"
href={m[3]}
target="_blank"
rel="noreferrer noopener"
>
{m[2]}
</a>,
);
} else if (m[4]) {
// Bare URL — autolink with the URL as both href and visible text,
// matching the Markdown `<https://…>` autolink convention.
out.push(
<a
key={key++}
className="md-link md-link-bare"
href={m[4]}
href={m[5]}
target="_blank"
rel="noreferrer noopener"
>
{m[4]}
</a>,
);
} else if (m[5]) {
out.push(<strong key={key++}>{m[5].slice(2, -2)}</strong>);
} else if (m[6]) {
out.push(<strong key={key++}>{m[6].slice(2, -2)}</strong>);
// Bare URL — autolink with the URL as both href and visible text,
// matching the Markdown `<https://…>` autolink convention.
out.push(
<a
key={key++}
className="md-link md-link-bare"
href={m[6]}
target="_blank"
rel="noreferrer noopener"
>
{m[6]}
</a>,
);
} else if (m[7]) {
out.push(<em key={key++}>{m[7].slice(1, -1)}</em>);
out.push(<strong key={key++}>{m[7].slice(2, -2)}</strong>);
} else if (m[8]) {
out.push(<em key={key++}>{m[8].slice(1, -1)}</em>);
out.push(<strong key={key++}>{m[8].slice(2, -2)}</strong>);
} else if (m[9]) {
out.push(<em key={key++}>{m[9].slice(1, -1)}</em>);
} else if (m[10]) {
out.push(<em key={key++}>{m[10].slice(1, -1)}</em>);
}
lastIndex = re.lastIndex;
}
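Inserting the image alternative shifts every later capture group by two, which is why the interleaved old and new branches above test `m[4] && m[5]` for links and use `m[6]` for bare URLs. A minimal sketch that reuses the new regex and the allowlist verbatim but only classifies tokens instead of building React nodes; `classifyInline` and the `Kind` labels are hypothetical.

```typescript
// Groups: 1 code | 2,3 image alt/src | 4,5 link text/href | 6 bare URL
//         7,8 bold (** / __) | 9,10 italic (* / _)
const INLINE_RE =
  /(`[^`]+`)|!\[([^\]]*)\]\(([^)\s]+)\)|\[([^\]]+)\]\(([^)\s]+)\)|(https?:\/\/[^\s)<>]+)|(\*\*[^*]+\*\*)|(__[^_]+__)|(\*[^*\n]+\*)|(_[^_\n]+_)/g;

type Kind = 'code' | 'image' | 'alt-text' | 'link' | 'url' | 'bold' | 'italic';

function isSafeMarkdownImageSrc(src: string): boolean {
  if (!src) return false;
  if (src.startsWith('/') && !src.startsWith('//')) return true;
  return (
    src.startsWith('http://') ||
    src.startsWith('https://') ||
    src.startsWith('data:image/') ||
    src.startsWith('blob:')
  );
}

function classifyInline(text: string): Kind[] {
  const kinds: Kind[] = [];
  // Fresh RegExp per call so a shared /g lastIndex never leaks between calls.
  const re = new RegExp(INLINE_RE.source, 'g');
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    if (m[1]) kinds.push('code');
    // Alt may be '' (still a participating group), so test src, not alt.
    else if (m[3] !== undefined) kinds.push(isSafeMarkdownImageSrc(m[3]) ? 'image' : 'alt-text');
    else if (m[4] && m[5]) kinds.push('link');
    else if (m[6]) kinds.push('url');
    else if (m[7] || m[8]) kinds.push('bold');
    else if (m[9] || m[10]) kinds.push('italic');
  }
  return kinds;
}

console.log(classifyInline('`c` ![a](/p.png) [t](https://x.y) https://e.com/q **b** *i*'));
```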

@ -65,6 +65,22 @@ export const SUGGESTED_MODELS_BY_PROTOCOL: Record<ApiProtocol, readonly string[]
'gemini-1.5-pro',
'gemini-1.5-flash',
],
senseaudio: [
// SenseAudio is an OpenAI-compatible gateway that fronts both its own
// models (senseaudio-s2 family) and aggregator routes to deepseek /
// glm / kimi / minimax. Listing the headline house models first keeps
// the picker's default selection on a SenseAudio-native checkpoint;
// the aggregator IDs trail so users who arrived for a specific
// upstream still find it in this tab without retyping it.
'senseaudio-s2',
'senseaudio-s2-flash',
'deepseek-v4-flash',
'deepseek-v4-pro',
'glm-5.1',
'kimi-k2.6',
'MiniMax-M2.7-highspeed',
'MiniMax-M2.7',
],
ollama: [
'cogito-2.1:671b',
'deepseek-v3.1:671b',
@ -123,6 +139,7 @@ export const FAST_MODEL_BY_PROTOCOL: Record<ApiProtocol, string> = {
// pick produces a deterministic answer; users who care can override
// through the Memory model picker.
ollama: 'gemma3:4b',
senseaudio: 'senseaudio-s2-flash',
};
export const API_PROTOCOL_TABS: ReadonlyArray<{
@ -134,6 +151,7 @@ export const API_PROTOCOL_TABS: ReadonlyArray<{
{ id: 'azure', title: 'Azure OpenAI' },
{ id: 'google', title: 'Google Gemini' },
{ id: 'ollama', title: 'Ollama Cloud' },
{ id: 'senseaudio', title: 'SenseAudio' },
];
export const API_PROTOCOL_LABELS: Record<ApiProtocol, string> = {
@ -142,6 +160,7 @@ export const API_PROTOCOL_LABELS: Record<ApiProtocol, string> = {
azure: 'Azure OpenAI',
google: 'Google Gemini',
ollama: 'Ollama Cloud API',
senseaudio: 'SenseAudio API',
};
export const API_KEY_PLACEHOLDERS: Record<ApiProtocol, string> = {
@ -150,6 +169,7 @@ export const API_KEY_PLACEHOLDERS: Record<ApiProtocol, string> = {
azure: 'azure key',
google: 'AIza...',
ollama: 'Ollama API key',
senseaudio: 'SenseAudio API key',
};
// Default base URL the daemon assumes when the user leaves the field
@ -161,4 +181,5 @@ export const DEFAULT_BASE_URL_BY_PROTOCOL: Record<ApiProtocol, string> = {
azure: '',
google: 'https://generativelanguage.googleapis.com',
ollama: 'https://ollama.com',
senseaudio: 'https://api.senseaudio.cn',
};
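Because every table here is typed `Record<ApiProtocol, string>`, widening `ApiProtocol` with `'senseaudio'` is a compile error until each table gains an entry, which is why this diff touches so many small maps. A minimal sketch of how a default base URL might be applied when the user leaves the field blank, restricted to the four protocols visible in this hunk. `effectiveBaseUrl`, trimming trailing slashes, and returning `null` for Azure's empty default are assumptions, not the daemon's documented behavior.

```typescript
// Hypothetical subset: only the protocols whose defaults appear in this hunk.
type SketchProtocol = 'azure' | 'google' | 'ollama' | 'senseaudio';

const DEFAULT_BASE_URL_SKETCH: Record<SketchProtocol, string> = {
  azure: '', // no shared endpoint; the user must supply their resource URL
  google: 'https://generativelanguage.googleapis.com',
  ollama: 'https://ollama.com',
  senseaudio: 'https://api.senseaudio.cn',
};

// Returns the user's value if non-blank, otherwise the protocol default;
// null means there is no usable URL and the user must enter one.
function effectiveBaseUrl(protocol: SketchProtocol, userValue: string | undefined): string | null {
  const chosen = (userValue ?? '').trim() || DEFAULT_BASE_URL_SKETCH[protocol];
  // Assumption: strip trailing slashes so `${base}/v1/...` joins cleanly.
  const normalized = chosen.replace(/\/+$/, '');
  return normalized === '' ? null : normalized;
}

console.log(effectiveBaseUrl('senseaudio', '')); // logs "https://api.senseaudio.cn"
```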

@ -249,6 +249,22 @@ export const KNOWN_PROVIDERS: KnownProvider[] = [
model: 'mimo-v2.5-pro',
models: ['mimo-v2.5-pro'],
},
{
label: 'SenseAudio',
protocol: 'senseaudio',
baseUrl: 'https://api.senseaudio.cn',
model: 'senseaudio-s2',
models: [
'senseaudio-s2',
'senseaudio-s2-flash',
'deepseek-v4-flash',
'deepseek-v4-pro',
'glm-5.1',
'kimi-k2.6',
'MiniMax-M2.7-highspeed',
'MiniMax-M2.7',
],
},
];
function normalizePet(input: Partial<PetConfig> | undefined): PetConfig {
@ -290,6 +306,10 @@ function inferApiProtocol(model: string, baseUrl: string): ApiProtocol {
// protocol so both chat and the connection test hit the native Ollama
// proxy instead of the Anthropic or OpenAI paths.
if (normalized.includes('ollama.com')) return 'ollama';
// SenseAudio host gets routed to its own proxy so the daemon log line
// and the BYOK tab UI stay consistent with the protocol the user
// picked — even though the on-wire shape is OpenAI-compatible.
if (normalized.includes('senseaudio.cn')) return 'senseaudio';
return isOpenAICompatible(model, baseUrl) ? 'openai' : 'anthropic';
} catch {
// Preserve the rest of the user's settings even if an old saved base URL is

@ -91,7 +91,7 @@ export type {
} from '@open-design/contracts';
export type ExecMode = 'daemon' | 'api';
export type ApiProtocol = 'anthropic' | 'openai' | 'azure' | 'google' | 'ollama';
export type ApiProtocol = 'anthropic' | 'openai' | 'azure' | 'google' | 'ollama' | 'senseaudio';
export type LiveArtifactTabId = `live:${string}`;
export type ProjectWorkspaceTabId = string | LiveArtifactTabId;
@ -180,6 +180,13 @@ export interface ApiProtocolConfig {
model: string;
apiVersion?: string;
apiProviderBaseUrl?: string | null;
/** SenseAudio BYOK only: the default image model the daemon-side
* `generate_image` tool uses when the LLM doesn't pass one. Carries
* one of the SenseAudio image model ids (`senseaudio-image-2.0-260319`,
* `senseaudio-image-1.0-260319`, `doubao-seedream-5-0-260128`). Stored
* per-protocol so flipping between BYOK tabs doesn't reset the
* SenseAudio image-model choice. */
byokImageModel?: string;
}
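Per the `ProxyContext` comment, the LLM's per-call `model` argument to `generate_image` takes precedence and `byokImageModel` is only the fallback. A minimal sketch of that precedence over the three SenseAudio image ids listed above; `resolveImageModel` is hypothetical, and both the final fallback to the first id and silently ignoring unknown ids (rather than rejecting the call) are assumptions.

```typescript
const SENSEAUDIO_IMAGE_MODELS = [
  'senseaudio-image-2.0-260319',
  'senseaudio-image-1.0-260319',
  'doubao-seedream-5-0-260128',
] as const;

type SenseAudioImageModel = (typeof SENSEAUDIO_IMAGE_MODELS)[number];

function isSenseAudioImageModel(id: unknown): id is SenseAudioImageModel {
  return typeof id === 'string' && (SENSEAUDIO_IMAGE_MODELS as readonly string[]).includes(id);
}

// Precedence: tool-call `model` arg → Settings default → first listed id (assumed).
function resolveImageModel(toolArg: unknown, settingsDefault: string | undefined): SenseAudioImageModel {
  if (isSenseAudioImageModel(toolArg)) return toolArg;
  if (isSenseAudioImageModel(settingsDefault)) return settingsDefault;
  return SENSEAUDIO_IMAGE_MODELS[0];
}

console.log(resolveImageModel(undefined, 'doubao-seedream-5-0-260128')); // Settings default applies
```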
// Per-CLI model + reasoning the user picked in the model menu. Each agent
@ -294,6 +301,11 @@ export interface AppConfig {
model: string;
apiProtocol?: ApiProtocol;
apiVersion?: string;
/** SenseAudio BYOK only: the default image model for the daemon-side
* generate_image tool. Mirrors apiProtocolConfigs.senseaudio.byokImageModel
* so the active protocol's value lives at the top level (consistent
* with how apiKey / baseUrl / model are projected onto AppConfig). */
byokImageModel?: string;
apiProtocolConfigs?: Partial<Record<ApiProtocol, ApiProtocolConfig>>;
/** Internal config schema/migration version for localStorage upgrades. */
configMigrationVersion?: number;

@ -6,6 +6,7 @@ const API_PROTOCOL_LABELS: Record<ApiProtocol, string> = {
azure: 'Azure OpenAI',
google: 'Google Gemini',
ollama: 'Ollama Cloud API',
senseaudio: 'SenseAudio API',
};
const API_PROTOCOL_AGENT_IDS: Record<ApiProtocol, string> = {
@ -14,6 +15,7 @@ const API_PROTOCOL_AGENT_IDS: Record<ApiProtocol, string> = {
azure: 'azure-openai-api',
google: 'google-gemini-api',
ollama: 'ollama-cloud-api',
senseaudio: 'senseaudio-api',
};
export function apiProtocolLabel(protocol: ApiProtocol | undefined): string {

@ -105,4 +105,67 @@ describe('renderMarkdown', () => {
const bodyTd = (out.match(/<tbody>[\s\S]*<\/tbody>/)?.[0] ?? '').match(/<td/g) ?? [];
expect(bodyTd.length).toBe(2);
});
it('renders ![alt](url) as <img> for relative BYOK image URLs', () => {
const out = html('Here is your cat: ![cute kitten](/api/byok-image/abc-123.png)');
expect(out).toContain('<img');
expect(out).toContain('class="md-image"');
expect(out).toContain('src="/api/byok-image/abc-123.png"');
expect(out).toContain('alt="cute kitten"');
expect(out).toContain('loading="lazy"');
expect(out).toContain('referrerPolicy="no-referrer"');
// Image syntax must NOT be turned into an <a> link — `[alt](url)`
// with a leading `!` is image, not link.
expect(out).not.toContain('<a class="md-link"');
});
it('renders ![](url) with empty alt text', () => {
const out = html('![](/api/byok-image/abc.png)');
expect(out).toContain('<img');
expect(out).toContain('alt=""');
});
it('renders https image URLs', () => {
const out = html('![logo](https://example.com/logo.png)');
expect(out).toContain('<img');
expect(out).toContain('src="https://example.com/logo.png"');
});
it('renders data: image URIs', () => {
const out = html('![inline](data:image/png;base64,iVBORw0KGgo=)');
expect(out).toContain('<img');
expect(out).toContain('src="data:image/png;base64,iVBORw0KGgo="');
});
it('drops image tags with unsafe schemes and keeps alt text as plain text', () => {
const out = html('![hacked](javascript:alert(1))');
expect(out).not.toContain('<img');
expect(out).not.toContain('javascript:');
expect(out).toContain('hacked');
});
it('rejects protocol-relative image URLs (could load cross-origin)', () => {
// `//evil.com/track.png` would inherit the page protocol; not in our
// allowlist. Should fall through to alt-as-text.
const out = html('![track](//evil.com/track.png)');
expect(out).not.toContain('<img');
expect(out).toContain('track');
});
it('keeps regular [text](url) links working alongside image syntax', () => {
const out = html('Click [here](https://example.com) and look ![image](/api/byok-image/a.png)');
expect(out).toContain('<a class="md-link"');
expect(out).toContain('href="https://example.com"');
expect(out).toContain('>here</a>');
expect(out).toContain('<img');
expect(out).toContain('src="/api/byok-image/a.png"');
});
it('preserves bold + italic + code after the image regex addition', () => {
const out = html('**b** and *i* and `c` and ![a](/p.png)');
expect(out).toContain('<strong>b</strong>');
expect(out).toContain('<em>i</em>');
expect(out).toContain('<code class="md-inline-code">c</code>');
expect(out).toContain('<img');
});
});

@ -229,7 +229,7 @@ export interface SettingsClickByokProviderOptionProps {
// Tracking doc names azure/google/ollama as azure_openai/google_gemini/
// ollama_cloud — we forward the code value verbatim and let dashboards
// map; see tracking-doc-issues.md §2.5.
provider_id: 'anthropic' | 'openai' | 'azure' | 'ollama' | 'google';
provider_id: 'anthropic' | 'openai' | 'azure' | 'ollama' | 'google' | 'senseaudio';
// True when the clicked chip was already the active protocol (no-op
// toggle); false when the click switches protocol.
is_selected: boolean;
@ -242,10 +242,10 @@ export interface SettingsClickByokFieldProps {
action: 'focus_byok_field';
field_id: 'api_key' | 'base_url' | 'model';
// Code's `apiProtocol` is wider than the CSV's BYOK provider enum
// (anthropic|openai|azure|ollama|google). We forward the code value
// verbatim so dashboards can group by the actual protocol; the CSV enum
// is a strict subset the product team can revise.
provider_id: 'anthropic' | 'openai' | 'azure' | 'ollama' | 'google';
// (anthropic|openai|azure|ollama|google|senseaudio). We forward the code
// value verbatim so dashboards can group by the actual protocol; the CSV
// enum is a strict subset the product team can revise.
provider_id: 'anthropic' | 'openai' | 'azure' | 'ollama' | 'google' | 'senseaudio';
has_value: boolean;
}
@ -261,7 +261,7 @@ export interface SettingsCliTestResultProps {
export interface SettingsByokTestResultProps {
page: 'settings';
area: 'execution_model';
provider_id: 'anthropic' | 'openai' | 'azure' | 'ollama' | 'google';
provider_id: 'anthropic' | 'openai' | 'azure' | 'ollama' | 'google' | 'senseaudio';
result: 'success' | 'failed' | 'timeout';
error_code?: string;
duration_ms: number;

@ -139,7 +139,7 @@ export type ConnectionTestKind =
| 'agent_spawn_failed'
| 'unknown';
export type ConnectionTestProtocol = 'anthropic' | 'openai' | 'azure' | 'google' | 'ollama';
export type ConnectionTestProtocol = 'anthropic' | 'openai' | 'azure' | 'google' | 'ollama' | 'senseaudio';
export interface ProviderTestRequest {
protocol: ConnectionTestProtocol;

@ -80,16 +80,19 @@ export interface MemoryListResponse {
/** Provider/protocol the memory extractor calls. Mirrors the chat
* BYOK form's protocols: anthropic + openai-compatible + azure
* (openai-compatible at a different URL/header) + google gemini +
* ollama (also openai-compatible, just hosted on Ollama Cloud) so
* the memory picker can offer the same options as the chat picker
* above it. The daemon routes ollama through the same callOpenAI
* path since the wire protocol is identical. */
* ollama (also openai-compatible, just hosted on Ollama Cloud) +
* senseaudio (also openai-compatible, SenseAudio's OpenAI-shaped
* /v1/chat/completions gateway) so the memory picker can offer the
* same options as the chat picker above it. The daemon routes both
* ollama and senseaudio through the same callOpenAI path since the
* wire protocol is identical. */
export type MemoryExtractionProvider =
| 'anthropic'
| 'openai'
| 'azure'
| 'google'
| 'ollama';
| 'ollama'
| 'senseaudio';
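The comment above says `ollama` and `senseaudio` share the `callOpenAI` path with `openai`. An exhaustive `switch` with a `never` check is one way to make that grouping explicit and to get a compile error the next time the union widens. A minimal sketch; `extractorPath` and its return labels are hypothetical, and only the OpenAI-wire grouping comes from the source.

```typescript
type MemoryExtractionProvider = 'anthropic' | 'openai' | 'azure' | 'google' | 'ollama' | 'senseaudio';

// Hypothetical labels for the daemon's call paths.
type ExtractorPath = 'anthropic' | 'openai-wire' | 'azure' | 'google';

function extractorPath(provider: MemoryExtractionProvider): ExtractorPath {
  switch (provider) {
    case 'anthropic':
      return 'anthropic';
    // Same /v1/chat/completions wire shape, so one code path serves all three.
    case 'openai':
    case 'ollama':
    case 'senseaudio':
      return 'openai-wire';
    case 'azure':
      return 'azure';
    case 'google':
      return 'google';
    default: {
      // If the union gains a member with no case above, `provider` is no
      // longer `never` here and this assignment fails to compile.
      const unhandled: never = provider;
      throw new Error(`Unhandled memory provider: ${String(unhandled)}`);
    }
  }
}

console.log(extractorPath('senseaudio')); // logs "openai-wire"
```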
/** Masked version of MemoryExtractionConfig returned by GET endpoints:
* the api key field is replaced with a 4-char tail so the settings UI