Mirror of https://github.com/nexu-io/open-design.git, synced 2026-06-01 03:14:35 +07:00
feat(senseaudio): BYOK chat with image + video generation tools (#2065)
* feat(senseaudio): BYOK chat with image + video generation tools
Adds SenseAudio as a first-class BYOK chat protocol and wires the daemon's
chat proxy with a tool loop so BYOK users can generate images and videos
without dropping to a CLI agent.
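
The tool loop mentioned above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the daemon's actual code: `runToolLoop`, `complete`, `executeTool`, and the `maxRounds` cap are hypothetical names, and the message shapes only approximate the OpenAI-compatible wire format.

```typescript
// Hypothetical, simplified message and tool-call shapes (OpenAI-style).
interface ToolCall {
  id: string;
  name: string;
  arguments: string; // JSON-encoded arguments chosen by the model
}

interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
  tool_calls?: ToolCall[];
  tool_call_id?: string;
}

type Complete = (messages: ChatMessage[]) => Promise<ChatMessage>;
type ExecuteTool = (call: ToolCall) => Promise<unknown>;

// Re-issue the completion until the model answers without tool calls,
// or until maxRounds is reached (an assumed safety cap).
async function runToolLoop(
  messages: ChatMessage[],
  complete: Complete,
  executeTool: ExecuteTool,
  maxRounds = 4,
): Promise<ChatMessage[]> {
  const history = [...messages];
  for (let round = 0; round < maxRounds; round++) {
    const reply = await complete(history);
    history.push(reply);
    const calls = reply.tool_calls ?? [];
    if (calls.length === 0) return history; // plain answer: the loop is done
    for (const call of calls) {
      // Each result goes back to the model as a `role: 'tool'` message.
      const result = await executeTool(call);
      history.push({
        role: 'tool',
        tool_call_id: call.id,
        content: JSON.stringify(result),
      });
    }
  }
  return history;
}
```

Injecting `complete` and `executeTool` keeps the loop free of network code, so it can be exercised with stubs.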
- BYOK protocol: new senseaudio tab + /api/proxy/senseaudio/stream route +
connection-test + provider-models discovery (OpenAI-compatible wire)
- Tool loop: generate_image (synchronous /v1/image/sync) and generate_video
(async /v1/video/create + 5s polling /v1/video/status, 10-min ceiling,
periodic progress log every 30s)
- Settings dropdown + chat-composer dropdown for the BYOK image model
default; generate_image's model enum lets the LLM override per call
- Seed-on-success: a successful BYOK chat call idempotently mirrors the
key into media-config (preserves env-resolved + already-stored keys)
- Generated artifacts land in <projectsRoot>/<projectId>/ so FileViewer,
DesignFilesPanel, and project export pick them up automatically;
legacy /api/byok-image/:id route kept for old conversation links
- Markdown renderer learns `![alt](url)` image syntax with a scheme
allowlist (http(s) / data:image/ / blob: / relative paths)
- i18n key settings.byokImageModel across all 19 locales
- 3 SenseAudio image models registered (2.0, 1.0, doubao-seedream-5.0);
1 video model (doubao-seedance-2.0)
- Tests: byok-tools (29), media-senseaudio-image (8), media-config seed
(7), proxy-routes (47), markdown image rendering (8)
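
The asynchronous video flow in the list above (poll every 5 s, stop after 10 minutes, log progress every 30 s) could look like the sketch below. `pollUntilDone`, `checkStatus`, and the `JobStatus` states are hypothetical; only the interval, the attempt ceiling, and the log cadence come from this commit.

```typescript
// Hypothetical job states; the real /v1/video/status payload is not shown here.
type JobStatus =
  | { state: 'pending' }
  | { state: 'done'; videoUrl: string }
  | { state: 'failed'; reason: string };

interface PollOptions {
  intervalMs?: number; // 5 s between polls
  maxPolls?: number; // 120 polls x 5 s = 10-minute ceiling
  logEvery?: number; // 6 polls x 5 s = one progress line per ~30 s
  log?: (line: string) => void;
}

type PollResult = { ok: true; videoUrl: string } | { ok: false; error: string };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function pollUntilDone(
  checkStatus: () => Promise<JobStatus>,
  { intervalMs = 5000, maxPolls = 120, logEvery = 6, log = console.log }: PollOptions = {},
): Promise<PollResult> {
  for (let attempt = 1; attempt <= maxPolls; attempt++) {
    const status = await checkStatus();
    if (status.state === 'done') return { ok: true, videoUrl: status.videoUrl };
    if (status.state === 'failed') return { ok: false, error: status.reason };
    if (attempt % logEvery === 0) log(`video job still pending after ${attempt} polls`);
    if (attempt < maxPolls) await sleep(intervalMs); // no sleep after the last poll
  }
  // Report the timeout as data so the caller can hand it back to the model.
  return { ok: false, error: `timed out after ${maxPolls} polls` };
}
```

Making the interval an option lets tests pass a tiny value such as 1 ms without changing the polling semantics.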
* fix(senseaudio): unblock image gen + design file preview switching
- SenseAudio /v1/image/sync rejected the previous size mapping with
`参数错误:size` ("parameter error: size"; 1664x936, 936x1664, 1280x960, 960x1280 are not in
the gateway's accepted set). Switched to standard HD / SD sizes that
every aspect bucket can hit: 1024×1024, 1280×720, 720×1280,
1024×768, 768×1024. Kept the byok-tools and media.ts tables in sync
so the BYOK chat tool and the CLI agent path both stop failing on
non-square aspects.
- DesignFilesPanel's <DfPreview> was missing a key prop, so React
reused the same iframe DOM node when the user picked a different
file — the src prop changed but the iframe never navigated. Added
key={previewFile.name} so the previous preview unmounts cleanly.
- Updated byok-tools + media-senseaudio-image tests for the new size
expectations.
* docs(senseaudio): clear stale provider hint + update README
- Settings → Media → SenseAudio: clear the auto-promoted
"Image · TTS · 70+ voices · clone" hint; the provider label alone is
enough now that the BYOK chat surface covers image + video tooling.
- README: list the new senseaudio (and missing ollama) proxy routes so
the BYOK section reflects what the daemon actually serves, and
mention the generate_image / generate_video chat tools that ship
with the SenseAudio path.
* fix(senseaudio): address PR #2065 review feedback
Three non-blocking review notes from @PerishCode on PR #2065:
1. Drop the dead /api/byok-image/:id route. The PR description claimed
it was "legacy fallback for old chat history" but that storage
layout never existed on main, so the route can only ever 400 or
404 — never 200. Removed the handler, the isSafeByokImageId
export, the unused createReadStream / stat / path / Request /
Response imports, and the two byok-image regression tests.
2. Add rejectProxyPluginContext guard to the senseaudio proxy
handler so it matches the invariant the other five proxy paths
already enforce (plugin runs must go through /api/runs for
snapshot pinning). Extended the existing "API fallback rejects
plugin runs" describe to also cover /api/proxy/senseaudio/stream
with the 409 PLUGIN_REQUIRES_DAEMON expectation.
3. Wrap the secondary image / video downloads (the URLs the
SenseAudio gateway hands back in /v1/image/sync .url and
/v1/video/status .video_url) in validateBaseUrlResolved so a
malicious gateway can't point us at 169.254.169.254 (AWS / Azure
metadata) or RFC1918 hosts via the response payload. Also passed
`redirect: 'error'` on both fetches to match the SSRF posture
the primary proxy fetch already uses. The new
assertExternalAssetUrl helper lives next to executeGenerateImage
so future tool downloads can reuse it.
Tests: 120/120 daemon tests pass; guard + typecheck green.
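
As an illustration of note 3, the sketch below rejects gateway-returned URLs whose host is a literal loopback, private, CGNAT, or link-local IPv4 address. It is not the daemon's `assertExternalAssetUrl` or `validateBaseUrlResolved`: `parseIPv4`, `isBlockedIPv4`, and `assertPublicAssetUrl` are hypothetical names, and a real guard would also need to resolve DNS names and cover IPv6 before fetching.

```typescript
// Parse a dotted-quad IPv4 literal into four octets; null for anything else.
function parseIPv4(host: string): number[] | null {
  const parts = host.split('.');
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

// True for "this network", loopback, RFC 1918, CGNAT, and link-local ranges
// (the last one includes the 169.254.169.254 metadata address).
function isBlockedIPv4(host: string): boolean {
  const ip = parseIPv4(host);
  if (!ip) return false;
  const [a = 0, b = 0] = ip;
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 100 && b >= 64 && b <= 127) ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

// Hypothetical guard: throws unless the URL is http(s) and its host is not an
// obviously internal literal. Hostnames are NOT resolved in this sketch.
function assertPublicAssetUrl(raw: string): URL {
  const url = new URL(raw); // throws on malformed input
  if (url.protocol !== 'https:' && url.protocol !== 'http:') {
    throw new Error(`unsupported scheme: ${url.protocol}`);
  }
  if (url.hostname === 'localhost' || isBlockedIPv4(url.hostname)) {
    throw new Error(`refusing to fetch internal host: ${url.hostname}`);
  }
  return url;
}
```

A download would call the guard first and then fetch with `redirect: 'error'`, so a 3xx response cannot move the request to a host the guard never saw.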
* fix(senseaudio): mirror SSRF guard onto renderSenseAudioImage CLI path
Follow-up to 01b1260a — the chat-tool fix in byok-tools.ts wasn't
mirrored onto the parallel renderSenseAudioImage path in media.ts.
Same attacker-controllable shape (gateway-returned `data.url`),
same one-line fix.
- Hoist assertExternalAssetUrl from byok-tools.ts into
connectionTest.ts next to validateBaseUrlResolved so both call
sites (the BYOK chat tool loop AND the CLI agent media dispatcher)
share one helper. Made the error strings provider-agnostic so a
future caller doesn't get a misleading "senseaudio" attribution
for a Volcengine / Grok / etc. download.
- renderSenseAudioImage now runs the response url through
assertExternalAssetUrl before fetching bytes, and passes
redirect: 'error' to block a 3xx hop into private space.
Scope intentionally limited to the senseaudio path PerishCode
flagged; the other unguarded fetch(entry.url) call sites in
media.ts (OpenAI / Volcengine / Grok / Nano-Banana) are pre-existing
patterns and belong in a separate follow-up if the daemon wants
defense-in-depth across every provider.
Tests: 127/127 daemon tests pass; guard + typecheck green.
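
The ordering this follow-up describes (validate the gateway-returned URL first, then fetch with redirects disabled) might be sketched as below. `downloadGuardedAsset`, `FetchLike`, and the injected `guard` parameter are hypothetical and exist only to keep the example self-contained; the real helper is `assertExternalAssetUrl` in connectionTest.ts, whose signature is not shown here.

```typescript
// Minimal structural type so the sketch does not depend on DOM or Node typings.
type FetchLike = (
  url: string,
  init: { redirect: 'error' },
) => Promise<{ ok: boolean; status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

// Hypothetical wrapper: guard first, then fetch with redirects disabled so a
// 3xx response cannot bounce the request into private address space.
async function downloadGuardedAsset(
  rawUrl: string,
  guard: (url: string) => void | Promise<void>, // throws when the URL is not allowed
  fetchImpl: FetchLike,
): Promise<Uint8Array> {
  await guard(rawUrl);
  const res = await fetchImpl(rawUrl, { redirect: 'error' });
  if (!res.ok) throw new Error(`asset download failed with HTTP ${res.status}`);
  return new Uint8Array(await res.arrayBuffer());
}
```

On runtimes that ship a global `fetch` (Node 18 and later), it can be passed as `fetchImpl`; with `redirect: 'error'` the call rejects instead of following a redirect.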
---------
Co-authored-by: unknown <mazeliang@sensetime.com>
parent 431a5e2d79
commit 210b94069a
52 changed files with 3305 additions and 55 deletions
@@ -63,7 +63,7 @@ OD stands on four open-source shoulders:
 | | What you get |
 |---|---|
 | **Coding-agent CLIs (16)** | Claude Code · Codex CLI · Devin for Terminal · Cursor Agent · Gemini CLI · OpenCode · Qwen Code · Qoder CLI · GitHub Copilot CLI · Hermes (ACP) · Kimi CLI (ACP) · Pi (RPC) · Kiro CLI (ACP) · Kilo (ACP) · Mistral Vibe CLI (ACP) · DeepSeek TUI — auto-detected on `PATH`, swap with one click |
-| **BYOK fallback** | Protocol-specific API proxy at `/api/proxy/{anthropic,openai,azure,google}/stream` — paste `baseUrl` + `apiKey` + `model`, choose Anthropic / OpenAI / Azure OpenAI / Google Gemini, and the daemon normalizes SSE back to the same chat stream. Internal-IP/SSRF blocked at the daemon edge. |
+| **BYOK fallback** | Protocol-specific API proxy at `/api/proxy/{anthropic,openai,azure,google,ollama,senseaudio}/stream` — paste `baseUrl` + `apiKey` + `model`, choose Anthropic / OpenAI / Azure OpenAI / Google Gemini / Ollama Cloud / SenseAudio, and the daemon normalizes SSE back to the same chat stream. SenseAudio chat additionally exposes `generate_image` and `generate_video` tools so the model can write rendered artifacts straight into the active project's folder. Internal-IP/SSRF blocked at the daemon edge. |
 | **Design systems built-in** | **129** — 2 hand-authored starters + 70 product systems (Linear, Stripe, Vercel, Airbnb, Tesla, Notion, Anthropic, Apple, Cursor, Supabase, Figma, Xiaohongshu, …) from [`awesome-design-md`][acd2], plus 57 design skills from [`awesome-design-skills`][ads] added directly under `design-systems/` |
 | **Skills built-in** | **31** — 27 in `prototype` mode (web-prototype, saas-landing, dashboard, mobile-app, gamified-app, social-carousel, magazine-poster, dating-web, sprite-animation, motion-frames, critique, tweaks, wireframe-sketch, pm-spec, eng-runbook, finance-report, hr-onboarding, invoice, kanban-board, team-okrs, …) + 4 in `deck` mode (`guizang-ppt` · `simple-deck` · `replit-deck` · `weekly-update`). Grouped in the picker by `scenario`: design / marketing / operation / engineering / product / finance / hr / sale / personal. |
 | **Media generation** | Image · video · audio surfaces ship alongside the design loop. **gpt-image-2** (Azure / OpenAI) for posters, avatars, infographics, illustrated maps · **Seedance 2.0** (ByteDance) for cinematic 15s text-to-video and image-to-video · **HyperFrames** ([heygen-com/hyperframes](https://github.com/heygen-com/hyperframes)) for HTML→MP4 motion graphics (product reveals, kinetic typography, data charts, social overlays, logo outros). **93** ready-to-replicate prompts gallery — 43 gpt-image-2 + 39 Seedance + 11 HyperFrames — under [`prompt-templates/`](prompt-templates/), with preview thumbnails and source attribution. Same chat surface as code; outputs a real `.mp4` / `.png` chip into the project workspace. |
@@ -304,7 +304,7 @@ Every layer is composable. Every layer is a file you can edit. Read [`apps/daemo
 | Frontend | Next.js 16 App Router + React 18 + TypeScript, Vercel-deployable |
 | Daemon | Node 24 · Express · SSE streaming · `better-sqlite3`; tables: `projects` · `conversations` · `messages` · `tabs` · `templates` |
 | Agent transport | `child_process.spawn`; typed-event parsers for `claude-stream-json` (Claude Code), `qoder-stream-json` (Qoder CLI), `copilot-stream-json` (Copilot), `json-event-stream` per-CLI parsers (Codex / Gemini / OpenCode / Cursor Agent), `acp-json-rpc` (Devin / Hermes / Kimi / Kiro / Kilo / Mistral Vibe via Agent Client Protocol), `pi-rpc` (Pi via stdio JSON-RPC), `plain` (Qwen Code / DeepSeek TUI) |
-| BYOK proxy | `POST /api/proxy/{anthropic,openai,azure,google}/stream` → provider-specific upstream APIs, normalized `delta/end/error` SSE; allows loopback local LLM providers, rejects non-loopback private/link-local/CGNAT/multicast/reserved hosts, and disables upstream redirects at the daemon edge |
+| BYOK proxy | `POST /api/proxy/{anthropic,openai,azure,google,ollama,senseaudio}/stream` → provider-specific upstream APIs, normalized `delta/end/error` SSE; allows loopback local LLM providers, rejects non-loopback private/link-local/CGNAT/multicast/reserved hosts, and disables upstream redirects at the daemon edge |
 | Storage | Plain files in `.od/projects/<id>/` + SQLite at `.od/app.sqlite` + credentials at `.od/media-config.json` (gitignored, auto-created). `OD_DATA_DIR=<dir>` relocates all daemon data (used for test isolation and read-only-install setups); `OD_MEDIA_CONFIG_DIR=<dir>` further narrows the override to just `media-config.json` for setups that want to keep API keys outside the data dir |
 | Preview | Sandboxed iframe via `srcdoc` + per-skill `<artifact>` parser ([`apps/web/src/artifacts/parser.ts`](apps/web/src/artifacts/parser.ts)) |
 | Export | HTML (inline assets) · PDF (browser print, deck-aware) · PPTX (agent-driven via skill) · ZIP (archiver) · Markdown |
@@ -872,7 +872,7 @@ Pattern is the same as the rest: pick a template, edit the brief, send. The agen
 The chat / artifact loop gets the spotlight, but a handful of less-visible capabilities are already wired and worth knowing before you compare OD to anything else:
 
 - **Claude Design ZIP import.** Drop an export from claude.ai onto the welcome dialog. `POST /api/import/claude-design` extracts it into a real `.od/projects/<id>/`, opens the entry file as a tab, and stages a continue-where-Anthropic-left-off prompt for your local agent. No re-prompting, no "ask the model to re-create what we just had". ([`apps/daemon/src/server.ts`](apps/daemon/src/server.ts) — `/api/import/claude-design`)
-- **Multi-provider BYOK proxy.** `POST /api/proxy/{anthropic,openai,azure,google}/stream` takes `{ baseUrl, apiKey, model, messages }`, builds the provider-specific upstream request, normalizes SSE chunks into `delta/end/error`, and allows loopback local LLM providers while rejecting non-loopback private, link-local, CGNAT, multicast, reserved, and redirect targets to head off SSRF. OpenAI-compatible covers OpenAI, Azure AI Foundry `/openai/v1`, DeepSeek, Groq, MiMo, OpenRouter, Ollama, LM Studio, and self-hosted vLLM; Azure OpenAI adds deployment URL + `api-version`; Google uses Gemini `:streamGenerateContent`.
+- **Multi-provider BYOK proxy.** `POST /api/proxy/{anthropic,openai,azure,google,ollama,senseaudio}/stream` takes `{ baseUrl, apiKey, model, messages }`, builds the provider-specific upstream request, normalizes SSE chunks into `delta/end/error`, and allows loopback local LLM providers while rejecting non-loopback private, link-local, CGNAT, multicast, reserved, and redirect targets to head off SSRF. OpenAI-compatible covers OpenAI, Azure AI Foundry `/openai/v1`, DeepSeek, Groq, MiMo, OpenRouter, Ollama, LM Studio, and self-hosted vLLM; Azure OpenAI adds deployment URL + `api-version`; Google uses Gemini `:streamGenerateContent`.
 - **User-saved templates.** Once you like a render, `POST /api/templates` snapshots the HTML + metadata into the SQLite `templates` table. The next project picks it from a "your templates" row in the picker — same surface as the shipped 31, but yours.
 - **Tab persistence.** Every project remembers its open files and active tab in the `tabs` table. Reopen the project tomorrow and the workspace looks exactly the way you left it.
 - **Artifact lint API.** `POST /api/artifacts/lint` runs structural checks on a generated artifact (broken `<artifact>` framing, missing required side files, stale palette tokens) and returns findings the agent can read back into its next turn. The five-dim self-critique uses this to ground its score in real evidence, not vibes.
@@ -974,7 +974,7 @@ Long-form provenance write-up — what we take from each, what we deliberately d
 - [x] Web app + chat + question form + 5-direction picker + todo progress + sandboxed preview
 - [x] 31 skills + 72 design systems + 5 visual directions + 5 device frames
 - [x] SQLite-backed projects · conversations · messages · tabs · templates
-- [x] Multi-provider BYOK proxy (`/api/proxy/{anthropic,openai,azure,google}/stream`) with SSRF guard
+- [x] Multi-provider BYOK proxy (`/api/proxy/{anthropic,openai,azure,google,ollama,senseaudio}/stream`) with SSRF guard
 - [x] Claude Design ZIP import (`/api/import/claude-design`)
 - [x] Sidecar protocol + Electron desktop with IPC automation (STATUS / EVAL / SCREENSHOT / CONSOLE / CLICK / SHUTDOWN)
 - [x] Artifact lint API + 5-dim self-critique pre-emit gate
apps/daemon/src/byok-tools.ts (normal file, 598 lines added)
@@ -0,0 +1,598 @@
// Tool definitions and executors exposed to BYOK chat sessions.
//
// Why this file exists: the BYOK chat proxy (e.g. /api/proxy/senseaudio/stream)
// is a thin pass-through that doesn't have the agent-runtime scaffolding the
// CLI agents (Claude Code / Codex / ...) carry. To let users ask their BYOK
// chat to "draw me a cat" and get an actual rendered PNG back, the daemon
// injects an OpenAI-shaped `tools` definition into the upstream completion
// request, then loops on the model's tool_calls: execute → feed the result
// back as a `role: 'tool'` message → re-issue the completion. The chat surface
// stays the same; the tool dispatch happens entirely daemon-side.
//
// Today we ship two tools, `generate_image` (backed by SenseAudio's
// /v1/image/sync endpoint) and `generate_video` (backed by /v1/video/create
// plus /v1/video/status polling), since the BYOK chat session already
// authenticates against SenseAudio with the same API key. Additional tools
// (TTS, research) can be added here as the BYOK surface expands.

import path from 'node:path';
import { writeFile } from 'node:fs/promises';
import { randomBytes } from 'node:crypto';
import { assertExternalAssetUrl } from './connectionTest.js';
import { resolveProviderConfig } from './media-config.js';
import { IMAGE_MODELS } from './media-models.js';
import { ensureProject } from './projects.js';

// SenseAudio image model allowlist — derived from the shared media-models
// registry so adding a new SenseAudio image model in one place (media-models)
// auto-extends the BYOK tool param enum, the Settings dropdown, and the
// daemon-side validation. No drift, no hand-maintained constant.
export const BYOK_SENSEAUDIO_IMAGE_MODELS: readonly string[] = IMAGE_MODELS
  .filter((m) => m.provider === 'senseaudio')
  .map((m) => m.id);

// Default falls back to the first entry from the registry (today
// `senseaudio-image-2.0-260319` — the multi-aspect latest). Kept as a
// computed constant so re-ordering the registry rotates the default
// without code edits here.
export const BYOK_SENSEAUDIO_DEFAULT_IMAGE_MODEL =
  BYOK_SENSEAUDIO_IMAGE_MODELS[0] ?? 'senseaudio-image-2.0-260319';

export function isSenseAudioImageModel(value: unknown): value is string {
  return typeof value === 'string' && BYOK_SENSEAUDIO_IMAGE_MODELS.includes(value);
}

const SENSEAUDIO_DEFAULT_BASE_URL = 'https://api.senseaudio.cn';
const PROMPT_MAX_LENGTH = 2000;

// SenseAudio video — the API only documents one model today, so the
// wire id is a const. The chat tool's `generate_video` param surface
// (prompt, aspect_ratio, duration, resolution, generate_audio) covers
// every knob the doubao-seedance gateway accepts.
const SENSEAUDIO_VIDEO_MODEL = 'doubao-seedance-2-0-260128';
const SENSEAUDIO_VIDEO_ASPECT_RATIOS = ['16:9', '9:16', '4:3', '3:4', '1:1'] as const;
const SENSEAUDIO_VIDEO_RESOLUTIONS = ['480p', '720p', '1080p'] as const;
const SENSEAUDIO_VIDEO_DURATION_MIN = 4;
const SENSEAUDIO_VIDEO_DURATION_MAX = 15;
const SENSEAUDIO_VIDEO_DURATION_DEFAULT = 5;
// Polling: SenseAudio docs recommend 5–10 s intervals; we pick 5 s and
// cap total attempts so a stuck job can't pin the chat stream forever.
// 120 attempts × 5 s = 10 min ceiling — covers the real-world
// doubao-seedance latency range (1080p + audio jobs frequently spend
// 3–8 min on the gateway). Below this, the 5-min cap timed out otherwise
// valid jobs; above this the chat surface starts feeling stuck.
const SENSEAUDIO_VIDEO_POLL_INTERVAL_MS_DEFAULT = 5000;
const SENSEAUDIO_VIDEO_MAX_POLLS = 120;
// Periodic progress log every N polls so a long-running job emits some
// signal to the daemon log — without flooding it with one line per
// 5 s. 6 polls = ~30 s between progress lines.
const SENSEAUDIO_VIDEO_PROGRESS_LOG_EVERY = 6;

// SenseAudio's image gateway rejects non-standard pixel sizes with a 400
// `参数错误:size` ("parameter error: size"; verified against logs from a
// failed call on 2026-05-16). We stick to common 16-multiple HD / SD sizes
// that the gateway is known to accept: 1024×1024 for square, 1280×720 /
// 720×1280 for widescreen / portrait, 1024×768 / 768×1024 for the 4:3 family.
// The table is duplicated in renderSenseAudioImage (media.ts) for the
// CLI-agent path; keep the two copies in sync.
const ASPECT_TO_SIZE: Record<string, string> = {
  '1:1': '1024x1024',
  '16:9': '1280x720',
  '9:16': '720x1280',
  '4:3': '1024x768',
  '3:4': '768x1024',
};

/**
 * OpenAI-compatible tool definition for image generation. Injected into
 * the upstream `tools` array on every /api/proxy/senseaudio/stream
 * request so the LLM can decide on its own when to call it. The
 * description deliberately tells the model to embed the returned URL
 * in markdown — the chat UI already renders markdown images inline,
 * so no client-side wiring is required for the bytes to show up.
 */
export const BYOK_SENSEAUDIO_TOOLS = [
  {
    type: 'function' as const,
    function: {
      name: 'generate_image',
      description:
        'Generate an image from a text prompt using SenseAudio image models. Returns a URL pointing to the rendered PNG. After this tool succeeds, embed the URL in your reply with markdown image syntax — ![alt](url) — so the user sees the image inline. Use this whenever the user asks to draw, create, generate, design, or illustrate something visual.',
      parameters: {
        type: 'object',
        properties: {
          prompt: {
            type: 'string',
            description:
              'Detailed visual description of the image (Chinese or English are both fine). Include subject, style, lighting, composition. Maximum 2000 characters.',
          },
          aspect_ratio: {
            type: 'string',
            enum: ['1:1', '16:9', '9:16', '4:3', '3:4'],
            description:
              'Output aspect ratio. 1:1 for square avatars and product shots, 16:9 for hero banners, 9:16 for vertical phone posters, 4:3 for editorial covers, 3:4 for posters. Defaults to 1:1 when omitted.',
          },
          model: {
            type: 'string',
            enum: [...BYOK_SENSEAUDIO_IMAGE_MODELS],
            description:
              'Optional model override. Omit this to use the user-configured default from Settings (or the SenseAudio 2.0 multi-aspect model when unset). Choose senseaudio-image-2.0-260319 for multi-aspect generation, senseaudio-image-1.0-260319 for standard sizes, or doubao-seedream-5-0-260128 for high-resolution output through the ByteDance Seedream gateway. The user explicitly picked a default in their Settings — only override when the user asks for a different style/resolution.',
          },
        },
        required: ['prompt'],
      },
    },
  },
  {
    type: 'function' as const,
    function: {
      name: 'generate_video',
      description:
        'Generate a short video (4–15 seconds) from a text prompt using SenseAudio\'s ByteDance Seedance gateway. This is an asynchronous call that can take 30 s to a few minutes — the daemon polls the job for you, so the user just sees the chat waiting. After this tool succeeds, embed the returned URL in your reply as a markdown link, e.g. `[▶ Play video](url)`, because the chat\'s markdown renderer does not currently render `<video>` tags inline. Use this whenever the user asks for a video, clip, animation, or motion graphic.',
      parameters: {
        type: 'object',
        properties: {
          prompt: {
            type: 'string',
            description:
              'Detailed motion description of the video. Include subject, action / camera move / scene transitions, style, lighting. Chinese or English. Maximum 2000 characters.',
          },
          aspect_ratio: {
            type: 'string',
            enum: [...SENSEAUDIO_VIDEO_ASPECT_RATIOS],
            description:
              'Output aspect ratio. 16:9 for cinematic, 9:16 for vertical (phone / TikTok), 1:1 for social square, 4:3 / 3:4 for editorial. Defaults to 16:9.',
          },
          duration: {
            type: 'integer',
            minimum: SENSEAUDIO_VIDEO_DURATION_MIN,
            maximum: SENSEAUDIO_VIDEO_DURATION_MAX,
            description:
              `Video length in seconds (integer). Allowed range ${SENSEAUDIO_VIDEO_DURATION_MIN}–${SENSEAUDIO_VIDEO_DURATION_MAX}; defaults to ${SENSEAUDIO_VIDEO_DURATION_DEFAULT}. Shorter durations finish faster.`,
          },
          resolution: {
            type: 'string',
            enum: [...SENSEAUDIO_VIDEO_RESOLUTIONS],
            description:
              'Output resolution. 480p (fastest), 720p (default, balanced), 1080p (best quality, slowest). Pick 1080p only when the user explicitly asks for high resolution.',
          },
          generate_audio: {
            type: 'boolean',
            description:
              'Whether the model also synthesises an audio track for the clip (background sound, ambience). Defaults to false to keep generation fast; flip to true when the user asks for sound, music, or a "video with audio".',
          },
        },
        required: ['prompt'],
      },
    },
  },
];

/**
|
||||||
|
* Runtime context the BYOK tool executor needs. Passed by the chat
|
||||||
|
* route on every call so the tool layer stays free of global state and
|
||||||
|
* can be unit-tested with a temp directory.
|
||||||
|
*/
export interface BYOKToolContext {
  /** Daemon project root, used to look up media-config when the chat
   * session key is missing. */
  projectRoot: string;
  /** Daemon's PROJECTS_DIR (the `<projectRoot>/.od/projects/` folder
   * that holds per-project file trees). Generated images land in
   * `<projectsRoot>/<projectId>/byok-<id>.png` so the project's
   * FileViewer / DesignFilesPanel discover them automatically and
   * the file travels with the project on export, archive, rename. */
  projectsRoot: string;
  /** Active project id from the chat surface. Required: the BYOK
   * chat always runs inside a project, so the tool dispatch refuses
   * to fire without one rather than dump bytes into a global cache.
   * Validated upstream via `isSafeId`. */
  projectId: string;
  /** The BYOK chat session's API key, the first credential we try. Bypasses
   * the media-config indirection so the same key the user just pasted
   * for chat is the same key the image call uses. */
  upstreamApiKey: string;
  /** The BYOK chat session's base URL (may be a custom gateway). Falls
   * back to api.senseaudio.cn. */
  upstreamBaseUrl?: string;
  /** Default image model the user picked in BYOK Settings, used when the
   * LLM didn't pass `model` in tool args. Validated upstream: anything
   * outside `BYOK_SENSEAUDIO_IMAGE_MODELS` is dropped so a stale
   * client-side config can't smuggle an unregistered model id through.
   * Falls back to `BYOK_SENSEAUDIO_DEFAULT_IMAGE_MODEL` (the registry's
   * first SenseAudio image entry) when missing. */
  defaultImageModel?: string;
  /** Test-only override for the video polling interval (ms). Production
   * uses 5 s (SenseAudio's recommendation); tests pass small values
   * (e.g. 1 ms) to keep the suite fast without changing the polling
   * semantics. */
  videoPollIntervalMs?: number;
}

export interface ImageToolResult {
  ok: boolean;
  /** Daemon-served URL on success. */
  url?: string;
  /** Short human-readable failure reason. Stuffed into the `tool` role
   * reply so the LLM can apologize / retry. */
  error?: string;
}

function sanitizeAspectRatio(raw: unknown): string {
  if (typeof raw !== 'string') return '1:1';
  // Own-key check so inherited names such as 'constructor' fall back too.
  return Object.prototype.hasOwnProperty.call(ASPECT_TO_SIZE, raw) ? raw : '1:1';
}
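
A minimal sketch of the lookup that `sanitizeAspectRatio` guards. The real `ASPECT_TO_SIZE` table is defined elsewhere in this file and is not shown in this excerpt, so the keys below are assumptions; the sizes are the ones the commit message lists as accepted by the gateway. The names `ASPECT_TO_SIZE_SKETCH`, `sanitizeAspectRatioSketch`, and `sizeFor` are hypothetical.

```typescript
// Hypothetical table: the keys are assumed; the sizes come from the
// commit message (1024x1024, 1280x720, 720x1280, 1024x768, 768x1024).
const ASPECT_TO_SIZE_SKETCH: Record<string, string> = {
  '1:1': '1024x1024',
  '16:9': '1280x720',
  '9:16': '720x1280',
  '4:3': '1024x768',
  '3:4': '768x1024',
};

// Anything that is not an own key of the table collapses to '1:1'.
function sanitizeAspectRatioSketch(raw: unknown): string {
  if (typeof raw !== 'string') return '1:1';
  return Object.prototype.hasOwnProperty.call(ASPECT_TO_SIZE_SKETCH, raw) ? raw : '1:1';
}

// Resolve the size string that would be sent as `size` in the request body.
function sizeFor(raw: unknown): string {
  return ASPECT_TO_SIZE_SKETCH[sanitizeAspectRatioSketch(raw)] as string;
}
```

Falling back to a square size instead of rejecting the call means an unusual ratio from the model still produces an image rather than an upstream `size` error.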

/**
 * Execute the `generate_image` tool. Calls SenseAudio /v1/image/sync,
 * downloads the rendered bytes, writes them to
 * `<projectsRoot>/<projectId>/byok-<id>.png`, and returns a
 * daemon-served URL. Pure async: the caller is responsible
 * for emitting any SSE events (e.g. "tool result ready").
 *
 * Failure modes return `{ok: false, error}` rather than throwing so the
 * caller can feed the message back to the LLM as a tool_result; that
 * lets the model apologize / suggest a retry instead of the chat
 * silently stopping.
 */
export async function executeGenerateImage(
  args: { prompt?: unknown; aspect_ratio?: unknown; model?: unknown },
  ctx: BYOKToolContext,
): Promise<ImageToolResult> {
  const promptRaw = typeof args.prompt === 'string' ? args.prompt.trim() : '';
  if (!promptRaw) return { ok: false, error: 'prompt is required' };
  const prompt =
    promptRaw.length > PROMPT_MAX_LENGTH
      ? promptRaw.slice(0, PROMPT_MAX_LENGTH)
      : promptRaw;

  const aspect = sanitizeAspectRatio(args.aspect_ratio);
  const size = ASPECT_TO_SIZE[aspect];

  // Model resolution order: LLM args > user's Settings default > registry
  // default. The allowlist guards every step so a hallucinated or stale id
  // can never reach the senseaudio /v1/image/sync wire; the catalogue is
  // the source of truth.
  const senseAudioImageModel = isSenseAudioImageModel(args.model)
    ? args.model
    : isSenseAudioImageModel(ctx.defaultImageModel)
      ? ctx.defaultImageModel
      : BYOK_SENSEAUDIO_DEFAULT_IMAGE_MODEL;

  // Resolve the project folder up front. ensureProject runs
  // `isSafeId` internally, so an attacker who somehow bypassed the
  // chat-routes guard and slipped `../escape` into projectId fails
  // here before we make any upstream call. The returned `dir` is
  // reused at writeFile time below.
  let dir: string;
  try {
    dir = await ensureProject(ctx.projectsRoot, ctx.projectId);
  } catch (err) {
    return {
      ok: false,
      error: `invalid projectId for image storage: ${err instanceof Error ? err.message : String(err)}`,
    };
  }

  // Prefer the BYOK session's key (what the user is actively using).
  // Fall back to media-config (env var > stored) so a user who set
  // OD_SENSEAUDIO_API_KEY but forgot to fill the chat panel still
  // gets a working tool call.
  let apiKey = ctx.upstreamApiKey;
  let baseUrl = ctx.upstreamBaseUrl || SENSEAUDIO_DEFAULT_BASE_URL;
  if (!apiKey) {
    const resolved = await resolveProviderConfig(ctx.projectRoot, 'senseaudio');
    apiKey = resolved.apiKey || '';
    if (resolved.baseUrl) baseUrl = resolved.baseUrl;
  }
  if (!apiKey) {
    return { ok: false, error: 'no SenseAudio API key available' };
  }

  const trimmedBase = baseUrl.replace(/\/+$/, '');
  let imageUrl: string;
  try {
    const resp = await fetch(`${trimmedBase}/v1/image/sync`, {
      method: 'POST',
      headers: {
        authorization: `Bearer ${apiKey}`,
        'content-type': 'application/json',
      },
      body: JSON.stringify({
        model: senseAudioImageModel,
        prompt,
        size,
      }),
    });
    if (!resp.ok) {
      const text = await resp.text().catch(() => '');
      return {
        ok: false,
        error: `senseaudio image ${resp.status}: ${text.slice(0, 240)}`,
      };
    }
    const data = (await resp.json()) as {
      url?: string;
      error_message?: string;
      base_resp?: { status_code?: number; status_msg?: string };
    };
    if (data?.base_resp && data.base_resp.status_code !== 0) {
      return {
        ok: false,
        error: `senseaudio image api error ${data.base_resp.status_code}: ${data.base_resp.status_msg || 'unknown'}`,
      };
    }
    if (typeof data?.error_message === 'string' && data.error_message) {
      return { ok: false, error: `senseaudio image: ${data.error_message}` };
    }
    if (typeof data?.url !== 'string' || !data.url) {
      return { ok: false, error: 'senseaudio image response missing url' };
    }
    imageUrl = data.url;
  } catch (err) {
    return {
      ok: false,
      error: err instanceof Error ? err.message : String(err),
    };
  }

  const imageUrlCheck = await assertExternalAssetUrl(imageUrl);
  if (!imageUrlCheck.ok) return { ok: false, error: imageUrlCheck.error };

  let bytes: Buffer;
  try {
    const imgResp = await fetch(imageUrl, { redirect: 'error' });
    if (!imgResp.ok) {
      return { ok: false, error: `image download ${imgResp.status}` };
    }
    bytes = Buffer.from(await imgResp.arrayBuffer());
  } catch (err) {
    return {
      ok: false,
      error: `image download failed: ${err instanceof Error ? err.message : String(err)}`,
    };
  }
  if (bytes.length === 0) {
    return { ok: false, error: 'image download returned zero bytes' };
  }

  // Persist into the active project's folder. `dir` was resolved up
  // front via ensureProject (no DB write, no metadata side-effects),
  // and the resulting path slots straight into the existing project
  // file plumbing: listFiles enumerates it for the FileViewer,
  // readProjectFile serves it via GET /api/projects/<id>/files/<filename>,
  // and project archive / export pick it up automatically because it
  // lives under the project's own directory.
  //
  // Filename pattern `byok-<timestamp>-<random>.png` keeps tool
  // outputs distinguishable from user uploads at a glance while
  // staying url-safe.
  const id = `${Date.now().toString(36)}-${randomBytes(4).toString('hex')}`;
  const filename = `byok-${id}.png`;
  // Keep the documented no-throw contract: report disk errors as a
  // tool error instead of letting them escape.
  try {
    await writeFile(path.join(dir, filename), bytes);
  } catch (err) {
    return {
      ok: false,
      error: `image write failed: ${err instanceof Error ? err.message : String(err)}`,
    };
  }

  // Return a relative URL through the project file serving route. The
  // web app's Next.js config rewrites `/api/:path*` to the daemon (see
  // apps/web/next.config.ts), so the chat UI loads the image
  // same-origin, satisfying the strict CSP (`img-src 'self' data:
  // blob:`) without any CORS plumbing.
  return {
    ok: true,
    url: `/api/projects/${encodeURIComponent(ctx.projectId)}/files/${filename}`,
  };
}
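
The model resolution order used in `executeGenerateImage` can be sketched in isolation. This is an illustration under stated assumptions, not the registry code: the model ids, `IMAGE_MODELS`, `isImageModel`, and `resolveImageModel` are hypothetical stand-ins for `BYOK_SENSEAUDIO_IMAGE_MODELS` and `isSenseAudioImageModel`, which are defined elsewhere.

```typescript
// Hypothetical model ids, for illustration only.
const IMAGE_MODELS = ['model-a', 'model-b'] as const;
type ImageModel = (typeof IMAGE_MODELS)[number];
const DEFAULT_IMAGE_MODEL: ImageModel = IMAGE_MODELS[0];

// Type guard standing in for isSenseAudioImageModel.
function isImageModel(value: unknown): value is ImageModel {
  return typeof value === 'string' && (IMAGE_MODELS as readonly string[]).includes(value);
}

// LLM argument first, then the Settings default, then the registry default.
function resolveImageModel(fromArgs: unknown, fromSettings: unknown): ImageModel {
  if (isImageModel(fromArgs)) return fromArgs;
  if (isImageModel(fromSettings)) return fromSettings;
  return DEFAULT_IMAGE_MODEL;
}
```

Because every branch passes through the same guard, an id that is not in the allowlist can never be returned, whichever source it came from.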

function sanitizeVideoAspectRatio(raw: unknown): (typeof SENSEAUDIO_VIDEO_ASPECT_RATIOS)[number] {
  if (typeof raw !== 'string') return '16:9';
  return (SENSEAUDIO_VIDEO_ASPECT_RATIOS as readonly string[]).includes(raw)
    ? (raw as (typeof SENSEAUDIO_VIDEO_ASPECT_RATIOS)[number])
    : '16:9';
}

function sanitizeVideoResolution(raw: unknown): (typeof SENSEAUDIO_VIDEO_RESOLUTIONS)[number] {
  if (typeof raw !== 'string') return '720p';
  return (SENSEAUDIO_VIDEO_RESOLUTIONS as readonly string[]).includes(raw)
    ? (raw as (typeof SENSEAUDIO_VIDEO_RESOLUTIONS)[number])
    : '720p';
}

function sanitizeVideoDuration(raw: unknown): number {
  if (typeof raw !== 'number' || !Number.isFinite(raw)) return SENSEAUDIO_VIDEO_DURATION_DEFAULT;
  const rounded = Math.round(raw);
  if (rounded < SENSEAUDIO_VIDEO_DURATION_MIN) return SENSEAUDIO_VIDEO_DURATION_MIN;
  if (rounded > SENSEAUDIO_VIDEO_DURATION_MAX) return SENSEAUDIO_VIDEO_DURATION_MAX;
  return rounded;
}
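
A small sketch of the round-then-clamp behaviour of `sanitizeVideoDuration`. The bounds below are assumed values chosen for illustration; the real `SENSEAUDIO_VIDEO_DURATION_*` constants are defined elsewhere and may differ. `clampDuration` is a hypothetical name.

```typescript
// Assumed bounds for illustration; the real SENSEAUDIO_VIDEO_DURATION_*
// constants are not shown in this excerpt and may differ.
const DURATION_MIN = 4;
const DURATION_MAX = 12;
const DURATION_DEFAULT = 5;

// Non-numbers and non-finite values take the default; everything else is
// rounded to whole seconds and clamped into [DURATION_MIN, DURATION_MAX].
function clampDuration(raw: unknown): number {
  if (typeof raw !== 'number' || !Number.isFinite(raw)) return DURATION_DEFAULT;
  return Math.min(DURATION_MAX, Math.max(DURATION_MIN, Math.round(raw)));
}
```

Note that a numeric string such as `'8'` is treated as missing rather than coerced, which matches the strict `typeof raw !== 'number'` check in the original.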

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Execute the `generate_video` tool. SenseAudio's video API is
 * asynchronous-only: POST /v1/video/create returns a task_id, then
 * GET /v1/video/status?id=<task_id> reports `pending` / `processing`
 * → `completed` (with `video_url`) or `failed` (with `error_message`).
 * We poll every `videoPollIntervalMs` (default 5 s) and bail after
 * `SENSEAUDIO_VIDEO_MAX_POLLS` so a stuck upstream can't pin the
 * chat stream forever.
 *
 * The chat tool waits for the whole loop, so the daemon's outbound
 * SSE response from /api/proxy/senseaudio/stream stays open for the
 * duration. That's intentional: the next chat turn cannot begin
 * until we have a URL to feed back into the tool_result.
 */
export async function executeGenerateVideo(
  args: {
    prompt?: unknown;
    aspect_ratio?: unknown;
    duration?: unknown;
    resolution?: unknown;
    generate_audio?: unknown;
  },
  ctx: BYOKToolContext,
): Promise<ImageToolResult> {
  const promptRaw = typeof args.prompt === 'string' ? args.prompt.trim() : '';
  if (!promptRaw) return { ok: false, error: 'prompt is required' };
  const prompt =
    promptRaw.length > PROMPT_MAX_LENGTH
      ? promptRaw.slice(0, PROMPT_MAX_LENGTH)
      : promptRaw;

  const ratio = sanitizeVideoAspectRatio(args.aspect_ratio);
  const resolution = sanitizeVideoResolution(args.resolution);
  const duration = sanitizeVideoDuration(args.duration);
  const generateAudio = args.generate_audio === true;

  let dir: string;
  try {
    dir = await ensureProject(ctx.projectsRoot, ctx.projectId);
  } catch (err) {
    return {
      ok: false,
      error: `invalid projectId for video storage: ${err instanceof Error ? err.message : String(err)}`,
    };
  }

  let apiKey = ctx.upstreamApiKey;
  let baseUrl = ctx.upstreamBaseUrl || SENSEAUDIO_DEFAULT_BASE_URL;
  if (!apiKey) {
    const resolved = await resolveProviderConfig(ctx.projectRoot, 'senseaudio');
    apiKey = resolved.apiKey || '';
    if (resolved.baseUrl) baseUrl = resolved.baseUrl;
  }
  if (!apiKey) {
    return { ok: false, error: 'no SenseAudio API key available' };
  }
  const trimmedBase = baseUrl.replace(/\/+$/, '');

  // Step 1: POST /v1/video/create → task_id.
  let taskId: string;
  try {
    const resp = await fetch(`${trimmedBase}/v1/video/create`, {
      method: 'POST',
      headers: {
        authorization: `Bearer ${apiKey}`,
        'content-type': 'application/json',
      },
      body: JSON.stringify({
        model: SENSEAUDIO_VIDEO_MODEL,
        content: [{ type: 'text', text: prompt }],
        duration,
        resolution,
        ratio,
        provider_specific: { generate_audio: generateAudio },
      }),
    });
    if (!resp.ok) {
      const text = await resp.text().catch(() => '');
      return {
        ok: false,
        error: `senseaudio video create ${resp.status}: ${text.slice(0, 240)}`,
      };
    }
    const data = (await resp.json()) as { task_id?: string };
    if (typeof data?.task_id !== 'string' || !data.task_id) {
      return { ok: false, error: 'senseaudio video create response missing task_id' };
    }
    taskId = data.task_id;
  } catch (err) {
    return {
      ok: false,
      error: err instanceof Error ? err.message : String(err),
    };
  }

  // Step 2: poll /v1/video/status until completed / failed / timeout.
  const pollIntervalMs = ctx.videoPollIntervalMs ?? SENSEAUDIO_VIDEO_POLL_INTERVAL_MS_DEFAULT;
  let videoUrl = '';
  for (let attempt = 0; attempt < SENSEAUDIO_VIDEO_MAX_POLLS; attempt++) {
    await sleep(pollIntervalMs);
    let statusResp: Response;
    try {
      statusResp = await fetch(
        `${trimmedBase}/v1/video/status?id=${encodeURIComponent(taskId)}`,
        {
          method: 'GET',
          headers: { authorization: `Bearer ${apiKey}` },
        },
      );
    } catch (err) {
      return {
        ok: false,
        error: `senseaudio video poll failed: ${err instanceof Error ? err.message : String(err)}`,
      };
    }
    if (!statusResp.ok) {
      const text = await statusResp.text().catch(() => '');
      return {
        ok: false,
        error: `senseaudio video status ${statusResp.status}: ${text.slice(0, 240)}`,
      };
    }
    const data = (await statusResp.json()) as {
      status?: string;
      progress?: number;
      video_url?: string;
      error_message?: string;
    };
    if (data?.status === 'completed') {
      if (typeof data.video_url !== 'string' || !data.video_url) {
        return { ok: false, error: 'senseaudio video status completed but missing video_url' };
      }
      videoUrl = data.video_url;
      break;
    }
    if (data?.status === 'failed') {
      return {
        ok: false,
        error: `senseaudio video failed: ${data.error_message || 'unknown reason'}`,
      };
    }
    // pending / processing: continue polling. Emit a periodic log line
    // so a stuck job surfaces in the daemon log instead of silently
    // burning attempts. Optional chaining keeps a `null` body from
    // throwing here, matching the `data?.status` checks above.
    if ((attempt + 1) % SENSEAUDIO_VIDEO_PROGRESS_LOG_EVERY === 0) {
      const pct = typeof data?.progress === 'number' ? data.progress : '?';
      console.log(
        `[proxy:senseaudio] generate_video poll ${attempt + 1}/${SENSEAUDIO_VIDEO_MAX_POLLS} task=${taskId} status=${data?.status ?? 'unknown'} progress=${pct}`,
      );
    }
  }
  if (!videoUrl) {
    return {
      ok: false,
      error: `senseaudio video timed out after ${SENSEAUDIO_VIDEO_MAX_POLLS} polls`,
    };
  }

  // Step 3: download the mp4 bytes and persist into the project folder.
  // Validate the returned URL via assertExternalAssetUrl so a
  // malicious gateway can't point us at 169.254.169.254 (AWS / Azure
  // metadata service) or RFC1918 hosts via the response payload.
  const videoUrlCheck = await assertExternalAssetUrl(videoUrl);
  if (!videoUrlCheck.ok) return { ok: false, error: videoUrlCheck.error };

  let bytes: Buffer;
  try {
    const videoResp = await fetch(videoUrl, { redirect: 'error' });
    if (!videoResp.ok) {
      return { ok: false, error: `video download ${videoResp.status}` };
    }
    bytes = Buffer.from(await videoResp.arrayBuffer());
  } catch (err) {
    return {
      ok: false,
      error: `video download failed: ${err instanceof Error ? err.message : String(err)}`,
    };
  }
  if (bytes.length === 0) {
    return { ok: false, error: 'video download returned zero bytes' };
  }
  const id = `${Date.now().toString(36)}-${randomBytes(4).toString('hex')}`;
  const filename = `byok-video-${id}.mp4`;
  // Report disk errors as a tool error instead of throwing, as the
  // image tool does.
  try {
    await writeFile(path.join(dir, filename), bytes);
  } catch (err) {
    return {
      ok: false,
      error: `video write failed: ${err instanceof Error ? err.message : String(err)}`,
    };
  }

  return {
    ok: true,
    url: `/api/projects/${encodeURIComponent(ctx.projectId)}/files/${filename}`,
  };
}
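
The polling budget of `executeGenerateVideo` can be worked through with plain arithmetic. The figures below are derived from the commit message (5 s polls, 10-minute ceiling, progress log every 30 s); the actual values of `SENSEAUDIO_VIDEO_MAX_POLLS` and `SENSEAUDIO_VIDEO_PROGRESS_LOG_EVERY` are not shown in this excerpt, so treat them as assumptions. `loggedAttempts` is a hypothetical helper.

```typescript
// Values derived from the commit message (5 s polls, 10-minute ceiling,
// progress log every 30 s); the real constants are not shown here.
const POLL_INTERVAL_MS = 5_000;
const MAX_POLLS = (10 * 60_000) / POLL_INTERVAL_MS; // 120
const PROGRESS_LOG_EVERY = 30_000 / POLL_INTERVAL_MS; // 6

// 1-based attempt numbers on which the loop would emit a progress line,
// mirroring the `(attempt + 1) % LOG_EVERY === 0` check.
function loggedAttempts(maxPolls: number, logEvery: number): number[] {
  const out: number[] = [];
  for (let attempt = 0; attempt < maxPolls; attempt++) {
    if ((attempt + 1) % logEvery === 0) out.push(attempt + 1);
  }
  return out;
}

// Worst-case time spent sleeping before the tool reports a timeout.
const worstCaseMs = MAX_POLLS * POLL_INTERVAL_MS;
```

Since the loop sleeps before every status request, the first poll goes out one interval after the create call, and the ceiling counts sleep time only, not the request latency of each poll.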
@@ -1,13 +1,22 @@
 import type { Express } from 'express';
 import type { RouteDeps } from './server-context.js';
 import { newInsertId } from './analytics.js';
+import { seedProviderIfMissing } from './media-config.js';
+import {
+  BYOK_SENSEAUDIO_TOOLS,
+  executeGenerateImage,
+  executeGenerateVideo,
+  isSenseAudioImageModel,
+  type BYOKToolContext,
+} from './byok-tools.js';
+import { isSafeId as isSafeProjectId } from './projects.js';
 import {
   agentIdToTracking,
   projectKindToTracking,
 } from '@open-design/contracts/analytics';
 import { validateBaseUrlResolved } from './connectionTest.js';

-export interface RegisterChatRoutesDeps extends RouteDeps<'db' | 'design' | 'http' | 'chat' | 'agents' | 'critique' | 'validation' | 'lifecycle'> {}
+export interface RegisterChatRoutesDeps extends RouteDeps<'db' | 'design' | 'http' | 'chat' | 'agents' | 'critique' | 'validation' | 'lifecycle' | 'paths'> {}

 // Invariant: a chat assistant message row reflects its run's terminal state
 // even when the web client never persists the cancel/finish itself (refresh
@@ -310,13 +319,13 @@ export function registerChatRoutes(app: Express, ctx: RegisterChatRoutesDeps) {
     const protocol = body.protocol;
     if (
       typeof protocol !== 'string' ||
-      !['anthropic', 'openai', 'azure', 'google', 'ollama'].includes(protocol)
+      !['anthropic', 'openai', 'azure', 'google', 'ollama', 'senseaudio'].includes(protocol)
     ) {
       return sendApiError(
         res,
         400,
         'BAD_REQUEST',
-        'protocol must be one of anthropic|openai|azure|google|ollama',
+        'protocol must be one of anthropic|openai|azure|google|ollama|senseaudio',
       );
     }
     if (
@@ -371,13 +380,13 @@ export function registerChatRoutes(app: Express, ctx: RegisterChatRoutesDeps) {
     const protocol = body.protocol;
     if (
       typeof protocol !== 'string' ||
-      !['anthropic', 'openai', 'azure', 'google', 'ollama'].includes(protocol)
+      !['anthropic', 'openai', 'azure', 'google', 'ollama', 'senseaudio'].includes(protocol)
     ) {
       return sendApiError(
         res,
         400,
         'BAD_REQUEST',
-        'protocol must be one of anthropic|openai|azure|google|ollama',
+        'protocol must be one of anthropic|openai|azure|google|ollama|senseaudio',
       );
     }
     if (
@@ -1172,4 +1181,354 @@ export function registerChatRoutes(app: Express, ctx: RegisterChatRoutesDeps) {
     }
   });

+  // SenseAudio chat completions. Wire-compatible with OpenAI (POST
+  // /v1/chat/completions, Bearer auth, SSE `data: {...}` + `data: [DONE]`)
+  // plus a daemon-side tool loop: the handler injects an OpenAI
+  // `tools` array on every upstream request and, when the model
+  // responds with a `tool_calls` finish_reason, executes the call
+  // locally, appends the assistant + tool messages to the conversation,
+  // and re-issues the completion. This is how BYOK chat (which has
+  // no agent-runtime scaffolding) gets image-generation parity with
+  // the CLI agent path. Loop is bounded by MAX_BYOK_TOOL_LOOPS so a
+  // misbehaving model can't pin the daemon in an infinite tool dance.
+  const MAX_BYOK_TOOL_LOOPS = 3;
+
+  type AccumulatedToolCall = { id: string; name: string; arguments: string };
+  type TurnResult =
+    | { kind: 'text_end' }
+    | { kind: 'error' }
+    | {
+        kind: 'tool_calls';
+        assistantMessage: any;
+        toolCalls: Array<{ id: string; type: 'function'; function: { name: string; arguments: string } }>;
+      };
+
+  app.post('/api/proxy/senseaudio/stream', async (req, res) => {
+    const proxyBody = req.body || {};
+    if (rejectProxyPluginContext(proxyBody, res)) return;
+    const {
+      baseUrl,
+      apiKey,
+      model,
+      systemPrompt,
+      messages,
+      maxTokens,
+      projectId,
+      byokImageModel,
+    } = proxyBody;
+    if (!apiKey || !model) {
+      return sendApiError(
+        res,
+        400,
+        'BAD_REQUEST',
+        'apiKey and model are required',
+      );
+    }
+    // projectId is required because the BYOK generate_image and
+    // generate_video tools write into the active project's folder;
+    // without one we'd have to fall back to a daemon-global cache that
+    // orphans the file. The web client always passes project.id from
+    // ProjectView, so a missing value means the request did not come
+    // through the chat surface.
+    if (typeof projectId !== 'string' || !isSafeProjectId(projectId)) {
+      return sendApiError(
+        res,
+        400,
+        'BAD_REQUEST',
+        'projectId is required and must be a safe identifier',
+      );
+    }
+
+    const effectiveBaseUrl = baseUrl || 'https://api.senseaudio.cn';
+    const validated = await validateExternalApiBaseUrl(effectiveBaseUrl);
+    if (validated.error) {
+      return sendApiError(
+        res,
+        validated.forbidden ? 403 : 400,
+        validated.forbidden ? 'FORBIDDEN' : 'BAD_REQUEST',
+        validated.error,
+      );
+    }
+
+    const url = appendVersionedApiPath(effectiveBaseUrl, '/chat/completions');
+    console.log(
+      `[proxy:senseaudio] ${req.method} ${validated.parsed?.hostname ?? '?'} model=${model} project=${projectId}`,
+    );
+
+    const workingMessages: any[] = Array.isArray(messages) ? [...messages] : [];
+    if (typeof systemPrompt === 'string' && systemPrompt) {
+      workingMessages.unshift({ role: 'system', content: systemPrompt });
+    }
+
+    // Tool execution context, built once per request. The image tool
+    // writes into `<projectsRoot>/<projectId>/byok-<id>.png` and returns
+    // a relative URL via `/api/projects/:id/files/:filename`. The web's
+    // Next.js rewrites `/api/:path*` to the daemon, so the chat UI
+    // loads images same-origin through the standard project file
+    // route, with no CSP / CORS exceptions needed.
+    // User-configured BYOK default image model. Drop silently if the
+    // client sent an id outside the SenseAudio registry; the tool
+    // will fall back to the registry default and the LLM can still
+    // override per-call via the tool's `model` arg.
+    const validDefaultImageModel = isSenseAudioImageModel(byokImageModel)
+      ? byokImageModel
+      : undefined;
+
+    const toolCtx: BYOKToolContext = {
+      projectRoot: ctx.paths.PROJECT_ROOT,
+      projectsRoot: ctx.paths.PROJECTS_DIR,
+      projectId,
+      upstreamApiKey: apiKey,
+      upstreamBaseUrl: effectiveBaseUrl,
+      // Spread-conditional because tsconfig's exactOptionalPropertyTypes
+      // forbids `field: undefined` on an optional slot. The byok-tools
+      // executor reads `ctx.defaultImageModel` with `isSenseAudioImageModel`
+      // anyway, so a missing key and an undefined value behave the same.
+      ...(validDefaultImageModel
+        ? { defaultImageModel: validDefaultImageModel }
+        : {}),
+    };
+
+    // Run one round-trip: POST to upstream, stream text deltas to the
+    // client as they arrive, accumulate any tool_call deltas. Returns
+    // a typed result describing what to do next (loop on tool calls,
+    // close the stream, or bail on error). Closures capture all the
+    // SSE helpers from registerChatRoutes.
+    const runSenseAudioTurn = async (
+      sse: any,
+      messagesForTurn: any[],
+    ): Promise<TurnResult> => {
+      const payload: any = {
+        model,
+        messages: messagesForTurn,
+        max_tokens:
+          typeof maxTokens === 'number' && maxTokens > 0 ? maxTokens : 8192,
+        stream: true,
+        tools: BYOK_SENSEAUDIO_TOOLS,
+        tool_choice: 'auto',
+      };
+      const response = await fetch(url, {
+        method: 'POST',
+        headers: {
+          'Content-Type': 'application/json',
+          Authorization: `Bearer ${apiKey}`,
+        },
+        body: JSON.stringify(payload),
+        redirect: 'error',
+      });
+
+      if (!response.ok) {
+        const errorText = await response.text();
+        console.error(
+          `[proxy:senseaudio] upstream error: ${response.status} ${redactAuthTokens(errorText)}`,
+        );
+        sendProxyError(sse, `Upstream error: ${response.status}`, {
+          code: proxyErrorCode(response.status),
+          details: errorText,
+          retryable: response.status === 429 || response.status >= 500,
+        });
+        return { kind: 'error' };
+      }
+
+      const accum: Record<number, AccumulatedToolCall> = {};
+      let finishReason = '';
+      let providerError = '';
+
+      await streamUpstreamSse(response, ({ payload, data }: any) => {
+        if (payload === '[DONE]') return true;
+        if (!data) return false;
+
+        const streamErr = extractStreamErrorMessage(data);
+        if (streamErr) {
+          providerError = streamErr;
+          return true;
+        }
+
+        const choices = (data as any).choices;
+        if (!Array.isArray(choices) || choices.length === 0) return false;
+        const choice = choices[0] || {};
+        const delta = choice.delta || {};
+
+        // Text content streams to the client unchanged. Tool turns and
+        // text turns can both share this path: the OpenAI protocol
+        // never emits text+tool_calls in the same chunk, but it can
+        // emit text before / after a tool_call in the same turn, and
+        // we want the user to see whatever the model decided to say.
+        if (typeof delta.content === 'string' && delta.content) {
+          sse.send('delta', { delta: delta.content });
+        }
+
+        // Tool call deltas stream as fragments: `id` arrives once at
+        // the start, `function.name` once at the start, and
+        // `function.arguments` accumulates a chunked JSON string we
+        // have to concatenate. Parallel calls use the `index` field to
+        // distinguish slots. Default to 0 when omitted (older models).
+        if (Array.isArray(delta.tool_calls)) {
+          for (const tc of delta.tool_calls) {
+            const idx = typeof tc?.index === 'number' ? tc.index : 0;
+            if (!accum[idx]) {
+              accum[idx] = { id: '', name: '', arguments: '' };
+            }
+            const slot = accum[idx];
+            if (typeof tc.id === 'string' && tc.id) slot.id = tc.id;
+            if (typeof tc.function?.name === 'string' && tc.function.name) {
+              slot.name = tc.function.name;
+            }
+            if (typeof tc.function?.arguments === 'string') {
+              slot.arguments += tc.function.arguments;
+            }
+          }
+        }
+
+        if (typeof choice.finish_reason === 'string' && choice.finish_reason) {
+          finishReason = choice.finish_reason;
+        }
+        return false;
+      });
+
+      if (providerError) {
+        sendProxyError(sse, `Provider error: ${providerError}`, {
+          details: providerError,
+        });
+        return { kind: 'error' };
+      }
+
+      if (finishReason === 'tool_calls' && Object.keys(accum).length > 0) {
+        const indices = Object.keys(accum)
+          .map(Number)
+          .sort((a, b) => a - b);
+        const toolCalls = indices.map((i) => ({
+          id: accum[i]!.id || `call_${i}`,
+          type: 'function' as const,
+          function: {
+            name: accum[i]!.name,
+            arguments: accum[i]!.arguments,
+          },
+        }));
+        return {
+          kind: 'tool_calls',
+          assistantMessage: {
+            role: 'assistant',
+            content: null,
+            tool_calls: toolCalls,
+          },
+          toolCalls,
+        };
+      }
+
+      return { kind: 'text_end' };
+    };
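
The fragment handling inside `runSenseAudioTurn` can be sketched as a standalone fold. This is a simplified illustration, not the handler's code: `ToolCallDelta`, `Accumulated`, and `accumulateToolCalls` are hypothetical names, and the sample fragments are made up to show how a chunked `arguments` string is reassembled before `JSON.parse`.

```typescript
type ToolCallDelta = {
  index?: number;
  id?: string;
  function?: { name?: string; arguments?: string };
};
type Accumulated = { id: string; name: string; arguments: string };

// Fold streamed fragments into one slot per `index` (0 when omitted).
function accumulateToolCalls(deltas: ToolCallDelta[]): Accumulated[] {
  const slots = new Map<number, Accumulated>();
  for (const tc of deltas) {
    const idx = typeof tc.index === 'number' ? tc.index : 0;
    const slot = slots.get(idx) ?? { id: '', name: '', arguments: '' };
    if (tc.id) slot.id = tc.id;
    if (tc.function?.name) slot.name = tc.function.name;
    if (typeof tc.function?.arguments === 'string') {
      slot.arguments += tc.function.arguments;
    }
    slots.set(idx, slot);
  }
  // Return slots in index order so parallel calls keep their positions.
  return Array.from(slots.entries())
    .sort((a, b) => a[0] - b[0])
    .map(([, slot]) => slot);
}

// Made-up fragments as they might arrive for a single generate_image call.
const calls = accumulateToolCalls([
  { index: 0, id: 'call_1', function: { name: 'generate_image', arguments: '{"prom' } },
  { index: 0, function: { arguments: 'pt":"a red fox"}' } },
]);
const parsedArgs = JSON.parse(calls[0]!.arguments) as { prompt: string };
```

The arguments string is only valid JSON once the last fragment has arrived, which is why parsing is deferred until the turn has finished.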
+
+    const executeOneTool = async (call: {
+      id: string;
+      function: { name: string; arguments: string };
+    }): Promise<{ ok: boolean; url?: string; error?: string; kind?: 'image' | 'video' }> => {
+      const fnName = call?.function?.name ?? '';
+      if (fnName !== 'generate_image' && fnName !== 'generate_video') {
+        return {
+          ok: false,
+          error: `unknown tool: ${fnName || 'unnamed'}`,
+        };
+      }
+      let args: any = {};
+      try {
+        args = JSON.parse(call.function.arguments || '{}');
+      } catch {
+        return { ok: false, error: 'tool arguments were not valid JSON' };
+      }
+      if (fnName === 'generate_image') {
+        const result = await executeGenerateImage(args, toolCtx);
+        return { ...result, kind: 'image' };
+      }
+      // generate_video: longer (up to 10 min), async-with-polling.
+      const result = await executeGenerateVideo(args, toolCtx);
+      return { ...result, kind: 'video' };
+    };
+
+    const sse = createSseResponse(res);
+    sse.send('start', { model });
+
+    // SenseAudio's gateway issues one API key that works for both
+    // /v1/chat/completions and the image / TTS surfaces. Mirror the
+    // BYOK key into media-config so the CLI agent path (`od media
+    // generate`) picks it up automatically. This is fire-and-forget; the
+    // chat stream must not block on the disk write. seedProviderIfMissing
+    // is idempotent and preserves env-var-resolved keys.
+    seedProviderIfMissing(ctx.paths.PROJECT_ROOT, 'senseaudio', {
+      apiKey,
+      baseUrl: effectiveBaseUrl,
+    })
+      .then((seeded) => {
+        if (seeded) {
+          console.log(
+            '[proxy:senseaudio] seeded media-config.senseaudio from BYOK key',
+          );
+        }
+      })
+      .catch((err: unknown) => {
+        console.warn(
+          `[proxy:senseaudio] seed media-config failed: ${
+            err instanceof Error ? err.message : String(err)
+          }`,
+        );
+      });
|
||||||
|
|
||||||
|
try {
|
||||||
|
for (let loop = 0; loop < MAX_BYOK_TOOL_LOOPS; loop++) {
|
||||||
|
const turn = await runSenseAudioTurn(sse, workingMessages);
|
||||||
|
if (turn.kind === 'error') return sse.end();
|
||||||
|
if (turn.kind === 'text_end') {
|
||||||
|
sse.send('end', {});
|
||||||
|
return sse.end();
|
||||||
|
}
|
||||||
|
// turn.kind === 'tool_calls'
|
||||||
|
workingMessages.push(turn.assistantMessage);
|
||||||
|
for (const call of turn.toolCalls) {
|
||||||
|
const result = await executeOneTool(call);
|
||||||
|
// The tool result is delivered to the model as a `tool` role
|
||||||
|
// message — a structured payload the model can interpret. We
|
||||||
|
// also surface a daemon-side log line so a user reporting "no
|
||||||
|
// image showed up" can grep for the call id. The kind field
|
||||||
|
// distinguishes image vs video so the daemon picks the right
|
||||||
|
// embedding hint for the model (markdown image syntax for
|
||||||
|
// PNG, markdown link for MP4 since the chat renderer doesn't
|
||||||
|
// currently render <video> tags).
|
||||||
|
const toolName = call?.function?.name ?? 'unknown';
|
||||||
|
if (result.ok) {
|
||||||
|
console.log(
|
||||||
|
`[proxy:senseaudio] ${toolName} OK: ${call.id} → ${result.url}`,
|
||||||
|
);
|
||||||
|
} else {
|
||||||
|
console.warn(
|
||||||
|
`[proxy:senseaudio] ${toolName} FAILED: ${call.id} — ${result.error}`,
|
||||||
|
);
|
||||||
|
}
|
||||||
|
const content = result.ok
|
||||||
|
? result.kind === 'video'
|
||||||
|
? `Video generated successfully. URL: ${result.url}. Reply to the user with a clickable markdown link, e.g. [▶ Play video](${result.url}). Do NOT use markdown image syntax — the chat renderer does not embed <video> tags.`
|
||||||
|
: `Image generated successfully. URL: ${result.url}. Reply to the user with: `
|
||||||
|
: result.kind === 'video'
|
||||||
|
? `Video generation failed: ${result.error}. Apologize briefly and suggest a retry with a more specific prompt or a shorter duration.`
|
||||||
|
: `Image generation failed: ${result.error}. Apologize briefly and suggest a retry with a more specific prompt.`;
|
||||||
|
workingMessages.push({
|
||||||
|
role: 'tool',
|
||||||
|
tool_call_id: call.id,
|
||||||
|
content,
|
||||||
|
});
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// Tool loop exhausted — the model still wants to call tools but we
|
||||||
|
// refuse a 4th round. Close the stream gracefully; the last text
|
||||||
|
// delta the model emitted (if any) is already on the wire.
|
||||||
|
console.warn(
|
||||||
|
'[proxy:senseaudio] tool loop bounded at MAX_BYOK_TOOL_LOOPS=3',
|
||||||
|
);
|
||||||
|
sse.send('end', {});
|
||||||
|
return sse.end();
|
||||||
|
} catch (err: any) {
|
||||||
|
console.error(`[proxy:senseaudio] internal error: ${err.message}`);
|
||||||
|
sendProxyError(sse, err.message, { code: 'INTERNAL_ERROR' });
|
||||||
|
sse.end();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
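The bounded tool loop that closes the hunk above can be read as a small state machine: run a model turn, stop on text or error, otherwise append the assistant message plus one `tool` message per call and try again, up to a fixed ceiling. The following is a minimal sketch of that control flow only, assuming hypothetical names (`runBoundedToolLoop`, `Turn`, `Message`, and the two injected callbacks); it leaves out SSE streaming and logging and is not the daemon's actual implementation.

```typescript
// Hypothetical types for illustration; the real route streams over SSE.
type ToolCall = { id: string; function: { name: string; arguments: string } };
type Message = {
  role: string;
  content: string | null;
  tool_call_id?: string;
  tool_calls?: ToolCall[];
};
type Turn =
  | { kind: 'text_end' }
  | { kind: 'error' }
  | { kind: 'tool_calls'; assistantMessage: Message; toolCalls: ToolCall[] };

// Runs model turns until the model answers in text, a turn fails, or the
// loop budget is spent. Tool results are appended as `tool` role messages.
async function runBoundedToolLoop(
  messages: Message[],
  runTurn: (messages: Message[]) => Promise<Turn>,
  executeTool: (call: ToolCall) => Promise<string>,
  maxLoops = 3, // assumed default, mirroring the "refuse a 4th round" comment
): Promise<'done' | 'error' | 'exhausted'> {
  for (let loop = 0; loop < maxLoops; loop++) {
    const turn = await runTurn(messages);
    if (turn.kind === 'error') return 'error';
    if (turn.kind === 'text_end') return 'done';
    // The assistant message carrying tool_calls must precede its results.
    messages.push(turn.assistantMessage);
    for (const call of turn.toolCalls) {
      const content = await executeTool(call);
      messages.push({ role: 'tool', tool_call_id: call.id, content });
    }
  }
  return 'exhausted';
}
```

Returning a status instead of writing to the stream keeps the sketch testable; a caller would map `'exhausted'` to the same graceful `end` event the route sends when the budget runs out.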
@@ -119,6 +119,41 @@ export async function validateBaseUrlResolved(
   return sync;
 }
 
+/**
+ * SSRF guard for asset URLs handed back inside a successful API
+ * response — typically a `data.url` or `data.video_url` that points
+ * at the gateway's CDN, but is attacker-controllable when the
+ * upstream gateway is compromised or misconfigured. Routes the URL
+ * through `validateBaseUrlResolved` (DNS-resolve → reject loopback,
+ * RFC1918, link-local, CGNAT, metadata-service IPs) and returns a
+ * discriminated union so callers don't have to repeat the
+ * `validated.error || !validated.parsed` plumbing.
+ *
+ * Two callers today:
+ *   - `byok-tools.ts` for the chat-tool image/video downloads
+ *   - `media.ts` `renderSenseAudioImage` for the CLI agent path
+ * Both hand the URL straight to `fetch(...)` next, so pair this
+ * guard with `redirect: 'error'` on the fetch to also block a
+ * 3xx hop into private space.
+ */
+export async function assertExternalAssetUrl(
+  rawUrl: string,
+): Promise<{ ok: true } | { ok: false; error: string }> {
+  if (typeof rawUrl !== 'string' || !rawUrl) {
+    return { ok: false, error: 'empty download url' };
+  }
+  const validated = await validateBaseUrlResolved(rawUrl);
+  if (validated.error || !validated.parsed) {
+    return {
+      ok: false,
+      error: validated.forbidden
+        ? `blocked download url (${validated.error ?? 'internal address'})`
+        : `invalid download url: ${validated.error ?? 'unknown reason'}`,
+    };
+  }
+  return { ok: true };
+}
+
 // Aggressive but not punitive — happy paths usually return in under 2 s.
 // Override with OD_CONNECTION_TEST_PROVIDER_TIMEOUT_MS for slow networks
 // or distant providers; invalid values fall back to the default.
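The doc comment on `assertExternalAssetUrl` lists the address classes the guard refuses after DNS resolution. As a rough illustration of that classification step only, here is a sketch for IPv4 literals. The helper name `isBlockedIPv4` is hypothetical and the range list is an assumption based on the comment (loopback, RFC 1918, link-local including the common metadata address, CGNAT); the real `validateBaseUrlResolved` is not shown in this diff and may differ, for example by also handling IPv6.

```typescript
// Hypothetical helper: true when an IPv4 literal falls in a range that an
// SSRF guard would typically refuse. Malformed input is refused as well.
function isBlockedIPv4(ip: string): boolean {
  const parts = ip.split('.');
  const wellFormed =
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!wellFormed) return true; // refuse rather than guess

  const a = Number(parts[0]);
  const b = Number(parts[1]);
  if (a === 0) return true; // "this network" 0.0.0.0/8
  if (a === 127) return true; // loopback 127.0.0.0/8
  if (a === 10) return true; // RFC 1918 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return true; // RFC 1918 172.16.0.0/12
  if (a === 192 && b === 168) return true; // RFC 1918 192.168.0.0/16
  if (a === 169 && b === 254) return true; // link-local, incl. 169.254.169.254
  if (a === 100 && b >= 64 && b <= 127) return true; // CGNAT 100.64.0.0/10
  return false;
}
```

A check like this only makes sense on the resolved address, not the hostname, and it does not replace `redirect: 'error'` on the follow-up fetch, since a redirect could still point at a private host.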
@@ -315,10 +350,10 @@ function inspectProviderCompletion(
   const obj = data && typeof data === 'object' ? data as Record<string, unknown> : null;
   if (!obj) return { valid: false };
 
-  if (protocol === 'openai' || protocol === 'azure') {
+  if (protocol === 'openai' || protocol === 'azure' || protocol === 'senseaudio') {
     const responseModel = typeof obj.model === 'string' ? obj.model : '';
     if (
-      protocol === 'openai' &&
+      (protocol === 'openai' || protocol === 'senseaudio') &&
       enforceResponseModel &&
       responseModel &&
       requestedModel &&
@@ -518,6 +553,12 @@ function buildProviderCall(input: ProviderTestRequest): ProviderCallShape {
         },
       };
     case 'openai':
+    case 'senseaudio':
+      // SenseAudio is wire-compatible with OpenAI (POST /v1/chat/completions,
+      // Bearer auth, identical body + response shape), so the connection
+      // smoke test reuses the same call shape. We default the base URL
+      // upstream-side in chat-routes; this layer assumes the caller passed
+      // a concrete URL via the BYOK form.
       return {
         url: appendVersionedApiPath(baseUrl, '/chat/completions'),
         headers: {
@@ -521,3 +521,53 @@ export async function writeConfig(projectRoot: string, body: unknown) {
   await writeStored(projectRoot, next);
   return readMaskedConfig(projectRoot);
 }
+
+/**
+ * Idempotent "seed if empty" write for a single provider slot. The chat
+ * proxy uses this to mirror a BYOK key into media-config so the agent's
+ * image / TTS path picks up the same credential without the user having
+ * to paste it twice. Strict rules:
+ *   * No-op when an apiKey is ALREADY stored for `providerId` (the user
+ *     may have configured Media independently and we never overwrite).
+ *   * No-op when an env-var key resolves for `providerId` (env wins
+ *     regardless of disk state — seeding would be invisible).
+ *   * No-op when the incoming `apiKey` is empty (we only seed values
+ *     the chat layer has just verified upstream).
+ *   * Otherwise merge `{ [providerId]: entry }` into the existing
+ *     provider map and persist. All other provider slots and aliases
+ *     are preserved byte-for-byte.
+ *
+ * Returns `true` when a write happened (caller can log), `false` when
+ * the call was a no-op. Errors are surfaced — the caller decides
+ * whether to swallow them (fire-and-forget) or propagate.
+ */
+export async function seedProviderIfMissing(
+  projectRoot: string,
+  providerId: string,
+  entry: { apiKey?: string; baseUrl?: string; model?: string },
+): Promise<boolean> {
+  if (!PROVIDER_IDS.includes(providerId)) return false;
+  const apiKey = entry.apiKey?.trim() ?? '';
+  if (!apiKey) return false;
+  // Env var wins at resolution time, so seeding when env is set would
+  // be invisible to the user. Skip to avoid confusing on-disk state.
+  if (readEnvKey(providerId)) return false;
+
+  const prior = await readStored(projectRoot);
+  const priorApiKey =
+    typeof prior[providerId]?.apiKey === 'string' && prior[providerId].apiKey.trim()
+      ? prior[providerId].apiKey.trim()
+      : '';
+  if (priorApiKey) return false;
+
+  const baseUrl = entry.baseUrl?.trim() ?? '';
+  const model = entry.model?.trim() ?? '';
+  const next: ProviderMap = { ...prior };
+  next[providerId] = {
+    apiKey,
+    ...(baseUrl ? { baseUrl } : {}),
+    ...(model ? { model } : {}),
+  };
+  await writeStored(projectRoot, next);
+  return true;
+}
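The no-op rules in the `seedProviderIfMissing` doc comment are easier to check when the decision is separated from disk I/O. The sketch below is a pure, in-memory variant written for illustration: `seedIfMissing`, its `envKey` parameter, and the local `ProviderEntry` / `ProviderMap` types are assumptions, not code from the commit, and the provider-id allowlist check is omitted.

```typescript
// Assumed shapes for illustration; the real ProviderMap lives in media-config.
type ProviderEntry = { apiKey?: string; baseUrl?: string; model?: string };
type ProviderMap = Record<string, ProviderEntry>;

// Pure variant of the seeding decision: returns the next map to persist,
// or null when the call should be a no-op.
function seedIfMissing(
  stored: ProviderMap,
  envKey: string, // whatever an env lookup resolved to, '' when unset
  providerId: string,
  entry: ProviderEntry,
): ProviderMap | null {
  const apiKey = entry.apiKey?.trim() ?? '';
  if (!apiKey) return null; // nothing verified to seed
  if (envKey) return null; // env wins at resolution time
  if (stored[providerId]?.apiKey?.trim()) return null; // never overwrite

  const baseUrl = entry.baseUrl?.trim() ?? '';
  const model = entry.model?.trim() ?? '';
  return {
    ...stored, // other provider slots are carried over unchanged
    [providerId]: {
      apiKey,
      ...(baseUrl ? { baseUrl } : {}),
      ...(model ? { model } : {}),
    },
  };
}
```

Calling it twice with the first result as the new `stored` value returns `null` the second time, which is the idempotence the doc comment describes.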
@@ -60,7 +60,7 @@ export const MEDIA_PROVIDERS: MediaProvider[] = [
   {
     id: 'senseaudio',
     label: 'SenseAudio',
-    hint: 'TTS · 70+ system voices · clone',
+    hint: '',
     integrated: true,
     defaultBaseUrl: 'https://api.senseaudio.cn',
     docsUrl: 'https://docs.senseaudio.cn',
@@ -80,6 +80,10 @@ export const IMAGE_MODELS: MediaModel[] = [
   { id: 'doubao-seedream-3-0-t2i-250415', label: 'seedream-3.0', hint: 'ByteDance · Doubao image', provider: 'volcengine', caps: ['t2i'] },
   { id: 'doubao-seededit-3-0-i2i-250628', label: 'seededit-3.0', hint: 'ByteDance · image edit', provider: 'volcengine', caps: ['i2i'] },
 
+  { id: 'senseaudio-image-2.0-260319', label: 'senseaudio-image-2.0', hint: 'SenseAudio · multi-aspect, latest', provider: 'senseaudio', caps: ['t2i', 'i2i'] },
+  { id: 'senseaudio-image-1.0-260319', label: 'senseaudio-image-1.0', hint: 'SenseAudio · standard', provider: 'senseaudio', caps: ['t2i', 'i2i'] },
+  { id: 'doubao-seedream-5-0-260128', label: 'seedream-5.0', hint: 'SenseAudio · ByteDance Seedream 5.0 hi-res', provider: 'senseaudio', caps: ['t2i', 'i2i'] },
+
   { id: 'grok-imagine-image', label: 'grok-imagine-image', hint: 'xAI · 2K text-to-image', provider: 'grok', caps: ['t2i'] },
 
   { id: 'gemini-3.1-flash-image-preview', label: 'nano-banana-2', hint: 'Nano Banana · text-to-image', provider: 'nanobanana', caps: ['t2i'] },
@@ -57,6 +57,7 @@ import {
   findProvider,
   modelsForSurface,
 } from './media-models.js';
+import { assertExternalAssetUrl } from './connectionTest.js';
 import { resolveModelAlias, resolveProviderConfig } from './media-config.js';
 import {
   ensureProject,
@@ -559,6 +560,11 @@ export async function generateMedia(args: {
     bytes = result.bytes;
     providerNote = result.providerNote;
     suggestedExt = result.suggestedExt;
+  } else if (def.provider === 'senseaudio' && surface === 'image') {
+    const result = await renderSenseAudioImage(ctx, credentials);
+    bytes = result.bytes;
+    providerNote = result.providerNote;
+    suggestedExt = result.suggestedExt;
   } else if (def.provider === 'fishaudio' && surface === 'audio') {
     const result = await renderFishAudioTTS(ctx, credentials);
     bytes = result.bytes;
@@ -2243,6 +2249,131 @@ async function renderSenseAudioTTS(ctx: MediaContext, credentials: ProviderConfi
   };
 }
 
+// ---------------------------------------------------------------------------
+// Provider: SenseAudio image — POST /v1/image/sync (synchronous text-to-image).
+//
+// Docs: https://docs.senseaudio.cn/guides/image/overview
+//   * Models: senseaudio-image-2.0-260319 (multi-aspect), senseaudio-image-1.0-260319
+//     (standard), doubao-seedream-5-0-260128 (hi-res). The wire `model` field
+//     accepts the catalog id directly so no alias map is needed.
+//   * Body: { model, prompt (≤2000 chars), size (WxH, required when no
+//     reference), reference (URL or data URI, optional), seed (optional int) }.
+//   * Response: { url: string } pointing at the rendered PNG; we fetch it
+//     once to materialise bytes the dispatcher can write to disk.
+//   * Auth: Authorization: Bearer <API_KEY>; shares the senseaudio provider
+//     slot with the TTS path (OD_SENSEAUDIO_API_KEY / SENSEAUDIO_API_KEY).
+// We default to the /sync endpoint because the chat runtime already streams
+// progress and a single round-trip keeps the dispatcher contract identical
+// to OpenAI / Volcengine image. Switching to /v1/image/async + GET
+// /v1/image/pending is a future option if the upstream model latency
+// outgrows the daemon's request timeout.
+// ---------------------------------------------------------------------------
+
+const SENSEAUDIO_IMAGE_PROMPT_LIMIT = 2000;
+
+// SenseAudio's image gateway rejects non-standard pixel sizes with a 400
+// `参数错误:size`. Keep this table in sync with byok-tools.ts's
+// ASPECT_TO_SIZE — both paths hit the same /v1/image/sync endpoint.
+function senseAudioImageSize(aspect?: string): string {
+  if (aspect === '16:9') return '1280x720';
+  if (aspect === '9:16') return '720x1280';
+  if (aspect === '4:3') return '1024x768';
+  if (aspect === '3:4') return '768x1024';
+  return '1024x1024';
+}
+
+async function renderSenseAudioImage(ctx: MediaContext, credentials: ProviderConfig): Promise<RenderResult> {
+  if (!credentials.apiKey) {
+    throw new Error(
+      'no SenseAudio API key — configure it in Settings or set OD_SENSEAUDIO_API_KEY',
+    );
+  }
+  const baseUrl = (credentials.baseUrl || SENSEAUDIO_DEFAULT_BASE_URL).replace(
+    /\/$/,
+    '',
+  );
+  const promptRaw = (ctx.prompt && ctx.prompt.trim()) || 'A high-quality reference image.';
+  // SenseAudio rejects >2000-char prompts with a 4xx; trim defensively so a
+  // verbose agent plan doesn't dead-end the generation. The truncated tail
+  // surfaces in providerNote so the user sees what was actually sent.
+  const prompt =
+    promptRaw.length > SENSEAUDIO_IMAGE_PROMPT_LIMIT
+      ? promptRaw.slice(0, SENSEAUDIO_IMAGE_PROMPT_LIMIT)
+      : promptRaw;
+  const size = senseAudioImageSize(ctx.aspect);
+  const reference = ctx.imageRef?.dataUrl;
+
+  const body: Record<string, unknown> = {
+    model: ctx.wireModel,
+    prompt,
+    size,
+  };
+  if (reference) {
+    // When a reference image is supplied the API documents `size` as
+    // optional; we still send it so the output dimensions stay
+    // deterministic across t2i / i2i runs of the same project.
+    body.reference = reference;
+  }
+
+  const resp = await fetch(`${baseUrl}/v1/image/sync`, {
+    method: 'POST',
+    headers: {
+      authorization: `Bearer ${credentials.apiKey}`,
+      'content-type': 'application/json',
+    },
+    body: JSON.stringify(body),
+  });
+  const respText = await resp.text();
+  if (!resp.ok) {
+    throw new Error(`senseaudio image ${resp.status}: ${truncate(respText, 240)}`);
+  }
+  let data: any;
+  try {
+    data = JSON.parse(respText);
+  } catch {
+    throw new Error(`senseaudio image non-JSON: ${truncate(respText, 200)}`);
+  }
+  // Mirror the TTS base_resp envelope check: HTTP 200 can still encode an
+  // upstream logical failure. The image API uses the same shape on the
+  // failure path documented for /v1/image/pending (status=failed +
+  // error_message), so surface either source verbatim.
+  if (data?.base_resp && data.base_resp.status_code !== 0) {
+    throw new Error(
+      `senseaudio image api error ${data.base_resp.status_code}: ${data.base_resp.status_msg || 'unknown'}`,
+    );
+  }
+  if (typeof data?.error_message === 'string' && data.error_message) {
+    throw new Error(`senseaudio image api error: ${data.error_message}`);
+  }
+  const url = typeof data?.url === 'string' ? data.url : '';
+  if (!url) {
+    throw new Error('senseaudio image response missing url');
+  }
+  // Mirror the chat-tool SSRF guard (byok-tools.ts): the gateway-returned
+  // `url` is attacker-controllable inside a successful response, so DNS-
+  // resolve it through validateBaseUrlResolved and refuse loopback /
+  // RFC1918 / metadata-service hosts. Pair with `redirect: 'error'` so a
+  // 3xx hop into private space is also blocked.
+  const urlCheck = await assertExternalAssetUrl(url);
+  if (!urlCheck.ok) {
+    throw new Error(`senseaudio image ${urlCheck.error}`);
+  }
+  const imgResp = await fetch(url, { redirect: 'error' });
+  if (!imgResp.ok) {
+    throw new Error(`senseaudio image fetch ${imgResp.status}`);
+  }
+  const bytes = Buffer.from(await imgResp.arrayBuffer());
+  if (bytes.length === 0) {
+    throw new Error('senseaudio image fetch returned zero bytes');
+  }
+
+  return {
+    bytes,
+    providerNote: `senseaudio/${ctx.wireModel} · ${size}${reference ? ' · i2i' : ''} · ${bytes.length} bytes`,
+    suggestedExt: '.png',
+  };
+}
+
 // ---------------------------------------------------------------------------
 // Provider: FishAudio — Speech-1.x family text-to-speech (synchronous).
 //
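`renderSenseAudioImage` treats an HTTP 200 as provisional: it checks the `base_resp` envelope, then `error_message`, and only then reads `url`. The order of those checks can be isolated into a pure function, sketched here under the assumption that the response fields are exactly the ones named in the hunk above. `extractImageUrl` is a hypothetical name and returns a result object instead of throwing, which is a design choice of this sketch rather than the commit's behaviour.

```typescript
type UrlResult = { ok: true; url: string } | { ok: false; error: string };

// Hypothetical pure helper mirroring the envelope checks in the hunk above.
function extractImageUrl(data: unknown): UrlResult {
  const obj: Record<string, unknown> =
    data && typeof data === 'object' ? (data as Record<string, unknown>) : {};

  // 1. Envelope: a non-zero status_code is a logical failure despite HTTP 200.
  const base = obj.base_resp as { status_code?: unknown; status_msg?: unknown } | null | undefined;
  if (base && base.status_code !== 0) {
    const msg = typeof base.status_msg === 'string' && base.status_msg ? base.status_msg : 'unknown';
    return { ok: false, error: `api error ${String(base.status_code)}: ${msg}` };
  }

  // 2. Failure-path field documented for the pending endpoint.
  const errorMessage = obj.error_message;
  if (typeof errorMessage === 'string' && errorMessage) {
    return { ok: false, error: `api error: ${errorMessage}` };
  }

  // 3. Only now trust the url field, and only if it is a non-empty string.
  const url = typeof obj.url === 'string' ? obj.url : '';
  return url ? { ok: true, url } : { ok: false, error: 'response missing url' };
}
```

The returned URL would still need the SSRF check before any download; this sketch covers only the parsing step.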
@@ -142,6 +142,15 @@ const PROVIDER_DEFAULTS = {
     model: 'gemma3:4b',
     baseUrl: 'https://ollama.com',
   },
+  // SenseAudio's chat API is OpenAI-compatible (POST /v1/chat/completions,
+  // Bearer auth), so the extractor falls through to callOpenAI with this
+  // base URL and the user's SenseAudio API key. The default model is the
+  // small/fast variant so auto-pick stays cheap; users can swap in
+  // senseaudio-s2 or any gateway model via the picker.
+  senseaudio: {
+    model: 'senseaudio-s2-flash',
+    baseUrl: 'https://api.senseaudio.cn',
+  },
 };
 
 // Map an explicit override provider to the env var the daemon should
@@ -169,6 +178,13 @@ function envKeyFor(provider) {
   if (provider === 'ollama') {
     return process.env.OLLAMA_API_KEY?.trim() || '';
   }
+  if (provider === 'senseaudio') {
+    return (
+      process.env.OD_SENSEAUDIO_API_KEY?.trim()
+      || process.env.SENSEAUDIO_API_KEY?.trim()
+      || ''
+    );
+  }
   return '';
 }
 
@@ -149,7 +149,9 @@ function extractGoogleModels(data: unknown): ProviderModelOption[] {
 }
 
 function providerModelsUrl(protocol: ConnectionTestProtocol, baseUrl: string, apiKey: string): string {
-  if (protocol === 'openai') return appendVersionedApiPath(baseUrl, '/models');
+  if (protocol === 'openai' || protocol === 'senseaudio') {
+    return appendVersionedApiPath(baseUrl, '/models');
+  }
   if (protocol === 'anthropic') {
     const url = new URL(appendVersionedApiPath(baseUrl, '/models'));
     url.searchParams.set('limit', '1000');
@@ -167,7 +169,9 @@ function providerModelsHeaders(
   protocol: ConnectionTestProtocol,
   apiKey: string,
 ): Record<string, string> {
-  if (protocol === 'openai') return { authorization: `Bearer ${apiKey}` };
+  if (protocol === 'openai' || protocol === 'senseaudio') {
+    return { authorization: `Bearer ${apiKey}` };
+  }
   if (protocol === 'anthropic') {
     return {
       'x-api-key': apiKey,
@@ -178,7 +182,9 @@ function providerModelsHeaders(
 }
 
 function extractModels(protocol: ConnectionTestProtocol, data: unknown): ProviderModelOption[] {
-  if (protocol === 'openai') return extractOpenAiModels(data);
+  // SenseAudio's /v1/models response follows the OpenAI envelope
+  // (`{ data: [{ id, ... }] }`), so the same extractor handles both.
+  if (protocol === 'openai' || protocol === 'senseaudio') return extractOpenAiModels(data);
   if (protocol === 'anthropic') return extractAnthropicModels(data);
   if (protocol === 'google') return extractGoogleModels(data);
   return [];
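The comment in `extractModels` relies on the `/v1/models` response having the OpenAI list envelope, `{ data: [{ id, ... }] }`. The body of `extractOpenAiModels` is not part of this diff, so the following is only a guess at what such an extractor could look like. The name `extractOpenAiStyleModels` and the `{ id, label }` option shape are assumptions made for the example.

```typescript
// Assumed option shape; the real ProviderModelOption type is not shown here.
type ModelOption = { id: string; label: string };

// Walks an OpenAI-style list envelope and keeps entries with a string id.
function extractOpenAiStyleModels(data: unknown): ModelOption[] {
  if (!data || typeof data !== 'object') return [];
  const list = (data as { data?: unknown }).data;
  if (!Array.isArray(list)) return [];

  const out: ModelOption[] = [];
  for (const item of list) {
    const id =
      item && typeof item === 'object' ? (item as { id?: unknown }).id : undefined;
    // Skip malformed entries instead of failing the whole discovery call.
    if (typeof id === 'string' && id) out.push({ id, label: id });
  }
  return out;
}
```

Because the extractor only depends on the envelope, routing a second OpenAI-compatible protocol through it needs no provider-specific parsing, which matches the one-line change in the hunk above.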
@@ -10859,6 +10859,7 @@ export async function startServer({
     db,
     design,
     http: httpDeps,
+    paths: pathDeps,
     chat: { startChatRun, submitToolResultToRun },
     agents: agentDeps,
     critique: critiqueDeps,
686
apps/daemon/tests/byok-tools.test.ts
Normal file
@ -0,0 +1,686 @@
|
||||||
|
import { mkdir, mkdtemp, readFile, rm } from 'node:fs/promises';
|
||||||
|
import { tmpdir } from 'node:os';
|
||||||
|
import path from 'node:path';
|
||||||
|
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
|
||||||
|
|
||||||
|
import {
|
||||||
|
BYOK_SENSEAUDIO_TOOLS,
|
||||||
|
executeGenerateImage,
|
||||||
|
executeGenerateVideo,
|
||||||
|
} from '../src/byok-tools.js';
|
||||||
|
|
||||||
|
describe('BYOK_SENSEAUDIO_TOOLS', () => {
|
||||||
|
it('exports an OpenAI-shaped generate_image tool definition', () => {
|
||||||
|
const tool = BYOK_SENSEAUDIO_TOOLS.find(
|
||||||
|
(t) => t.function.name === 'generate_image',
|
||||||
|
);
|
||||||
|
expect(tool).toBeDefined();
|
||||||
|
expect(tool!.type).toBe('function');
|
||||||
|
expect(tool!.function.parameters.required).toEqual(['prompt']);
|
||||||
|
expect(tool!.function.parameters.properties.aspect_ratio.enum).toEqual([
|
||||||
|
'1:1',
|
||||||
|
'16:9',
|
||||||
|
'9:16',
|
||||||
|
'4:3',
|
||||||
|
'3:4',
|
||||||
|
]);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('exposes both generate_image and generate_video tools', () => {
|
||||||
|
const names = BYOK_SENSEAUDIO_TOOLS.map((t) => t.function.name).sort();
|
||||||
|
expect(names).toEqual(['generate_image', 'generate_video']);
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('executeGenerateImage', () => {
|
||||||
|
let root: string;
|
||||||
|
let projectsRoot: string;
|
||||||
|
const PROJECT_ID = 'test-project';
|
||||||
|
const realFetch = globalThis.fetch;
|
||||||
|
|
||||||
|
beforeEach(async () => {
|
||||||
|
root = await mkdtemp(path.join(tmpdir(), 'od-byok-tools-'));
|
||||||
|
projectsRoot = path.join(root, 'projects');
|
||||||
|
});
|
||||||
|
|
||||||
|
afterEach(async () => {
|
||||||
|
globalThis.fetch = realFetch;
|
||||||
|
vi.unstubAllGlobals();
|
||||||
|
await rm(root, { recursive: true, force: true });
|
||||||
|
});
|
||||||
|
|
||||||
|
const baseCtx = () => ({
|
||||||
|
projectRoot: root,
|
||||||
|
projectsRoot,
|
||||||
|
projectId: PROJECT_ID,
|
||||||
|
upstreamApiKey: 'sa-byok-key',
|
||||||
|
upstreamBaseUrl: 'https://api.senseaudio.cn',
|
||||||
|
});
|
||||||
|
|
||||||
|
it('calls /v1/image/sync, downloads the URL, persists bytes, and returns a daemon URL', async () => {
|
||||||
|
const pngBytes = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
|
||||||
|
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
|
||||||
|
const url = String(input);
|
||||||
|
if (url === 'https://api.senseaudio.cn/v1/image/sync') {
|
||||||
|
expect(init?.method).toBe('POST');
|
||||||
|
expect(init?.headers).toMatchObject({
|
||||||
|
authorization: 'Bearer sa-byok-key',
|
||||||
|
'content-type': 'application/json',
|
||||||
|
});
|
||||||
|
expect(JSON.parse(String(init?.body))).toEqual({
|
||||||
|
model: 'senseaudio-image-2.0-260319',
|
||||||
|
prompt: 'a tabby cat playing with yarn',
|
||||||
|
size: '1024x1024',
|
||||||
|
});
|
||||||
|
return new Response(
|
||||||
|
JSON.stringify({
|
||||||
|
url: 'https://cdn.example.test/generated/cat.png',
|
||||||
|
base_resp: { status_code: 0, status_msg: 'success' },
|
||||||
|
}),
|
||||||
|
{ status: 200, headers: { 'content-type': 'application/json' } },
|
||||||
|
);
|
||||||
|
}
|
||||||
|
if (url === 'https://cdn.example.test/generated/cat.png') {
|
||||||
|
return new Response(pngBytes, {
|
||||||
|
status: 200,
|
||||||
|
headers: { 'content-type': 'image/png' },
|
||||||
|
});
|
||||||
|
}
|
||||||
|
throw new Error(`unexpected fetch: ${url}`);
|
||||||
|
});
|
||||||
|
vi.stubGlobal('fetch', fetchMock);
|
||||||
|
|
||||||
|
const result = await executeGenerateImage(
|
||||||
|
{ prompt: 'a tabby cat playing with yarn' },
|
||||||
|
baseCtx(),
|
||||||
|
);
|
||||||
|
|
||||||
|
expect(result.ok).toBe(true);
|
||||||
|
// Returns a relative URL through the project file route so the
|
||||||
|
// chat UI loads same-origin via Next.js's /api/:path* rewrite,
|
||||||
|
// satisfying the strict CSP `img-src 'self'`. Path component is
|
||||||
|
// url-encoded so unusual (but isSafeId-passing) project ids don't
|
||||||
|
// break the URL.
|
||||||
|
expect(result.url).toMatch(
|
||||||
|
new RegExp(`^/api/projects/${PROJECT_ID}/files/byok-[a-z0-9-]+\\.png$`),
|
||||||
|
);
|
||||||
|
expect(fetchMock).toHaveBeenCalledTimes(2);
|
||||||
|
|
||||||
|
// Persisted file lives inside the project folder where listFiles /
|
||||||
|
// readProjectFile / archive plumbing will all discover it.
|
||||||
|
const filename = result.url!.split('/').pop()!;
|
||||||
|
const onDisk = await readFile(path.join(projectsRoot, PROJECT_ID, filename));
|
||||||
|
expect(onDisk.equals(pngBytes)).toBe(true);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('honours args.model when the LLM picks a SenseAudio image model', async () => {
|
||||||
|
const pngBytes = Buffer.from([0x89, 0x50]);
|
||||||
|
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
|
||||||
|
const url = String(input);
|
||||||
|
if (url.endsWith('/v1/image/sync')) {
|
||||||
|
expect(JSON.parse(String(init?.body)).model).toBe('doubao-seedream-5-0-260128');
|
||||||
|
return new Response(
|
||||||
|
JSON.stringify({ url: 'https://cdn.example.test/hi.png' }),
|
||||||
|
{ status: 200, headers: { 'content-type': 'application/json' } },
|
||||||
|
);
|
||||||
|
}
|
||||||
|
return new Response(pngBytes, { status: 200 });
|
||||||
|
});
|
||||||
|
vi.stubGlobal('fetch', fetchMock);
|
||||||
|
|
||||||
|
const result = await executeGenerateImage(
|
||||||
|
{ prompt: 'wallpaper', model: 'doubao-seedream-5-0-260128' },
|
||||||
|
baseCtx(),
|
||||||
|
);
|
||||||
|
expect(result.ok).toBe(true);
|
||||||
|
});
|
||||||
|
|
||||||
|
  it('falls back to ctx.defaultImageModel when args.model is missing', async () => {
    const pngBytes = Buffer.from([0x89, 0x50]);
    const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
      const url = String(input);
      if (url.endsWith('/v1/image/sync')) {
        expect(JSON.parse(String(init?.body)).model).toBe('senseaudio-image-1.0-260319');
        return new Response(
          JSON.stringify({ url: 'https://cdn.example.test/std.png' }),
          { status: 200, headers: { 'content-type': 'application/json' } },
        );
      }
      return new Response(pngBytes, { status: 200 });
    });
    vi.stubGlobal('fetch', fetchMock);

    const result = await executeGenerateImage(
      { prompt: 'standard' },
      { ...baseCtx(), defaultImageModel: 'senseaudio-image-1.0-260319' },
    );
    expect(result.ok).toBe(true);
  });

  it('ignores args.model when it is not in the SenseAudio allowlist', async () => {
    const pngBytes = Buffer.from([0x89, 0x50]);
    const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
      const url = String(input);
      if (url.endsWith('/v1/image/sync')) {
        // Falls through to ctx.defaultImageModel (registry-valid).
        expect(JSON.parse(String(init?.body)).model).toBe('senseaudio-image-1.0-260319');
        return new Response(
          JSON.stringify({ url: 'https://cdn.example.test/x.png' }),
          { status: 200, headers: { 'content-type': 'application/json' } },
        );
      }
      return new Response(pngBytes, { status: 200 });
    });
    vi.stubGlobal('fetch', fetchMock);

    const result = await executeGenerateImage(
      { prompt: 'spoofed', model: 'evil-model-id' },
      { ...baseCtx(), defaultImageModel: 'senseaudio-image-1.0-260319' },
    );
    expect(result.ok).toBe(true);
  });

  it('falls back to registry default when both args.model and ctx.defaultImageModel are missing/invalid', async () => {
    const pngBytes = Buffer.from([0x89, 0x50]);
    const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
      const url = String(input);
      if (url.endsWith('/v1/image/sync')) {
        // Registry default is the first SenseAudio entry — 2.0 today.
        expect(JSON.parse(String(init?.body)).model).toBe('senseaudio-image-2.0-260319');
        return new Response(
          JSON.stringify({ url: 'https://cdn.example.test/d.png' }),
          { status: 200, headers: { 'content-type': 'application/json' } },
        );
      }
      return new Response(pngBytes, { status: 200 });
    });
    vi.stubGlobal('fetch', fetchMock);

    const result = await executeGenerateImage(
      { prompt: 'no model anywhere' },
      { ...baseCtx(), defaultImageModel: 'also-bogus' },
    );
    expect(result.ok).toBe(true);
  });

  it('rejects unsafe projectId before any upstream call', async () => {
    const fetchMock = vi.fn();
    vi.stubGlobal('fetch', fetchMock);

    const result = await executeGenerateImage(
      { prompt: 'x' },
      { ...baseCtx(), projectId: '../escape' },
    );

    expect(result.ok).toBe(false);
    expect(result.error).toMatch(/invalid projectId/);
    // ensureProject runs up front so the unsafe id is caught BEFORE
    // any senseaudio upstream call goes out — no token spent, no
    // attempt to write outside the project tree.
    expect(fetchMock).not.toHaveBeenCalled();
  });

  it('maps aspect_ratio to the SenseAudio size string', async () => {
    const pngBytes = Buffer.from([0x89, 0x50]);
    const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
      const url = String(input);
      if (url.endsWith('/v1/image/sync')) {
        expect(JSON.parse(String(init?.body)).size).toBe('1280x720');
        return new Response(
          JSON.stringify({ url: 'https://cdn.example.test/wide.png' }),
          { status: 200, headers: { 'content-type': 'application/json' } },
        );
      }
      return new Response(pngBytes, { status: 200 });
    });
    vi.stubGlobal('fetch', fetchMock);

    const result = await executeGenerateImage(
      { prompt: 'widescreen banner', aspect_ratio: '16:9' },
      baseCtx(),
    );

    expect(result.ok).toBe(true);
  });

  it('falls back to 1:1 for unknown aspect_ratio values', async () => {
    const pngBytes = Buffer.from([0x89, 0x50]);
    const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
      const url = String(input);
      if (url.endsWith('/v1/image/sync')) {
        expect(JSON.parse(String(init?.body)).size).toBe('1024x1024');
        return new Response(
          JSON.stringify({ url: 'https://cdn.example.test/square.png' }),
          { status: 200, headers: { 'content-type': 'application/json' } },
        );
      }
      return new Response(pngBytes, { status: 200 });
    });
    vi.stubGlobal('fetch', fetchMock);

    const result = await executeGenerateImage(
      { prompt: 'square thing', aspect_ratio: 'something-else' },
      baseCtx(),
    );

    expect(result.ok).toBe(true);
  });

  it('returns { ok: false } on missing prompt', async () => {
    const fetchMock = vi.fn();
    vi.stubGlobal('fetch', fetchMock);

    const result = await executeGenerateImage({}, baseCtx());

    expect(result).toEqual({ ok: false, error: 'prompt is required' });
    expect(fetchMock).not.toHaveBeenCalled();
  });

  it('returns { ok: false } when no API key is available', async () => {
    const fetchMock = vi.fn();
    vi.stubGlobal('fetch', fetchMock);

    const ctx = { ...baseCtx(), upstreamApiKey: '' };
    const result = await executeGenerateImage({ prompt: 'whatever' }, ctx);

    expect(result.ok).toBe(false);
    expect(result.error).toMatch(/no SenseAudio API key/);
    expect(fetchMock).not.toHaveBeenCalled();
  });

  it('surfaces HTTP failures with status code and truncated body', async () => {
    const fetchMock = vi.fn(async () =>
      new Response('unauthorized', {
        status: 401,
        headers: { 'content-type': 'text/plain' },
      }),
    );
    vi.stubGlobal('fetch', fetchMock);

    const result = await executeGenerateImage({ prompt: 'x' }, baseCtx());
    expect(result.ok).toBe(false);
    expect(result.error).toMatch(/senseaudio image 401/);
  });

  it('surfaces error_message envelope verbatim', async () => {
    const fetchMock = vi.fn(async () =>
      new Response(
        JSON.stringify({ error_message: 'sensitive_content_blocked' }),
        { status: 200, headers: { 'content-type': 'application/json' } },
      ),
    );
    vi.stubGlobal('fetch', fetchMock);

    const result = await executeGenerateImage({ prompt: 'x' }, baseCtx());
    expect(result.ok).toBe(false);
    expect(result.error).toMatch(/sensitive_content_blocked/);
  });

  it('surfaces base_resp non-zero status_code', async () => {
    const fetchMock = vi.fn(async () =>
      new Response(
        JSON.stringify({
          base_resp: { status_code: 1004, status_msg: 'quota exhausted' },
        }),
        { status: 200, headers: { 'content-type': 'application/json' } },
      ),
    );
    vi.stubGlobal('fetch', fetchMock);

    const result = await executeGenerateImage({ prompt: 'x' }, baseCtx());
    expect(result.ok).toBe(false);
    expect(result.error).toMatch(/api error 1004/);
    expect(result.error).toMatch(/quota exhausted/);
  });

  it('returns { ok: false } when upstream returns no url', async () => {
    const fetchMock = vi.fn(async () =>
      new Response(
        JSON.stringify({ base_resp: { status_code: 0, status_msg: 'ok' } }),
        { status: 200, headers: { 'content-type': 'application/json' } },
      ),
    );
    vi.stubGlobal('fetch', fetchMock);

    const result = await executeGenerateImage({ prompt: 'x' }, baseCtx());
    expect(result.ok).toBe(false);
    expect(result.error).toMatch(/missing url/);
  });

  it('returns { ok: false } when the image download fails', async () => {
    const fetchMock = vi.fn(async (input: unknown) => {
      const url = String(input);
      if (url.endsWith('/v1/image/sync')) {
        return new Response(
          JSON.stringify({ url: 'https://cdn.example.test/will-404.png' }),
          { status: 200, headers: { 'content-type': 'application/json' } },
        );
      }
      return new Response('not found', { status: 404 });
    });
    vi.stubGlobal('fetch', fetchMock);

    const result = await executeGenerateImage({ prompt: 'x' }, baseCtx());
    expect(result.ok).toBe(false);
    expect(result.error).toMatch(/image download 404/);
  });
});
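
The image tests above pin down two normalization rules: the model falls back from `args.model` to `ctx.defaultImageModel` to the first registry entry, and an unknown `aspect_ratio` falls back to the square size. A minimal sketch of that behaviour follows. The function and constant names are hypothetical, and the 9:16, 4:3, and 3:4 pairings are an assumption based on the sizes listed in the commit message; only `16:9 → 1280x720` and the `1024x1024` fallback are asserted by the tests.

```typescript
// Hypothetical helpers, not the daemon's actual implementation.
// Model ids are the ones that appear in the tests above.
const IMAGE_MODELS: readonly string[] = [
  'senseaudio-image-2.0-260319', // first entry doubles as the registry default
  'senseaudio-image-1.0-260319',
  'doubao-seedream-5-0-260128',
];

// Assumed aspect-to-size pairing; only '16:9' and the fallback are tested.
const SIZE_BY_ASPECT = new Map<string, string>([
  ['1:1', '1024x1024'],
  ['16:9', '1280x720'],
  ['9:16', '720x1280'],
  ['4:3', '1024x768'],
  ['3:4', '768x1024'],
]);

function isKnownModel(id: unknown): id is string {
  return typeof id === 'string' && IMAGE_MODELS.includes(id);
}

// args.model wins if allowlisted, then the context default, then the registry default.
function resolveImageModel(argModel: unknown, ctxDefault: unknown): string {
  if (isKnownModel(argModel)) return argModel;
  if (isKnownModel(ctxDefault)) return ctxDefault;
  return 'senseaudio-image-2.0-260319';
}

// Unknown or missing aspect ratios fall back to the square size.
function resolveImageSize(aspectRatio: unknown): string {
  if (typeof aspectRatio !== 'string') return '1024x1024';
  return SIZE_BY_ASPECT.get(aspectRatio) ?? '1024x1024';
}
```

Keeping the allowlist check in one predicate means a spoofed id such as `evil-model-id` never reaches the request body, which is the property the allowlist test relies on.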

describe('BYOK_SENSEAUDIO_TOOLS — video', () => {
  it('exposes a generate_video tool definition with the documented param surface', () => {
    const video = BYOK_SENSEAUDIO_TOOLS.find(
      (t) => t.function.name === 'generate_video',
    );
    expect(video).toBeDefined();
    const props = video!.function.parameters.properties as Record<string, any>;
    expect(video!.function.parameters.required).toEqual(['prompt']);
    expect(props.aspect_ratio.enum).toEqual(['16:9', '9:16', '4:3', '3:4', '1:1']);
    expect(props.resolution.enum).toEqual(['480p', '720p', '1080p']);
    expect(props.duration).toMatchObject({ type: 'integer', minimum: 4, maximum: 15 });
    expect(props.generate_audio.type).toBe('boolean');
  });
});
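
The parameter surface asserted above (two enums, an integer duration bounded to 4–15, a boolean audio flag) implies some argument normalization before the create request is built. The sketch below shows one way to do it, under stated assumptions: the helper names are hypothetical, the defaults (5 s, `720p`, `16:9`, no audio) are the ones the `executeGenerateVideo` tests in this file assert, and rounding a fractional duration is an assumption the tests do not cover.

```typescript
// Hypothetical helper; illustrative only, not the daemon's actual code.
interface VideoArgs {
  duration?: unknown;
  resolution?: unknown;
  aspect_ratio?: unknown;
  generate_audio?: unknown;
}

const VIDEO_ASPECTS = ['16:9', '9:16', '4:3', '3:4', '1:1'];
const VIDEO_RESOLUTIONS = ['480p', '720p', '1080p'];

// Accept a value only if it is one of the allowed strings.
function pickEnum(value: unknown, allowed: string[], fallback: string): string {
  return typeof value === 'string' && allowed.includes(value) ? value : fallback;
}

function normalizeVideoArgs(args: VideoArgs) {
  const requested =
    typeof args.duration === 'number' && Number.isFinite(args.duration)
      ? Math.round(args.duration) // assumption: fractional seconds are rounded
      : 5;
  return {
    duration: Math.min(15, Math.max(4, requested)), // clamp into 4..15
    resolution: pickEnum(args.resolution, VIDEO_RESOLUTIONS, '720p'),
    ratio: pickEnum(args.aspect_ratio, VIDEO_ASPECTS, '16:9'),
    provider_specific: { generate_audio: args.generate_audio === true },
  };
}
```

Clamping instead of rejecting keeps a slightly out-of-range tool call from failing the whole turn, which matches the `99 → 15` expectation in the clamp test.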

describe('executeGenerateVideo', () => {
  let root: string;
  let projectsRoot: string;
  const PROJECT_ID = 'test-project';
  const realFetch = globalThis.fetch;

  beforeEach(async () => {
    root = await mkdtemp(path.join(tmpdir(), 'od-byok-video-'));
    projectsRoot = path.join(root, 'projects');
  });

  afterEach(async () => {
    globalThis.fetch = realFetch;
    vi.unstubAllGlobals();
    await rm(root, { recursive: true, force: true });
  });

  const baseCtx = () => ({
    projectRoot: root,
    projectsRoot,
    projectId: PROJECT_ID,
    upstreamApiKey: 'sa-byok-key',
    upstreamBaseUrl: 'https://api.senseaudio.cn',
    // Keep tests fast — 1 ms between polls instead of the production 5 s.
    videoPollIntervalMs: 1,
  });

  it('creates, polls until completed, downloads, and writes the mp4 into the project folder', async () => {
    const mp4Bytes = Buffer.from([0x00, 0x00, 0x00, 0x18, 0x66, 0x74, 0x79, 0x70]);
    let pollCount = 0;
    const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
      const url = String(input);

      if (url === 'https://api.senseaudio.cn/v1/video/create') {
        expect(init?.method).toBe('POST');
        expect(init?.headers).toMatchObject({
          authorization: 'Bearer sa-byok-key',
          'content-type': 'application/json',
        });
        const body = JSON.parse(String(init?.body));
        expect(body).toEqual({
          model: 'doubao-seedance-2-0-260128',
          content: [{ type: 'text', text: 'a sunset over the ocean' }],
          duration: 8,
          resolution: '1080p',
          ratio: '16:9',
          provider_specific: { generate_audio: true },
        });
        return new Response(
          JSON.stringify({ task_id: 'task-abc' }),
          { status: 200, headers: { 'content-type': 'application/json' } },
        );
      }

      if (url.startsWith('https://api.senseaudio.cn/v1/video/status?id=task-abc')) {
        pollCount++;
        if (pollCount === 1) {
          return new Response(
            JSON.stringify({ status: 'pending', progress: 0 }),
            { status: 200, headers: { 'content-type': 'application/json' } },
          );
        }
        if (pollCount === 2) {
          return new Response(
            JSON.stringify({ status: 'processing', progress: 50 }),
            { status: 200, headers: { 'content-type': 'application/json' } },
          );
        }
        return new Response(
          JSON.stringify({
            status: 'completed',
            progress: 100,
            video_url: 'https://cdn.example.test/video/done.mp4',
            duration: 8,
          }),
          { status: 200, headers: { 'content-type': 'application/json' } },
        );
      }

      if (url === 'https://cdn.example.test/video/done.mp4') {
        return new Response(mp4Bytes, {
          status: 200,
          headers: { 'content-type': 'video/mp4' },
        });
      }

      throw new Error(`unexpected fetch: ${url}`);
    });
    vi.stubGlobal('fetch', fetchMock);

    const result = await executeGenerateVideo(
      {
        prompt: 'a sunset over the ocean',
        aspect_ratio: '16:9',
        duration: 8,
        resolution: '1080p',
        generate_audio: true,
      },
      baseCtx(),
    );

    expect(result.ok).toBe(true);
    expect(result.url).toMatch(
      new RegExp(`^/api/projects/${PROJECT_ID}/files/byok-video-[a-z0-9-]+\\.mp4$`),
    );

    // 1× create + 3× poll + 1× download = 5 fetches total.
    expect(fetchMock).toHaveBeenCalledTimes(5);
    expect(pollCount).toBe(3);

    const filename = result.url!.split('/').pop()!;
    const onDisk = await readFile(path.join(projectsRoot, PROJECT_ID, filename));
    expect(onDisk.equals(mp4Bytes)).toBe(true);
  });

  it('defaults duration / resolution / aspect when caller omits them', async () => {
    const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
      const url = String(input);
      if (url.endsWith('/v1/video/create')) {
        const body = JSON.parse(String(init?.body));
        expect(body).toMatchObject({
          duration: 5,
          resolution: '720p',
          ratio: '16:9',
          provider_specific: { generate_audio: false },
        });
        return new Response(
          JSON.stringify({ task_id: 'task-defaults' }),
          { status: 200, headers: { 'content-type': 'application/json' } },
        );
      }
      if (url.startsWith('https://api.senseaudio.cn/v1/video/status')) {
        return new Response(
          JSON.stringify({
            status: 'completed',
            video_url: 'https://cdn.example.test/video/d.mp4',
          }),
          { status: 200, headers: { 'content-type': 'application/json' } },
        );
      }
      return new Response(Buffer.from([0x01]), { status: 200 });
    });
    vi.stubGlobal('fetch', fetchMock);

    const result = await executeGenerateVideo({ prompt: 'minimal' }, baseCtx());
    expect(result.ok).toBe(true);
  });

  it('clamps duration outside the 4–15 range and rejects non-enum aspect_ratio / resolution', async () => {
    const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
      const url = String(input);
      if (url.endsWith('/v1/video/create')) {
        const body = JSON.parse(String(init?.body));
        // 99 → clamped to 15; 'octagonal' → falls back to '16:9';
        // '8k' → falls back to '720p'.
        expect(body).toMatchObject({
          duration: 15,
          resolution: '720p',
          ratio: '16:9',
        });
        return new Response(
          JSON.stringify({ task_id: 'task-clamp' }),
          { status: 200, headers: { 'content-type': 'application/json' } },
        );
      }
      if (url.startsWith('https://api.senseaudio.cn/v1/video/status')) {
        return new Response(
          JSON.stringify({
            status: 'completed',
            video_url: 'https://cdn.example.test/clamp.mp4',
          }),
          { status: 200, headers: { 'content-type': 'application/json' } },
        );
      }
      return new Response(Buffer.from([0x02]), { status: 200 });
    });
    vi.stubGlobal('fetch', fetchMock);

    const result = await executeGenerateVideo(
      {
        prompt: 'overflow',
        duration: 99,
        aspect_ratio: 'octagonal',
        resolution: '8k',
      },
      baseCtx(),
    );
    expect(result.ok).toBe(true);
  });

  it('surfaces a failed status as a tool error so the model can apologize', async () => {
    const fetchMock = vi.fn(async (input: unknown) => {
      const url = String(input);
      if (url.endsWith('/v1/video/create')) {
        return new Response(
          JSON.stringify({ task_id: 'task-fail' }),
          { status: 200, headers: { 'content-type': 'application/json' } },
        );
      }
      if (url.startsWith('https://api.senseaudio.cn/v1/video/status')) {
        return new Response(
          JSON.stringify({
            status: 'failed',
            error_message: 'sensitive_content_blocked',
          }),
          { status: 200, headers: { 'content-type': 'application/json' } },
        );
      }
      throw new Error(`unexpected fetch: ${url}`);
    });
    vi.stubGlobal('fetch', fetchMock);

    const result = await executeGenerateVideo(
      { prompt: 'blocked content' },
      baseCtx(),
    );
    expect(result.ok).toBe(false);
    expect(result.error).toMatch(/senseaudio video failed/);
    expect(result.error).toMatch(/sensitive_content_blocked/);
  });

  it('times out after SENSEAUDIO_VIDEO_MAX_POLLS polls when the job stays pending', async () => {
    const fetchMock = vi.fn(async (input: unknown) => {
      const url = String(input);
      if (url.endsWith('/v1/video/create')) {
        return new Response(
          JSON.stringify({ task_id: 'task-stuck' }),
          { status: 200, headers: { 'content-type': 'application/json' } },
        );
      }
      if (url.startsWith('https://api.senseaudio.cn/v1/video/status')) {
        return new Response(
          JSON.stringify({ status: 'pending', progress: 0 }),
          { status: 200, headers: { 'content-type': 'application/json' } },
        );
      }
      throw new Error(`unexpected fetch: ${url}`);
    });
    vi.stubGlobal('fetch', fetchMock);

    const result = await executeGenerateVideo(
      { prompt: 'stuck job' },
      baseCtx(),
    );
    expect(result.ok).toBe(false);
    expect(result.error).toMatch(/timed out/);
    // 1× create + 120× poll = 121 fetches (10-min ceiling at 5 s
    // intervals — kept generous because doubao-seedance frequently
    // spends 3–8 min on the gateway for 1080p+audio jobs).
    expect(fetchMock).toHaveBeenCalledTimes(121);
  }, 30_000);

  it('returns a tool error when create response is missing task_id', async () => {
    const fetchMock = vi.fn(async () =>
      new Response('{"oops": true}', {
        status: 200,
        headers: { 'content-type': 'application/json' },
      }),
    );
    vi.stubGlobal('fetch', fetchMock);

    const result = await executeGenerateVideo({ prompt: 'x' }, baseCtx());
    expect(result.ok).toBe(false);
    expect(result.error).toMatch(/missing task_id/);
  });

  it('returns a tool error when create call returns non-2xx', async () => {
    const fetchMock = vi.fn(async () =>
      new Response('unauthorized', {
        status: 401,
        headers: { 'content-type': 'text/plain' },
      }),
    );
    vi.stubGlobal('fetch', fetchMock);

    const result = await executeGenerateVideo({ prompt: 'x' }, baseCtx());
    expect(result.ok).toBe(false);
    expect(result.error).toMatch(/senseaudio video create 401/);
  });

  it('rejects an unsafe projectId before any upstream call', async () => {
    const fetchMock = vi.fn();
    vi.stubGlobal('fetch', fetchMock);

    const result = await executeGenerateVideo(
      { prompt: 'x' },
      { ...baseCtx(), projectId: '../escape' },
    );
    expect(result.ok).toBe(false);
    expect(result.error).toMatch(/invalid projectId/);
    expect(fetchMock).not.toHaveBeenCalled();
  });

  it('rejects empty prompt before any upstream call', async () => {
    const fetchMock = vi.fn();
    vi.stubGlobal('fetch', fetchMock);

    const result = await executeGenerateVideo({}, baseCtx());
    expect(result.ok).toBe(false);
    expect(result.error).toMatch(/prompt is required/);
    expect(fetchMock).not.toHaveBeenCalled();
  });
});
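
The video tests above describe a bounded polling loop: keep polling while the job is `pending` or `processing`, stop on `completed` or `failed`, and give up after a fixed number of polls (120 in the timeout test, which at the production 5 s interval gives the 10-minute ceiling). A minimal sketch of such a loop follows. All names here are hypothetical, the error strings are only shaped to match the patterns the tests check, and skipping the sleep after the final poll is an assumption.

```typescript
// Hypothetical types and helpers; a sketch, not the daemon's implementation.
type VideoStatus =
  | { status: 'pending' | 'processing'; progress?: number }
  | { status: 'completed'; video_url: string }
  | { status: 'failed'; error_message?: string };

type PollResult = { ok: true; videoUrl: string } | { ok: false; error: string };

// Returns a final result for terminal states, or null to keep polling.
function interpretStatus(s: VideoStatus): PollResult | null {
  if (s.status === 'completed') return { ok: true, videoUrl: s.video_url };
  if (s.status === 'failed') {
    return { ok: false, error: `senseaudio video failed: ${s.error_message ?? 'unknown error'}` };
  }
  return null;
}

// Polls at most maxPolls times, sleeping intervalMs between attempts.
async function pollVideo(
  fetchStatus: () => Promise<VideoStatus>,
  maxPolls = 120,
  intervalMs = 5_000,
): Promise<PollResult> {
  for (let poll = 1; poll <= maxPolls; poll++) {
    const outcome = interpretStatus(await fetchStatus());
    if (outcome) return outcome;
    if (poll < maxPolls) {
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  return { ok: false, error: `senseaudio video timed out after ${maxPolls} polls` };
}
```

Making the interval a parameter is what lets a test suite run the full 120-poll timeout path in well under a second, as the `videoPollIntervalMs: 1` context value does above.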
@@ -8,6 +8,7 @@ import {
   readMaskedConfig,
   resolveModelAlias,
   resolveProviderConfig,
+  seedProviderIfMissing,
   writeConfig,
 } from '../src/media-config.js';

@@ -868,3 +869,159 @@ describe('media-config model alias resolution (issue #1277)', () => {
    ).toBe('doubao-seedream-5-0');
  });
});

describe('seedProviderIfMissing', () => {
  let projectRoot: string;
  const SENSEAUDIO_ENV_KEYS = ['OD_SENSEAUDIO_API_KEY', 'SENSEAUDIO_API_KEY'];
  const originalEnv = Object.fromEntries(
    SENSEAUDIO_ENV_KEYS.map((key) => [key, process.env[key]]),
  );
  const originalMediaConfigDir = process.env.OD_MEDIA_CONFIG_DIR;
  const originalDataDir = process.env.OD_DATA_DIR;

  beforeEach(async () => {
    projectRoot = await mkdtemp(path.join(tmpdir(), 'od-media-seed-'));
    for (const key of SENSEAUDIO_ENV_KEYS) {
      delete process.env[key];
    }
    delete process.env.OD_MEDIA_CONFIG_DIR;
    delete process.env.OD_DATA_DIR;
  });

  afterEach(async () => {
    for (const key of SENSEAUDIO_ENV_KEYS) {
      if (originalEnv[key] == null) {
        delete process.env[key];
      } else {
        process.env[key] = originalEnv[key];
      }
    }
    if (originalMediaConfigDir == null) {
      delete process.env.OD_MEDIA_CONFIG_DIR;
    } else {
      process.env.OD_MEDIA_CONFIG_DIR = originalMediaConfigDir;
    }
    if (originalDataDir == null) {
      delete process.env.OD_DATA_DIR;
    } else {
      process.env.OD_DATA_DIR = originalDataDir;
    }
    await rm(projectRoot, { recursive: true, force: true });
  });

  async function writeStored(data: unknown) {
    const file = path.join(projectRoot, '.od', 'media-config.json');
    await mkdir(path.dirname(file), { recursive: true });
    await writeFile(file, JSON.stringify(data), 'utf8');
  }

  async function readStoredJson(): Promise<unknown> {
    const file = path.join(projectRoot, '.od', 'media-config.json');
    const raw = await readFile(file, 'utf8');
    return JSON.parse(raw);
  }

  it('writes a fresh entry when the slot is empty', async () => {
    const wrote = await seedProviderIfMissing(projectRoot, 'senseaudio', {
      apiKey: 'sa-test-key',
      baseUrl: 'https://api.senseaudio.cn',
    });
    expect(wrote).toBe(true);
    const stored = await readStoredJson();
    expect(stored).toEqual({
      providers: {
        senseaudio: {
          apiKey: 'sa-test-key',
          baseUrl: 'https://api.senseaudio.cn',
        },
      },
    });
  });

  it('no-ops and preserves the stored key when one is already configured', async () => {
    await writeStored({
      providers: {
        senseaudio: { apiKey: 'pre-existing-key', baseUrl: 'https://existing.example' },
      },
    });
    const wrote = await seedProviderIfMissing(projectRoot, 'senseaudio', {
      apiKey: 'newer-byok-key',
      baseUrl: 'https://api.senseaudio.cn',
    });
    expect(wrote).toBe(false);
    const stored = (await readStoredJson()) as { providers: Record<string, unknown> };
    expect(stored.providers.senseaudio).toEqual({
      apiKey: 'pre-existing-key',
      baseUrl: 'https://existing.example',
    });
  });

  it('preserves every other provider and aliases when seeding', async () => {
    await writeStored({
      providers: {
        openai: { apiKey: 'sk-openai', baseUrl: 'https://api.openai.com/v1' },
        volcengine: { apiKey: 'ark-key', baseUrl: 'https://ark.cn-beijing.volces.com/api/v3' },
      },
      aliases: { 'doubao-seedream-3-0-t2i-250415': 'doubao-seedream-5-0' },
    });
    const wrote = await seedProviderIfMissing(projectRoot, 'senseaudio', {
      apiKey: 'sa-new',
    });
    expect(wrote).toBe(true);
    const stored = (await readStoredJson()) as {
      providers: Record<string, unknown>;
      aliases: Record<string, string>;
    };
    expect(stored.providers.openai).toEqual({
      apiKey: 'sk-openai',
      baseUrl: 'https://api.openai.com/v1',
    });
    expect(stored.providers.volcengine).toEqual({
      apiKey: 'ark-key',
      baseUrl: 'https://ark.cn-beijing.volces.com/api/v3',
    });
    expect(stored.providers.senseaudio).toEqual({ apiKey: 'sa-new' });
    expect(stored.aliases).toEqual({
      'doubao-seedream-3-0-t2i-250415': 'doubao-seedream-5-0',
    });
  });

  it('no-ops when an env var resolves a key for the provider', async () => {
    process.env.OD_SENSEAUDIO_API_KEY = 'env-key';
    const wrote = await seedProviderIfMissing(projectRoot, 'senseaudio', {
      apiKey: 'sa-byok-key',
      baseUrl: 'https://api.senseaudio.cn',
    });
    expect(wrote).toBe(false);
    await expect(readStoredJson()).rejects.toThrow();
  });

  it('no-ops on empty apiKey', async () => {
    const wrote = await seedProviderIfMissing(projectRoot, 'senseaudio', {
      apiKey: '',
      baseUrl: 'https://api.senseaudio.cn',
    });
    expect(wrote).toBe(false);
    await expect(readStoredJson()).rejects.toThrow();
  });

  it('no-ops for unknown provider ids', async () => {
    const wrote = await seedProviderIfMissing(projectRoot, 'not-a-provider', {
      apiKey: 'whatever',
    });
    expect(wrote).toBe(false);
    await expect(readStoredJson()).rejects.toThrow();
  });

  it('resolves the seeded key through resolveProviderConfig', async () => {
    await seedProviderIfMissing(projectRoot, 'senseaudio', {
      apiKey: 'sa-final',
      baseUrl: 'https://api.senseaudio.cn',
    });
    const resolved = await resolveProviderConfig(projectRoot, 'senseaudio');
    expect(resolved).toEqual({
      apiKey: 'sa-final',
      baseUrl: 'https://api.senseaudio.cn',
    });
  });
});
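
Taken together, the `seedProviderIfMissing` tests above describe an idempotent "write only if nothing else already supplies a key" rule. The sketch below restates that rule as a pure function over an in-memory config so the decision logic is visible without file I/O. The function name, the `KNOWN_PROVIDERS` set, and the `envKey` parameter are hypothetical stand-ins; the real function takes a project root and reads the environment and `media-config.json` itself.

```typescript
// Hypothetical in-memory restatement of the seeding rule; not the real API.
interface ProviderEntry {
  apiKey: string;
  baseUrl?: string;
}

interface MediaConfig {
  providers?: Record<string, ProviderEntry>;
  aliases?: Record<string, string>;
}

// Assumed registry; the tests only show that unknown ids are rejected.
const KNOWN_PROVIDERS = new Set(['senseaudio', 'openai', 'volcengine']);

function seedIfMissing(
  config: MediaConfig,
  providerId: string,
  entry: { apiKey: string; baseUrl?: string },
  envKey: string | undefined, // key resolved from the environment, if any
): { wrote: boolean; config: MediaConfig } {
  const unchanged = { wrote: false, config };
  if (!KNOWN_PROVIDERS.has(providerId)) return unchanged; // unknown provider id
  if (!entry.apiKey) return unchanged; // nothing to seed
  if (envKey) return unchanged; // env-resolved key wins
  if (config.providers?.[providerId]?.apiKey) return unchanged; // stored key wins

  // Only include baseUrl when the caller supplied one.
  const seeded: ProviderEntry = entry.baseUrl
    ? { apiKey: entry.apiKey, baseUrl: entry.baseUrl }
    : { apiKey: entry.apiKey };
  return {
    wrote: true,
    config: { ...config, providers: { ...config.providers, [providerId]: seeded } },
  };
}
```

Because every no-op path returns the input config untouched, calling the function after each successful chat request is safe: repeated calls never overwrite a key the user configured elsewhere.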
305
apps/daemon/tests/media-senseaudio-image.test.ts
Normal file
@@ -0,0 +1,305 @@
import { mkdir, mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import path from 'node:path';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';

import { generateMedia } from '../src/media.js';

const TEST_SENSEAUDIO_BASE_URL = 'https://senseaudio-gateway.example.test';
const TEST_IMAGE_URL = 'https://cdn.example.test/generated/abc.png';
const TEST_IMAGE_BYTES = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0x00, 0x01]);

function buildOkResponse(url = TEST_IMAGE_URL) {
  return new Response(
    JSON.stringify({ url, base_resp: { status_code: 0, status_msg: 'success' } }),
    { status: 200, headers: { 'content-type': 'application/json' } },
  );
}

function buildImageFetchResponse(bytes: Buffer) {
  return new Response(bytes, {
    status: 200,
    headers: { 'content-type': 'image/png' },
  });
}

describe('senseaudio image generation', () => {
  let root: string;
  let projectRoot: string;
  let projectsRoot: string;
  const realFetch = globalThis.fetch;
  const originalMediaConfigDir = process.env.OD_MEDIA_CONFIG_DIR;
  const originalDataDir = process.env.OD_DATA_DIR;

  beforeEach(async () => {
    root = await mkdtemp(path.join(tmpdir(), 'od-senseaudio-image-'));
    projectRoot = path.join(root, 'project-root');
    projectsRoot = path.join(projectRoot, '.od', 'projects');
    await mkdir(projectsRoot, { recursive: true });
    delete process.env.OD_MEDIA_CONFIG_DIR;
    delete process.env.OD_DATA_DIR;
    delete process.env.OD_SENSEAUDIO_API_KEY;
    delete process.env.SENSEAUDIO_API_KEY;
  });

  afterEach(async () => {
    globalThis.fetch = realFetch;
    if (originalMediaConfigDir == null) {
      delete process.env.OD_MEDIA_CONFIG_DIR;
    } else {
      process.env.OD_MEDIA_CONFIG_DIR = originalMediaConfigDir;
    }
    if (originalDataDir == null) {
      delete process.env.OD_DATA_DIR;
    } else {
      process.env.OD_DATA_DIR = originalDataDir;
    }
    delete process.env.OD_SENSEAUDIO_API_KEY;
    delete process.env.SENSEAUDIO_API_KEY;
    await rm(root, { recursive: true, force: true });
  });

  async function writeConfig(data: unknown) {
    const file = path.join(projectRoot, '.od', 'media-config.json');
    await mkdir(path.dirname(file), { recursive: true });
    await writeFile(file, JSON.stringify(data), 'utf8');
  }

  it('renders a SenseAudio image with the documented sync defaults', async () => {
    await writeConfig({
      providers: {
        senseaudio: {
          apiKey: 'sense-test-key',
          baseUrl: TEST_SENSEAUDIO_BASE_URL,
        },
      },
    });

    const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
      const urlStr = String(input);
      if (urlStr === `${TEST_SENSEAUDIO_BASE_URL}/v1/image/sync`) {
        expect(init?.method).toBe('POST');
        expect(init?.headers).toMatchObject({
          authorization: 'Bearer sense-test-key',
          'content-type': 'application/json',
        });
        expect(JSON.parse(String(init?.body))).toEqual({
          model: 'senseaudio-image-2.0-260319',
          prompt: 'A magazine-style hero poster.',
          size: '1024x1024',
        });
        return buildOkResponse();
      }
      if (urlStr === TEST_IMAGE_URL) {
        return buildImageFetchResponse(TEST_IMAGE_BYTES);
      }
      throw new Error(`unexpected fetch: ${urlStr}`);
    });
    vi.stubGlobal('fetch', fetchMock);

    const result = await generateMedia({
      projectRoot,
      projectsRoot,
      projectId: 'project-1',
      surface: 'image',
      model: 'senseaudio-image-2.0-260319',
      prompt: 'A magazine-style hero poster.',
      output: 'sa-hero.png',
    });

    expect(fetchMock).toHaveBeenCalledTimes(2);
    expect(result.providerId).toBe('senseaudio');
    expect(result.providerNote).toContain('senseaudio/senseaudio-image-2.0-260319');
    expect(result.providerNote).toContain('1024x1024');

    const bytes = await readFile(path.join(projectsRoot, 'project-1', 'sa-hero.png'));
    expect(bytes.equals(TEST_IMAGE_BYTES)).toBe(true);
  });

  it('maps aspect ratios to the SenseAudio size strings', async () => {
    await writeConfig({
      providers: {
        senseaudio: { apiKey: 'sense-test-key', baseUrl: TEST_SENSEAUDIO_BASE_URL },
      },
    });

    const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
      const urlStr = String(input);
      if (urlStr === `${TEST_SENSEAUDIO_BASE_URL}/v1/image/sync`) {
        expect(JSON.parse(String(init?.body)).size).toBe('1280x720');
        return buildOkResponse();
      }
      return buildImageFetchResponse(TEST_IMAGE_BYTES);
    });
    vi.stubGlobal('fetch', fetchMock);

    await generateMedia({
      projectRoot,
      projectsRoot,
      projectId: 'project-1',
      surface: 'image',
      model: 'senseaudio-image-1.0-260319',
      aspect: '16:9',
      prompt: 'Widescreen banner.',
      output: 'sa-banner.png',
    });

    expect(fetchMock).toHaveBeenCalledTimes(2);
  });

  it('falls back to the canonical base URL when none is configured', async () => {
    await writeConfig({
      providers: {
        senseaudio: { apiKey: 'sense-test-key' },
      },
    });

    const fetchMock = vi.fn(async (input: unknown) => {
      const urlStr = String(input);
      if (urlStr === 'https://api.senseaudio.cn/v1/image/sync') {
        return buildOkResponse();
      }
      return buildImageFetchResponse(TEST_IMAGE_BYTES);
    });
    vi.stubGlobal('fetch', fetchMock);

    await generateMedia({
      projectRoot,
      projectsRoot,
      projectId: 'project-1',
      surface: 'image',
      model: 'doubao-seedream-5-0-260128',
      prompt: 'Default base url.',
      output: 'sa-default-base.png',
    });

    expect(fetchMock).toHaveBeenCalledTimes(2);
  });

  it('reads the API key from OD_SENSEAUDIO_API_KEY when storage is empty', async () => {
    process.env.OD_SENSEAUDIO_API_KEY = 'env-sense-key';
    const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
      if (String(input).endsWith('/v1/image/sync')) {
        expect(init?.headers).toMatchObject({ authorization: 'Bearer env-sense-key' });
        return buildOkResponse();
      }
      return buildImageFetchResponse(TEST_IMAGE_BYTES);
    });
    vi.stubGlobal('fetch', fetchMock);

    await generateMedia({
      projectRoot,
      projectsRoot,
      projectId: 'project-1',
      surface: 'image',
      model: 'senseaudio-image-2.0-260319',
      prompt: 'Env-only key.',
      output: 'sa-env.png',
    });

    expect(fetchMock).toHaveBeenCalledTimes(2);
  });

  it('errors when no API key is configured', async () => {
    const fetchMock = vi.fn();
    vi.stubGlobal('fetch', fetchMock);

    await expect(
      generateMedia({
        projectRoot,
        projectsRoot,
        projectId: 'project-1',
        surface: 'image',
        model: 'senseaudio-image-2.0-260319',
        prompt: 'Should fail.',
        output: 'sa-no-key.png',
      }),
    ).rejects.toThrow(/no SenseAudio API key/);

    expect(fetchMock).not.toHaveBeenCalled();
  });

  it('surfaces HTTP-level failures with the status code and truncated body', async () => {
    await writeConfig({
      providers: {
        senseaudio: { apiKey: 'sense-test-key', baseUrl: TEST_SENSEAUDIO_BASE_URL },
      },
    });

    const fetchMock = vi.fn(async () =>
      new Response('unauthorized', {
        status: 401,
        headers: { 'content-type': 'text/plain' },
      }),
    );
    vi.stubGlobal('fetch', fetchMock);

    await expect(
      generateMedia({
        projectRoot,
        projectsRoot,
        projectId: 'project-1',
        surface: 'image',
        model: 'senseaudio-image-2.0-260319',
        prompt: 'Bad auth.',
        output: 'sa-401.png',
      }),
    ).rejects.toThrow('senseaudio image 401: unauthorized');
  });

  it('surfaces upstream error_message verbatim when the body reports failure', async () => {
    await writeConfig({
      providers: {
        senseaudio: { apiKey: 'sense-test-key', baseUrl: TEST_SENSEAUDIO_BASE_URL },
      },
    });

    const fetchMock = vi.fn(async () =>
      new Response(
        JSON.stringify({ error_message: 'sensitive_content_blocked' }),
        { status: 200, headers: { 'content-type': 'application/json' } },
      ),
    );
    vi.stubGlobal('fetch', fetchMock);

    await expect(
      generateMedia({
        projectRoot,
        projectsRoot,
        projectId: 'project-1',
        surface: 'image',
        model: 'senseaudio-image-2.0-260319',
        prompt: 'Blocked.',
        output: 'sa-blocked.png',
      }),
    ).rejects.toThrow('senseaudio image api error: sensitive_content_blocked');
  });

  it('errors when the response body is missing the image url', async () => {
    await writeConfig({
      providers: {
        senseaudio: { apiKey: 'sense-test-key', baseUrl: TEST_SENSEAUDIO_BASE_URL },
      },
    });

    const fetchMock = vi.fn(async () =>
      new Response(
        JSON.stringify({ base_resp: { status_code: 0, status_msg: 'success' } }),
        { status: 200, headers: { 'content-type': 'application/json' } },
      ),
    );
    vi.stubGlobal('fetch', fetchMock);

    await expect(
      generateMedia({
        projectRoot,
        projectsRoot,
        projectId: 'project-1',
        surface: 'image',
        model: 'senseaudio-image-2.0-260319',
        prompt: 'Missing url.',
        output: 'sa-missing-url.png',
      }),
    ).rejects.toThrow('senseaudio image response missing url');
  });
});
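
The assertions in the suite above pin down two behaviours of the image provider: how an aspect ratio becomes a `size` string, and which error is raised for each kind of bad response. The following is a minimal sketch of both, not the daemon's actual code. `sizeForAspect` and `extractImageUrl` are hypothetical names; only the `16:9` and default sizes are confirmed by the tests, the other aspect keys are inferred from the size list in the commit message, and the 200-character truncation limit is an assumption.

```typescript
// Hypothetical helpers for illustration; names and structure are assumptions.

// Sizes are taken from the commit message. Only '16:9' and the default
// ('1024x1024') are confirmed by the tests; the other keys are inferred.
const SIZE_BY_ASPECT: Record<string, string> = {
  '1:1': '1024x1024',
  '16:9': '1280x720',
  '9:16': '720x1280',
  '4:3': '1024x768',
  '3:4': '768x1024',
};

// Falls back to the square size when the aspect is missing or unknown.
function sizeForAspect(aspect?: string): string {
  return (aspect && SIZE_BY_ASPECT[aspect]) || '1024x1024';
}

// Mirrors the three failure modes the tests assert on, in the same order:
// HTTP-level failure, upstream error_message, missing url.
function extractImageUrl(status: number, bodyText: string): string {
  if (status < 200 || status >= 300) {
    // The truncation length is a guess; the tests only use a short body.
    throw new Error(`senseaudio image ${status}: ${bodyText.slice(0, 200)}`);
  }
  const parsed = JSON.parse(bodyText) as { url?: unknown; error_message?: unknown };
  if (typeof parsed.error_message === 'string' && parsed.error_message) {
    throw new Error(`senseaudio image api error: ${parsed.error_message}`);
  }
  if (typeof parsed.url !== 'string' || !parsed.url) {
    throw new Error('senseaudio image response missing url');
  }
  return parsed.url;
}
```

Checking `error_message` before `url` matters because the gateway can answer HTTP 200 with a failure payload, as the `sensitive_content_blocked` test shows.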
@ -523,6 +523,497 @@ describe('API proxy routes', () => {
    expect(upstreamInit?.redirect).toBe('error');
  });

  it('streams delta + end for SenseAudio chat completions', async () => {
    const fetchMock = vi.fn((input: FetchInput, init?: FetchInit) => {
      const url = String(input);
      if (url.startsWith(baseUrl)) return realFetch(input, init);
      return Promise.resolve(sseResponse([
        'data: {"choices":[{"delta":{"content":"sense"}}]}',
        '',
        'data: [DONE]',
        '',
      ].join('\n')));
    });
    vi.stubGlobal('fetch', fetchMock);

    const res = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        baseUrl: 'https://api.senseaudio.cn',
        apiKey: 'sa-test',
        projectId: 'test-project',
        model: 'senseaudio-s2',
        messages: [{ role: 'user', content: 'hello' }],
      }),
    });

    await expect(res.text()).resolves.toContain('event: delta\ndata: {"delta":"sense"}');
    expect(fetchMock).toHaveBeenCalledWith(
      'https://api.senseaudio.cn/v1/chat/completions',
      expect.objectContaining({
        headers: expect.objectContaining({ Authorization: 'Bearer sa-test' }),
        redirect: 'error',
      }),
    );
  });

  it('defaults SenseAudio base URL to api.senseaudio.cn when caller omits it', async () => {
    const fetchMock = vi.fn((input: FetchInput, init?: FetchInit) => {
      const url = String(input);
      if (url.startsWith(baseUrl)) return realFetch(input, init);
      return Promise.resolve(sseResponse('data: [DONE]\n\n'));
    });
    vi.stubGlobal('fetch', fetchMock);

    await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        apiKey: 'sa-test',
        projectId: 'test-project',
        model: 'senseaudio-s2',
        messages: [{ role: 'user', content: 'hi' }],
      }),
    });

    expect(String(fetchMock.mock.calls[0]![0])).toBe(
      'https://api.senseaudio.cn/v1/chat/completions',
    );
  });

  it('rejects SenseAudio requests that omit apiKey or model', async () => {
    const fetchMock = vi.fn();
    vi.stubGlobal('fetch', fetchMock);

    const missingKey = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        model: 'senseaudio-s2',
        messages: [{ role: 'user', content: 'hi' }],
      }),
    });
    expect(missingKey.status).toBe(400);

    const missingModel = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        apiKey: 'sa-test',
        messages: [{ role: 'user', content: 'hi' }],
      }),
    });
    expect(missingModel.status).toBe(400);

    expect(fetchMock).not.toHaveBeenCalled();
  });

  it('disables upstream redirects for senseaudio proxy requests', async () => {
    const fetchMock = vi.fn((input: FetchInput, init?: FetchInit) => {
      const url = String(input);
      if (url.startsWith(baseUrl)) return realFetch(input, init);
      return Promise.resolve(sseResponse('data: [DONE]\n\n'));
    });
    vi.stubGlobal('fetch', fetchMock);

    await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        baseUrl: 'https://api.senseaudio.cn',
        apiKey: 'sa-test',
        projectId: 'test-project',
        model: 'model-one',
        messages: [{ role: 'user', content: 'hi' }],
      }),
    });

    const upstreamCall = fetchMock.mock.calls.find(([input]) =>
      !String(input).startsWith(baseUrl),
    );
    expect(upstreamCall).toBeDefined();
    const upstreamInit = upstreamCall![1] as FetchInit;
    expect(upstreamInit?.redirect).toBe('error');
  });

  it('injects generate_image tool definition on every SenseAudio request', async () => {
    const fetchMock = vi.fn((input: FetchInput, init?: FetchInit) => {
      const url = String(input);
      if (url.startsWith(baseUrl)) return realFetch(input, init);
      return Promise.resolve(sseResponse([
        'data: {"choices":[{"delta":{"content":"ok"}}]}',
        '',
        'data: [DONE]',
        '',
      ].join('\n')));
    });
    vi.stubGlobal('fetch', fetchMock);

    await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        baseUrl: 'https://api.senseaudio.cn',
        apiKey: 'sa-test',
        projectId: 'test-project',
        model: 'senseaudio-s2',
        messages: [{ role: 'user', content: 'hi' }],
      }),
    });

    const upstreamCall = fetchMock.mock.calls.find(([input]) =>
      !String(input).startsWith(baseUrl),
    );
    expect(upstreamCall).toBeDefined();
    const body = JSON.parse(String((upstreamCall![1] as FetchInit)?.body));
    expect(body.tool_choice).toBe('auto');
    expect(Array.isArray(body.tools)).toBe(true);
    expect(body.tools[0]).toMatchObject({
      type: 'function',
      function: { name: 'generate_image' },
    });
  });

  it('runs the BYOK image tool loop end-to-end', async () => {
    const pngBytes = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0x00, 0x01]);
    const upstreamChatBodies: any[] = [];
    let chatCallIndex = 0;

    const fetchMock = vi.fn(async (input: FetchInput, init?: FetchInit) => {
      const url = String(input);

      if (url.startsWith(baseUrl)) return realFetch(input, init);

      // SenseAudio image generation
      if (url === 'https://api.senseaudio.cn/v1/image/sync') {
        return new Response(
          JSON.stringify({
            url: 'https://cdn.example.test/cat.png',
            base_resp: { status_code: 0, status_msg: 'success' },
          }),
          { status: 200, headers: { 'content-type': 'application/json' } },
        );
      }

      // Image bytes download (initiated by the tool, not via the proxy)
      if (url === 'https://cdn.example.test/cat.png') {
        return new Response(pngBytes, {
          status: 200,
          headers: { 'content-type': 'image/png' },
        });
      }

      // Upstream chat completions — capture bodies, return different SSE per call
      if (url === 'https://api.senseaudio.cn/v1/chat/completions') {
        upstreamChatBodies.push(JSON.parse(String(init?.body || '{}')));
        chatCallIndex++;
        if (chatCallIndex === 1) {
          // First turn: model decides to call generate_image
          return sseResponse([
            'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":null,"tool_calls":[{"index":0,"id":"call_abc","type":"function","function":{"name":"generate_image","arguments":"{\\"prompt\\":\\"a cat\\"}"}}]},"finish_reason":null}]}',
            '',
            'data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}',
            '',
            'data: [DONE]',
            '',
          ].join('\n'));
        }
        // Second turn: model summarises with image embedded in markdown
        return sseResponse([
          'data: {"choices":[{"index":0,"delta":{"content":"Here is your cat: "}}]}',
          '',
          'data: {"choices":[{"index":0,"delta":{"content":""}}]}',
          '',
          'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
          '',
          'data: [DONE]',
          '',
        ].join('\n'));
      }

      throw new Error(`unexpected fetch: ${url}`);
    });
    vi.stubGlobal('fetch', fetchMock);

    const res = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        baseUrl: 'https://api.senseaudio.cn',
        apiKey: 'sa-test',
        projectId: 'test-project',
        model: 'senseaudio-s2',
        messages: [{ role: 'user', content: 'draw a cat' }],
      }),
    });

    expect(res.status).toBe(200);
    const body = await res.text();

    // Final assistant text streams through to the client
    expect(body).toContain('event: delta');
    expect(body).toContain('Here is your cat');
    expect(body).toContain('');
    expect(body).toContain('event: end');

    // Two upstream chat completions calls happened (loop ran exactly once)
    expect(upstreamChatBodies).toHaveLength(2);

    // Second upstream call includes assistant{tool_calls} + tool{result}
    const secondMessages = upstreamChatBodies[1].messages;
    expect(secondMessages).toHaveLength(3);
    expect(secondMessages[0]).toEqual({ role: 'user', content: 'draw a cat' });
    expect(secondMessages[1]).toMatchObject({
      role: 'assistant',
      content: null,
      tool_calls: [
        {
          id: 'call_abc',
          type: 'function',
          function: {
            name: 'generate_image',
            arguments: '{"prompt":"a cat"}',
          },
        },
      ],
    });
    expect(secondMessages[2]).toMatchObject({
      role: 'tool',
      tool_call_id: 'call_abc',
      content: expect.stringMatching(
        /Image generated successfully\. URL: \/api\/projects\/test-project\/files\/byok-[a-z0-9-]+\.png/,
      ),
    });
  });

  it('feeds a tool error message back to the model when generate_image fails', async () => {
    const upstreamChatBodies: any[] = [];
    let chatCallIndex = 0;
    const fetchMock = vi.fn(async (input: FetchInput, init?: FetchInit) => {
      const url = String(input);
      if (url.startsWith(baseUrl)) return realFetch(input, init);
      if (url === 'https://api.senseaudio.cn/v1/image/sync') {
        return new Response(
          JSON.stringify({ error_message: 'sensitive_content_blocked' }),
          { status: 200, headers: { 'content-type': 'application/json' } },
        );
      }
      if (url === 'https://api.senseaudio.cn/v1/chat/completions') {
        upstreamChatBodies.push(JSON.parse(String(init?.body || '{}')));
        chatCallIndex++;
        if (chatCallIndex === 1) {
          return sseResponse([
            'data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_err","type":"function","function":{"name":"generate_image","arguments":"{\\"prompt\\":\\"...\\"}"}}]},"finish_reason":null}]}',
            '',
            'data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}',
            '',
            'data: [DONE]',
            '',
          ].join('\n'));
        }
        return sseResponse([
          'data: {"choices":[{"index":0,"delta":{"content":"Sorry, that one was blocked."}}]}',
          '',
          'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
          '',
          'data: [DONE]',
          '',
        ].join('\n'));
      }
      throw new Error(`unexpected fetch: ${url}`);
    });
    vi.stubGlobal('fetch', fetchMock);

    const res = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        baseUrl: 'https://api.senseaudio.cn',
        apiKey: 'sa-test',
        projectId: 'test-project',
        model: 'senseaudio-s2',
        messages: [{ role: 'user', content: 'draw something blocked' }],
      }),
    });

    expect(res.status).toBe(200);
    const body = await res.text();
    expect(body).toContain('Sorry, that one was blocked');

    expect(upstreamChatBodies).toHaveLength(2);
    const toolMsg = upstreamChatBodies[1].messages[2];
    expect(toolMsg.role).toBe('tool');
    expect(toolMsg.tool_call_id).toBe('call_err');
    expect(toolMsg.content).toMatch(/Image generation failed/);
    expect(toolMsg.content).toMatch(/sensitive_content_blocked/);
  });

  it('bounds the BYOK tool loop at MAX_BYOK_TOOL_LOOPS=3', async () => {
    let chatCallIndex = 0;
    const fetchMock = vi.fn(async (input: FetchInput, init?: FetchInit) => {
      const url = String(input);
      if (url.startsWith(baseUrl)) return realFetch(input, init);
      if (url === 'https://api.senseaudio.cn/v1/image/sync') {
        return new Response(
          JSON.stringify({ url: 'https://cdn.example.test/x.png' }),
          { status: 200, headers: { 'content-type': 'application/json' } },
        );
      }
      if (url === 'https://cdn.example.test/x.png') {
        return new Response(Buffer.from([0x89, 0x50]), { status: 200 });
      }
      if (url === 'https://api.senseaudio.cn/v1/chat/completions') {
        chatCallIndex++;
        // Always return tool_calls — the model never returns text
        return sseResponse([
          `data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_${chatCallIndex}","type":"function","function":{"name":"generate_image","arguments":"{\\"prompt\\":\\"x\\"}"}}]},"finish_reason":null}]}`,
          '',
          'data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}',
          '',
          'data: [DONE]',
          '',
        ].join('\n'));
      }
      throw new Error(`unexpected fetch: ${url}`);
    });
    vi.stubGlobal('fetch', fetchMock);

    const res = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        baseUrl: 'https://api.senseaudio.cn',
        apiKey: 'sa-test',
        projectId: 'test-project',
        model: 'senseaudio-s2',
        messages: [{ role: 'user', content: 'infinite' }],
      }),
    });

    expect(res.status).toBe(200);
    const body = await res.text();
    expect(body).toContain('event: end');
    // Loop ran exactly MAX_BYOK_TOOL_LOOPS times before bailing.
    expect(chatCallIndex).toBe(3);
  });

  it('writes the generated image into the project folder and serves it via /api/projects/:id/files/*', async () => {
    const pngBytes = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0x42, 0x59]);
    let capturedUrl: string | undefined;

    const fetchMock = vi.fn(async (input: FetchInput, init?: FetchInit) => {
      const url = String(input);
      if (url.startsWith(baseUrl)) return realFetch(input, init);
      if (url === 'https://api.senseaudio.cn/v1/image/sync') {
        return new Response(
          JSON.stringify({ url: 'https://cdn.example.test/served.png' }),
          { status: 200, headers: { 'content-type': 'application/json' } },
        );
      }
      if (url === 'https://cdn.example.test/served.png') {
        return new Response(pngBytes, { status: 200 });
      }
      if (url === 'https://api.senseaudio.cn/v1/chat/completions') {
        const body = JSON.parse(String(init?.body || '{}'));
        // Capture URL the tool produced from the second turn's tool message.
        const toolMsg = body.messages?.find((m: any) => m.role === 'tool');
        if (toolMsg) {
          const match = /URL: (\/api\/projects\/[A-Za-z0-9._-]+\/files\/byok-[a-z0-9-]+\.png)/.exec(toolMsg.content);
          if (match) capturedUrl = match[1];
        }
        const isToolTurn = !toolMsg;
        if (isToolTurn) {
          return sseResponse([
            'data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_serve","type":"function","function":{"name":"generate_image","arguments":"{\\"prompt\\":\\"s\\"}"}}]},"finish_reason":null}]}',
            '',
            'data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}',
            '',
            'data: [DONE]',
            '',
          ].join('\n'));
        }
        return sseResponse([
          'data: {"choices":[{"index":0,"delta":{"content":"done"}}]}',
          '',
          'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
          '',
          'data: [DONE]',
          '',
        ].join('\n'));
      }
      throw new Error(`unexpected fetch: ${url}`);
    });
    vi.stubGlobal('fetch', fetchMock);

    const proxyRes = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        baseUrl: 'https://api.senseaudio.cn',
        apiKey: 'sa-test',
        projectId: 'test-project',
        model: 'senseaudio-s2',
        messages: [{ role: 'user', content: 'gen' }],
      }),
    });
    // Drain the SSE body so the tool loop fully completes before we assert.
    await proxyRes.text();

    expect(capturedUrl).toBeDefined();
    // The URL the tool emits is relative — same-origin via Next.js
    // rewrite in production, hits this test server directly here.
    // We GET the captured URL through the standard project file route
    // and assert the bytes come back. This proves both halves:
    // (1) the image landed in <projectsRoot>/<projectId>/ as expected
    // (so listFiles / FileViewer / archive will find it), and
    // (2) /api/projects/:id/files/* serves it without needing any
    // byok-specific route.
    const imgRes = await realFetch(`${baseUrl}${capturedUrl!}`);
    expect(imgRes.status).toBe(200);
    expect(imgRes.headers.get('content-type')).toMatch(/^image\/png/);
    const served = Buffer.from(await imgRes.arrayBuffer());
    expect(served.equals(pngBytes)).toBe(true);
  });

  it('rejects senseaudio chat requests without a projectId', async () => {
    const fetchMock = vi.fn();
    vi.stubGlobal('fetch', fetchMock);

    const res = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        apiKey: 'sa-test',
        model: 'senseaudio-s2',
        messages: [{ role: 'user', content: 'hi' }],
        // no projectId — should 400
      }),
    });

    expect(res.status).toBe(400);
    expect(fetchMock).not.toHaveBeenCalled();
  });

  it('rejects senseaudio chat requests with an unsafe projectId', async () => {
    const fetchMock = vi.fn();
    vi.stubGlobal('fetch', fetchMock);

    const res = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        apiKey: 'sa-test',
        model: 'senseaudio-s2',
        projectId: '../etc/passwd',
        messages: [{ role: 'user', content: 'hi' }],
      }),
    });

    expect(res.status).toBe(400);
    expect(fetchMock).not.toHaveBeenCalled();
  });

  // Plan §3.A4 / spec §11.8 (e2e-7): the API-fallback proxy paths must
  // never carry plugin context. The web sidecar's fallback mode bypasses
  // the daemon snapshot bus, so any pluginId / appliedPluginSnapshotId in
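
The tool-loop tests in the hunk above describe a bounded loop: each upstream turn either returns text (done) or tool calls, whose results are appended as `assistant{tool_calls}` and `tool{result}` messages before the next turn, with at most `MAX_BYOK_TOOL_LOOPS` upstream calls. A minimal sketch of that shape follows, assuming hypothetical `runToolLoop`, `callModel`, and `runTool` names and simplified message types; the real proxy streams SSE rather than returning whole turns, and how it words tool errors beyond "Image generation failed" is not shown in the diff.

```typescript
// Hypothetical types and names; the daemon's real implementation is not shown in this diff.
type ToolCall = {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
};
type ChatMessage =
  | { role: 'user'; content: string }
  | { role: 'assistant'; content: string | null; tool_calls?: ToolCall[] }
  | { role: 'tool'; tool_call_id: string; content: string };
type ModelTurn = { content: string | null; toolCalls: ToolCall[] };

// The limit of 3 is taken from the test title above.
const MAX_BYOK_TOOL_LOOPS = 3;

async function runToolLoop(
  messages: ChatMessage[],
  callModel: (history: ChatMessage[]) => Promise<ModelTurn>,
  runTool: (call: ToolCall) => Promise<string>,
): Promise<{ text: string; modelCalls: number; messages: ChatMessage[] }> {
  const history = [...messages];
  let modelCalls = 0;
  while (modelCalls < MAX_BYOK_TOOL_LOOPS) {
    const turn = await callModel(history);
    modelCalls++;
    // A turn without tool calls is the final answer.
    if (turn.toolCalls.length === 0) {
      return { text: turn.content ?? '', modelCalls, messages: history };
    }
    history.push({ role: 'assistant', content: turn.content, tool_calls: turn.toolCalls });
    for (const call of turn.toolCalls) {
      let result: string;
      try {
        result = await runTool(call);
      } catch (err) {
        // Tool failures are fed back to the model instead of aborting the chat.
        const reason = err instanceof Error ? err.message : String(err);
        result = `Image generation failed: ${reason}`;
      }
      history.push({ role: 'tool', tool_call_id: call.id, content: result });
    }
  }
  // Bail out once the bound is hit, even if the model keeps asking for tools.
  return { text: '', modelCalls, messages: history };
}
```

Counting upstream calls rather than tool executions matches the test, which expects exactly three chat-completions requests when the model never stops calling the tool.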
@ -534,6 +1025,7 @@ describe('API proxy routes', () => {
      '/api/proxy/openai/stream',
      '/api/proxy/azure/stream',
      '/api/proxy/google/stream',
      '/api/proxy/senseaudio/stream',
    ];

    for (const path of proxies) {
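
The streaming tests for this file assert that an OpenAI-style upstream chunk such as `data: {"choices":[{"delta":{"content":"sense"}}]}` reaches the client as `event: delta` with a `{"delta":"sense"}` payload, followed by `event: end`. A rough sketch of that translation is below. `translateOpenAiSse` is a hypothetical name, it works on a complete string instead of a live stream, and the empty `{}` payload on the `end` event is an assumption, since the tests only check for the event name.

```typescript
// Hypothetical helper; the proxy's real streaming code is not shown in this diff.
function translateOpenAiSse(upstream: string): string {
  const out: string[] = [];
  for (const line of upstream.split('\n')) {
    // Only "data:" lines carry payloads; blank separators are skipped.
    if (!line.startsWith('data:')) continue;
    const payload = line.slice('data:'.length).trim();
    if (payload === '[DONE]') break;

    let parsed: { choices?: { delta?: { content?: unknown } }[] };
    try {
      parsed = JSON.parse(payload);
    } catch {
      continue; // ignore malformed chunks rather than failing the stream
    }
    const content = parsed.choices?.[0]?.delta?.content;
    if (typeof content === 'string' && content.length > 0) {
      out.push(`event: delta\ndata: ${JSON.stringify({ delta: content })}\n\n`);
    }
  }
  // The payload of the end event is an assumption.
  out.push('event: end\ndata: {}\n\n');
  return out.join('');
}
```

Chunks whose delta carries no text, such as the `finish_reason` or `tool_calls` chunks seen in the tests, produce no `delta` event in this sketch.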
@ -14,6 +14,7 @@ import {
  trackStudioClickChatComposer,
  trackStudioViewChatPanel,
} from '../analytics/events';
import { IMAGE_MODELS } from "../media/models";
import { projectRawUrl, uploadProjectFiles, openFolderDialog, fetchConnectors } from "../providers/registry";
import { patchProject } from "../state/projects";
import { fetchMcpServers } from "../state/mcp";
@ -126,6 +127,14 @@ interface Props {
  researchAvailable?: boolean;
  projectMetadata?: ProjectMetadata;
  onProjectMetadataChange?: (metadata: ProjectMetadata) => void;
  // SenseAudio BYOK image-model picker shown above the textarea. Hidden
  // when the active chat protocol is anything other than 'senseaudio',
  // so the composer stays clean for every other BYOK tab. The state
  // owner is ProjectView (per-session, reset on refresh); ChatComposer
  // is a fully controlled select.
  byokApiProtocol?: AppConfig['apiProtocol'];
  byokImageModel?: string;
  onChangeByokImageModel?: (model: string) => void;
  currentSkillId?: string | null;
  onProjectSkillChange?: (skillId: string | null) => void;
  // Set when the project was created with a plugin already pinned
@ -188,6 +197,9 @@ export const ChatComposer = forwardRef<ChatComposerHandle, Props>(
|
||||||
researchAvailable = false,
|
researchAvailable = false,
|
||||||
projectMetadata,
|
projectMetadata,
|
||||||
onProjectMetadataChange,
|
onProjectMetadataChange,
|
||||||
|
byokApiProtocol,
|
||||||
|
byokImageModel,
|
||||||
|
onChangeByokImageModel,
|
||||||
currentSkillId = null,
|
currentSkillId = null,
|
||||||
onProjectSkillChange,
|
onProjectSkillChange,
|
||||||
pinnedPluginId = null,
|
pinnedPluginId = null,
|
||||||
|
|
@ -1186,6 +1198,53 @@ export const ChatComposer = forwardRef<ChatComposerHandle, Props>(
|
||||||
t={t}
|
t={t}
|
||||||
/>
|
/>
|
||||||
) : null}
|
) : null}
|
||||||
|
{byokApiProtocol === 'senseaudio' && onChangeByokImageModel ? (
|
||||||
|
<div
|
||||||
|
className="composer-byok-image-model"
|
||||||
|
data-testid="composer-byok-image-model"
|
||||||
|
style={{
|
||||||
|
display: 'flex',
|
||||||
|
alignItems: 'center',
|
||||||
|
gap: 8,
|
||||||
|
padding: '4px 8px',
|
||||||
|
fontSize: 12,
|
||||||
|
color: 'var(--text-muted, #888)',
|
||||||
|
}}
|
||||||
|
>
|
||||||
|
<Icon name="image" size={13} />
|
||||||
|
<label
|
||||||
|
htmlFor="composer-byok-image-model-select"
|
||||||
|
style={{ flexShrink: 0 }}
|
||||||
|
>
|
||||||
|
{t('settings.byokImageModel')}
|
||||||
|
</label>
|
||||||
|
<select
|
||||||
|
id="composer-byok-image-model-select"
|
||||||
|
value={byokImageModel ?? ''}
|
||||||
|
onChange={(e) => onChangeByokImageModel(e.target.value)}
|
||||||
|
style={{
|
||||||
|
background: 'transparent',
|
||||||
|
border: '1px solid var(--border, #444)',
|
||||||
|
borderRadius: 4,
|
||||||
|
padding: '2px 6px',
|
||||||
|
color: 'inherit',
|
||||||
|
fontSize: 12,
|
||||||
|
}}
|
||||||
|
>
|
||||||
|
<option value="">
|
||||||
|
{(IMAGE_MODELS.find((m) => m.provider === 'senseaudio')?.label
|
||||||
|
?? 'senseaudio-image-2.0') + ' (default)'}
|
||||||
|
</option>
|
||||||
|
{IMAGE_MODELS.filter((m) => m.provider === 'senseaudio').map(
|
||||||
|
(m) => (
|
||||||
|
<option key={m.id} value={m.id}>
|
||||||
|
{m.label}
|
||||||
|
</option>
|
||||||
|
),
|
||||||
|
)}
|
||||||
|
</select>
|
||||||
|
</div>
|
||||||
|
) : null}
|
||||||
{/*
|
{/*
|
||||||
Spec §8.4 — context bar above the composer input. The
|
Spec §8.4 — context bar above the composer input. The
|
||||||
section now behaves as a pure context bar: it renders the
|
section now behaves as a pure context bar: it renders the
|
||||||
|
|
|
||||||
|
|
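Note on the ChatComposer hunk above: the picker renders only when the active protocol is 'senseaudio' and a change handler is supplied, and its first option is an empty-valued "default" entry labelled after the first SenseAudio model in the registry. A minimal sketch of that option-building logic follows. It assumes registry entries shaped as `{ id, provider, label }`; the helper names and the sample models are hypothetical and do not appear in the commit.

```typescript
// Assumed shape of an IMAGE_MODELS entry; only these three fields are used by the picker.
interface ImageModel { id: string; provider: string; label: string }
interface SelectOption { value: string; label: string }

// Hypothetical helper: builds the <option> list the composer renders inline.
function byokImageModelOptions(
  models: ImageModel[],
  provider: string,
  fallbackLabel: string,
): SelectOption[] {
  const scoped = models.filter((m) => m.provider === provider);
  // The empty value means "use the default"; its label names the first registry match.
  const defaultLabel = (scoped[0]?.label ?? fallbackLabel) + ' (default)';
  return [
    { value: '', label: defaultLabel },
    ...scoped.map((m) => ({ value: m.id, label: m.label })),
  ];
}

// Hypothetical helper mirroring the render guard in the hunk.
function showByokImagePicker(protocol: string | undefined, onChange: unknown): boolean {
  return protocol === 'senseaudio' && Boolean(onChange);
}

// Sample registry with made-up ids and labels, for illustration only.
const sampleModels: ImageModel[] = [
  { id: 'img-a', provider: 'senseaudio', label: 'Image A' },
  { id: 'other-1', provider: 'openai', label: 'Other 1' },
  { id: 'img-b', provider: 'senseaudio', label: 'Image B' },
];
const options = byokImageModelOptions(sampleModels, 'senseaudio', 'senseaudio-image-2.0');
const emptyOptions = byokImageModelOptions([], 'senseaudio', 'senseaudio-image-2.0');
```

Keeping the empty value as a distinct first option lets the composer express "no override" without hard-coding a model id on the client.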
@ -279,6 +279,12 @@ interface Props {
|
||||||
// message" without forcing a separate side widget.
|
// message" without forcing a separate side widget.
|
||||||
activePluginSnapshot?: AppliedPluginSnapshot | null;
|
activePluginSnapshot?: AppliedPluginSnapshot | null;
|
||||||
onCollapse?: () => void;
|
onCollapse?: () => void;
|
||||||
|
// SenseAudio BYOK only — wired straight through to ChatComposer for the
|
||||||
|
// in-composer image-model picker. Active protocol is read so the picker
|
||||||
|
// hides when the user is on any other BYOK tab (azure / openai / …).
|
||||||
|
byokApiProtocol?: AppConfig['apiProtocol'];
|
||||||
|
byokImageModel?: string;
|
||||||
|
onChangeByokImageModel?: (model: string) => void;
|
||||||
}
|
}
|
||||||
|
|
||||||
type Tab = 'chat' | 'comments';
|
type Tab = 'chat' | 'comments';
|
||||||
|
|
@ -327,6 +333,9 @@ export function ChatPane({
|
||||||
activePluginSnapshot,
|
activePluginSnapshot,
|
||||||
skills = [],
|
skills = [],
|
||||||
onCollapse,
|
onCollapse,
|
||||||
|
byokApiProtocol,
|
||||||
|
byokImageModel,
|
||||||
|
onChangeByokImageModel,
|
||||||
}: Props) {
|
}: Props) {
|
||||||
const t = useT();
|
const t = useT();
|
||||||
const logRef = useRef<HTMLDivElement | null>(null);
|
const logRef = useRef<HTMLDivElement | null>(null);
|
||||||
|
|
@ -872,6 +881,9 @@ export function ChatPane({
|
||||||
researchAvailable={researchAvailable}
|
researchAvailable={researchAvailable}
|
||||||
projectMetadata={projectMetadata}
|
projectMetadata={projectMetadata}
|
||||||
onProjectMetadataChange={onProjectMetadataChange}
|
onProjectMetadataChange={onProjectMetadataChange}
|
||||||
|
byokApiProtocol={byokApiProtocol}
|
||||||
|
byokImageModel={byokImageModel}
|
||||||
|
onChangeByokImageModel={onChangeByokImageModel}
|
||||||
currentSkillId={currentSkillId}
|
currentSkillId={currentSkillId}
|
||||||
onProjectSkillChange={onProjectSkillChange}
|
onProjectSkillChange={onProjectSkillChange}
|
||||||
pinnedPluginId={activePluginSnapshot?.pluginId ?? null}
|
pinnedPluginId={activePluginSnapshot?.pluginId ?? null}
|
||||||
|
|
|
||||||
|
|
@ -1192,7 +1192,14 @@ export function DesignFilesPanel({
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
{preview && previewFile ? (
|
{preview && previewFile ? (
|
||||||
|
// Key on the file name so React unmounts the previous DfPreview
|
||||||
|
// (and its iframe / image element) when the user clicks a
|
||||||
|
// different file. Without this, React diffing reuses the same
|
||||||
|
// iframe DOM node and the browser keeps showing the first
|
||||||
|
// file's contents — only the `src` prop changes but the iframe
|
||||||
|
// never actually navigates.
|
||||||
<DfPreview
|
<DfPreview
|
||||||
|
key={previewFile.name}
|
||||||
projectId={projectId}
|
projectId={projectId}
|
||||||
file={previewFile}
|
file={previewFile}
|
||||||
onOpen={() => onOpenFile(previewFile.name)}
|
onOpen={() => onOpenFile(previewFile.name)}
|
||||||
|
|
|
||||||
|
|
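Note on the DesignFilesPanel hunk above: the added `key={previewFile.name}` forces a remount when the selected file changes. The sketch below is a deliberately simplified model of keyed reconciliation, not React's actual algorithm; `Mounted`, `reconcile`, and `instanceId` are hypothetical names used only to show why a changing key yields a fresh node while a constant key reuses the old one.

```typescript
// Simplified stand-in for a mounted element; instanceId models the identity
// of the underlying iframe / image DOM node.
interface Mounted {
  key: string;
  src: string;
  instanceId: number;
}

let nextInstanceId = 1;

// Reuse the mounted node when the key is unchanged; otherwise mount a new one.
function reconcile(prev: Mounted | null, key: string, src: string): Mounted {
  if (prev !== null && prev.key === key) {
    prev.src = src; // same node, only the prop is patched
    return prev;
  }
  return { key, src, instanceId: nextInstanceId++ };
}

// Before the fix: every file shares one implicit, constant key.
const unkeyedFirst = reconcile(null, 'preview', 'a.html');
const unkeyedFirstId = unkeyedFirst.instanceId;
const unkeyedSecond = reconcile(unkeyedFirst, 'preview', 'b.html');

// After the fix: the key is the file name, so a different file remounts.
const keyedFirst = reconcile(null, 'a.html', 'a.html');
const keyedSecond = reconcile(keyedFirst, 'b.html', 'b.html');
```

In this model the unkeyed pair shares one instance (the situation the commit comment describes, where the iframe never navigates), while the keyed pair gets two distinct instances.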
@ -486,6 +486,15 @@ export function ProjectView({
|
||||||
const [liveArtifacts, setLiveArtifacts] = useState<LiveArtifactSummary[]>([]);
|
const [liveArtifacts, setLiveArtifacts] = useState<LiveArtifactSummary[]>([]);
|
||||||
const [liveArtifactEvents, setLiveArtifactEvents] = useState<LiveArtifactEventItem[]>([]);
|
const [liveArtifactEvents, setLiveArtifactEvents] = useState<LiveArtifactEventItem[]>([]);
|
||||||
const [workspaceFocused, setWorkspaceFocused] = useState(false);
|
const [workspaceFocused, setWorkspaceFocused] = useState(false);
|
||||||
|
// Per-session override for the BYOK SenseAudio chat's generate_image
|
||||||
|
// tool. Seeded once from Settings (config.byokImageModel) so the
|
||||||
|
// composer dropdown opens on the user's chosen default; subsequent
|
||||||
|
// selections live only in this component's state — page refresh /
|
||||||
|
// project switch resets to the Settings default. Persistent defaults
|
||||||
|
// live in Settings → BYOK → SenseAudio → Image generation model.
|
||||||
|
const [byokImageModelOverride, setByokImageModelOverride] = useState<string>(
|
||||||
|
config.byokImageModel ?? '',
|
||||||
|
);
|
||||||
// `closed` → no surface; `review` → read-only saved-state panel with a
|
// `closed` → no surface; `review` → read-only saved-state panel with a
|
||||||
// preview + reopen-to-edit action (#1822); `edit` → the textarea editor.
|
// preview + reopen-to-edit action (#1822); `edit` → the textarea editor.
|
||||||
const [instructionsMode, setInstructionsMode] = useState<'closed' | 'review' | 'edit'>('closed');
|
const [instructionsMode, setInstructionsMode] = useState<'closed' | 'review' | 'edit'>('closed');
|
||||||
|
|
@ -2202,6 +2211,13 @@ export function ProjectView({
|
||||||
});
|
});
|
||||||
},
|
},
|
||||||
onError: handlers.onError,
|
onError: handlers.onError,
|
||||||
|
}, {
|
||||||
|
projectId: project.id,
|
||||||
|
// SenseAudio BYOK chat reads this to pre-fill the tool param's
|
||||||
|
// default model. Prefer the live composer override; fall back
|
||||||
|
// to the Settings default when the composer dropdown is on
|
||||||
|
// "use default". Other protocols ignore unknown body fields.
|
||||||
|
byokImageModel: byokImageModelOverride || config.byokImageModel,
|
||||||
});
|
});
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
|
|
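Note on the ProjectView hunk above: the request body carries `byokImageModelOverride || config.byokImageModel`, so an empty-string override (the composer's "use default" option) falls through to the Settings value. A minimal sketch of that resolution, with `resolveByokImageModel` as a hypothetical helper name (the commit inlines the expression):

```typescript
// Hypothetical helper wrapping the inline expression from the hunk.
// `||` is used rather than `??` because the "use default" sentinel is the
// empty string, which `??` would treat as a real value.
function resolveByokImageModel(
  override: string,
  settingsDefault?: string,
): string | undefined {
  return override || settingsDefault;
}

// Composer left on "use default": the Settings value wins.
const fromSettings = resolveByokImageModel('', 'model-from-settings');
// Composer has an explicit pick: the override wins.
const fromComposer = resolveByokImageModel('model-from-composer', 'model-from-settings');
// Neither is set: the field is undefined and the daemon applies its own default.
const unset = resolveByokImageModel('', undefined);
```

The model ids in the usage lines are placeholders, not registered SenseAudio models.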
@ -3375,6 +3391,9 @@ export function ProjectView({
|
||||||
onTogglePet={onTogglePet}
|
onTogglePet={onTogglePet}
|
||||||
onOpenPetSettings={onOpenPetSettings}
|
onOpenPetSettings={onOpenPetSettings}
|
||||||
researchAvailable={config.mode === 'daemon'}
|
researchAvailable={config.mode === 'daemon'}
|
||||||
|
byokApiProtocol={config.apiProtocol}
|
||||||
|
byokImageModel={byokImageModelOverride}
|
||||||
|
onChangeByokImageModel={setByokImageModelOverride}
|
||||||
projectMetadata={project.metadata}
|
projectMetadata={project.metadata}
|
||||||
onProjectMetadataChange={(metadata) => {
|
onProjectMetadataChange={(metadata) => {
|
||||||
onProjectChange({ ...project, metadata });
|
onProjectChange({ ...project, metadata });
|
||||||
|
|
|
||||||
|
|
@ -68,7 +68,7 @@ import type {
|
||||||
import { testAgent, testApiProvider } from '../providers/connection-test';
|
import { testAgent, testApiProvider } from '../providers/connection-test';
|
||||||
import { fetchProviderModels } from '../providers/provider-models';
|
import { fetchProviderModels } from '../providers/provider-models';
|
||||||
import { fetchConnectors, fetchDesignTemplates } from '../providers/registry';
|
import { fetchConnectors, fetchDesignTemplates } from '../providers/registry';
|
||||||
import { MEDIA_PROVIDERS } from '../media/models';
|
import { IMAGE_MODELS, MEDIA_PROVIDERS } from '../media/models';
|
||||||
import { XaiOAuthControl } from './XaiOAuthControl';
|
import { XaiOAuthControl } from './XaiOAuthControl';
|
||||||
import type { MediaProvider } from '../media/models';
|
import type { MediaProvider } from '../media/models';
|
||||||
import { Toast } from './Toast';
|
import { Toast } from './Toast';
|
||||||
|
|
@ -444,6 +444,7 @@ function currentApiProtocolConfig(config: AppConfig): ApiProtocolConfig {
|
||||||
model: config.model,
|
model: config.model,
|
||||||
apiVersion: config.apiVersion ?? '',
|
apiVersion: config.apiVersion ?? '',
|
||||||
apiProviderBaseUrl: config.apiProviderBaseUrl ?? null,
|
apiProviderBaseUrl: config.apiProviderBaseUrl ?? null,
|
||||||
|
byokImageModel: config.byokImageModel ?? '',
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -460,6 +461,11 @@ function applyApiProtocolConfig(
|
||||||
model: apiConfig.model,
|
model: apiConfig.model,
|
||||||
apiProviderBaseUrl: apiConfig.apiProviderBaseUrl ?? null,
|
apiProviderBaseUrl: apiConfig.apiProviderBaseUrl ?? null,
|
||||||
apiVersion: protocol === 'azure' ? (apiConfig.apiVersion ?? '') : '',
|
apiVersion: protocol === 'azure' ? (apiConfig.apiVersion ?? '') : '',
|
||||||
|
// byokImageModel is SenseAudio-only — flipping to another BYOK tab
|
||||||
|
// shouldn't carry a SenseAudio image-model choice into, say, the
|
||||||
|
// OpenAI form. Mirrors the apiVersion guarding above.
|
||||||
|
byokImageModel:
|
||||||
|
protocol === 'senseaudio' ? (apiConfig.byokImageModel ?? '') : '',
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
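Note on the applyApiProtocolConfig hunk above: `byokImageModel` is kept only when the target protocol is 'senseaudio', in the same way `apiVersion` is kept only for 'azure'. The sketch below isolates that guard. The `Protocol` union is an illustrative subset and `scopeProtocolFields` is a hypothetical name; the real function copies more fields than shown here.

```typescript
// Illustrative subset of the protocols named in the diff.
type Protocol = 'openai' | 'azure' | 'google' | 'senseaudio';

interface ProtocolScopedFields {
  apiVersion?: string;
  byokImageModel?: string;
}

// Hypothetical helper: blanks any field that does not belong to the target protocol.
function scopeProtocolFields(
  protocol: Protocol,
  apiConfig: ProtocolScopedFields,
): Required<ProtocolScopedFields> {
  return {
    apiVersion: protocol === 'azure' ? (apiConfig.apiVersion ?? '') : '',
    byokImageModel:
      protocol === 'senseaudio' ? (apiConfig.byokImageModel ?? '') : '',
  };
}

// Placeholder values; neither is a real API version or model id.
const stored: ProtocolScopedFields = { apiVersion: 'v-example', byokImageModel: 'img-example' };
const forSenseaudio = scopeProtocolFields('senseaudio', stored);
const forAzure = scopeProtocolFields('azure', stored);
const forOpenai = scopeProtocolFields('openai', stored);
```

Blanking rather than omitting the field keeps the resulting config shape identical across protocols, which is what the existing `apiVersion` handling already does.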
@ -2683,6 +2689,34 @@ export function SettingsDialog({
|
||||||
/>
|
/>
|
||||||
</label>
|
</label>
|
||||||
) : null}
|
) : null}
|
||||||
|
{apiProtocol === 'senseaudio' ? (
|
||||||
|
<label className="field">
|
||||||
|
<span className="field-label">{t('settings.byokImageModel')}</span>
|
||||||
|
<select
|
||||||
|
value={cfg.byokImageModel ?? ''}
|
||||||
|
onChange={(e) =>
|
||||||
|
updateApiConfig({ byokImageModel: e.target.value })
|
||||||
|
}
|
||||||
|
>
|
||||||
|
{/* Default-empty option resolves to the registry default
|
||||||
|
on the daemon side (senseaudio-image-2.0-260319 today).
|
||||||
|
Listing it explicitly lets the picker show what the
|
||||||
|
unconfigured state actually means. */}
|
||||||
|
<option value="">
|
||||||
|
{IMAGE_MODELS.find((m) => m.provider === 'senseaudio')?.label
|
||||||
|
?? 'senseaudio-image-2.0'}
|
||||||
|
{' (default)'}
|
||||||
|
</option>
|
||||||
|
{IMAGE_MODELS.filter((m) => m.provider === 'senseaudio').map(
|
||||||
|
(m) => (
|
||||||
|
<option key={m.id} value={m.id}>
|
||||||
|
{m.label}
|
||||||
|
</option>
|
||||||
|
),
|
||||||
|
)}
|
||||||
|
</select>
|
||||||
|
</label>
|
||||||
|
) : null}
|
||||||
<p className="hint">{t('settings.apiHint')}</p>
|
<p className="hint">{t('settings.apiHint')}</p>
|
||||||
</section>
|
</section>
|
||||||
)}
|
)}
|
||||||
|
|
|
||||||
|
|
@ -202,6 +202,7 @@ export const ar: Dict = {
|
||||||
'settings.azureDeploymentModelHint':
|
'settings.azureDeploymentModelHint':
|
||||||
'في Azure OpenAI، يُستخدم هذا الحقل كاسم النشر في /openai/deployments/<model>. أدخل اسم النشر الذي أنشأته في Azure.',
|
'في Azure OpenAI، يُستخدم هذا الحقل كاسم النشر في /openai/deployments/<model>. أدخل اسم النشر الذي أنشأته في Azure.',
|
||||||
'settings.apiVersion': 'إصدار API',
|
'settings.apiVersion': 'إصدار API',
|
||||||
|
'settings.byokImageModel': 'نموذج إنشاء الصور',
|
||||||
'settings.maxTokens': 'أقصى عدد من الرموز (اختياري)',
|
'settings.maxTokens': 'أقصى عدد من الرموز (اختياري)',
|
||||||
'settings.maxTokensHint':
|
'settings.maxTokensHint':
|
||||||
'الحد الأقصى لطول الاستجابة. لكل نموذج قيمة افتراضية؛ اتركها فارغة لاستخدامها، أو أدخل رقماً للتجاوز.',
|
'الحد الأقصى لطول الاستجابة. لكل نموذج قيمة افتراضية؛ اتركها فارغة لاستخدامها، أو أدخل رقماً للتجاوز.',
|
||||||
|
|
|
||||||
|
|
@ -202,6 +202,7 @@ export const de: Dict = {
|
||||||
'settings.azureDeploymentModelHint':
|
'settings.azureDeploymentModelHint':
|
||||||
'Fuer Azure OpenAI wird dieses Feld als Deployment-Name in /openai/deployments/<model> verwendet. Geben Sie den in Azure angelegten Deployment-Namen ein.',
|
'Fuer Azure OpenAI wird dieses Feld als Deployment-Name in /openai/deployments/<model> verwendet. Geben Sie den in Azure angelegten Deployment-Namen ein.',
|
||||||
'settings.apiVersion': 'API-Version',
|
'settings.apiVersion': 'API-Version',
|
||||||
|
'settings.byokImageModel': 'Bilderzeugungsmodell',
|
||||||
'settings.maxTokens': 'Max. Tokens (optional)',
|
'settings.maxTokens': 'Max. Tokens (optional)',
|
||||||
'settings.maxTokensHint':
|
'settings.maxTokensHint':
|
||||||
'Obergrenze für die Antwortlänge. Jedes Modell hat einen abgestimmten Standardwert (im Platzhalter sichtbar); leer lassen, um ihn zu verwenden, oder eine Zahl eingeben, um ihn zu überschreiben.',
|
'Obergrenze für die Antwortlänge. Jedes Modell hat einen abgestimmten Standardwert (im Platzhalter sichtbar); leer lassen, um ihn zu verwenden, oder eine Zahl eingeben, um ihn zu überschreiben.',
|
||||||
|
|
|
||||||
|
|
@ -227,6 +227,7 @@ export const en: Dict = {
|
||||||
'settings.azureModelFetchHint':
|
'settings.azureModelFetchHint':
|
||||||
'For Azure OpenAI, enter the deployment name you created in Azure. Automatic deployment discovery is not available from this BYOK endpoint.',
|
'For Azure OpenAI, enter the deployment name you created in Azure. Automatic deployment discovery is not available from this BYOK endpoint.',
|
||||||
'settings.apiVersion': 'API version',
|
'settings.apiVersion': 'API version',
|
||||||
|
'settings.byokImageModel': 'Image generation model',
|
||||||
'settings.maxTokens': 'Max tokens (optional)',
|
'settings.maxTokens': 'Max tokens (optional)',
|
||||||
'settings.maxTokensHint':
|
'settings.maxTokensHint':
|
||||||
'Cap on the response length. Each model has a tuned default (shown as a placeholder); leave blank to use it, or enter a number to override.',
|
'Cap on the response length. Each model has a tuned default (shown as a placeholder); leave blank to use it, or enter a number to override.',
|
||||||
|
|
|
||||||
|
|
@ -202,6 +202,7 @@ export const esES: Dict = {
|
||||||
'settings.azureDeploymentModelHint':
|
'settings.azureDeploymentModelHint':
|
||||||
'Para Azure OpenAI, este campo se usa como nombre del despliegue en /openai/deployments/<model>. Introduce el nombre del despliegue que creaste en Azure.',
|
'Para Azure OpenAI, este campo se usa como nombre del despliegue en /openai/deployments/<model>. Introduce el nombre del despliegue que creaste en Azure.',
|
||||||
'settings.apiVersion': 'Versión de API',
|
'settings.apiVersion': 'Versión de API',
|
||||||
|
'settings.byokImageModel': 'Modelo de generación de imágenes',
|
||||||
'settings.maxTokens': 'Tokens máx. (opcional)',
|
'settings.maxTokens': 'Tokens máx. (opcional)',
|
||||||
'settings.maxTokensHint':
|
'settings.maxTokensHint':
|
||||||
'Tope para la longitud de la respuesta. Cada modelo tiene un valor por defecto ajustado (visible en el placeholder); déjalo vacío para usarlo o introduce un número para anularlo.',
|
'Tope para la longitud de la respuesta. Cada modelo tiene un valor por defecto ajustado (visible en el placeholder); déjalo vacío para usarlo o introduce un número para anularlo.',
|
||||||
|
|
|
||||||
|
|
@ -202,6 +202,7 @@ export const fa: Dict = {
|
||||||
'settings.azureDeploymentModelHint':
|
'settings.azureDeploymentModelHint':
|
||||||
'در Azure OpenAI، این فیلد به عنوان نام استقرار در /openai/deployments/<model> استفاده میشود. نام استقراری را که در Azure ساختهاید وارد کنید.',
|
'در Azure OpenAI، این فیلد به عنوان نام استقرار در /openai/deployments/<model> استفاده میشود. نام استقراری را که در Azure ساختهاید وارد کنید.',
|
||||||
'settings.apiVersion': 'نسخه API',
|
'settings.apiVersion': 'نسخه API',
|
||||||
|
'settings.byokImageModel': 'مدل تولید تصویر',
|
||||||
'settings.maxTokens': 'حداکثر توکن (اختیاری)',
|
'settings.maxTokens': 'حداکثر توکن (اختیاری)',
|
||||||
'settings.maxTokensHint':
|
'settings.maxTokensHint':
|
||||||
'سقف طول پاسخ. هر مدل مقدار پیشفرض تنظیمشدهٔ خود را دارد (در placeholder نمایش داده میشود)؛ برای استفاده از آن خالی بگذارید، یا برای جایگزینی، عددی وارد کنید.',
|
'سقف طول پاسخ. هر مدل مقدار پیشفرض تنظیمشدهٔ خود را دارد (در placeholder نمایش داده میشود)؛ برای استفاده از آن خالی بگذارید، یا برای جایگزینی، عددی وارد کنید.',
|
||||||
|
|
|
||||||
|
|
@ -202,6 +202,7 @@ export const fr: Dict = {
|
||||||
'settings.azureDeploymentModelHint':
|
'settings.azureDeploymentModelHint':
|
||||||
'Pour Azure OpenAI, ce champ est utilisé comme nom du déploiement dans /openai/deployments/<model>. Saisissez le nom du déploiement créé dans Azure.',
|
'Pour Azure OpenAI, ce champ est utilisé comme nom du déploiement dans /openai/deployments/<model>. Saisissez le nom du déploiement créé dans Azure.',
|
||||||
'settings.apiVersion': 'Version API',
|
'settings.apiVersion': 'Version API',
|
||||||
|
'settings.byokImageModel': "Modèle de génération d'images",
|
||||||
'settings.maxTokens': 'Tokens max (optionnel)',
|
'settings.maxTokens': 'Tokens max (optionnel)',
|
||||||
'settings.maxTokensHint':
|
'settings.maxTokensHint':
|
||||||
'Limite de la longueur de réponse. Chaque modèle a une valeur par défaut (affichée à titre indicatif) ; laissez vide pour l\'utiliser, ou entrez un nombre pour la remplacer.',
|
'Limite de la longueur de réponse. Chaque modèle a une valeur par défaut (affichée à titre indicatif) ; laissez vide pour l\'utiliser, ou entrez un nombre pour la remplacer.',
|
||||||
|
|
|
||||||
|
|
@ -202,6 +202,7 @@ export const hu: Dict = {
|
||||||
'settings.azureDeploymentModelHint':
|
'settings.azureDeploymentModelHint':
|
||||||
'Azure OpenAI esetén ez a mező a /openai/deployments/<model> deployment neveként szerepel. Add meg az Azure-ban létrehozott deployment nevét.',
|
'Azure OpenAI esetén ez a mező a /openai/deployments/<model> deployment neveként szerepel. Add meg az Azure-ban létrehozott deployment nevét.',
|
||||||
'settings.apiVersion': 'API-verzió',
|
'settings.apiVersion': 'API-verzió',
|
||||||
|
'settings.byokImageModel': 'Képgenerálási modell',
|
||||||
'settings.maxTokens': 'Max tokenek (opcionális)',
|
'settings.maxTokens': 'Max tokenek (opcionális)',
|
||||||
'settings.maxTokensHint':
|
'settings.maxTokensHint':
|
||||||
'A válasz hosszának felső határa. Minden modellnek van hangolt alapértelmezése (placeholderként látható); hagyd üresen az alkalmazásához, vagy adj meg számot a felülíráshoz.',
|
'A válasz hosszának felső határa. Minden modellnek van hangolt alapértelmezése (placeholderként látható); hagyd üresen az alkalmazásához, vagy adj meg számot a felülíráshoz.',
|
||||||
|
|
|
||||||
|
|
@ -202,6 +202,7 @@ export const id: Dict = {
|
||||||
'settings.azureDeploymentModelHint':
|
'settings.azureDeploymentModelHint':
|
||||||
'Untuk Azure OpenAI, field ini digunakan sebagai nama deployment di /openai/deployments/<model>. Masukkan nama deployment yang kamu buat di Azure.',
|
'Untuk Azure OpenAI, field ini digunakan sebagai nama deployment di /openai/deployments/<model>. Masukkan nama deployment yang kamu buat di Azure.',
|
||||||
'settings.apiVersion': 'Versi API',
|
'settings.apiVersion': 'Versi API',
|
||||||
|
'settings.byokImageModel': 'Model pembuatan gambar',
|
||||||
'settings.maxTokens': 'Token maks (opsional)',
|
'settings.maxTokens': 'Token maks (opsional)',
|
||||||
'settings.maxTokensHint':
|
'settings.maxTokensHint':
|
||||||
'Batas panjang respons. Setiap model punya default sendiri; kosongkan untuk memakainya, atau isi angka untuk menimpa.',
|
'Batas panjang respons. Setiap model punya default sendiri; kosongkan untuk memakainya, atau isi angka untuk menimpa.',
|
||||||
|
|
|
||||||
|
|
@ -199,6 +199,7 @@ export const it: Dict = {
|
||||||
'settings.azureDeploymentModelHint':
|
'settings.azureDeploymentModelHint':
|
||||||
'Per Azure OpenAI, questo campo viene utilizzato come nome del deployment in /openai/deployments/<model>. Inserisci il nome del deployment creato in Azure.',
|
'Per Azure OpenAI, questo campo viene utilizzato come nome del deployment in /openai/deployments/<model>. Inserisci il nome del deployment creato in Azure.',
|
||||||
'settings.apiVersion': 'Versione API',
|
'settings.apiVersion': 'Versione API',
|
||||||
|
'settings.byokImageModel': 'Modello di generazione immagini',
|
||||||
'settings.maxTokens': 'Token massimi (opzionale)',
|
'settings.maxTokens': 'Token massimi (opzionale)',
|
||||||
'settings.maxTokensHint':
|
'settings.maxTokensHint':
|
||||||
'Limite della lunghezza della risposta. Ogni modello ha un valore predefinito (mostrato nel placeholder); lascia vuoto per usarlo, o inserisci un numero per sostituirlo.',
|
'Limite della lunghezza della risposta. Ogni modello ha un valore predefinito (mostrato nel placeholder); lascia vuoto per usarlo, o inserisci un numero per sostituirlo.',
|
||||||
|
|
|
||||||
|
|
@ -202,6 +202,7 @@ export const ja: Dict = {
|
||||||
'settings.azureDeploymentModelHint':
|
'settings.azureDeploymentModelHint':
|
||||||
'Azure OpenAI では、このフィールドが /openai/deployments/<model> のデプロイ名として使われます。Azure で作成したデプロイ名を入力してください。',
|
'Azure OpenAI では、このフィールドが /openai/deployments/<model> のデプロイ名として使われます。Azure で作成したデプロイ名を入力してください。',
|
||||||
'settings.apiVersion': 'API バージョン',
|
'settings.apiVersion': 'API バージョン',
|
||||||
|
'settings.byokImageModel': '画像生成モデル',
|
||||||
'settings.maxTokens': '最大トークン(任意)',
|
'settings.maxTokens': '最大トークン(任意)',
|
||||||
'settings.maxTokensHint':
|
'settings.maxTokensHint':
|
||||||
'応答長の上限。各モデルにチューニング済みのデフォルト値があります(プレースホルダーに表示)。空のままにすればそれを使用し、数値を入力すれば上書きされます。',
|
'応答長の上限。各モデルにチューニング済みのデフォルト値があります(プレースホルダーに表示)。空のままにすればそれを使用し、数値を入力すれば上書きされます。',
|
||||||
|
|
|
||||||
|
|
@ -205,6 +205,7 @@ export const ko: Dict = {
|
||||||
'settings.azureDeploymentModelHint':
|
'settings.azureDeploymentModelHint':
|
||||||
'Azure OpenAI에서는 이 필드가 /openai/deployments/<model>의 배포 이름으로 사용됩니다. Azure에서 만든 배포 이름을 입력하세요.',
|
'Azure OpenAI에서는 이 필드가 /openai/deployments/<model>의 배포 이름으로 사용됩니다. Azure에서 만든 배포 이름을 입력하세요.',
|
||||||
'settings.apiVersion': 'API 버전',
|
'settings.apiVersion': 'API 버전',
|
||||||
|
'settings.byokImageModel': '이미지 생성 모델',
|
||||||
'settings.apiHint': '요청은 로컬 daemon 프록시를 통해 설정한 Base URL로 전송됩니다. 키는 이 브라우저에만 저장되며 제공자 요청과 함께 전송됩니다.',
|
'settings.apiHint': '요청은 로컬 daemon 프록시를 통해 설정한 Base URL로 전송됩니다. 키는 이 브라우저에만 저장되며 제공자 요청과 함께 전송됩니다.',
|
||||||
'settings.skipForNow': '지금은 건너뛰기',
|
'settings.skipForNow': '지금은 건너뛰기',
|
||||||
'settings.getStarted': '시작하기',
|
'settings.getStarted': '시작하기',
|
||||||
|
|
|
||||||
|
|
@ -202,6 +202,7 @@ export const pl: Dict = {
|
||||||
'settings.azureDeploymentModelHint':
|
'settings.azureDeploymentModelHint':
|
||||||
'Dla Azure OpenAI to pole jest używane jako nazwa wdrożenia w /openai/deployments/<model>. Wpisz nazwę wdrożenia utworzonego w Azure.',
|
'Dla Azure OpenAI to pole jest używane jako nazwa wdrożenia w /openai/deployments/<model>. Wpisz nazwę wdrożenia utworzonego w Azure.',
|
||||||
'settings.apiVersion': 'Wersja API',
|
'settings.apiVersion': 'Wersja API',
|
||||||
|
'settings.byokImageModel': 'Model generowania obrazów',
|
||||||
'settings.maxTokens': 'Maks. liczba tokenów (opcjonalnie)',
|
'settings.maxTokens': 'Maks. liczba tokenów (opcjonalnie)',
|
||||||
'settings.maxTokensHint':
|
'settings.maxTokensHint':
|
||||||
'Limit długości odpowiedzi. Każdy model ma dostrojony domyślny limit (widoczny jako placeholder); pozostaw puste, aby go użyć, lub wpisz liczbę.',
|
'Limit długości odpowiedzi. Każdy model ma dostrojony domyślny limit (widoczny jako placeholder); pozostaw puste, aby go użyć, lub wpisz liczbę.',
|
||||||
|
|
|
||||||
|
|
@ -202,6 +202,7 @@ export const ptBR: Dict = {
|
||||||
'settings.azureDeploymentModelHint':
|
'settings.azureDeploymentModelHint':
|
||||||
'No Azure OpenAI, este campo e usado como nome do deployment em /openai/deployments/<model>. Informe o nome do deployment criado no Azure.',
|
'No Azure OpenAI, este campo e usado como nome do deployment em /openai/deployments/<model>. Informe o nome do deployment criado no Azure.',
|
||||||
'settings.apiVersion': 'Versão da API',
|
'settings.apiVersion': 'Versão da API',
|
||||||
|
'settings.byokImageModel': 'Modelo de geração de imagens',
|
||||||
'settings.maxTokens': 'Tokens máx. (opcional)',
|
'settings.maxTokens': 'Tokens máx. (opcional)',
|
||||||
'settings.maxTokensHint':
|
'settings.maxTokensHint':
|
||||||
'Limite para o comprimento da resposta. Cada modelo tem um valor padrão ajustado (visível no placeholder); deixe em branco para usá-lo ou insira um número para substituí-lo.',
|
'Limite para o comprimento da resposta. Cada modelo tem um valor padrão ajustado (visível no placeholder); deixe em branco para usá-lo ou insira um número para substituí-lo.',
|
||||||
|
|
|
||||||
|
|
@ -202,6 +202,7 @@ export const ru: Dict = {
|
||||||
'settings.azureDeploymentModelHint':
|
'settings.azureDeploymentModelHint':
|
||||||
'Для Azure OpenAI это поле используется как имя развертывания в /openai/deployments/<model>. Укажите имя развертывания, созданного в Azure.',
|
'Для Azure OpenAI это поле используется как имя развертывания в /openai/deployments/<model>. Укажите имя развертывания, созданного в Azure.',
|
||||||
'settings.apiVersion': 'Версия API',
|
'settings.apiVersion': 'Версия API',
|
||||||
|
'settings.byokImageModel': 'Модель генерации изображений',
|
||||||
'settings.maxTokens': 'Макс. токенов (опционально)',
|
'settings.maxTokens': 'Макс. токенов (опционально)',
|
||||||
'settings.maxTokensHint':
|
'settings.maxTokensHint':
|
||||||
'Ограничение длины ответа. У каждой модели свой настроенный дефолт (виден в плейсхолдере); оставьте поле пустым, чтобы использовать его, или введите число, чтобы переопределить.',
|
'Ограничение длины ответа. У каждой модели свой настроенный дефолт (виден в плейсхолдере); оставьте поле пустым, чтобы использовать его, или введите число, чтобы переопределить.',
|
||||||
|
|
|
||||||
|
|
@ -198,6 +198,7 @@ export const th: Dict = {
|
||||||
'settings.azureDeploymentModel': 'ชื่อ Deployment',
|
'settings.azureDeploymentModel': 'ชื่อ Deployment',
|
||||||
'settings.azureDeploymentModelHint': 'สำหรับ Azure OpenAI ฟิลด์นี้ใช้เป็นชื่อ Deployment ใน /openai/deployments/<model> ป้อนชื่อ Deployment ที่คุณสร้างใน Azure',
|
'settings.azureDeploymentModelHint': 'สำหรับ Azure OpenAI ฟิลด์นี้ใช้เป็นชื่อ Deployment ใน /openai/deployments/<model> ป้อนชื่อ Deployment ที่คุณสร้างใน Azure',
|
||||||
'settings.apiVersion': 'เวอร์ชัน API',
|
'settings.apiVersion': 'เวอร์ชัน API',
|
||||||
|
'settings.byokImageModel': 'โมเดลสร้างภาพ',
|
||||||
'settings.maxTokens': 'Max tokens (เลือกได้)',
|
'settings.maxTokens': 'Max tokens (เลือกได้)',
|
||||||
'settings.maxTokensHint': 'ขีดจำกัดความยาวในการตอบกลับ',
|
'settings.maxTokensHint': 'ขีดจำกัดความยาวในการตอบกลับ',
|
||||||
'settings.apiHint': 'คำสั่งจะถูกส่งผ่าน local daemon proxy ไปยัง base URL ที่คุณตั้งไว้ API Key จะถูกเก็บในเบราว์เซอร์นี้เท่านั้น',
|
'settings.apiHint': 'คำสั่งจะถูกส่งผ่าน local daemon proxy ไปยัง base URL ที่คุณตั้งไว้ API Key จะถูกเก็บในเบราว์เซอร์นี้เท่านั้น',
|
||||||
|
|
|
||||||
|
|
@ -202,6 +202,7 @@ export const tr: Dict = {
|
||||||
'settings.azureDeploymentModelHint':
|
'settings.azureDeploymentModelHint':
|
||||||
'Azure OpenAI icin bu alan /openai/deployments/<model> icindeki dagitim adi olarak kullanilir. Azureda olusturdugunuz dagitim adini girin.',
|
'Azure OpenAI icin bu alan /openai/deployments/<model> icindeki dagitim adi olarak kullanilir. Azureda olusturdugunuz dagitim adini girin.',
|
||||||
'settings.apiVersion': 'API sürümü',
|
'settings.apiVersion': 'API sürümü',
|
||||||
|
'settings.byokImageModel': 'Görüntü oluşturma modeli',
|
||||||
'settings.maxTokens': 'Maks. token (isteğe bağlı)',
|
'settings.maxTokens': 'Maks. token (isteğe bağlı)',
|
||||||
'settings.maxTokensHint':
|
'settings.maxTokensHint':
|
||||||
'Yanıt uzunluğu sınırı. Her modelin ayarlanmış bir varsayılanı vardır (yer tutucuda görünür); kullanmak için boş bırakın, üzerine yazmak için bir sayı girin.',
|
'Yanıt uzunluğu sınırı. Her modelin ayarlanmış bir varsayılanı vardır (yer tutucuda görünür); kullanmak için boş bırakın, üzerine yazmak için bir sayı girin.',
|
||||||
|
|
|
||||||
|
|
@ -203,6 +203,7 @@ export const uk: Dict = {
|
||||||
'settings.azureDeploymentModelHint':
|
'settings.azureDeploymentModelHint':
|
||||||
'Для Azure OpenAI це поле використовується як назва розгортання в /openai/deployments/<model>. Введіть назву розгортання, створену в Azure.',
|
'Для Azure OpenAI це поле використовується як назва розгортання в /openai/deployments/<model>. Введіть назву розгортання, створену в Azure.',
|
||||||
'settings.apiVersion': 'Версія API',
|
'settings.apiVersion': 'Версія API',
|
||||||
|
'settings.byokImageModel': 'Модель генерації зображень',
|
||||||
'settings.maxTokens': 'Макс. токенів (необов\'язково)',
|
'settings.maxTokens': 'Макс. токенів (необов\'язково)',
|
||||||
'settings.maxTokensHint':
|
'settings.maxTokensHint':
|
||||||
'Обмеження на довжину відповіді. Кожна модель має налаштовану за замовчуванням (показано в заповнювачі); залиште поле порожнім, щоб використовувати її, або введіть число, щоб переопрацювати.',
|
'Обмеження на довжину відповіді. Кожна модель має налаштовану за замовчуванням (показано в заповнювачі); залиште поле порожнім, щоб використовувати її, або введіть число, щоб переопрацювати.',
|
||||||
|
|
|
||||||
|
|
@ -227,6 +227,7 @@ export const zhCN: Dict = {
|
||||||
'settings.azureModelFetchHint':
|
'settings.azureModelFetchHint':
|
||||||
'对于 Azure OpenAI,请填写你在 Azure 中创建的部署名称。当前 BYOK 端点无法自动发现 deployment。',
|
'对于 Azure OpenAI,请填写你在 Azure 中创建的部署名称。当前 BYOK 端点无法自动发现 deployment。',
|
||||||
'settings.apiVersion': 'API 版本',
|
'settings.apiVersion': 'API 版本',
|
||||||
|
'settings.byokImageModel': '图片生成模型',
|
||||||
'settings.maxTokens': '最大 tokens(可选)',
|
'settings.maxTokens': '最大 tokens(可选)',
|
||||||
'settings.maxTokensHint':
|
'settings.maxTokensHint':
|
||||||
'响应长度上限。每个 model 有调优过的默认值(在 placeholder 里显示),留空即使用,输入数字则覆盖。',
|
'响应长度上限。每个 model 有调优过的默认值(在 placeholder 里显示),留空即使用,输入数字则覆盖。',
|
||||||
|
|
|
||||||
|
|
@ -201,6 +201,7 @@ export const zhTW: Dict = {
|
||||||
'settings.azureDeploymentModelHint':
|
'settings.azureDeploymentModelHint':
|
||||||
'對於 Azure OpenAI,此欄位會作為 /openai/deployments/<model> 中的部署名稱使用。請填入你在 Azure 中建立的部署名稱。',
|
'對於 Azure OpenAI,此欄位會作為 /openai/deployments/<model> 中的部署名稱使用。請填入你在 Azure 中建立的部署名稱。',
|
||||||
'settings.apiVersion': 'API 版本',
|
'settings.apiVersion': 'API 版本',
|
||||||
|
'settings.byokImageModel': '圖片生成模型',
|
||||||
'settings.maxTokens': '最大 tokens(可選)',
|
'settings.maxTokens': '最大 tokens(可選)',
|
||||||
'settings.maxTokensHint':
|
'settings.maxTokensHint':
|
||||||
'回應長度上限。每個 model 有調過的預設值(在 placeholder 顯示),留空即使用,輸入數字則覆蓋。',
|
'回應長度上限。每個 model 有調過的預設值(在 placeholder 顯示),留空即使用,輸入數字則覆蓋。',
|
||||||
|
|
|
||||||
|
|
@ -252,6 +252,7 @@ export interface Dict {
|
||||||
'settings.azureDeploymentModelHint': string;
|
'settings.azureDeploymentModelHint': string;
|
||||||
'settings.azureModelFetchHint': string;
|
'settings.azureModelFetchHint': string;
|
||||||
'settings.apiVersion': string;
|
'settings.apiVersion': string;
|
||||||
|
'settings.byokImageModel': string;
|
||||||
'settings.apiHint': string;
|
'settings.apiHint': string;
|
||||||
'settings.skipForNow': string;
|
'settings.skipForNow': string;
|
||||||
'settings.getStarted': string;
|
'settings.getStarted': string;
|
||||||
|
|
|
||||||
|
|
@ -234,7 +234,7 @@ export const MEDIA_PROVIDERS: MediaProvider[] = [
|
||||||
{
|
{
|
||||||
id: 'senseaudio',
|
id: 'senseaudio',
|
||||||
label: 'SenseAudio',
|
label: 'SenseAudio',
|
||||||
hint: 'TTS · 70+ system voices · clone',
|
hint: '',
|
||||||
integrated: true,
|
integrated: true,
|
||||||
defaultBaseUrl: 'https://api.senseaudio.cn',
|
defaultBaseUrl: 'https://api.senseaudio.cn',
|
||||||
docsUrl: 'https://docs.senseaudio.cn',
|
docsUrl: 'https://docs.senseaudio.cn',
|
||||||
|
|
@ -344,6 +344,29 @@ export const IMAGE_MODELS: MediaModel[] = [
|
||||||
caps: ['i2i'],
|
caps: ['i2i'],
|
||||||
},
|
},
|
||||||
|
|
||||||
|
// SenseAudio — synchronous /v1/image/sync, Bearer auth, reference URL or data URI.
|
||||||
|
{
|
||||||
|
id: 'senseaudio-image-2.0-260319',
|
||||||
|
label: 'senseaudio-image-2.0',
|
||||||
|
hint: 'SenseAudio · multi-aspect, latest',
|
||||||
|
provider: 'senseaudio',
|
||||||
|
caps: ['t2i', 'i2i'],
|
||||||
|
},
|
||||||
|
{
|
||||||
|
id: 'senseaudio-image-1.0-260319',
|
||||||
|
label: 'senseaudio-image-1.0',
|
||||||
|
hint: 'SenseAudio · standard',
|
||||||
|
provider: 'senseaudio',
|
||||||
|
caps: ['t2i', 'i2i'],
|
||||||
|
},
|
||||||
|
{
|
||||||
|
id: 'doubao-seedream-5-0-260128',
|
||||||
|
label: 'seedream-5.0',
|
||||||
|
hint: 'SenseAudio · ByteDance Seedream 5.0 hi-res',
|
||||||
|
provider: 'senseaudio',
|
||||||
|
caps: ['t2i', 'i2i'],
|
||||||
|
},
|
||||||
|
|
||||||
// xAI Grok Imagine — text-to-image (1k/2k, 11+ aspect ratios).
|
// xAI Grok Imagine — text-to-image (1k/2k, 11+ aspect ratios).
|
||||||
{
|
{
|
||||||
id: 'grok-imagine-image',
|
id: 'grok-imagine-image',
|
||||||
|
|
|
||||||
|
|
@ -11,10 +11,12 @@ import Anthropic from '@anthropic-ai/sdk';
|
||||||
import { effectiveMaxTokens } from '../state/maxTokens';
|
import { effectiveMaxTokens } from '../state/maxTokens';
|
||||||
import type { AppConfig, ChatMessage } from '../types';
|
import type { AppConfig, ChatMessage } from '../types';
|
||||||
import { streamMessageAnthropicProxy } from './anthropic-compatible';
|
import { streamMessageAnthropicProxy } from './anthropic-compatible';
|
||||||
|
import type { ProxyContext } from './api-proxy';
|
||||||
import { streamMessageAzure } from './azure-compatible';
|
import { streamMessageAzure } from './azure-compatible';
|
||||||
import { streamMessageGoogle } from './google-compatible';
|
import { streamMessageGoogle } from './google-compatible';
|
||||||
import { streamMessageOllama } from './ollama-compatible';
|
import { streamMessageOllama } from './ollama-compatible';
|
||||||
import { isOpenAICompatible, streamMessageOpenAI } from './openai-compatible';
|
import { isOpenAICompatible, streamMessageOpenAI } from './openai-compatible';
|
||||||
|
import { streamMessageSenseAudio } from './senseaudio-compatible';
|
||||||
|
|
||||||
// Re-export for convenience
|
// Re-export for convenience
|
||||||
export { isOpenAICompatible } from './openai-compatible';
|
export { isOpenAICompatible } from './openai-compatible';
|
||||||
|
|
@ -39,6 +41,12 @@ export async function streamMessage(
|
||||||
history: ChatMessage[],
|
history: ChatMessage[],
|
||||||
signal: AbortSignal,
|
signal: AbortSignal,
|
||||||
handlers: StreamHandlers,
|
handlers: StreamHandlers,
|
||||||
|
// Only the senseaudio branch reads `context.projectId` today (so the
|
||||||
|
// daemon-side `generate_image` tool can write into the active
|
||||||
|
// project's folder). Other branches accept and ignore — keeping the
|
||||||
|
// signature uniform means the single call site in ProjectView passes
|
||||||
|
// the same shape regardless of protocol.
|
||||||
|
context?: ProxyContext,
|
||||||
): Promise<void> {
|
): Promise<void> {
|
||||||
// Prefer the explicit Settings protocol; keep the legacy heuristic as a
|
// Prefer the explicit Settings protocol; keep the legacy heuristic as a
|
||||||
// fallback for configs saved before apiProtocol existed.
|
// fallback for configs saved before apiProtocol existed.
|
||||||
|
|
@ -51,6 +59,9 @@ export async function streamMessage(
|
||||||
if (cfg.apiProtocol === 'google') {
|
if (cfg.apiProtocol === 'google') {
|
||||||
return streamMessageGoogle(cfg, system, history, signal, handlers);
|
return streamMessageGoogle(cfg, system, history, signal, handlers);
|
||||||
}
|
}
|
||||||
|
if (cfg.apiProtocol === 'senseaudio') {
|
||||||
|
return streamMessageSenseAudio(cfg, system, history, signal, handlers, context);
|
||||||
|
}
|
||||||
if (cfg.apiProtocol === 'openai' || (!cfg.apiProtocol && isOpenAICompatible(cfg.model, cfg.baseUrl))) {
|
if (cfg.apiProtocol === 'openai' || (!cfg.apiProtocol && isOpenAICompatible(cfg.model, cfg.baseUrl))) {
|
||||||
return streamMessageOpenAI(cfg, system, history, signal, handlers);
|
return streamMessageOpenAI(cfg, system, history, signal, handlers);
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -3,6 +3,22 @@ import type { AppConfig, ChatMessage } from '../types';
|
||||||
import type { StreamHandlers } from './anthropic';
|
import type { StreamHandlers } from './anthropic';
|
||||||
import { parseSseFrame } from './sse';
|
import { parseSseFrame } from './sse';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Optional per-request context that some protocols thread into the
|
||||||
|
* proxy body. Today only the senseaudio proxy reads these fields:
|
||||||
|
* - `projectId` lets the `generate_image` tool write into the active
|
||||||
|
* project's folder instead of a daemon-global cache.
|
||||||
|
* - `byokImageModel` is the user's BYOK Settings default for the
|
||||||
|
* image tool. The LLM can still override per-call via the tool's
|
||||||
|
* `model` arg; this is just the fallback when it omits one.
|
||||||
|
* Other protocols ignore unknown body fields, so callers are free to
|
||||||
|
* pass this for every protocol.
|
||||||
|
*/
|
||||||
|
export interface ProxyContext {
|
||||||
|
projectId?: string;
|
||||||
|
byokImageModel?: string;
|
||||||
|
}
|
||||||
|
|
||||||
export async function streamProxyEndpoint(
|
export async function streamProxyEndpoint(
|
||||||
endpoint: string,
|
endpoint: string,
|
||||||
cfg: AppConfig,
|
cfg: AppConfig,
|
||||||
|
|
@ -10,6 +26,7 @@ export async function streamProxyEndpoint(
|
||||||
history: ChatMessage[],
|
history: ChatMessage[],
|
||||||
signal: AbortSignal,
|
signal: AbortSignal,
|
||||||
handlers: StreamHandlers,
|
handlers: StreamHandlers,
|
||||||
|
context?: ProxyContext,
|
||||||
): Promise<void> {
|
): Promise<void> {
|
||||||
if (!cfg.apiKey) {
|
if (!cfg.apiKey) {
|
||||||
handlers.onError(new Error('Missing API key — open Settings and paste one in.'));
|
handlers.onError(new Error('Missing API key — open Settings and paste one in.'));
|
||||||
|
|
@ -30,6 +47,10 @@ export async function streamProxyEndpoint(
|
||||||
messages: history.map((m) => ({ role: m.role, content: m.content })),
|
messages: history.map((m) => ({ role: m.role, content: m.content })),
|
||||||
maxTokens: effectiveMaxTokens(cfg),
|
maxTokens: effectiveMaxTokens(cfg),
|
||||||
apiVersion: cfg.apiVersion,
|
apiVersion: cfg.apiVersion,
|
||||||
|
...(context?.projectId ? { projectId: context.projectId } : {}),
|
||||||
|
...(context?.byokImageModel
|
||||||
|
? { byokImageModel: context.byokImageModel }
|
||||||
|
: {}),
|
||||||
}),
|
}),
|
||||||
signal,
|
signal,
|
||||||
});
|
});
|
||||||
|
|
|
||||||
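The conditional spreads in the hunk above can be tried in isolation. This is a minimal sketch, assuming a hypothetical `buildProxyBody` helper and a trimmed `ProxyBody` type; in the diff the object is built inline inside `streamProxyEndpoint`. It shows that a missing or empty context field leaves the key out of the request body entirely.

```typescript
// Mirrors the ProxyContext interface added in this diff.
interface ProxyContext {
  projectId?: string;
  byokImageModel?: string;
}

// Hypothetical, trimmed body shape; the real body also carries
// messages, apiVersion and other fields.
interface ProxyBody {
  model: string;
  maxTokens: number;
  projectId?: string;
  byokImageModel?: string;
}

// Hypothetical helper name, used here only for illustration.
function buildProxyBody(
  base: { model: string; maxTokens: number },
  context?: ProxyContext,
): ProxyBody {
  return {
    ...base,
    // Spreading {} when the field is missing or empty keeps the key
    // absent from the object instead of sending "" or undefined.
    ...(context?.projectId ? { projectId: context.projectId } : {}),
    ...(context?.byokImageModel
      ? { byokImageModel: context.byokImageModel }
      : {}),
  };
}
```

Because absent keys are never sent, protocols that do not read these fields see the same body they saw before this change.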
33
apps/web/src/providers/senseaudio-compatible.ts
Normal file
|
|
@ -0,0 +1,33 @@
|
||||||
|
/**
|
||||||
|
* SenseAudio chat completions provider. Wire-compatible with OpenAI
|
||||||
|
* (POST /v1/chat/completions, Bearer auth, SSE delta frames + [DONE]),
|
||||||
|
* so the only thing that differs from streamMessageOpenAI is the
|
||||||
|
* daemon proxy endpoint — keeping a dedicated client makes the picker
|
||||||
|
* tab → daemon log line → upstream call chain readable end-to-end and
|
||||||
|
* leaves room for SenseAudio-specific divergence in the future.
|
||||||
|
*
|
||||||
|
* Routes through the daemon proxy to avoid browser CORS issues.
|
||||||
|
* BYOK — the key stays on the user's machine.
|
||||||
|
*/
|
||||||
|
import type { AppConfig, ChatMessage } from '../types';
|
||||||
|
import type { StreamHandlers } from './anthropic';
|
||||||
|
import { streamProxyEndpoint, type ProxyContext } from './api-proxy';
|
||||||
|
|
||||||
|
export async function streamMessageSenseAudio(
|
||||||
|
cfg: AppConfig,
|
||||||
|
system: string,
|
||||||
|
history: ChatMessage[],
|
||||||
|
signal: AbortSignal,
|
||||||
|
handlers: StreamHandlers,
|
||||||
|
context?: ProxyContext,
|
||||||
|
): Promise<void> {
|
||||||
|
return streamProxyEndpoint(
|
||||||
|
'/api/proxy/senseaudio/stream',
|
||||||
|
cfg,
|
||||||
|
system,
|
||||||
|
history,
|
||||||
|
signal,
|
||||||
|
handlers,
|
||||||
|
context,
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
@ -262,6 +262,24 @@ function renderBlock(block: Block, key: number): ReactNode {
|
||||||
return null;
|
return null;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Allowed schemes / forms for image `src` attributes. The BYOK chat
|
||||||
|
// tool loop emits relative URLs like `/api/byok-image/<id>.png` which
|
||||||
|
// the web's Next.js rewrites proxy to the daemon — that's the common
|
||||||
|
// case. data: + blob: cover inline / generated images. http(s):// is
|
||||||
|
// allowed so a model can reference public images. Anything else
|
||||||
|
// (javascript:, file:, vbscript:, …) is rejected so a hallucinated
|
||||||
|
// or adversarial URL cannot exfiltrate or execute.
|
||||||
|
function isSafeMarkdownImageSrc(src: string): boolean {
|
||||||
|
if (!src) return false;
|
||||||
|
if (src.startsWith('/') && !src.startsWith('//')) return true;
|
||||||
|
return (
|
||||||
|
src.startsWith('http://')
|
||||||
|
|| src.startsWith('https://')
|
||||||
|
|| src.startsWith('data:image/')
|
||||||
|
|| src.startsWith('blob:')
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
// Inline pass: tokenize into runs of `code`, **bold**, *italic*, links,
|
// Inline pass: tokenize into runs of `code`, **bold**, *italic*, links,
|
||||||
// and plain text. We walk the string with a regex that matches whichever
|
// and plain text. We walk the string with a regex that matches whichever
|
||||||
// delimiter shows up next; everything between delimiters becomes a text
|
// delimiter shows up next; everything between delimiters becomes a text
|
||||||
|
|
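The allowlist added in this hunk can be exercised on its own. The function body below is copied from the diff; the `samples` array and its URLs are illustrative assumptions, not fixtures from the repository.

```typescript
// Copied from the hunk above.
function isSafeMarkdownImageSrc(src: string): boolean {
  if (!src) return false;
  // Root-relative paths pass; protocol-relative `//host/...` does not.
  if (src.startsWith('/') && !src.startsWith('//')) return true;
  return (
    src.startsWith('http://')
    || src.startsWith('https://')
    || src.startsWith('data:image/')
    || src.startsWith('blob:')
  );
}

// Illustrative inputs (assumed for this sketch).
const samples: string[] = [
  '/api/byok-image/abc-123.png',
  'https://example.com/logo.png',
  'data:image/png;base64,iVBORw0KGgo=',
  '//evil.com/track.png',
  'javascript:alert(1)',
  'data:text/html,<b>x</b>',
  'images/cat.png',
];

// Pair each input with the verdict the allowlist gives it.
const verdicts: Array<[string, boolean]> = samples.map(
  (s): [string, boolean] => [s, isSafeMarkdownImageSrc(s)],
);
```

Two properties of the function as written are worth noting: the prefix checks are case-sensitive, and a path without a leading slash (such as `images/cat.png`) is rejected, so only root-relative paths pass.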
@ -270,14 +288,19 @@ function renderInline(text: string): ReactNode {
|
||||||
const out: ReactNode[] = [];
|
const out: ReactNode[] = [];
|
||||||
// Order matters:
|
// Order matters:
|
||||||
// 1. inline code first so its contents are not re-tokenized as bold/italic.
|
// 1. inline code first so its contents are not re-tokenized as bold/italic.
|
||||||
// 2. explicit `[text](url)` markdown links before bare URL autolink so the
|
// 2. image syntax `![alt](url)` BEFORE the link branch. Both share
|
||||||
|
// `[…](…)` and the image is only distinguished by the leading `!`;
|
||||||
|
// letting the link branch win would render `[alt](url)` as a text
|
||||||
|
// link with `!` stranded as a sibling text node and the user would
|
||||||
|
// see the link copy but never the image.
|
||||||
|
// 3. explicit `[text](url)` markdown links before bare URL autolink so the
|
||||||
// autolink does not greedily swallow the closing paren.
|
// autolink does not greedily swallow the closing paren.
|
||||||
// 3. bare http(s) URL autolink BEFORE italic markers — chat output often
|
// 4. bare http(s) URL autolink BEFORE italic markers — chat output often
|
||||||
// contains OAuth-style links with `_type=` / `_id=` query params, and
|
// contains OAuth-style links with `_type=` / `_id=` query params, and
|
||||||
// leaving italic to win turns the URL into an italic-fragmented mess.
|
// leaving italic to win turns the URL into an italic-fragmented mess.
|
||||||
// 4. bold (**a** / __a__) before italic (*a* / _a_).
|
// 5. bold (**a** / __a__) before italic (*a* / _a_).
|
||||||
const re =
|
const re =
|
||||||
/(`[^`]+`)|\[([^\]]+)\]\(([^)\s]+)\)|(https?:\/\/[^\s)<>]+)|(\*\*[^*]+\*\*)|(__[^_]+__)|(\*[^*\n]+\*)|(_[^_\n]+_)/g;
|
/(`[^`]+`)|!\[([^\]]*)\]\(([^)\s]+)\)|\[([^\]]+)\]\(([^)\s]+)\)|(https?:\/\/[^\s)<>]+)|(\*\*[^*]+\*\*)|(__[^_]+__)|(\*[^*\n]+\*)|(_[^_\n]+_)/g;
|
||||||
let lastIndex = 0;
|
let lastIndex = 0;
|
||||||
let m: RegExpExecArray | null;
|
let m: RegExpExecArray | null;
|
||||||
let key = 0;
|
let key = 0;
|
||||||
|
|
@ -291,40 +314,61 @@ function renderInline(text: string): ReactNode {
|
||||||
{m[1].slice(1, -1)}
|
{m[1].slice(1, -1)}
|
||||||
</code>,
|
</code>,
|
||||||
);
|
);
|
||||||
} else if (m[2] && m[3]) {
|
} else if (m[3] !== undefined) {
|
||||||
|
// Image: m[2] = alt (may be empty), m[3] = src
|
||||||
|
const src = m[3];
|
||||||
|
const alt = m[2] || '';
|
||||||
|
if (isSafeMarkdownImageSrc(src)) {
|
||||||
|
out.push(
|
||||||
|
<img
|
||||||
|
key={key++}
|
||||||
|
className="md-image"
|
||||||
|
src={src}
|
||||||
|
alt={alt}
|
||||||
|
loading="lazy"
|
||||||
|
referrerPolicy="no-referrer"
|
||||||
|
style={{ maxWidth: '100%', height: 'auto', borderRadius: 6 }}
|
||||||
|
/>,
|
||||||
|
);
|
||||||
|
} else {
|
||||||
|
// Unsafe scheme — drop the image tag but keep the alt text so
|
||||||
|
// the user sees what the model meant to show.
|
||||||
|
pushText(out, alt, key++);
|
||||||
|
}
|
||||||
|
} else if (m[4] && m[5]) {
|
||||||
out.push(
|
out.push(
|
||||||
<a
|
<a
|
||||||
key={key++}
|
key={key++}
|
||||||
className="md-link"
|
className="md-link"
|
||||||
href={m[3]}
|
href={m[5]}
|
||||||
target="_blank"
|
|
||||||
rel="noreferrer noopener"
|
|
||||||
>
|
|
||||||
{m[2]}
|
|
||||||
</a>,
|
|
||||||
);
|
|
||||||
} else if (m[4]) {
|
|
||||||
// Bare URL — autolink with the URL as both href and visible text,
|
|
||||||
// matching the Markdown `<https://…>` autolink convention.
|
|
||||||
out.push(
|
|
||||||
<a
|
|
||||||
key={key++}
|
|
||||||
className="md-link md-link-bare"
|
|
||||||
href={m[4]}
|
|
||||||
target="_blank"
|
target="_blank"
|
||||||
rel="noreferrer noopener"
|
rel="noreferrer noopener"
|
||||||
>
|
>
|
||||||
{m[4]}
|
{m[4]}
|
||||||
</a>,
|
</a>,
|
||||||
);
|
);
|
||||||
} else if (m[5]) {
|
|
||||||
out.push(<strong key={key++}>{m[5].slice(2, -2)}</strong>);
|
|
||||||
} else if (m[6]) {
|
} else if (m[6]) {
|
||||||
out.push(<strong key={key++}>{m[6].slice(2, -2)}</strong>);
|
// Bare URL — autolink with the URL as both href and visible text,
|
||||||
|
// matching the Markdown `<https://…>` autolink convention.
|
||||||
|
out.push(
|
||||||
|
<a
|
||||||
|
key={key++}
|
||||||
|
className="md-link md-link-bare"
|
||||||
|
href={m[6]}
|
||||||
|
target="_blank"
|
||||||
|
rel="noreferrer noopener"
|
||||||
|
>
|
||||||
|
{m[6]}
|
||||||
|
</a>,
|
||||||
|
);
|
||||||
} else if (m[7]) {
|
} else if (m[7]) {
|
||||||
out.push(<em key={key++}>{m[7].slice(1, -1)}</em>);
|
out.push(<strong key={key++}>{m[7].slice(2, -2)}</strong>);
|
||||||
} else if (m[8]) {
|
} else if (m[8]) {
|
||||||
out.push(<em key={key++}>{m[8].slice(1, -1)}</em>);
|
out.push(<strong key={key++}>{m[8].slice(2, -2)}</strong>);
|
||||||
|
} else if (m[9]) {
|
||||||
|
out.push(<em key={key++}>{m[9].slice(1, -1)}</em>);
|
||||||
|
} else if (m[10]) {
|
||||||
|
out.push(<em key={key++}>{m[10].slice(1, -1)}</em>);
|
||||||
}
|
}
|
||||||
lastIndex = re.lastIndex;
|
lastIndex = re.lastIndex;
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
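The branch ordering in `renderInline` can be checked without React. This is a sketch under stated assumptions: `classifyInline` and `InlineKind` are hypothetical names, and the function only labels each match instead of rendering it. The regex is the updated one from the diff, with capture groups 2 and 3 for the image and 4 and 5 for the link.

```typescript
type InlineKind = 'code' | 'image' | 'link' | 'url' | 'bold' | 'italic';

// Hypothetical helper: labels each inline token in order of appearance.
function classifyInline(text: string): InlineKind[] {
  // Regex copied from the updated renderInline; created per call so the
  // global flag's lastIndex state never leaks between invocations.
  const re =
    /(`[^`]+`)|!\[([^\]]*)\]\(([^)\s]+)\)|\[([^\]]+)\]\(([^)\s]+)\)|(https?:\/\/[^\s)<>]+)|(\*\*[^*]+\*\*)|(__[^_]+__)|(\*[^*\n]+\*)|(_[^_\n]+_)/g;
  const kinds: InlineKind[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    if (m[1]) kinds.push('code');
    // The alt text may be empty, so test the src group for presence
    // rather than testing both groups for truthiness.
    else if (m[3] !== undefined) kinds.push('image');
    else if (m[4] && m[5]) kinds.push('link');
    else if (m[6]) kinds.push('url');
    else if (m[7] || m[8]) kinds.push('bold');
    else if (m[9] || m[10]) kinds.push('italic');
  }
  return kinds;
}
```

Because the image alternative starts one character earlier (at the `!`) than the link alternative could, the leftmost-match rule already favors it; listing it first in the alternation makes that precedence explicit.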
@ -65,6 +65,22 @@ export const SUGGESTED_MODELS_BY_PROTOCOL: Record<ApiProtocol, readonly string[]
|
||||||
'gemini-1.5-pro',
|
'gemini-1.5-pro',
|
||||||
'gemini-1.5-flash',
|
'gemini-1.5-flash',
|
||||||
],
|
],
|
||||||
|
senseaudio: [
|
||||||
|
// SenseAudio is an OpenAI-compatible gateway that fronts both its own
|
||||||
|
// models (senseaudio-s2 family) and aggregator routes to deepseek /
|
||||||
|
// glm / kimi / minimax. Listing the headline house models first keeps
|
||||||
|
// the picker's default selection on a SenseAudio-native checkpoint;
|
||||||
|
// the aggregator IDs trail so users who arrived for a specific
|
||||||
|
// upstream still find it in this tab without retyping it.
|
||||||
|
'senseaudio-s2',
|
||||||
|
'senseaudio-s2-flash',
|
||||||
|
'deepseek-v4-flash',
|
||||||
|
'deepseek-v4-pro',
|
||||||
|
'glm-5.1',
|
||||||
|
'kimi-k2.6',
|
||||||
|
'MiniMax-M2.7-highspeed',
|
||||||
|
'MiniMax-M2.7',
|
||||||
|
],
|
||||||
ollama: [
|
ollama: [
|
||||||
'cogito-2.1:671b',
|
'cogito-2.1:671b',
|
||||||
'deepseek-v3.1:671b',
|
'deepseek-v3.1:671b',
|
||||||
|
|
@ -123,6 +139,7 @@ export const FAST_MODEL_BY_PROTOCOL: Record<ApiProtocol, string> = {
|
||||||
// pick produces a deterministic answer; users who care can override
|
// pick produces a deterministic answer; users who care can override
|
||||||
// through the Memory model picker.
|
// through the Memory model picker.
|
||||||
ollama: 'gemma3:4b',
|
ollama: 'gemma3:4b',
|
||||||
|
senseaudio: 'senseaudio-s2-flash',
|
||||||
};
|
};
|
||||||
|
|
||||||
export const API_PROTOCOL_TABS: ReadonlyArray<{
|
export const API_PROTOCOL_TABS: ReadonlyArray<{
|
||||||
|
|
@ -134,6 +151,7 @@ export const API_PROTOCOL_TABS: ReadonlyArray<{
|
||||||
{ id: 'azure', title: 'Azure OpenAI' },
|
{ id: 'azure', title: 'Azure OpenAI' },
|
||||||
{ id: 'google', title: 'Google Gemini' },
|
{ id: 'google', title: 'Google Gemini' },
|
||||||
{ id: 'ollama', title: 'Ollama Cloud' },
|
{ id: 'ollama', title: 'Ollama Cloud' },
|
||||||
|
{ id: 'senseaudio', title: 'SenseAudio' },
|
||||||
];
|
];
|
||||||
|
|
||||||
export const API_PROTOCOL_LABELS: Record<ApiProtocol, string> = {
|
export const API_PROTOCOL_LABELS: Record<ApiProtocol, string> = {
|
||||||
|
|
@ -142,6 +160,7 @@ export const API_PROTOCOL_LABELS: Record<ApiProtocol, string> = {
|
||||||
azure: 'Azure OpenAI',
|
azure: 'Azure OpenAI',
|
||||||
google: 'Google Gemini',
|
google: 'Google Gemini',
|
||||||
ollama: 'Ollama Cloud API',
|
ollama: 'Ollama Cloud API',
|
||||||
|
senseaudio: 'SenseAudio API',
|
||||||
};
|
};
|
||||||
|
|
||||||
export const API_KEY_PLACEHOLDERS: Record<ApiProtocol, string> = {
|
export const API_KEY_PLACEHOLDERS: Record<ApiProtocol, string> = {
|
||||||
|
|
@ -150,6 +169,7 @@ export const API_KEY_PLACEHOLDERS: Record<ApiProtocol, string> = {
|
||||||
azure: 'azure key',
|
azure: 'azure key',
|
||||||
google: 'AIza...',
|
google: 'AIza...',
|
||||||
ollama: 'Ollama API key',
|
ollama: 'Ollama API key',
|
||||||
|
senseaudio: 'SenseAudio API key',
|
||||||
};
|
};
|
||||||
|
|
||||||
// Default base URL the daemon assumes when the user leaves the field
|
// Default base URL the daemon assumes when the user leaves the field
|
||||||
|
|
@ -161,4 +181,5 @@ export const DEFAULT_BASE_URL_BY_PROTOCOL: Record<ApiProtocol, string> = {
|
||||||
azure: '',
|
azure: '',
|
||||||
google: 'https://generativelanguage.googleapis.com',
|
google: 'https://generativelanguage.googleapis.com',
|
||||||
ollama: 'https://ollama.com',
|
ollama: 'https://ollama.com',
|
||||||
|
senseaudio: 'https://api.senseaudio.cn',
|
||||||
};
|
};
|
||||||
|
|
|
||||||
|
|
@ -249,6 +249,22 @@ export const KNOWN_PROVIDERS: KnownProvider[] = [
|
||||||
model: 'mimo-v2.5-pro',
|
model: 'mimo-v2.5-pro',
|
||||||
models: ['mimo-v2.5-pro'],
|
models: ['mimo-v2.5-pro'],
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
label: 'SenseAudio',
|
||||||
|
protocol: 'senseaudio',
|
||||||
|
baseUrl: 'https://api.senseaudio.cn',
|
||||||
|
model: 'senseaudio-s2',
|
||||||
|
models: [
|
||||||
|
'senseaudio-s2',
|
||||||
|
'senseaudio-s2-flash',
|
||||||
|
'deepseek-v4-flash',
|
||||||
|
'deepseek-v4-pro',
|
||||||
|
'glm-5.1',
|
||||||
|
'kimi-k2.6',
|
||||||
|
'MiniMax-M2.7-highspeed',
|
||||||
|
'MiniMax-M2.7',
|
||||||
|
],
|
||||||
|
},
|
||||||
];
|
];
|
||||||
|
|
||||||
function normalizePet(input: Partial<PetConfig> | undefined): PetConfig {
|
function normalizePet(input: Partial<PetConfig> | undefined): PetConfig {
|
||||||
|
|
@ -290,6 +306,10 @@ function inferApiProtocol(model: string, baseUrl: string): ApiProtocol {
|
||||||
// protocol so both chat and the connection test hit the native Ollama
|
// protocol so both chat and the connection test hit the native Ollama
|
||||||
// proxy instead of the Anthropic or OpenAI paths.
|
// proxy instead of the Anthropic or OpenAI paths.
|
||||||
if (normalized.includes('ollama.com')) return 'ollama';
|
if (normalized.includes('ollama.com')) return 'ollama';
|
||||||
|
// SenseAudio host gets routed to its own proxy so the daemon log line
|
||||||
|
// and the BYOK tab UI stay consistent with the protocol the user
|
||||||
|
// picked — even though the on-wire shape is OpenAI-compatible.
|
||||||
|
if (normalized.includes('senseaudio.cn')) return 'senseaudio';
|
||||||
return isOpenAICompatible(model, baseUrl) ? 'openai' : 'anthropic';
|
return isOpenAICompatible(model, baseUrl) ? 'openai' : 'anthropic';
|
||||||
} catch {
|
} catch {
|
||||||
// Preserve the rest of the user's settings even if an old saved base URL is
|
// Preserve the rest of the user's settings even if an old saved base URL is
|
||||||
|
|
|
||||||
|
|
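The host-based routing added to `inferApiProtocol` can be sketched separately. The diff does not show how `normalized` is derived or what the `catch` branch returns, so this sketch assumes it is the lowercased hostname from `new URL(baseUrl)` and returns `null` both for unparseable URLs and when no host rule applies. `inferHostProtocol` is a hypothetical name.

```typescript
type HostProtocol = 'ollama' | 'senseaudio';

// Hypothetical helper covering only the two host rules visible in the
// hunk; the real function then falls back to isOpenAICompatible().
function inferHostProtocol(baseUrl: string): HostProtocol | null {
  try {
    // Assumption: `normalized` is the lowercased hostname.
    const normalized = new URL(baseUrl).hostname.toLowerCase();
    if (normalized.includes('ollama.com')) return 'ollama';
    if (normalized.includes('senseaudio.cn')) return 'senseaudio';
    return null;
  } catch {
    // new URL() throws on unparseable input; report "no host rule".
    return null;
  }
}
```

A substring test also matches hosts that merely contain the domain, for example `senseaudio.cn.example.com`. If that matters, a suffix comparison against the hostname would be stricter; whether the project needs that is not stated in the diff.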
@ -91,7 +91,7 @@ export type {
|
||||||
} from '@open-design/contracts';
|
} from '@open-design/contracts';
|
||||||
|
|
||||||
export type ExecMode = 'daemon' | 'api';
|
export type ExecMode = 'daemon' | 'api';
|
||||||
export type ApiProtocol = 'anthropic' | 'openai' | 'azure' | 'google' | 'ollama';
|
export type ApiProtocol = 'anthropic' | 'openai' | 'azure' | 'google' | 'ollama' | 'senseaudio';
|
||||||
|
|
||||||
export type LiveArtifactTabId = `live:${string}`;
|
export type LiveArtifactTabId = `live:${string}`;
|
||||||
export type ProjectWorkspaceTabId = string | LiveArtifactTabId;
|
export type ProjectWorkspaceTabId = string | LiveArtifactTabId;
|
||||||
|
|
@ -180,6 +180,13 @@ export interface ApiProtocolConfig {
|
||||||
model: string;
|
model: string;
|
||||||
apiVersion?: string;
|
apiVersion?: string;
|
||||||
apiProviderBaseUrl?: string | null;
|
apiProviderBaseUrl?: string | null;
|
||||||
|
/** SenseAudio BYOK only — default image model the daemon-side
|
||||||
|
* `generate_image` tool uses when the LLM doesn't pass one. Carries
|
||||||
|
* one of the SenseAudio image model ids (`senseaudio-image-2.0-260319`,
|
||||||
|
* `senseaudio-image-1.0-260319`, `doubao-seedream-5-0-260128`). Stored
|
||||||
|
* per-protocol so flipping between BYOK tabs doesn't reset the
|
||||||
|
* SenseAudio image-model choice. */
|
||||||
|
byokImageModel?: string;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Per-CLI model + reasoning the user picked in the model menu. Each agent
|
// Per-CLI model + reasoning the user picked in the model menu. Each agent
|
||||||
|
|
@ -294,6 +301,11 @@ export interface AppConfig {
|
||||||
model: string;
|
model: string;
|
||||||
apiProtocol?: ApiProtocol;
|
apiProtocol?: ApiProtocol;
|
||||||
apiVersion?: string;
|
apiVersion?: string;
|
||||||
|
/** SenseAudio BYOK only — default image model for the daemon-side
|
||||||
|
* generate_image tool. Mirrors apiProtocolConfigs.senseaudio.byokImageModel
|
||||||
|
* so the active protocol's value lives at the top level (consistent
|
||||||
|
* with how apiKey / baseUrl / model are projected onto AppConfig). */
|
||||||
|
byokImageModel?: string;
|
||||||
apiProtocolConfigs?: Partial<Record<ApiProtocol, ApiProtocolConfig>>;
|
apiProtocolConfigs?: Partial<Record<ApiProtocol, ApiProtocolConfig>>;
|
||||||
/** Internal config schema/migration version for localStorage upgrades. */
|
/** Internal config schema/migration version for localStorage upgrades. */
|
||||||
configMigrationVersion?: number;
|
configMigrationVersion?: number;
|
||||||
|
|
|
||||||
|
|
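The doc comments above describe a precedence for the image model: the LLM's per-call `model` argument first, then the `byokImageModel` Settings default. The sketch below illustrates that precedence. The three model ids come from the comment in this diff; `resolveImageModel`, the validation step, and the final fallback to the first id are assumptions, since the daemon-side code is not shown in this hunk.

```typescript
// Ids listed in the ApiProtocolConfig doc comment.
const SENSEAUDIO_IMAGE_MODELS = [
  'senseaudio-image-2.0-260319',
  'senseaudio-image-1.0-260319',
  'doubao-seedream-5-0-260128',
] as const;

type SenseAudioImageModel = (typeof SENSEAUDIO_IMAGE_MODELS)[number];

function isSenseAudioImageModel(v: unknown): v is SenseAudioImageModel {
  return (
    typeof v === 'string'
    && (SENSEAUDIO_IMAGE_MODELS as readonly string[]).indexOf(v) !== -1
  );
}

// Hypothetical resolver: tool argument wins, then the Settings default.
function resolveImageModel(
  toolArg: unknown,
  settingsDefault?: string,
): SenseAudioImageModel {
  if (isSenseAudioImageModel(toolArg)) return toolArg;
  if (isSenseAudioImageModel(settingsDefault)) return settingsDefault;
  // Assumption: fall back to the first listed id when neither is valid.
  return SENSEAUDIO_IMAGE_MODELS[0];
}
```

Validating both inputs against the known ids means a hallucinated model name from the LLM degrades to the user's default rather than reaching the upstream API.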
@ -6,6 +6,7 @@ const API_PROTOCOL_LABELS: Record<ApiProtocol, string> = {
|
||||||
azure: 'Azure OpenAI',
|
azure: 'Azure OpenAI',
|
||||||
google: 'Google Gemini',
|
google: 'Google Gemini',
|
||||||
ollama: 'Ollama Cloud API',
|
ollama: 'Ollama Cloud API',
|
||||||
|
senseaudio: 'SenseAudio API',
|
||||||
};
|
};
|
||||||
|
|
||||||
const API_PROTOCOL_AGENT_IDS: Record<ApiProtocol, string> = {
|
const API_PROTOCOL_AGENT_IDS: Record<ApiProtocol, string> = {
|
||||||
|
|
@ -14,6 +15,7 @@ const API_PROTOCOL_AGENT_IDS: Record<ApiProtocol, string> = {
|
||||||
azure: 'azure-openai-api',
|
azure: 'azure-openai-api',
|
||||||
google: 'google-gemini-api',
|
google: 'google-gemini-api',
|
||||||
ollama: 'ollama-cloud-api',
|
ollama: 'ollama-cloud-api',
|
||||||
|
senseaudio: 'senseaudio-api',
|
||||||
};
|
};
|
||||||
|
|
||||||
export function apiProtocolLabel(protocol: ApiProtocol | undefined): string {
|
export function apiProtocolLabel(protocol: ApiProtocol | undefined): string {
|
||||||
|
|
|
||||||
|
|
@ -105,4 +105,67 @@ describe('renderMarkdown', () => {
|
||||||
const bodyTd = (out.match(/<tbody>[\s\S]*<\/tbody>/)?.[0] ?? '').match(/<td/g) ?? [];
|
const bodyTd = (out.match(/<tbody>[\s\S]*<\/tbody>/)?.[0] ?? '').match(/<td/g) ?? [];
|
||||||
expect(bodyTd.length).toBe(2);
|
expect(bodyTd.length).toBe(2);
|
||||||
});
|
});
|
||||||
|
|
||||||
|
it('renders ![alt](url) as <img> for relative BYOK image URLs', () => {
|
||||||
|
const out = html('Here is your cat: ![cute kitten](/api/byok-image/abc-123.png)');
|
||||||
|
expect(out).toContain('<img');
|
||||||
|
expect(out).toContain('class="md-image"');
|
||||||
|
expect(out).toContain('src="/api/byok-image/abc-123.png"');
|
||||||
|
expect(out).toContain('alt="cute kitten"');
|
||||||
|
expect(out).toContain('loading="lazy"');
|
||||||
|
expect(out).toContain('referrerPolicy="no-referrer"');
|
||||||
|
// Image syntax must NOT be turned into an <a> link — `[alt](url)`
|
||||||
|
// with a leading `!` is image, not link.
|
||||||
|
expect(out).not.toContain('<a class="md-link"');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('renders ![](url) with empty alt text', () => {
|
||||||
|
const out = html('');
|
||||||
|
expect(out).toContain('<img');
|
||||||
|
expect(out).toContain('alt=""');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('renders https image URLs', () => {
|
||||||
|
const out = html('');
|
||||||
|
expect(out).toContain('<img');
|
||||||
|
expect(out).toContain('src="https://example.com/logo.png"');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('renders data: image URIs', () => {
|
||||||
|
const out = html('');
|
||||||
|
expect(out).toContain('<img');
|
||||||
|
expect(out).toContain('src="data:image/png;base64,iVBORw0KGgo="');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('drops image tags with unsafe schemes and keeps alt text as plain text', () => {
|
||||||
|
const out = html(')');
|
||||||
|
expect(out).not.toContain('<img');
|
||||||
|
expect(out).not.toContain('javascript:');
|
||||||
|
expect(out).toContain('hacked');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('rejects protocol-relative image URLs (could load cross-origin)', () => {
|
||||||
|
// `//evil.com/track.png` would inherit the page protocol; not in our
|
||||||
|
// allowlist. Should fall through to alt-as-text.
|
||||||
|
const out = html('![track](//evil.com/track.png)');
|
||||||
|
expect(out).not.toContain('<img');
|
||||||
|
expect(out).toContain('track');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('keeps regular [text](url) links working alongside image syntax', () => {
|
||||||
|
const out = html('Click [here](https://example.com) and look ');
|
||||||
|
expect(out).toContain('<a class="md-link"');
|
||||||
|
expect(out).toContain('href="https://example.com"');
|
||||||
|
expect(out).toContain('>here</a>');
|
||||||
|
expect(out).toContain('<img');
|
||||||
|
expect(out).toContain('src="/api/byok-image/a.png"');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('preserves bold + italic + code after the image regex addition', () => {
|
||||||
|
const out = html('**b** and *i* and `c` and ');
|
||||||
|
expect(out).toContain('<strong>b</strong>');
|
||||||
|
expect(out).toContain('<em>i</em>');
|
||||||
|
expect(out).toContain('<code class="md-inline-code">c</code>');
|
||||||
|
expect(out).toContain('<img');
|
||||||
|
});
|
||||||
});
|
});
|
||||||
|
|
|
||||||
|
|
@ -229,7 +229,7 @@ export interface SettingsClickByokProviderOptionProps {
|
||||||
// Tracking doc names azure/google/ollama as azure_openai/google_gemini/
|
// Tracking doc names azure/google/ollama as azure_openai/google_gemini/
|
||||||
// ollama_cloud — we forward the code value verbatim and let dashboards
|
// ollama_cloud — we forward the code value verbatim and let dashboards
|
||||||
// map; see tracking-doc-issues.md §2.5.
|
// map; see tracking-doc-issues.md §2.5.
|
||||||
provider_id: 'anthropic' | 'openai' | 'azure' | 'ollama' | 'google';
|
provider_id: 'anthropic' | 'openai' | 'azure' | 'ollama' | 'google' | 'senseaudio';
|
||||||
// True when the clicked chip was already the active protocol (no-op
|
// True when the clicked chip was already the active protocol (no-op
|
||||||
// toggle); false when the click switches protocol.
|
// toggle); false when the click switches protocol.
|
||||||
is_selected: boolean;
|
is_selected: boolean;
|
||||||
|
|
@ -242,10 +242,10 @@ export interface SettingsClickByokFieldProps {
|
||||||
action: 'focus_byok_field';
|
action: 'focus_byok_field';
|
||||||
field_id: 'api_key' | 'base_url' | 'model';
|
field_id: 'api_key' | 'base_url' | 'model';
|
||||||
// Code's `apiProtocol` is wider than the CSV's BYOK provider enum
|
// Code's `apiProtocol` is wider than the CSV's BYOK provider enum
|
||||||
// (anthropic|openai|azure|ollama|google). We forward the code value
|
// (anthropic|openai|azure|ollama|google|senseaudio). We forward the code
|
||||||
// verbatim so dashboards can group by the actual protocol; the CSV enum
|
// value verbatim so dashboards can group by the actual protocol; the CSV
|
||||||
// is a strict subset the product team can revise.
|
// enum is a strict subset the product team can revise.
|
||||||
provider_id: 'anthropic' | 'openai' | 'azure' | 'ollama' | 'google';
|
provider_id: 'anthropic' | 'openai' | 'azure' | 'ollama' | 'google' | 'senseaudio';
|
||||||
has_value: boolean;
|
has_value: boolean;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -261,7 +261,7 @@ export interface SettingsCliTestResultProps {
|
||||||
export interface SettingsByokTestResultProps {
|
export interface SettingsByokTestResultProps {
|
||||||
page: 'settings';
|
page: 'settings';
|
||||||
area: 'execution_model';
|
area: 'execution_model';
|
||||||
provider_id: 'anthropic' | 'openai' | 'azure' | 'ollama' | 'google';
|
provider_id: 'anthropic' | 'openai' | 'azure' | 'ollama' | 'google' | 'senseaudio';
|
||||||
result: 'success' | 'failed' | 'timeout';
|
result: 'success' | 'failed' | 'timeout';
|
||||||
error_code?: string;
|
error_code?: string;
|
||||||
duration_ms: number;
|
duration_ms: number;
|
||||||
|
|
|
||||||
|
|
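The telemetry prop interfaces in this file each spell out the same `provider_id` union, which is why the diff touches every copy to add `'senseaudio'`. As a minimal sketch only, the union could be derived once from a single tuple and narrowed at runtime. The names `BYOK_PROVIDER_IDS`, `ByokProviderId`, and `isByokProviderId` are hypothetical and do not appear in the diff; the interface shape is copied from the new side of the hunk above.

```typescript
// Hypothetical helper names; none of these identifiers appear in the diff.
const BYOK_PROVIDER_IDS = [
  'anthropic',
  'openai',
  'azure',
  'ollama',
  'google',
  'senseaudio',
] as const;

// Union derived from the tuple, so adding a provider is a one-line change.
type ByokProviderId = (typeof BYOK_PROVIDER_IDS)[number];

// Runtime guard: narrows an arbitrary string to the provider union.
function isByokProviderId(value: string): value is ByokProviderId {
  return (BYOK_PROVIDER_IDS as readonly string[]).includes(value);
}

// Same fields as the new side of the diff, with provider_id using the alias.
interface SettingsByokTestResultProps {
  page: 'settings';
  area: 'execution_model';
  provider_id: ByokProviderId;
  result: 'success' | 'failed' | 'timeout';
  error_code?: string;
  duration_ms: number;
}

const example: SettingsByokTestResultProps = {
  page: 'settings',
  area: 'execution_model',
  provider_id: 'senseaudio',
  result: 'success',
  duration_ms: 120,
};
```

Whether the repository would prefer a shared alias over inline unions is a design choice this sketch does not settle; the inline form keeps each event contract readable on its own.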
@@ -139,7 +139,7 @@ export type ConnectionTestKind =
|
||||||
| 'agent_spawn_failed'
|
| 'agent_spawn_failed'
|
||||||
| 'unknown';
|
| 'unknown';
|
||||||
|
|
||||||
export type ConnectionTestProtocol = 'anthropic' | 'openai' | 'azure' | 'google' | 'ollama';
|
export type ConnectionTestProtocol = 'anthropic' | 'openai' | 'azure' | 'google' | 'ollama' | 'senseaudio';
|
||||||
|
|
||||||
export interface ProviderTestRequest {
|
export interface ProviderTestRequest {
|
||||||
protocol: ConnectionTestProtocol;
|
protocol: ConnectionTestProtocol;
|
||||||
|
|
|
||||||
|
|
@@ -80,16 +80,19 @@ export interface MemoryListResponse {
|
||||||
/** Provider/protocol the memory extractor calls. Mirrors the chat
|
/** Provider/protocol the memory extractor calls. Mirrors the chat
|
||||||
* BYOK form's protocols — anthropic + openai-compatible + azure
|
* BYOK form's protocols — anthropic + openai-compatible + azure
|
||||||
* (openai-compatible at a different URL/header) + google gemini +
|
* (openai-compatible at a different URL/header) + google gemini +
|
||||||
* ollama (also openai-compatible, just hosted on Ollama Cloud) — so
|
* ollama (also openai-compatible, just hosted on Ollama Cloud) +
|
||||||
* the memory picker can offer the same options as the chat picker
|
* senseaudio (also openai-compatible, SenseAudio's OpenAI-shaped
|
||||||
* above it. The daemon routes ollama through the same callOpenAI
|
* /v1/chat/completions gateway) — so the memory picker can offer the
|
||||||
* path since the wire protocol is identical. */
|
* same options as the chat picker above it. The daemon routes both
|
||||||
|
* ollama and senseaudio through the same callOpenAI path since the
|
||||||
|
* wire protocol is identical. */
|
||||||
export type MemoryExtractionProvider =
|
export type MemoryExtractionProvider =
|
||||||
| 'anthropic'
|
| 'anthropic'
|
||||||
| 'openai'
|
| 'openai'
|
||||||
| 'azure'
|
| 'azure'
|
||||||
| 'google'
|
| 'google'
|
||||||
| 'ollama';
|
| 'ollama'
|
||||||
|
| 'senseaudio';
|
||||||
|
|
||||||
/** Masked version of MemoryExtractionConfig returned by GET endpoints —
|
/** Masked version of MemoryExtractionConfig returned by GET endpoints —
|
||||||
* the api key field is replaced with a 4-char tail so the settings UI
|
* the api key field is replaced with a 4-char tail so the settings UI
|
||||||
|
|
|
||||||
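The doc comment on `MemoryExtractionProvider` says the daemon routes both ollama and senseaudio through the same `callOpenAI` path because the wire protocol is identical. The sketch below illustrates that grouping with an exhaustive switch. The `CallPath` type and `callPathFor` function are hypothetical names, not the daemon's actual code, and keeping azure on its own branch is an assumption, since the comment only says azure uses a different URL and header.

```typescript
// Union copied from the new side of the diff.
type MemoryExtractionProvider =
  | 'anthropic'
  | 'openai'
  | 'azure'
  | 'google'
  | 'ollama'
  | 'senseaudio';

// Hypothetical label for the call path that handles a provider.
type CallPath = 'anthropic' | 'openai' | 'azure' | 'google';

function callPathFor(provider: MemoryExtractionProvider): CallPath {
  switch (provider) {
    case 'anthropic':
      return 'anthropic';
    case 'google':
      return 'google';
    case 'azure':
      // Assumption: kept separate because the URL and header differ.
      return 'azure';
    case 'openai':
    case 'ollama':
    case 'senseaudio':
      // Per the doc comment, these share the OpenAI-shaped wire protocol.
      return 'openai';
    default: {
      // Compile-time exhaustiveness check: a new union member that is not
      // handled above makes this assignment a type error.
      const unreachable: never = provider;
      throw new Error(`unhandled provider: ${String(unreachable)}`);
    }
  }
}
```

The `never` assignment in the default branch means that widening the union again, as this change does for `'senseaudio'`, fails type checking until the new member is given a branch.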