open-design/apps/daemon/tests/media-senseaudio-image.test.ts
mzl163 210b94069a
feat(senseaudio): BYOK chat with image + video generation tools (#2065)
* feat(senseaudio): BYOK chat with image + video generation tools

Adds SenseAudio as a first-class BYOK chat protocol and wires the daemon's
chat proxy with a tool loop so BYOK users can generate images and videos
without dropping to a CLI agent.

- BYOK protocol: new senseaudio tab + /api/proxy/senseaudio/stream route +
  connection-test + provider-models discovery (OpenAI-compatible wire)
- Tool loop: generate_image (synchronous /v1/image/sync) and generate_video
  (async /v1/video/create + 5s polling /v1/video/status, 10-min ceiling,
  periodic progress log every 30s)
- Settings dropdown + chat-composer dropdown for the BYOK image model
  default; generate_image's model enum lets the LLM override per call
- Seed-on-success: a successful BYOK chat call idempotently mirrors the
  key into media-config (preserves env-resolved + already-stored keys)
- Generated artifacts land in <projectsRoot>/<projectId>/ so FileViewer,
  DesignFilesPanel, and project export pick them up automatically;
  legacy /api/byok-image/:id route kept for old conversation links
- Markdown renderer learns ![alt](url) image syntax with a scheme
  allowlist (http(s) / data:image/ / blob: / relative paths)
- i18n key settings.byokImageModel across all 19 locales
- 3 SenseAudio image models registered (2.0, 1.0, doubao-seedream-5.0);
  1 video model (doubao-seedance-2.0)
- Tests: byok-tools (29), media-senseaudio-image (8), media-config seed
  (7), proxy-routes (47), markdown image rendering (8)
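
The generate_video polling described above (5s interval, 10-minute
ceiling, progress log every 30s) can be sketched roughly as follows.
This is a minimal illustration, not the daemon's code: pollUntilDone,
VideoStatus, and PollOptions are hypothetical names, the status check is
passed in as a callback because the /v1/video/status response shape is
not shown here, and throwing on timeout is an assumption.

```typescript
// Hypothetical status shape; the real /v1/video/status payload may differ.
type VideoStatus =
  | { state: 'pending' }
  | { state: 'done'; videoUrl: string }
  | { state: 'failed'; message: string };

interface PollOptions {
  intervalMs?: number; // delay between status checks (5s in the commit message)
  ceilingMs?: number; // give up after this long (10 minutes)
  logEveryMs?: number; // progress log cadence (30s)
  now?: () => number; // injectable clock, useful in tests
  sleep?: (ms: number) => Promise<void>;
  log?: (line: string) => void;
}

async function pollUntilDone(
  checkStatus: () => Promise<VideoStatus>,
  opts: PollOptions = {},
): Promise<string> {
  const intervalMs = opts.intervalMs ?? 5_000;
  const ceilingMs = opts.ceilingMs ?? 10 * 60_000;
  const logEveryMs = opts.logEveryMs ?? 30_000;
  const now = opts.now ?? (() => Date.now());
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const log = opts.log ?? ((line: string) => console.log(line));

  const startedAt = now();
  let lastLogAt = startedAt;
  for (;;) {
    const status = await checkStatus();
    if (status.state === 'done') return status.videoUrl;
    if (status.state === 'failed') {
      throw new Error(`video generation failed: ${status.message}`);
    }
    const elapsed = now() - startedAt;
    if (elapsed >= ceilingMs) {
      throw new Error(`video generation timed out after ${Math.round(elapsed / 1000)}s`);
    }
    if (now() - lastLogAt >= logEveryMs) {
      log(`video still rendering (${Math.round(elapsed / 1000)}s elapsed)`);
      lastLogAt = now();
    }
    await sleep(intervalMs);
  }
}
```

Injecting the clock and the sleep function keeps such a loop testable
without waiting on real timers.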

* fix(senseaudio): unblock image gen + design file preview switching

- SenseAudio /v1/image/sync rejected the previous size mapping with
  `参数错误:size` ("parameter error: size"): 1664x936, 936x1664,
  1280x960, and 960x1280 are not in the gateway's accepted set.
  Switched to standard HD / SD sizes that every aspect bucket can
  hit: 1024×1024, 1280×720, 720×1280, 1024×768, 768×1024. Kept the
  byok-tools and media.ts tables in sync so the BYOK chat tool and
  the CLI agent path both stop failing on non-square aspects.

- DesignFilesPanel's <DfPreview> was missing a key prop, so React
  reused the same iframe DOM node when the user picked a different
  file — the src prop changed but the iframe never navigated. Added
  key={previewFile.name} so the previous preview unmounts cleanly.

- Updated byok-tools + media-senseaudio-image tests for the new size
  expectations.
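
For illustration, the new size mapping might look like the table below.
Only two pairings are confirmed by the tests in this file (no aspect
gives 1024x1024, and 16:9 gives 1280x720); the remaining aspect keys
are inferred from the listed sizes, and SIZE_BY_ASPECT, sizeForAspect,
and the square fallback for unknown aspects are hypothetical.

```typescript
// Aspect bucket -> size string sent to /v1/image/sync.
// Keys other than '16:9' are assumptions inferred from the size list.
const SIZE_BY_ASPECT = new Map<string, string>([
  ['1:1', '1024x1024'],
  ['16:9', '1280x720'],
  ['9:16', '720x1280'],
  ['4:3', '1024x768'],
  ['3:4', '768x1024'],
]);

const DEFAULT_SIZE = '1024x1024';

// Falls back to the square default when the aspect is missing or unknown
// (the fallback policy is an assumption of this sketch).
function sizeForAspect(aspect?: string): string {
  return (aspect && SIZE_BY_ASPECT.get(aspect)) || DEFAULT_SIZE;
}
```

Keeping one shared table would be one way to stop the byok-tools and
media.ts copies from drifting apart again.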

* docs(senseaudio): clear stale provider hint + update README

- Settings → Media → SenseAudio: clear the auto-promoted
  "Image · TTS · 70+ voices · clone" hint; the provider label alone is
  enough now that the BYOK chat surface covers image + video tooling.
- README: list the new senseaudio (and missing ollama) proxy routes so
  the BYOK section reflects what the daemon actually serves, and
  mention the generate_image / generate_video chat tools that ship
  with the SenseAudio path.

* fix(senseaudio): address PR #2065 review feedback

Three non-blocking review notes from @PerishCode on PR #2065:

1. Drop the dead /api/byok-image/:id route. The PR description claimed
   it was "legacy fallback for old chat history" but that storage
   layout never existed on main, so the route can only ever 400 or
   404 — never 200. Removed the handler, the isSafeByokImageId
   export, the unused createReadStream / stat / path / Request /
   Response imports, and the two byok-image regression tests.

2. Add rejectProxyPluginContext guard to the senseaudio proxy
   handler so it matches the invariant the other five proxy paths
   already enforce (plugin runs must go through /api/runs for
   snapshot pinning). Extended the existing "API fallback rejects
   plugin runs" describe to also cover /api/proxy/senseaudio/stream
   with the 409 PLUGIN_REQUIRES_DAEMON expectation.

3. Wrap the secondary image / video downloads (the URLs the
   SenseAudio gateway hands back in /v1/image/sync .url and
   /v1/video/status .video_url) in validateBaseUrlResolved so a
   malicious gateway can't point us at 169.254.169.254 (AWS / Azure
   metadata) or RFC1918 hosts via the response payload. Also passed
   `redirect: 'error'` on both fetches to match the SSRF posture
   the primary proxy fetch already uses. The new
   assertExternalAssetUrl helper lives next to executeGenerateImage
   so future tool downloads can reuse it.
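
A rough sketch of the kind of check described in item 3. It is not the
daemon's assertExternalAssetUrl: the name assertPublicAssetUrl and the
error strings are hypothetical, and unlike validateBaseUrlResolved it
does not resolve DNS, so a hostname that points at a private address
would still pass. It only rejects non-HTTP schemes, localhost, IPv6
literals (conservatively, all of them), and private or link-local IPv4
literals.

```typescript
// True for loopback, RFC1918, link-local (169.254/16), and 0.0.0.0/8 literals.
function isPrivateIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

// Hypothetical helper: throws unless the URL looks like a public http(s) asset.
function assertPublicAssetUrl(raw: string): URL {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    throw new Error('asset url is not a valid URL');
  }
  if (url.protocol !== 'https:' && url.protocol !== 'http:') {
    throw new Error(`asset url has unsupported scheme: ${url.protocol}`);
  }
  const host = url.hostname.toLowerCase();
  const isLocalName = host === 'localhost' || host.endsWith('.localhost');
  const isIPv6Literal = host.startsWith('['); // WHATWG URL keeps the brackets
  if (isLocalName || isIPv6Literal || isPrivateIPv4(host)) {
    throw new Error(`asset url points at a non-public host: ${host}`);
  }
  return url;
}
```

A caller would then download with `fetch(url, { redirect: 'error' })`
so that a 3xx response cannot hop to a host the check never saw.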

Tests: 120/120 daemon tests pass; guard + typecheck green.

* fix(senseaudio): mirror SSRF guard onto renderSenseAudioImage CLI path

Follow-up to 01b1260a — the chat-tool fix in byok-tools.ts wasn't
mirrored onto the parallel renderSenseAudioImage path in media.ts.
Same attacker-controllable shape (gateway-returned `data.url`),
same one-line fix.

- Hoist assertExternalAssetUrl from byok-tools.ts into
  connectionTest.ts next to validateBaseUrlResolved so both call
  sites (the BYOK chat tool loop AND the CLI agent media dispatcher)
  share one helper. Made the error strings provider-agnostic so a
  future caller doesn't get a misleading "senseaudio" attribution
  for a Volcengine / Grok / etc. download.
- renderSenseAudioImage now runs the response url through
  assertExternalAssetUrl before fetching bytes, and passes
  redirect: 'error' to block a 3xx hop into private space.

Scope intentionally limited to the senseaudio path PerishCode
flagged; the other unguarded fetch(entry.url) call sites in
media.ts (OpenAI / Volcengine / Grok / Nano-Banana) are pre-existing
patterns and belong in a separate follow-up if the daemon wants
defense-in-depth across every provider.

Tests: 127/127 daemon tests pass; guard + typecheck green.

---------

Co-authored-by: unknown <mazeliang@sensetime.com>
2026-05-19 23:14:56 +08:00


import { mkdir, mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import path from 'node:path';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import { generateMedia } from '../src/media.js';

const TEST_SENSEAUDIO_BASE_URL = 'https://senseaudio-gateway.example.test';
const TEST_IMAGE_URL = 'https://cdn.example.test/generated/abc.png';
// PNG signature plus two filler bytes; enough to compare the written file byte-for-byte.
const TEST_IMAGE_BYTES = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0x00, 0x01]);

// Successful /v1/image/sync reply: JSON carrying the URL of the rendered image.
function buildOkResponse(url = TEST_IMAGE_URL) {
  return new Response(
    JSON.stringify({ url, base_resp: { status_code: 0, status_msg: 'success' } }),
    { status: 200, headers: { 'content-type': 'application/json' } },
  );
}

// Reply for the follow-up download of the image bytes.
function buildImageFetchResponse(bytes: Buffer) {
  return new Response(bytes, {
    status: 200,
    headers: { 'content-type': 'image/png' },
  });
}

describe('senseaudio image generation', () => {
  let root: string;
  let projectRoot: string;
  let projectsRoot: string;
  const realFetch = globalThis.fetch;
  const originalMediaConfigDir = process.env.OD_MEDIA_CONFIG_DIR;
  const originalDataDir = process.env.OD_DATA_DIR;

  beforeEach(async () => {
    // Fresh temp project per test, with all relevant env vars cleared.
    root = await mkdtemp(path.join(tmpdir(), 'od-senseaudio-image-'));
    projectRoot = path.join(root, 'project-root');
    projectsRoot = path.join(projectRoot, '.od', 'projects');
    await mkdir(projectsRoot, { recursive: true });
    delete process.env.OD_MEDIA_CONFIG_DIR;
    delete process.env.OD_DATA_DIR;
    delete process.env.OD_SENSEAUDIO_API_KEY;
    delete process.env.SENSEAUDIO_API_KEY;
  });

  afterEach(async () => {
    // Restore the real fetch and the original environment, then remove the temp dir.
    globalThis.fetch = realFetch;
    if (originalMediaConfigDir == null) {
      delete process.env.OD_MEDIA_CONFIG_DIR;
    } else {
      process.env.OD_MEDIA_CONFIG_DIR = originalMediaConfigDir;
    }
    if (originalDataDir == null) {
      delete process.env.OD_DATA_DIR;
    } else {
      process.env.OD_DATA_DIR = originalDataDir;
    }
    delete process.env.OD_SENSEAUDIO_API_KEY;
    delete process.env.SENSEAUDIO_API_KEY;
    await rm(root, { recursive: true, force: true });
  });

  // Writes <projectRoot>/.od/media-config.json with the given provider settings.
  async function writeConfig(data: unknown) {
    const file = path.join(projectRoot, '.od', 'media-config.json');
    await mkdir(path.dirname(file), { recursive: true });
    await writeFile(file, JSON.stringify(data), 'utf8');
  }

  it('renders a SenseAudio image with the documented sync defaults', async () => {
    await writeConfig({
      providers: {
        senseaudio: {
          apiKey: 'sense-test-key',
          baseUrl: TEST_SENSEAUDIO_BASE_URL,
        },
      },
    });
    const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
      const urlStr = String(input);
      // First call: the sync generation request.
      if (urlStr === `${TEST_SENSEAUDIO_BASE_URL}/v1/image/sync`) {
        expect(init?.method).toBe('POST');
        expect(init?.headers).toMatchObject({
          authorization: 'Bearer sense-test-key',
          'content-type': 'application/json',
        });
        expect(JSON.parse(String(init?.body))).toEqual({
          model: 'senseaudio-image-2.0-260319',
          prompt: 'A magazine-style hero poster.',
          size: '1024x1024',
        });
        return buildOkResponse();
      }
      // Second call: downloading the bytes from the returned URL.
      if (urlStr === TEST_IMAGE_URL) {
        return buildImageFetchResponse(TEST_IMAGE_BYTES);
      }
      throw new Error(`unexpected fetch: ${urlStr}`);
    });
    vi.stubGlobal('fetch', fetchMock);
    const result = await generateMedia({
      projectRoot,
      projectsRoot,
      projectId: 'project-1',
      surface: 'image',
      model: 'senseaudio-image-2.0-260319',
      prompt: 'A magazine-style hero poster.',
      output: 'sa-hero.png',
    });
    expect(fetchMock).toHaveBeenCalledTimes(2);
    expect(result.providerId).toBe('senseaudio');
    expect(result.providerNote).toContain('senseaudio/senseaudio-image-2.0-260319');
    expect(result.providerNote).toContain('1024x1024');
    const bytes = await readFile(path.join(projectsRoot, 'project-1', 'sa-hero.png'));
    expect(bytes.equals(TEST_IMAGE_BYTES)).toBe(true);
  });

  it('maps aspect ratios to the SenseAudio size strings', async () => {
    await writeConfig({
      providers: {
        senseaudio: { apiKey: 'sense-test-key', baseUrl: TEST_SENSEAUDIO_BASE_URL },
      },
    });
    const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
      const urlStr = String(input);
      if (urlStr === `${TEST_SENSEAUDIO_BASE_URL}/v1/image/sync`) {
        // 16:9 must be sent as 1280x720.
        expect(JSON.parse(String(init?.body)).size).toBe('1280x720');
        return buildOkResponse();
      }
      return buildImageFetchResponse(TEST_IMAGE_BYTES);
    });
    vi.stubGlobal('fetch', fetchMock);
    await generateMedia({
      projectRoot,
      projectsRoot,
      projectId: 'project-1',
      surface: 'image',
      model: 'senseaudio-image-1.0-260319',
      aspect: '16:9',
      prompt: 'Widescreen banner.',
      output: 'sa-banner.png',
    });
    expect(fetchMock).toHaveBeenCalledTimes(2);
  });

  it('falls back to the canonical base URL when none is configured', async () => {
    await writeConfig({
      providers: {
        senseaudio: { apiKey: 'sense-test-key' },
      },
    });
    const fetchMock = vi.fn(async (input: unknown) => {
      const urlStr = String(input);
      if (urlStr === 'https://api.senseaudio.cn/v1/image/sync') {
        return buildOkResponse();
      }
      return buildImageFetchResponse(TEST_IMAGE_BYTES);
    });
    vi.stubGlobal('fetch', fetchMock);
    await generateMedia({
      projectRoot,
      projectsRoot,
      projectId: 'project-1',
      surface: 'image',
      model: 'doubao-seedream-5-0-260128',
      prompt: 'Default base url.',
      output: 'sa-default-base.png',
    });
    expect(fetchMock).toHaveBeenCalledTimes(2);
  });

  it('reads the API key from OD_SENSEAUDIO_API_KEY when storage is empty', async () => {
    process.env.OD_SENSEAUDIO_API_KEY = 'env-sense-key';
    const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
      if (String(input).endsWith('/v1/image/sync')) {
        expect(init?.headers).toMatchObject({ authorization: 'Bearer env-sense-key' });
        return buildOkResponse();
      }
      return buildImageFetchResponse(TEST_IMAGE_BYTES);
    });
    vi.stubGlobal('fetch', fetchMock);
    await generateMedia({
      projectRoot,
      projectsRoot,
      projectId: 'project-1',
      surface: 'image',
      model: 'senseaudio-image-2.0-260319',
      prompt: 'Env-only key.',
      output: 'sa-env.png',
    });
    expect(fetchMock).toHaveBeenCalledTimes(2);
  });

  it('errors when no API key is configured', async () => {
    const fetchMock = vi.fn();
    vi.stubGlobal('fetch', fetchMock);
    await expect(
      generateMedia({
        projectRoot,
        projectsRoot,
        projectId: 'project-1',
        surface: 'image',
        model: 'senseaudio-image-2.0-260319',
        prompt: 'Should fail.',
        output: 'sa-no-key.png',
      }),
    ).rejects.toThrow(/no SenseAudio API key/);
    // The key check must fail before any network call is attempted.
    expect(fetchMock).not.toHaveBeenCalled();
  });

  it('surfaces HTTP-level failures with the status code and truncated body', async () => {
    await writeConfig({
      providers: {
        senseaudio: { apiKey: 'sense-test-key', baseUrl: TEST_SENSEAUDIO_BASE_URL },
      },
    });
    const fetchMock = vi.fn(async () =>
      new Response('unauthorized', {
        status: 401,
        headers: { 'content-type': 'text/plain' },
      }),
    );
    vi.stubGlobal('fetch', fetchMock);
    await expect(
      generateMedia({
        projectRoot,
        projectsRoot,
        projectId: 'project-1',
        surface: 'image',
        model: 'senseaudio-image-2.0-260319',
        prompt: 'Bad auth.',
        output: 'sa-401.png',
      }),
    ).rejects.toThrow('senseaudio image 401: unauthorized');
  });

  it('surfaces upstream error_message verbatim when the body reports failure', async () => {
    await writeConfig({
      providers: {
        senseaudio: { apiKey: 'sense-test-key', baseUrl: TEST_SENSEAUDIO_BASE_URL },
      },
    });
    // HTTP 200 with an error payload: the failure is only visible in the body.
    const fetchMock = vi.fn(async () =>
      new Response(
        JSON.stringify({ error_message: 'sensitive_content_blocked' }),
        { status: 200, headers: { 'content-type': 'application/json' } },
      ),
    );
    vi.stubGlobal('fetch', fetchMock);
    await expect(
      generateMedia({
        projectRoot,
        projectsRoot,
        projectId: 'project-1',
        surface: 'image',
        model: 'senseaudio-image-2.0-260319',
        prompt: 'Blocked.',
        output: 'sa-blocked.png',
      }),
    ).rejects.toThrow('senseaudio image api error: sensitive_content_blocked');
  });

  it('errors when the response body is missing the image url', async () => {
    await writeConfig({
      providers: {
        senseaudio: { apiKey: 'sense-test-key', baseUrl: TEST_SENSEAUDIO_BASE_URL },
      },
    });
    // Success status but no `url` field to download from.
    const fetchMock = vi.fn(async () =>
      new Response(
        JSON.stringify({ base_resp: { status_code: 0, status_msg: 'success' } }),
        { status: 200, headers: { 'content-type': 'application/json' } },
      ),
    );
    vi.stubGlobal('fetch', fetchMock);
    await expect(
      generateMedia({
        projectRoot,
        projectsRoot,
        projectId: 'project-1',
        surface: 'image',
        model: 'senseaudio-image-2.0-260319',
        prompt: 'Missing url.',
        output: 'sa-missing-url.png',
      }),
    ).rejects.toThrow('senseaudio image response missing url');
  });
});