open-design/apps/daemon/tests/critique-artifact-endpoint.test.ts
Nagendhra Madishetti 32fa0c23bb
feat(daemon): Critique Theater Phase 6.2 (artifact extraction + endpoint) (#1085)
The orchestrator was leaving artifactPath = null on every shipped run because
the SHIP <ARTIFACT> body never made it past the parser. Reviewers caught this
on PR #1006: a rerun-style endpoint built on top of that null could not return
a usable prior-art reference, and tests that synthesized artifactPath via
insertCritiqueRun were hiding the gap rather than covering the feature.

This PR closes that gap. The parser now hands the orchestrator a
ShipArtifactPayload (round, mime, body) through a side-channel callback, and
the orchestrator writes the bytes to <artifactsDir>/<projectId>/<runId>/
artifact.<ext> via a new artifact-writer module. The row's artifactPath is
the absolute on-disk path. The web layer never sees that path: it fetches
the bytes through GET /api/projects/:projectId/critique/:runId/artifact,
which the new artifact-handler module serves with a mime-derived
Content-Type, X-Content-Type-Options: nosniff, a CSP header for HTML and
SVG, and the same cross-project leak guard pattern the interrupt handler
uses.

The body and mime intentionally never travel on the SSE wire. The SHIP
PanelEvent (which doubles as the SSE payload shape) keeps its lightweight
artifactRef, and the orchestrator strips body/mime before bus.emit, so a
multi-megabyte artifact does not broadcast to every subscriber. The new
orchestrator test asserts this explicitly.

Defense in depth in the writer + handler:

  - mime allowlist with text/html, text/css, text/markdown, text/plain,
    application/json, image/svg+xml; everything else falls through to
    application/octet-stream + .bin so unknown payloads can't be
    misinterpreted as a known type;
  - UTF-8 byte-length cap, configurable via cfg.parserMaxBlockBytes, so
    multi-byte payloads can't sneak past a JS .length check;
  - atomic write through a sibling tmp file + rename so a daemon crash
    mid-write can't leave a half-written artifact under the canonical
    name;
  - path-traversal guard on the GET endpoint that resolves the row's
    artifactPath against the artifacts root and refuses anything that
    escapes it, refuses non-regular files (symlinks, dirs), and refuses
    files larger than the response cap.

Folded in two non-blocking notes lefarcen left on PR #1016 (the contracts
move) since persistence.ts was already in scope here:

  - P2: introduced CritiquePersistedStatus = CritiqueRunStatus | 'running'
    in the contracts package. CritiqueRunRow.status and CritiqueRunInsert.
    status now use it, and the inline `as CritiqueRunStatus | 'running'`
    widen in interrupt-handler.ts is gone. Public DTOs continue to use the
    terminal-only CritiqueRunStatus so a future endpoint can't leak a
    'running' row through the wire.
  - P3: added AssertExhaustiveValues + a compile-time assertion that
    CRITIQUE_RUN_STATUSES covers every CritiqueRunStatus variant.
    Adding a value to ShipStatus or CritiqueRunStatus without updating
    the array now fails the build with a tuple naming the missing
    variants instead of silently dropping out of UI filters.

Coverage: 174 critique tests across 14 files pass locally, including the
new critique-artifact-writer (13 cases) and critique-artifact-endpoint
(11 cases) suites, the inverted critique-lifecycle artifact-persistence
test, and the orchestrator happy-path that asserts the SSE ship payload
does NOT carry body or mime.

Validated: pnpm guard, pnpm --filter @open-design/contracts build,
pnpm --filter @open-design/daemon build (full tsc), pnpm --filter
@open-design/web typecheck, pnpm --filter @open-design/daemon exec
vitest run tests/critique (all green).

This is step (b) of the four-step plan that PR #1006's closing comment
laid out. Step (a) was the contracts move in PR #1016. Steps (c)
(persist original_message_id / agent_id / model_id) and (d) (real
rerun endpoint on top of (a)+(b)+(c)) follow.

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-10 23:59:04 +08:00

343 lines
11 KiB
TypeScript

/**
* Tests for GET /api/projects/:projectId/critique/:runId/artifact.
*
* Each test mounts the handler on a fresh Express mini-app with an
* in-memory SQLite database and a real on-disk artifacts root, so the
* full handler logic (cross-project leak guard, path-traversal guard,
* missing-file 404, mime/Content-Type response) is exercised without
* starting the full daemon server.
*
* @see specs/current/critique-theater.md § rerun endpoint (Task 6.2)
*/
import http from 'node:http';
import { mkdirSync, mkdtempSync, writeFileSync, symlinkSync } from 'node:fs';
import { rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import express from 'express';
import Database from 'better-sqlite3';
import {
insertCritiqueRun,
migrateCritique,
updateCritiqueRun,
} from '../src/critique/persistence.js';
import { handleCritiqueArtifact } from '../src/critique/artifact-handler.js';
// ---------------------------------------------------------------------------
// Test infrastructure
// ---------------------------------------------------------------------------
function freshDb(): Database.Database {
const db = new Database(':memory:');
db.pragma('journal_mode = WAL');
db.pragma('foreign_keys = ON');
db.exec(`
CREATE TABLE projects (
id TEXT PRIMARY KEY,
name TEXT NOT NULL,
created_at INTEGER NOT NULL,
updated_at INTEGER NOT NULL
);
CREATE TABLE conversations (
id TEXT PRIMARY KEY,
project_id TEXT NOT NULL,
created_at INTEGER NOT NULL,
updated_at INTEGER NOT NULL,
FOREIGN KEY(project_id) REFERENCES projects(id) ON DELETE CASCADE
);
INSERT INTO projects (id, name, created_at, updated_at) VALUES ('p1', 'Project 1', 0, 0);
INSERT INTO projects (id, name, created_at, updated_at) VALUES ('p2', 'Project 2', 0, 0);
`);
migrateCritique(db);
return db;
}
function startMiniServer(
db: Database.Database,
artifactsRoot: string,
): Promise<{ baseUrl: string; server: http.Server }> {
const app = express();
app.get(
'/api/projects/:projectId/critique/:runId/artifact',
handleCritiqueArtifact(db, { artifactsRoot }),
);
return new Promise((resolve, reject) => {
const server = app.listen(0, '127.0.0.1', () => {
const addr = server.address();
if (!addr || typeof addr !== 'object') {
reject(new Error('could not bind'));
return;
}
resolve({ baseUrl: `http://127.0.0.1:${addr.port}`, server });
});
server.on('error', reject);
});
}
interface FetchResult {
status: number;
headers: Headers;
body: string;
}
async function get(url: string): Promise<FetchResult> {
const res = await fetch(url);
const body = await res.text();
return { status: res.status, headers: res.headers, body };
}
/** Insert a row + place a real artifact file under the artifacts root,
* matching what the orchestrator would have written. */
function seedRowWithArtifact(
db: Database.Database,
artifactsRoot: string,
args: {
runId: string;
projectId: string;
body: string;
extension: 'html' | 'css' | 'svg' | 'bin' | 'json' | 'txt' | 'md';
},
): { absPath: string } {
const dir = join(artifactsRoot, args.projectId, args.runId);
mkdirSync(dir, { recursive: true });
const absPath = join(dir, `artifact.${args.extension}`);
writeFileSync(absPath, args.body, 'utf8');
insertCritiqueRun(db, {
id: args.runId,
projectId: args.projectId,
status: 'shipped',
score: 9,
rounds: [{ n: 1, composite: 9, mustFix: 0, decision: 'ship' }],
artifactPath: absPath,
protocolVersion: 1,
});
return { absPath };
}
// ---------------------------------------------------------------------------
// Tests
// ---------------------------------------------------------------------------
describe('GET /api/projects/:projectId/critique/:runId/artifact', () => {
let db: Database.Database;
let artifactsRoot: string;
let baseUrl: string;
let server: http.Server;
beforeEach(async () => {
db = freshDb();
artifactsRoot = mkdtempSync(join(tmpdir(), 'od-artifact-endpoint-'));
({ baseUrl, server } = await startMiniServer(db, artifactsRoot));
});
afterEach(async () => {
db.close();
await new Promise<void>((resolve) => server.close(() => resolve()));
await rm(artifactsRoot, { recursive: true, force: true });
});
// ---- 200: happy path -----------------------------------------------------
it('streams the body with text/html and the right Content-Length', async () => {
seedRowWithArtifact(db, artifactsRoot, {
runId: 'crun_html',
projectId: 'p1',
body: '<html><body>final</body></html>',
extension: 'html',
});
const { status, headers, body } = await get(
`${baseUrl}/api/projects/p1/critique/crun_html/artifact`,
);
expect(status).toBe(200);
expect(headers.get('content-type')).toBe('text/html');
expect(headers.get('content-length')).toBe('31');
expect(headers.get('x-content-type-options')).toBe('nosniff');
expect(headers.get('content-security-policy')).toContain("default-src 'none'");
expect(body).toBe('<html><body>final</body></html>');
});
it('streams an SVG with image/svg+xml and the same CSP that html gets', async () => {
seedRowWithArtifact(db, artifactsRoot, {
runId: 'crun_svg',
projectId: 'p1',
body: '<svg xmlns="http://www.w3.org/2000/svg"></svg>',
extension: 'svg',
});
const { status, headers } = await get(
`${baseUrl}/api/projects/p1/critique/crun_svg/artifact`,
);
expect(status).toBe(200);
expect(headers.get('content-type')).toBe('image/svg+xml');
expect(headers.get('content-security-policy')).toContain("default-src 'none'");
});
it('streams an unknown extension as application/octet-stream without a CSP header', async () => {
seedRowWithArtifact(db, artifactsRoot, {
runId: 'crun_bin',
projectId: 'p1',
body: 'binary-ish',
extension: 'bin',
});
const { status, headers, body } = await get(
`${baseUrl}/api/projects/p1/critique/crun_bin/artifact`,
);
expect(status).toBe(200);
expect(headers.get('content-type')).toBe('application/octet-stream');
expect(headers.get('content-security-policy')).toBeNull();
expect(body).toBe('binary-ish');
});
// ---- 404: unknown / cross-project / missing artifact ---------------------
it('returns 404 when the run does not exist', async () => {
const { status, body } = await get(
`${baseUrl}/api/projects/p1/critique/ghost/artifact`,
);
expect(status).toBe(404);
expect(JSON.parse(body)).toMatchObject({ error: { code: 'NOT_FOUND' } });
});
it('returns 404 when the runId belongs to a different project (cross-project leak guard)', async () => {
seedRowWithArtifact(db, artifactsRoot, {
runId: 'crun_other',
projectId: 'p2',
body: '<html>p2</html>',
extension: 'html',
});
const { status } = await get(
`${baseUrl}/api/projects/p1/critique/crun_other/artifact`,
);
expect(status).toBe(404);
});
it('returns 404 when the row has no artifactPath (still running, or never shipped)', async () => {
insertCritiqueRun(db, {
id: 'crun_no_artifact',
projectId: 'p1',
status: 'running',
protocolVersion: 1,
});
const { status, body } = await get(
`${baseUrl}/api/projects/p1/critique/crun_no_artifact/artifact`,
);
expect(status).toBe(404);
expect(JSON.parse(body)).toMatchObject({
error: { code: 'NOT_FOUND', currentStatus: 'running' },
});
});
it('returns 404 when artifactPath points outside the artifacts root', async () => {
const outsideDir = mkdtempSync(join(tmpdir(), 'od-artifact-outside-'));
try {
const outsidePath = join(outsideDir, 'leaked.html');
writeFileSync(outsidePath, '<html>leaked</html>', 'utf8');
insertCritiqueRun(db, {
id: 'crun_traversal',
projectId: 'p1',
status: 'shipped',
score: 9,
rounds: [{ n: 1, composite: 9, mustFix: 0, decision: 'ship' }],
artifactPath: outsidePath,
protocolVersion: 1,
});
const { status } = await get(
`${baseUrl}/api/projects/p1/critique/crun_traversal/artifact`,
);
expect(status).toBe(404);
} finally {
await rm(outsideDir, { recursive: true, force: true });
}
});
it('returns 404 when the file on disk has been deleted (row points at a vanished path)', async () => {
const { absPath } = seedRowWithArtifact(db, artifactsRoot, {
runId: 'crun_vanished',
projectId: 'p1',
body: '<html></html>',
extension: 'html',
});
await rm(absPath);
const { status } = await get(
`${baseUrl}/api/projects/p1/critique/crun_vanished/artifact`,
);
expect(status).toBe(404);
});
it('returns 404 when artifactPath resolves to a symlink (refuses non-regular files)', async function symlinkCase() {
// Symlinks fail with EPERM on Windows runners that lack the privilege.
// Skip the assertion there rather than mark the suite flaky.
const target = join(artifactsRoot, 'target.html');
writeFileSync(target, '<html>real</html>', 'utf8');
const linkDir = join(artifactsRoot, 'p1', 'crun_symlink');
mkdirSync(linkDir, { recursive: true });
const linkPath = join(linkDir, 'artifact.html');
try {
symlinkSync(target, linkPath);
} catch (err) {
if ((err as NodeJS.ErrnoException).code === 'EPERM') return;
throw err;
}
insertCritiqueRun(db, {
id: 'crun_symlink',
projectId: 'p1',
status: 'shipped',
score: 9,
rounds: [{ n: 1, composite: 9, mustFix: 0, decision: 'ship' }],
artifactPath: linkPath,
protocolVersion: 1,
});
const { status } = await get(
`${baseUrl}/api/projects/p1/critique/crun_symlink/artifact`,
);
expect(status).toBe(404);
});
// ---- 400: bad input ------------------------------------------------------
it('returns 400 when runId is whitespace-only', async () => {
const { status } = await get(
`${baseUrl}/api/projects/p1/critique/%20/artifact`,
);
expect(status).toBe(400);
});
// ---- terminal-state coverage --------------------------------------------
it('serves the artifact for every terminal status that ever wrote one', async () => {
const cases: Array<['shipped' | 'below_threshold' | 'interrupted' | 'timed_out', string]> = [
['shipped', '<html>shipped</html>'],
['below_threshold', '<html>below</html>'],
['interrupted', '<html>cut</html>'],
['timed_out', '<html>tick</html>'],
];
for (const [state, body] of cases) {
const id = `crun_${state}`;
const { absPath } = seedRowWithArtifact(db, artifactsRoot, {
runId: id,
projectId: 'p1',
body,
extension: 'html',
});
// Override status to the case-specific value.
updateCritiqueRun(db, id, { status: state, artifactPath: absPath });
const { status, body: served } = await get(
`${baseUrl}/api/projects/p1/critique/${id}/artifact`,
);
expect(status).toBe(200);
expect(served).toBe(body);
}
});
});