open-design/apps/daemon/tests/critique-lifecycle.test.ts
Nagendhra Madishetti 32fa0c23bb
feat(daemon): Critique Theater Phase 6.2 (artifact extraction + endpoint) (#1085)
The orchestrator was leaving artifactPath = null on every shipped run because
the SHIP <ARTIFACT> body never made it past the parser. Reviewers caught this
on PR #1006: a rerun-style endpoint built on top of that null could not return
a usable prior-art reference, and tests that synthesized artifactPath via
insertCritiqueRun were hiding the gap rather than covering the feature.

This PR closes that gap. The parser now hands the orchestrator a
ShipArtifactPayload (round, mime, body) through a side-channel callback, and
the orchestrator writes the bytes to <artifactsDir>/<projectId>/<runId>/
artifact.<ext> via a new artifact-writer module. The row's artifactPath is
the absolute on-disk path. The web layer never sees that path: it fetches
the bytes through GET /api/projects/:projectId/critique/:runId/artifact,
which the new artifact-handler module serves with a mime-derived
Content-Type, X-Content-Type-Options: nosniff, a CSP header for HTML and
SVG, and the same cross-project leak guard pattern the interrupt handler
uses.
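
As a rough illustration of how a mime-derived Content-Type and the hardening headers might fit together, here is a minimal sketch. The names `MIME_TO_EXT`, `resolveMime`, and `artifactHeaders`, the file extensions chosen for each type, and the CSP value are assumptions made for this example, not the artifact-handler module's actual code; only the allowlisted mime types and the octet-stream/.bin fallback come from this message.

```typescript
// Hypothetical sketch: names, extensions, and the CSP string are illustrative assumptions.
const MIME_TO_EXT = new Map<string, string>([
  ['text/html', 'html'],
  ['text/css', 'css'],
  ['text/markdown', 'md'],
  ['text/plain', 'txt'],
  ['application/json', 'json'],
  ['image/svg+xml', 'svg'],
]);

interface ResolvedMime {
  mime: string;
  ext: string;
}

/** Map a declared mime to an allowlisted type, or fall back to octet-stream + .bin. */
function resolveMime(raw: string): ResolvedMime {
  // Drop parameters such as "; charset=utf-8" before the lookup.
  const mime = (raw.split(';')[0] ?? '').trim().toLowerCase();
  const ext = MIME_TO_EXT.get(mime);
  return ext === undefined
    ? { mime: 'application/octet-stream', ext: 'bin' }
    : { mime, ext };
}

/** Build response headers for an artifact; HTML and SVG additionally get a CSP. */
function artifactHeaders(rawMime: string): Record<string, string> {
  const { mime } = resolveMime(rawMime);
  const headers: Record<string, string> = {
    'Content-Type': mime,
    'X-Content-Type-Options': 'nosniff',
  };
  if (mime === 'text/html' || mime === 'image/svg+xml') {
    // Assumed policy: block all subresources and sandbox the document.
    headers['Content-Security-Policy'] = "default-src 'none'; sandbox";
  }
  return headers;
}
```

Keeping the lookup in a `Map` rather than a plain object means a hostile mime string such as `constructor` cannot resolve to an inherited property.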

The body and mime intentionally never travel on the SSE wire. The SHIP
PanelEvent (which doubles as the SSE payload shape) keeps its lightweight
artifactRef, and the orchestrator strips body/mime before bus.emit, so a
multi-megabyte artifact does not broadcast to every subscriber. The new
orchestrator test asserts this explicitly.
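
The stripping step could look roughly like the sketch below. The interface shapes and the `toWirePayload` helper are hypothetical; the message only establishes that `body` and `mime` stay off the wire while `artifactRef` is kept.

```typescript
// Hypothetical shapes: only artifactRef, body, and mime are named in the commit message.
interface ShipWirePayload {
  round: number;
  composite: number;
  status: string;
  artifactRef: string;
}

// What the parser might hand the orchestrator internally: wire fields plus the heavy ones.
type ShipInternal = ShipWirePayload & { body?: string; mime?: string };

/** Copy only the lightweight fields, so body and mime never reach subscribers. */
function toWirePayload(ship: ShipInternal): ShipWirePayload {
  return {
    round: ship.round,
    composite: ship.composite,
    status: ship.status,
    artifactRef: ship.artifactRef,
  };
}
```

Picking fields explicitly, instead of deleting `body` and `mime`, is one way to make sure a heavy field added later does not start broadcasting by accident.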

Defense in depth in the writer + handler:

  - mime allowlist with text/html, text/css, text/markdown, text/plain,
    application/json, image/svg+xml; everything else falls through to
    application/octet-stream + .bin so unknown payloads can't be
    misinterpreted as a known type;
  - UTF-8 byte-length cap, configurable via cfg.parserMaxBlockBytes, so
    multi-byte payloads can't sneak past a JS .length check;
  - atomic write through a sibling tmp file + rename so a daemon crash
    mid-write can't leave a half-written artifact under the canonical
    name;
  - path-traversal guard on the GET endpoint that resolves the row's
    artifactPath against the artifacts root and refuses anything that
    escapes it, refuses non-regular files (symlinks, dirs), and refuses
    files larger than the response cap.
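
The byte cap, the atomic write, and the containment check from the list above can be sketched as follows. The function names, the temp-file naming scheme, and the 1024-byte cap in the usage lines are assumptions for illustration; the real artifact-writer and artifact-handler modules may be structured differently.

```typescript
import { lstatSync, mkdirSync, mkdtempSync, readFileSync, renameSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { dirname, join, resolve, sep } from 'node:path';

/** Reject payloads whose UTF-8 encoding exceeds maxBytes ('é'.length is 1, but it is 2 bytes). */
function assertWithinByteCap(body: string, maxBytes: number): void {
  const bytes = Buffer.byteLength(body, 'utf8');
  if (bytes > maxBytes) {
    throw new RangeError(`artifact is ${bytes} bytes, cap is ${maxBytes}`);
  }
}

/** Write to a sibling temp file, then rename it onto the canonical name. */
function writeAtomic(finalPath: string, body: string): void {
  mkdirSync(dirname(finalPath), { recursive: true });
  const tmpPath = `${finalPath}.${process.pid}.tmp`; // hypothetical naming scheme
  writeFileSync(tmpPath, body, 'utf8');
  // A crash before this line leaves only the temp file, never a partial artifact.
  renameSync(tmpPath, finalPath);
}

/** True only for a regular file that resolves under root and fits the size cap. */
function isServable(root: string, candidate: string, maxBytes: number): boolean {
  const rootAbs = resolve(root);
  const abs = resolve(candidate);
  if (!abs.startsWith(rootAbs + sep)) return false; // escapes the artifacts root
  // lstat does not follow a final symlink, so symlinks and directories fail isFile().
  // Symlinked intermediate directories are not checked in this sketch.
  const st = lstatSync(abs, { throwIfNoEntry: false });
  return st !== undefined && st.isFile() && st.size <= maxBytes;
}

// Usage against a throwaway directory.
const root = mkdtempSync(join(tmpdir(), 'artifact-sketch-'));
const target = join(root, 'p1', 'r1', 'artifact.html');
assertWithinByteCap('<html></html>', 1024);
writeAtomic(target, '<html></html>');
const roundTrip = readFileSync(target, 'utf8');
```

The temp file lives next to its target so the rename stays on one filesystem.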

Folded in two non-blocking notes lefarcen left on PR #1016 (the contracts
move) since persistence.ts was already in scope here:

  - P2: introduced CritiquePersistedStatus = CritiqueRunStatus | 'running'
    in the contracts package. CritiqueRunRow.status and
    CritiqueRunInsert.status now use it, and the inline
    `as CritiqueRunStatus | 'running'` widening cast in interrupt-handler.ts
    is gone. Public DTOs continue to use the terminal-only CritiqueRunStatus
    so a future endpoint can't leak a 'running' row over the wire.
  - P3: added AssertExhaustiveValues + a compile-time assertion that
    CRITIQUE_RUN_STATUSES covers every CritiqueRunStatus variant.
    Adding a value to ShipStatus or CritiqueRunStatus without updating
    the array now fails the build with a tuple naming the missing
    variants instead of silently dropping out of UI filters.
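
A compile-time exhaustiveness check of this kind might be written as below. This is a sketch, not the contracts package's definition: the union is reduced to three status values that appear in the lifecycle tests, and the shape of `AssertExhaustiveValues` and the `isCritiqueRunStatus` helper are assumptions.

```typescript
// Hypothetical, reduced union: the real CritiqueRunStatus may have more variants.
type CritiqueRunStatus = 'shipped' | 'below_threshold' | 'interrupted';
type CritiquePersistedStatus = CritiqueRunStatus | 'running';

const CRITIQUE_RUN_STATUSES = ['shipped', 'below_threshold', 'interrupted'] as const;

// Resolves to `true` when Values covers every member of Union; otherwise to a
// tuple naming the missing members, so the assignment below fails to compile.
type AssertExhaustiveValues<Union, Values extends readonly Union[]> =
  [Exclude<Union, Values[number]>] extends [never]
    ? true
    : ['missing variants', Exclude<Union, Values[number]>];

const statusesAreExhaustive: AssertExhaustiveValues<
  CritiqueRunStatus,
  typeof CRITIQUE_RUN_STATUSES
> = true;

/** Runtime guard built on the same array, e.g. for validating a query parameter. */
function isCritiqueRunStatus(value: string): value is CritiqueRunStatus {
  return (CRITIQUE_RUN_STATUSES as readonly string[]).includes(value);
}

// A persisted row may still be 'running'; the terminal-only guard rejects it.
const persisted: CritiquePersistedStatus = 'running';
const isTerminal = isCritiqueRunStatus(persisted);
```

Adding a fourth variant to the union without touching the array would change the annotated type to the tuple form, and assigning `true` to it would then be a type error.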

Coverage: 174 critique tests across 14 files pass locally, including the
new critique-artifact-writer (13 cases) and critique-artifact-endpoint
(11 cases) suites, the inverted critique-lifecycle artifact-persistence
test, and the orchestrator happy-path that asserts the SSE ship payload
does NOT carry body or mime.

Validated: pnpm guard, pnpm --filter @open-design/contracts build,
pnpm --filter @open-design/daemon build (full tsc), pnpm --filter
@open-design/web typecheck, pnpm --filter @open-design/daemon exec
vitest run tests/critique (all green).

This is step (b) of the four-step plan that PR #1006's closing comment
laid out. Step (a) was the contracts move in PR #1016. Steps (c)
(persist original_message_id / agent_id / model_id) and (d) (real
rerun endpoint on top of (a)+(b)+(c)) follow.

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-10 23:59:04 +08:00

/**
 * Regression tests for round 3 review feedback on PR #481, plus the
 * Phase 6.2 artifact-extraction expansion:
 *   - A signal-terminated child (e.g. SIGTERM from /api/runs/:id/cancel)
 *     finalizes the critique row as 'interrupted', not 'below_threshold'.
 *     The synthetic ship event for the best-so-far round carries
 *     status='interrupted' so transcripts and SSE clients see the real cause.
 *   - Shipped runs now persist the SHIP <ARTIFACT> body to disk and pin
 *     the absolute path on the row, so the artifact endpoint can stream
 *     the bytes the agent shipped (CDATA wrapper stripped).
 */
import { describe, it, expect, beforeEach, afterEach } from 'vitest';
import { mkdtempSync, existsSync, readFileSync } from 'node:fs';
import { rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import Database from 'better-sqlite3';
import { migrateCritique, getCritiqueRun } from '../src/critique/persistence.js';
import { runOrchestrator, type CritiqueSseBus } from '../src/critique/orchestrator.js';
import type { CritiqueSseEvent } from '@open-design/contracts/critique';
import { defaultCritiqueConfig } from '@open-design/contracts/critique';

function freshDb(): Database.Database {
  const db = new Database(':memory:');
  db.pragma('journal_mode = WAL');
  db.pragma('foreign_keys = ON');
  db.exec(`
    CREATE TABLE projects (
      id TEXT PRIMARY KEY,
      name TEXT NOT NULL,
      created_at INTEGER NOT NULL,
      updated_at INTEGER NOT NULL
    );
    CREATE TABLE conversations (
      id TEXT PRIMARY KEY,
      project_id TEXT NOT NULL,
      created_at INTEGER NOT NULL,
      updated_at INTEGER NOT NULL,
      FOREIGN KEY(project_id) REFERENCES projects(id) ON DELETE CASCADE
    );
    INSERT INTO projects (id, name, created_at, updated_at) VALUES ('p1', 'p1', 0, 0);
    INSERT INTO conversations (id, project_id, created_at, updated_at) VALUES ('c1', 'p1', 0, 0);
  `);
  migrateCritique(db);
  return db;
}

function makeBus(): { bus: CritiqueSseBus; events: CritiqueSseEvent[] } {
  const events: CritiqueSseEvent[] = [];
  const bus: CritiqueSseBus = { emit: (e) => { events.push(e); } };
  return { bus, events };
}

let tmpDir: string;
let db: Database.Database;

beforeEach(() => {
  tmpDir = mkdtempSync(join(tmpdir(), 'od-lifecycle-test-'));
  db = freshDb();
});

afterEach(async () => {
  db.close();
  await rm(tmpDir, { recursive: true, force: true });
});

/** A stream that yields a complete round 1 then awaits forever, emulating a
 *  CLI that produced partial output before being killed. */
async function* roundOneThenStall(): AsyncIterable<string> {
  yield `<CRITIQUE_RUN version="1" maxRounds="3" threshold="8.0" scale="10">
<ROUND n="1">
<PANELIST role="designer">
<NOTES>v1</NOTES>
<ARTIFACT mime="text/html"><![CDATA[<html></html>]]></ARTIFACT>
</PANELIST>
<PANELIST role="critic" score="9.0"><DIM name="h" score="9">ok</DIM></PANELIST>
<PANELIST role="brand" score="9.0"><DIM name="v" score="9">ok</DIM></PANELIST>
<PANELIST role="a11y" score="9.0"><DIM name="c" score="9">ok</DIM></PANELIST>
<PANELIST role="copy" score="9.0"><DIM name="x" score="9">ok</DIM></PANELIST>
<ROUND_END n="1" composite="9.0" must_fix="0" decision="continue"><REASON>continue</REASON></ROUND_END>
</ROUND>
`;
  // Stall indefinitely so the orchestrator must rely on the child-exit race.
  await new Promise(() => { /* never resolves */ });
}

describe('orchestrator lifecycle (PR #481 round 3 review)', () => {
  it('child killed with SIGTERM after 1 closed round persists interrupted, not below_threshold', async () => {
    const { bus, events } = makeBus();
    const artifactDir = join(tmpDir, 'sigterm-1');
    let resolveExit!: (v: { code: number | null; signal: string | null }) => void;
    const childExitPromise = new Promise<{ code: number | null; signal: string | null }>((r) => { resolveExit = r; });
    const child = { kill: (): boolean => true };
    // Schedule the SIGTERM to arrive shortly after the parser closes round 1.
    setTimeout(() => resolveExit({ code: null, signal: 'SIGTERM' }), 75);

    const result = await runOrchestrator({
      runId: 'r-sigterm',
      projectId: 'p1',
      conversationId: null,
      artifactId: 'a1',
      artifactDir,
      adapter: 'claude',
      cfg: defaultCritiqueConfig(),
      db,
      bus,
      stdout: roundOneThenStall(),
      child,
      childExitPromise,
    });

    expect(result.status).toBe('interrupted');
    const row = getCritiqueRun(db, 'r-sigterm');
    expect(row?.status).toBe('interrupted');
    // Synthetic ship event must carry status='interrupted' (not below_threshold).
    const shipEvents = events.filter((e) => e.event === 'critique.ship');
    expect(shipEvents).toHaveLength(1);
    const shipPayload = shipEvents[0]?.data as { status: string } | undefined;
    expect(shipPayload?.status).toBe('interrupted');
    // Round 1 closed with composite ~9.0, so the fallback round should hold.
    expect(result.composite).not.toBeNull();
    expect(result.composite!).toBeGreaterThan(8.0);
  });

  it('shipped run persists the SHIP <ARTIFACT> body to disk and pins the absolute path on the row', async () => {
    const { bus } = makeBus();
    const artifactDir = join(tmpDir, 'no-artifact');
    const stream = `<CRITIQUE_RUN version="1" maxRounds="3" threshold="8.0" scale="10">
<ROUND n="1">
<PANELIST role="designer">
<NOTES>v1</NOTES>
<ARTIFACT mime="text/html"><![CDATA[<html></html>]]></ARTIFACT>
</PANELIST>
<PANELIST role="critic" score="9.0"><DIM name="h" score="9">ok</DIM></PANELIST>
<PANELIST role="brand" score="9.0"><DIM name="v" score="9">ok</DIM></PANELIST>
<PANELIST role="a11y" score="9.0"><DIM name="c" score="9">ok</DIM></PANELIST>
<PANELIST role="copy" score="9.0"><DIM name="x" score="9">ok</DIM></PANELIST>
<ROUND_END n="1" composite="9.0" must_fix="0" decision="ship"><REASON>ok</REASON></ROUND_END>
</ROUND>
<SHIP round="1" composite="9.0" status="shipped">
<ARTIFACT mime="text/html"><![CDATA[<html><body>final</body></html>]]></ARTIFACT>
<SUMMARY>Done.</SUMMARY>
</SHIP>
</CRITIQUE_RUN>`;
    async function* streamOf(text: string): AsyncIterable<string> {
      for (let i = 0; i < text.length; i += 64) yield text.slice(i, i + 64);
    }

    const result = await runOrchestrator({
      runId: 'r-shipped',
      projectId: 'p1',
      conversationId: null,
      artifactId: 'a1',
      artifactDir,
      adapter: 'claude',
      cfg: defaultCritiqueConfig(),
      db,
      bus,
      stdout: streamOf(stream),
    });

    expect(result.status).toBe('shipped');
    expect(result.artifactPath).toBeTruthy();
    expect(result.artifactPath!.endsWith('artifact.html')).toBe(true);
    const row = getCritiqueRun(db, 'r-shipped');
    expect(row?.artifactPath).toBe(result.artifactPath);
    // The bytes on disk are exactly what the agent shipped, with the
    // CDATA wrapper from the SHIP <ARTIFACT> stripped.
    expect(existsSync(result.artifactPath!)).toBe(true);
    expect(readFileSync(result.artifactPath!, 'utf8')).toBe('<html><body>final</body></html>');
  });
});