feat(plugins): od plugin simulate <id> pipeline dry-run (Phase 4)

Plan §3.EE1.

apps/daemon/src/plugins/simulate.ts ships two pure helpers:

  simulatePipeline({ pipeline, signals, iterationCap? })
    → SimulatePipelineResult { stages[], totalIterations, outcome }

  parseSignalKv(['critique.score=4', 'build.passing=true', ...])
    → { signals, warnings[] } honouring the closed UntilSignals
      vocabulary (critique.score / iterations / user.confirmed /
      preview.ok / build.passing / tests.passing).
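
To illustrate the closed-vocabulary behaviour, here is a minimal
table-driven sketch. It is a simplified stand-in, not the shipped
parseSignalKv; the names `VOCAB` and `parseKvSketch` are hypothetical,
and treating an empty numeric value as a type mismatch is a choice made
by this sketch.

```typescript
// Hypothetical stand-in for parseSignalKv: a table-driven parser over the
// closed signal vocabulary listed in this commit message.
type SignalKind = 'number' | 'boolean';
type SignalValue = number | boolean;

const VOCAB = new Map<string, SignalKind>([
  ['critique.score', 'number'],
  ['iterations', 'number'],
  ['user.confirmed', 'boolean'],
  ['preview.ok', 'boolean'],
  ['build.passing', 'boolean'],
  ['tests.passing', 'boolean'],
]);

function parseKvSketch(args: readonly string[]): {
  signals: Record<string, SignalValue>;
  warnings: string[];
} {
  const signals: Record<string, SignalValue> = {};
  const warnings: string[] = [];
  for (const arg of args) {
    const eq = arg.indexOf('=');
    if (eq <= 0) {
      warnings.push(`signal must be 'key=value', got '${arg}'`);
      continue;
    }
    const key = arg.slice(0, eq).trim();
    const raw = arg.slice(eq + 1).trim();
    const kind = VOCAB.get(key);
    if (kind === undefined) {
      // Unknown keys are dropped with a warning (typo guard).
      warnings.push(`unknown signal '${key}'`);
    } else if (kind === 'number') {
      const num = raw === '' ? Number.NaN : Number(raw);
      if (Number.isFinite(num)) signals[key] = num;
      else warnings.push(`signal ${key} expected number; got '${raw}'`);
    } else if (raw === 'true' || raw === '1') {
      signals[key] = true;
    } else if (raw === 'false' || raw === '0') {
      signals[key] = false;
    } else {
      warnings.push(`signal ${key} expected boolean; got '${raw}'`);
    }
  }
  return { signals, warnings };
}

// 'critic.score' is a deliberate typo: it lands in warnings, not in signals.
const parsed = parseKvSketch(['critique.score=4', 'build.passing=1', 'critic.score=4']);
console.log(parsed.signals, parsed.warnings);
```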

The simulator walks every stage in pipeline.stages:
  - Single-shot stages (repeat unset or false) run exactly once →
    outcome='single'.
  - repeat:true stages without an until expression also collapse
    to outcome='single' (one iteration).
  - repeat:true stages with an until expression iterate until the
    expression is satisfied (→ 'converged') or the iterationCap is
    hit (→ 'cap'). If the expression fails to parse, the stage is
    recorded as a one-iteration 'unparsable' run. Cap defaults to 10.
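
The walk above can be sketched roughly as follows. This is an
illustration under assumptions: a plain predicate stands in for the
parsed until expression (so the 'unparsable' branch is omitted), and
`StageSketch` / `walkStage` are hypothetical names.

```typescript
// Sketch of the repeat / until / cap walk. The real simulator evaluates the
// stage's `until` string; here a predicate function stands in for it.
type Signals = Record<string, number | boolean>;
type StageOutcome = 'single' | 'converged' | 'cap';

interface StageSketch {
  id: string;
  repeat?: boolean;
  // Hypothetical stand-in for the until expression.
  until?: (signals: Signals) => boolean;
}

function walkStage(
  stage: StageSketch,
  provider: (stageId: string, iteration: number) => Signals,
  cap = 10,
): { outcome: StageOutcome; iterations: number } {
  const until = stage.until;
  // Single-shot, or repeat:true with nothing to wait for: one iteration.
  if (stage.repeat !== true || until === undefined) {
    return { outcome: 'single', iterations: 1 };
  }
  // Iterations are zero-based internally and reported one-based.
  for (let i = 0; i < cap; i++) {
    if (until(provider(stage.id, i))) {
      return { outcome: 'converged', iterations: i + 1 };
    }
  }
  return { outcome: 'cap', iterations: cap };
}

const critique: StageSketch = {
  id: 'critique',
  repeat: true,
  until: (s) => {
    const score = s['critique.score'];
    return typeof score === 'number' && score >= 4;
  },
};

const stuck = walkStage(critique, () => ({ 'critique.score': 2 }), 3);
const done = walkStage(critique, () => ({ 'critique.score': 5 }));
console.log(stuck, done);
```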

Caller-supplied signals can be a constant snapshot OR a generator
function (stageId, iteration) → signals so authors can model
'score grows over iterations' / 'build.passing flips on iter 3'
scenarios deterministically.
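
As an illustration, the two scenarios named above might be written as
providers like these. The type aliases and the `toProvider` helper are
assumptions made for the sketch; iterations are taken to be zero-based,
as in the simulator loop.

```typescript
// Two hypothetical providers with the (stageId, iteration) => signals shape.
type Signals = Record<string, number | boolean>;
type Provider = (stageId: string, iteration: number) => Signals;

// 'score grows over iterations': 1 on the first iteration, then 2, 3, ...
const growingScore: Provider = (_stageId, iteration) => ({
  'critique.score': iteration + 1,
});

// 'build.passing flips on iter 3': false for the first two iterations.
const buildFlipsOnThird: Provider = (_stageId, iteration) => ({
  'build.passing': iteration >= 2,
  'tests.passing': true,
});

// A constant snapshot can be lifted to the same shape, so the walking code
// only ever has to call a provider.
function toProvider(signals: Provider | Signals): Provider {
  return typeof signals === 'function' ? signals : () => signals;
}

const constant = toProvider({ 'critique.score': 5 });
console.log(growingScore('critique', 3), buildFlipsOnThird('verify', 2), constant('any', 7));
```

Because each provider is a pure function of (stageId, iteration), the
same inputs always produce the same walk.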

Aggregate outcome on the result rolls up to one of:
  'all-converged' | 'all-single' | 'mixed' | 'cap-hit' | 'unparsable'.
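
The roll-up has a precedence order, which the following sketch makes
explicit. `rollUp` is a hypothetical name; the ordering mirrors the
aggregate logic in this commit, where an empty stage list counts as
'all-single'.

```typescript
type StageOutcome = 'converged' | 'cap' | 'unparsable' | 'single';
type Aggregate = 'all-converged' | 'all-single' | 'mixed' | 'cap-hit' | 'unparsable';

// Precedence: any unparsable stage wins, then any cap hit. Otherwise the
// result depends on whether the stages are all single, all converged, or both.
function rollUp(outcomes: readonly StageOutcome[]): Aggregate {
  if (outcomes.some((o) => o === 'unparsable')) return 'unparsable';
  if (outcomes.some((o) => o === 'cap')) return 'cap-hit';
  if (outcomes.every((o) => o === 'single')) return 'all-single';
  if (outcomes.every((o) => o === 'converged')) return 'all-converged';
  return 'mixed';
}

console.log(rollUp(['single', 'converged']));           // mixed
console.log(rollUp(['converged', 'cap', 'unparsable'])); // unparsable
```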

CLI: `od plugin simulate <pluginId> [-s key=value ...] [--cap <n>] [--json]`.
  - Repeatable -s flags so a shell can pin every signal in one
    call. Unknown signal keys surface as warnings (typo guard).
  - Exit 4 on cap-hit / unparsable so CI can wire into a pipeline
    check easily.
  - --json emits the full SimulatePipelineResult.
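
For CI wiring, a script could assemble the invocation from a signals
record. This is a sketch only: `buildSimulateArgs` and
`isConvergenceFailure` are hypothetical helpers, and they use just the
flags named above (-s, --cap, --json).

```typescript
// Hypothetical CI helper: builds the argv for `od plugin simulate` from a
// signals record, one repeatable -s flag per signal.
type Signals = Record<string, number | boolean>;

function buildSimulateArgs(pluginId: string, signals: Signals, cap?: number): string[] {
  const args = ['plugin', 'simulate', pluginId];
  for (const [key, value] of Object.entries(signals)) {
    args.push('-s', `${key}=${value}`);
  }
  if (cap !== undefined) args.push('--cap', String(cap));
  args.push('--json');
  return args;
}

// Exit 4 is the code this commit assigns to cap-hit / unparsable outcomes.
function isConvergenceFailure(exitCode: number): boolean {
  return exitCode === 4;
}

const argv = buildSimulateArgs(
  'code-migration',
  { 'build.passing': true, 'tests.passing': true },
  20,
);
console.log(['od', ...argv].join(' '));
```

The resulting array could be handed to Node's
`child_process.spawnSync('od', argv)`, with the returned `status` checked
through `isConvergenceFailure`.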

printPluginHelp() updated.

Daemon tests: 1753 → 1766 (+13 cases on plugins-simulate:
single-shot stages, repeat=true converged, repeat=true cap-hit,
unparsable until, signal generator function (per-iteration drift),
boolean compound until (build.passing == true && tests.passing
== true), aggregate 'mixed' outcome, totalIterations sum,
parseSignalKv vocabulary + warnings on unknown / malformed /
type-mismatch keys).

Co-authored-by: Tom Huang <1043269994@qq.com>
Cursor Agent 2026-05-09 17:25:51 +00:00
parent 2da68ba0a6
commit 4a89736919
4 changed files with 493 additions and 0 deletions

@ -835,6 +835,7 @@ async function runPlugin(args) {
case 'replay': return runPluginReplay(rest);
case 'trust': return runPluginTrust(rest);
case 'snapshots': return runPluginSnapshots(rest);
case 'simulate': return runPluginSimulate(rest);
case 'run': return runPluginRun(rest);
case 'scaffold': return runPluginScaffold(rest);
case 'validate': return runPluginValidate(rest);
@ -1749,6 +1750,110 @@ async function runPluginInstall(rest) {
// Plan §3.Z2 — `od plugin upgrade <id>`. Re-installs the plugin
// from its recorded source. Streams the same SSE event shape as
// install, so 'progress' / 'success' / 'error' arrive verbatim.
// Plan §3.EE1 — `od plugin simulate <pluginId> [-s key=value ...]`.
//
// Walks the plugin's pipeline against caller-supplied signals and
// reports per-stage convergence (iterations + outcome). No LLM is
// invoked — this is a pure devloop dry-run for testing 'until'
// expressions.
//
// Signals are supplied via repeatable -s key=value flags. The
// closed UntilSignals vocabulary applies (critique.score /
// iterations / user.confirmed / preview.ok / build.passing /
// tests.passing); unknown keys surface as warnings.
async function runPluginSimulate(rest) {
const flags = parseFlags(rest, {
string: new Set([...PLUGIN_STRING_FLAGS, 's', 'cap']),
boolean: PLUGIN_BOOLEAN_FLAGS,
});
const positional = rest.filter((a) => !a.startsWith('-'));
const id = positional[0];
if (flags.help || flags.h || !id) {
console.log(`Usage:
od plugin simulate <pluginId> [-s key=value ...] [--cap <n>] [--json]
Walks the plugin's pipeline against caller-supplied signals and
reports per-stage convergence. No LLM is invoked.
Examples:
# critique-theater stage that exits when score >= 4
od plugin simulate my-plugin -s critique.score=5
# build-test devloop where both signals must hold
od plugin simulate code-migration \\
-s build.passing=true -s tests.passing=true
# raise the per-stage iteration cap (default 10)
od plugin simulate my-plugin -s critique.score=2 --cap 20
Closed signal vocabulary:
critique.score (number)
iterations (number)
user.confirmed (boolean)
preview.ok (boolean)
build.passing (boolean)
tests.passing (boolean)`);
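// An explicit --help / -h exits 0; a missing <pluginId> is a usage error (exit 2).
process.exit(flags.help || flags.h ? 0 : 2);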
}
// Collect every -s value (parseFlags returns the last only).
const sValues = [];
for (let i = 0; i < rest.length; i++) {
if ((rest[i] === '-s' || rest[i] === '--signal') && typeof rest[i + 1] === 'string') {
sValues.push(rest[i + 1]);
}
}
// Fetch the plugin from the daemon so we get the resolved
// manifest (including pipeline).
const base = pluginDaemonUrl(flags).replace(/\/$/, '');
const resp = await fetch(`${base}/api/plugins/${encodeURIComponent(id)}`);
if (resp.status === 404) {
console.error(`plugin ${id} not found`);
process.exit(65);
}
if (!resp.ok) {
console.error(`GET /api/plugins/${id} failed: ${resp.status} ${await resp.text()}`);
process.exit(1);
}
const plugin = await resp.json();
const pipeline = plugin?.manifest?.od?.pipeline;
if (!pipeline || !Array.isArray(pipeline.stages) || pipeline.stages.length === 0) {
if (flags.json) {
process.stdout.write(JSON.stringify({ outcome: 'no-pipeline', stages: [] }, null, 2) + '\n');
} else {
console.log(`[simulate] plugin ${id} has no od.pipeline (or it is empty); nothing to walk.`);
}
return;
}
const { simulatePipeline, parseSignalKv } = await import('./plugins/simulate.js');
const parsedSignals = parseSignalKv(sValues);
for (const w of parsedSignals.warnings) console.warn(`[simulate] warn: ${w}`);
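const cap = typeof flags.cap === 'string' ? Number(flags.cap) : undefined;
// Only a positive integer is a usable cap; anything else falls back to the
// simulator default, with a warning instead of being ignored silently.
const capValid = cap !== undefined && Number.isInteger(cap) && cap > 0;
if (cap !== undefined && !capValid) {
console.warn(`[simulate] warn: ignoring --cap '${flags.cap}' (expected a positive integer); using the default`);
}
const result = simulatePipeline({
pipeline,
signals: parsedSignals.signals,
...(capValid ? { iterationCap: cap } : {}),
});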
if (flags.json) {
process.stdout.write(JSON.stringify(result, null, 2) + '\n');
return;
}
console.log(`[simulate] plugin ${id} \u2014 outcome: ${result.outcome}, totalIterations: ${result.totalIterations}`);
for (const stage of result.stages) {
const tag = stage.outcome === 'converged' ? '\u2713'
: stage.outcome === 'cap' ? '\u2717'
: stage.outcome === 'unparsable' ? '!'
: '\u2014';
const reason = stage.reason ? ` (${stage.reason})` : '';
const matched = stage.matched && stage.matched.length > 0
? ` matched=[${stage.matched.map((c) => `${c.signal}${c.op}${c.value}`).join(' && ')}]`
: '';
console.log(` ${tag} ${stage.stageId}: ${stage.outcome} (${stage.iterations} iter)${reason}${matched}`);
}
// Exit non-zero on cap-hit / unparsable so CI can wire this
// into a pipeline check easily.
if (result.outcome === 'cap-hit' || result.outcome === 'unparsable') process.exit(4);
}
// Plan §3.CC1 / §3.DD2 — `od plugin canon <snapshotId>`. Prints the
// canonical `## Active plugin` block a snapshot will splice into
// the system prompt. Useful for understanding what the agent
@ -2514,6 +2619,9 @@ function printPluginHelp() {
od plugin doctor <id> Lint a plugin's manifest, atoms and resolved refs.
od plugin canon <snapshotId> Print the canonical system-prompt block for a snapshot.
(--check <file> for byte-equality fixtures.)
od plugin simulate <pluginId> [-s k=v] Walk the plugin's pipeline against caller-supplied
signals; report stage convergence + iterations
(no LLM in the loop).
od plugin diff <a> <b> [--json] Compare two installed plugins by id.
od plugin replay <runId> --snapshot-id <id>
Re-emit the immutable snapshot a run launched against.

@ -40,6 +40,14 @@ export {
type SnapshotInventoryStats,
type SnapshotStatsRow,
} from './stats.js';
export {
simulatePipeline,
parseSignalKv,
type SimulatePipelineInput,
type SimulatePipelineResult,
type SimulateStageOutcome,
type StageSignalProvider,
} from './simulate.js';
export * from './atoms/build-test.js';
export * from './atoms/code-import.js';
export * from './atoms/design-extract.js';

@ -0,0 +1,208 @@
// Phase 4 / spec §10.1 / plan §3.EE1 — pipeline simulator.
//
// Pure helper that walks a `PluginPipeline` against a caller-
// supplied signal stream and reports per-stage convergence
// without running an actual LLM / agent. Useful for plugin
// authors testing devloop convergence rules:
//
// - 'does my critique-theater stage exit on score>=4?'
// - 'does my build-test stage time out after 8 iterations?'
// - 'does my partial-decision diff-review re-prompt forever?'
//
// The simulator is deterministic: callers supply per-iteration
// signals as either a constant UntilSignals snapshot or a provider
// function `(stageId, iteration) => signals`. The pipeline's `until`
// expression is evaluated via the same evaluator the runtime
// uses (no parallel implementation drift).
//
// Sister of plugins/pipeline-runner.ts (which drives a live
// run); this module is the dry-run counterpart with no SQLite
// or SSE side effects.
import type { PluginPipeline } from '@open-design/contracts';
import { evaluateUntil, parseUntil, type UntilSignals } from './until.js';
export interface StageSignalProvider {
/**
* Returns the signals snapshot for a given (stageId, iteration)
* tuple. The caller defines the simulation policy; common
* choices:
*
* - constant signals across every stage (`() => ({ 'critique.score': 5 })`)
* - per-stage tables (`(stageId) => stageSignals[stageId] ?? {}`)
* - increasing signals across iterations (e.g. score grows
* toward convergence)
*/
(stageId: string, iteration: number): UntilSignals;
}
export interface SimulatePipelineInput {
pipeline: PluginPipeline;
signals: StageSignalProvider | UntilSignals;
// Per-stage iteration cap. Defaults to 10. The simulator clamps
// 'repeat: true' stages at this ceiling even when the until
// expression never satisfies — surfaces a 'never converges'
// bug as a hit-cap event rather than an infinite loop.
iterationCap?: number;
}
export interface SimulateStageOutcome {
stageId: string;
iterations: number;
// 'converged' = until satisfied. 'cap' = iterationCap hit before
// until satisfied (for repeat:true stages). 'unparsable' = until
// expression failed to parse (the simulator records this as a
// single-iteration unparsable run rather than re-throwing).
// 'single' = repeat:false stage, or repeat:true stage with no
// until expression; ran exactly once.
outcome: 'converged' | 'cap' | 'unparsable' | 'single';
// The signals snapshot from the iteration that triggered the
// exit, for audit. For 'single' / 'unparsable', the iteration-0
// snapshot; for 'cap', the last iteration's snapshot.
finalSignals: UntilSignals;
// The matched conjunction (when outcome='converged'). Absent
// otherwise. Useful when a stage has multiple OR branches and
// the author wants to know which one fired.
matched?: ReturnType<typeof evaluateUntil>['matched'];
// Reason text for 'unparsable' / 'cap' outcomes. Absent on
// converged / single.
reason?: string;
}
export interface SimulatePipelineResult {
stages: SimulateStageOutcome[];
// Aggregate: total iterations across every stage.
totalIterations: number;
// 'all-converged' / 'all-single' / 'mixed' / 'cap-hit' /
// 'unparsable'. Quick-look outcome the CLI prints first.
outcome: 'all-converged' | 'all-single' | 'mixed' | 'cap-hit' | 'unparsable';
}
const DEFAULT_ITERATION_CAP = 10;
export function simulatePipeline(input: SimulatePipelineInput): SimulatePipelineResult {
const cap = input.iterationCap ?? DEFAULT_ITERATION_CAP;
const provider: StageSignalProvider = typeof input.signals === 'function'
? input.signals as StageSignalProvider
: () => input.signals as UntilSignals;
const stages: SimulateStageOutcome[] = [];
for (const stage of input.pipeline.stages) {
const stageOutcome = simulateStage(stage, provider, cap);
stages.push(stageOutcome);
}
const totalIterations = stages.reduce((acc, s) => acc + s.iterations, 0);
const outcome = aggregateOutcome(stages);
return { stages, totalIterations, outcome };
}
function simulateStage(
stage: PluginPipeline['stages'][number],
provider: StageSignalProvider,
cap: number,
): SimulateStageOutcome {
const stageId = stage.id;
const repeat = stage.repeat === true;
const untilSource = stage.until;
if (!repeat || !untilSource) {
// Single-shot stage (or repeat:true with no until): one iteration, no until eval.
const finalSignals = provider(stageId, 0);
const out: SimulateStageOutcome = {
stageId,
iterations: 1,
outcome: 'single',
finalSignals,
};
return out;
}
let parsed;
try {
parsed = parseUntil(untilSource);
} catch (err) {
const finalSignals = provider(stageId, 0);
return {
stageId,
iterations: 1,
outcome: 'unparsable',
finalSignals,
reason: (err as Error).message,
};
}
let lastSignals: UntilSignals = {};
for (let i = 0; i < cap; i++) {
const signals = provider(stageId, i);
lastSignals = signals;
const eval_ = evaluateUntil(parsed, signals);
if (eval_.satisfied) {
const out: SimulateStageOutcome = {
stageId,
iterations: i + 1,
outcome: 'converged',
finalSignals: signals,
};
if (eval_.matched.length > 0) out.matched = eval_.matched;
return out;
}
}
return {
stageId,
iterations: cap,
outcome: 'cap',
finalSignals: lastSignals,
reason: `until expression never satisfied within iterationCap=${cap}`,
};
}
function aggregateOutcome(stages: SimulateStageOutcome[]): SimulatePipelineResult['outcome'] {
if (stages.length === 0) return 'all-single';
if (stages.some((s) => s.outcome === 'unparsable')) return 'unparsable';
if (stages.some((s) => s.outcome === 'cap')) return 'cap-hit';
// Past the two checks above, every stage is either 'single' or 'converged'.
if (stages.every((s) => s.outcome === 'single')) return 'all-single';
if (stages.every((s) => s.outcome === 'converged')) return 'all-converged';
return 'mixed';
}
// Convenience helper: parse a key=value list of signals from CLI
// flags. 'critique.score=4 build.passing=true tests.passing=false'
// → { 'critique.score': 4, 'build.passing': true, 'tests.passing': false }
//
// Respects the closed UntilSignals vocabulary: unknown keys are
// dropped with a warning so a typo doesn't silently make the
// simulator pass.
export function parseSignalKv(args: ReadonlyArray<string>): { signals: UntilSignals; warnings: string[] } {
const out: UntilSignals = {};
const warnings: string[] = [];
const knownNumeric: Array<keyof UntilSignals> = ['critique.score', 'iterations'];
const knownBoolean: Array<keyof UntilSignals> = ['user.confirmed', 'preview.ok', 'build.passing', 'tests.passing'];
for (const arg of args) {
const eq = arg.indexOf('=');
if (eq <= 0) {
warnings.push(`signal must be 'key=value', got '${arg}'`);
continue;
}
const key = arg.slice(0, eq).trim();
const raw = arg.slice(eq + 1).trim();
if (knownNumeric.includes(key as keyof UntilSignals)) {
// Number('') is 0, so reject an empty value explicitly.
const num = raw === '' ? Number.NaN : Number(raw);
if (Number.isFinite(num)) {
(out as Record<string, number>)[key] = num;
} else {
warnings.push(`signal ${key} expected number; got '${raw}'`);
}
continue;
}
if (knownBoolean.includes(key as keyof UntilSignals)) {
if (raw === 'true' || raw === '1') (out as Record<string, boolean>)[key] = true;
else if (raw === 'false' || raw === '0') (out as Record<string, boolean>)[key] = false;
else warnings.push(`signal ${key} expected boolean; got '${raw}'`);
continue;
}
warnings.push(`unknown signal '${key}' (allowed: critique.score / iterations / user.confirmed / preview.ok / build.passing / tests.passing)`);
}
return { signals: out, warnings };
}

@ -0,0 +1,169 @@
// Plan §3.EE1 — simulatePipeline + parseSignalKv pure helpers.
import { describe, expect, it } from 'vitest';
import type { PluginPipeline } from '@open-design/contracts';
import { parseSignalKv, simulatePipeline } from '../src/plugins/simulate.js';
const pipe = (stages: PluginPipeline['stages']): PluginPipeline => ({ stages });
describe('simulatePipeline — single-shot stages', () => {
it("emits outcome='single' for repeat:false stages", () => {
const result = simulatePipeline({
pipeline: pipe([{ id: 'plan', atoms: ['todo-write'] }]),
signals: {},
});
expect(result.outcome).toBe('all-single');
expect(result.stages[0]?.outcome).toBe('single');
expect(result.stages[0]?.iterations).toBe(1);
});
it("emits outcome='single' even when until is set but repeat:false", () => {
const result = simulatePipeline({
pipeline: pipe([{ id: 'plan', atoms: ['todo-write'], until: 'critique.score>=4' }]),
signals: { 'critique.score': 0 },
});
expect(result.stages[0]?.outcome).toBe('single');
});
});
describe('simulatePipeline — repeat: true stages', () => {
it("emits outcome='converged' when the until expression is satisfied", () => {
const result = simulatePipeline({
pipeline: pipe([{
id: 'critique', atoms: ['critique-theater'], repeat: true,
until: 'critique.score>=4',
}]),
signals: { 'critique.score': 5 },
});
expect(result.outcome).toBe('all-converged');
expect(result.stages[0]?.outcome).toBe('converged');
expect(result.stages[0]?.iterations).toBe(1);
expect(result.stages[0]?.matched).toBeDefined();
});
it("emits outcome='cap' when the until never satisfies within iterationCap", () => {
const result = simulatePipeline({
pipeline: pipe([{
id: 'critique', atoms: ['critique-theater'], repeat: true,
until: 'critique.score>=4',
}]),
signals: { 'critique.score': 2 },
iterationCap: 3,
});
expect(result.outcome).toBe('cap-hit');
expect(result.stages[0]?.outcome).toBe('cap');
expect(result.stages[0]?.iterations).toBe(3);
expect(result.stages[0]?.reason).toMatch(/never satisfied/);
});
it("emits outcome='unparsable' when the until expression fails to parse", () => {
const result = simulatePipeline({
pipeline: pipe([{
id: 'critique', atoms: ['critique-theater'], repeat: true,
until: 'this is not a valid expression',
}]),
signals: {},
});
expect(result.outcome).toBe('unparsable');
expect(result.stages[0]?.outcome).toBe('unparsable');
expect(result.stages[0]?.reason).toBeTruthy();
});
it('honours per-iteration signal providers (signals can change as iterations run)', () => {
const result = simulatePipeline({
pipeline: pipe([{
id: 'critique', atoms: ['critique-theater'], repeat: true,
until: 'critique.score>=4',
}]),
// Score grows across iterations; converges on iteration 4 (i=3).
signals: (_stageId, iteration) => ({ 'critique.score': iteration + 1 }),
});
expect(result.stages[0]?.outcome).toBe('converged');
expect(result.stages[0]?.iterations).toBe(4);
});
it('honours boolean compound until (build.passing == true && tests.passing == true)', () => {
const both = simulatePipeline({
pipeline: pipe([{
id: 'verify', atoms: ['patch-edit', 'build-test'], repeat: true,
until: 'build.passing == true && tests.passing == true',
}]),
signals: { 'build.passing': true, 'tests.passing': true },
});
expect(both.stages[0]?.outcome).toBe('converged');
const oneShort = simulatePipeline({
pipeline: pipe([{
id: 'verify', atoms: ['patch-edit', 'build-test'], repeat: true,
until: 'build.passing == true && tests.passing == true',
}]),
signals: { 'build.passing': true, 'tests.passing': false },
iterationCap: 2,
});
expect(oneShort.stages[0]?.outcome).toBe('cap');
});
});
describe('simulatePipeline — aggregate outcome', () => {
it("'mixed' when some stages are single and some are converged", () => {
const result = simulatePipeline({
pipeline: pipe([
{ id: 'plan', atoms: ['todo-write'] },
{ id: 'critique', atoms: ['critique-theater'], repeat: true, until: 'critique.score>=4' },
]),
signals: { 'critique.score': 5 },
});
expect(result.outcome).toBe('mixed');
});
it('totalIterations sums per-stage iterations', () => {
const result = simulatePipeline({
pipeline: pipe([
{ id: 'plan', atoms: ['todo-write'] },
{ id: 'verify', atoms: ['patch-edit', 'build-test'], repeat: true, until: 'build.passing == true && tests.passing == true' },
]),
signals: (_id, i) => i >= 2 ? { 'build.passing': true, 'tests.passing': true } : { 'build.passing': false, 'tests.passing': false },
});
// plan = 1, verify = 3 (converges on i=2 → iteration 3).
expect(result.totalIterations).toBe(4);
});
});
describe('parseSignalKv', () => {
it('parses numbers and booleans into the closed UntilSignals vocabulary', () => {
const r = parseSignalKv([
'critique.score=4',
'iterations=2',
'user.confirmed=true',
'preview.ok=false',
'build.passing=1',
'tests.passing=0',
]);
expect(r.signals).toEqual({
'critique.score': 4,
'iterations': 2,
'user.confirmed': true,
'preview.ok': false,
'build.passing': true,
'tests.passing': false,
});
expect(r.warnings).toEqual([]);
});
it('warns on unknown signal keys (typo guard)', () => {
const r = parseSignalKv(['critic.score=4']);
expect(r.signals).toEqual({});
expect(r.warnings[0]).toMatch(/unknown signal/);
});
it("warns on missing '=' separator", () => {
const r = parseSignalKv(['critique.score4']);
expect(r.warnings[0]).toMatch(/key=value/);
});
it('warns on type mismatch (numeric signal with non-number value)', () => {
const r = parseSignalKv(['critique.score=high']);
expect(r.signals).toEqual({});
expect(r.warnings[0]).toMatch(/expected number/);
});
});