mirror of
https://github.com/nexu-io/open-design.git
synced 2026-06-01 03:14:35 +07:00
* feat(daemon): add Antigravity agent adapter
Adds Google Antigravity (`agy` CLI) as a coding-agent runtime. Detection
picks up `agy` on PATH, the daemon spawns `agy -p "<prompt>"` for a
single non-interactive turn, and the assistant text reply streams back
on stdout. OAuth is shared with the Antigravity IDE through the system
keyring, so users who have signed into the desktop app are authenticated
on first run with no extra step.
`agy` v1.0.3 has no JSON / stream-json / ACP output mode (upstream issue
#119), no `--model` flag (issue #35), and no MCP forwarding hook yet —
the adapter ships with `streamFormat: 'plain'` and a single `default`
fallback model so the model picker doesn't mislead users into thinking
their choice is wired through. We will upgrade buildArgs + add a
dedicated event parser when upstream ships structured output.
Also gitignores `.antigravitycli/`, the project-local config directory
`agy` auto-creates on every run (upstream issue #175).
* fix(daemon): Antigravity adapter — stdin prompt, brand icon, form loop, empty-output guard
- Switch prompt delivery from argv to stdin (`agy -p -`) to avoid the
30KB maxPromptArgBytes limit that blocked real-world composed prompts
- Add official Antigravity brand SVG icon to agent picker
- Fix repeated question-form loop for plain agents by injecting an
OVERRIDE block when form answers are already present in the transcript
- Add empty-output guard for plain agents so expired auth or silent
failures surface a user-visible error instead of a blank "Done" turn
* feat(daemon): expand Antigravity adapter — model picker, form-loop fix, OAuth launcher, log-file classification
PR #3157 follow-up integrating four iterations from end-to-end manual
testing on Gemini 3.5 Flash + GPT-OSS 120B Medium through `agy` v1.0.3.
Each section is independently verifiable; combined they're what made
the first successful artifact generation work end-to-end.
## Model picker via settings.json (agy has no --model flag)
agy v1.0.3 ships no `--model` CLI flag (upstream issue #35), but the
TUI Switch-Model picker writes the chosen label to
`~/.gemini/antigravity-cli/settings.json`'s `"model"` field, and every
`-p` invocation re-reads that file on startup — verified by capturing
the `--log-file` line `Propagating selected model override to backend:
label="<model>"`. Antigravity's `fallbackModels` now lists the 8
labels its TUI exposes (Gemini 3.1 Pro / 3.5 Flash variants, Claude
Sonnet/Opus 4.6 Thinking, GPT-OSS 120B Medium) and `buildArgs`
persists the user's choice to settings.json right before spawn. The
synthetic `default` id is preserved — picking it leaves settings.json
untouched so a user who switches models from agy's own TUI keeps
their choice.
Introduces `RuntimeAgentDef.supportsCustomModel?: boolean`. AMR's
hardcoded blocklist in `SettingsDialog.tsx` migrates to the
declarative flag (it rejects free-form ids at the ACP layer), and
antigravity opts out because its label set is a server-side enum that
silently fails on unrecognised strings.
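A minimal sketch of the settings.json routing described above, written as a pure function so file I/O stays out of the picture. `applyModelSelection` and `DEFAULT_MODEL_ID` are hypothetical names for illustration; only the `"model"` field and the synthetic `default` id come from this commit.

```typescript
// Hypothetical helper: returns the new settings.json text, or null when the
// file should be left untouched (the synthetic `default` id).
const DEFAULT_MODEL_ID = "default";

function applyModelSelection(currentJson: string, modelId: string): string | null {
  // Picking `default` keeps whatever the user chose in agy's own TUI.
  if (modelId === DEFAULT_MODEL_ID) return null;

  let settings: Record<string, unknown> = {};
  try {
    const parsed: unknown = JSON.parse(currentJson);
    if (parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)) {
      settings = parsed as Record<string, unknown>;
    }
  } catch {
    // Assumption: a missing or corrupt file is replaced rather than treated as fatal.
  }

  // Preserve every other key; only the "model" label changes.
  return JSON.stringify({ ...settings, model: modelId }, null, 2);
}
```

A caller would write the returned text to the settings path right before spawn and skip the write entirely on `null`.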
## Form-loop fix (transcript sanitizer + stronger OVERRIDE)
The discovery form loop on weak/medium plain-stream models (GPT-OSS
120B Medium, Gemini 3.5 Flash) had two reinforcing causes:
1. `buildDaemonTranscript` packed the prior assistant turn's
literal `<question-form>` markup into the user request on the
next turn, giving the model a template to echo. New
`sanitizePriorAssistantTurnForTranscript` strips
`<question-form>...</question-form>` blocks and ```json fences
that match form-schema shape, replacing them with a brief
placeholder. User content is preserved verbatim (a user who
legitimately mentions `<question-form>` in chat keeps their
message intact).
2. The OVERRIDE block on form-answered turns was 4 lines and only
banned the bare `<question-form>` tag — models still emitted the
fenced JSON, form-asking prose ("Got it — tell me the following"),
and fake system events ("subagents stopped"). The new
`FORM_ANSWERED_SYSTEM_OVERRIDE` enumerates each anti-pattern and
pins them via tests, so silently weakening any line reintroduces
the regression.
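The first cause can be illustrated with a small sketch of the tag-stripping step. This is not the daemon's `sanitizePriorAssistantTurnForTranscript`; `stripQuestionForms` and the placeholder text are assumptions, and the ```json fence handling is left out.

```typescript
// Placeholder wording is an assumption; the commit only says "a brief placeholder".
const FORM_PLACEHOLDER = "[question form omitted]";

// Intended for prior ASSISTANT turns only; user content is never passed through here.
function stripQuestionForms(assistantTurn: string): string {
  // Non-greedy body match so two forms in one turn are replaced separately.
  const formBlock = /<question-form\b[^>]*>[\s\S]*?<\/question-form>/g;
  return assistantTurn.replace(formBlock, FORM_PLACEHOLDER);
}
```

Keeping the function scoped to assistant turns is what lets a user message that mentions `<question-form>` survive verbatim.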
Also adds `RuntimeAgentDef.resumesSessionViaCli` and
`RuntimeContext.hasPriorAssistantTurn` as forward-looking abstractions
(backed by the `skipTranscript` option on
`composeChatUserRequestForAgent`). Antigravity does NOT opt in: agy's
`-c` resume activates an internal agentic loop with tool retries and
fallback-to-cached-response on tool errors that the OD system prompt
cannot steer. The opt-in was reverted after it produced byte-identical
form re-emissions caused by agy's own retry logic, not by OD's
transcript.
## One-click OAuth via system terminal
agy print mode can't complete Google Sign-In on its own (the OAuth
callback page asks the user to paste an auth code back into agy, but
`-p` has no input field). Before this commit the auth banner only
told the user to "open a terminal yourself."
Adds `POST /api/agents/antigravity/oauth-launch` and a cross-platform
launcher in `runtimes/terminal-launch.ts`:
- macOS: osascript → Terminal.app `do script "agy"` + activate
- Linux: tries x-terminal-emulator, gnome-terminal, konsole,
xfce4-terminal, xterm in order
- Windows: `cmd /c start "Open Design" cmd /k agy`
The endpoint hardcodes the `agy` command (no user input → no shell
injection surface) and is loopback-gated like the other daemon
endpoints. The chat's `AGENT_AUTH_REQUIRED` banner now renders a
"Sign in via terminal" button next to Retry; clicking it spawns the
terminal so the user can finish OAuth in one click.
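The per-platform dispatch can be sketched as a pure planning function. `terminalLaunchCandidates` and `LaunchCommand` are hypothetical names, and the per-emulator flags on Linux (`-e`, `--`) are assumptions; the commit only lists which terminals are tried and in what order.

```typescript
type LaunchCommand = { command: string; args: string[] };

// The command is a constant, never user input, mirroring the endpoint's design.
const AGENT_COMMAND = "agy";

function terminalLaunchCandidates(platform: string): LaunchCommand[] {
  switch (platform) {
    case "darwin":
      return [
        {
          command: "osascript",
          args: [
            "-e", `tell application "Terminal" to do script "${AGENT_COMMAND}"`,
            "-e", 'tell application "Terminal" to activate',
          ],
        },
      ];
    case "linux":
      // Tried in order; the flags per emulator are assumptions.
      return [
        { command: "x-terminal-emulator", args: ["-e", AGENT_COMMAND] },
        { command: "gnome-terminal", args: ["--", AGENT_COMMAND] },
        { command: "konsole", args: ["-e", AGENT_COMMAND] },
        { command: "xfce4-terminal", args: ["-e", AGENT_COMMAND] },
        { command: "xterm", args: ["-e", AGENT_COMMAND] },
      ];
    case "win32":
      return [
        { command: "cmd", args: ["/c", "start", "Open Design", "cmd", "/k", AGENT_COMMAND] },
      ];
    default:
      return []; // unsupported platform
  }
}
```

Separating the plan from the spawn keeps the mapping testable without opening a real terminal.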
## Silent-failure classification (auth vs quota via --log-file)
agy print mode is silent on stdout/stderr for both missing-OAuth AND
quota-exhausted failures — the upstream
`RESOURCE_EXHAUSTED (code 429): Individual quota reached` and the
`not logged into Antigravity` line only surface in agy's
`--log-file`. Without log inspection the daemon misread quota as
"auth required" and showed the wrong banner.
`RuntimeContext.agentLogFilePath` carries a daemon-owned per-run temp
path that antigravity's buildArgs translates to `--log-file <path>`.
The empty-output guard now reads that log on a `code === 0 &&
!childStdoutSeen` exit, feeds the tail to
`classifyAgentServiceFailure`, and routes:
- "not logged into Antigravity" → AGENT_AUTH_REQUIRED with
  antigravityAuthGuidance
- "RESOURCE_EXHAUSTED" / "quota" / "Individual quota reached" →
  RATE_LIMITED with antigravityQuotaGuidance
- none of the above (rare) → fall back to auth guidance as the most
  likely cause
Both failure banners surface a terminal launcher: auth gets "Sign
in via terminal", quota gets "Switch model in terminal"; same
endpoint, contextual label. The handler is identical (open agy in a
terminal); the user either signs in or uses agy's Switch Model
picker to pick a model with available quota.
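A minimal sketch of the routing above, assuming the log tail has already been read into a string. `classifySilentFailure` is a hypothetical stand-in, not the signature of `classifyAgentServiceFailure`, and the case-insensitive quota match is an assumption.

```typescript
type FailureCode = "AGENT_AUTH_REQUIRED" | "RATE_LIMITED";

function classifySilentFailure(logTail: string): FailureCode {
  // Explicit auth marker from agy's --log-file.
  if (logTail.includes("not logged into Antigravity")) return "AGENT_AUTH_REQUIRED";

  // Quota markers named in this commit; matching case-insensitively is an assumption.
  const quotaMarkers = ["RESOURCE_EXHAUSTED", "Individual quota reached", "quota"];
  const lower = logTail.toLowerCase();
  if (quotaMarkers.some((marker) => lower.includes(marker.toLowerCase()))) {
    return "RATE_LIMITED";
  }

  // Nothing recognised: fall back to auth as the most likely cause.
  return "AGENT_AUTH_REQUIRED";
}
```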
## Validation
- `pnpm guard` pass
- `pnpm --filter @open-design/daemon` runtime + telemetry suites:
192 passed, 1 skipped (the 1 pre-existing `task-type` failure on
origin/main is unrelated to this change)
- `pnpm --filter @open-design/web` typecheck pass; sse / amr-guidance
/ AgentIcon suites pass (51 web tests)
- Manual end-to-end on darwin + Gemini 3.5 Flash and GPT-OSS 120B
Medium: turn-1 question-form rendered correctly, turn-2 produced
`<artifact>` with full HTML (3.3KB Modern Minimal design) instead
of re-emitting the form. agy `--log-file` content correctly
classified as RATE_LIMITED when Gemini Pro quota was exhausted,
and as AGENT_AUTH_REQUIRED when keychain was cleared.
* fix(web/test): align amrAgent fixture with supportsCustomModel contract
The AMR agent definition in the daemon ships `supportsCustomModel: false`
so the Settings model picker hides the free-text "Custom…" option. The
PR changed `allowCustomModel` from `selected.id !== 'amr'` (hardcoded)
to `selected.supportsCustomModel !== false` (declarative), but the test
fixture was not updated to carry the same field — causing the
`__custom__` sentinel to appear in the picker under test.
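The contract the fixture has to satisfy is small enough to sketch. `AgentDef` and the function wrapper are hypothetical; the `supportsCustomModel !== false` expression is the one quoted above.

```typescript
// Hypothetical minimal shape; the real agent definition carries more fields.
interface AgentDef {
  id: string;
  supportsCustomModel?: boolean;
}

// Declarative check: only an explicit `false` hides the free-text option.
function allowCustomModel(selected: AgentDef): boolean {
  return selected.supportsCustomModel !== false;
}
```

A fixture that omits the field therefore behaves like an agent that allows custom models, which is why the AMR fixture has to set it to `false` explicitly.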
Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)
* fix(daemon): align formAnswerTransition wording with main + scope build directive to discovery
CI surfaced two failures on the merge with main:
- chat-route.test marks submitted discovery form answers ... expected
the main-version wording 'Do not emit another <formId> form.'
- telemetry-message-finalization keeps non-discovery form answers
active ... expected task-type to fall through the else branch
('Treat these form answers as the active user turn'), not the
discovery RULE 2/RULE 3 build branch.
The colleague's earlier fba1e40b form-loop fix tightened both pieces
(stronger wording + grouped discovery|task-type into the build branch)
but didn't update the tests that pin the contract. Revert the
transition wording to main and re-scope the build directive to
'discovery' only. The aggressive form-loop suppression we added in
this PR now lives in the system-prompt FORM_ANSWERED_SYSTEM_OVERRIDE
block, which is far stronger than the user-request transition text
this commit reverts.
* fix(daemon): scope formOverride by form id, detach Linux terminal, move agy log cleanup to finally
- FORM_ANSWERED_GENERIC_OVERRIDE: new exported constant for non-discovery/
non-task-type form ids; contains only the "do not re-ask" suppression
without the RULE 2 / RULE 3 / artifact directive.
- formAnswerTransitionForCurrentPrompt: extend build-transition branch to
include task-type alongside discovery, keeping user-turn and system
override consistent.
- Prompt assembly (server.ts ~10848): derive formOverride from the parsed
form id — FORM_ANSWERED_SYSTEM_OVERRIDE for discovery/task-type,
FORM_ANSWERED_GENERIC_OVERRIDE for all other form ids, empty otherwise.
- launchOnLinux: replace execFileAsync (waited for terminal exit, 3 s cap)
with spawn({ detached: true, stdio: 'ignore' }) + unref(); resolve on
the 'spawn' event so long-lived interactive terminals (xterm, konsole)
are not killed mid-OAuth-flow.
- Antigravity log cleanup: move fs.promises.unlink(agentLogFilePath) into
a try/finally wrapper around the close handler so every exit path
(success, failure, cancel, non-zero exit) cleans up the per-run temp
file, preventing unbounded /tmp accumulation.
- Tests: rename task-type case to assert build-transition behaviour; add
generic-form-id case (preferences) pinning the non-build path; add
FORM_ANSWERED_GENERIC_OVERRIDE content assertions.
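The form-id scoping in the third bullet can be sketched as follows. `parseFormId` and `selectFormOverride` are hypothetical names, the header regex is an assumption based on the `[form answers - <id>]` headers used in the tests, and the two constants are stand-ins for the real override text.

```typescript
const SYSTEM_OVERRIDE = "<stand-in for FORM_ANSWERED_SYSTEM_OVERRIDE>";
const GENERIC_OVERRIDE = "<stand-in for FORM_ANSWERED_GENERIC_OVERRIDE>";

// Accepts a plain hyphen or U+2014 between "form answers" and the id.
function parseFormId(currentPrompt: string): string | null {
  const match = /^\[form answers\s+[-\u2014]\s+([^\]]+)\]/.exec(currentPrompt);
  return match ? match[1].trim() : null;
}

function selectFormOverride(currentPrompt: string): string {
  const formId = parseFormId(currentPrompt);
  if (formId === null) return ""; // not a form-answer turn: no override at all
  return formId === "discovery" || formId === "task-type"
    ? SYSTEM_OVERRIDE // build-transition forms get the RULE 2 / RULE 3 directive
    : GENERIC_OVERRIDE; // every other form id only gets the "do not re-ask" text
}
```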
Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)
* fix(daemon): switch Antigravity buildArgs to chat subcommand invocation
Replace top-level `-p -` with `agy chat [--log-file …] -` so the adapter
uses the documented chat subcommand and stdin sentinel instead of the
unrecognised global -p flag. Update the agent-args test description and
all four deepEqual assertions to assert the ['chat', '-'] shape.
Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)
* test(daemon): drop real-platform default-launch assertion from terminal-launch suite
The removed test called launchAgentInSystemTerminal('agy') with no
platform override, which invokes the real system terminal on every
developer machine running the daemon test suite (Terminal.app on macOS,
cmd.exe on Windows, xterm/gnome-terminal on Linux). That is an
unacceptable OS side effect for a unit test.
The behaviour being asserted — that omitting platform selects
process.platform — is a TypeScript default-parameter guarantee, not a
runtime invariant that needs an integration test. The remaining 'aix'
case continues to pin the unsupported-platform failure shape.
Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)
* fix(daemon): buffer Antigravity stdout to suppress auth URL before close-time classifier
The plain-stream close handler at code===0 can detect an agy OAuth
prompt in agentStdoutTail and emit AGENT_AUTH_REQUIRED, but by the
time close fires the stdout chunk has already been forwarded to the
client via the plain-stream `send('stdout', { chunk })` path. This
leaves both the raw OAuth URL and the terminal-launch guidance visible
in chat.
Buffer all stdout chunks for the `antigravity` agent instead of
forwarding them immediately. The existing close-time auth-prompt guard
(code===0, !trackingSubstantiveOutput, childStdoutSeen) returns early
when it detects the auth pattern, leaving the buffer unflushed and the
OAuth URL out of the SSE stream. For legitimate assistant output the
buffer is flushed in order just before design.runs.finish so the
chunks still arrive before the run's finished event.
Adds a chat-route integration test using a fake `agy` that exits 0
after printing the canonical auth prompt; asserts that the run emits
AGENT_AUTH_REQUIRED with no event: stdout delta containing the URL.
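The buffer-then-decide flow can be sketched as a single close-time function. `closeBufferedRun` is a hypothetical name, and the auth-prompt detector is passed in as a parameter because the actual pattern is not shown in this commit.

```typescript
type CloseOutcome = "AGENT_AUTH_REQUIRED" | "finished";

function closeBufferedRun(
  bufferedChunks: readonly string[],
  isAuthPrompt: (stdout: string) => boolean,
  send: (chunk: string) => void,
): CloseOutcome {
  if (isAuthPrompt(bufferedChunks.join(""))) {
    // Early return: the buffer is never flushed, so the OAuth URL never
    // reaches the SSE stream.
    return "AGENT_AUTH_REQUIRED";
  }
  // Legitimate output: forward chunks in arrival order before finishing the run.
  for (const chunk of bufferedChunks) send(chunk);
  return "finished";
}
```

The trade-off is latency: nothing streams to the client until the child exits, which is the price of deciding at close time.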
Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)
* test(daemon): isolate antigravity buildArgs argv test from real settings file
Pass a temp antigravitySettingsPath in the RuntimeContext for the
withModel argv assertion so unit tests do not touch
~/.gemini/antigravity-cli/settings.json. Adds the optional
antigravitySettingsPath field to RuntimeContext and threads it
through buildArgs to writeAntigravityModelSelection; production
callers leave it undefined, preserving the existing default path.
Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)
* fix(daemon): revert Antigravity buildArgs to `-p -` (the only working agy v1.0.3 invocation)
The looper-reviewer-bot reported `chat` as agy's headless subcommand
based on its environment's agy build, and looper-fixer applied that
shape. The installed CLI (`agy --version` reports `1.0.3`) does NOT
expose a `chat` subcommand — `agy --help`'s `Available subcommands`
section lists only `changelog / help / install / plugin / update`,
and `agy chat - < prompt` exits 0 with empty stdout (the daemon then
forwards it as a 'successful' empty reply, exactly the failure mode
the auth/quota guard at server.ts ~12090 is meant to catch — for the
wrong reason).
`-p` is the documented print-mode flag (`Short alias for --print`),
and `agy -p -` reads the prompt from stdin and prints the model
reply. The entire end-to-end test sequence in this PR ran against
this invocation (form-loop fix, settings.json model routing, and
log-file classification were all confirmed working on Gemini 3.5
Flash + GPT-OSS 120B Medium).
Updates the agent-args test to pin `['-p', '-']` instead of
`['chat', '-']` and adds an inline comment in antigravity.ts noting
that `chat` may exist in a future agy build but is not the contract
on the installed CLI today.
* fix(daemon): serialize Antigravity concrete-model spawns to dodge settings.json race
Reviewer (looper) flagged a concurrency race in the model-routing path:
~/.gemini/antigravity-cli/settings.json is process-global, so two OD
runs starting close together with different concrete models can race
the file — run A writes model A, run B writes model B, then A's agy
finally reads settings.json and executes on model B. The Settings
model picker becomes nondeterministic under parallel conversations.
Adds a per-process promise chain in antigravity.ts:
- acquireAntigravityModelLock(): chain-await + return release fn
- waitForAgyToReadModel(logPath, expected): polls agy's --log-file
for the upstream signal
'Propagating selected model override to backend: label="<X>"'
which model_config_manager.go emits once agy has finished reading
settings.json. Returns true on observed match, false on timeout.
Regex-escapes the expected label so '(' / ')' in 'GPT-OSS 120B
(Medium)' match literally, not as a capture group.
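The label matching can be sketched as below. `escapeRegExp` and `modelOverridePattern` are hypothetical helper names; the log line text is the one quoted in the bullet above.

```typescript
// Escape every regex metacharacter so the label is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function modelOverridePattern(label: string): RegExp {
  // The closing quote after the label keeps a shared-prefix label from matching.
  return new RegExp(
    `Propagating selected model override to backend: label="${escapeRegExp(label)}"`,
  );
}
```

Without the escape, the parentheses in a label such as 'GPT-OSS 120B (Medium)' would form a capture group and the pattern would look for 'GPT-OSS 120B Medium' instead.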
server.ts spawn pipeline now acquires the lock BEFORE buildArgs (which
performs the settings.json write) and schedules a release-once handler
that fires when EITHER (a) the log-file confirms agy read the model
or (b) the child exits — the exit fallback prevents a stuck/crashed
agy from starving the queue for every subsequent antigravity spawn.
Default-model spawns bypass the lock entirely: their buildArgs doesn't
touch settings.json, so there's nothing to serialize.
Tests pin:
- FIFO ordering across 2 / 3 concurrent acquirers
- Wait helper's regex correctly matches parenthesized labels
- Wait helper does NOT match a different model with shared prefix
- Wait helper swallows missing-log-file errors and returns false on
timeout (no spawn-pipeline crash if the log never appears)
194 → 198 passing runtime tests, 0 regressions.
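A per-process promise chain of this kind can be sketched in a few lines. This is an illustration of the technique, not the code in antigravity.ts; `acquireLock` and `Release` are hypothetical names.

```typescript
type Release = () => void;

// Tail of the chain: resolves once every earlier holder has released.
let chainTail: Promise<void> = Promise.resolve();

function acquireLock(): Promise<Release> {
  let unlock: () => void = () => {};
  const held = new Promise<void>((resolve) => {
    unlock = () => resolve();
  });

  const previous = chainTail;
  // Later acquirers wait for everyone before us AND for our own release.
  chainTail = previous.then(() => held);

  return previous.then(() => {
    let released = false;
    // Release-once: a second call (e.g. log confirmation plus exit handler) is a no-op.
    return () => {
      if (!released) {
        released = true;
        unlock();
      }
    };
  });
}
```

Because each acquirer chains onto the previous tail, waiters are served in call order, which is the FIFO property the tests pin.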
* fix(daemon): close Antigravity lock release race on slow agy startup (looper #263fd2fe7)
Reviewer flagged that the previous serialization scheduled
`releaseOnce` in `.finally()` on waitForAgyToReadModel — meaning the
helper's `false` timeout return ALSO released the lock. If agy took
longer than the 15s polling window to read settings.json (cold start,
swap-thrash, slow network handshake to the upstream backend), run A's
lock dropped at 15s, run B rewrote settings.json with model B, and
run A's still-starting agy then read the wrong model. Same race the
original mutex was meant to close.
Fix the release semantics to be release-on-confirmation-only:
- waitForAgyToReadModel: `false` now strictly means 'I gave up
polling,' not 'agy definitely did not read this.' Document the
contract so a future caller can't conflate the two. Add an
optional AbortSignal so server.ts can stop polling when the child
exits — without it, the leftover watcher could outlive the run
and accidentally match a later concurrent run's log content,
releasing the wrong lock.
- server.ts: schedule `releaseOnce` only when waitForAgyToReadModel
returns true. The exit handler (which fires for crashes, fast
exits, normal completion) is now the canonical fallback that
releases the lock no matter what — the queue can't starve
permanently because agy always exits eventually. The exit
handler also fires the AbortController so the watcher cleans up.
New tests pin:
- timeout returns false WITHOUT any release-implying side effect
- already-aborted signal short-circuits (no readFile calls)
- abort mid-poll wakes the helper from its setTimeout (no
multi-hundred-ms hang waiting out a poll interval that no longer
matters)
198 → 201 passing runtime tests, 0 regressions.
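The abort-aware polling contract can be sketched generically. `pollUntil` and `abortableSleep` are hypothetical names, and the check is injected so the sketch does not depend on reading a real log file.

```typescript
// Resolves after `ms`, or immediately when the signal aborts.
function abortableSleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise((resolve) => {
    if (signal?.aborted) {
      resolve();
      return;
    }
    const onAbort = () => {
      clearTimeout(timer);
      resolve();
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}

// `false` only means "gave up polling" (timeout or abort), never "the
// condition is known to be false". Callers must not release on `false`.
async function pollUntil(
  check: () => Promise<boolean>,
  opts: { timeoutMs: number; intervalMs: number; signal?: AbortSignal },
): Promise<boolean> {
  const deadline = Date.now() + opts.timeoutMs;
  while (Date.now() < deadline) {
    if (opts.signal?.aborted) return false; // short-circuits before any check
    try {
      if (await check()) return true;
    } catch {
      // e.g. the log file does not exist yet: keep polling
    }
    await abortableSleep(opts.intervalMs, opts.signal);
  }
  return false;
}
```

In this shape the caller releases the lock only on `true`, and the child's exit handler both aborts the signal and releases unconditionally.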
---------
Co-authored-by: qiongyu1999 <2694684348@qq.com>
266 lines
11 KiB
TypeScript
import { describe, expect, it, vi } from 'vitest';

import {
  FORM_ANSWERED_GENERIC_OVERRIDE,
  composeChatUserRequestForAgent,
  createFinalizedMessageTelemetryReporter,
  shouldReportRunCompletedFromMessage,
  telemetryPromptFromRunRequest,
} from '../src/server.js';

describe('Langfuse message finalization gate', () => {
  const terminalMessage = {
    id: 'assistant-1',
    role: 'assistant',
    content: 'final answer',
    runId: 'run-1',
    runStatus: 'succeeded',
  };

  it('does not report when only terminal runStatus has been persisted', () => {
    expect(
      shouldReportRunCompletedFromMessage(terminalMessage, {
        ...terminalMessage,
      }),
    ).toBe(false);
  });

  it('reports only on the final telemetry-marked message write', () => {
    expect(
      shouldReportRunCompletedFromMessage(terminalMessage, {
        ...terminalMessage,
        producedFiles: [],
        telemetryFinalized: true,
      }),
    ).toBe(true);
  });

  it('ignores non-terminal run statuses even if marked finalized', () => {
    expect(
      shouldReportRunCompletedFromMessage(
        { ...terminalMessage, runStatus: 'running' },
        { telemetryFinalized: true },
      ),
    ).toBe(false);
  });

  it('uses the explicit current prompt for telemetry instead of the full transcript', () => {
    expect(
      telemetryPromptFromRunRequest(
        '## user\npre-consent brief\n\n## assistant\ndraft\n\n## user\npost-consent revision',
        'post-consent revision',
      ),
    ).toBe('post-consent revision');
  });

  it('falls back to the legacy message when currentPrompt is absent', () => {
    expect(telemetryPromptFromRunRequest('legacy prompt', undefined)).toBe(
      'legacy prompt',
    );
  });

  it('promotes discovery form answers above the transcript with a build-now instruction', () => {
    const currentPrompt = [
      '[form answers \u2014 discovery]',
      '- output: Dashboard / tool UI',
      '- brand: Pick a direction for me [value: pick_direction]',
    ].join('\n');
    const prompt = composeChatUserRequestForAgent(
      '## user\ninitial brief\n\n## assistant\n<form/>',
      currentPrompt,
    );

    expect(prompt).toContain('## Latest user turn - form answers submitted');
    expect(prompt).toContain(currentPrompt);
    expect(prompt).toContain('The user has answered the discovery form.');
    expect(prompt).toContain('For Branch B answers, build now instead of asking another brief.');
    expect(prompt.indexOf('## Full conversation transcript')).toBeGreaterThan(
      prompt.indexOf(currentPrompt),
    );
  });

  it('task-type form answers trigger the build transition just like discovery', () => {
    const prompt = composeChatUserRequestForAgent(
      '## user\ninitial brief',
      '[form answers - task-type]\n- taskType: Slide deck',
    );

    expect(prompt).toContain('The user has answered the task-type form.');
    expect(prompt).toContain('build now instead of asking another brief');
    expect(prompt).not.toContain('Treat these form answers as the active user turn');
  });

  it('unknown form ids get the generic transition without forcing the build', () => {
    const prompt = composeChatUserRequestForAgent(
      '## user\ninitial brief',
      '[form answers - preferences]\n- theme: dark',
    );

    expect(prompt).toContain('The user has answered the preferences form.');
    expect(prompt).toContain('Treat these form answers as the active user turn');
    expect(prompt).not.toContain('build now instead of asking another brief');
  });

  // `agy -c` carries its own conversation memory, so packing the
  // rendered web transcript (the `## user` / `## assistant` blocks)
  // into the user request duplicates context the upstream CLI already
  // has — AND the embedded copy includes the literal `<question-form>`
  // markup the agent emitted on turn 1, which the model then re-emits
  // on turn 2, looking like the discovery form loop never breaks.
  // With `skipTranscript: true`, only the latest user turn ships and
  // the misleading "## Full conversation transcript" header is dropped.
  it('drops the transcript and transcript header when skipTranscript is true', () => {
    const currentPrompt = [
      '[form answers — discovery]',
      '- output: Dashboard / tool UI',
      '- brand: Pick a direction for me [value: pick_direction]',
    ].join('\n');
    const transcript = [
      '## user',
      '初始需求',
      '',
      '## assistant',
      '<question-form id="discovery">…</question-form>',
      '',
      '## user',
      currentPrompt,
    ].join('\n');

    const prompt = composeChatUserRequestForAgent(transcript, currentPrompt, {
      skipTranscript: true,
    });

    // The form-answer transition still fires — that drives RULE 2 / 3.
    expect(prompt).toContain('The user has answered the discovery form.');
    // The latest user turn is preserved verbatim.
    expect(prompt).toContain(currentPrompt);
    // The transcript header is dropped — it was misleading because the
    // body underneath is no longer a transcript.
    expect(prompt).not.toContain('## Full conversation transcript');
    // The prior assistant turn's `<question-form>` markup must NOT
    // leak in — that's the form-loop regression we're guarding.
    // (The transition block legitimately mentions "<question-form>"
    // in prose, so the assertion targets the opening tag the prior
    // turn carried, not the bare substring.)
    expect(prompt).not.toContain('<question-form id="discovery">');
    expect(prompt).not.toContain('## assistant');
  });

  // The aggressive form-answered OVERRIDE block is what tells weak
  // plain agents (GPT-OSS-120B Medium, Gemini 3.5 Flash) to skip
  // RULE 1's form example on follow-up turns. We pin the trigger
  // condition AND the specific anti-patterns the literal carries,
  // because silently weakening any of them — e.g. dropping the
  // markdown-fence ban or the "subagents stopped" hallucination ban —
  // reintroduces the form-echo regression we hit in PR #3157 on GPT-OSS.
  it('FORM_ANSWERED_SYSTEM_OVERRIDE pins the anti-patterns weak plain agents need spelled out', async () => {
    const { FORM_ANSWERED_SYSTEM_OVERRIDE } = await import('../src/server.js');

    // Headline must call out that this is a follow-up turn, not turn 1.
    expect(FORM_ANSWERED_SYSTEM_OVERRIDE).toContain('## OVERRIDE — form already answered');
    expect(FORM_ANSWERED_SYSTEM_OVERRIDE).toContain('turn 2 or later');
    // RULE 1 stays in the prompt so turn 1 can still emit a valid form;
    // OVERRIDE just demotes it to documentation for follow-up turns.
    expect(FORM_ANSWERED_SYSTEM_OVERRIDE).toContain('Treat RULE 1\nas read-only documentation');

    // Forbidden anti-patterns observed in real captures:
    expect(FORM_ANSWERED_SYSTEM_OVERRIDE).toContain('`<question-form>` tag of any id');
    expect(FORM_ANSWERED_SYSTEM_OVERRIDE).toContain('```json fenced block');
    expect(FORM_ANSWERED_SYSTEM_OVERRIDE).toContain('Form-asking prose');
    expect(FORM_ANSWERED_SYSTEM_OVERRIDE).toContain('"subagents stopped"');

    // Required path: route to RULE 2 / RULE 3 so the model still
    // emits the `<artifact>` block on the same turn.
    expect(FORM_ANSWERED_SYSTEM_OVERRIDE).toContain('RULE 2');
    expect(FORM_ANSWERED_SYSTEM_OVERRIDE).toContain('RULE 3');
    expect(FORM_ANSWERED_SYSTEM_OVERRIDE).toContain('`<artifact>`');
  });

  it('FORM_ANSWERED_GENERIC_OVERRIDE is used for non-discovery/task-type form ids', () => {
    // Non-build-transition forms should get a smaller override that only
    // suppresses re-asking — not the RULE 2 / RULE 3 / artifact directive.
    expect(FORM_ANSWERED_GENERIC_OVERRIDE).toContain('## OVERRIDE — form already answered');
    expect(FORM_ANSWERED_GENERIC_OVERRIDE).toContain('turn 2 or later');
    expect(FORM_ANSWERED_GENERIC_OVERRIDE).toContain('Do not ask the same form again');
    // Must NOT contain the artifact-build directive that only applies to
    // discovery / task-type — sending it for an unrelated form id would give
    // the model contradictory instructions.
    expect(FORM_ANSWERED_GENERIC_OVERRIDE).not.toContain('RULE 2');
    expect(FORM_ANSWERED_GENERIC_OVERRIDE).not.toContain('RULE 3');
    expect(FORM_ANSWERED_GENERIC_OVERRIDE).not.toContain('`<artifact>`');
  });

  it('FORM_ANSWERED_SYSTEM_OVERRIDE only fires through composeChatUserRequestForAgent\'s transition gate', async () => {
    // Defense-in-depth check: a turn that is NOT a form-answer follow-up
    // (no `[form answers — …]` header in `currentPrompt`) must not
    // surface any of the OVERRIDE language, even when `message` carries
    // a transcript that mentions question-form. Otherwise we'd suppress
    // the legitimate turn-1 form ask.
    const transcript = '## user\n初始需求\n\n## assistant\n<question-form id="discovery">...</question-form>';
    const currentPrompt = '继续做点修改';

    const prompt = composeChatUserRequestForAgent(transcript, currentPrompt);
    expect(prompt).not.toContain('OVERRIDE — form already answered');
    expect(prompt).not.toContain('Treat RULE 1');
  });

  it('also drops the transcript on a non-form turn when skipTranscript is true', () => {
    // Without a form-answer transition, the function previously returned
    // `message` verbatim. With skipTranscript the body must come from
    // `currentPrompt` instead so a follow-up `agy -c` turn doesn't carry
    // the duplicate transcript.
    const transcript = '## user\n第一轮\n\n## assistant\n回答\n\n## user\n第二轮 follow-up';
    const currentPrompt = '第二轮 follow-up';

    const skipped = composeChatUserRequestForAgent(transcript, currentPrompt, {
      skipTranscript: true,
    });
    expect(skipped).toBe(currentPrompt);

    // Default behavior unchanged (backward compatibility for every
    // adapter that doesn't set resumesSessionViaCli).
    const kept = composeChatUserRequestForAgent(transcript, currentPrompt);
    expect(kept).toBe(transcript);
  });

  it('invokes Langfuse reporting once when the final message write is marked', () => {
    const run = {
      id: 'run-1',
      projectId: 'project-1',
      conversationId: 'conv-1',
      assistantMessageId: 'assistant-1',
      status: 'succeeded',
      createdAt: 1,
      updatedAt: 2,
      events: [],
    };
    const report = vi.fn();
    const reporter = createFinalizedMessageTelemetryReporter({
      design: { runs: { get: vi.fn(() => run) } },
      db: 'db',
      dataDir: '/tmp/od-data',
      reportedRuns: new Set<string>(),
      getAppVersion: () => ({ version: '0.7.0', channel: 'beta', packaged: true }),
      report,
    });

    reporter(
      { ...terminalMessage, endedAt: 1234 },
      { telemetryFinalized: true },
    );
    reporter(
      { ...terminalMessage, endedAt: 1234 },
      { telemetryFinalized: true },
    );

    expect(report).toHaveBeenCalledTimes(1);
    expect(report).toHaveBeenCalledWith({
      db: 'db',
      dataDir: '/tmp/od-data',
      run,
      persistedRunStatus: 'succeeded',
      persistedEndedAt: 1234,
      appVersion: { version: '0.7.0', channel: 'beta', packaged: true },
    });
  });
});