mirror of
https://github.com/nexu-io/open-design.git
synced 2026-06-01 03:14:35 +07:00
* fix(analytics): bucket feedback agent/model directly on the event
Reason × agent / reason × model splits on
`assistant_feedback_reason_submit` were 25-74% `unknown` because the
event only carried `run_id`, so analyses had to join back to
`run_created/run_finished`. That join loses rows whenever the feedback
targets a message whose run sits outside the query window (the common
case for feedback on older messages). Even when the join succeeds, the
run's `model_id` is `null` whenever the user did not pick a specific
model and went with the agent's default.
Carry `agent_provider_id` and `model_id` directly on every feedback
event so the analyses no longer need to join. Replace `null/unknown`
with the `default` bucket via `modelIdForTracking` (and let
`agentIdToTracking` fall through to `other`) at every emit site —
`null` was an analyst-hostile mix of "no selection" and "join failed";
`default` is a real, analysable bucket. On `run_finished`, upgrade the
model to the agent-reported value from initializing/model status
events when the user did not pick one — covers ACP, claude-stream,
copilot-stream, json-event-stream, qoder, pi-rpc.
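As a rough illustration of the bucketing described above, the sketch below collapses `null`, `undefined`, and empty model ids into the `default` bucket. The real `modelIdForTracking` is not shown in this commit, so the name `modelIdForTrackingSketch`, its signature, and the whitespace trimming are assumptions.

```typescript
// Hypothetical sketch; the real modelIdForTracking may behave differently.
function modelIdForTrackingSketch(modelId: string | null | undefined): string {
  // Treat null, undefined, and blank strings as "no explicit selection".
  const trimmed = (modelId ?? '').trim();
  return trimmed.length > 0 ? trimmed : 'default';
}

console.log(modelIdForTrackingSketch(null));
console.log(modelIdForTrackingSketch('claude-opus-4'));
```

Returning a plain `string` rather than `string | null` is what lets every emit site treat the result as an analysable bucket.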
* fix(analytics): use feedbackAgentProviderIdToTracking and assistantFeedbackModelId for feedback events
Wire API-mode agent ids (anthropic-api → anthropic) and agentName-parsed
model ids through the feedback emit path. Previously the feedback props used
agentIdToTracking (no anthropic-api case) and assistantModelDetail (no
agentName fallback), causing model_id='default' and agent_provider_id='other'
for API-mode agents.
Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)
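The API-mode mapping in the commit above could look roughly like the following. Only the `anthropic-api` to `anthropic` pair comes from the commit message; the reduced `ProviderSketch` union, the lookup table, and the function name are hypothetical stand-ins for `TrackingFeedbackProviderId` and `feedbackAgentProviderIdToTracking`.

```typescript
// Hypothetical, reduced union; the real TrackingFeedbackProviderId has more members.
type ProviderSketch = 'anthropic' | 'other';

// Only this one pair is stated in the commit message.
const API_MODE_PROVIDERS = new Map<string, ProviderSketch>([
  ['anthropic-api', 'anthropic'],
]);

function feedbackProviderSketch(agentId: string | null | undefined): ProviderSketch {
  if (!agentId) return 'other';
  // Unknown agent ids fall through to the 'other' bucket instead of null.
  return API_MODE_PROVIDERS.get(agentId) ?? 'other';
}

console.log(feedbackProviderSketch('anthropic-api'));
```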
* fix(analytics): extend feedback/run schema for full agent/model coverage
Layered on top of the conflict resolution and the v1 emit switchover
in 0c1b30440. Three things the prior commits did not cover:
1) The v2 `assistant_feedback_*` family (page='studio') shares
`AssistantFeedbackBase`. Add `agent_provider_id` + `model_id` once on
the base so all four derived emits (reason_view, click, reason_click,
reason_submit) carry the same context as the v1 family, instead of
leaving the v2 dashboard with the same `unknown` gap the v1 PR was
trying to close.
2) Tighten `FeedbackSubmitResultProps.model_id` and
`feedbackAgentProviderIdToTracking` from `string | null` /
`TrackingFeedbackProviderId | null` to non-null. The web emit paths
already bucket null/empty through `modelIdForTracking` and the
`?? 'other'` fallback; collapsing that at the helper / contract
layer means `null` becomes a TS error at every new emit site, so we
can't regress the unknown bucket again in a future event.
3) Comment on `run_finished.model_id` so reviewers reading
`finishedModelId` see why the agent-reported value upgrades the
request-side one.
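To make point 2 concrete, here is a minimal sketch of how a non-null contract pushes the bucketing into the type system. The interfaces are reduced, hypothetical versions of `AssistantFeedbackBase` and one derived event; field names other than `agent_provider_id` and `model_id` are assumptions.

```typescript
// Hypothetical, reduced shapes; the real AssistantFeedbackBase has more fields.
interface AssistantFeedbackBaseSketch {
  agent_provider_id: string; // non-null by contract
  model_id: string; // non-null by contract
}

interface ReasonSubmitSketch extends AssistantFeedbackBaseSketch {
  reason: string;
}

function buildReasonSubmit(
  reason: string,
  agentProviderId: string | null,
  modelId: string | null,
): ReasonSubmitSketch {
  return {
    reason,
    agent_provider_id: agentProviderId ?? 'other',
    // Passing `modelId` through unchanged would not type-check under
    // strictNullChecks, because `string | null` is not assignable to `string`.
    model_id: modelId !== null && modelId.length > 0 ? modelId : 'default',
  };
}

console.log(buildReasonSubmit('too_slow', null, null));
```

Because the fields live on the base interface, every derived event inherits the same requirement without repeating it.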
* fix(analytics): continue event scan past usage to find agent-reported model
The reverse scan for agentReportedModel never reached the model signal:
the loop exited on the first usage event it saw (the terminal one) before
reaching the status:initializing or status:model event (emitted at run
start, lower index). This meant run_finished.model_id always fell through
to modelIdForTracking(null) = 'default' for any run that reported usage
tokens.
Fix: track haveUsageTokens as a flag and defer the break until usage
tokens have been found and either the model is not needed (user picked
one) or the agent-reported model has been captured. Extract the logic into
scanRunEventsForFinishedProps for unit testability.
Tests: six new cases in run-lifecycle-analytics.test.ts cover the
initializing→usage append order, ACP status:model, detail field fallback,
early exit when reqBodyModel is set, no-status event, and empty events.
Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)
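The deferred break condition from this commit can be written as a small predicate. This is a sketch under assumptions: `canStopScan` is a hypothetical helper name, and the actual code may inline the check inside the loop.

```typescript
// Hypothetical helper expressing the exit condition described in the commit.
function canStopScan(
  haveUsageTokens: boolean,
  needAgentModel: boolean,
  agentReportedModel: string | null,
): boolean {
  // Stop only once tokens are known AND the model is either not needed
  // (the user picked one) or has already been captured from a status event.
  return haveUsageTokens && (!needAgentModel || agentReportedModel !== null);
}

console.log(canStopScan(true, true, null));
console.log(canStopScan(true, false, null));
```

The first call corresponds to the old bug: tokens are known but the model is still missing, so the scan has to keep walking back.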
* fix(analytics): guard usage block with !haveUsageTokens to prevent early events overwriting terminal tokens
In the reverse-scan loop of scanRunEventsForFinishedProps, the usage block
lacked a !haveUsageTokens guard. When needAgentModel is true and the
agentReportedModel lives at the start of the run (lower index), the loop
walks all the way back past multiple usage events (one per step/turn in
multi-step runs), overwriting inputTokens/outputTokens on each pass. The
surviving values were those of the earliest step, not the terminal total.
Adding !haveUsageTokens to the usage block condition ensures only the first
(terminal) usage event seen in reverse sets the token counts; subsequent
earlier usage events are skipped while the scan continues for agentReportedModel.
Adds a test case for initializing(model) → usage(step1) → usage(terminal)
asserting both terminal token counts and agentReportedModel.
Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)
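A minimal sketch of a reverse scan with the `!haveUsageTokens` guard, assuming the event shape used by the tests in this file. `scanSketch` and both interfaces are hypothetical; the real `scanRunEventsForFinishedProps` lives in `src/server.js` and is not shown here.

```typescript
// Hypothetical event and result shapes, inferred from the test fixtures.
interface RunEventSketch {
  event: string;
  data: {
    type: string;
    label?: string;
    model?: string;
    detail?: string;
    usage?: { input_tokens: number; output_tokens: number };
  };
}

interface FinishedPropsSketch {
  agentReportedModel: string | null;
  inputTokens: number | undefined;
  outputTokens: number | undefined;
}

function scanSketch(events: RunEventSketch[], reqBodyModel: string): FinishedPropsSketch {
  const needAgentModel = reqBodyModel.length === 0;
  let agentReportedModel: string | null = null;
  let inputTokens: number | undefined;
  let outputTokens: number | undefined;
  let haveUsageTokens = false;

  for (let i = events.length - 1; i >= 0; i--) {
    const data = events[i]?.data;
    if (!data) continue;
    if (data.type === 'usage' && data.usage && !haveUsageTokens) {
      // Guard: only the first usage event seen in reverse (the terminal one) counts.
      inputTokens = data.usage.input_tokens;
      outputTokens = data.usage.output_tokens;
      haveUsageTokens = true;
    } else if (
      needAgentModel &&
      agentReportedModel === null &&
      data.type === 'status' &&
      (data.label === 'initializing' || data.label === 'model')
    ) {
      // Fall back to `detail` when `model` is absent.
      agentReportedModel = data.model ?? data.detail ?? null;
    }
    if (haveUsageTokens && (!needAgentModel || agentReportedModel !== null)) break;
  }
  return { agentReportedModel, inputTokens, outputTokens };
}

const sketchResult = scanSketch(
  [
    { event: 'agent', data: { type: 'status', label: 'initializing', model: 'claude-opus-4' } },
    { event: 'agent', data: { type: 'usage', usage: { input_tokens: 100, output_tokens: 200 } } },
    { event: 'agent', data: { type: 'usage', usage: { input_tokens: 500, output_tokens: 750 } } },
  ],
  '',
);
console.log(sketchResult);
```

Without the guard, the second pass over a usage event would replace 500/750 with the step-1 values 100/200.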
125 lines
5.1 KiB
TypeScript
import { describe, expect, it } from 'vitest';
import {
  __forTestResolveRunProjectKindForAnalytics,
  __forTestScanRunEventsForFinishedProps,
} from '../src/server.js';

describe('run lifecycle analytics', () => {
  it('falls back to stored project metadata when analytics hints omit project kind', () => {
    expect(
      __forTestResolveRunProjectKindForAnalytics({
        hintProjectKind: null,
        projectMetadata: { kind: 'prototype' },
      }),
    ).toBe('prototype');
  });

  it('maps project metadata kind to the analytics project_kind enum', () => {
    expect(
      __forTestResolveRunProjectKindForAnalytics({
        hintProjectKind: null,
        projectMetadata: { kind: 'deck' },
      }),
    ).toBe('slide_deck');
  });

  it('preserves explicit analytics hints over project metadata', () => {
    expect(
      __forTestResolveRunProjectKindForAnalytics({
        hintProjectKind: 'design_system',
        projectMetadata: { kind: 'other' },
      }),
    ).toBe('design_system');
  });

  it('classifies design-system workspace projects when hints are absent', () => {
    expect(
      __forTestResolveRunProjectKindForAnalytics({
        hintProjectKind: null,
        projectMetadata: { kind: 'other', importedFrom: 'design-system' },
      }),
    ).toBe('design_system');
  });
});

describe('scanRunEventsForFinishedProps', () => {
  function usageEvent(inputTokens: number, outputTokens: number) {
    return { event: 'agent', data: { type: 'usage', usage: { input_tokens: inputTokens, output_tokens: outputTokens } } };
  }

  function initializingEvent(model: string) {
    return { event: 'agent', data: { type: 'status', label: 'initializing', model } };
  }

  function modelEvent(model: string) {
    return { event: 'agent', data: { type: 'status', label: 'model', model } };
  }

  it('extracts agent model from initializing event when usage event follows it (real run order)', () => {
    // Append order mirrors a real run: initializing first, usage last.
    // Reverse scan must not stop at usage before reading the model signal.
    const events = [initializingEvent('claude-opus-4'), usageEvent(100, 200)];
    const result = __forTestScanRunEventsForFinishedProps(events, '');
    expect(result.agentReportedModel).toBe('claude-opus-4');
    expect(result.inputTokens).toBe(100);
    expect(result.outputTokens).toBe(200);
  });

  it('extracts agent model from ACP status:model event when usage follows it', () => {
    const events = [modelEvent('gpt-4o'), usageEvent(50, 75)];
    const result = __forTestScanRunEventsForFinishedProps(events, '');
    expect(result.agentReportedModel).toBe('gpt-4o');
    expect(result.inputTokens).toBe(50);
  });

  it('reads model from detail field when model field is absent', () => {
    const events = [
      { event: 'agent', data: { type: 'status', label: 'initializing', detail: 'gemini-pro' } },
      usageEvent(10, 20),
    ];
    const result = __forTestScanRunEventsForFinishedProps(events, '');
    expect(result.agentReportedModel).toBe('gemini-pro');
  });

  it('stops early on usage when reqBodyModel is set (no need to scan for agent model)', () => {
    // When the user picked a model, needAgentModel=false so the loop exits
    // as soon as usage tokens are found; it does not need to walk back to
    // the initializing event.
    const events = [initializingEvent('claude-opus-4'), usageEvent(30, 40)];
    const result = __forTestScanRunEventsForFinishedProps(events, 'claude-haiku-4-5');
    expect(result.inputTokens).toBe(30);
    expect(result.outputTokens).toBe(40);
    // agentReportedModel may or may not be found (early exit), but the caller
    // ignores it when reqBodyModel is set, so there is no assertion on its value here.
  });

  it('returns null agentReportedModel when no status event is present', () => {
    const events = [usageEvent(5, 10)];
    const result = __forTestScanRunEventsForFinishedProps(events, '');
    expect(result.agentReportedModel).toBeNull();
    expect(result.inputTokens).toBe(5);
  });

  it('handles empty event list', () => {
    const result = __forTestScanRunEventsForFinishedProps([], '');
    expect(result.agentReportedModel).toBeNull();
    expect(result.inputTokens).toBeUndefined();
    expect(result.outputTokens).toBeUndefined();
  });

  it('uses terminal usage event tokens when multiple usage events exist', () => {
    // Multi-step/multi-turn runs emit one usage event per step/turn (json-event-stream,
    // pi-rpc). Reverse scan hits the terminal (highest-index) usage first; the
    // !haveUsageTokens guard must prevent earlier usage events from overwriting those values
    // while the loop continues scanning back for agentReportedModel.
    const events = [
      initializingEvent('claude-opus-4'),
      usageEvent(100, 200), // step 1: must NOT overwrite terminal values
      usageEvent(500, 750), // terminal turn: seen first in reverse, values must survive
    ];
    const result = __forTestScanRunEventsForFinishedProps(events, '');
    expect(result.agentReportedModel).toBe('claude-opus-4');
    expect(result.inputTokens).toBe(500);
    expect(result.outputTokens).toBe(750);
  });
});