fix(analytics): bucket feedback agent/model directly on the event (#3240)

* fix(analytics): bucket feedback agent/model directly on the event

Reason × agent / reason × model splits on
`assistant_feedback_reason_submit` were 25-74% `unknown` because the
event only carried `run_id` — analyses had to join back to
`run_created/run_finished`, which loses rows whenever the feedback is
given to a message whose run sits outside the query window (the common
case for feedback on older messages), and whose `model_id` was `null`
to begin with (the user didn't pick a specific model — went with the
agent's default).

Carry `agent_provider_id` and `model_id` directly on every feedback
event so the analyses no longer need to join. Replace `null/unknown`
with the `default` bucket via `modelIdForTracking` (and let
`agentIdToTracking` fall through to `other`) at every emit site —
`null` was an analyst-hostile mix of "no selection" and "join failed";
`default` is a real, analysable bucket. On `run_finished`, upgrade the
model to the agent-reported value from initializing/model status
events when the user did not pick one — covers ACP, claude-stream,
copilot-stream, json-event-stream, qoder, pi-rpc.

* fix(analytics): use feedbackAgentProviderIdToTracking and assistantFeedbackModelId for feedback events

Wire API-mode agent ids (anthropic-api → anthropic) and agentName-parsed
model ids through the feedback emit path. Previously the feedback props used
agentIdToTracking (no anthropic-api case) and assistantModelDetail (no
agentName fallback), causing model_id='default' and agent_provider_id='other'
for API-mode agents.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* fix(analytics): extend feedback/run schema for full agent/model coverage

Layered on top of the conflict resolution and the v1 emit switchover
in 0c1b30440. Three things the prior commits did not cover:

1) The v2 `assistant_feedback_*` family (page='studio') shares
   `AssistantFeedbackBase`. Add `agent_provider_id` + `model_id` once on
   the base so all four derived emits (reason_view, click, reason_click,
   reason_submit) carry the same context as the v1 family, instead of
   leaving the v2 dashboard with the same `unknown` gap the v1 PR was
   trying to close.

2) Tighten `FeedbackSubmitResultProps.model_id` and
   `feedbackAgentProviderIdToTracking` from `string | null` /
   `TrackingFeedbackProviderId | null` to non-null. The web emit paths
   already bucket null/empty through `modelIdForTracking` and the
   `?? 'other'` fallback; collapsing that at the helper / contract
   layer means `null` becomes a TS error at every new emit site, so we
   can't regress the unknown bucket again in a future event.

3) Comment on `run_finished.model_id` so reviewers reading
   `finishedModelId` see why the agent-reported value upgrades the
   request-side one.

* fix(analytics): continue event scan past usage to find agent-reported model

The reverse scan for agentReportedModel was broken: the loop broke on
the first usage event (terminal) before ever reaching the status:initializing
or status:model event (emitted at run start, lower index). This meant
run_finished.model_id always fell through to modelIdForTracking(null) =
'default' for any run that reported usage tokens.

Fix: track haveUsageTokens as a flag and defer the break until both usage
tokens are found and either the model is not needed (user picked one) or
the agent-reported model has been captured. Extract the logic into
scanRunEventsForFinishedProps for unit testability.

Tests: six new cases in run-lifecycle-analytics.test.ts cover the
initializing→usage append order, ACP status:model, detail field fallback,
early exit when reqBodyModel is set, no-status event, and empty events.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* fix(analytics): guard usage block with !haveUsageTokens to prevent early events overwriting terminal tokens

In the reverse-scan loop of scanRunEventsForFinishedProps, the usage block
lacked a !haveUsageTokens guard. When needAgentModel is true and the
agentReportedModel lives at the start of the run (lower index), the loop
walks all the way back past multiple usage events (one per step/turn in
multi-step runs), overwriting inputTokens/outputTokens on each pass. The
surviving values were those of the earliest step, not the terminal total.

Adding !haveUsageTokens to the usage block condition ensures only the first
(terminal) usage event seen in reverse sets the token counts; subsequent
earlier usage events are skipped while the scan continues for agentReportedModel.

Adds a test case for initializing(model) → usage(step1) → usage(terminal)
asserting both terminal token counts and agentReportedModel.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)
This commit is contained in:
lefarcen 2026-05-29 11:06:06 +08:00 committed by GitHub
parent 45873a551b
commit 08c350fb0f
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
4 changed files with 225 additions and 37 deletions

View file

@ -242,6 +242,7 @@ import { observePendingInstallerApplyAttempts } from './update-apply-observation
import {
agentIdToTracking,
deriveConfigureGlobals,
modelIdForTracking,
projectKindToTracking,
type ObservabilityEventRequest,
} from '@open-design/contracts/analytics';
@ -1915,6 +1916,53 @@ export function __forTestResolveRunProjectKindForAnalytics(args) {
return resolveRunProjectKindForAnalytics(args);
}
// Scans run.events newest→oldest to extract usage token counts and the
// agent-reported model name. The scan must not short-circuit on usage
// before reaching the model signal: usage is a terminal event while
// status:initializing/model is emitted at the very start of the run, so
// in reverse iteration usage is seen first. The loop continues until both
// usage tokens are found AND (the caller already has a model from reqBody
// OR the agent-reported model has been found).
function scanRunEventsForFinishedProps(events, reqBodyModel) {
let inputTokens;
let outputTokens;
let agentReportedModel = null;
const needAgentModel = !(typeof reqBodyModel === 'string' && reqBodyModel.trim());
let haveUsageTokens = false;
for (let i = events.length - 1; i >= 0; i -= 1) {
const ev = events[i];
const data = ev?.data;
if (ev?.event === 'agent' && data?.type === 'usage' && data.usage && !haveUsageTokens) {
const u = data.usage;
if (typeof u.input_tokens === 'number') inputTokens = u.input_tokens;
if (typeof u.output_tokens === 'number') outputTokens = u.output_tokens;
if (inputTokens !== undefined || outputTokens !== undefined) haveUsageTokens = true;
}
if (
!agentReportedModel &&
ev?.event === 'agent' &&
data?.type === 'status' &&
(data.label === 'model' || data.label === 'initializing')
) {
const candidate =
typeof data.model === 'string'
? data.model
: typeof data.detail === 'string'
? data.detail
: null;
if (candidate && candidate.trim()) {
agentReportedModel = candidate.trim();
}
}
if (haveUsageTokens && (!needAgentModel || agentReportedModel)) break;
}
return { inputTokens, outputTokens, agentReportedModel };
}
export function __forTestScanRunEventsForFinishedProps(events, reqBodyModel) {
return scanRunEventsForFinishedProps(events, reqBodyModel);
}
function githubRepoNameFromPluginName(name) {
const slug = String(name)
.toLowerCase()
@ -12641,11 +12689,17 @@ export async function startServer({
? (reqBody.attachments as unknown[]).length > 0
: false,
user_query_tokens: userQueryTokens,
model_id: typeof reqBody.model === 'string' ? reqBody.model : null,
agent_provider_id:
typeof reqBody.agentId === 'string'
? agentIdToTracking(reqBody.agentId)
: null,
// `modelIdForTracking` buckets null/empty into `'default'` so the
// PostHog `model_id` column always has an analysable value. The
// user-picked model only lands here on `run_created` (the agent
// hasn't initialised yet); `run_finished` below upgrades this to
// the agent-reported model when available.
model_id: modelIdForTracking(
typeof reqBody.model === 'string' ? reqBody.model : null,
),
agent_provider_id: agentIdToTracking(
typeof reqBody.agentId === 'string' ? reqBody.agentId : null,
),
skill_id: typeof reqBody.skillId === 'string' ? reqBody.skillId : null,
mcp_id: null,
token_count_source: userQueryTokens > 0 ? 'estimated' : 'unknown',
@ -12671,26 +12725,22 @@ export async function startServer({
// child close without error event, etc.).
const result = runResultFromStatus(status.status);
const errorCode = deriveRunErrorCode(status);
let inputTokens: number | undefined;
let outputTokens: number | undefined;
for (let i = run.events.length - 1; i >= 0; i -= 1) {
const ev = run.events[i];
const data = ev?.data as
| { type?: string; usage?: Record<string, unknown> | null }
| null
| undefined;
if (ev?.event === 'agent' && data?.type === 'usage' && data.usage) {
const u = data.usage;
if (typeof u.input_tokens === 'number') inputTokens = u.input_tokens;
if (typeof u.output_tokens === 'number') outputTokens = u.output_tokens;
if (inputTokens !== undefined || outputTokens !== undefined) break;
}
}
// ACP reports { type:'status', label:'model', model:<id> } after
// session/new; stream adapters report { type:'status',
// label:'initializing', model:<id> } at run start. The scan must
// not short-circuit on usage before reaching the model signal —
// see `scanRunEventsForFinishedProps` for the invariant.
const { inputTokens, outputTokens, agentReportedModel } =
scanRunEventsForFinishedProps(run.events, reqBody.model);
const haveUsage = inputTokens !== undefined || outputTokens !== undefined;
const totalTokens =
inputTokens !== undefined && outputTokens !== undefined
? inputTokens + outputTokens
: undefined;
const finishedModelId =
typeof reqBody.model === 'string' && reqBody.model.trim()
? modelIdForTracking(reqBody.model)
: modelIdForTracking(agentReportedModel);
design.analytics.capture({
eventName: 'run_finished',
context: analyticsContext,
@ -12702,6 +12752,10 @@ export async function startServer({
// `design_system_generation` to match the run_created shape.
area: isDesignSystemRun ? 'design_system_generation' : 'chat_panel',
result,
// `model_id` upgrades the request-side value with the
// agent-reported model on terminal state; see
// `finishedModelId` derivation above.
model_id: finishedModelId,
// Incremental count of `.html` paths the run produced or
// modified, deduped per file. Replaces the hard-coded `0`
// that masked the "did this run actually generate an

View file

@ -1,5 +1,8 @@
import { describe, expect, it } from 'vitest';
import { __forTestResolveRunProjectKindForAnalytics } from '../src/server.js';
import {
__forTestResolveRunProjectKindForAnalytics,
__forTestScanRunEventsForFinishedProps,
} from '../src/server.js';
describe('run lifecycle analytics', () => {
it('falls back to stored project metadata when analytics hints omit project kind', () => {
@ -38,3 +41,85 @@ describe('run lifecycle analytics', () => {
).toBe('design_system');
});
});
describe('scanRunEventsForFinishedProps', () => {
function usageEvent(inputTokens: number, outputTokens: number) {
return { event: 'agent', data: { type: 'usage', usage: { input_tokens: inputTokens, output_tokens: outputTokens } } };
}
function initializingEvent(model: string) {
return { event: 'agent', data: { type: 'status', label: 'initializing', model } };
}
function modelEvent(model: string) {
return { event: 'agent', data: { type: 'status', label: 'model', model } };
}
it('extracts agent model from initializing event when usage event follows it (real run order)', () => {
// Append order mirrors a real run: initializing first, usage last.
// Reverse scan must not stop at usage before reading the model signal.
const events = [initializingEvent('claude-opus-4'), usageEvent(100, 200)];
const result = __forTestScanRunEventsForFinishedProps(events, '');
expect(result.agentReportedModel).toBe('claude-opus-4');
expect(result.inputTokens).toBe(100);
expect(result.outputTokens).toBe(200);
});
it('extracts agent model from ACP status:model event when usage follows it', () => {
const events = [modelEvent('gpt-4o'), usageEvent(50, 75)];
const result = __forTestScanRunEventsForFinishedProps(events, '');
expect(result.agentReportedModel).toBe('gpt-4o');
expect(result.inputTokens).toBe(50);
});
it('reads model from detail field when model field is absent', () => {
const events = [
{ event: 'agent', data: { type: 'status', label: 'initializing', detail: 'gemini-pro' } },
usageEvent(10, 20),
];
const result = __forTestScanRunEventsForFinishedProps(events, '');
expect(result.agentReportedModel).toBe('gemini-pro');
});
it('stops early on usage when reqBodyModel is set (no need to scan for agent model)', () => {
// When the user picked a model, needAgentModel=false so the loop exits
// as soon as usage tokens are found — it does not need to walk back to
// the initializing event.
const events = [initializingEvent('claude-opus-4'), usageEvent(30, 40)];
const result = __forTestScanRunEventsForFinishedProps(events, 'claude-haiku-4-5');
expect(result.inputTokens).toBe(30);
expect(result.outputTokens).toBe(40);
// agentReportedModel may or may not be found (early exit), but the caller
// ignores it when reqBodyModel is set — no assertion on its value here.
});
it('returns null agentReportedModel when no status event is present', () => {
const events = [usageEvent(5, 10)];
const result = __forTestScanRunEventsForFinishedProps(events, '');
expect(result.agentReportedModel).toBeNull();
expect(result.inputTokens).toBe(5);
});
it('handles empty event list', () => {
const result = __forTestScanRunEventsForFinishedProps([], '');
expect(result.agentReportedModel).toBeNull();
expect(result.inputTokens).toBeUndefined();
expect(result.outputTokens).toBeUndefined();
});
it('uses terminal usage event tokens when multiple usage events exist', () => {
// Multi-step/multi-turn runs emit one usage event per step/turn (json-event-stream,
// pi-rpc). Reverse scan hits the terminal (highest-index) usage first; the
// !haveUsageTokens guard must prevent earlier usage events from overwriting those values
// while the loop continues scanning back for agentReportedModel.
const events = [
initializingEvent('claude-opus-4'),
usageEvent(100, 200), // step 1 — must NOT overwrite terminal values
usageEvent(500, 750), // terminal turn — seen first in reverse, values must survive
];
const result = __forTestScanRunEventsForFinishedProps(events, '');
expect(result.agentReportedModel).toBe('claude-opus-4');
expect(result.inputTokens).toBe(500);
expect(result.outputTokens).toBe(750);
});
});

View file

@ -21,7 +21,9 @@ import {
} from "../analytics/events";
import {
feedbackAgentProviderIdToTracking,
modelIdForTracking,
normalizeCustomReason,
type TrackingFeedbackProviderId,
type TrackingFeedbackReasonCode,
type TrackingFeedbackRatingWithNone,
type TrackingProjectKind,
@ -645,7 +647,7 @@ export function AssistantMessage({
conversationId={conversationId}
runId={message.runId ?? null}
assistantMessageId={message.id}
modelId={assistantFeedbackModelId(message)}
modelId={modelIdForTracking(assistantFeedbackModelId(message))}
agentProviderId={feedbackAgentProviderIdToTracking(message.agentId)}
producedFileCount={displayedProduced.length}
hasDesignSystemContext={hasDesignSystemContext}
@ -869,8 +871,8 @@ function AssistantFeedback({
conversationId: string | null;
runId: string | null;
assistantMessageId: string;
modelId: string | null;
agentProviderId: ReturnType<typeof feedbackAgentProviderIdToTracking>;
modelId: string;
agentProviderId: TrackingFeedbackProviderId;
producedFileCount: number;
}) {
const t = useT();
@ -925,6 +927,8 @@ function AssistantFeedback({
conversation_id: conversationId,
assistant_message_id: assistantMessageId,
run_id: runId ?? null,
agent_provider_id: agentProviderId,
model_id: modelId,
rating: reasonRating,
});
}
@ -936,6 +940,8 @@ function AssistantFeedback({
conversationId,
assistantMessageId,
runId,
agentProviderId,
modelId,
]);
const toggleFeedback = (rating: ChatMessageFeedbackRating) => {
const nextRating = selected === rating ? null : rating;
@ -959,6 +965,8 @@ function AssistantFeedback({
conversation_id: conversationId,
assistant_message_id: assistantMessageId,
run_id: runId ?? "",
agent_provider_id: agentProviderId,
model_id: modelId,
rating,
rating_before: ratingBefore,
has_produced_files: producedFileCount > 0,
@ -978,6 +986,8 @@ function AssistantFeedback({
conversation_id: conversationId,
assistant_message_id: assistantMessageId,
run_id: runId ?? null,
agent_provider_id: agentProviderId,
model_id: modelId,
rating: ratingAfter,
rating_before: ratingBefore,
has_produced_files: producedFileCount > 0,
@ -1017,6 +1027,8 @@ function AssistantFeedback({
conversation_id: conversationId,
assistant_message_id: assistantMessageId,
run_id: runId ?? "",
agent_provider_id: agentProviderId,
model_id: modelId,
rating: reasonRating,
...(reasonJoined ? { reason: reasonJoined } : {}),
reason_count: reasonCodes.length,
@ -1041,8 +1053,8 @@ function AssistantFeedback({
conversation_id: conversationId,
assistant_message_id: assistantMessageId,
run_id: runId ?? "",
model_id: modelId,
agent_provider_id: agentProviderId,
model_id: modelId,
rating: reasonRating,
...(reasonJoined ? { reason: reasonJoined } : {}),
reason_count: reasonCodes.length,
@ -1066,6 +1078,8 @@ function AssistantFeedback({
conversation_id: conversationId,
assistant_message_id: assistantMessageId,
run_id: runId ?? null,
agent_provider_id: agentProviderId,
model_id: modelId,
rating: reasonRating,
reason: reasons,
reason_count: reasons.length,

View file

@ -1318,6 +1318,12 @@ export interface ShareOptionPopoverClickProps {
}
// FEEDBACK clicks (CSV rows 56 / 58)
// `agent_provider_id` / `model_id` are carried on every feedback event so
// `reason × agent` and `reason × model` analyses don't need to join back to
// `run_created` (which loses rows when the feedback is given to a message
// whose run is outside the query window — the dominant source of "unknown"
// in earlier reports). `model_id` uses `'default'` instead of null when the
// user did not pick a specific model; see `modelIdForTracking`.
export interface AssistantFeedbackButtonClickProps {
page_name: 'chat_panel';
area: 'chat_panel';
@ -1328,6 +1334,8 @@ export interface AssistantFeedbackButtonClickProps {
conversation_id: string | null;
assistant_message_id: string;
run_id: string;
agent_provider_id: TrackingFeedbackProviderId;
model_id: string;
// For `clear_feedback_rating`, `rating` carries the rating that was
// cleared (not the previous-before-clear value, which lives in
// `rating_before`). Mason flagged the v1 emission supplied the wrong
@ -1347,6 +1355,8 @@ export interface AssistantFeedbackReasonSubmitClickProps {
conversation_id: string | null;
assistant_message_id: string;
run_id: string;
agent_provider_id: TrackingFeedbackProviderId;
model_id: string;
rating: 'positive' | 'negative';
reason?: string;
reason_count: number;
@ -1730,8 +1740,11 @@ export interface RunCreatedProps {
aspect?: string;
has_attachment: boolean;
user_query_tokens: number;
model_id: string | null;
agent_provider_id: string | null;
// `'default'` when the user did not pick a specific model and the agent's
// own default was selected; use `modelIdForTracking` to bucket null/empty
// into `'default'` at every emit site.
model_id: string;
agent_provider_id: TrackingCliProviderId;
skill_id: string | null;
mcp_id: string | null;
token_count_source: TrackingTokenCountSource;
@ -1853,8 +1866,14 @@ export interface FeedbackSubmitResultProps {
conversation_id: string | null;
assistant_message_id: string;
run_id: string;
model_id: string | null;
agent_provider_id: TrackingFeedbackProviderId | null;
// `model_id` uses `modelIdForTracking` to bucket null/empty into the real
// `'default'` bucket (user accepted the agent's own default), so the
// PostHog `model_id` column never carries the analyst-hostile mix of
// "no selection" and "join failed" that `null/unknown` used to mean.
// `agent_provider_id` carries the BYOK provider when the agent maps to
// one, so reason × provider analyses can split CLI vs API surfaces.
model_id: string;
agent_provider_id: TrackingFeedbackProviderId;
rating: 'positive' | 'negative';
reason?: string;
reason_count: number;
@ -1874,6 +1893,12 @@ interface AssistantFeedbackBase {
// but the product funnel keys off this; we emit `null` rather than dropping
// the field so PostHog can distinguish "no run id" from "field forgotten".
run_id: string | null;
// Same rationale as `FeedbackSubmitResultProps`: carry agent/model on the
// event itself so reason × agent / reason × model analyses don't depend
// on joining back to `run_created`. Buckets via `modelIdForTracking` and
// `feedbackAgentProviderIdToTracking` at every emit site.
agent_provider_id: TrackingFeedbackProviderId;
model_id: string;
rating: TrackingFeedbackRating;
}
@ -2069,6 +2094,16 @@ export function executionModeToTracking(
return mode === 'daemon' ? 'local_cli' : 'byok';
}
// Model id bucket for analytics. `'default'` represents "user did not pick
// a specific model — went with the agent's own default". This is a real,
// analysable bucket, distinct from `null/unknown` which previously masked
// both "no selection" and "join failed". Callers that have a non-empty
// model string pass it through unchanged.
export function modelIdForTracking(model: string | null | undefined): string {
const trimmed = typeof model === 'string' ? model.trim() : '';
return trimmed.length > 0 ? trimmed : 'default';
}
// Daemon agent id (apps/daemon/src/agents.ts) → CSV cli_provider_id.
export function agentIdToTracking(agentId: string | null | undefined): TrackingCliProviderId {
switch (agentId) {
@ -2105,22 +2140,22 @@ export function agentIdToTracking(agentId: string | null | undefined): TrackingC
export function feedbackAgentProviderIdToTracking(
agentId: string | null | undefined,
): TrackingFeedbackProviderId | null {
): TrackingFeedbackProviderId {
switch (agentId) {
case 'anthropic-api':
return byokProtocolToTracking('anthropic');
return byokProtocolToTracking('anthropic') ?? 'other';
case 'openai-api':
return byokProtocolToTracking('openai');
return byokProtocolToTracking('openai') ?? 'other';
case 'azure-openai-api':
return byokProtocolToTracking('azure');
return byokProtocolToTracking('azure') ?? 'other';
case 'google-gemini-api':
return byokProtocolToTracking('google');
return byokProtocolToTracking('google') ?? 'other';
case 'ollama-cloud-api':
return byokProtocolToTracking('ollama');
return byokProtocolToTracking('ollama') ?? 'other';
case 'senseaudio-api':
return byokProtocolToTracking('senseaudio');
return byokProtocolToTracking('senseaudio') ?? 'other';
default:
return agentId ? agentIdToTracking(agentId) : null;
return agentIdToTracking(agentId);
}
}