feat(analytics): PostHog + Langfuse instrumentation for assistant feedback (#1558)

* feat(analytics): PostHog + Langfuse instrumentation for assistant feedback

Re-bases the original three-commit PR onto release/v0.8.0. The web-side
feedback UI instrumentation (surface_view / ui_click / feedback_submit_result)
landed on main while this branch was open, so on this rebase that wiring
is taken from main; the remaining net additions are:

- Contracts: TrackingFeedback* enums and the four dedicated
  assistant_feedback_* event payload types (click, reason_view,
  reason_click, reason_submit), plus normalizeCustomReason helper.
  The new event-name variants are added to TrackingEventName and the
  AnalyticsEventPayload discriminated union next to the existing
  surface_view/ui_click variants — both wire formats coexist.
- POST /api/runs/:id/feedback in apps/daemon/src/chat-routes.ts:
  thin route that validates rating, narrows reasonCodes to string
  values with a simple filter (no allowlist yet), and fire-and-forgets
  into the daemon's reportFeedback hook.
- apps/daemon/src/langfuse-bridge.ts reportRunFeedbackFromDaemon
  forwards the rating + reasonCodes into Langfuse as user_rating
  (NUMERIC ±1) + user_rating_reason (CATEGORICAL, one per code)
  score-create entries. Gates on telemetry.metrics + telemetry.content.
- apps/web/src/providers/daemon.ts reportChatRunFeedback (fire-and-forget
  fetch) and apps/web/src/components/ProjectView.tsx wiring so each
  thumbs-up/down + reason submission posts the side-channel.
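
For a concrete picture of the Langfuse mapping in the list above, here is a
minimal sketch. It keeps only the name / dataType / value triple; the
`ScoreSketch` type and `toScores` helper are hypothetical names for this
illustration, and the real payload carries more fields (ids, timestamps,
metadata).

```typescript
type Rating = 'positive' | 'negative';

// Hypothetical, trimmed-down view of one score entry.
interface ScoreSketch {
  name: 'user_rating' | 'user_rating_reason';
  dataType: 'NUMERIC' | 'CATEGORICAL';
  value: number | string;
}

// One NUMERIC score for the rating, then one CATEGORICAL score per code.
function toScores(rating: Rating, reasonCodes: string[]): ScoreSketch[] {
  const scores: ScoreSketch[] = [
    {
      name: 'user_rating',
      dataType: 'NUMERIC',
      value: rating === 'positive' ? 1 : -1,
    },
  ];
  for (const code of reasonCodes) {
    scores.push({ name: 'user_rating_reason', dataType: 'CATEGORICAL', value: code });
  }
  return scores;
}

const scores = toScores('negative', ['missed_request', 'weak_visual']);
```

A thumbs-down with two reason codes therefore yields three entries: the
rating first, then the reasons in submission order.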

Conflicts resolved (release/v0.8.0 vs the branch's old base):
- packages/contracts/src/analytics/events.ts: keep main's
  file_upload_result / feedback_submit_result / settings_* event
  variants alongside the new assistant_feedback_* additions.
- apps/daemon/src/server.ts: keep DNS-aware validateExternalApiBaseUrl,
  add reportFeedback closure wired into registerChatRoutes telemetry.
- apps/daemon/src/chat-routes.ts: keep both /tool-result and the new
  /feedback routes; merge RegisterChatRoutesDeps to include both
  'paths' and 'telemetry'. Drop PR's chat-routes-local
  reconcileAssistantMessageOnRunEnd helper (main has the equivalent in
  server.ts).
- apps/web/src/components/ChatPane.tsx & AssistantMessage.tsx & ProjectView.tsx:
  keep main's projectKindForTracking prop name and its existing
  emission of surface_view / ui_click / feedback_submit_result; the
  PR's analyticsCtx-based reason_view/click/submit emission is dropped
  in this rebase since it would duplicate the existing wire format.
- apps/web/tests/components/*: rename projectKind → projectKindForTracking
  to match ChatPane's current prop name.

Outstanding review feedback (from the pre-rebase round, will be
addressed in a follow-up commit):
- AssistantMessage tests not yet passing the new feedback context to
  the direct render path.
- ProjectView clear-feedback path skips reportChatRunFeedback, leaving
  stale Langfuse user_rating scores.
- buildFeedbackPayload has no deletion path for previously-submitted
  user_rating_reason scores when the user switches thumbs.
- POST /api/runs/:id/feedback always returns {status:'accepted'} even
  when consent is off; needs to surface skipped_consent / skipped_no_sink.
- reasonCodes are filtered to string[] but not allowlisted against
  ChatMessageFeedbackReasonCode or deduped.

* fix(analytics): address review on assistant feedback rebase

Picks up the in-scope correctness items from the prior review round
and the rebase residue without rewriting history:

- chat-routes.ts: `/feedback` now awaits the daemon's preflight
  outcome and echoes it as the response. The contract was already
  shaped as `accepted | skipped_consent | skipped_no_sink`, but the
  previous handler always returned `accepted` because the network
  send was fire-and-forget. The consent + sink decision is local
  (a small file read and an env-var lookup); the actual Langfuse
  upload still runs as a detached promise.
- chat-routes.ts: reasonCodes are now allowlisted against the
  contract's reason-code union and deduplicated before reaching
  Langfuse, so a stale or replayed client can't poison the
  Langfuse score table with unknown categorical values or
  duplicate stable ids in the same batch.
- langfuse-bridge.ts: split the consent + sink resolution from the
  fire-and-forget network send so the route can claim `accepted`
  honestly. The legacy `skipped_no_sink` return on app-config read
  failure is preserved.
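
The preflight / detached-send split described above can be sketched roughly
as follows. This is an illustration under assumptions, not the daemon's
code: `preflight`, `reportFeedbackSketch`, `sendToSink`, and the `Prefs`
shape are hypothetical stand-ins for the app-config read, the sink lookup,
and the Langfuse upload.

```typescript
type Outcome =
  | { status: 'accepted' }
  | { status: 'skipped_consent' }
  | { status: 'skipped_no_sink' };

// Hypothetical consent flags, mirroring telemetry.metrics / telemetry.content.
interface Prefs {
  metrics?: boolean;
  content?: boolean;
}

// Local decision only: consent first, then sink presence. No network.
function preflight(prefs: Prefs, sinkUrl: string | null): Outcome {
  if (prefs.metrics !== true || prefs.content !== true) {
    return { status: 'skipped_consent' };
  }
  if (!sinkUrl) return { status: 'skipped_no_sink' };
  return { status: 'accepted' };
}

// Resolve the cheap preflight, then detach the slow send.
async function reportFeedbackSketch(
  prefs: Prefs,
  sinkUrl: string | null,
  sendToSink: (url: string) => Promise<void>,
): Promise<Outcome> {
  const outcome = preflight(prefs, sinkUrl);
  if (outcome.status === 'accepted' && sinkUrl) {
    // Detached: failures are logged, never surfaced to the caller.
    void sendToSink(sinkUrl).catch((err) => {
      console.warn('feedback send failed:', String(err));
    });
  }
  return outcome;
}
```

The caller can await `reportFeedbackSketch` and echo the outcome in its
response without waiting for the upload to finish.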

Contracts + comment hygiene:
- TrackingFeedbackReasonCode in packages/contracts/src/analytics/events.ts
  drifted from ChatMessageFeedbackReasonCode in packages/contracts/src/api/chat.ts;
  add `followed_design_system` and `missed_design_system` so the
  analytics wire format stays aligned with the persistence shape.
- langfuse-trace.ts buildFeedbackPayload: the docblock claimed the
  raw custom-reason text is bucketed before send. Product reversed
  that on 2026-05-13 (raw text now ships, consent-gated). Replace
  the stale comment with the real semantics + a note that there is
  no tombstone path for reason codes the user removes in a
  follow-up submission (left as scope for a later PR).
- AssistantMessage.tsx: remove the now-unused
  `AssistantFeedbackAnalyticsCtx` interface and a stray blank-line
  delete from the rebase; restore the analytics-context comment
  above the feedback hook.

Left as follow-up (intentional, documented in code):
- Sending a tombstone score when the user clears their rating —
  ProjectView still skips reportChatRunFeedback on `change===null`,
  so Langfuse retains the previous rating until the user re-submits.
  The PostHog event captures the clear separately.
- Removing reason-code scores when the user re-submits with a
  smaller set — buildFeedbackPayload only overwrites the codes
  present in the current payload.
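
To make the second follow-up concrete, the sketch below computes which
reason scores would go stale after a re-submission. Both helpers are
hypothetical and not part of this PR; the id pattern
`${runId}-reason-${code}` follows the stable-id convention noted in the
buildFeedbackPayload comment.

```typescript
// Hypothetical helper: reason codes dropped between two submissions.
function removedReasonCodes(previous: string[], current: string[]): string[] {
  const kept = new Set(current);
  return previous.filter((code) => !kept.has(code));
}

// Ids of reason scores that the current payload would not overwrite.
function staleReasonScoreIds(
  runId: string,
  previous: string[],
  current: string[],
): string[] {
  return removedReasonCodes(previous, current).map(
    (code) => `${runId}-reason-${code}`,
  );
}

const stale = staleReasonScoreIds(
  'run-1',
  ['missed_request', 'weak_visual'],
  ['weak_visual'],
);
```

Here the user dropped `missed_request`, so its score id is the only stale
one; a later tombstone path would need to target exactly these ids.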

* feat(analytics): wire PR's dedicated assistant_feedback_* events

The four dedicated event types (`assistant_feedback_click` /
`_reason_view` / `_reason_click` / `_reason_submit`) the PR added to
contracts were sitting unused after the rebase because main's
umbrella `surface_view` / `ui_click` / `feedback_submit_result`
emissions covered the same user gestures. Wire the dedicated events
alongside the umbrella ones so both wire formats fire on every
feedback action — dashboards / evals can pick whichever schema they
were built against without losing signal.

Each dedicated event has stricter typing than its umbrella sibling
(`project_id` / `project_kind` / `conversation_id` are non-null), so
the new emissions are guarded behind a presence check and skipped on
test renders that mount AssistantMessage without project context. The
umbrella emissions retain their nullable fallbacks unchanged.

Pairing:
- surface_view (feedback reason panel) ↔ assistant_feedback_reason_view
- ui_click (feedback button)           ↔ assistant_feedback_click
- ui_click (reason submit button)      ↔ assistant_feedback_reason_click
- feedback_submit_result               ↔ assistant_feedback_reason_submit

Reason click + submit share the existing `requestId` so PostHog can
stitch click→result across both schemas, matching the spec.
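
A minimal sketch of the guarded dual emission, assuming a simplified
`Track` signature and a reduced set of props (the real payloads carry more
fields). `FeedbackIds` and `emitReasonView` are hypothetical names used
only for this illustration.

```typescript
type Track = (event: string, props: Record<string, unknown>) => void;

// Hypothetical identity bag; any field may be missing on test renders.
interface FeedbackIds {
  projectId: string | null;
  projectKind: string | null;
  conversationId: string | null;
}

// The umbrella event always fires; the dedicated event fires only when
// every identity field it requires is present.
function emitReasonView(
  track: Track,
  ids: FeedbackIds,
  rating: 'positive' | 'negative',
): void {
  track('surface_view', {
    project_id: ids.projectId,
    project_kind: ids.projectKind,
    conversation_id: ids.conversationId,
    rating,
  });
  if (ids.projectId && ids.projectKind && ids.conversationId) {
    track('assistant_feedback_reason_view', {
      project_id: ids.projectId,
      project_kind: ids.projectKind,
      conversation_id: ids.conversationId,
      rating,
    });
  }
}

// Record event names to show which emissions happen in each case.
const seen: string[] = [];
const record: Track = (event) => {
  seen.push(event);
};
emitReasonView(
  record,
  { projectId: 'p1', projectKind: 'prototype', conversationId: 'c1' },
  'negative',
);
emitReasonView(
  record,
  { projectId: null, projectKind: null, conversationId: null },
  'negative',
);
```

With full context both events fire; without it only the umbrella event
does, which matches the skip-on-test-render behavior described above.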
lefarcen 2026-05-21 19:28:51 +08:00 committed by GitHub
parent 10e2019c59
commit 6690dbd5bb
18 changed files with 780 additions and 9 deletions

@ -12,7 +12,25 @@ import { isSafeId as isSafeProjectId } from './projects.js';
import { projectKindToTracking } from '@open-design/contracts/analytics';
import { validateBaseUrlResolved } from './connectionTest.js';
export interface RegisterChatRoutesDeps extends RouteDeps<'db' | 'design' | 'http' | 'chat' | 'agents' | 'critique' | 'validation' | 'lifecycle' | 'paths'> {}
// Allowlist for the `/feedback` route. Mirrors the
// ChatMessageFeedbackReasonCode union in packages/contracts/src/api/chat.ts.
// Kept inline (not imported as a runtime value, since the contract type is
// type-only) so a stale client can't poison Langfuse with unknown categories.
const FEEDBACK_REASON_ALLOWLIST: ReadonlySet<string> = new Set([
'matched_request',
'strong_visual',
'useful_structure',
'easy_to_continue',
'followed_design_system',
'missed_request',
'weak_visual',
'incomplete_output',
'hard_to_use',
'missed_design_system',
'other',
]);
export interface RegisterChatRoutesDeps extends RouteDeps<'db' | 'design' | 'http' | 'chat' | 'agents' | 'critique' | 'validation' | 'lifecycle' | 'paths' | 'telemetry'> {}
export function registerChatRoutes(app: Express, ctx: RegisterChatRoutesDeps) {
const { db, design } = ctx;
@ -122,6 +140,74 @@ export function registerChatRoutes(app: Express, ctx: RegisterChatRoutesDeps) {
res.json({ ok: true });
});
// Receives the user's thumbs-up/down (+ reason codes) for an assistant
// turn and forwards it to Langfuse as a `score-create`. Web persists the
// feedback itself via PUT /messages/:id; this endpoint exists only as a
// telemetry side channel — the daemon is the single network egress for
// Langfuse and gates on `telemetry.metrics + telemetry.content` consent.
//
// The consent + sink decision is fast (awaits a small file read, no
// network); we await it so the response status honestly reflects whether
// the score was enqueued, skipped for consent, or skipped because no
// Langfuse sink is configured. The actual Langfuse network call happens
// as a detached promise inside the bridge.
app.post('/api/runs/:id/feedback', async (req, res) => {
const runId = req.params.id;
const body = (req.body ?? {}) as Partial<{
projectId: string;
conversationId: string;
assistantMessageId: string;
rating: 'positive' | 'negative';
reasonCodes: string[];
hasCustomReason: boolean;
customReason: string;
}>;
if (!runId) {
return sendApiError(res, 400, 'INVALID_RUN_ID', 'runId missing');
}
if (body.rating !== 'positive' && body.rating !== 'negative') {
return sendApiError(res, 400, 'INVALID_RATING', 'rating must be positive or negative');
}
// Drop anything outside the contract-side reason allowlist and
// deduplicate; otherwise a malformed or replayed client payload could
// create unknown Langfuse categories or duplicate score ids in the
// same batch.
const reasonCodes = Array.isArray(body.reasonCodes)
? Array.from(
new Set(
body.reasonCodes.filter(
(c): c is string =>
typeof c === 'string' && FEEDBACK_REASON_ALLOWLIST.has(c),
),
),
)
: [];
const customReason = typeof body.customReason === 'string' ? body.customReason : '';
const reportFeedback = ctx.telemetry?.reportFeedback;
if (!reportFeedback) {
res.status(202).json({ status: 'skipped_no_sink' });
return;
}
// Build score metadata bag that lands in the Langfuse score body.
// Mirrors the PostHog event so analysts can cross-reference.
const scoreMetadata: Record<string, unknown> = {
projectId: body.projectId,
conversationId: body.conversationId,
assistantMessageId: body.assistantMessageId,
hasCustomReason: body.hasCustomReason === true,
customReason,
};
const outcome = await reportFeedback({
runId,
rating: body.rating,
reasonCodes,
hasCustomReason: body.hasCustomReason === true,
customReason,
scoreMetadata,
});
res.status(202).json(outcome);
});
app.post('/api/chat', (req, res) => {
if (isDaemonShuttingDown()) {
return sendApiError(res, 503, 'UPSTREAM_UNAVAILABLE', 'daemon is shutting down');
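
The allowlist-and-dedupe step in the `/feedback` route can be isolated into
a small helper, sketched below. `sanitizeReasonCodes` is a hypothetical
name and `ALLOWED` is an illustrative subset; the route's real allowlist
has eleven codes.

```typescript
// Illustrative subset of the route's allowlist.
const ALLOWED: ReadonlySet<string> = new Set([
  'matched_request',
  'missed_request',
  'weak_visual',
  'other',
]);

// Keeps only allowlisted strings; first occurrence wins, order preserved.
function sanitizeReasonCodes(input: unknown): string[] {
  if (!Array.isArray(input)) return [];
  const valid = input.filter(
    (c): c is string => typeof c === 'string' && ALLOWED.has(c),
  );
  return Array.from(new Set(valid));
}

const cleaned = sanitizeReasonCodes([
  'weak_visual',
  42,
  'made_up',
  'weak_visual',
  'other',
]);
```

Non-strings, unknown codes, and repeats are all dropped, so only
`weak_visual` and `other` survive in this example.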

@ -14,9 +14,12 @@ import { readAppConfig } from './app-config.js';
import type { AppVersionInfo } from './app-version.js';
import { listMessages } from './db.js';
import {
readTelemetrySinkConfig,
reportRunCompleted,
reportRunFeedback,
type ArtifactSummary,
type EventsSummary,
type FeedbackReportContext,
type MessageSummary,
type ReportContext,
type RuntimeInfo,
@ -357,3 +360,71 @@ export async function reportRunCompletedFromDaemon(
console.warn('[langfuse-bridge] report failed:', String(err));
}
}
export interface ReportRunFeedbackFromDaemonOpts {
dataDir: string;
runId: string;
rating: 'positive' | 'negative';
reasonCodes: string[];
hasCustomReason: boolean;
/** Raw "other" free text. Empty when no custom reason. */
customReason: string;
/** Extra context for Langfuse score metadata (projectId / conversationId / assistantMessageId). */
scoreMetadata?: Record<string, unknown>;
fetchImpl?: typeof fetch;
}
/**
* Result for the POST /api/runs/:id/feedback handler. Telemetry is
* best-effort and the network call runs after the response is sent, but
* the handler still tells the caller whether the report was at least
* enqueued, which is useful for QA and e2e.
*/
export type FeedbackReportOutcome =
| { status: 'accepted' }
| { status: 'skipped_consent' }
| { status: 'skipped_no_sink' };
export async function reportRunFeedbackFromDaemon(
opts: ReportRunFeedbackFromDaemonOpts,
): Promise<FeedbackReportOutcome> {
let cfg;
try {
cfg = await readAppConfig(opts.dataDir);
} catch (err) {
console.warn('[langfuse-bridge] feedback config read failed:', String(err));
return { status: 'skipped_no_sink' };
}
const prefs = cfg.telemetry ?? {};
if (prefs.metrics !== true || prefs.content !== true) {
return { status: 'skipped_consent' };
}
// Pre-resolve the sink before claiming `accepted`. Avoids advertising a
// successful enqueue to callers when there's no Langfuse endpoint
// configured to ship the score to.
const sink = readTelemetrySinkConfig();
if (!sink) {
return { status: 'skipped_no_sink' };
}
const ctx: FeedbackReportContext = {
runId: opts.runId,
installationId: cfg.installationId ?? null,
prefs,
rating: opts.rating,
reasonCodes: opts.reasonCodes,
hasCustomReason: opts.hasCustomReason,
customReason: opts.customReason,
...(opts.scoreMetadata ? { metadata: opts.scoreMetadata } : {}),
};
// Fire-and-forget the actual network send so the route can respond
// immediately. The handler's response already encodes the consent +
// sink-presence outcome above; failures inside the send are operational
// telemetry, not a client-facing signal.
void reportRunFeedback(
ctx,
opts.fetchImpl ? { fetchImpl: opts.fetchImpl } : {},
).catch((err) => {
console.warn('[langfuse-bridge] feedback report failed:', String(err));
});
return { status: 'accepted' };
}

@ -151,6 +151,29 @@ export interface ReportRunOpts {
fetchImpl?: typeof fetch;
}
/**
* Payload sent to Langfuse when a user thumbs-up/down's an assistant turn.
*
* The `runId` doubles as the Langfuse trace id (same convention used by
* buildTracePayload), so the score lands on the existing trace if the run
* was previously reported. If the run wasn't reported (e.g. content
* consent was off at run completion, then turned on before the user
* scored), Langfuse will accept the score anyway and the trace will
* materialize when/if the daemon backfills it.
*/
export interface FeedbackReportContext {
runId: string;
installationId: string | null;
prefs: TelemetryPrefs;
rating: 'positive' | 'negative';
reasonCodes: string[];
/** Raw "other" free text the user typed. Trimmed; empty string when absent. */
customReason: string;
hasCustomReason: boolean;
/** Optional context bag that ends up in Langfuse score metadata. */
metadata?: Record<string, unknown>;
}
export function readLangfuseConfig(
env: NodeJS.ProcessEnv = process.env,
): LangfuseConfig | null {
@ -658,3 +681,105 @@ export async function reportRunCompleted(
}
await postLangfuseBatch(config, batch, fetchImpl);
}
// Build a Langfuse `score-create` batch for a user-supplied turn rating.
//
// Langfuse scores let evals filter traces by user feedback. We emit one
// NUMERIC score (`user_rating`, +1 / -1) plus optional CATEGORICAL scores
// for each reason code, so the Langfuse UI's score filters work out of
// the box. Raw custom-reason text rides in the score metadata when the
// user opted into telemetry.content; the consent gate lives in
// reportRunFeedback below, so this builder stays content-agnostic.
//
// Limitation: stable score ids (`${traceId}-rating`, `${traceId}-reason-${code}`)
// mean re-submission overwrites cleanly, but reason codes the user removes
// in a follow-up submission do not get a tombstone. A future change can
// thread `removedReasonCodes` through and emit overwriting "cleared"
// scores for them; not done here to keep this PR scoped to the bridge.
export function buildFeedbackPayload(ctx: FeedbackReportContext): unknown[] {
const traceId = ctx.runId;
const nowIso = new Date().toISOString();
const batch: unknown[] = [];
const ratingMetadata: Record<string, unknown> = {
reasonCodes: ctx.reasonCodes,
reasonCount: ctx.reasonCodes.length,
hasCustomReason: ctx.hasCustomReason,
// Raw text — gated upstream by telemetry.content consent.
customReason: ctx.customReason || undefined,
installationId: ctx.installationId ?? undefined,
...(ctx.metadata ?? {}),
};
batch.push({
id: randomUUID(),
type: 'score-create',
timestamp: nowIso,
body: {
id: `${traceId}-rating`,
traceId,
name: 'user_rating',
value: ctx.rating === 'positive' ? 1 : -1,
dataType: 'NUMERIC',
comment: ctx.rating,
metadata: ratingMetadata,
},
});
for (const code of ctx.reasonCodes) {
batch.push({
id: randomUUID(),
type: 'score-create',
timestamp: nowIso,
body: {
// Stable per (run, code) so re-submission overwrites cleanly.
id: `${traceId}-reason-${code}`,
traceId,
name: 'user_rating_reason',
value: code,
dataType: 'CATEGORICAL',
// Group the reason under the rating it was submitted with so a
// "matched_request" tag on a thumbs-down run is still visibly
// negative in the Langfuse UI.
comment: ctx.rating,
},
});
}
return batch;
}
export async function reportRunFeedback(
ctx: FeedbackReportContext,
opts: ReportRunOpts = {},
): Promise<void> {
if (ctx.prefs.metrics !== true) return;
if (ctx.prefs.content !== true) return;
const config = resolveReportConfig(opts);
if (!config) return;
let batch: unknown[];
try {
batch = buildFeedbackPayload(ctx);
} catch (error) {
console.warn(`[langfuse-trace] Feedback payload build error: ${String(error)}`);
return;
}
const serialized = JSON.stringify({ batch });
const serializedBytes = Buffer.byteLength(serialized, 'utf8');
if (serializedBytes > HARD_BATCH_MAX_BYTES) {
console.warn(
`[langfuse-trace] Feedback batch too large (${serializedBytes}B > ${HARD_BATCH_MAX_BYTES}B), dropping feedback for ${ctx.runId}`,
);
return;
}
const fetchImpl = opts.fetchImpl ?? globalThis.fetch;
if (config.kind === 'relay') {
await postRelayBatch(config, serialized, fetchImpl);
return;
}
await postLangfuseBatch(config, batch, fetchImpl);
}

@ -55,6 +55,21 @@ export interface RoutineDeps {
export interface TelemetryDeps {
reportFinalizedMessage: (saved: any, body?: any) => void;
/**
* Best-effort Langfuse score emission for assistant-turn user ratings.
* Returns the categorical outcome so the API surface in chat-routes can
* report back to the web client whether the report was accepted or
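* skipped (consent off / no sink). The promise resolves after the local
* consent + sink preflight; the Langfuse network send itself is
* fire-and-forget, so awaiting this in the handler never waits on the
* network.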
*/
reportFeedback?: (req: {
runId: string;
rating: 'positive' | 'negative';
reasonCodes: string[];
hasCustomReason: boolean;
customReason: string;
scoreMetadata?: Record<string, unknown>;
}) => Promise<{ status: 'accepted' | 'skipped_consent' | 'skipped_no_sink' }>;
}
export interface ServerContext {

@ -184,7 +184,10 @@ import { renderDesignSystemPreview } from './design-system-preview.js';
import { renderDesignSystemShowcase } from './design-system-showcase.js';
import { createChatRunService } from './runs.js';
import { deriveRunErrorCode, runResultFromStatus } from './run-result.js';
import { reportRunCompletedFromDaemon } from './langfuse-bridge.js';
import {
reportRunCompletedFromDaemon,
reportRunFeedbackFromDaemon,
} from './langfuse-bridge.js';
import {
createAnalyticsService,
newInsertId,
@ -4619,6 +4622,19 @@ export async function startServer({
getAppVersion: () => cachedAppVersion,
});
const reportFeedback = (req: {
runId: string;
rating: 'positive' | 'negative';
reasonCodes: string[];
hasCustomReason: boolean;
customReason: string;
scoreMetadata?: Record<string, unknown>;
}) =>
reportRunFeedbackFromDaemon({
dataDir: RUNTIME_DATA_DIR,
...req,
});
// DNS-aware wrapper. The sync `validateBaseUrl` only inspects the literal
// hostname string, so a public DNS name pointing at an internal address
// (`internal.example.com → 10.0.0.5`) still passes. We delegate to
@ -11770,7 +11786,7 @@ export async function startServer({
critique: critiqueDeps,
validation: validationDeps,
lifecycle: { isDaemonShuttingDown: () => daemonShuttingDown },
telemetry: { reportFinalizedMessage, reportFeedback },
});
registerStaticSpaFallback(app, STATIC_DIR);

@ -1,10 +1,13 @@
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import {
buildFeedbackPayload,
buildTracePayload,
readLangfuseConfig,
readTelemetrySinkConfig,
reportRunCompleted,
reportRunFeedback,
type FeedbackReportContext,
type LangfuseConfig,
type ReportContext,
type TelemetrySinkConfig,
@ -749,3 +752,110 @@ describe('reportRunCompleted', () => {
expect(warnSpy).not.toHaveBeenCalled();
});
});
function makeFeedbackCtx(
overrides: Partial<FeedbackReportContext> = {},
): FeedbackReportContext {
return {
runId: 'run-feedback-1',
installationId: 'install-uuid-1',
prefs: { metrics: true, content: true },
rating: 'positive',
reasonCodes: ['matched_request'],
hasCustomReason: false,
customReason: '',
...overrides,
};
}
describe('buildFeedbackPayload', () => {
it('emits a numeric user_rating score plus per-reason categorical scores', () => {
const batch = buildFeedbackPayload(
makeFeedbackCtx({
rating: 'negative',
reasonCodes: ['missed_request', 'weak_visual'],
hasCustomReason: true,
customReason: 'It got the layout wrong on tablet',
}),
) as Array<Record<string, any>>;
expect(batch).toHaveLength(3);
const ratingScore = batch[0]!;
expect(ratingScore.type).toBe('score-create');
expect(ratingScore.body.traceId).toBe('run-feedback-1');
expect(ratingScore.body.name).toBe('user_rating');
expect(ratingScore.body.value).toBe(-1);
expect(ratingScore.body.dataType).toBe('NUMERIC');
expect(ratingScore.body.comment).toBe('negative');
expect(ratingScore.body.metadata).toMatchObject({
reasonCount: 2,
customReason: 'It got the layout wrong on tablet',
hasCustomReason: true,
});
for (const reasonScore of batch.slice(1)) {
expect(reasonScore.body.name).toBe('user_rating_reason');
expect(reasonScore.body.dataType).toBe('CATEGORICAL');
expect(reasonScore.body.comment).toBe('negative');
expect(reasonScore.body.traceId).toBe('run-feedback-1');
}
expect(batch[1]!.body.value).toBe('missed_request');
expect(batch[2]!.body.value).toBe('weak_visual');
});
it('does not emit reason scores when no codes were submitted', () => {
const batch = buildFeedbackPayload(
makeFeedbackCtx({ reasonCodes: [] }),
) as Array<Record<string, any>>;
expect(batch).toHaveLength(1);
expect(batch[0]!.body.name).toBe('user_rating');
expect(batch[0]!.body.value).toBe(1);
});
});
describe('reportRunFeedback', () => {
const TEST_CONFIG: LangfuseConfig = {
baseUrl: 'https://us.cloud.langfuse.com',
authHeader: 'Basic Zm9vOmJhcg==',
retries: 0,
timeoutMs: 1000,
};
beforeEach(() => {
vi.useRealTimers();
});
it('skips when metrics consent is off', async () => {
const fetchSpy = vi.fn();
await reportRunFeedback(makeFeedbackCtx({ prefs: { metrics: false, content: true } }), {
config: TEST_CONFIG,
fetchImpl: fetchSpy as any,
});
expect(fetchSpy).not.toHaveBeenCalled();
});
it('skips when content consent is off', async () => {
const fetchSpy = vi.fn();
await reportRunFeedback(makeFeedbackCtx({ prefs: { metrics: true, content: false } }), {
config: TEST_CONFIG,
fetchImpl: fetchSpy as any,
});
expect(fetchSpy).not.toHaveBeenCalled();
});
it('posts a score-create batch to /api/public/ingestion when consent is on', async () => {
const fetchSpy = vi.fn().mockResolvedValue(
new Response(JSON.stringify({ successes: [], errors: [] }), { status: 207 }),
);
await reportRunFeedback(
makeFeedbackCtx({ reasonCodes: ['matched_request'] }),
{ config: TEST_CONFIG, fetchImpl: fetchSpy as any },
);
expect(fetchSpy).toHaveBeenCalledTimes(1);
const [url, init] = fetchSpy.mock.calls[0]!;
expect(url).toBe('https://us.cloud.langfuse.com/api/public/ingestion');
expect(init.method).toBe('POST');
const body = JSON.parse(init.body);
expect(body.batch).toHaveLength(2);
expect(body.batch[0].type).toBe('score-create');
expect(body.batch[0].body.value).toBe(1);
});
});

@ -53,7 +53,11 @@ import type {
PresentPopoverClickProps,
ShareOptionPopoverClickProps,
AssistantFeedbackButtonClickProps,
AssistantFeedbackClickProps,
AssistantFeedbackReasonClickProps,
AssistantFeedbackReasonSubmitClickProps,
AssistantFeedbackReasonSubmitProps,
AssistantFeedbackReasonViewProps,
SettingsSidebarClickProps,
SettingsExecutionModeTabClickProps,
SettingsLocalCliClickProps,
@ -616,3 +620,47 @@ export function trackSettingsConnectorAuthResult(
): void {
send(track, 'settings_connector_auth_result', props);
}
export function trackAssistantFeedbackClick(
track: Track,
props: AssistantFeedbackClickProps,
) {
track(
'assistant_feedback_click',
props as unknown as Record<string, unknown>,
);
}
export function trackAssistantFeedbackReasonView(
track: Track,
props: AssistantFeedbackReasonViewProps,
) {
track(
'assistant_feedback_reason_view',
props as unknown as Record<string, unknown>,
);
}
export function trackAssistantFeedbackReasonClick(
track: Track,
props: AssistantFeedbackReasonClickProps,
options?: { requestId: string },
) {
track(
'assistant_feedback_reason_click',
props as unknown as Record<string, unknown>,
options,
);
}
export function trackAssistantFeedbackReasonSubmit(
track: Track,
props: AssistantFeedbackReasonSubmitProps,
options?: { requestId: string },
) {
track(
'assistant_feedback_reason_submit',
props as unknown as Record<string, unknown>,
options,
);
}

@ -7,11 +7,20 @@ import { submitChatRunToolResult } from "../providers/daemon";
import { useAnalytics } from "../analytics/provider";
import {
trackAssistantFeedbackButtonClick,
trackAssistantFeedbackClick,
trackAssistantFeedbackReasonClick,
trackAssistantFeedbackReasonPanelSurfaceView,
trackAssistantFeedbackReasonSubmit,
trackAssistantFeedbackReasonSubmitClick,
trackAssistantFeedbackReasonView,
trackFeedbackSubmitResult,
} from "../analytics/events";
import type { TrackingProjectKind } from "@open-design/contracts/analytics";
import {
normalizeCustomReason,
type TrackingFeedbackReasonCode,
type TrackingFeedbackRatingWithNone,
type TrackingProjectKind,
} from "@open-design/contracts/analytics";
import {
splitOnQuestionForms,
type QuestionForm,
@ -550,10 +559,10 @@ function AssistantFeedback({
}) {
const t = useT();
const analytics = useAnalytics();
// P0 — analytics context the feedback events need. The four ids are
// either user-anchored (projectId / assistantMessageId) or run-anchored
// (runId), so we pass them down with a stable identity. `producedFileCount`
// feeds `has_produced_files` on assistant_feedback_button click.
// Analytics context the feedback events need. The four ids are either
// user-anchored (projectId / assistantMessageId) or run-anchored (runId),
// so we pass them down with a stable identity. `producedFileCount` feeds
// `has_produced_files` on assistant_feedback_button click.
const [burstKey, setBurstKey] = useState(0);
const [reasonRating, setReasonRating] =
useState<ChatMessageFeedbackRating | null>(null);
@ -585,6 +594,24 @@ function AssistantFeedback({
run_id: runId ?? "",
rating: reasonRating,
});
// Dedicated assistant_feedback_reason_view event paired with the
// umbrella surface_view above. Requires the full project + conversation
// identity (its props type is stricter than the umbrella variant);
// skipped on test renders that mount AssistantMessage without those.
if (projectId && projectKind && conversationId) {
trackAssistantFeedbackReasonView(analytics.track, {
page: "studio",
area: "chat_panel",
element: "assistant_feedback_reason_panel",
view_type: "panel",
project_id: projectId,
project_kind: projectKind,
conversation_id: conversationId,
assistant_message_id: assistantMessageId,
run_id: runId ?? null,
rating: reasonRating,
});
}
}, [
reasonRating,
analytics.track,
@ -620,6 +647,26 @@ function AssistantFeedback({
rating_before: ratingBefore,
has_produced_files: producedFileCount > 0,
});
// Dedicated assistant_feedback_click paired with the umbrella ui_click
// above. Carries the post-action rating in the widened union (allows
// 'none' for the clear path).
if (projectId && projectKind && conversationId) {
const ratingAfter: TrackingFeedbackRatingWithNone = nextRating ?? "none";
trackAssistantFeedbackClick(analytics.track, {
page: "studio",
area: "chat_panel",
element: "assistant_feedback_button",
action: nextRating ? "submit_feedback_rating" : "clear_feedback_rating",
project_id: projectId,
project_kind: projectKind,
conversation_id: conversationId,
assistant_message_id: assistantMessageId,
run_id: runId ?? null,
rating: ratingAfter,
rating_before: ratingBefore,
has_produced_files: producedFileCount > 0,
});
}
onFeedback(nextRating ? { rating: nextRating } : null);
};
const toggleReasonCode = (code: ChatMessageFeedbackReasonCode) => {
@ -687,6 +734,47 @@ function AssistantFeedback({
},
{ requestId },
);
// Dedicated assistant_feedback_reason_click + reason_submit paired with
// the umbrella ui_click + feedback_submit_result above. Both fire under
// the same `requestId` so PostHog can stitch click → result per the
// tracking spec.
if (projectId && projectKind && conversationId) {
const reasons = reasonCodes as TrackingFeedbackReasonCode[];
const sharedPayload = {
page: "studio" as const,
area: "chat_panel" as const,
project_id: projectId,
project_kind: projectKind,
conversation_id: conversationId,
assistant_message_id: assistantMessageId,
run_id: runId ?? null,
rating: reasonRating,
reason: reasons,
reason_count: reasons.length,
has_custom_reason: hasCustomReason,
custom_reason: hasCustomReason
? normalizeCustomReason(trimmedCustomReason)
: "",
};
trackAssistantFeedbackReasonClick(
analytics.track,
{
...sharedPayload,
element: "assistant_feedback_reason_submit_button",
action: "click_submit_feedback_reason",
},
{ requestId },
);
trackAssistantFeedbackReasonSubmit(
analytics.track,
{
...sharedPayload,
element: "assistant_feedback_reason_submit",
action: "submit_feedback_reason",
},
{ requestId },
);
}
onFeedback({
rating: reasonRating,
reasonCodes,

@ -19,9 +19,11 @@ import {
listActiveChatRuns,
listProjectRuns,
reattachDaemonRun,
reportChatRunFeedback,
streamViaDaemon,
} from '../providers/daemon';
import { fetchElevenLabsVoiceOptions } from '../providers/elevenlabs-voices';
import { normalizeCustomReason } from '@open-design/contracts/analytics';
import {
deletePreviewComment,
fetchPreviewComments,
@ -1516,8 +1518,25 @@ export function ProjectView({
},
true,
);
// Forward affirmative ratings to the daemon → Langfuse `score-create`.
// Clears (change=null) are skipped: there is no tombstone path yet, so
// Langfuse keeps the previous rating until the user re-submits. The
// clear is still captured by the PostHog event, so it is recoverable
// downstream if we ever need it.
const runId = assistantMessage.runId;
if (change && runId && activeConversationId) {
void reportChatRunFeedback({
runId,
projectId: project.id,
conversationId: activeConversationId,
assistantMessageId: assistantMessage.id,
rating: change.rating,
reasonCodes: change.reasonCodes ?? [],
hasCustomReason: !!change.customReason,
customReason: normalizeCustomReason(change.customReason),
});
}
},
[updateMessageById],
[updateMessageById, activeConversationId, project.id],
);
const appendAssistantErrorEvent = useCallback(

@ -365,6 +365,31 @@ export async function submitChatRunToolResult(
}
}
// Forwards the user's assistant-turn rating to the daemon so it can emit
// a Langfuse `score-create`. Fire-and-forget — failures are not surfaced
// to the UI (the rating is already persisted on the message itself via
// the PUT /messages/:id round-trip).
export async function reportChatRunFeedback(req: {
runId: string;
projectId: string;
conversationId: string;
assistantMessageId: string;
rating: 'positive' | 'negative';
reasonCodes: string[];
hasCustomReason: boolean;
customReason: string;
}): Promise<void> {
try {
await fetch(`/api/runs/${encodeURIComponent(req.runId)}/feedback`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(req),
});
} catch {
// Best-effort.
}
}
export async function listActiveChatRuns(
projectId: string,
conversationId: string,
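
A minimal sketch of the request that `reportChatRunFeedback` sends, split into a pure builder so the URL encoding and body shape can be checked without a network call. `FeedbackRequestSketch` and `buildFeedbackRequest` are hypothetical names used for illustration only; they are not part of this PR.

```typescript
// Hypothetical local copy of the request shape used by reportChatRunFeedback.
interface FeedbackRequestSketch {
  runId: string;
  projectId: string;
  conversationId: string;
  assistantMessageId: string;
  rating: 'positive' | 'negative';
  reasonCodes: string[];
  hasCustomReason: boolean;
  customReason: string;
}

// Pure helper (assumption: not in the PR) that mirrors the fetch arguments.
function buildFeedbackRequest(req: FeedbackRequestSketch): {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
} {
  return {
    // encodeURIComponent keeps a run id containing "/" or spaces inside a
    // single path segment.
    url: `/api/runs/${encodeURIComponent(req.runId)}/feedback`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // As in the diff, the whole request (including runId) is serialized.
      body: JSON.stringify(req),
    },
  };
}

const sketchRequest = buildFeedbackRequest({
  runId: 'run/1 a',
  projectId: 'proj-1',
  conversationId: 'conv-1',
  assistantMessageId: 'msg-1',
  rating: 'negative',
  reasonCodes: ['weak_visual'],
  hasCustomReason: false,
  customReason: '',
});
```

Note that `fetch` only rejects on network-level failures, so the `try`/`catch` in the diff does not observe non-2xx responses. That matches the best-effort intent stated in the comment.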


@ -46,6 +46,7 @@ describe('ChatPane streaming state', () => {
render(
<ChatPane
projectKindForTracking="prototype"
messages={messages}
streaming={false}
error={null}
@ -117,6 +118,7 @@ Expected output:
render(
<ChatPane
projectKindForTracking="prototype"
messages={messages}
streaming={false}
error={null}


@ -23,6 +23,8 @@ describe('AssistantMessage tool status', () => {
it('shows Done for a completed run tool use that has no tool result', () => {
const { container } = render(
<AssistantMessage
projectKind="prototype"
conversationId="conv-1"
message={messageWithEvents([
{
kind: 'tool_use',
@ -43,6 +45,8 @@ describe('AssistantMessage tool status', () => {
it('keeps legacy completed messages without runStatus as Done', () => {
const { container } = render(
<AssistantMessage
projectKind="prototype"
conversationId="conv-1"
message={{
...messageWithEvents([
{
@ -66,6 +70,8 @@ describe('AssistantMessage tool status', () => {
it('shows Done in a grouped completed run when tool results are missing', () => {
const { container } = render(
<AssistantMessage
projectKind="prototype"
conversationId="conv-1"
message={messageWithEvents([
{
kind: 'tool_use',
@ -92,6 +98,8 @@ describe('AssistantMessage tool status', () => {
it('does not show Done when a failed run is missing a tool result', () => {
const { container } = render(
<AssistantMessage
projectKind="prototype"
conversationId="conv-1"
message={{
...messageWithEvents([
{
@ -115,6 +123,8 @@ describe('AssistantMessage tool status', () => {
it('does not show Done when a canceled run is missing a tool result', () => {
const { container } = render(
<AssistantMessage
projectKind="prototype"
conversationId="conv-1"
message={{
...messageWithEvents([
{
@ -138,6 +148,8 @@ describe('AssistantMessage tool status', () => {
it('keeps Running for a streaming tool use that has no tool result', () => {
const { container } = render(
<AssistantMessage
projectKind="prototype"
conversationId="conv-1"
message={{
...messageWithEvents([
{


@ -79,6 +79,8 @@ describe('AssistantMessage unfinished todo state', () => {
it('shows a soft no-output state instead of Done for empty API responses', () => {
render(
<AssistantMessage
projectKind="prototype"
conversationId="conv-1"
message={messageWithEvents([
{ kind: 'status', label: 'empty_response', detail: 'deepseek-chat' },
{
@ -101,6 +103,8 @@ describe('AssistantMessage unfinished todo state', () => {
it('keeps Done for a completed latest TodoWrite fixture', () => {
render(
<AssistantMessage
projectKind="prototype"
conversationId="conv-1"
message={messageWithEvents([
{
kind: 'tool_use',
@ -166,6 +170,8 @@ describe('AssistantMessage unfinished todo state', () => {
const onContinue = vi.fn();
render(
<AssistantMessage
projectKind="prototype"
conversationId="conv-1"
message={messageWithEvents([
{
kind: 'tool_use',
@ -213,6 +219,8 @@ describe('AssistantMessage unfinished todo state', () => {
it('hides the continue button on older assistant turns', () => {
render(
<AssistantMessage
projectKind="prototype"
conversationId="conv-1"
message={messageWithEvents([
{
kind: 'tool_use',


@ -113,6 +113,7 @@ function renderChatPane({
onAssistantFeedback,
...render(
<ChatPane
projectKindForTracking="prototype"
messages={messages}
streaming={streaming}
error={null}


@ -135,6 +135,7 @@ function setUserScroll(top: number) {
function chatPaneEl(messages: ChatMessage[], activeConversationId: string | null) {
return (
<ChatPane
projectKindForTracking="prototype"
messages={messages}
streaming={false}
error={null}


@ -21,6 +21,7 @@ import type { ChatMessage, Conversation } from '../../src/types';
function renderChatPane(messages: ChatMessage[]) {
return render(
<ChatPane
projectKindForTracking="prototype"
messages={messages}
streaming={false}
error={null}


@ -33,6 +33,10 @@ export type AnalyticsEventName =
| 'artifact_export_result'
// Feedback
| 'feedback_submit_result'
| 'assistant_feedback_click'
| 'assistant_feedback_reason_view'
| 'assistant_feedback_reason_click'
| 'assistant_feedback_reason_submit'
// Settings
| 'settings_view'
| 'settings_cli_test_result'
@ -158,6 +162,35 @@ export type TrackingRunResult = 'success' | 'failed' | 'cancelled';
export type TrackingExportResult = 'success' | 'failed' | 'cancelled';
export type TrackingTestResult = 'success' | 'failed' | 'timeout';
export type TrackingFeedbackRating = 'positive' | 'negative';
// Click events emit `none` when the user clears a previously-set rating, so
// `rating` (post-state) and `rating_before` (pre-state) on click both use
// this widened union. Reason events still require a concrete rating.
export type TrackingFeedbackRatingWithNone = 'positive' | 'negative' | 'none';
export type TrackingFeedbackAction =
| 'submit_feedback_rating'
| 'clear_feedback_rating';
// Mirrors ChatMessageFeedbackReasonCode in packages/contracts/src/api/chat.ts.
// Kept independent so the analytics wire format can evolve without forcing
// a contract bump on the chat persistence shape.
export type TrackingFeedbackReasonCode =
| 'matched_request'
| 'strong_visual'
| 'useful_structure'
| 'easy_to_continue'
| 'followed_design_system'
| 'missed_request'
| 'weak_visual'
| 'incomplete_output'
| 'hard_to_use'
| 'missed_design_system'
| 'other';
// Product confirmed on 2026-05-13: custom_reason ships the raw text so
// analysts can read the actual feedback. The earlier length-bucket approach
// from the tracking doc draft is no longer in effect.
export type TrackingTokenCountSource =
| 'provider_usage'
| 'estimated'
@ -1215,6 +1248,65 @@ export interface FeedbackSubmitResultProps {
result: TrackingResult;
}
interface AssistantFeedbackBase {
page: 'studio';
area: 'chat_panel';
project_id: string;
project_kind: TrackingProjectKind;
conversation_id: string;
assistant_message_id: string;
// run_id may be absent for messages whose run record is missing or pruned,
// but the product funnel keys off this; we emit `null` rather than dropping
// the field so PostHog can distinguish "no run id" from "field forgotten".
run_id: string | null;
rating: TrackingFeedbackRating;
}
// Click events override `rating` to allow `'none'` because the user can
// clear a previously-set rating; reason_* events still inherit the
// stricter `positive | negative` base since they only fire after the user
// commits to a thumb.
export interface AssistantFeedbackClickProps
extends Omit<AssistantFeedbackBase, 'rating'> {
element: 'assistant_feedback_button';
action: TrackingFeedbackAction;
/** Post-action state. `'none'` when the user just cleared their rating. */
rating: TrackingFeedbackRatingWithNone;
/** Pre-action state. Renamed from `previous_rating` for symmetry with `rating`. */
rating_before: TrackingFeedbackRatingWithNone;
has_produced_files: boolean;
}
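
One way to read the pre/post fields on the click event, sketched under the assumption that the action follows directly from the post-action state: `'none'` means the rating was cleared, anything else means one was set. `clickActionFor` is a hypothetical helper, and the two types are re-declared locally so the snippet stands alone.

```typescript
// Local stand-ins for TrackingFeedbackRatingWithNone / TrackingFeedbackAction.
type RatingWithNone = 'positive' | 'negative' | 'none';
type FeedbackAction = 'submit_feedback_rating' | 'clear_feedback_rating';

// Hypothetical helper: derive the click action from the post-action rating.
function clickActionFor(ratingAfter: RatingWithNone): FeedbackAction {
  return ratingAfter === 'none'
    ? 'clear_feedback_rating'
    : 'submit_feedback_rating';
}

// Example: a user who had a thumbs-up and clicks it again to clear it.
const clearClick: { rating_before: RatingWithNone; rating: RatingWithNone } = {
  rating_before: 'positive',
  rating: 'none',
};
const clearAction = clickActionFor(clearClick.rating);
```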
export interface AssistantFeedbackReasonViewProps extends AssistantFeedbackBase {
element: 'assistant_feedback_reason_panel';
view_type: 'panel';
}
// Shape shared by reason_click (button click) and reason_submit (result).
// Both fire from the same submit handler with the same payload, threaded by
// request_id so PostHog can stitch click→result.
interface AssistantFeedbackReasonResultBase extends AssistantFeedbackBase {
reason: TrackingFeedbackReasonCode[];
reason_count: number;
has_custom_reason: boolean;
/** Raw free-text the user typed in the "other" input. Empty string when
* the user didn't select "other" or left the field blank. Product
* confirmed on 2026-05-13 that the raw text ships (no length bucketing). */
custom_reason: string;
}
export interface AssistantFeedbackReasonClickProps
extends AssistantFeedbackReasonResultBase {
element: 'assistant_feedback_reason_submit_button';
action: 'click_submit_feedback_reason';
}
export interface AssistantFeedbackReasonSubmitProps
extends AssistantFeedbackReasonResultBase {
element: 'assistant_feedback_reason_submit';
action: 'submit_feedback_reason';
}
// SETTINGS view + result events (page=settings)
export interface SettingsViewProps {
page_name: TrackingSettingsPage;
@ -1264,6 +1356,19 @@ export type AnalyticsEventPayload =
| { event: 'file_upload_result'; props: FileUploadResultProps }
| { event: 'artifact_export_result'; props: ArtifactExportResultProps }
| { event: 'feedback_submit_result'; props: FeedbackSubmitResultProps }
| { event: 'assistant_feedback_click'; props: AssistantFeedbackClickProps }
| {
event: 'assistant_feedback_reason_view';
props: AssistantFeedbackReasonViewProps;
}
| {
event: 'assistant_feedback_reason_click';
props: AssistantFeedbackReasonClickProps;
}
| {
event: 'assistant_feedback_reason_submit';
props: AssistantFeedbackReasonSubmitProps;
}
| { event: 'settings_view'; props: SettingsViewProps }
| { event: 'settings_cli_test_result'; props: SettingsCliTestResultProps }
| { event: 'settings_byok_test_result'; props: SettingsByokTestResultProps }
@ -1567,3 +1672,13 @@ export function deriveConfigureGlobals(
configure_availability: configureAvailability,
};
}
// Normalize the "other" custom-reason free text for transport. Trims
// whitespace and returns an empty string when the text is null, undefined,
// or blank. Callers should pass the raw text only when `has_custom_reason`
// is true; the helper itself is permissive and does not check that flag.
export function normalizeCustomReason(
text: string | null | undefined,
): string {
return (text ?? '').trim();
}
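
A short usage sketch for `normalizeCustomReason`, re-declared locally so the snippet runs on its own. Deriving the boolean flag from the normalized text is an assumption made for illustration, not what the PR's call site does: it keeps `hasCustomReason` and `customReason` in agreement when the input is whitespace only. `toCustomReasonFields` is a hypothetical name.

```typescript
// Local copy of the helper from the diff above, so the sketch is self-contained.
function normalizeCustomReason(text: string | null | undefined): string {
  return (text ?? '').trim();
}

// Hypothetical helper: derive both transport fields from one normalized value.
function toCustomReasonFields(raw: string | null | undefined): {
  hasCustomReason: boolean;
  customReason: string;
} {
  const customReason = normalizeCustomReason(raw);
  return { hasCustomReason: customReason.length > 0, customReason };
}

// Whitespace-only input normalizes to an empty string and a false flag.
const blankFields = toCustomReasonFields('   ');
// Surrounding whitespace is trimmed; inner spacing is preserved.
const typedFields = toCustomReasonFields('  colors feel off  ');
```

By contrast, a plain truthiness check on the raw text (`!!raw`) reports `true` for a whitespace-only string even though the normalized text is empty.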


@ -70,6 +70,34 @@ export interface ChatMessageFeedback {
updatedAt?: number;
}
/**
* POST /api/runs/:runId/feedback relays the user's assistant-turn rating
* to Langfuse as a `score-create` so evals can filter traces by feedback.
* The daemon is the single network egress point for telemetry (web never
* talks to Langfuse directly), and gates this on `telemetry.metrics +
* telemetry.content` consent independently of what the browser thinks.
*
* `customReason` ships the raw free text the user typed in the "other"
* input (trimmed). Product confirmed on 2026-05-13 that analysts need the
* text to make sense of the feedback; this is consent-gated behind
* `telemetry.content` like the rest of the message-content telemetry.
*/
export interface ChatRunFeedbackRequest {
projectId: string;
conversationId: string;
assistantMessageId: string;
rating: ChatMessageFeedbackRating;
reasonCodes: ChatMessageFeedbackReasonCode[];
hasCustomReason: boolean;
/** Raw "other" free text (trimmed). Empty string when no custom reason. */
customReason: string;
}
export interface ChatRunFeedbackResponse {
/** `'accepted'` once the daemon has enqueued the score; the `skipped_*` values mean nothing was sent (consent off, or no sink available). */
status: 'accepted' | 'skipped_consent' | 'skipped_no_sink';
}
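
A rough sketch of the consent gate and score mapping described in the PR summary: one `user_rating` entry (NUMERIC, +1 or -1) plus one `user_rating_reason` entry (CATEGORICAL) per reason code, sent only when both `telemetry.metrics` and `telemetry.content` are granted. `ScoreSketch`, `ConsentSketch`, and `planFeedbackScores` are hypothetical names, and the object shape is illustrative rather than the Langfuse SDK's own types or the daemon's actual implementation.

```typescript
type RatingSketch = 'positive' | 'negative';

// Illustrative score shape; field names are assumptions, not Langfuse's API.
interface ScoreSketch {
  name: 'user_rating' | 'user_rating_reason';
  dataType: 'NUMERIC' | 'CATEGORICAL';
  value: number | string;
}

interface ConsentSketch {
  metrics: boolean;
  content: boolean;
}

// Hypothetical planner: decides whether to send and what entries to create.
function planFeedbackScores(
  consent: ConsentSketch,
  rating: RatingSketch,
  reasonCodes: string[],
): { status: 'accepted' | 'skipped_consent'; scores: ScoreSketch[] } {
  // Both consent flags must be on before anything leaves the daemon.
  if (!consent.metrics || !consent.content) {
    return { status: 'skipped_consent', scores: [] };
  }
  const scores: ScoreSketch[] = [
    { name: 'user_rating', dataType: 'NUMERIC', value: rating === 'positive' ? 1 : -1 },
    // One categorical entry per reason code.
    ...reasonCodes.map((code): ScoreSketch => ({
      name: 'user_rating_reason',
      dataType: 'CATEGORICAL',
      value: code,
    })),
  ];
  return { status: 'accepted', scores };
}

const plannedSketch = planFeedbackScores(
  { metrics: true, content: true },
  'negative',
  ['weak_visual', 'other'],
);
```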
export interface ChatRunCreateResponse {
runId: string;
appliedPluginSnapshotId?: string;