open-design/apps/daemon/tests/langfuse-trace.test.ts
lefarcen 6690dbd5bb
feat(analytics): PostHog + Langfuse instrumentation for assistant feedback (#1558)
* feat(analytics): PostHog + Langfuse instrumentation for assistant feedback

Re-bases the original three-commit PR onto release/v0.8.0. The web-side
feedback UI instrumentation (surface_view / ui_click / feedback_submit_result)
landed on main while this branch was open, so on this rebase that wiring
is taken from main; the remaining net additions are:

- Contracts: TrackingFeedback* enums and the four dedicated
  assistant_feedback_* event payload types (click, reason_view,
  reason_click, reason_submit), plus normalizeCustomReason helper.
  The new event-name variants are added to TrackingEventName and the
  AnalyticsEventPayload discriminated union next to the existing
  surface_view/ui_click variants — both wire formats coexist.
- POST /api/runs/:id/feedback in apps/daemon/src/chat-routes.ts:
  thin route that validates rating, allowlists reasonCodes through a
  simple string filter, and fire-and-forgets into the daemon's
  reportFeedback hook.
- apps/daemon/src/langfuse-bridge.ts reportRunFeedbackFromDaemon
  forwards the rating + reasonCodes into Langfuse as user_rating
  (NUMERIC ±1) + user_rating_reason (CATEGORICAL, one per code)
  score-create entries. Gates on telemetry.metrics + telemetry.content.
- apps/web/src/providers/daemon.ts reportChatRunFeedback (fire-and-forget
  fetch) and apps/web/src/components/ProjectView.tsx wiring so each
  thumbs-up/down + reason submission posts the side-channel.
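
A minimal sketch of the score batch described above, assuming the field names the tests in this file assert on (`traceId`, `name`, `value`, `dataType`, `comment`). The function name `sketchFeedbackScores` and the `id` scheme are hypothetical and are not taken from `langfuse-trace.ts`.

```typescript
// Hypothetical types mirroring the fields asserted in the buildFeedbackPayload tests.
type FeedbackRating = 'positive' | 'negative';

interface ScoreCreateEvent {
  type: 'score-create';
  body: {
    id: string; // assumed stable-id scheme, not confirmed by the source
    traceId: string;
    name: 'user_rating' | 'user_rating_reason';
    value: number | string;
    dataType: 'NUMERIC' | 'CATEGORICAL';
    comment: FeedbackRating;
  };
}

function sketchFeedbackScores(
  runId: string,
  rating: FeedbackRating,
  reasonCodes: readonly string[],
): ScoreCreateEvent[] {
  // One NUMERIC score carrying the thumbs direction as +1 / -1.
  const ratingScore: ScoreCreateEvent = {
    type: 'score-create',
    body: {
      id: `${runId}-user_rating`,
      traceId: runId,
      name: 'user_rating',
      value: rating === 'positive' ? 1 : -1,
      dataType: 'NUMERIC',
      comment: rating,
    },
  };
  // One CATEGORICAL score per reason code, in submission order.
  const reasonScores = reasonCodes.map(
    (code): ScoreCreateEvent => ({
      type: 'score-create',
      body: {
        id: `${runId}-user_rating_reason-${code}`,
        traceId: runId,
        name: 'user_rating_reason',
        value: code,
        dataType: 'CATEGORICAL',
        comment: rating,
      },
    }),
  );
  return [ratingScore, ...reasonScores];
}
```

A negative rating with two reason codes yields three entries: the rating score first, then one entry per code.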

Conflicts resolved (release/v0.8.0 vs the branch's old base):
- packages/contracts/src/analytics/events.ts: keep main's
  file_upload_result / feedback_submit_result / settings_* event
  variants alongside the new assistant_feedback_* additions.
- apps/daemon/src/server.ts: keep DNS-aware validateExternalApiBaseUrl,
  add reportFeedback closure wired into registerChatRoutes telemetry.
- apps/daemon/src/chat-routes.ts: keep both /tool-result and the new
  /feedback routes; merge RegisterChatRoutesDeps to include both
  'paths' and 'telemetry'. Drop PR's chat-routes-local
  reconcileAssistantMessageOnRunEnd helper (main has the equivalent in
  server.ts).
- apps/web/src/components/ChatPane.tsx & AssistantMessage.tsx & ProjectView.tsx:
  keep main's projectKindForTracking prop name and its existing
  emission of surface_view / ui_click / feedback_submit_result; the
  PR's analyticsCtx-based reason_view/click/submit emission is dropped
  in this rebase since it would duplicate the existing wire format.
- apps/web/tests/components/*: rename projectKind → projectKindForTracking
  to match ChatPane's current prop name.

Outstanding review feedback (from the pre-rebase round, will be
addressed in a follow-up commit):
- AssistantMessage tests not yet passing the new feedback context to
  the direct render path.
- ProjectView clear-feedback path skips reportChatRunFeedback, leaving
  stale Langfuse user_rating scores.
- buildFeedbackPayload has no deletion path for previously-submitted
  user_rating_reason scores when the user switches thumbs.
- POST /api/runs/:id/feedback always returns {status:'accepted'} even
  when consent is off; needs to surface skipped_consent / skipped_no_sink.
- reasonCodes are filtered to string[] but not allowlisted against
  ChatMessageFeedbackReasonCode or deduped.

* fix(analytics): address review on assistant feedback rebase

Picks up the in-scope correctness items from the prior review round
and the rebase residue without rewriting history:

- chat-routes.ts: `/feedback` now awaits the daemon's preflight
  outcome and echoes it as the response. The contract was already
  shaped as `accepted | skipped_consent | skipped_no_sink`, but the
  previous handler always returned `accepted` because the network
  send was fire-and-forget. The consent + sink decision is local
  (a small file read and an env-var lookup); the actual Langfuse
  upload still runs as a detached promise.
- chat-routes.ts: reasonCodes are now allowlisted against the
  contract's reason-code union and deduplicated before reaching
  Langfuse, so a stale or replayed client can't poison the
  Langfuse score table with unknown categorical values or
  duplicate stable ids in the same batch.
- langfuse-bridge.ts: split the consent + sink resolution from the
  fire-and-forget network send so the route can claim `accepted`
  honestly. The legacy `skipped_no_sink` return on app-config read
  failure is preserved.
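
The route-side behaviour above can be sketched roughly as follows. This is an illustration, not the code in `chat-routes.ts` or `langfuse-bridge.ts`: the helper names are hypothetical, the allowlist only contains the reason codes that appear in this commit and its tests (the real union may be larger), and checking consent before the sink is an assumed ordering.

```typescript
// Partial allowlist: only the codes visible in this commit; the contract may define more.
const KNOWN_REASON_CODES = new Set<string>([
  'matched_request',
  'missed_request',
  'weak_visual',
  'followed_design_system',
  'missed_design_system',
]);

// Keep known string codes only, dropping duplicates while preserving first-seen order.
function sanitizeReasonCodes(input: unknown): string[] {
  if (!Array.isArray(input)) return [];
  const seen = new Set<string>();
  for (const item of input) {
    if (typeof item === 'string' && KNOWN_REASON_CODES.has(item)) {
      seen.add(item);
    }
  }
  return Array.from(seen);
}

type FeedbackOutcome = 'accepted' | 'skipped_consent' | 'skipped_no_sink';

// Local preflight: decide the response before any network send is started.
function preflightFeedback(
  prefs: { metrics: boolean; content: boolean },
  hasSink: boolean,
): FeedbackOutcome {
  if (!prefs.metrics || !prefs.content) return 'skipped_consent';
  if (!hasSink) return 'skipped_no_sink';
  return 'accepted';
}
```

Because the preflight is synchronous and local, the handler can return its result immediately and still leave the upload as a detached promise.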

Contracts + comment hygiene:
- TrackingFeedbackReasonCode in packages/contracts/src/analytics/events.ts
  drifted from ChatMessageFeedbackReasonCode in packages/contracts/src/api/chat.ts;
  add `followed_design_system` and `missed_design_system` so the
  analytics wire format stays aligned with the persistence shape.
- langfuse-trace.ts buildFeedbackPayload: the docblock claimed the
  raw custom-reason text is bucketed before send. Product reversed
  that on 2026-05-13 (raw text now ships, consent-gated). Replace
  the stale comment with the real semantics + a note that there is
  no tombstone path for reason codes the user removes in a
  follow-up submission (left as scope for a later PR).
- AssistantMessage.tsx: remove the now-unused
  `AssistantFeedbackAnalyticsCtx` interface, revert a stray blank-line
  deletion left over from the rebase, and restore the analytics-context
  comment above the feedback hook.

Left as follow-up (intentional, documented in code):
- Sending a tombstone score when the user clears their rating —
  ProjectView still skips reportChatRunFeedback on `change===null`,
  so Langfuse retains the previous rating until the user re-submits.
  The PostHog event captures the clear separately.
- Removing reason-code scores when the user re-submits with a
  smaller set — buildFeedbackPayload only overwrites the codes
  present in the current payload.

* feat(analytics): wire PR's dedicated assistant_feedback_* events

The four dedicated event types (`assistant_feedback_click` /
`_reason_view` / `_reason_click` / `_reason_submit`) the PR added to
contracts were sitting unused after the rebase because main's
umbrella `surface_view` / `ui_click` / `feedback_submit_result`
emissions covered the same user gestures. Wire the dedicated events
alongside the umbrella ones so both wire formats fire on every
feedback action — dashboards / evals can pick whichever schema they
were built against without losing signal.

Each dedicated event has stricter typing than its umbrella sibling
(`project_id` / `project_kind` / `conversation_id` are non-null), so
the new emissions are guarded behind a presence check and skipped on
test renders that mount AssistantMessage without project context. The
umbrella emissions retain their nullable fallbacks unchanged.
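
A presence check of this kind might look like the sketch below. The `project_id` / `project_kind` / `conversation_id` field names come from the description above; the nullable input shape and the `toStrictContext` helper are assumptions made for illustration.

```typescript
// Hypothetical nullable context, as a component might receive it in a bare test render.
interface NullableFeedbackContext {
  projectId: string | null;
  projectKind: string | null;
  conversationId: string | null;
}

// Fields the dedicated events require to be non-null.
interface StrictFeedbackContext {
  project_id: string;
  project_kind: string;
  conversation_id: string;
}

// Returns the strict shape when every field is present, otherwise null
// so the caller can skip the dedicated emission.
function toStrictContext(
  ctx: NullableFeedbackContext,
): StrictFeedbackContext | null {
  if (!ctx.projectId || !ctx.projectKind || !ctx.conversationId) {
    return null;
  }
  return {
    project_id: ctx.projectId,
    project_kind: ctx.projectKind,
    conversation_id: ctx.conversationId,
  };
}
```

A caller would emit the umbrella event unconditionally and the dedicated event only when `toStrictContext` returns a non-null value.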

Pairing:
- surface_view (feedback reason panel) ↔ assistant_feedback_reason_view
- ui_click (feedback button)           ↔ assistant_feedback_click
- ui_click (reason submit button)      ↔ assistant_feedback_reason_click
- feedback_submit_result               ↔ assistant_feedback_reason_submit

Reason click + submit share the existing `requestId` so PostHog can
stitch click→result across both schemas, matching the spec.
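
As an illustration of the shared `requestId`, the sketch below fires an umbrella event and its dedicated sibling with the same id. The event names are the ones paired above; the `TrackedEvent` shape, the `track` callback and `emitPaired` are hypothetical.

```typescript
// Hypothetical event shape and tracker signature, for illustration only.
interface TrackedEvent {
  event: string;
  requestId: string;
}

// Umbrella event followed by its dedicated sibling, per the pairing list.
const REASON_EVENT_PAIRS = {
  click: ['ui_click', 'assistant_feedback_reason_click'],
  submit: ['feedback_submit_result', 'assistant_feedback_reason_submit'],
} as const;

// Emit both wire formats for one user gesture, tagged with one requestId.
function emitPaired(
  kind: keyof typeof REASON_EVENT_PAIRS,
  requestId: string,
  track: (e: TrackedEvent) => void,
): void {
  for (const event of REASON_EVENT_PAIRS[kind]) {
    track({ event, requestId });
  }
}
```

Calling `emitPaired('click', id, track)` and later `emitPaired('submit', id, track)` produces four events that all carry the same id.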
2026-05-21 19:28:51 +08:00

861 lines
27 KiB
TypeScript

import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import {
buildFeedbackPayload,
buildTracePayload,
readLangfuseConfig,
readTelemetrySinkConfig,
reportRunCompleted,
reportRunFeedback,
type FeedbackReportContext,
type LangfuseConfig,
type ReportContext,
type TelemetrySinkConfig,
} from '../src/langfuse-trace.js';

function makeCtx(overrides: Partial<ReportContext> = {}): ReportContext {
  const base: ReportContext = {
    installationId: 'install-uuid-1',
    projectId: 'proj-1',
    conversationId: 'conv-uuid-aaaa-bbbb-cccc-dddd-eeeeeeeeeeee',
    agentId: 'claude',
    run: {
      runId: 'run-1',
      status: 'succeeded',
      startedAt: 1_700_000_000_000,
      endedAt: 1_700_000_004_500,
    },
    message: {
      messageId: 'msg-1',
      prompt: 'Make a landing page for a coffee shop.',
      output: 'Here is a landing page draft …',
      usage: {
        inputTokens: 1234,
        outputTokens: 567,
        totalTokens: 1801,
      },
    },
    artifacts: [],
    tools: [
      {
        id: 'tool-1',
        name: 'Bash',
        startedAt: 1_700_000_001_000,
        endedAt: 1_700_000_001_800,
        input: '{"command":"ls -la"}',
        output: 'total 0',
      },
      {
        id: 'tool-2',
        name: 'Write',
        startedAt: 1_700_000_002_000,
        endedAt: 1_700_000_002_900,
        input: '{"path":"index.html"}',
        output: 'wrote index.html',
      },
    ],
    eventsSummary: { toolCalls: 2, errors: 0, durationMs: 4500 },
    prefs: { metrics: true, content: false, artifactManifest: false },
  };
  return { ...base, ...overrides };
}

const TEST_CONFIG: LangfuseConfig = {
  authHeader: 'Basic dGVzdA==',
  baseUrl: 'https://us.cloud.langfuse.com',
  timeoutMs: 20_000,
  retries: 0,
};

function bodyOf(
  batch: unknown[],
  type: string,
  name?: string,
): Record<string, any> {
  const event = (batch as Array<{ type: string; body: Record<string, any> }>).find(
    (item) => item.type === type && (name === undefined || item.body.name === name),
  );
  expect(event).toBeTruthy();
  return event!.body;
}

describe('readLangfuseConfig', () => {
it('returns null when keys are missing', () => {
expect(readLangfuseConfig({})).toBeNull();
expect(readLangfuseConfig({ LANGFUSE_PUBLIC_KEY: 'pk' })).toBeNull();
expect(readLangfuseConfig({ LANGFUSE_SECRET_KEY: 'sk' })).toBeNull();
});
it('returns null when keys are whitespace-only', () => {
expect(
readLangfuseConfig({
LANGFUSE_PUBLIC_KEY: ' ',
LANGFUSE_SECRET_KEY: 'sk',
}),
).toBeNull();
});
it('builds Basic auth header from public:secret', () => {
const cfg = readLangfuseConfig({
LANGFUSE_PUBLIC_KEY: 'pk-lf-abc',
LANGFUSE_SECRET_KEY: 'sk-lf-xyz',
});
expect(cfg).not.toBeNull();
const expected =
'Basic ' + Buffer.from('pk-lf-abc:sk-lf-xyz').toString('base64');
expect(cfg!.authHeader).toBe(expected);
});
it('uses default US base URL when LANGFUSE_BASE_URL is absent', () => {
const cfg = readLangfuseConfig({
LANGFUSE_PUBLIC_KEY: 'pk',
LANGFUSE_SECRET_KEY: 'sk',
});
expect(cfg!.baseUrl).toBe('https://us.cloud.langfuse.com');
});
it('honours LANGFUSE_BASE_URL and strips trailing slashes', () => {
const cfg = readLangfuseConfig({
LANGFUSE_PUBLIC_KEY: 'pk',
LANGFUSE_SECRET_KEY: 'sk',
LANGFUSE_BASE_URL: 'https://cloud.langfuse.com//',
});
expect(cfg!.baseUrl).toBe('https://cloud.langfuse.com');
});
it('reads optional timeout and retry tuning from env', () => {
const cfg = readLangfuseConfig({
LANGFUSE_PUBLIC_KEY: 'pk',
LANGFUSE_SECRET_KEY: 'sk',
LANGFUSE_TIMEOUT_MS: '45000',
LANGFUSE_RETRIES: '2',
});
expect(cfg!.timeoutMs).toBe(45_000);
expect(cfg!.retries).toBe(2);
});
it('falls back when timeout and retry env values are invalid', () => {
const cfg = readLangfuseConfig({
LANGFUSE_PUBLIC_KEY: 'pk',
LANGFUSE_SECRET_KEY: 'sk',
LANGFUSE_TIMEOUT_MS: '-1',
LANGFUSE_RETRIES: '-2',
});
expect(cfg!.timeoutMs).toBe(20_000);
expect(cfg!.retries).toBe(1);
});
});
describe('readTelemetrySinkConfig', () => {
it('prefers the Open Design telemetry relay when configured', () => {
const cfg = readTelemetrySinkConfig({
OPEN_DESIGN_TELEMETRY_RELAY_URL: 'https://telemetry.open-design.ai/api/langfuse//',
LANGFUSE_PUBLIC_KEY: 'pk',
LANGFUSE_SECRET_KEY: 'sk',
});
expect(cfg).toEqual({
kind: 'relay',
relayUrl: 'https://telemetry.open-design.ai/api/langfuse',
timeoutMs: 20_000,
retries: 1,
});
});
it('uses relay-specific timeout and retry tuning when present', () => {
const cfg = readTelemetrySinkConfig({
OPEN_DESIGN_TELEMETRY_RELAY_URL: 'https://telemetry.open-design.ai/api/langfuse',
OPEN_DESIGN_TELEMETRY_TIMEOUT_MS: '30000',
OPEN_DESIGN_TELEMETRY_RETRIES: '3',
LANGFUSE_TIMEOUT_MS: '1',
LANGFUSE_RETRIES: '0',
});
expect(cfg).toMatchObject({
kind: 'relay',
timeoutMs: 30_000,
retries: 3,
});
});
it('falls back to direct Langfuse config for local smoke tests', () => {
const cfg = readTelemetrySinkConfig({
LANGFUSE_PUBLIC_KEY: 'pk',
LANGFUSE_SECRET_KEY: 'sk',
});
expect(cfg).toMatchObject({
kind: 'langfuse',
baseUrl: 'https://us.cloud.langfuse.com',
});
});
});
describe('buildTracePayload', () => {
it('emits a trace with nested agent + generation observations', () => {
const batch = buildTracePayload(makeCtx());
const types = (batch as Array<{ type: string }>).map((e) => e.type);
expect(types).toEqual([
'trace-create',
'span-create',
'generation-create',
'span-create',
'span-create',
]);
const span = bodyOf(batch, 'span-create', 'agent-run');
const gen = bodyOf(batch, 'generation-create', 'llm');
const bash = bodyOf(batch, 'span-create', 'tool:Bash');
const write = bodyOf(batch, 'span-create', 'tool:Write');
expect(span.id).toBe('run-1-agent');
expect(span.traceId).toBe('run-1');
expect(gen.traceId).toBe('run-1');
expect(gen.parentObservationId).toBe('run-1-agent');
expect(bash.parentObservationId).toBe('run-1-agent');
expect(bash.input).toBeUndefined();
expect(bash.output).toBeUndefined();
expect(bash.metadata.toolName).toBe('Bash');
expect(write.parentObservationId).toBe('run-1-agent');
});
it('omits prompt + output when content gate is off', () => {
const batch = buildTracePayload(makeCtx());
const trace = (batch[0] as any).body;
const span = bodyOf(batch, 'span-create', 'agent-run');
const gen = bodyOf(batch, 'generation-create', 'llm');
const tool = bodyOf(batch, 'span-create', 'tool:Bash');
expect(trace.input).toBeUndefined();
expect(trace.output).toBeUndefined();
expect(span.input).toBeUndefined();
expect(span.output).toBeUndefined();
expect(gen.input).toBeUndefined();
expect(gen.output).toBeUndefined();
expect(tool.input).toBeUndefined();
expect(tool.output).toBeUndefined();
});
it('includes prompt + output when content gate is on', () => {
const batch = buildTracePayload(
makeCtx({
prefs: { metrics: true, content: true, artifactManifest: false },
}),
);
const trace = (batch[0] as any).body;
const tool = bodyOf(batch, 'span-create', 'tool:Bash');
expect(trace.input).toMatch(/coffee shop/);
expect(trace.output).toMatch(/landing page draft/);
expect(tool.input).toMatch(/ls -la/);
expect(tool.output).toBe('total 0');
});
it('truncates ASCII prompt at 8 KB and output at 16 KB (bytes == chars)', () => {
const longPrompt = 'a'.repeat(20_000);
const longOutput = 'b'.repeat(40_000);
const batch = buildTracePayload(
makeCtx({
message: {
messageId: 'msg-1',
prompt: longPrompt,
output: longOutput,
},
prefs: { metrics: true, content: true, artifactManifest: false },
}),
);
const trace = (batch[0] as any).body;
expect(Buffer.byteLength(trace.input, 'utf8')).toBe(8 * 1024);
expect(Buffer.byteLength(trace.output, 'utf8')).toBe(16 * 1024);
});
it('truncates by UTF-8 bytes, not by JS string length, for multi-byte text', () => {
// Each CJK character is 3 bytes in UTF-8 but 1 unit in String.length.
// 4096 chars × 3 bytes = 12_288 bytes, well over the 8 KB input cap.
const longCJK = '设'.repeat(4096);
expect(longCJK.length).toBe(4096);
expect(Buffer.byteLength(longCJK, 'utf8')).toBe(12_288);
const batch = buildTracePayload(
makeCtx({
message: { messageId: 'msg-1', prompt: longCJK, output: '' },
prefs: { metrics: true, content: true, artifactManifest: false },
}),
);
const trace = (batch[0] as any).body;
expect(Buffer.byteLength(trace.input, 'utf8')).toBeLessThanOrEqual(8 * 1024);
// Boundary safety: the trimmed result must still be valid UTF-8 (no
// half-encoded characters). Round-tripping through Buffer should be
// lossless if the cut landed correctly.
expect(Buffer.from(trace.input as string, 'utf8').toString('utf8')).toBe(
trace.input,
);
// And every character is still '设', i.e. we didn't mangle the encoding.
expect(/^设+$/.test(trace.input as string)).toBe(true);
});
it('omits artifacts when manifest gate is off', () => {
const batch = buildTracePayload(
makeCtx({
artifacts: [
{ slug: 'a', type: 'html', sizeBytes: 100 },
{ slug: 'b', type: 'jsx', sizeBytes: 200 },
],
}),
);
const trace = (batch[0] as any).body;
expect(trace.metadata.artifacts).toBeUndefined();
expect(trace.metadata.artifactsTruncated).toBeUndefined();
});
it('caps artifacts at 50 entries with a truncation flag', () => {
const many = Array.from({ length: 75 }, (_, i) => ({
slug: `art-${i}`,
type: 'html',
sizeBytes: 1,
}));
const batch = buildTracePayload(
makeCtx({
artifacts: many,
prefs: { metrics: true, content: false, artifactManifest: true },
}),
);
const trace = (batch[0] as any).body;
expect(trace.metadata.artifacts).toHaveLength(50);
expect(trace.metadata.artifactsTruncated).toBe(true);
});
it('keeps eventsSummary metadata regardless of content / artifact gates', () => {
const batch = buildTracePayload(makeCtx());
const trace = (batch[0] as any).body;
expect(trace.metadata.eventsSummary).toEqual({
toolCalls: 2,
errors: 0,
durationMs: 4500,
});
});
it('records token counts in metadata.tokens and generation.usage', () => {
const batch = buildTracePayload(makeCtx());
const trace = (batch[0] as any).body;
const gen = bodyOf(batch, 'generation-create', 'llm');
expect(trace.metadata.tokens).toEqual({
input: 1234,
output: 567,
total: 1801,
});
expect(gen.usage).toEqual({
input: 1234,
output: 567,
total: 1801,
unit: 'TOKENS',
});
});
it('uses conversationId as sessionId when within length limit', () => {
const batch = buildTracePayload(makeCtx());
expect((batch[0] as any).body.sessionId).toBe(
'conv-uuid-aaaa-bbbb-cccc-dddd-eeeeeeeeeeee',
);
});
it('drops sessionId when conversationId exceeds 200 chars', () => {
const batch = buildTracePayload(
makeCtx({ conversationId: 'x'.repeat(201) }),
);
expect((batch[0] as any).body.sessionId).toBeUndefined();
});
it('builds tag list with project + agent + extras', () => {
const batch = buildTracePayload(
makeCtx({ extraTags: ['legacy:tag'] }),
);
expect((batch[0] as any).body.tags).toEqual([
'open-design',
'project:proj-1',
'agent:claude',
'legacy:tag',
]);
});
it('adds turn-level tags (model / skill / DS) and runtime tags (os / client)', () => {
const batch = buildTracePayload(
makeCtx({
turn: {
model: 'gpt-4o',
reasoning: 'high',
skillId: 'landing-page',
designSystemId: 'mission-control',
},
runtime: {
os: 'darwin',
arch: 'arm64',
nodeVersion: 'v22.22.0',
appVersion: '0.5.0',
clientType: 'desktop',
},
}),
);
expect((batch[0] as any).body.tags).toEqual([
'open-design',
'project:proj-1',
'agent:claude',
'model:gpt-4o',
'skill:landing-page',
'ds:mission-control',
'os:darwin',
'client:desktop',
]);
});
it('promotes model + reasoning to first-class generation fields', () => {
const batch = buildTracePayload(
makeCtx({
turn: { model: 'claude-sonnet-4-5', reasoning: 'high' },
}),
);
const gen = bodyOf(batch, 'generation-create', 'llm');
expect(gen.model).toBe('claude-sonnet-4-5');
expect(gen.modelParameters).toEqual({ reasoning: 'high' });
});
it('omits modelParameters entirely when reasoning is unset', () => {
const batch = buildTracePayload(
makeCtx({ turn: { model: 'gpt-4o' } }),
);
const gen = bodyOf(batch, 'generation-create', 'llm');
expect(gen.model).toBe('gpt-4o');
expect(gen.modelParameters).toBeUndefined();
});
it('mirrors runtime + turn fields into trace metadata for query / export', () => {
const batch = buildTracePayload(
makeCtx({
turn: { model: 'claude-sonnet-4-5', skillId: 'landing-page' },
runtime: {
os: 'linux',
arch: 'x64',
nodeVersion: 'v22.22.0',
appVersion: '0.5.0',
appChannel: 'beta',
packaged: true,
clientType: 'web',
},
}),
);
const m = (batch[0] as any).body.metadata;
expect(m.model).toBe('claude-sonnet-4-5');
expect(m.skillId).toBe('landing-page');
expect(m.os).toBe('linux');
expect(m.arch).toBe('x64');
expect(m.nodeVersion).toBe('v22.22.0');
expect(m.appVersion).toBe('0.5.0');
expect(m.appChannel).toBe('beta');
expect(m.packaged).toBe(true);
expect(m.clientType).toBe('web');
expect(m.projectId).toBe('proj-1');
expect(m.agent).toBe('claude');
});
it('marks generation.level=ERROR when run failed', () => {
const batch = buildTracePayload(
makeCtx({
run: {
runId: 'run-1',
status: 'failed',
startedAt: 1,
endedAt: 2,
error: 'boom',
},
}),
);
const span = bodyOf(batch, 'span-create', 'agent-run');
const gen = bodyOf(batch, 'generation-create', 'llm');
expect(gen.level).toBe('ERROR');
expect(gen.statusMessage).toBe('boom');
expect(span.level).toBe('ERROR');
expect(span.statusMessage).toBe('boom');
expect(bodyOf(batch, 'event-create', 'run-error').statusMessage).toBe('boom');
expect((batch[0] as any).body.metadata.error).toBe('boom');
expect((batch[0] as any).body.metadata.success).toBe(false);
});
it('omits userId when installationId is null (anonymous install)', () => {
const batch = buildTracePayload(makeCtx({ installationId: null }));
expect((batch[0] as any).body.userId).toBeUndefined();
});
});
describe('reportRunCompleted', () => {
let warnSpy: ReturnType<typeof vi.spyOn>;
beforeEach(() => {
warnSpy = vi.spyOn(console, 'warn').mockImplementation(() => {});
});
afterEach(() => {
warnSpy.mockRestore();
vi.restoreAllMocks();
});
it('does nothing when metrics gate is off', async () => {
const fetchSpy = vi.fn();
await reportRunCompleted(
makeCtx({
prefs: { metrics: false, content: true, artifactManifest: true },
}),
{ config: TEST_CONFIG, fetchImpl: fetchSpy as any },
);
expect(fetchSpy).not.toHaveBeenCalled();
});
it('does nothing when content gate is off', async () => {
const fetchSpy = vi.fn();
await reportRunCompleted(
makeCtx({
prefs: { metrics: true, content: false, artifactManifest: true },
}),
{ config: TEST_CONFIG, fetchImpl: fetchSpy as any },
);
expect(fetchSpy).not.toHaveBeenCalled();
});
it('does nothing when no Langfuse config is available', async () => {
const fetchSpy = vi.fn();
await reportRunCompleted(
makeCtx({
prefs: { metrics: true, content: true, artifactManifest: false },
}),
{
config: null,
fetchImpl: fetchSpy as any,
},
);
expect(fetchSpy).not.toHaveBeenCalled();
});
it('POSTs to /api/public/ingestion with Basic auth and a JSON batch body', async () => {
const fetchSpy = vi.fn().mockResolvedValue(
new Response('{}', { status: 200 }),
);
await reportRunCompleted(
makeCtx({
prefs: { metrics: true, content: true, artifactManifest: false },
}),
{
config: TEST_CONFIG,
fetchImpl: fetchSpy as any,
},
);
expect(fetchSpy).toHaveBeenCalledTimes(1);
const call = fetchSpy.mock.calls[0]!;
const url = call[0] as string;
const init = call[1] as RequestInit & { headers: Record<string, string> };
expect(url).toBe('https://us.cloud.langfuse.com/api/public/ingestion');
expect(init.method).toBe('POST');
expect(init.headers.Authorization).toBe('Basic dGVzdA==');
expect(init.headers['Content-Type']).toBe('application/json');
const body = JSON.parse(init.body as string);
expect(Array.isArray(body.batch)).toBe(true);
expect(body.batch.map((item: any) => item.type)).toEqual([
'trace-create',
'span-create',
'generation-create',
'span-create',
'span-create',
]);
});
it('POSTs serialized ingestion batches to the Open Design telemetry relay', async () => {
const relayConfig: TelemetrySinkConfig = {
kind: 'relay',
relayUrl: 'https://telemetry.open-design.ai/api/langfuse',
timeoutMs: 20_000,
retries: 0,
};
const fetchSpy = vi.fn().mockResolvedValue(
new Response('{}', { status: 200 }),
);
await reportRunCompleted(
makeCtx({
prefs: { metrics: true, content: true, artifactManifest: false },
}),
{
config: relayConfig,
fetchImpl: fetchSpy as any,
},
);
expect(fetchSpy).toHaveBeenCalledTimes(1);
const call = fetchSpy.mock.calls[0]!;
const url = call[0] as string;
const init = call[1] as RequestInit & { headers: Record<string, string> };
expect(url).toBe('https://telemetry.open-design.ai/api/langfuse');
expect(init.method).toBe('POST');
expect(init.headers.Authorization).toBeUndefined();
expect(init.headers['Content-Type']).toBe('application/json');
expect(init.headers['X-Open-Design-Telemetry']).toBe('langfuse-ingestion-v1');
const body = JSON.parse(init.body as string);
expect(Array.isArray(body.batch)).toBe(true);
});
it('warns when the relay returns per-event errors', async () => {
const relayConfig: TelemetrySinkConfig = {
kind: 'relay',
relayUrl: 'https://telemetry.open-design.ai/api/langfuse',
timeoutMs: 20_000,
retries: 0,
};
const fetchSpy = vi.fn().mockResolvedValue(
new Response(
JSON.stringify({ successes: [], errors: [{ id: 'bad', status: 400 }] }),
{ status: 207 },
),
);
await reportRunCompleted(
makeCtx({
prefs: { metrics: true, content: true, artifactManifest: false },
}),
{
config: relayConfig,
fetchImpl: fetchSpy as any,
},
);
expect(warnSpy).toHaveBeenCalledWith(
expect.stringContaining('Relay per-event errors (1)'),
);
});
it('warns and drops when serialized batch exceeds the hard cap', async () => {
// Per-field truncation already caps prompt/output, so we overflow the
// hard cap by stuffing 50 artifact entries with very long slugs while
// artifactManifest is on (50 × 30 KB ≈ 1.5 MB > 1 MB cap).
const fetchSpy = vi.fn();
const fatArtifacts = Array.from({ length: 50 }, (_, i) => ({
slug: 'a'.repeat(30_000) + i,
type: 'html',
sizeBytes: 1,
}));
await reportRunCompleted(
makeCtx({
artifacts: fatArtifacts,
prefs: { metrics: true, content: true, artifactManifest: true },
}),
{ config: TEST_CONFIG, fetchImpl: fetchSpy as any },
);
expect(fetchSpy).not.toHaveBeenCalled();
expect(warnSpy).toHaveBeenCalledWith(
expect.stringContaining('Batch too large'),
);
});
it('only warns (does not throw) when fetch rejects', async () => {
const fetchSpy = vi.fn().mockRejectedValue(new Error('network down'));
await expect(
reportRunCompleted(
makeCtx({
prefs: { metrics: true, content: true, artifactManifest: false },
}),
{
config: TEST_CONFIG,
fetchImpl: fetchSpy as any,
},
),
).resolves.toBeUndefined();
expect(warnSpy).toHaveBeenCalledWith(
expect.stringContaining('Fetch error'),
);
});
it('retries once when fetch rejects before warning', async () => {
const fetchSpy = vi
.fn()
.mockRejectedValueOnce(new Error('timeout'))
.mockResolvedValueOnce(new Response('{}', { status: 207 }));
await reportRunCompleted(
makeCtx({
prefs: { metrics: true, content: true, artifactManifest: false },
}),
{
config: { ...TEST_CONFIG, retries: 1 },
fetchImpl: fetchSpy as any,
},
);
expect(fetchSpy).toHaveBeenCalledTimes(2);
expect(warnSpy).not.toHaveBeenCalled();
});
it('only warns (does not throw) when ingestion responds non-2xx', async () => {
const fetchSpy = vi.fn().mockResolvedValue(
new Response('rate limited', { status: 429 }),
);
await reportRunCompleted(
makeCtx({
prefs: { metrics: true, content: true, artifactManifest: false },
}),
{
config: TEST_CONFIG,
fetchImpl: fetchSpy as any,
},
);
expect(warnSpy).toHaveBeenCalledWith(
expect.stringContaining('Ingestion failed 429'),
);
});
it('warns when 207 Multi-Status body lists per-event errors', async () => {
// Langfuse legacy ingestion always responds with 207. response.ok is
// true, but malformed events show up in body.errors instead of as a
// top-level non-2xx. Without parsing them they'd be silently dropped.
const fetchSpy = vi.fn().mockResolvedValue(
new Response(
JSON.stringify({
successes: [{ id: 'a', status: 201 }],
errors: [
{
id: 'b',
status: 400,
message: 'invalid generation usage shape',
},
],
}),
{ status: 207 },
),
);
await reportRunCompleted(
makeCtx({
prefs: { metrics: true, content: true, artifactManifest: false },
}),
{
config: TEST_CONFIG,
fetchImpl: fetchSpy as any,
},
);
expect(warnSpy).toHaveBeenCalledWith(
expect.stringContaining('Per-event errors (1)'),
);
});
it('does not warn when 207 body has empty errors array', async () => {
const fetchSpy = vi.fn().mockResolvedValue(
new Response(
JSON.stringify({
successes: [
{ id: 'a', status: 201 },
{ id: 'b', status: 201 },
],
errors: [],
}),
{ status: 207 },
),
);
await reportRunCompleted(
makeCtx({
prefs: { metrics: true, content: true, artifactManifest: false },
}),
{
config: TEST_CONFIG,
fetchImpl: fetchSpy as any,
},
);
expect(warnSpy).not.toHaveBeenCalled();
});
});

function makeFeedbackCtx(
  overrides: Partial<FeedbackReportContext> = {},
): FeedbackReportContext {
  return {
    runId: 'run-feedback-1',
    installationId: 'install-uuid-1',
    prefs: { metrics: true, content: true },
    rating: 'positive',
    reasonCodes: ['matched_request'],
    hasCustomReason: false,
    customReason: '',
    ...overrides,
  };
}

describe('buildFeedbackPayload', () => {
it('emits a numeric user_rating score plus per-reason categorical scores', () => {
const batch = buildFeedbackPayload(
makeFeedbackCtx({
rating: 'negative',
reasonCodes: ['missed_request', 'weak_visual'],
hasCustomReason: true,
customReason: 'It got the layout wrong on tablet',
}),
) as Array<Record<string, any>>;
expect(batch).toHaveLength(3);
const ratingScore = batch[0]!;
expect(ratingScore.type).toBe('score-create');
expect(ratingScore.body.traceId).toBe('run-feedback-1');
expect(ratingScore.body.name).toBe('user_rating');
expect(ratingScore.body.value).toBe(-1);
expect(ratingScore.body.dataType).toBe('NUMERIC');
expect(ratingScore.body.comment).toBe('negative');
expect(ratingScore.body.metadata).toMatchObject({
reasonCount: 2,
customReason: 'It got the layout wrong on tablet',
hasCustomReason: true,
});
for (const reasonScore of batch.slice(1)) {
expect(reasonScore.body.name).toBe('user_rating_reason');
expect(reasonScore.body.dataType).toBe('CATEGORICAL');
expect(reasonScore.body.comment).toBe('negative');
expect(reasonScore.body.traceId).toBe('run-feedback-1');
}
expect(batch[1]!.body.value).toBe('missed_request');
expect(batch[2]!.body.value).toBe('weak_visual');
});
it('does not emit reason scores when no codes were submitted', () => {
const batch = buildFeedbackPayload(
makeFeedbackCtx({ reasonCodes: [] }),
) as Array<Record<string, any>>;
expect(batch).toHaveLength(1);
expect(batch[0]!.body.name).toBe('user_rating');
expect(batch[0]!.body.value).toBe(1);
});
});
describe('reportRunFeedback', () => {
const TEST_CONFIG: LangfuseConfig = {
baseUrl: 'https://us.cloud.langfuse.com',
authHeader: 'Basic Zm9vOmJhcg==',
retries: 0,
timeoutMs: 1000,
};
beforeEach(() => {
vi.useRealTimers();
});
it('skips when metrics consent is off', async () => {
const fetchSpy = vi.fn();
await reportRunFeedback(makeFeedbackCtx({ prefs: { metrics: false, content: true } }), {
config: TEST_CONFIG,
fetchImpl: fetchSpy as any,
});
expect(fetchSpy).not.toHaveBeenCalled();
});
it('skips when content consent is off', async () => {
const fetchSpy = vi.fn();
await reportRunFeedback(makeFeedbackCtx({ prefs: { metrics: true, content: false } }), {
config: TEST_CONFIG,
fetchImpl: fetchSpy as any,
});
expect(fetchSpy).not.toHaveBeenCalled();
});
it('posts a score-create batch to /api/public/ingestion when consent is on', async () => {
const fetchSpy = vi.fn().mockResolvedValue(
new Response(JSON.stringify({ successes: [], errors: [] }), { status: 207 }),
);
await reportRunFeedback(
makeFeedbackCtx({ reasonCodes: ['matched_request'] }),
{ config: TEST_CONFIG, fetchImpl: fetchSpy as any },
);
expect(fetchSpy).toHaveBeenCalledTimes(1);
const [url, init] = fetchSpy.mock.calls[0]!;
expect(url).toBe('https://us.cloud.langfuse.com/api/public/ingestion');
expect(init.method).toBe('POST');
const body = JSON.parse(init.body);
expect(body.batch).toHaveLength(2);
expect(body.batch[0].type).toBe('score-create');
expect(body.batch[0].body.value).toBe(1);
});
});
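
The truncation tests in `buildTracePayload` above require the cap to be applied in UTF-8 bytes and to land on a character boundary. One way to get that behaviour is sketched below. It is not the implementation in `langfuse-trace.ts`, which this file does not show; `truncateUtf8` is a hypothetical name, and the sketch relies on the standard `TextEncoder` / `TextDecoder` globals.

```typescript
// Trim `text` so its UTF-8 encoding is at most `maxBytes`, never splitting a character.
function truncateUtf8(text: string, maxBytes: number): string {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length <= maxBytes) return text;
  let end = maxBytes;
  // bytes[end] is the first byte that would be dropped. If it is a
  // continuation byte (bit pattern 10xxxxxx), the cut falls inside a
  // multi-byte character, so back up to that character's lead byte.
  while (end > 0 && (bytes[end]! & 0xc0) === 0x80) {
    end--;
  }
  return new TextDecoder().decode(bytes.subarray(0, end));
}
```

With an 8 KB cap, ASCII input is cut at exactly 8192 bytes, while a run of 3-byte characters is cut at 8190 bytes (2730 whole characters), which satisfies the "less than or equal" assertion in the multi-byte test.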