Mirror of https://github.com/nexu-io/open-design.git (synced 2026-06-01 03:14:35 +07:00)
* feat(analytics): PostHog + Langfuse instrumentation for assistant feedback
Re-bases the original three-commit PR onto release/v0.8.0. The web-side
feedback UI instrumentation (surface_view / ui_click / feedback_submit_result)
landed on main while this branch was open, so on this rebase that wiring
is taken from main; the remaining net additions are:
- Contracts: TrackingFeedback* enums and the four dedicated
assistant_feedback_* event payload types (click, reason_view,
reason_click, reason_submit), plus normalizeCustomReason helper.
The new event-name variants are added to TrackingEventName and the
AnalyticsEventPayload discriminated union next to the existing
surface_view/ui_click variants — both wire formats coexist.
- POST /api/runs/:id/feedback in apps/daemon/src/chat-routes.ts:
thin route that validates rating, filters reasonCodes down to a
plain string[] (no allowlist yet), and fire-and-forgets into the
daemon's reportFeedback hook.
- apps/daemon/src/langfuse-bridge.ts reportRunFeedbackFromDaemon
forwards the rating + reasonCodes into Langfuse as user_rating
(NUMERIC ±1) + user_rating_reason (CATEGORICAL, one per code)
score-create entries. Gates on telemetry.metrics + telemetry.content.
- apps/web/src/providers/daemon.ts reportChatRunFeedback (fire-and-forget
fetch) and apps/web/src/components/ProjectView.tsx wiring so each
thumbs-up/down + reason submission posts the side-channel.
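Below is a minimal sketch of the score mapping described in the langfuse-bridge bullet. Several pieces are hypothetical: the `FeedbackInput` and `ScoreEvent` types, the `toScoreEvents` name, and the stable-id scheme. The parts taken from this PR's `buildFeedbackPayload` tests are:

- the score names and data types;
- the ±1 value;
- `traceId` set to the run id;
- the rating echoed in `comment`.

The top-level `id`/`timestamp` envelope fields are an assumption about the Langfuse ingestion format, not something this diff shows.

```typescript
// Sketch only: names and the id scheme are hypothetical. Score shapes mirror
// the buildFeedbackPayload tests (NUMERIC user_rating of +1/-1, one
// CATEGORICAL user_rating_reason per code, rating echoed in `comment`).
type Rating = 'positive' | 'negative';

interface FeedbackInput {
  runId: string;
  rating: Rating;
  reasonCodes: string[];
  hasCustomReason: boolean;
  customReason: string;
}

interface ScoreEvent {
  id: string;
  timestamp: string;
  type: 'score-create';
  body: {
    id: string;
    traceId: string;
    name: 'user_rating' | 'user_rating_reason';
    value: number | string;
    dataType: 'NUMERIC' | 'CATEGORICAL';
    comment: Rating;
    metadata?: Record<string, unknown>;
  };
}

function toScoreEvents(input: FeedbackInput, now: Date = new Date()): ScoreEvent[] {
  const timestamp = now.toISOString();
  // Hypothetical stable score id: re-submitting overwrites the same score row.
  const ratingId = `${input.runId}-user_rating`;
  const events: ScoreEvent[] = [
    {
      id: `${ratingId}-${timestamp}`,
      timestamp,
      type: 'score-create',
      body: {
        id: ratingId,
        traceId: input.runId,
        name: 'user_rating',
        value: input.rating === 'positive' ? 1 : -1,
        dataType: 'NUMERIC',
        comment: input.rating,
        metadata: {
          reasonCount: input.reasonCodes.length,
          hasCustomReason: input.hasCustomReason,
          customReason: input.customReason,
        },
      },
    },
  ];
  // One categorical score per reason code, all attached to the run's trace.
  for (const code of input.reasonCodes) {
    const scoreId = `${input.runId}-user_rating_reason-${code}`;
    events.push({
      id: `${scoreId}-${timestamp}`,
      timestamp,
      type: 'score-create',
      body: {
        id: scoreId,
        traceId: input.runId,
        name: 'user_rating_reason',
        value: code,
        dataType: 'CATEGORICAL',
        comment: input.rating,
      },
    });
  }
  return events;
}
```

In this sketch the reason-score id is derived from the code. A re-submission with fewer codes therefore leaves earlier reason scores untouched, which is the same gap noted among the outstanding review items.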
Conflicts resolved (release/v0.8.0 vs the branch's old base):
- packages/contracts/src/analytics/events.ts: keep main's
file_upload_result / feedback_submit_result / settings_* event
variants alongside the new assistant_feedback_* additions.
- apps/daemon/src/server.ts: keep DNS-aware validateExternalApiBaseUrl,
add reportFeedback closure wired into registerChatRoutes telemetry.
- apps/daemon/src/chat-routes.ts: keep both /tool-result and the new
/feedback routes; merge RegisterChatRoutesDeps to include both
'paths' and 'telemetry'. Drop PR's chat-routes-local
reconcileAssistantMessageOnRunEnd helper (main has the equivalent in
server.ts).
- apps/web/src/components/ChatPane.tsx & AssistantMessage.tsx & ProjectView.tsx:
keep main's projectKindForTracking prop name and its existing
emission of surface_view / ui_click / feedback_submit_result; the
PR's analyticsCtx-based reason_view/click/submit emission is dropped
in this rebase since it would duplicate the existing wire format.
- apps/web/tests/components/*: rename projectKind → projectKindForTracking
to match ChatPane's current prop name.
Outstanding review feedback (from the pre-rebase round, will be
addressed in a follow-up commit):
- AssistantMessage tests not yet passing the new feedback context to
the direct render path.
- ProjectView clear-feedback path skips reportChatRunFeedback, leaving
stale Langfuse user_rating scores.
- buildFeedbackPayload has no deletion path for previously-submitted
user_rating_reason scores when the user switches thumbs.
- POST /api/runs/:id/feedback always returns {status:'accepted'} even
when consent is off; needs to surface skipped_consent / skipped_no_sink.
- reasonCodes are filtered to string[] but not allowlisted against
ChatMessageFeedbackReasonCode or deduped.
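The allowlist-and-dedupe step requested in the last item could look like the sketch below. The code list is a hypothetical subset: it contains only codes that appear in this PR's tests and commit messages, not the full `ChatMessageFeedbackReasonCode` union. The helper names are also illustrative.

```typescript
// Hypothetical subset of the contract's reason-code union; the real list
// lives in ChatMessageFeedbackReasonCode (packages/contracts/src/api/chat.ts).
const REASON_CODES = [
  'matched_request',
  'missed_request',
  'weak_visual',
  'followed_design_system',
  'missed_design_system',
] as const;

type ReasonCode = (typeof REASON_CODES)[number];

const REASON_CODE_SET: ReadonlySet<string> = new Set<string>(REASON_CODES);

function isReasonCode(value: unknown): value is ReasonCode {
  return typeof value === 'string' && REASON_CODE_SET.has(value);
}

// Keeps only known codes and drops duplicates; a Set preserves insertion
// order, so the first occurrence of each code decides its position.
function sanitizeReasonCodes(raw: unknown): ReasonCode[] {
  if (!Array.isArray(raw)) return [];
  const seen = new Set<ReasonCode>();
  for (const item of raw) {
    if (isReasonCode(item)) seen.add(item);
  }
  return Array.from(seen);
}
```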
* fix(analytics): address review on assistant feedback rebase
Picks up the in-scope correctness items from the prior review round
and the rebase residue without rewriting history:
- chat-routes.ts: `/feedback` now awaits the daemon's preflight
outcome and echoes it as the response. The contract was already
shaped as `accepted | skipped_consent | skipped_no_sink`, but the
previous handler always returned `accepted` because the network
send was fire-and-forget. The consent + sink decision is local
(a small file read and an env-var lookup); the actual Langfuse
upload still runs as a detached promise.
- chat-routes.ts: reasonCodes are now allowlisted against the
contract's reason-code union and deduplicated before reaching
Langfuse, so a stale or replayed client can't poison the
Langfuse score table with unknown categorical values or
duplicate stable ids in the same batch.
- langfuse-bridge.ts: split the consent + sink resolution from the
fire-and-forget network send so the route can claim `accepted`
honestly. The legacy `skipped_no_sink` return on app-config read
failure is preserved.
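The split between the awaited local decision and the detached network send can be sketched as follows. Only the three outcome strings come from the contract described above. The function name, the injected `readPrefs`/`send` callbacks, and the choice to check consent before sink availability are assumptions for illustration.

```typescript
// Outcome strings come from the contract; everything else here is hypothetical.
type FeedbackOutcome = 'accepted' | 'skipped_consent' | 'skipped_no_sink';

interface TelemetryPrefs {
  metrics: boolean;
  content: boolean;
}

// Awaited by the route: only local work (config read + sink lookup), so the
// HTTP response can report the real outcome without waiting on the network.
async function resolveAndSendFeedback<Payload>(
  readPrefs: () => Promise<TelemetryPrefs>,
  sinkConfigured: boolean,
  payload: Payload,
  send: (payload: Payload) => Promise<void>,
): Promise<FeedbackOutcome> {
  let prefs: TelemetryPrefs;
  try {
    prefs = await readPrefs();
  } catch {
    // Mirrors the preserved legacy behaviour: a failed config read maps to skipped_no_sink.
    return 'skipped_no_sink';
  }
  // Assumed ordering: consent is checked before sink availability.
  if (!prefs.metrics || !prefs.content) return 'skipped_consent';
  if (!sinkConfigured) return 'skipped_no_sink';

  // Detached send: the upload starts now, but its result is never awaited here.
  void send(payload).catch((err: unknown) => {
    console.warn('[feedback] upload failed', err);
  });
  return 'accepted';
}
```

A route handler would await this function and echo `{ status: outcome }`, so `accepted` is only claimed once consent and a sink are actually in place.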
Contracts + comment hygiene:
- TrackingFeedbackReasonCode in packages/contracts/src/analytics/events.ts
drifted from ChatMessageFeedbackReasonCode in packages/contracts/src/api/chat.ts;
add `followed_design_system` and `missed_design_system` so the
analytics wire format stays aligned with the persistence shape.
- langfuse-trace.ts buildFeedbackPayload: the docblock claimed the
raw custom-reason text is bucketed before send. Product reversed
that on 2026-05-13 (raw text now ships, consent-gated). Replace
the stale comment with the real semantics + a note that there is
no tombstone path for reason codes the user removes in a
follow-up submission (left as scope for a later PR).
- AssistantMessage.tsx: remove the now-unused
`AssistantFeedbackAnalyticsCtx` interface, revert a stray blank-line
deletion left over from the rebase, and restore the analytics-context comment
above the feedback hook.
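A compile-time equality check is one way to stop the two reason-code unions from drifting again. In the sketch below, both unions are hypothetical stand-ins with a subset of codes, not the real contract types. The `Equal` helper is a common TypeScript idiom, not something this repo is known to ship.

```typescript
// Stand-ins for the two contract unions (hypothetical subsets).
type ChatMessageFeedbackReasonCode =
  | 'matched_request'
  | 'missed_request'
  | 'followed_design_system'
  | 'missed_design_system';

type TrackingFeedbackReasonCode =
  | 'matched_request'
  | 'missed_request'
  | 'followed_design_system'
  | 'missed_design_system';

// Resolves to `true` only when each union is assignable to the other.
// Wrapping in a tuple stops the conditional from distributing over members.
type Equal<A, B> = [A] extends [B] ? ([B] extends [A] ? true : false) : false;

// Fails to compile if a code is added to only one of the unions.
const reasonCodesAligned: Equal<TrackingFeedbackReasonCode, ChatMessageFeedbackReasonCode> = true;
```

Aliasing one type to the other would remove the drift outright. A check like this is mainly useful if the analytics and API contracts are meant to stay independently declared.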
Left as follow-up (intentional, documented in code):
- Sending a tombstone score when the user clears their rating —
ProjectView still skips reportChatRunFeedback on `change===null`,
so Langfuse retains the previous rating until the user re-submits.
The PostHog event captures the clear separately.
- Removing reason-code scores when the user re-submits with a
smaller set — buildFeedbackPayload only overwrites the codes
present in the current payload.
* feat(analytics): wire PR's dedicated assistant_feedback_* events
The four dedicated event types (`assistant_feedback_click` /
`_reason_view` / `_reason_click` / `_reason_submit`) the PR added to
contracts were sitting unused after the rebase because main's
umbrella `surface_view` / `ui_click` / `feedback_submit_result`
emissions covered the same user gestures. Wire the dedicated events
alongside the umbrella ones so both wire formats fire on every
feedback action — dashboards / evals can pick whichever schema they
were built against without losing signal.
Each dedicated event has stricter typing than its umbrella sibling
(`project_id` / `project_kind` / `conversation_id` are non-null), so
the new emissions are guarded behind a presence check and skipped on
test renders that mount AssistantMessage without project context. The
umbrella emissions retain their nullable fallbacks unchanged.
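The presence check can be written as a type guard. It narrows the nullable context the umbrella events accept into the non-null shape the dedicated events require. Only the three field names come from this commit. The `FeedbackAnalyticsCtx` interface, the `Emit` callback, the `emitFeedbackClick` helper and the event property shapes are hypothetical.

```typescript
interface FeedbackAnalyticsCtx {
  project_id: string | null;
  project_kind: string | null;
  conversation_id: string | null;
}

// Same keys, every field non-null: the shape the dedicated events require.
type StrictFeedbackCtx = {
  [K in keyof FeedbackAnalyticsCtx]: NonNullable<FeedbackAnalyticsCtx[K]>;
};

function hasProjectContext(ctx: FeedbackAnalyticsCtx): ctx is StrictFeedbackCtx {
  return ctx.project_id !== null && ctx.project_kind !== null && ctx.conversation_id !== null;
}

type Emit = (event: string, props: object) => void;

function emitFeedbackClick(
  ctx: FeedbackAnalyticsCtx,
  rating: 'positive' | 'negative',
  emit: Emit,
): void {
  // Umbrella event: nullable fields pass through unchanged.
  emit('ui_click', { ...ctx, rating });
  // Dedicated event: skipped when any required field is missing (e.g. test renders).
  if (hasProjectContext(ctx)) {
    emit('assistant_feedback_click', { ...ctx, rating });
  }
}
```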
Pairing:
- surface_view (feedback reason panel) ↔ assistant_feedback_reason_view
- ui_click (feedback button) ↔ assistant_feedback_click
- ui_click (reason submit button) ↔ assistant_feedback_reason_click
- feedback_submit_result ↔ assistant_feedback_reason_submit
Reason click + submit share the existing `requestId` so PostHog can
stitch click→result across both schemas, matching the spec.
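The pairing, plus the shared `requestId` that lets PostHog stitch click and result across both schemas, could be expressed as a small lookup. The event names are the ones listed in this commit. The `surface`/`target` discriminators used to tell the variants apart, and the `dedicatedFor`/`emitBoth` helpers, are hypothetical.

```typescript
type UmbrellaEvent =
  | { name: 'surface_view'; surface: 'feedback_reason_panel' }
  | { name: 'ui_click'; target: 'feedback_button' | 'reason_submit_button' }
  | { name: 'feedback_submit_result' };

type DedicatedEvent =
  | 'assistant_feedback_reason_view'
  | 'assistant_feedback_click'
  | 'assistant_feedback_reason_click'
  | 'assistant_feedback_reason_submit';

function dedicatedFor(event: UmbrellaEvent): DedicatedEvent {
  if (event.name === 'surface_view') return 'assistant_feedback_reason_view';
  if (event.name === 'feedback_submit_result') return 'assistant_feedback_reason_submit';
  // Remaining variant is ui_click; the click target picks the dedicated event.
  return event.target === 'feedback_button'
    ? 'assistant_feedback_click'
    : 'assistant_feedback_reason_click';
}

// Fires both schemas for one gesture. The caller reuses the same requestId
// for the reason click and its submit result so the pair can be joined.
function emitBoth(
  event: UmbrellaEvent,
  requestId: string,
  emit: (name: string, props: { requestId: string }) => void,
): void {
  emit(event.name, { requestId });
  emit(dedicatedFor(event), { requestId });
}
```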
861 lines
27 KiB
TypeScript
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';

import {
  buildFeedbackPayload,
  buildTracePayload,
  readLangfuseConfig,
  readTelemetrySinkConfig,
  reportRunCompleted,
  reportRunFeedback,
  type FeedbackReportContext,
  type LangfuseConfig,
  type ReportContext,
  type TelemetrySinkConfig,
} from '../src/langfuse-trace.js';

function makeCtx(overrides: Partial<ReportContext> = {}): ReportContext {
  const base: ReportContext = {
    installationId: 'install-uuid-1',
    projectId: 'proj-1',
    conversationId: 'conv-uuid-aaaa-bbbb-cccc-dddd-eeeeeeeeeeee',
    agentId: 'claude',
    run: {
      runId: 'run-1',
      status: 'succeeded',
      startedAt: 1_700_000_000_000,
      endedAt: 1_700_000_004_500,
    },
    message: {
      messageId: 'msg-1',
      prompt: 'Make a landing page for a coffee shop.',
      output: 'Here is a landing page draft …',
      usage: {
        inputTokens: 1234,
        outputTokens: 567,
        totalTokens: 1801,
      },
    },
    artifacts: [],
    tools: [
      {
        id: 'tool-1',
        name: 'Bash',
        startedAt: 1_700_000_001_000,
        endedAt: 1_700_000_001_800,
        input: '{"command":"ls -la"}',
        output: 'total 0',
      },
      {
        id: 'tool-2',
        name: 'Write',
        startedAt: 1_700_000_002_000,
        endedAt: 1_700_000_002_900,
        input: '{"path":"index.html"}',
        output: 'wrote index.html',
      },
    ],
    eventsSummary: { toolCalls: 2, errors: 0, durationMs: 4500 },
    prefs: { metrics: true, content: false, artifactManifest: false },
  };
  return { ...base, ...overrides };
}

const TEST_CONFIG: LangfuseConfig = {
  authHeader: 'Basic dGVzdA==',
  baseUrl: 'https://us.cloud.langfuse.com',
  timeoutMs: 20_000,
  retries: 0,
};

function bodyOf(
  batch: unknown[],
  type: string,
  name?: string,
): Record<string, any> {
  const event = (batch as Array<{ type: string; body: Record<string, any> }>).find(
    (item) => item.type === type && (name === undefined || item.body.name === name),
  );
  expect(event).toBeTruthy();
  return event!.body;
}

describe('readLangfuseConfig', () => {
  it('returns null when keys are missing', () => {
    expect(readLangfuseConfig({})).toBeNull();
    expect(readLangfuseConfig({ LANGFUSE_PUBLIC_KEY: 'pk' })).toBeNull();
    expect(readLangfuseConfig({ LANGFUSE_SECRET_KEY: 'sk' })).toBeNull();
  });

  it('returns null when keys are whitespace-only', () => {
    expect(
      readLangfuseConfig({
        LANGFUSE_PUBLIC_KEY: ' ',
        LANGFUSE_SECRET_KEY: 'sk',
      }),
    ).toBeNull();
  });

  it('builds Basic auth header from public:secret', () => {
    const cfg = readLangfuseConfig({
      LANGFUSE_PUBLIC_KEY: 'pk-lf-abc',
      LANGFUSE_SECRET_KEY: 'sk-lf-xyz',
    });
    expect(cfg).not.toBeNull();
    const expected =
      'Basic ' + Buffer.from('pk-lf-abc:sk-lf-xyz').toString('base64');
    expect(cfg!.authHeader).toBe(expected);
  });

  it('uses default US base URL when LANGFUSE_BASE_URL is absent', () => {
    const cfg = readLangfuseConfig({
      LANGFUSE_PUBLIC_KEY: 'pk',
      LANGFUSE_SECRET_KEY: 'sk',
    });
    expect(cfg!.baseUrl).toBe('https://us.cloud.langfuse.com');
  });

  it('honours LANGFUSE_BASE_URL and strips trailing slashes', () => {
    const cfg = readLangfuseConfig({
      LANGFUSE_PUBLIC_KEY: 'pk',
      LANGFUSE_SECRET_KEY: 'sk',
      LANGFUSE_BASE_URL: 'https://cloud.langfuse.com//',
    });
    expect(cfg!.baseUrl).toBe('https://cloud.langfuse.com');
  });

  it('reads optional timeout and retry tuning from env', () => {
    const cfg = readLangfuseConfig({
      LANGFUSE_PUBLIC_KEY: 'pk',
      LANGFUSE_SECRET_KEY: 'sk',
      LANGFUSE_TIMEOUT_MS: '45000',
      LANGFUSE_RETRIES: '2',
    });
    expect(cfg!.timeoutMs).toBe(45_000);
    expect(cfg!.retries).toBe(2);
  });

  it('falls back when timeout and retry env values are invalid', () => {
    const cfg = readLangfuseConfig({
      LANGFUSE_PUBLIC_KEY: 'pk',
      LANGFUSE_SECRET_KEY: 'sk',
      LANGFUSE_TIMEOUT_MS: '-1',
      LANGFUSE_RETRIES: '-2',
    });
    expect(cfg!.timeoutMs).toBe(20_000);
    expect(cfg!.retries).toBe(1);
  });
});

describe('readTelemetrySinkConfig', () => {
  it('prefers the Open Design telemetry relay when configured', () => {
    const cfg = readTelemetrySinkConfig({
      OPEN_DESIGN_TELEMETRY_RELAY_URL: 'https://telemetry.open-design.ai/api/langfuse//',
      LANGFUSE_PUBLIC_KEY: 'pk',
      LANGFUSE_SECRET_KEY: 'sk',
    });
    expect(cfg).toEqual({
      kind: 'relay',
      relayUrl: 'https://telemetry.open-design.ai/api/langfuse',
      timeoutMs: 20_000,
      retries: 1,
    });
  });

  it('uses relay-specific timeout and retry tuning when present', () => {
    const cfg = readTelemetrySinkConfig({
      OPEN_DESIGN_TELEMETRY_RELAY_URL: 'https://telemetry.open-design.ai/api/langfuse',
      OPEN_DESIGN_TELEMETRY_TIMEOUT_MS: '30000',
      OPEN_DESIGN_TELEMETRY_RETRIES: '3',
      LANGFUSE_TIMEOUT_MS: '1',
      LANGFUSE_RETRIES: '0',
    });
    expect(cfg).toMatchObject({
      kind: 'relay',
      timeoutMs: 30_000,
      retries: 3,
    });
  });

  it('falls back to direct Langfuse config for local smoke tests', () => {
    const cfg = readTelemetrySinkConfig({
      LANGFUSE_PUBLIC_KEY: 'pk',
      LANGFUSE_SECRET_KEY: 'sk',
    });
    expect(cfg).toMatchObject({
      kind: 'langfuse',
      baseUrl: 'https://us.cloud.langfuse.com',
    });
  });
});

describe('buildTracePayload', () => {
  it('emits a trace with nested agent + generation observations', () => {
    const batch = buildTracePayload(makeCtx());
    const types = (batch as Array<{ type: string }>).map((e) => e.type);
    expect(types).toEqual([
      'trace-create',
      'span-create',
      'generation-create',
      'span-create',
      'span-create',
    ]);
    const span = bodyOf(batch, 'span-create', 'agent-run');
    const gen = bodyOf(batch, 'generation-create', 'llm');
    const bash = bodyOf(batch, 'span-create', 'tool:Bash');
    const write = bodyOf(batch, 'span-create', 'tool:Write');
    expect(span.id).toBe('run-1-agent');
    expect(span.traceId).toBe('run-1');
    expect(gen.traceId).toBe('run-1');
    expect(gen.parentObservationId).toBe('run-1-agent');
    expect(bash.parentObservationId).toBe('run-1-agent');
    expect(bash.input).toBeUndefined();
    expect(bash.output).toBeUndefined();
    expect(bash.metadata.toolName).toBe('Bash');
    expect(write.parentObservationId).toBe('run-1-agent');
  });

  it('omits prompt + output when content gate is off', () => {
    const batch = buildTracePayload(makeCtx());
    const trace = (batch[0] as any).body;
    const span = bodyOf(batch, 'span-create', 'agent-run');
    const gen = bodyOf(batch, 'generation-create', 'llm');
    const tool = bodyOf(batch, 'span-create', 'tool:Bash');
    expect(trace.input).toBeUndefined();
    expect(trace.output).toBeUndefined();
    expect(span.input).toBeUndefined();
    expect(span.output).toBeUndefined();
    expect(gen.input).toBeUndefined();
    expect(gen.output).toBeUndefined();
    expect(tool.input).toBeUndefined();
    expect(tool.output).toBeUndefined();
  });

  it('includes prompt + output when content gate is on', () => {
    const batch = buildTracePayload(
      makeCtx({
        prefs: { metrics: true, content: true, artifactManifest: false },
      }),
    );
    const trace = (batch[0] as any).body;
    const tool = bodyOf(batch, 'span-create', 'tool:Bash');
    expect(trace.input).toMatch(/coffee shop/);
    expect(trace.output).toMatch(/landing page draft/);
    expect(tool.input).toMatch(/ls -la/);
    expect(tool.output).toBe('total 0');
  });

  it('truncates ASCII prompt at 8 KB and output at 16 KB (bytes == chars)', () => {
    const longPrompt = 'a'.repeat(20_000);
    const longOutput = 'b'.repeat(40_000);
    const batch = buildTracePayload(
      makeCtx({
        message: {
          messageId: 'msg-1',
          prompt: longPrompt,
          output: longOutput,
        },
        prefs: { metrics: true, content: true, artifactManifest: false },
      }),
    );
    const trace = (batch[0] as any).body;
    expect(Buffer.byteLength(trace.input, 'utf8')).toBe(8 * 1024);
    expect(Buffer.byteLength(trace.output, 'utf8')).toBe(16 * 1024);
  });

  it('truncates by UTF-8 bytes, not by JS string length, for multi-byte text', () => {
    // Each CJK character is 3 bytes in UTF-8 but 1 unit in String.length.
    // 4096 chars × 3 bytes = 12_288 bytes, well over the 8 KB input cap.
    const longCJK = '设'.repeat(4096);
    expect(longCJK.length).toBe(4096);
    expect(Buffer.byteLength(longCJK, 'utf8')).toBe(12_288);
    const batch = buildTracePayload(
      makeCtx({
        message: { messageId: 'msg-1', prompt: longCJK, output: '' },
        prefs: { metrics: true, content: true, artifactManifest: false },
      }),
    );
    const trace = (batch[0] as any).body;
    expect(Buffer.byteLength(trace.input, 'utf8')).toBeLessThanOrEqual(8 * 1024);
    // Boundary safety: the trimmed result must still be valid UTF-8 (no
    // half-encoded characters). Round-tripping through Buffer should be
    // lossless if the cut landed correctly.
    expect(Buffer.from(trace.input as string, 'utf8').toString('utf8')).toBe(
      trace.input,
    );
    // And every character is still '设', i.e. we didn't mangle the encoding.
    expect(/^设+$/.test(trace.input as string)).toBe(true);
  });

  it('omits artifacts when manifest gate is off', () => {
    const batch = buildTracePayload(
      makeCtx({
        artifacts: [
          { slug: 'a', type: 'html', sizeBytes: 100 },
          { slug: 'b', type: 'jsx', sizeBytes: 200 },
        ],
      }),
    );
    const trace = (batch[0] as any).body;
    expect(trace.metadata.artifacts).toBeUndefined();
    expect(trace.metadata.artifactsTruncated).toBeUndefined();
  });

  it('caps artifacts at 50 entries with a truncation flag', () => {
    const many = Array.from({ length: 75 }, (_, i) => ({
      slug: `art-${i}`,
      type: 'html',
      sizeBytes: 1,
    }));
    const batch = buildTracePayload(
      makeCtx({
        artifacts: many,
        prefs: { metrics: true, content: false, artifactManifest: true },
      }),
    );
    const trace = (batch[0] as any).body;
    expect(trace.metadata.artifacts).toHaveLength(50);
    expect(trace.metadata.artifactsTruncated).toBe(true);
  });

  it('keeps eventsSummary metadata regardless of content / artifact gates', () => {
    const batch = buildTracePayload(makeCtx());
    const trace = (batch[0] as any).body;
    expect(trace.metadata.eventsSummary).toEqual({
      toolCalls: 2,
      errors: 0,
      durationMs: 4500,
    });
  });

  it('records token counts in metadata.tokens and generation.usage', () => {
    const batch = buildTracePayload(makeCtx());
    const trace = (batch[0] as any).body;
    const gen = bodyOf(batch, 'generation-create', 'llm');
    expect(trace.metadata.tokens).toEqual({
      input: 1234,
      output: 567,
      total: 1801,
    });
    expect(gen.usage).toEqual({
      input: 1234,
      output: 567,
      total: 1801,
      unit: 'TOKENS',
    });
  });

  it('uses conversationId as sessionId when within length limit', () => {
    const batch = buildTracePayload(makeCtx());
    expect((batch[0] as any).body.sessionId).toBe(
      'conv-uuid-aaaa-bbbb-cccc-dddd-eeeeeeeeeeee',
    );
  });

  it('drops sessionId when conversationId exceeds 200 chars', () => {
    const batch = buildTracePayload(
      makeCtx({ conversationId: 'x'.repeat(201) }),
    );
    expect((batch[0] as any).body.sessionId).toBeUndefined();
  });

  it('builds tag list with project + agent + extras', () => {
    const batch = buildTracePayload(
      makeCtx({ extraTags: ['legacy:tag'] }),
    );
    expect((batch[0] as any).body.tags).toEqual([
      'open-design',
      'project:proj-1',
      'agent:claude',
      'legacy:tag',
    ]);
  });

  it('adds turn-level tags (model / skill / DS) and runtime tags (os / client)', () => {
    const batch = buildTracePayload(
      makeCtx({
        turn: {
          model: 'gpt-4o',
          reasoning: 'high',
          skillId: 'landing-page',
          designSystemId: 'mission-control',
        },
        runtime: {
          os: 'darwin',
          arch: 'arm64',
          nodeVersion: 'v22.22.0',
          appVersion: '0.5.0',
          clientType: 'desktop',
        },
      }),
    );
    expect((batch[0] as any).body.tags).toEqual([
      'open-design',
      'project:proj-1',
      'agent:claude',
      'model:gpt-4o',
      'skill:landing-page',
      'ds:mission-control',
      'os:darwin',
      'client:desktop',
    ]);
  });

  it('promotes model + reasoning to first-class generation fields', () => {
    const batch = buildTracePayload(
      makeCtx({
        turn: { model: 'claude-sonnet-4-5', reasoning: 'high' },
      }),
    );
    const gen = bodyOf(batch, 'generation-create', 'llm');
    expect(gen.model).toBe('claude-sonnet-4-5');
    expect(gen.modelParameters).toEqual({ reasoning: 'high' });
  });

  it('omits modelParameters entirely when reasoning is unset', () => {
    const batch = buildTracePayload(
      makeCtx({ turn: { model: 'gpt-4o' } }),
    );
    const gen = bodyOf(batch, 'generation-create', 'llm');
    expect(gen.model).toBe('gpt-4o');
    expect(gen.modelParameters).toBeUndefined();
  });

  it('mirrors runtime + turn fields into trace metadata for query / export', () => {
    const batch = buildTracePayload(
      makeCtx({
        turn: { model: 'claude-sonnet-4-5', skillId: 'landing-page' },
        runtime: {
          os: 'linux',
          arch: 'x64',
          nodeVersion: 'v22.22.0',
          appVersion: '0.5.0',
          appChannel: 'beta',
          packaged: true,
          clientType: 'web',
        },
      }),
    );
    const m = (batch[0] as any).body.metadata;
    expect(m.model).toBe('claude-sonnet-4-5');
    expect(m.skillId).toBe('landing-page');
    expect(m.os).toBe('linux');
    expect(m.arch).toBe('x64');
    expect(m.nodeVersion).toBe('v22.22.0');
    expect(m.appVersion).toBe('0.5.0');
    expect(m.appChannel).toBe('beta');
    expect(m.packaged).toBe(true);
    expect(m.clientType).toBe('web');
    expect(m.projectId).toBe('proj-1');
    expect(m.agent).toBe('claude');
  });

  it('marks generation.level=ERROR when run failed', () => {
    const batch = buildTracePayload(
      makeCtx({
        run: {
          runId: 'run-1',
          status: 'failed',
          startedAt: 1,
          endedAt: 2,
          error: 'boom',
        },
      }),
    );
    const span = bodyOf(batch, 'span-create', 'agent-run');
    const gen = bodyOf(batch, 'generation-create', 'llm');
    expect(gen.level).toBe('ERROR');
    expect(gen.statusMessage).toBe('boom');
    expect(span.level).toBe('ERROR');
    expect(span.statusMessage).toBe('boom');
    expect(bodyOf(batch, 'event-create', 'run-error').statusMessage).toBe('boom');
    expect((batch[0] as any).body.metadata.error).toBe('boom');
    expect((batch[0] as any).body.metadata.success).toBe(false);
  });

  it('passes through anonymous installationId as userId', () => {
    const batch = buildTracePayload(makeCtx({ installationId: null }));
    expect((batch[0] as any).body.userId).toBeUndefined();
  });
});

describe('reportRunCompleted', () => {
  let warnSpy: ReturnType<typeof vi.spyOn>;

  beforeEach(() => {
    warnSpy = vi.spyOn(console, 'warn').mockImplementation(() => {});
  });

  afterEach(() => {
    warnSpy.mockRestore();
    vi.restoreAllMocks();
  });

  it('does nothing when metrics gate is off', async () => {
    const fetchSpy = vi.fn();
    await reportRunCompleted(
      makeCtx({
        prefs: { metrics: false, content: true, artifactManifest: true },
      }),
      { config: TEST_CONFIG, fetchImpl: fetchSpy as any },
    );
    expect(fetchSpy).not.toHaveBeenCalled();
  });

  it('does nothing when content gate is off', async () => {
    const fetchSpy = vi.fn();
    await reportRunCompleted(
      makeCtx({
        prefs: { metrics: true, content: false, artifactManifest: true },
      }),
      { config: TEST_CONFIG, fetchImpl: fetchSpy as any },
    );
    expect(fetchSpy).not.toHaveBeenCalled();
  });

  it('does nothing when no Langfuse config is available', async () => {
    const fetchSpy = vi.fn();
    await reportRunCompleted(
      makeCtx({
        prefs: { metrics: true, content: true, artifactManifest: false },
      }),
      {
        config: null,
        fetchImpl: fetchSpy as any,
      },
    );
    expect(fetchSpy).not.toHaveBeenCalled();
  });

  it('POSTs to /api/public/ingestion with Basic auth and a JSON batch body', async () => {
    const fetchSpy = vi.fn().mockResolvedValue(
      new Response('{}', { status: 200 }),
    );
    await reportRunCompleted(
      makeCtx({
        prefs: { metrics: true, content: true, artifactManifest: false },
      }),
      {
        config: TEST_CONFIG,
        fetchImpl: fetchSpy as any,
      },
    );
    expect(fetchSpy).toHaveBeenCalledTimes(1);
    const call = fetchSpy.mock.calls[0]!;
    const url = call[0] as string;
    const init = call[1] as RequestInit & { headers: Record<string, string> };
    expect(url).toBe('https://us.cloud.langfuse.com/api/public/ingestion');
    expect(init.method).toBe('POST');
    expect(init.headers.Authorization).toBe('Basic dGVzdA==');
    expect(init.headers['Content-Type']).toBe('application/json');
    const body = JSON.parse(init.body as string);
    expect(Array.isArray(body.batch)).toBe(true);
    expect(body.batch.map((item: any) => item.type)).toEqual([
      'trace-create',
      'span-create',
      'generation-create',
      'span-create',
      'span-create',
    ]);
  });

  it('POSTs serialized ingestion batches to the Open Design telemetry relay', async () => {
    const relayConfig: TelemetrySinkConfig = {
      kind: 'relay',
      relayUrl: 'https://telemetry.open-design.ai/api/langfuse',
      timeoutMs: 20_000,
      retries: 0,
    };
    const fetchSpy = vi.fn().mockResolvedValue(
      new Response('{}', { status: 200 }),
    );
    await reportRunCompleted(
      makeCtx({
        prefs: { metrics: true, content: true, artifactManifest: false },
      }),
      {
        config: relayConfig,
        fetchImpl: fetchSpy as any,
      },
    );
    expect(fetchSpy).toHaveBeenCalledTimes(1);
    const call = fetchSpy.mock.calls[0]!;
    const url = call[0] as string;
    const init = call[1] as RequestInit & { headers: Record<string, string> };
    expect(url).toBe('https://telemetry.open-design.ai/api/langfuse');
    expect(init.method).toBe('POST');
    expect(init.headers.Authorization).toBeUndefined();
    expect(init.headers['Content-Type']).toBe('application/json');
    expect(init.headers['X-Open-Design-Telemetry']).toBe('langfuse-ingestion-v1');
    const body = JSON.parse(init.body as string);
    expect(Array.isArray(body.batch)).toBe(true);
  });

  it('warns when the relay returns per-event errors', async () => {
    const relayConfig: TelemetrySinkConfig = {
      kind: 'relay',
      relayUrl: 'https://telemetry.open-design.ai/api/langfuse',
      timeoutMs: 20_000,
      retries: 0,
    };
    const fetchSpy = vi.fn().mockResolvedValue(
      new Response(
        JSON.stringify({ successes: [], errors: [{ id: 'bad', status: 400 }] }),
        { status: 207 },
      ),
    );
    await reportRunCompleted(
      makeCtx({
        prefs: { metrics: true, content: true, artifactManifest: false },
      }),
      {
        config: relayConfig,
        fetchImpl: fetchSpy as any,
      },
    );
    expect(warnSpy).toHaveBeenCalledWith(
      expect.stringContaining('Relay per-event errors (1)'),
    );
  });

  it('warns and drops when serialized batch exceeds the hard cap', async () => {
    // Per-field truncation already caps prompt/output, so we overflow the
    // hard cap by stuffing 50 artifact entries with very long slugs while
    // artifactManifest is on (50 × 30 KB ≈ 1.5 MB > 1 MB cap).
    const fetchSpy = vi.fn();
    const fatArtifacts = Array.from({ length: 50 }, (_, i) => ({
      slug: 'a'.repeat(30_000) + i,
      type: 'html',
      sizeBytes: 1,
    }));
    await reportRunCompleted(
      makeCtx({
        artifacts: fatArtifacts,
        prefs: { metrics: true, content: true, artifactManifest: true },
      }),
      { config: TEST_CONFIG, fetchImpl: fetchSpy as any },
    );
    expect(fetchSpy).not.toHaveBeenCalled();
    expect(warnSpy).toHaveBeenCalledWith(
      expect.stringContaining('Batch too large'),
    );
  });

  it('only warns (does not throw) when fetch rejects', async () => {
    const fetchSpy = vi.fn().mockRejectedValue(new Error('network down'));
    await expect(
      reportRunCompleted(
        makeCtx({
          prefs: { metrics: true, content: true, artifactManifest: false },
        }),
        {
          config: TEST_CONFIG,
          fetchImpl: fetchSpy as any,
        },
      ),
    ).resolves.toBeUndefined();
    expect(warnSpy).toHaveBeenCalledWith(
      expect.stringContaining('Fetch error'),
    );
  });

  it('retries once when fetch rejects before warning', async () => {
    const fetchSpy = vi
      .fn()
      .mockRejectedValueOnce(new Error('timeout'))
      .mockResolvedValueOnce(new Response('{}', { status: 207 }));
    await reportRunCompleted(
      makeCtx({
        prefs: { metrics: true, content: true, artifactManifest: false },
      }),
      {
        config: { ...TEST_CONFIG, retries: 1 },
        fetchImpl: fetchSpy as any,
      },
    );
    expect(fetchSpy).toHaveBeenCalledTimes(2);
    expect(warnSpy).not.toHaveBeenCalled();
  });

  it('only warns (does not throw) when ingestion responds non-2xx', async () => {
    const fetchSpy = vi.fn().mockResolvedValue(
      new Response('rate limited', { status: 429 }),
    );
    await reportRunCompleted(
      makeCtx({
        prefs: { metrics: true, content: true, artifactManifest: false },
      }),
      {
        config: TEST_CONFIG,
        fetchImpl: fetchSpy as any,
      },
    );
    expect(warnSpy).toHaveBeenCalledWith(
      expect.stringContaining('Ingestion failed 429'),
    );
  });

  it('warns when 207 Multi-Status body lists per-event errors', async () => {
    // Langfuse legacy ingestion always responds with 207. response.ok is
    // true, but malformed events show up in body.errors instead of as a
    // top-level non-2xx. Without parsing them they'd be silently dropped.
    const fetchSpy = vi.fn().mockResolvedValue(
      new Response(
        JSON.stringify({
          successes: [{ id: 'a', status: 201 }],
          errors: [
            {
              id: 'b',
              status: 400,
              message: 'invalid generation usage shape',
            },
          ],
        }),
        { status: 207 },
      ),
    );
    await reportRunCompleted(
      makeCtx({
        prefs: { metrics: true, content: true, artifactManifest: false },
      }),
      {
        config: TEST_CONFIG,
        fetchImpl: fetchSpy as any,
      },
    );
    expect(warnSpy).toHaveBeenCalledWith(
      expect.stringContaining('Per-event errors (1)'),
    );
  });

  it('does not warn when 207 body has empty errors array', async () => {
    const fetchSpy = vi.fn().mockResolvedValue(
      new Response(
        JSON.stringify({
          successes: [
            { id: 'a', status: 201 },
            { id: 'b', status: 201 },
          ],
          errors: [],
        }),
        { status: 207 },
      ),
    );
    await reportRunCompleted(
      makeCtx({
        prefs: { metrics: true, content: true, artifactManifest: false },
      }),
      {
        config: TEST_CONFIG,
        fetchImpl: fetchSpy as any,
      },
    );
    expect(warnSpy).not.toHaveBeenCalled();
  });
});

function makeFeedbackCtx(
  overrides: Partial<FeedbackReportContext> = {},
): FeedbackReportContext {
  return {
    runId: 'run-feedback-1',
    installationId: 'install-uuid-1',
    prefs: { metrics: true, content: true },
    rating: 'positive',
    reasonCodes: ['matched_request'],
    hasCustomReason: false,
    customReason: '',
    ...overrides,
  };
}

describe('buildFeedbackPayload', () => {
  it('emits a numeric user_rating score plus per-reason categorical scores', () => {
    const batch = buildFeedbackPayload(
      makeFeedbackCtx({
        rating: 'negative',
        reasonCodes: ['missed_request', 'weak_visual'],
        hasCustomReason: true,
        customReason: 'It got the layout wrong on tablet',
      }),
    ) as Array<Record<string, any>>;
    expect(batch).toHaveLength(3);
    const ratingScore = batch[0]!;
    expect(ratingScore.type).toBe('score-create');
    expect(ratingScore.body.traceId).toBe('run-feedback-1');
    expect(ratingScore.body.name).toBe('user_rating');
    expect(ratingScore.body.value).toBe(-1);
    expect(ratingScore.body.dataType).toBe('NUMERIC');
    expect(ratingScore.body.comment).toBe('negative');
    expect(ratingScore.body.metadata).toMatchObject({
      reasonCount: 2,
      customReason: 'It got the layout wrong on tablet',
      hasCustomReason: true,
    });
    for (const reasonScore of batch.slice(1)) {
      expect(reasonScore.body.name).toBe('user_rating_reason');
      expect(reasonScore.body.dataType).toBe('CATEGORICAL');
      expect(reasonScore.body.comment).toBe('negative');
      expect(reasonScore.body.traceId).toBe('run-feedback-1');
    }
    expect(batch[1]!.body.value).toBe('missed_request');
    expect(batch[2]!.body.value).toBe('weak_visual');
  });

  it('does not emit reason scores when no codes were submitted', () => {
    const batch = buildFeedbackPayload(
      makeFeedbackCtx({ reasonCodes: [] }),
    ) as Array<Record<string, any>>;
    expect(batch).toHaveLength(1);
    expect(batch[0]!.body.name).toBe('user_rating');
    expect(batch[0]!.body.value).toBe(1);
  });
});

describe('reportRunFeedback', () => {
  const TEST_CONFIG: LangfuseConfig = {
    baseUrl: 'https://us.cloud.langfuse.com',
    authHeader: 'Basic Zm9vOmJhcg==',
    retries: 0,
    timeoutMs: 1000,
  };

  beforeEach(() => {
    vi.useRealTimers();
  });

  it('skips when metrics consent is off', async () => {
    const fetchSpy = vi.fn();
    await reportRunFeedback(makeFeedbackCtx({ prefs: { metrics: false, content: true } }), {
      config: TEST_CONFIG,
      fetchImpl: fetchSpy as any,
    });
    expect(fetchSpy).not.toHaveBeenCalled();
  });

  it('skips when content consent is off', async () => {
    const fetchSpy = vi.fn();
    await reportRunFeedback(makeFeedbackCtx({ prefs: { metrics: true, content: false } }), {
      config: TEST_CONFIG,
      fetchImpl: fetchSpy as any,
    });
    expect(fetchSpy).not.toHaveBeenCalled();
  });

  it('posts a score-create batch to /api/public/ingestion when consent is on', async () => {
    const fetchSpy = vi.fn().mockResolvedValue(
      new Response(JSON.stringify({ successes: [], errors: [] }), { status: 207 }),
    );
    await reportRunFeedback(
      makeFeedbackCtx({ reasonCodes: ['matched_request'] }),
      { config: TEST_CONFIG, fetchImpl: fetchSpy as any },
    );
    expect(fetchSpy).toHaveBeenCalledTimes(1);
    const [url, init] = fetchSpy.mock.calls[0]!;
    expect(url).toBe('https://us.cloud.langfuse.com/api/public/ingestion');
    expect(init.method).toBe('POST');
    const body = JSON.parse(init.body);
    expect(body.batch).toHaveLength(2);
    expect(body.batch[0].type).toBe('score-create');
    expect(body.batch[0].body.value).toBe(1);
  });
});
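The two truncation tests above require cutting text by UTF-8 bytes without splitting a multi-byte character. Below is a minimal sketch of one way to do that with the standard `TextEncoder`/`TextDecoder`. `truncateUtf8` is a hypothetical name, and this is not necessarily how `langfuse-trace.ts` implements its truncation.

```typescript
// Truncates `text` to at most `maxBytes` of UTF-8 without splitting a character.
function truncateUtf8(text: string, maxBytes: number): string {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length <= maxBytes) return text;
  let end = Math.max(0, maxBytes);
  // bytes[end] is the first byte being dropped. If it has the form 0b10xxxxxx
  // it continues a multi-byte sequence, so step back to that sequence's lead
  // byte and drop the whole character instead of half of it.
  while (end > 0 && (bytes[end]! & 0xc0) === 0x80) {
    end -= 1;
  }
  return new TextDecoder().decode(bytes.subarray(0, end));
}
```

For ASCII input, every byte is a character boundary, so the cut lands exactly on `maxBytes`. For 3-byte characters such as '设', the cut backs off to the last whole character (8190 bytes for an 8 KB cap). The result therefore round-trips through an encoder unchanged, which is what the boundary-safety assertion checks.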