open-design/apps/web/tests/components/assistant-message-tool-status.test.tsx
lefarcen 6690dbd5bb
feat(analytics): PostHog + Langfuse instrumentation for assistant feedback (#1558)
* feat(analytics): PostHog + Langfuse instrumentation for assistant feedback

Re-bases the original three-commit PR onto release/v0.8.0. The web-side
feedback UI instrumentation (surface_view / ui_click / feedback_submit_result)
landed on main while this branch was open, so on this rebase that wiring
is taken from main; the remaining net additions are:

- Contracts: TrackingFeedback* enums and the four dedicated
  assistant_feedback_* event payload types (click, reason_view,
  reason_click, reason_submit), plus normalizeCustomReason helper.
  The new event-name variants are added to TrackingEventName and the
  AnalyticsEventPayload discriminated union next to the existing
  surface_view/ui_click variants — both wire formats coexist.
- POST /api/runs/:id/feedback in apps/daemon/src/chat-routes.ts:
  thin route that validates rating, filters reasonCodes down to
  string values (no allowlist yet), and fire-and-forgets into the
  daemon's reportFeedback hook.
- apps/daemon/src/langfuse-bridge.ts reportRunFeedbackFromDaemon
  forwards the rating + reasonCodes into Langfuse as user_rating
  (NUMERIC ±1) + user_rating_reason (CATEGORICAL, one per code)
  score-create entries. Gates on telemetry.metrics + telemetry.content.
- apps/web/src/providers/daemon.ts reportChatRunFeedback (fire-and-forget
  fetch) and apps/web/src/components/ProjectView.tsx wiring so each
  thumbs-up/down + reason submission posts the side-channel.
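
A minimal sketch of the rating-to-score mapping the Langfuse bullet
describes. Only the score names and the NUMERIC / CATEGORICAL split come
from the text above; the `ScoreEntry` shape, the `Rating` values, the
`buildScoreEntries` name, and the id scheme are hypothetical and are not
the real langfuse-bridge.ts code or Langfuse SDK types.

```typescript
// Hypothetical shapes for illustration only; not the real Langfuse SDK types.
type Rating = 'positive' | 'negative';

interface ScoreEntry {
  id: string; // assumed stable id so a re-submit overwrites the earlier score
  traceId: string;
  name: 'user_rating' | 'user_rating_reason';
  dataType: 'NUMERIC' | 'CATEGORICAL';
  value: number | string;
}

// Builds one NUMERIC user_rating entry plus one CATEGORICAL entry per reason code.
function buildScoreEntries(
  runId: string,
  rating: Rating,
  reasonCodes: string[],
): ScoreEntry[] {
  const entries: ScoreEntry[] = [
    {
      id: `${runId}:user_rating`,
      traceId: runId,
      name: 'user_rating',
      dataType: 'NUMERIC',
      value: rating === 'positive' ? 1 : -1,
    },
  ];
  for (const code of reasonCodes) {
    entries.push({
      id: `${runId}:user_rating_reason:${code}`,
      traceId: runId,
      name: 'user_rating_reason',
      dataType: 'CATEGORICAL',
      value: code,
    });
  }
  return entries;
}
```

Keying the id on the reason code is one way to make a repeated submission
overwrite rather than append; it also means codes dropped in a later
submission are left untouched.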

Conflicts resolved (release/v0.8.0 vs the branch's old base):
- packages/contracts/src/analytics/events.ts: keep main's
  file_upload_result / feedback_submit_result / settings_* event
  variants alongside the new assistant_feedback_* additions.
- apps/daemon/src/server.ts: keep DNS-aware validateExternalApiBaseUrl,
  add reportFeedback closure wired into registerChatRoutes telemetry.
- apps/daemon/src/chat-routes.ts: keep both /tool-result and the new
  /feedback routes; merge RegisterChatRoutesDeps to include both
  'paths' and 'telemetry'. Drop PR's chat-routes-local
  reconcileAssistantMessageOnRunEnd helper (main has the equivalent in
  server.ts).
- apps/web/src/components/ChatPane.tsx & AssistantMessage.tsx & ProjectView.tsx:
  keep main's projectKindForTracking prop name and its existing
  emission of surface_view / ui_click / feedback_submit_result; the
  PR's analyticsCtx-based reason_view/click/submit emission is dropped
  in this rebase since it would duplicate the existing wire format.
- apps/web/tests/components/*: rename projectKind → projectKindForTracking
  to match ChatPane's current prop name.

Outstanding review feedback (from the pre-rebase round, will be
addressed in a follow-up commit):
- AssistantMessage tests not yet passing the new feedback context to
  the direct render path.
- ProjectView clear-feedback path skips reportChatRunFeedback, leaving
  stale Langfuse user_rating scores.
- buildFeedbackPayload has no deletion path for previously-submitted
  user_rating_reason scores when the user switches thumbs.
- POST /api/runs/:id/feedback always returns {status:'accepted'} even
  when consent is off; needs to surface skipped_consent / skipped_no_sink.
- reasonCodes are filtered to string[] but not allowlisted against
  ChatMessageFeedbackReasonCode or deduped.
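
The last item above could be closed along these lines. This is a sketch
under assumptions: `sanitizeReasonCodes` and `isReasonCode` are
hypothetical names, and the allowlist shown is an illustrative subset,
not the full ChatMessageFeedbackReasonCode union.

```typescript
// Illustrative subset only: the real union lives in
// packages/contracts/src/api/chat.ts and has more members than shown here.
const KNOWN_REASON_CODES = ['followed_design_system', 'missed_design_system'] as const;
type ReasonCode = (typeof KNOWN_REASON_CODES)[number];

const KNOWN = new Set<string>(KNOWN_REASON_CODES);

function isReasonCode(value: unknown): value is ReasonCode {
  return typeof value === 'string' && KNOWN.has(value);
}

// Hypothetical helper: drops non-strings and unknown codes, removes
// duplicates, and keeps the order in which codes first appeared.
function sanitizeReasonCodes(input: unknown): ReasonCode[] {
  if (!Array.isArray(input)) return [];
  const seen = new Set<ReasonCode>();
  for (const item of input) {
    if (isReasonCode(item)) seen.add(item);
  }
  return Array.from(seen);
}
```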

* fix(analytics): address review on assistant feedback rebase

Picks up the in-scope correctness items from the prior review round
and the rebase residue without rewriting history:

- chat-routes.ts: `/feedback` now awaits the daemon's preflight
  outcome and echoes it as the response. The contract was already
  shaped as `accepted | skipped_consent | skipped_no_sink`, but the
  previous handler always returned `accepted` because the network
  send was fire-and-forget. The consent + sink decision is local
  (a small file read and an env-var lookup); the actual Langfuse
  upload still runs as a detached promise.
- chat-routes.ts: reasonCodes are now allowlisted against the
  contract's reason-code union and deduplicated before reaching
  Langfuse, so a stale or replayed client can't poison the
  Langfuse score table with unknown categorical values or
  duplicate stable ids in the same batch.
- langfuse-bridge.ts: split the consent + sink resolution from the
  fire-and-forget network send so the route can claim `accepted`
  honestly. The legacy `skipped_no_sink` return on app-config read
  failure is preserved.
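
A rough sketch of the preflight / detached-send split described above,
assuming the consent check is the async file read and the sink lookup is
synchronous. `FeedbackDeps` and its three members are hypothetical
stand-ins, not the daemon's actual interfaces.

```typescript
type FeedbackOutcome = 'accepted' | 'skipped_consent' | 'skipped_no_sink';

// Hypothetical dependency bag standing in for the daemon's real lookups.
interface FeedbackDeps {
  hasConsent: () => Promise<boolean>; // assumed: small app-config file read
  resolveSink: () => string | undefined; // assumed: env-var lookup
  send: (sink: string, payload: unknown) => Promise<void>; // network upload
}

async function reportFeedback(
  deps: FeedbackDeps,
  payload: unknown,
): Promise<FeedbackOutcome> {
  let consent: boolean;
  try {
    consent = await deps.hasConsent();
  } catch {
    // Legacy behavior kept: a failed config read reports skipped_no_sink.
    return 'skipped_no_sink';
  }
  if (!consent) return 'skipped_consent';

  const sink = deps.resolveSink();
  if (!sink) return 'skipped_no_sink';

  // Detached: the upload is started but not awaited, and its failure is
  // swallowed so it cannot surface as an unhandled rejection.
  void deps.send(sink, payload).catch(() => {});
  return 'accepted';
}
```

The route can then await `reportFeedback` and echo the outcome, since only
the local decision is on the request path.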

Contracts + comment hygiene:
- TrackingFeedbackReasonCode in packages/contracts/src/analytics/events.ts
  drifted from ChatMessageFeedbackReasonCode in packages/contracts/src/api/chat.ts;
  add `followed_design_system` and `missed_design_system` so the
  analytics wire format stays aligned with the persistence shape.
- langfuse-trace.ts buildFeedbackPayload: the docblock claimed the
  raw custom-reason text is bucketed before send. Product reversed
  that on 2026-05-13 (raw text now ships, consent-gated). Replace
  the stale comment with the real semantics + a note that there is
  no tombstone path for reason codes the user removes in a
  follow-up submission (left as scope for a later PR).
- AssistantMessage.tsx: remove the now-unused
  `AssistantFeedbackAnalyticsCtx` interface and a stray blank-line
  delete from the rebase; restore the analytics-context comment
  above the feedback hook.

Left as follow-up (intentional, documented in code):
- Sending a tombstone score when the user clears their rating —
  ProjectView still skips reportChatRunFeedback on `change===null`,
  so Langfuse retains the previous rating until the user re-submits.
  The PostHog event captures the clear separately.
- Removing reason-code scores when the user re-submits with a
  smaller set — buildFeedbackPayload only overwrites the codes
  present in the current payload.

* feat(analytics): wire PR's dedicated assistant_feedback_* events

The four dedicated event types (`assistant_feedback_click` /
`_reason_view` / `_reason_click` / `_reason_submit`) the PR added to
contracts were sitting unused after the rebase because main's
umbrella `surface_view` / `ui_click` / `feedback_submit_result`
emissions covered the same user gestures. Wire the dedicated events
alongside the umbrella ones so both wire formats fire on every
feedback action — dashboards / evals can pick whichever schema they
were built against without losing signal.

Each dedicated event has stricter typing than its umbrella sibling
(`project_id` / `project_kind` / `conversation_id` are non-null), so
the new emissions are guarded behind a presence check and skipped on
test renders that mount AssistantMessage without project context. The
umbrella emissions retain their nullable fallbacks unchanged.
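
The presence check could look roughly like this. The field names on the
payload follow the text above; `FeedbackContext`, `DedicatedClickEvent`,
and `toDedicatedClick` are hypothetical names used only for illustration.

```typescript
// Hypothetical context as a component might receive it: every id may be absent.
interface FeedbackContext {
  projectId?: string | null;
  projectKind?: string | null;
  conversationId?: string | null;
}

// Strict payload: all three ids are non-null, unlike the umbrella events.
interface DedicatedClickEvent {
  event: 'assistant_feedback_click';
  project_id: string;
  project_kind: string;
  conversation_id: string;
}

// Returns the strict payload only when every id is present; null means
// "skip the dedicated event" (for example a test render without context).
function toDedicatedClick(ctx: FeedbackContext): DedicatedClickEvent | null {
  const { projectId, projectKind, conversationId } = ctx;
  if (!projectId || !projectKind || !conversationId) return null;
  return {
    event: 'assistant_feedback_click',
    project_id: projectId,
    project_kind: projectKind,
    conversation_id: conversationId,
  };
}
```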

Pairing:
- surface_view (feedback reason panel) ↔ assistant_feedback_reason_view
- ui_click (feedback button)           ↔ assistant_feedback_click
- ui_click (reason submit button)      ↔ assistant_feedback_reason_click
- feedback_submit_result               ↔ assistant_feedback_reason_submit

Reason click + submit share the existing `requestId` so PostHog can
stitch click→result across both schemas, matching the spec.
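
The pairing above can be sketched as a lookup plus a dual emit. The event
names come from the list; the gesture keys, `TrackedEvent`, and `emitBoth`
are hypothetical, and the `track` callback stands in for whatever
analytics client is in use.

```typescript
interface TrackedEvent {
  name: string;
  requestId?: string;
}

// ui_click appears twice in the pairing, so the table is keyed by a
// hypothetical gesture name rather than by the umbrella event name.
const EVENT_PAIRS = {
  reason_panel_view: { umbrella: 'surface_view', dedicated: 'assistant_feedback_reason_view' },
  feedback_button_click: { umbrella: 'ui_click', dedicated: 'assistant_feedback_click' },
  reason_submit_click: { umbrella: 'ui_click', dedicated: 'assistant_feedback_reason_click' },
  submit_result: { umbrella: 'feedback_submit_result', dedicated: 'assistant_feedback_reason_submit' },
} as const;

type Gesture = keyof typeof EVENT_PAIRS;

// Fires both schemas for one gesture, reusing the same requestId so the
// click and its result can be joined in either schema.
function emitBoth(
  gesture: Gesture,
  requestId: string | undefined,
  track: (event: TrackedEvent) => void,
): void {
  const pair = EVENT_PAIRS[gesture];
  track({ name: pair.umbrella, requestId });
  track({ name: pair.dedicated, requestId });
}
```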
2026-05-21 19:28:51 +08:00

// @vitest-environment jsdom

import { cleanup, render, screen } from '@testing-library/react';
import { afterEach, describe, expect, it } from 'vitest';

import { AssistantMessage } from '../../src/components/AssistantMessage';
import type { AgentEvent, ChatMessage } from '../../src/types';

function messageWithEvents(events: AgentEvent[]): ChatMessage {
  return {
    id: 'assistant-1',
    role: 'assistant',
    content: '',
    events,
    startedAt: 1_000,
    endedAt: 3_000,
    runStatus: 'succeeded',
  };
}

describe('AssistantMessage tool status', () => {
  afterEach(() => cleanup());

  it('shows Done for a completed run tool use that has no tool result', () => {
    const { container } = render(
      <AssistantMessage
        projectKind="prototype"
        conversationId="conv-1"
        message={messageWithEvents([
          {
            kind: 'tool_use',
            id: 'tool-1',
            name: 'Bash',
            input: { command: 'pnpm guard', description: 'Run guard' },
          },
        ])}
        streaming={false}
        projectId="project-1"
      />,
    );

    expect(container.querySelector('.op-status-ok')?.textContent).toMatch(/^done$/i);
    expect(container.querySelector('.op-status-running')).toBeNull();
  });

  it('keeps legacy completed messages without runStatus as Done', () => {
    const { container } = render(
      <AssistantMessage
        projectKind="prototype"
        conversationId="conv-1"
        message={{
          ...messageWithEvents([
            {
              kind: 'tool_use',
              id: 'tool-1',
              name: 'Bash',
              input: { command: 'pnpm guard', description: 'Execute guard' },
            },
          ]),
          runStatus: undefined,
        }}
        streaming={false}
        projectId="project-1"
      />,
    );

    expect(container.querySelector('.op-status-ok')?.textContent).toMatch(/^done$/i);
    expect(container.querySelector('.op-status-running')).toBeNull();
  });

  it('shows Done in a grouped completed run when tool results are missing', () => {
    const { container } = render(
      <AssistantMessage
        projectKind="prototype"
        conversationId="conv-1"
        message={messageWithEvents([
          {
            kind: 'tool_use',
            id: 'tool-1',
            name: 'Bash',
            input: { command: 'pnpm guard', description: 'Execute guard' },
          },
          {
            kind: 'tool_use',
            id: 'tool-2',
            name: 'Bash',
            input: { command: 'pnpm typecheck', description: 'Execute typecheck' },
          },
        ])}
        streaming={false}
        projectId="project-1"
      />,
    );

    expect(container.querySelector('.action-card-toggle.running')).toBeNull();
    expect(screen.getByRole('button', { name: /Done/i })).toBeTruthy();
  });

  it('does not show Done when a failed run is missing a tool result', () => {
    const { container } = render(
      <AssistantMessage
        projectKind="prototype"
        conversationId="conv-1"
        message={{
          ...messageWithEvents([
            {
              kind: 'tool_use',
              id: 'tool-1',
              name: 'Bash',
              input: { command: 'pnpm guard', description: 'Execute guard' },
            },
          ]),
          runStatus: 'failed',
        }}
        streaming={false}
        projectId="project-1"
      />,
    );

    expect(container.querySelector('.op-status-error')?.textContent).toMatch(/^error$/i);
    expect(container.querySelector('.op-status-ok')).toBeNull();
  });

  it('does not show Done when a canceled run is missing a tool result', () => {
    const { container } = render(
      <AssistantMessage
        projectKind="prototype"
        conversationId="conv-1"
        message={{
          ...messageWithEvents([
            {
              kind: 'tool_use',
              id: 'tool-1',
              name: 'Bash',
              input: { command: 'pnpm guard', description: 'Execute guard' },
            },
          ]),
          runStatus: 'canceled',
        }}
        streaming={false}
        projectId="project-1"
      />,
    );

    expect(container.querySelector('.op-status-error')?.textContent).toMatch(/^error$/i);
    expect(container.querySelector('.op-status-ok')).toBeNull();
  });

  it('keeps Running for a streaming tool use that has no tool result', () => {
    const { container } = render(
      <AssistantMessage
        projectKind="prototype"
        conversationId="conv-1"
        message={{
          ...messageWithEvents([
            {
              kind: 'tool_use',
              id: 'tool-1',
              name: 'Bash',
              input: { command: 'pnpm guard', description: 'Run guard' },
            },
          ]),
          endedAt: undefined,
          runStatus: 'running',
        }}
        streaming
        projectId="project-1"
      />,
    );

    expect(container.querySelector('.op-status-running')?.textContent).toBe('running…');
    expect(screen.queryByText('Done')).toBeNull();
  });
});