open-design/apps/web/tests/components/chat-scroll-preservation.test.tsx
lefarcen 6690dbd5bb
feat(analytics): PostHog + Langfuse instrumentation for assistant feedback (#1558)
* feat(analytics): PostHog + Langfuse instrumentation for assistant feedback

Re-bases the original three-commit PR onto release/v0.8.0. The web-side
feedback UI instrumentation (surface_view / ui_click / feedback_submit_result)
landed on main while this branch was open, so on this rebase that wiring
is taken from main; the remaining net additions are:

- Contracts: TrackingFeedback* enums and the four dedicated
  assistant_feedback_* event payload types (click, reason_view,
  reason_click, reason_submit), plus normalizeCustomReason helper.
  The new event-name variants are added to TrackingEventName and the
  AnalyticsEventPayload discriminated union next to the existing
  surface_view/ui_click variants — both wire formats coexist.
- POST /api/runs/:id/feedback in apps/daemon/src/chat-routes.ts:
  thin route that validates rating, allowlists reasonCodes through a
  simple string filter, and fire-and-forgets into the daemon's
  reportFeedback hook.
- apps/daemon/src/langfuse-bridge.ts reportRunFeedbackFromDaemon
  forwards the rating + reasonCodes into Langfuse as user_rating
  (NUMERIC ±1) + user_rating_reason (CATEGORICAL, one per code)
  score-create entries. Gates on telemetry.metrics + telemetry.content.
- apps/web/src/providers/daemon.ts reportChatRunFeedback (fire-and-forget
  fetch) and apps/web/src/components/ProjectView.tsx wiring so each
  thumbs-up/down + reason submission posts the side-channel.

Conflicts resolved (release/v0.8.0 vs the branch's old base):
- packages/contracts/src/analytics/events.ts: keep main's
  file_upload_result / feedback_submit_result / settings_* event
  variants alongside the new assistant_feedback_* additions.
- apps/daemon/src/server.ts: keep DNS-aware validateExternalApiBaseUrl,
  add reportFeedback closure wired into registerChatRoutes telemetry.
- apps/daemon/src/chat-routes.ts: keep both /tool-result and the new
  /feedback routes; merge RegisterChatRoutesDeps to include both
  'paths' and 'telemetry'. Drop PR's chat-routes-local
  reconcileAssistantMessageOnRunEnd helper (main has the equivalent in
  server.ts).
- apps/web/src/components/ChatPane.tsx & AssistantMessage.tsx & ProjectView.tsx:
  keep main's projectKindForTracking prop name and its existing
  emission of surface_view / ui_click / feedback_submit_result; the
  PR's analyticsCtx-based reason_view/click/submit emission is dropped
  in this rebase since it would duplicate the existing wire format.
- apps/web/tests/components/*: rename projectKind → projectKindForTracking
  to match ChatPane's current prop name.

Outstanding review feedback (from the pre-rebase round, will be
addressed in a follow-up commit):
- AssistantMessage tests not yet passing the new feedback context to
  the direct render path.
- ProjectView clear-feedback path skips reportChatRunFeedback, leaving
  stale Langfuse user_rating scores.
- buildFeedbackPayload has no deletion path for previously-submitted
  user_rating_reason scores when the user switches thumbs.
- POST /api/runs/:id/feedback always returns {status:'accepted'} even
  when consent is off; needs to surface skipped_consent / skipped_no_sink.
- reasonCodes are filtered to string[] but not allowlisted against
  ChatMessageFeedbackReasonCode or deduped.

* fix(analytics): address review on assistant feedback rebase

Picks up the in-scope correctness items from the prior review round
and the rebase residue without rewriting history:

- chat-routes.ts: `/feedback` now awaits the daemon's preflight
  outcome and echoes it as the response. The contract was already
  shaped as `accepted | skipped_consent | skipped_no_sink`, but the
  previous handler always returned `accepted` because the network
  send was fire-and-forget. The consent + sink decision is local
  (a small file read and an env-var lookup); the actual Langfuse
  upload still runs as a detached promise.
- chat-routes.ts: reasonCodes are now allowlisted against the
  contract's reason-code union and deduplicated before reaching
  Langfuse, so a stale or replayed client can't poison the
  Langfuse score table with unknown categorical values or
  duplicate stable ids in the same batch.
- langfuse-bridge.ts: split the consent + sink resolution from the
  fire-and-forget network send so the route can claim `accepted`
  honestly. The legacy `skipped_no_sink` return on app-config read
  failure is preserved.

Contracts + comment hygiene:
- TrackingFeedbackReasonCode in packages/contracts/src/analytics/events.ts
  drifted from ChatMessageFeedbackReasonCode in packages/contracts/src/api/chat.ts;
  add `followed_design_system` and `missed_design_system` so the
  analytics wire format stays aligned with the persistence shape.
- langfuse-trace.ts buildFeedbackPayload: the docblock claimed the
  raw custom-reason text is bucketed before send. Product reversed
  that on 2026-05-13 (raw text now ships, consent-gated). Replace
  the stale comment with the real semantics + a note that there is
  no tombstone path for reason codes the user removes in a
  follow-up submission (left as scope for a later PR).
- AssistantMessage.tsx: remove the now-unused
  `AssistantFeedbackAnalyticsCtx` interface and a stray blank-line
  delete from the rebase; restore the analytics-context comment
  above the feedback hook.

Left as follow-up (intentional, documented in code):
- Sending a tombstone score when the user clears their rating —
  ProjectView still skips reportChatRunFeedback on `change===null`,
  so Langfuse retains the previous rating until the user re-submits.
  The PostHog event captures the clear separately.
- Removing reason-code scores when the user re-submits with a
  smaller set — buildFeedbackPayload only overwrites the codes
  present in the current payload.

* feat(analytics): wire PR's dedicated assistant_feedback_* events

The four dedicated event types (`assistant_feedback_click` /
`_reason_view` / `_reason_click` / `_reason_submit`) the PR added to
contracts were sitting unused after the rebase because main's
umbrella `surface_view` / `ui_click` / `feedback_submit_result`
emissions covered the same user gestures. Wire the dedicated events
alongside the umbrella ones so both wire formats fire on every
feedback action — dashboards / evals can pick whichever schema they
were built against without losing signal.

Each dedicated event has stricter typing than its umbrella sibling
(`project_id` / `project_kind` / `conversation_id` are non-null), so
the new emissions are guarded behind a presence check and skipped on
test renders that mount AssistantMessage without project context. The
umbrella emissions retain their nullable fallbacks unchanged.

Pairing:
- surface_view (feedback reason panel) ↔ assistant_feedback_reason_view
- ui_click (feedback button)           ↔ assistant_feedback_click
- ui_click (reason submit button)      ↔ assistant_feedback_reason_click
- feedback_submit_result               ↔ assistant_feedback_reason_submit

Reason click + submit share the existing `requestId` so PostHog can
stitch click→result across both schemas, matching the spec.
2026-05-21 19:28:51 +08:00

258 lines
9.2 KiB
TypeScript

// @vitest-environment jsdom
// Polyfill scrollTo for jsdom (not available in jsdom's HTMLElement)
if (typeof HTMLElement.prototype.scrollTo !== 'function') {
HTMLElement.prototype.scrollTo = function (
options?: ScrollToOptions | number,
_y?: number,
) {
if (typeof options === 'object' && options !== null) {
if (options.top !== undefined) this.scrollTop = options.top;
if (options.left !== undefined) this.scrollLeft = options.left;
}
};
}
import { act, cleanup, fireEvent, render, screen } from '@testing-library/react';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import { ChatPane } from '../../src/components/ChatPane';
import type { ChatMessage } from '../../src/types';
// jsdom does not run a layout engine, so scrollHeight/clientHeight/scrollTop
// are all 0. The scroll-preservation effect derives "near bottom" and the
// restore target from those, so we install prototype-level getters/setters
// that route every chat-log read/write through a per-test `geom` object.
//
// The earlier shape installed instance-level Object.defineProperty mocks on
// the *remounted* chat-log only AFTER `await switchTab('Chat')`. Inside that
// act() the component schedules a rAF that writes scrollTop on the new
// element; depending on whether jsdom's rAF polyfill flushed before the await
// resolved, the write either landed on the still-default prototype setter
// (lost) or the not-yet-installed instance setter (also lost). The instance
// mock's closure-captured `remountedTop` then served its initial 0 forever
// and the assertion failed nondeterministically across CI runs without any
// product-code change. Patching at the prototype level eliminates the race
// because any chat-log instance React mounts later automatically reads/writes
// through this single test-controlled geometry.
type Geom = { scrollHeight: number; clientHeight: number; scrollTop: number };
let geom: Geom;
let rafCallbacks: FrameRequestCallback[] = [];
let resizeCallbacks: ResizeObserverCallback[] = [];
let savedDescriptors: Record<
'scrollTop' | 'scrollHeight' | 'clientHeight',
PropertyDescriptor | undefined
>;
let originalResizeObserver: typeof ResizeObserver | undefined;
function isChatLog(el: HTMLElement): boolean {
return typeof el?.classList?.contains === 'function' && el.classList.contains('chat-log');
}
beforeEach(() => {
geom = { scrollHeight: 0, clientHeight: 0, scrollTop: 0 };
rafCallbacks = [];
resizeCallbacks = [];
vi.spyOn(window, 'requestAnimationFrame').mockImplementation((callback) => {
rafCallbacks.push(callback);
return rafCallbacks.length;
});
originalResizeObserver = globalThis.ResizeObserver;
class MockResizeObserver {
constructor(callback: ResizeObserverCallback) {
resizeCallbacks.push(callback);
}
observe = vi.fn();
unobserve = vi.fn();
disconnect = vi.fn();
}
Object.defineProperty(globalThis, 'ResizeObserver', {
configurable: true,
writable: true,
value: MockResizeObserver,
});
savedDescriptors = {
scrollTop: Object.getOwnPropertyDescriptor(HTMLElement.prototype, 'scrollTop'),
scrollHeight: Object.getOwnPropertyDescriptor(HTMLElement.prototype, 'scrollHeight'),
clientHeight: Object.getOwnPropertyDescriptor(HTMLElement.prototype, 'clientHeight'),
};
Object.defineProperty(HTMLElement.prototype, 'scrollTop', {
configurable: true,
get(this: HTMLElement) {
return isChatLog(this) ? geom.scrollTop : 0;
},
set(this: HTMLElement, v: number) {
if (isChatLog(this)) geom.scrollTop = v;
},
});
Object.defineProperty(HTMLElement.prototype, 'scrollHeight', {
configurable: true,
get(this: HTMLElement) {
return isChatLog(this) ? geom.scrollHeight : 0;
},
});
Object.defineProperty(HTMLElement.prototype, 'clientHeight', {
configurable: true,
get(this: HTMLElement) {
return isChatLog(this) ? geom.clientHeight : 0;
},
});
});
afterEach(() => {
cleanup();
vi.restoreAllMocks();
rafCallbacks = [];
resizeCallbacks = [];
if (originalResizeObserver) {
Object.defineProperty(globalThis, 'ResizeObserver', {
configurable: true,
writable: true,
value: originalResizeObserver,
});
} else {
delete (globalThis as unknown as { ResizeObserver?: unknown }).ResizeObserver;
}
for (const key of ['scrollTop', 'scrollHeight', 'clientHeight'] as const) {
const original = savedDescriptors[key];
if (original) {
Object.defineProperty(HTMLElement.prototype, key, original);
} else {
delete (HTMLElement.prototype as unknown as Record<string, unknown>)[key];
}
}
});
function setGeom(partial: Partial<Geom>) {
geom = { ...geom, ...partial };
}
function setUserScroll(top: number) {
geom.scrollTop = top;
const el = document.querySelector('.chat-log');
if (el) fireEvent.scroll(el);
}
function chatPaneEl(messages: ChatMessage[], activeConversationId: string | null) {
return (
<ChatPane
projectKindForTracking="prototype"
messages={messages}
streaming={false}
error={null}
projectId="project-1"
projectFiles={[]}
onEnsureProject={async () => 'project-1'}
onSend={() => {}}
onStop={() => {}}
conversations={[]}
activeConversationId={activeConversationId}
onSelectConversation={() => {}}
onDeleteConversation={() => {}}
/>
);
}
function renderChatPane(messages: ChatMessage[], activeConversationId: string | null = null) {
return render(chatPaneEl(messages, activeConversationId));
}
const sampleMessages: ChatMessage[] = [
{ id: 'u1', role: 'user', content: 'first request', createdAt: Date.now() },
{ id: 'a1', role: 'assistant', content: 'first reply', createdAt: Date.now() },
{ id: 'u2', role: 'user', content: 'second request', createdAt: Date.now() },
{ id: 'a2', role: 'assistant', content: 'second reply', createdAt: Date.now() },
];
async function flushFrame() {
await act(async () => {
const callbacks = rafCallbacks;
rafCallbacks = [];
callbacks.forEach((callback) => callback(performance.now()));
await Promise.resolve();
});
}
async function triggerResize() {
await act(async () => {
const callbacks = [...resizeCallbacks];
callbacks.forEach((callback) =>
callback([], {} as ResizeObserver),
);
await Promise.resolve();
});
}
describe('chat scroll behavior', () => {
it('does not render the removed comments tab beside chat', () => {
renderChatPane(sampleMessages);
expect(screen.queryByRole('tab', { name: 'Comments' })).toBeNull();
expect(screen.queryByRole('tab', { name: 'Chat' })).toBeNull();
});
it('does not auto-scroll a short scrollback (~90px above bottom) when new content streams in', async () => {
setGeom({ scrollHeight: 1000, clientHeight: 400, scrollTop: 0 });
const { rerender } = render(chatPaneEl(sampleMessages, null));
// Drain the initial-bottom-scroll rAF queued during the first render,
// otherwise it fires after our setUserScroll calls and re-pins the
// ref to true behind the test's back.
await flushFrame();
// User intentionally scrolls 90px up: distance = 1000 - 510 - 400 = 90.
// That's between the 80px auto-follow cutoff and the 120px jump-button
// threshold, so the user is deliberately reading slightly earlier
// output and should not be yanked to the latest message.
setUserScroll(510);
// A new assistant message streams in; scrollHeight grows.
const streamed: ChatMessage[] = [
...sampleMessages,
{ id: 'a3', role: 'assistant', content: 'streaming chunk', createdAt: Date.now() },
];
setGeom({ scrollHeight: 1100 });
await act(async () => {
rerender(chatPaneEl(streamed, null));
});
await flushFrame();
// Auto-scroll must respect the user's pause: scrollTop stays where
// they put it instead of snapping to the new scrollHeight.
expect(geom.scrollTop).toBe(510);
});
it('keeps following the latest content when the pinned chat DOM grows after render', async () => {
setGeom({ scrollHeight: 1000, clientHeight: 400, scrollTop: 0 });
renderChatPane(sampleMessages);
await flushFrame();
expect(geom.scrollTop).toBe(1000);
// A message body can grow after the messages effect has already run
// (markdown/tool rows/forms/images/layout). If the user was pinned to
// the bottom, that DOM resize must still carry them to the latest turn.
setGeom({ scrollHeight: 1200 });
await triggerResize();
await flushFrame();
expect(geom.scrollTop).toBe(1200);
});
it('lands new conversation at its own bottom when switching conversations', async () => {
const { rerender } = render(chatPaneEl(sampleMessages, 'conv-A'));
setGeom({ scrollHeight: 1000, clientHeight: 400, scrollTop: 0 });
// User scrolls up in conversation A.
setUserScroll(150);
// The active conversation changes to B. It must land at conversation
// B's own initial bottom, not at scrollTop: 0 and not at conversation
// A's saved offset.
rerender(chatPaneEl(sampleMessages, 'conv-B'));
await flushFrame();
// Saved state was cleared by the activeConversationId-reset effect,
// so the new conversation lands at its own scrollHeight rather than
// the browser default 0.
expect(geom.scrollTop).toBe(1000);
});
});