
593 commits

Author SHA1 Message Date
lefarcen
e1bc83a476
feat(analytics): PostHog product analytics (P0 events, consent-gated, packaged) (#1428)
* feat(analytics): scaffold PostHog product-analytics integration

- Add @open-design/contracts/analytics subpath with the 17 P0 event
  payload types, header constants, and code↔CSV enum mapping helpers.
- Add apps/daemon/src/analytics.ts with env-gated posthog-node client,
  request-scoped analytics context reader, and artifact-id anonymizer.
- Expose GET /api/analytics/config so the web bundle never embeds the
  PostHog key at build time; daemon owns POSTHOG_KEY / POSTHOG_HOST.
- Add apps/web/src/analytics module (identity + lazy posthog-js client
  + React provider) and mount it under <I18nProvider> in app/layout.

No event wiring yet — that lands in the next commit alongside trigger
points (App.tsx, EntryView, NewProjectPanel, SettingsDialog, FileViewer,
runs.ts).

* feat(analytics): wire app_launch, home_view, home_click, project_create_result

- App.tsx: fire app_launch once after first effect tick. handleCreateProject
  now emits project_create_result on both success and failure paths.
- EntryView.tsx: home_view (page) gated on agents loading so
  has_available_cli isn't transiently false; home_view (asset_panel) fires
  per top-tab change with the right result_count.
- NewProjectPanel.tsx: home_click create_button fires before delegating to
  the parent; a fresh request_id is generated here and threaded through
  onCreate so the matching project_create_result stitches via $insert_id.
- contracts/analytics: tighten createTabToTracking and topTabToTracking
  for the worktree branch's renamed tabs (live-artifact, templates).

* feat(analytics): wire settings_view + 3 settings_click events

- settings_view fires on dialog mount and on every section switch,
  carrying the active section (mapped via settingsSectionToTracking
  for the 16-section worktree layout), execution_mode, and the
  selected CLI provider id when present.
- settings_click execution_mode_tab: setMode now emits before/after
  values whenever the user toggles between Local CLI and BYOK.
- settings_click cli_provider_card: agent card onClick reports
  cli_provider_id via agentIdToTracking (kiro → other).
- settings_click byok_field: onFocus added to api_key, model select,
  and base_url inputs; provider_id widened to include google so the
  worktree's Gemini protocol slot type-checks.

* feat(analytics): wire studio_view + studio_click chat, studio_view artifact

- packages/contracts/src/analytics/artifact-id.ts: FNV-1a 64-bit helper
  produces a 16-hex anonymized id for (projectId, fileName). Stable
  cross-platform so the daemon and the web bundle resolve the same id
  without a Web Crypto round-trip; daemon now re-exports it.
- ChatComposer: studio_view chat_panel fires once per project mount,
  studio_click chat_composer fires on attachment + send buttons with
  estimated user_query_tokens (length/4) and has_attachment.
- FileViewer: studio_view artifact fires once per (project, file) at
  the dispatcher level, before any sub-viewer renders, with
  artifact_kind derived from the renderer registry / file.kind table.
- Widen TrackingExportFormat to include markdown and cloudflare_pages
  so the worktree branch's full share menu can emit verbatim.
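A minimal TypeScript sketch of the FNV-1a 64-bit artifact id described above. It uses BigInt so the same plain integer arithmetic runs in Node and in the browser without Web Crypto. The function names and the NUL separator used to join projectId and fileName are assumptions, not the actual helper in artifact-id.ts.

```typescript
// Standard FNV-1a 64-bit parameters.
const FNV_OFFSET_BASIS_64 = 0xcbf29ce484222325n;
const FNV_PRIME_64 = 0x100000001b3n;
const MASK_64 = (1n << 64n) - 1n;

// Hashes the UTF-8 bytes of `input` and returns a zero-padded 16-hex-digit string.
function fnv1a64Hex(input: string): string {
  let hash = FNV_OFFSET_BASIS_64;
  for (const byte of new TextEncoder().encode(input)) {
    hash ^= BigInt(byte); // FNV-1a: XOR first...
    hash = (hash * FNV_PRIME_64) & MASK_64; // ...then multiply, modulo 2^64
  }
  return hash.toString(16).padStart(16, "0");
}

// Hypothetical wrapper: the NUL separator keeps ("ab", "c") and ("a", "bc") distinct.
function anonymizeArtifactId(projectId: string, fileName: string): string {
  return fnv1a64Hex(`${projectId}\u0000${fileName}`);
}
```

Because the output depends only on the input bytes, the daemon and the web bundle compute the same id for the same (projectId, fileName) pair.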

* feat(analytics): wire studio_click share_option + artifact_export_result

HtmlViewer's share menu now emits both events per click via a
fireShareExport helper:

- studio_click share_option fires immediately on click with the chosen
  export_format and a fresh request_id.
- artifact_export_result fires when the export resolves — success for
  sync exporters (html, markdown, template) the moment the call
  returns, success/failed for async exporters (pdf, zip, deploy)
  via .then/.catch. The same request_id threads both events so
  PostHog stitches click → result via $insert_id.
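The click → result pairing can be sketched as below. Only the event names, export_format, request_id, and the success/failed result come from the message above; the `target` and `error_code` fields, the exporter signature, and the `track` callback are assumptions. Sync exporters report as soon as the call returns; promise-returning exporters report via .then.

```typescript
type ExportFormat =
  | "html" | "markdown" | "template" | "pdf" | "zip" | "vercel" | "cloudflare_pages";
type TrackFn = (event: string, props: Record<string, unknown>) => void;

// Emits the click immediately, then exactly one result event once the exporter settles.
function fireShareExport(
  format: ExportFormat,
  requestId: string,
  exporter: () => unknown,
  track: TrackFn,
): Promise<void> {
  track("studio_click", { target: "share_option", export_format: format, request_id: requestId });

  const emitResult = (result: "success" | "failed", err?: unknown): void =>
    track("artifact_export_result", {
      export_format: format,
      request_id: requestId, // same id on both events so they can be stitched
      result,
      ...(err instanceof Error ? { error_code: err.name } : {}),
    });

  let returned: unknown;
  try {
    returned = exporter();
  } catch (err) {
    emitResult("failed", err);
    return Promise.resolve();
  }
  if (returned instanceof Promise) {
    // Async exporters (pdf, zip, deploy): report when the promise settles.
    return returned.then(
      () => emitResult("success"),
      (err: unknown) => emitResult("failed", err),
    );
  }
  // Sync exporters (html, markdown, template): report as soon as the call returns.
  emitResult("success");
  return Promise.resolve();
}
```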

DEPLOY_PROVIDER_OPTIONS maps to the CSV's vercel / cloudflare_pages
slots; markdown is now a first-class export_format value.

Also ignore .env.local so local POSTHOG_KEY / .env-style secrets
don't get committed.

* feat(analytics): emit run_created and run_finished from the daemon

POST /api/runs now reads the analytics context off the
x-od-analytics-* headers the web client sets on every fetch, then:

- Captures run_created with project_id, conversation_id, run_id,
  model_id, agent_provider_id (mapped via agentIdToTracking),
  skill_id, design_system_id, plus the token_count_source marker.
- Schedules a run_finished capture on runs.wait(run) resolution,
  mapping succeeded/canceled/failed to success/cancelled/failed and
  reporting total_duration_ms.
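A sketch of the two daemon-side pieces: collecting the x-od-analytics-* headers into a context object, and mapping the run's terminal status onto the tracked result. The header prefix and the status values come from the message; the function names and the suffix-keyed context shape are assumptions. Node already lowercases incoming header names, so the toLowerCase call is only a safeguard.

```typescript
type RunStatus = "succeeded" | "canceled" | "failed";
type TrackedRunResult = "success" | "cancelled" | "failed";

const RUN_RESULT: Record<RunStatus, TrackedRunResult> = {
  succeeded: "success",
  canceled: "cancelled", // spelling differs on the tracking side
  failed: "failed",
};

const HEADER_PREFIX = "x-od-analytics-";

// Collects x-od-analytics-* headers keyed by suffix,
// e.g. "x-od-analytics-anonymous-id" -> { "anonymous-id": "..." }.
function readAnalyticsContext(
  headers: Record<string, string | string[] | undefined>,
): Record<string, string> {
  const context: Record<string, string> = {};
  for (const [name, raw] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (!lower.startsWith(HEADER_PREFIX)) continue;
    const value = Array.isArray(raw) ? raw[0] : raw;
    if (value) context[lower.slice(HEADER_PREFIX.length)] = value;
  }
  return context;
}

// Builds the terminal-state fields for run_finished.
function runFinishedProps(status: RunStatus, startedAtMs: number, finishedAtMs: number) {
  return {
    result: RUN_RESULT[status],
    total_duration_ms: Math.max(0, finishedAtMs - startedAtMs),
  };
}
```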

Both events use a stable insert_id derived from the same uuid so
PostHog dedupes the daemon-side mirror against any future
web-side capture without double-counting.

Token sub-fields (user_query_tokens/system_prompt_tokens/...) stay
omitted in v1 — the claude-stream parser only exposes input/output
totals today. See tracking-doc-issues.md §3.2.

* feat(analytics): emit settings_cli_test_result + settings_byok_test_result

The original BLOCKING-list assumed these CSV P0 events were not
implementable in this branch because main lacked Test buttons. The
worktree HEAD actually wires `handleTestAgent` and `handleTestProvider`
in SettingsDialog, so both events are now in scope.

- handleTestAgent emits settings_cli_test_result on success and
  failure paths with cli_provider_id mapped via agentIdToTracking,
  result drawn from result.ok / catch branch, error_code from
  result.kind or the thrown error name, and duration_ms timed via
  performance.now().
- handleTestProvider emits settings_byok_test_result analogously,
  using apiProtocol (anthropic|openai|azure|ollama|google) directly
  as provider_id — wider than the CSV's 5-value enum, documented in
  tracking-doc-issues.md §2.5.
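Both handlers follow the same shape: time the test with performance.now() and emit exactly one result event on the success, reported-failure, and thrown-error paths. The sketch below assumes a `{ ok, kind }` outcome and a generic `emit` callback; runTimedTest and its types are hypothetical names, not SettingsDialog code.

```typescript
interface TestOutcome {
  ok: boolean;
  kind?: string; // failure category reported by the tester, if any
}

interface TestResultProps {
  result: "success" | "failed";
  error_code?: string;
  duration_ms: number;
}

// Runs a connectivity test and emits exactly one result event, whatever happens.
async function runTimedTest(
  test: () => Promise<TestOutcome>,
  emit: (props: TestResultProps) => void,
): Promise<TestOutcome | null> {
  const started = performance.now();
  let outcome: TestOutcome | null = null;
  let errorCode: string | undefined;
  try {
    outcome = await test();
    if (!outcome.ok) errorCode = outcome.kind; // result.kind on reported failures
  } catch (err) {
    errorCode = err instanceof Error ? err.name : "unknown"; // thrown error name
  }
  emit({
    result: outcome?.ok ? "success" : "failed",
    ...(errorCode ? { error_code: errorCode } : {}),
    duration_ms: Math.round(performance.now() - started),
  });
  return outcome;
}
```

Emitting after the try/catch, rather than inside it, keeps a throwing `emit` from triggering a second, spurious failure event.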

Contracts: add SettingsCliTestResultProps / SettingsByokTestResultProps
plus matching track* helpers. AnalyticsEventName union now covers all
14 P0 events this branch supports.

* feat(analytics): gate PostHog on the existing telemetry.metrics consent

The integration now reuses the same first-launch privacy banner +
Settings → Privacy toggle that gates Langfuse, so a single user
decision controls both telemetry sinks.

- /api/analytics/config now consults the persisted AppConfigPrefs:
  it returns enabled=true only when POSTHOG_KEY is set AND the user
  has chosen "Share usage data" (telemetry.metrics === true). The
  response also echoes installationId so the web client uses the
  same anonymous id Langfuse keys off of — one identity per install,
  shared across both sinks.
- Web AnalyticsProvider:
  - Bootstrap fetch resolves installationId and threads it through
    the x-od-analytics-anonymous-id header on every /api/* fetch,
    so daemon-side captures (run_created / run_finished /
    project_create_result) land on the same person record.
  - Exposes a setConsent(granted) method that calls posthog-js's
    opt_in_capturing / opt_out_capturing, wired from App.tsx via a
    useEffect watching config.telemetry?.metrics. Toggling Privacy
    → metrics now stops/resumes events immediately, no reload.
- app_launch additionally gates on telemetry.metrics so a freshly-
  declined user fires nothing, and a freshly-opted-in user fires on
  the next reload.
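The /api/analytics/config gate reduces to a small pure function: enabled only when POSTHOG_KEY is present and telemetry.metrics is strictly true. The response shape is an assumption. In particular, returning the key and host to the web client, and omitting installationId when disabled, are illustrative choices, not a documented contract.

```typescript
interface AppConfigPrefs {
  installationId?: string;
  telemetry?: { metrics?: boolean; content?: boolean };
}

interface AnalyticsConfigResponse {
  enabled: boolean;
  key?: string;
  host?: string;
  installationId?: string;
}

// Enabled only when the build carries a key AND the user opted in to metrics.
function resolveAnalyticsConfig(
  env: Record<string, string | undefined>,
  prefs: AppConfigPrefs,
): AnalyticsConfigResponse {
  const key = env.POSTHOG_KEY;
  const consented = prefs.telemetry?.metrics === true; // undefined and false both mean "no"
  if (!key || !consented) return { enabled: false };
  return {
    enabled: true,
    key,
    host: env.POSTHOG_HOST,
    installationId: prefs.installationId, // shared with Langfuse: one id per install
  };
}
```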

* feat(packaging): bake POSTHOG_KEY into packaged daemon spawn env

Wires PostHog product analytics through the same Langfuse-style build-
secret pipeline so official Open Design builds ship with the key while
fork builds compile without it (the integration short-circuits cleanly
when POSTHOG_KEY is absent).

tools/pack
- resolveToolPackConfig reads POSTHOG_KEY / POSTHOG_HOST from
  process.env at packaging time, validates them (no whitespace in the
  key, http(s) URL for host, trailing-slash strip), and stamps them on
  ToolPackConfig. Fork builds without the env vars simply omit the
  fields; the daemon-side gate keeps things off in that case.
- Mac, Windows, and Linux packaged-config writers each append the two
  fields to open-design-config.json next to the existing
  telemetryRelayUrl entry.
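The validation rules listed for resolveToolPackConfig can be sketched as follows: reject whitespace in the key, require an http(s) URL for the host, strip trailing slashes, and omit absent values. resolvePosthogPackFields and tryParseUrl are hypothetical names, and treating an empty string as "absent" is an assumption.

```typescript
interface PosthogPackFields {
  posthogKey?: string;
  posthogHost?: string;
}

function tryParseUrl(value: string): URL | null {
  try {
    return new URL(value);
  } catch {
    return null;
  }
}

// Reads the optional build secrets; absent values are omitted (fork builds).
function resolvePosthogPackFields(env: Record<string, string | undefined>): PosthogPackFields {
  const fields: PosthogPackFields = {};

  const key = env.POSTHOG_KEY;
  if (key) {
    if (/\s/.test(key)) throw new Error("POSTHOG_KEY must not contain whitespace");
    fields.posthogKey = key;
  }

  const host = env.POSTHOG_HOST;
  if (host) {
    const url = tryParseUrl(host);
    if (!url) throw new Error(`POSTHOG_HOST is not a valid URL: ${host}`);
    if (url.protocol !== "http:" && url.protocol !== "https:") {
      throw new Error(`POSTHOG_HOST must use http(s): ${host}`);
    }
    fields.posthogHost = host.replace(/\/+$/, ""); // normalize trailing slashes
  }
  return fields;
}
```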

apps/packaged
- RawPackagedConfig / PackagedConfig surface posthogKey / posthogHost
  so the Electron entry and headless entry both forward them to the
  daemon sidecar.
- buildPackagedDaemonSpawnEnv emits POSTHOG_KEY / POSTHOG_HOST into
  the daemon child env when present. The daemon's existing analytics
  module reads these via process.env — no daemon-side changes needed.
- The headless packaged path falls back to process.env for fields the
  builder hasn't injected, mirroring how OPEN_DESIGN_TELEMETRY_RELAY_URL
  is read there.
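A sketch of the forwarding step: copy the baked values into the daemon child env only when present, with an optional fallback that mimics the headless path reading process.env. withPosthogEnv is a hypothetical stand-in; the real buildPackagedDaemonSpawnEnv signature is not shown here.

```typescript
interface PackagedPosthogConfig {
  posthogKey?: string | null;
  posthogHost?: string | null;
}

// Returns a new child env; keys are set only when a non-empty value exists.
function withPosthogEnv(
  baseEnv: Record<string, string | undefined>,
  config: PackagedPosthogConfig,
  fallbackEnv: Record<string, string | undefined> = {},
): Record<string, string | undefined> {
  const env = { ...baseEnv };
  // Baked value first; the fallback stands in for the headless path's process.env read.
  const key = config.posthogKey ?? fallbackEnv.POSTHOG_KEY;
  const host = config.posthogHost ?? fallbackEnv.POSTHOG_HOST;
  if (key) env.POSTHOG_KEY = key;
  if (host) env.POSTHOG_HOST = host;
  return env;
}
```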

CI
- release-beta.yml and release-stable.yml expose POSTHOG_KEY (secret)
  and POSTHOG_HOST (var) at workflow-env scope so every packaging job
  inherits them. PR / fork builds without these set simply skip the
  bake step.

Tests
- tools/pack: config.test.ts covers bake-through, fork-build omission,
  whitespace rejection, invalid-URL rejection, and trailing-slash
  normalization.
- apps/packaged: sidecars.test.ts covers buildPackagedDaemonSpawnEnv
  forwarding the keys when present and omitting them when null.

* feat(analytics): enable PostHog autocapture + perf + exceptions

Flip on the PostHog SDK's automatic diagnostic features so we capture
click paths, page transitions, web vitals, dead clicks, and browser
exceptions without scattering instrumentation through the codebase.

Privacy defense lives in one place — apps/web/src/analytics/scrub.ts —
wired in via posthog-js's `before_send` hook so every outgoing event
passes through the same audit point:

  - $autocapture / $rageclick / $dead_click / $copy_autocapture:
    strips $el_text and value/placeholder/aria-label attrs from any
    input, textarea, password input, or contenteditable element. PostHog
    autocapture does not capture input.value by default, but $el_text
    on a <textarea> reflects the typed content — that's the prompt
    body for us, so it has to be scrubbed every time.
  - $pageview / $pageleave: drops query string and fragment from
    $current_url / $referrer so any future ?q=… can't leak.
  - $exception: rewrites file:// and absolute filesystem paths in
    stack frames to app://apps/<repo-relative> so we don't ship the
    user's home directory.
  - Suppresses $opt_in entirely — duplicate of our explicit
    setConsent toggle in App.tsx.
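A simplified sketch of the scrub rules as a pure function that a before_send hook could call (for example `before_send: (e) => e && scrubEvent(e)`). The event and element shapes here are simplified stand-ins. The assumptions that autocapture elements arrive as a `$elements` array with `attr__*` keys, and the `app://redacted` placeholder for paths outside apps/, are not confirmed by the message above. Walking the real $exception stack-frame payload is left out; scrubPath shows the per-filename rewrite only.

```typescript
// Simplified stand-ins; the real posthog-js event types are richer.
interface AutocaptureElement {
  tag_name?: string;
  $el_text?: string;
  [attr: string]: unknown;
}
interface OutgoingEvent {
  event: string;
  properties: Record<string, unknown>;
}

const ELEMENT_EVENTS = new Set(["$autocapture", "$rageclick", "$dead_click", "$copy_autocapture"]);
const TEXT_ENTRY_TAGS = new Set(["input", "textarea"]); // covers password inputs too
const SENSITIVE_ATTRS = ["attr__value", "attr__placeholder", "attr__aria-label"];
const ABSOLUTE_PATH = /^(file:\/\/|\/|[A-Za-z]:[\\/])/;

function scrubElement(el: AutocaptureElement): AutocaptureElement {
  const tag = (el.tag_name ?? "").toLowerCase();
  const editable = el["attr__contenteditable"] !== undefined && el["attr__contenteditable"] !== "false";
  if (!TEXT_ENTRY_TAGS.has(tag) && !editable) return el;
  const copy: AutocaptureElement = { ...el };
  delete copy.$el_text; // a textarea's text is the typed prompt
  for (const attr of SENSITIVE_ATTRS) delete copy[attr];
  return copy;
}

function stripQueryAndFragment(raw: string): string {
  try {
    const url = new URL(raw);
    url.search = "";
    url.hash = "";
    return url.toString();
  } catch {
    return raw.split(/[?#]/)[0]; // not an absolute URL: cut at the first ? or #
  }
}

// Rewrites absolute or file:// paths to app://apps/<repo-relative>; other strings pass through.
function scrubPath(path: string): string {
  if (!ABSOLUTE_PATH.test(path)) return path;
  const normalized = path.replace(/\\/g, "/");
  const idx = normalized.indexOf("/apps/");
  return idx >= 0 ? `app:/${normalized.slice(idx)}` : "app://redacted";
}

// Returns null to drop the event entirely.
function scrubEvent(ev: OutgoingEvent): OutgoingEvent | null {
  if (ev.event === "$opt_in") return null; // duplicates the explicit consent toggle
  const props = { ...ev.properties };
  const elements = props["$elements"];
  if (ELEMENT_EVENTS.has(ev.event) && Array.isArray(elements)) {
    props["$elements"] = (elements as AutocaptureElement[]).map(scrubElement);
  }
  if (ev.event === "$pageview" || ev.event === "$pageleave") {
    for (const key of ["$current_url", "$referrer"]) {
      const value = props[key];
      if (typeof value === "string") props[key] = stripQueryAndFragment(value);
    }
  }
  return { ...ev, properties: props };
}
```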

Element-level defense in depth is limited to the single most sensitive
surface: the chat composer textarea gets `ph-no-capture` so PostHog
never even generates an event for clicks inside that subtree. Every
other input relies on scrub.ts — sprinkling the class through every
form would be noisy and easy to forget on new surfaces.

The existing Privacy → "Share usage data" toggle continues to gate
every new feature: posthog-js's opt_out_capturing() halts autocapture,
$pageview, $exception, web vitals, and dead clicks alongside the
explicit capture() calls — one global switch.

11 unit tests pin the scrub rules in apps/web/tests/analytics-scrub.test.ts.

* ci(nix): bump pnpmDepsHash for posthog-js + posthog-node additions

Adding posthog-js to apps/web and posthog-node to apps/daemon changed
pnpm-lock.yaml, which Nix's fixed-output pnpmDeps derivation pins by
sha256. The CI nix flake check failed with:

  specified: sha256-KF3Mld72/iau+pJmA7HvnanRx8VLtDP0N624SKrtrrc=
  got:       sha256-PGFgX4lYyeH2TRAXfUq52A3EOa6bb1gO59hPsXhEk3s=

Copy the new hash into both nix/package-web.nix and
nix/package-daemon.nix per the procedure documented in nix/README.md
§"First-build hash pinning".

* feat(analytics): unify PostHog identity with Langfuse installationId

PostHog's distinct_id is the installationId stamped by
/api/analytics/config; Langfuse already reads the same id off app-config.json to
populate trace.userId. With both sinks keying off the same anonymous
identity, dashboards can correlate user actions (PostHog events) with
LLM runs (Langfuse traces) without re-identifying.

Two gaps closed:

1. applyConsent(false) — clear posthog-js's persisted ph_*_posthog
   localStorage entry on opt-out via posthog.reset(). Without this, a
   user who opts out, then clicks Delete my data, then re-opts in
   would see PostHog stitch their new session to the deleted identity
   because bootstrap.distinctID only takes effect on first init.

2. applyIdentity(newInstallationId) — Delete my data rotates the
   installationId in app-config; App.tsx now watches config.installationId
   and calls posthog.reset() then identify(newId) so the next event
   batch is fully decoupled from the deleted one. Idempotent on
   same-id re-renders so benign config refreshes don't churn PostHog
   identities.
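The two guards can be sketched against a minimal client interface. reset, identify, opt_in_capturing, and opt_out_capturing are posthog-js methods; the IdentitySync class, and seeding it with the bootstrap id so the first render does not churn identities, are assumptions.

```typescript
// The subset of the posthog-js client this sketch relies on.
interface IdentityClient {
  reset(): void;
  identify(distinctId: string): void;
  opt_in_capturing(): void;
  opt_out_capturing(): void;
}

class IdentitySync {
  private readonly client: IdentityClient | null;
  private currentId: string | null;

  constructor(client: IdentityClient | null, bootstrapId: string | null) {
    this.client = client;
    this.currentId = bootstrapId; // the distinct id posthog-js was initialized with
  }

  applyConsent(granted: boolean): void {
    if (!this.client) return; // analytics disabled: nothing to do
    if (granted) {
      this.client.opt_in_capturing();
      return;
    }
    this.client.opt_out_capturing();
    this.client.reset(); // drop persisted identity so a later opt-in cannot stitch to it
    this.currentId = null;
  }

  applyIdentity(installationId: string | null): void {
    if (!this.client || !installationId) return; // no-client and null-input guards
    if (installationId === this.currentId) return; // idempotent across benign re-renders
    this.client.reset();
    this.client.identify(installationId);
    this.currentId = installationId;
  }
}
```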

The fetch wrapper's x-od-analytics-anonymous-id header also flips to
the new id on rotation so daemon-side captures (run_created /
run_finished) land on the same person record from the very next API
call, not after a reload.

The end-to-end rotation flow is verified against a live PostHog
project; these unit tests pin the safety guards (no-client paths, null
inputs) since stubbing posthog-js's init-loaded callback chain is
brittle.

* fix(langfuse): require both metrics AND content consent for trace reports

Tightens the Langfuse gate so a user who shares anonymous metrics but
NOT conversation content stops emitting Langfuse traces entirely —
Langfuse is used for turn-quality evals which only make sense with
prompt/output bodies. PostHog (product analytics, content-free) stays
gated on `metrics` alone and is unaffected.

i18n: "Conversation content" → "Conversation and tool content" with
hints expanded to mention tool inputs/outputs so the consent surface
matches what the trace actually carries (en + zh-CN).

Bundled here per PR scope — change originated outside this PostHog
PR but lands cleanly on the same files; gating Langfuse strictly
on `content` makes the dual-sink consent model (PostHog = metrics,
Langfuse = metrics + content) symmetric across both i18n locales and
the daemon-side gate.

* feat(analytics): wire byok_provider_option + fix PR review P1s

Adds the BYOK protocol-chip click event (5-value provider_id mirroring
the apiProtocol Settings UI) and resolves four P1 review threads on
PR #1428.

byok_provider_option:
- New SettingsClickByokProviderOptionProps in contracts (provider_id =
  anthropic|openai|azure|google|ollama; maps to CSV's 5 values per
  tracking-doc-issues.md §2.5).
- trackSettingsClickByokProviderOption helper in apps/web/src/analytics.
- SettingsDialog hooks it on the protocol-chip onClick alongside the
  existing setApiProtocol call; is_selected reflects whether the chip
  was already active.

Review fixes:

1. client.ts (Siri-Ray): clear `initPromise` when the resolution is
   null so a Privacy → metrics opt-in after a previous decline triggers
   a fresh /api/analytics/config fetch. Without this, the disabled
   response was cached forever — first-session opt-in needed a reload
   to start sending PostHog events.

2. provider.tsx (Siri-Ray): replace `url.includes('/api/')` with a
   strict same-origin + /api/ pathname check (shared
   `isSameOriginApiCall` helper). Outbound third-party URLs containing
   `/api/` (e.g. provider.example.com/api/x) no longer receive our
   x-od-analytics-* headers.

3. provider.tsx (codex-connector, lefarcen): gate header injection on
   `resolvedAnonId` being non-null. When Privacy → metrics is off,
   /api/analytics/config returns enabled=false → resolvedAnonId stays
   null → wrapper never installs → daemon can't read consent-bearing
   headers → no daemon-side PostHog event. setConsent now also clears
   resolvedAnonId on opt-out and re-fetches on opt-in.

4. daemon/analytics.ts (defense in depth): createAnalyticsService now
   takes dataDir and capture() re-reads app-config to check
   telemetry.metrics inside the fire-and-forget wrapper. Even if a
   stale header somehow reaches the daemon after opt-out, the capture
   is dropped before posthog-node.capture is called.
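Review fixes 2 and 3 combine into a strict check before any header is attached. isSameOriginApiCall is the helper named above, but its signature here (explicit page origin) is an assumption, and withAnalyticsHeaders is hypothetical. This sketch gates per call rather than by not installing the fetch wrapper at all.

```typescript
const ANALYTICS_ANON_HEADER = "x-od-analytics-anonymous-id";

function tryParseUrl(input: string, base: string): URL | null {
  try {
    return new URL(input, base); // resolves relative paths such as "/api/runs"
  } catch {
    return null;
  }
}

// True only for same-origin requests whose path is under /api/.
function isSameOriginApiCall(input: string, pageOrigin: string): boolean {
  const url = tryParseUrl(input, pageOrigin);
  const page = tryParseUrl(pageOrigin, pageOrigin);
  if (!url || !page) return false;
  return url.origin === page.origin && url.pathname.startsWith("/api/");
}

// Adds the header only when consent produced an anonymous id AND the call is ours.
function withAnalyticsHeaders(
  input: string,
  headers: Record<string, string>,
  anonId: string | null,
  pageOrigin: string,
): Record<string, string> {
  if (!anonId || !isSameOriginApiCall(input, pageOrigin)) return headers;
  return { ...headers, [ANALYTICS_ANON_HEADER]: anonId };
}
```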

* fix(web): place "Share usage data" on the right in privacy consent banner

Swap button order in PrivacyConsentModal and the in-settings ConsentCard
so the affirmative "Share usage data" lands on the right and "Not now"
on the left. Matches the OK-on-the-right pattern users expect for
primary actions.

Both buttons keep equal visual prominence (same .privacy-consent-action
styling) so the swap doesn't change the EDPB equal-prominence stance
called out in the original Langfuse telemetry spec.

* feat(analytics): populate run_finished token totals from claude-stream usage

Daemon's claude-stream parser already emits agent usage events with
input_tokens / output_tokens totals; the run service buffers them in
run.events and Langfuse reads them out the same way. The run_finished
PostHog event was leaving these fields empty.

Scan run.events for the most recent agent usage frame on terminal
transition and emit input_tokens / output_tokens / total_tokens when
present. token_count_source flips to 'provider_usage' only when at
least one count landed; runs without provider-side usage data keep
'unknown'.

Provider does not break the input down into the 7 sub-fields the
tracking doc lists (memory / context / attachment / system_prompt /
…); those stay omitted until a parser change exposes them.
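The scan above can be sketched as a backwards walk over the buffered events. The `kind: "usage"` frame shape is an assumption; the real run.events entries may be structured differently. Summing whichever counts are present into total_tokens is also an illustrative choice.

```typescript
// Hypothetical event shape for illustration.
interface RunEvent {
  kind: string;
  input_tokens?: number;
  output_tokens?: number;
}

interface TokenProps {
  input_tokens?: number;
  output_tokens?: number;
  total_tokens?: number;
  token_count_source: "provider_usage" | "unknown";
}

// Walks from the end so the most recent usage frame wins.
function tokenPropsFromEvents(events: RunEvent[]): TokenProps {
  for (let i = events.length - 1; i >= 0; i--) {
    const ev = events[i];
    if (ev.kind !== "usage") continue;
    const input = typeof ev.input_tokens === "number" ? ev.input_tokens : undefined;
    const output = typeof ev.output_tokens === "number" ? ev.output_tokens : undefined;
    if (input === undefined && output === undefined) break; // latest frame carried no counts
    return {
      ...(input !== undefined ? { input_tokens: input } : {}),
      ...(output !== undefined ? { output_tokens: output } : {}),
      total_tokens: (input ?? 0) + (output ?? 0),
      token_count_source: "provider_usage", // only when at least one count landed
    };
  }
  return { token_count_source: "unknown" };
}
```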

* feat(analytics): estimate user_query_tokens from prompt length

The user_query_tokens field for run_created / run_finished was hardcoded
to 0. We can't tokenize without bundling a model-specific tokenizer, but
the character/4 heuristic is the industry-standard estimate when one
isn't available and is enough for funnel analysis (prompt-length cohorts,
short-vs-long-query conversion rates).

Extracted from req.body via the same telemetryPromptFromRunRequest
pattern the daemon already uses for langfuse-bridge (currentPrompt then
message fallback). Only the integer count goes to PostHog — the prompt
text itself never leaves the daemon.

token_count_source flips appropriately:
- run_created with a prompt: 'estimated' (was 'unknown')
- run_created with no prompt: 'unknown'
- run_finished with provider usage: 'provider_usage' (overrides
  baseProps' 'estimated' value)
- run_finished without provider usage: inherits 'estimated' or 'unknown'
  from baseProps so input/output absent doesn't mask the estimate.
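A sketch of the estimate and the token_count_source transitions listed above. Only the integer count is returned; the prompt text stays local. The names, rounding up, and skipping blank prompts during the currentPrompt → message fallback are assumptions.

```typescript
type TokenCountSource = "provider_usage" | "estimated" | "unknown";

interface RunCreatedTokenProps {
  user_query_tokens?: number;
  token_count_source: TokenCountSource;
}

// currentPrompt first, then message; blank strings count as missing.
function promptFromRunRequest(body: { currentPrompt?: unknown; message?: unknown }): string | null {
  for (const candidate of [body.currentPrompt, body.message]) {
    if (typeof candidate === "string" && candidate.trim() !== "") return candidate;
  }
  return null;
}

// characters / 4 heuristic, rounded up.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function runCreatedTokenProps(body: { currentPrompt?: unknown; message?: unknown }): RunCreatedTokenProps {
  const prompt = promptFromRunRequest(body);
  if (prompt === null) return { token_count_source: "unknown" };
  return { user_query_tokens: estimateTokens(prompt), token_count_source: "estimated" };
}

// run_finished keeps the base value unless provider usage actually landed.
function runFinishedTokenSource(base: TokenCountSource, hasProviderUsage: boolean): TokenCountSource {
  return hasProviderUsage ? "provider_usage" : base;
}
```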
2026-05-12 22:32:42 +08:00
lefarcen
7b191b5f85
fix: load Orbit templates from design templates (#1442)
(cherry picked from commit 988e727927)

Co-authored-by: shangxinyu1 <shangxinyu@refly.ai>
2026-05-12 19:38:16 +08:00
lefarcen
4d8d233ce0
Fix Langfuse report finalization hook (#1402) 2026-05-12 19:22:49 +08:00
PerishFire
e6c5560884
Fix appearance accent color persistence (#1439) 2026-05-12 19:11:09 +08:00
lefarcen
2a0ebea50b release: Open Design 0.7.0
- bump 14 monorepo package.json files to 0.7.0 (root + apps/{web,daemon,desktop,packaged,landing-page} + packages/{contracts,platform,sidecar,sidecar-proto} + tools/{dev,pack,pr} + e2e); apps/packaged was already at 0.6.1 from beta lane, all others at 0.6.0
- add CHANGELOG.md [0.7.0] - 2026-05-12 entry covering 97 merged PRs since 0.6.0:
  - Critique Theater: Phase 7 web client state machine (#1307) + Phase 6.2 daemon artifact extraction (#1085)
  - Web/UI: thumbs-up/down feedback widget (#1308), Cmd+, opens Settings (#1173), Finalize design package + Continue in CLI (#974), fetch models button for BYOK (#1034), provider models alphabetical sort (#1097), collapsible MCP JSON field-mapping (#1136), design file rename (#894)
  - Daemon: auto-memory store with chat-protocol-aware extraction (#999), install/uninstall skills & design systems (#1003), HTTP 206 range requests for video/audio (#1105), scheduled routines (#1033), agent runtime + route registration refactor (#1063, #1043)
  - HyperFrames: HTML-in-Canvas across web + skills (#866)
  - Skills/design systems: generic skills + design-templates split + finalize-design API (#955), agent-browser skill (#1284), WeChat design system + login-flow skill (#1083), hud/loom/trading-terminal design systems (#1069), release-notes-one-pager skill (#873), tokens.css schema (#1231)
  - Packaging: macOS Intel (x64) build (#759), official Nix flake (#402), beta packaging cache (#1095)
  - Maintainer ops: tools-pr PR-duty workspace (#1259), MAINTAINERS.md (#1290), contributor card bot (#932), PR→issue linking discipline (#1263)
  - Changed: conversation run isolation (#1271), default English i18n fallback (#1270), Codex CLI exit diagnostics / empty-response handling / path fallback (#1267, #1244, #1205)
  - Fixed: ~30 web + desktop + daemon + packaging bugfixes
  - Internal: nightly UI/desktop regression coverage (#1256), e2e/release report hardening (#1140), entry/settings automation (#954)
- catch up [Unreleased] compare link to v0.7.0 and add missing [0.6.0] release link
- add 97 PR footnote refs ([#402]..[#1330])

Verified locally: pnpm install + pre-build contracts/daemon/desktop dist + pnpm typecheck (exit 0 across all 14 packages on Node 22.22, with an engine warning).

Release workflow validation runs after merge via release-stable.
2026-05-12 15:33:28 +08:00
shangxinyu1
5d674410f2
test: stabilize extended Playwright coverage (#1341)
* test: align extended Playwright coverage with current UI behavior

* test: address extended suite review feedback

* test: restore Codex path hydration assertion
2026-05-12 15:11:34 +08:00
Eli
9c489aa045
feat(web): redesign Designs tab cards — covers, tags, overflow menu, multi-select (#1161)
* feat(web): redesign Designs tab cards — covers, tags, overflow menu, multi-select

- Render real previews on project cards: HTML iframe / image / video / hashed gradient fallback with project initial; lazily fetches the project's primary file when metadata.entryFile is unset, prefers index.html → newest html → image → video.
- Live artifact card thumbnails embed the rendered artifact URL via sandboxed iframe.
- Replace the per-card close button with a `…` overflow menu (Rename, Delete) that opens on hover/click; click-outside and Esc close it.
- Add multi-select mode (toolbar toggle → checkbox per card → "N selected · Delete · Cancel" pill) with batch delete via the existing onDelete prop.
- Add a category tag to every card (Prototype / Live Artifact / Slide / Media) derived from project.metadata.intent / kind / skillId.
- Replace browser prompt() and confirm() with custom modals (rename input + danger-confirm) reusing the existing .modal shell.
- Add `more-horizontal` icon and 16 new i18n keys across all 18 locales (zh-CN/zh-TW localized; others fall back to English).
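The preview-file preference order from the first bullet can be sketched as a pure selector. The file shape and extension lists are assumptions. Picking the newest image and video as well (the bullet only says "newest" for HTML) is also an assumption. A null result corresponds to the hashed-gradient fallback.

```typescript
// Hypothetical file shape; the real project file listing may differ.
interface ProjectFile {
  name: string;
  mtimeMs: number;
}

type PreviewKind = "html" | "image" | "video";

function previewKind(name: string): PreviewKind | null {
  const ext = name.slice(name.lastIndexOf(".") + 1).toLowerCase();
  if (ext === "html" || ext === "htm") return "html";
  if (["png", "jpg", "jpeg", "gif", "webp", "svg"].includes(ext)) return "image";
  if (["mp4", "webm", "mov"].includes(ext)) return "video";
  return null;
}

function newest(files: ProjectFile[]): ProjectFile | undefined {
  return files.reduce<ProjectFile | undefined>(
    (best, f) => (!best || f.mtimeMs > best.mtimeMs ? f : best),
    undefined,
  );
}

// entryFile, then index.html, then newest HTML, image, video; null means gradient fallback.
function pickPrimaryFile(files: ProjectFile[], entryFile?: string): ProjectFile | null {
  if (entryFile) {
    const entry = files.find((f) => f.name === entryFile);
    if (entry) return entry;
  }
  const index = files.find((f) => f.name.toLowerCase() === "index.html");
  if (index) return index;
  for (const kind of ["html", "image", "video"] as const) {
    const pick = newest(files.filter((f) => previewKind(f.name) === kind));
    if (pick) return pick;
  }
  return null;
}
```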

* test(e2e): update home delete flow for overflow menu + custom confirm modal

The previous flow targeted a per-card X button labelled "delete project <name>"
and asserted on a native `dialog` event. The card UI now exposes a `…` overflow
menu and a styled confirm modal, so reach delete via the menu and assert against
the modal's Cancel / Delete buttons instead.

* fix(web): harden Designs tab preview sandbox

* fix(web): hide Designs select mode in kanban
2026-05-12 15:08:22 +08:00
Eli
77f69257a7
feat(web): in-context comment thread for the artifact preview (#1276)
* feat(web): free-pin fallback in comment mode for unannotated artifacts

When the artifact has no data-od-id annotations, clicking in Comment
mode now posts a synthetic position-based target so the host opens a
popover at the click location. Daemon upsert validation requires a
non-empty selector/label, so the pin uses [data-od-pin=ID] and label
'pin'. Coordinates are document-space (viewport + scrollY) so pins
stay anchored after scroll/reload. Clicks on interactive elements
(a/button/input/textarea/select/label/contenteditable) keep their
native behavior and are not pinned.
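The free-pin target can be sketched without the DOM: add the scroll offset to the click point and build the synthetic selector and label. The `pin-` prefix, `[data-od-pin=ID]` selector, and 'pin' label come from this and the next commit message. Reusing the element id as the selector value, rounding, and including scrollX are assumptions. In a browser, the tag check would typically run against `closest(...)` ancestors.

```typescript
interface PinTarget {
  elementId: string;
  selector: string;
  label: string;
  x: number; // document-space coordinates
  y: number;
}

// Clicks on these keep their native behavior instead of creating a pin.
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "textarea", "select", "label"]);

function isInteractive(tagName: string, isContentEditable: boolean): boolean {
  return isContentEditable || INTERACTIVE_TAGS.has(tagName.toLowerCase());
}

// Converts a viewport click into a synthetic target that stays anchored after scroll.
function freePinTarget(
  pinId: string,
  clientX: number,
  clientY: number,
  scrollX: number,
  scrollY: number,
): PinTarget {
  const elementId = `pin-${pinId}`;
  return {
    elementId,
    selector: `[data-od-pin=${elementId}]`, // non-empty, so daemon validation passes
    label: "pin",
    x: Math.round(clientX + scrollX),
    y: Math.round(clientY + scrollY),
  };
}
```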

* feat(web): tighten comment popover layout for free-pin and element targets

The popover header used to dump the raw elementId verbatim — fine for
data-od-id targets like 'hero-cta' but jarring for free-pins where
elementId is a synthetic 'pin-...' string. Branch on the prefix and
show 'Pin · at X, Y' for free-pins; keep the label + selection kind
for real element / pod targets. Replace the text 'Close' button with
an icon-only close affordance to match the popover-as-card visual.

Action row is now two right-aligned buttons (Comment + Send to
Claude) for element targets and (Add note + Send to Claude) for pod
targets, eliminating the three-button row that wrapped onto two
lines at narrow widths. The 'Remove' affordance for existing
comments stays left-aligned.

* feat(web): drop comments tab from chat sidebar

The chat sidebar's 'Comments' tab listed saved/attached preview
comments but duplicates the per-element popover already shown in the
artifact viewer. Hide the tab and its content while the right-side
comment thread panel takes over the same surface in-context. The
CommentsPanel / CommentSection components stay defined as dead code
for the moment so callers and translation keys remain valid; a later
pass can delete them.

* feat(web): right-side comment thread panel in board mode

Render a 320px CommentSidePanel anchored to the right of the
artifact preview whenever board (comment) mode is on. The panel
lists every saved preview comment for the current file with an
avatar initial, the element label (or 'Pin' for free-pin synthetic
ids), an Xd/Xh/Xm-ago timestamp, the note body, a Reply link, and
a checkbox.

Reply focuses the comment's element via liveSnapshotForComment so
the popover opens at the right anchor. Selecting one or more
comments via the checkboxes surfaces a 'N selected · Clear · Send
to Claude' action bar above the list; Send to Claude reuses the
existing onSendBoardCommentAttachments pipeline via
commentsToAttachments. The panel takes the place of the chat
sidebar's removed Comments tab so the thread lives next to the
artifact instead of behind a tab switch.

* feat(web): styles for right-side comment thread panel

Floating 320px panel anchored to the right edge of the artifact
preview with a scrollable comment list and a coral selection bar
that appears when one or more comments are checked. Selected items
get a coral tint; the reply / check / send-to-claude controls
match the popover's coral primary tone.

* feat(web): toast confirmation on comment save, close popover

After savePersistentComment succeeds, close the popover via
clearBoardComposer and surface a transient 'Comment saved' (or
'Pin saved' for free-pin targets) toast for 2.2s. Replaces the
previous behavior where the popover stayed open with an empty
draft after save, which left users uncertain whether the save
landed and forced an extra click to dismiss.

* feat(web): position the comment-save toast at the top of the preview

* feat(web): allow editing saved comment notes via the side panel

Rename the per-item 'Reply' affordance to 'Edit' (no thread model
exists yet, so reply was misleading) and pre-fill the popover with
the existing note when clicked. The save path goes through
onSavePreviewComment which the daemon implements as an upsert keyed
on (project, conversation, filePath, elementId), so the edit
overwrites the existing row's note without spawning a duplicate.
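Why an edit cannot spawn a duplicate is easiest to see with an in-memory stand-in for the upsert keyed on (project, conversation, filePath, elementId). CommentStore is hypothetical; the daemon's actual storage is not shown in this log.

```typescript
interface PreviewComment {
  projectId: string;
  conversationId: string;
  filePath: string;
  elementId: string;
  note: string;
}

class CommentStore {
  private readonly rows = new Map<string, PreviewComment>();

  // JSON array encoding keeps the four key parts unambiguous.
  private static key(c: PreviewComment): string {
    return JSON.stringify([c.projectId, c.conversationId, c.filePath, c.elementId]);
  }

  upsert(comment: PreviewComment): PreviewComment {
    const saved = { ...comment };
    this.rows.set(CommentStore.key(comment), saved); // same key overwrites the note
    return saved;
  }

  list(): PreviewComment[] {
    return [...this.rows.values()];
  }
}
```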

Also fall back to a snapshot synthesized from the saved comment's
own fields when the corresponding live target is no longer in the
iframe DOM (e.g. free-pin parents that were re-rendered), so the
edit path still works after artifact reloads.

* feat(web): hide already-sent comments from the side panel

After Send to Claude, the daemon flips the comment status from
'open' to 'applying' (and then 'needs_review' / 'resolved' /
'failed' depending on the run). Filter the side panel to status
=== 'open' so sent comments visibly leave the list — the user
gets clear feedback that the send landed and the panel stays
focused on actionable, un-sent items.

* feat(web): drop single-tab bar and conversation count badge

After the Comments tab was removed the chat header still rendered
a one-tab 'tablist' just for the Chat tab, which read as visual
noise without a sibling to switch between. Drop the tabs wrapper
entirely; the chat content stays mounted and the header now hosts
only the conversation-history affordance.

Also drop the numeric badge that overlaid the conversation history
button: counting open conversations next to a generic history icon
was easy to mistake for an unread / notification count. The dropdown
itself remains the canonical place to see and switch between past
conversations.

* feat(web): right-align chat header actions after tab bar removal

With the tabs wrapper gone, chat-header-actions sat flush left
because nothing was pushing it across the header. Add margin-left:
auto so the history / new-conversation / collapse buttons land at
the right edge, matching the design files / index.html tab row's
own right-aligned controls.

* feat(web): rename board-mode toggle to Comment with comment icon

The artifact preview toolbar's board-mode entry was labeled 'Tweaks'
with the tweaks icon, which collided with the palette Tweaks button
next to it and hid the comment capability behind a generic label.
Rename to 'Comment' with the comment icon and switch to the
viewer-action class so the button matches the surrounding toolbar
items (Edit/Draw) and the coral active state lands on the right
surface.

* fix(web): pass designTemplates to ProjectView in api-empty-response test

The test props for ProjectView were missing the designTemplates
prop that was added to Props in #955 (generic skills split). CI's
strict typecheck (tsc -b --noEmit) caught it; local runs that hit
project references differently did not. Pass an empty SkillSummary
array — matches the empty skills fixture for the same reason.
2026-05-12 15:05:08 +08:00
Eli
928079daf5
feat(web): consolidate Image/Video/Audio entries into a Media tab (#1167)
Reduces the New Project panel's top-level tab count by collapsing the
three media surfaces into a single Media tab with an inner segmented
control, and polishes the controls inside that tab so they stop
dominating the panel:

- Media tab + segmented (Image / Video / Audio) inside the panel body.
  Underlying ProjectKind branches and submission contract unchanged —
  the daemon still receives kind=image/video/audio.
- Model picker rewritten as a combobox: one trigger row + searchable,
  provider-grouped popover with Recommended badges. Replaces the flat
  grid of provider-grouped cards that scrolled past the fold once the
  fourth provider landed.
- Aspect picker compressed from a 5-card grid to a single row of
  segmented pills with mini ratio glyphs.
- Image surface no longer carries a free-form Style notes field; it
  was redundant with the prompt template + main prompt input.
- Live artifact tab locks fidelity to high-fidelity (the wireframe
  option is now hidden) — a wireframe live artifact doesn't make
  sense and the picker added noise.

i18n: adds tabMedia / titleMedia / model* keys across all 18 locales,
removes imageStyleLabel / imageStylePlaceholder. Tests + e2e selectors
updated to drive the new Media tab + segmented surface flow.
2026-05-12 14:52:03 +08:00
Eli
1b307bf17f
feat(web): tweaks palette popover with HSL hue-shift recoloring (#1292)
* feat(web): tweaks palette popover with HSL hue-shift recoloring

Adds a Tweaks color-palette popover to the HTML preview toolbar.
Selecting a palette re-skins the iframe in place via a srcDoc-side
bridge that walks the DOM and shifts every chromatic paint to the
target hue while preserving each color's saturation and lightness —
pale tints stay pale, bold CTAs stay bold, just in the new color
family. Mono-noir desaturates instead of shifting.
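The recoloring rule can be sketched per color: convert RGB to HSL, replace the hue, keep saturation and lightness, and convert back. Mono noir drops saturation instead. The conversions below are the standard RGB/HSL formulas. The achromatic cutoff and function names are assumptions, and the DOM walk in the srcDoc bridge is not shown.

```typescript
interface RGB { r: number; g: number; b: number } // channels 0..255
interface HSL { h: number; s: number; l: number } // h in [0, 360), s and l in [0, 1]

function rgbToHsl({ r, g, b }: RGB): HSL {
  const rn = r / 255, gn = g / 255, bn = b / 255;
  const max = Math.max(rn, gn, bn);
  const min = Math.min(rn, gn, bn);
  const l = (max + min) / 2;
  const d = max - min;
  if (d === 0) return { h: 0, s: 0, l }; // pure grey: hue undefined
  const s = d / (1 - Math.abs(2 * l - 1));
  let h: number;
  if (max === rn) h = ((gn - bn) / d) % 6;
  else if (max === gn) h = (bn - rn) / d + 2;
  else h = (rn - gn) / d + 4;
  h *= 60;
  return { h: h < 0 ? h + 360 : h, s, l };
}

function hslToRgb({ h, s, l }: HSL): RGB {
  const c = (1 - Math.abs(2 * l - 1)) * s; // chroma
  const hp = h / 60;
  const x = c * (1 - Math.abs((hp % 2) - 1));
  const sector = Math.floor(hp) % 6;
  const [r1, g1, b1] =
    sector === 0 ? [c, x, 0] :
    sector === 1 ? [x, c, 0] :
    sector === 2 ? [0, c, x] :
    sector === 3 ? [0, x, c] :
    sector === 4 ? [x, 0, c] : [c, 0, x];
  const m = l - c / 2;
  const to255 = (v: number) => Math.round((v + m) * 255);
  return { r: to255(r1), g: to255(g1), b: to255(b1) };
}

// Hypothetical cutoff below which a paint counts as grey and is left alone.
const ACHROMATIC_SATURATION = 0.05;

// Moves a color to the target hue while preserving its saturation and lightness.
function shiftHue(color: RGB, targetHue: number): RGB {
  const hsl = rgbToHsl(color);
  if (hsl.s < ACHROMATIC_SATURATION) return color;
  return hslToRgb({ ...hsl, h: ((targetHue % 360) + 360) % 360 });
}

// Mono-noir style: drop saturation, keep lightness.
function desaturate(color: RGB): RGB {
  return hslToRgb({ ...rgbToHsl(color), s: 0 });
}
```

Because lightness is untouched, a pale pink (255, 204, 204) shifted to hue 120 becomes the equally pale green (204, 255, 204).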

- runtime/srcdoc: new injectPaletteBridge + paletteBridge / initialPalette options
- file-viewer-render-mode: paletteActive flips URL-load back to srcDoc so the bridge can be injected
- FileViewer: state, popover, postMessage wiring, srcDoc + useUrlLoadPreview integration
- PaletteTweaks: popover UI with Original + Coral / Electric / Acid forest / Risograph / Mono noir
- PreviewDrawOverlay: stub pass-through until the draw branch lands

* feat(web): hide finalize-design toolbar from project header

* test(e2e): skip project actions toolbar flow after toolbar removal
2026-05-12 14:38:00 +08:00
nettee
03da01a56f
ci: use open-design bot for contributors wall refresh (#1349) 2026-05-12 14:35:28 +08:00
Nicholas-Xiong
c0b679ecbc
fix: restore custom dropdown chevron for timezone selector in dark mode (#1368)
Fixes #1359

The timezone selector in the Routines form was showing repeated dropdown
icons and poor text readability in dark mode because:

1. The native chevron was removed, but no custom one was restored via
   background-image
2. Insufficient padding caused text to overlap with any chevron
3. No dark-mode-specific chevron color was defined

This commit adds the custom dropdown chevron styling (matching the global
select behavior) with proper padding and dark-mode color variants, ensuring:
- Single, correctly-positioned chevron icon
- Sufficient padding to prevent text overlap
- Proper contrast in both light and dark themes
- Consistent visual behavior with other form controls
2026-05-12 14:29:01 +08:00
Sid
fb47d0ae51
style(web): polish EntryView UI — sidebar layout, folder tabs, slim form, blue selected token (#1360)
* chore(web): upgrade radius scale + introduce blue --selected token

UI polish pass — design tokens for follow-up commits.

Radius scale was visually too square at the small end. Bump up so
buttons / inputs / cards feel rounded rather than boxy:

- `--radius-sm: 6px → 8px`  (buttons, inputs, small chips)
- `--radius: 10px → 12px`  (medium containers, Recent filter pill)
- `--radius-lg: 14px → 16px`  (project cards)
- `--radius-pill: 999px` unchanged (status chips)

Introduce a separate "selected" colour so selection indicators
(card borders, focus rings) read as blue instead of fighting with
the orange brand accent that drives primary CTAs:

- `--selected: #2563eb`  (Tailwind blue-600)
- `--selected-soft: rgba(37, 99, 235, 0.16)`  (soft tint for shadows)
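As a quick check on the token pair, `--selected-soft` is the same blue as `--selected` at 16% alpha: `#2563eb` decodes to the RGB tuple `37, 99, 235`. The helper below is hypothetical and only shows the arithmetic; the tokens themselves stay static CSS.

```typescript
// Hypothetical helper: turn a #rrggbb hex plus an alpha into an rgba() string.
function hexToRgba(hex: string, alpha: number): string {
  const match = /^#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!match) throw new Error(`expected #rrggbb, got ${hex}`);
  // Each two-digit hex pair is one 0-255 channel.
  const [r, g, b] = match.slice(1).map((pair) => parseInt(pair, 16));
  return `rgba(${r}, ${g}, ${b}, ${alpha})`;
}
```

`hexToRgba("#2563eb", 0.16)` reproduces the `--selected-soft` value verbatim.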

No selectors are migrated to `--selected` in this commit — that
happens in a later "selected state" commit so the diff stays scoped.

* refactor(web): replace entry global header with sidebar brand + reorder bottom chips

Pre-existing layout: a global `AppChromeHeader` strip sat across the
whole top of EntryView (logo + settings gear), then a 2-column body
below it. Visual mass concentrated in a thin horizontal bar that did
not relate to the page's column structure, and the settings gear
duplicated the bottom Local-CLI chip.

New layout matches the two-column "brand-in-sidebar + tabs-in-main"
pattern: the brand block lives at the top of `.entry-side` (left
column), the right tabs live at the top of `.entry-main`, and the
vertical divider between them is the only horizontal seam.

EntryView:

- Drop `<AppChromeHeader actions={avatarMenu} />` from EntryView's
  render — the home page no longer renders the global chrome strip.
  (ProjectView still uses AppChromeHeader for back-nav / file
  actions, so the component itself stays in the codebase.)
- Add a sidebar brand block inside `.entry-side` using the
  already-defined `.entry-brand` / `.entry-brand-mark` /
  `.entry-brand-title` classes that were sitting dead in
  index.css.
- Reorder `.entry-side-foot` chips so that the env-critical
  Local CLI chip sits on the first row, with the secondary
  toggles (language picker, pet adoption, X follow icon) compact
  on a second row. The Follow @nexudotio chip drops its text
  label and becomes icon-only — pure marketing content, so it no
  longer earns a full-width pill.
- Settings access moves entirely to the Local CLI chip's existing
  click handler; the top-right gear is gone (it was a duplicate).

CSS:

- `.entry-shell` grid: `auto 1fr` → `1fr` (no header row).
- `.entry-side` background: `var(--bg-panel)` → `transparent`,
  so the sidebar shares the page beige and only the New-prototype
  card reads as white. Removes the "everything on the left is on
  one big white sheet" feeling.
- `.entry-brand` gets `padding: 24px 20px 18px` so the logo +
  title block has breathing room at the top of the sidebar.
- `.entry-brand-mark` width/height `44 → 34`. The previous
  44px gradient ring was visually heavier than the title text it
  sat next to.
- `.entry-brand-title` weight `600 → 450`, color
  `var(--text-strong)` → `var(--text)`. Serif title still
  reads as the page anchor without the chunky "bold black" stamp.
- `.entry-brand-actions` added for future right-aligned actions
  (carries no actual content in this commit — kept so re-adding a
  settings/avatar entry point doesn't need new CSS).
- `.entry-side-foot .foot-pill` slim pass: padding
  `4px 10px → 3px 8px`, font `11.5px → 10.5px`, gap `6 → 5`,
  plus `justify-content: center` and `min-height: 24px` so the
  icon-only Follow pill stays the same height as the text pills
  next to it.

* style(web): align right tabs row with brand row + strip hover/focus noise

Right column's tabs row ("Designs / Templates / Design systems /
Image templates / Video templates") needed three things:

1. Vertical center of tab text aligned with the brand logo on the
   left (both rows feel like one row, separated by the vertical
   divider only).
2. Active tab's underline sitting flush on the horizontal divider
   below the tabs (not floating mid-row).
3. No hover background, no focus outline, no transition — tabs are
   a navigation strip, not action buttons.

Changes:

- `.entry-header` padding `0 28px` → `24px 28px 0`, drop the
  `min-height: 52px`. Padding-top mirrors the brand block's
  padding-top (24px) so left logo top and right tabs top land on
  the same Y. Header height now content-driven; underline meets
  the `border-bottom` divider naturally.
- `.entry-tabs` gets `align-self: stretch` + `align-items: center`
  + `gap: 2px → 24px`. The stretch lets the tabs container fill
  header height; the bigger gap matches Claude Design's tab
  rhythm.
- `.entry-tab` becomes a "plain underline tab":
  - `border-radius: 6px 6px 0 0 → 0` (no folder-tab look — that's
    on the left tabs).
  - `padding: 14px 11px → 6px 4px 8px` so text + underline form a
    tight group, with the underline sitting at the bottom of the
    tab box right above the header divider.
  - `font-size: 14px → 12px` matches the left newproj tabs (set
    in commit 4) — both columns share the same tab type-size.
  - `transition: none` removes the inherited 120ms background /
    border / color transition.
- Hover / focus / active states explicitly zero out background,
  border-color, outline. Hover keeps a subtle color change
  (`text-muted → text`) so the tab still feels interactive
  without flashing a chip behind it.
- Active state colors are duplicated across `.active`,
  `.active:hover`, `.active:focus`, `.active:focus-visible` so the
  black underline never gets overwritten by the inactive-state
  rules above.

* style(web): folder-tab merge on left newproj tabs + flat card top corners

The left "Prototype / Live artifact / Slide deck / …" tabs sat as
plain underline tabs above a fully-rounded card. The active tab and
card looked like two stacked rectangles with a gap.

Folder-tab pattern:

- Active tab gets a white background + 12px top corners + a 1px
  border on top / left / right.
- Active tab's bottom border matches the card's background color
  (effectively invisible) — so where the tab sits, the card's top
  border is "broken" and tab + card read as one merged shape.
- Card top corners are square (`border-radius: 0 0 12px 12px`),
  bottom corners stay 12px. With the active tab's square bottom
  edge, the merge line at the tab/card seam is a clean horizontal,
  not a curve mismatch.

Implementation:

- `.newproj-tabs-shell`:
  - `overflow: hidden → visible` so the tab's overlap with the
    card below isn't clipped at the shell's bottom edge.
  - `margin-bottom: -1px` + `z-index: 2` so the shell renders on
    top of the card and the 1px tab/card overlap actually paints.
  - The `.can-left { padding-left: 40 }` / `.can-right` overrides
    used to reserve room for scroll arrows are removed (arrows
    are hidden, no extra padding needed).
- `.newproj-tabs` keeps its horizontal `overflow-x: auto` so the
  8 project-type tabs can still scroll inside the sidebar width.
- `.newproj-tabs-arrow` becomes `display: none`. The two
  chevron-circle buttons added clutter without much benefit —
  users with touchpads / wheels / keyboard already scroll the
  tabs row natively, and the `::before` / `::after` linear-
  gradient fades (now using `--bg` instead of `--bg-panel` so
  they fade into the page beige, not the sidebar panel that no
  longer exists) signal there are more tabs to the right.
- `.newproj-tab`:
  - Replace the plain bottom-underline (`border-bottom: 2px
    solid transparent`) with a full transparent 1px border so
    the active state can flip just the colors without changing
    layout.
  - `border-radius: 0 → 12px 12px 0 0`.
  - `position: relative` for z-index stacking.
  - `padding: 10px 6px → 7px 14px` (less vertical, more
    horizontal — tabs read as "labels" rather than chunky
    buttons).
  - Symmetric top/bottom padding (`7px`) so the text + folder-
    tab top corners stack cleanly.
  - `transition: none` — no animation between active/inactive
    states (tabs are nav, not action buttons).
- All hover / focus / focus-visible / active states zeroed out
  background and border-color so the inherited `button { … }`
  base style (which adds bg-subtle on hover) does not bleed in.
  Subtle color change on hover (`text-muted → text`) is the
  only affordance.
- `.newproj-tab.active` (+ active hover/focus combos so the
  base rules don't override): white bg, full var(--border) on
  three sides, bottom border = var(--bg-panel) (invisible
  against card), z-index 3 (above non-active tabs and shell
  pseudo-elements).
- `.newproj-body`:
  - `margin: 0 24px` so the card breathes inside the sidebar
    (and the active tab's left edge aligns with the card's left
    edge).
  - `padding: 18px 24px 28px → 16px 18px 18px` — tighter.
  - `border-radius: (full 12) → 0 0 12px 12px` for the
    flat-top merge with the active tab.
  - Adds explicit `border` + `background: var(--bg-panel)` +
    `box-shadow: var(--shadow-xs)` so the form reads as a card
    floating on the transparent sidebar.
  - `flex: 1 → 0 0 auto` (and `min-height: 0` / `overflow-y:
    auto` removed) — the card is content-sized, not stretched
    to fill the sidebar. Empty space below the card is now
    page beige, not a giant white sheet.
  - `gap: 14px → 12px` between form sections.

* style(web): slim NewProjectPanel form (title, fidelity, buttons, ds-picker)

The form inside the new white card felt overweight against the
compacted layout from the previous commits — fidelity cards were
~133px tall, the Create button + Open-folder secondary button both
had ~11px symmetric padding, the design-system trigger had a 32px
avatar in a 55px-tall row. Slim every element so the card reads as a
focused form, not a stack of beefy buttons.

Title:
- `.newproj-title` font `14px / 600 → 13px / 550`. Still
  visibly the section heading but no longer competing with the
  serif brand title above.

Fidelity:
- `.fidelity-thumb { aspect-ratio: 12/7 → 16/7 }`. The previous
  aspect made cards taller than they needed to be in the narrow
  sidebar column.
- `.fidelity-card { gap: 8 → 6, padding: 10/10/12 → 8/8/10 }`.
  Combined with the thumb aspect change, card height drops from
  ~133px → ~102px (visually close to the Claude Design reference
  while keeping the same content).

Primary / secondary buttons:
- `.newproj-create` padding `11px (symmetric) → 8px 11px`,
  margin-top `4 → 2` — primary CTA no longer towers over the
  fidelity cards above it.
- `.newproj-import` padding `10px → 6px 10px` — the secondary
  "Import Claude Design ZIP" button feels like an alt option, not
  a peer of Create.

Design system trigger:
- `.ds-picker-trigger` gap `10 → 8`, padding `8/10 → 6/10`.
- `.ds-picker-title` font `13 → 12.5` so name + subtitle stay
  legible in the slimmer row without overflowing the column.
- `.ds-avatar` width/height `32 → 26`, border-radius `6 → 5`.
  The thumbnail was the dominant element in the row; shrinking it
  pulls the row height from ~55px → ~50px.

Footer disclaimer:
- `.newproj-footer` padding-top `0 → 12px`. The "Only you can
  see your project by default." line was butting against the card
  bottom; 12px of air separates the disclaimer (page-bg context)
  from the card (panel-bg context) cleanly.

* style(web): blue selected indicators + Recent filter rounded + neutral input focus

Three small "selection state" tweaks driven by the new
`--selected` token introduced earlier in this branch:

1. Fidelity card selected border is now blue, not the brand
   accent. The orange Create button + the orange selected card
   border were fighting for the same visual role (primary
   action vs primary selection). Blue clearly says "this is
   the one that is selected" without competing with the CTA.

   - `.fidelity-card.active` border-color
     `var(--accent) → var(--selected)`.
   - Box-shadow ring + soft 0.04 drop swapped from the orange
     `180/90/59` rgba tuple to the blue `37/99/235` tuple.
   - `.fidelity-card.active .fidelity-thumb` border swapped
     from `var(--accent-soft) → var(--selected-soft)`.

2. Recent / Your designs filter is no longer a fully-rounded
   pill. The bottom-left settings chips deserve to be the only
   "999px pill" shape — those are tertiary status indicators.
   The Recent/Your designs toggle is a higher-importance
   inline filter, so it gets the medium radius instead.

   - `.subtab-pill` wrapper border-radius
     `var(--radius-pill) → var(--radius)` (12px).
   - Inner button border-radius
     `var(--radius-pill) → var(--radius-sm)` (8px).
   - Active state background `var(--text) → var(--bg-panel)`,
     color `var(--bg) → var(--text)`. The "black filled pill"
     read as a status badge; white-on-faint-gray reads as
     "selected toggle" — same shape as Claude Design's Recent
     pill.

3. Input focus is neutralised. The base `input:focus` rule
   added an orange border + a 3px orange-soft ring around the
   focused field — way too much visual weight for a quiet form
   ("Project name" → focus made it scream).

   - `input:focus / textarea:focus / select:focus` border-color
     `var(--accent) → var(--border-strong)` (light grey).
   - Box-shadow ring removed (`none`). Focused inputs now only
     darken their border by one step — barely visible but enough
     to confirm focus.

These three changes are grouped because they all migrate selection-
state styling off the brand accent and onto neutral / blue tokens.
The next pass (if any) can sweep the remaining `var(--accent)`
selection sites (`.ds-row.active`, `.ds-picker-trigger.open`,
`.conv-pill.open`, …) to use `--selected` too, but each of those
lives in a different surface and felt out of scope for the entry
view polish.

* refactor(web): pet rail toggle moves inside pet pill as split button

WHAT
- Convert the pet pill from a single `<button>` to a `<div>` containing
  two buttons separated by a 1px divider:
  * `.pet-pill-main` keeps the existing "Adopt a pet" / "Change pet"
    glyph + label + unadopted dot, still wired to `onAdoptPet`.
  * `.pet-pill-toggle` is a small icon-only button that flips
    `petRailHidden` — eye icon when the rail is hidden ("click to show"),
    eye-off when visible ("click to hide").
- Drop the old avatar-menu popover from EntryView entirely:
  `avatarMenuOpen` state, the outside-click / Escape effect, and the
  cog-popover trigger are all removed. The `Settings` entry of that
  popover was already redundant with the `Local CLI` chip; the
  `Hide/Show pet picker` entry now lives directly on the pet pill.
- CSS in the `.pet-pill` block:
  * `height: 24px` + `padding: 0` so the outer pill matches every other
    chip in the row vertically.
  * `.pet-pill-glyph` reduced from 14px to 12px and constrained to a
    14x14 inline-flex box so the unicorn / paw glyph stops pushing the
    chip taller than 24px.
  * Per-region hover (`.pet-pill-main:hover`, `.pet-pill-toggle:hover`)
    so each side of the split lights up independently, with the divider
    inheriting the accent tint while the chip is in `pet-pill-fresh`.
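The toggle's icon and label are inverted relative to the rail state: the button advertises the action it will perform. Below is a minimal sketch of that mapping, reusing the existing `pet.railShow` / `pet.railHide` keys. `petToggleProps` and the icon names are hypothetical.

```typescript
type PetToggleIcon = "eye" | "eye-off";

interface PetToggleProps {
  icon: PetToggleIcon;
  labelKey: "pet.railShow" | "pet.railHide"; // existing i18n keys
}

// Hidden rail -> offer to show it (eye); visible rail -> offer to hide it (eye-off).
function petToggleProps(petRailHidden: boolean): PetToggleProps {
  return petRailHidden
    ? { icon: "eye", labelKey: "pet.railShow" }
    : { icon: "eye-off", labelKey: "pet.railHide" };
}
```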

WHY
- After commit 5fe5721c removed the global `<AppChromeHeader>`, the only
  entrypoint to "Show pet picker" was the avatar-menu popover. Putting
  the avatar cog back next to the brand mark felt wrong: it elevates
  Settings (already on the `Local CLI` chip) to a primary affordance and
  sits next to the logo, where it doesn't belong by hierarchy.
- The pet-rail toggle is fundamentally a pet-area control — it belongs
  with the pet adoption chip, not in a popover. Putting both on the same
  chip via a split button gives the rail toggle a stable, discoverable
  home and keeps `.entry-brand` a brand-only row.

SCOPE
- `apps/web/src/components/EntryView.tsx` + `apps/web/src/index.css`.
  No new state, no new i18n keys (reuses `pet.railShow` / `pet.railHide`).
- The orphan i18n keys `entry.openSettingsTitle` and
  `entry.openSettingsAria` are no longer referenced by EntryView but are
  left in place — they're shared types that other locale files still
  declare; a focused cleanup belongs in a separate commit.

* test(e2e): update entry chrome + project mgmt assertions for new layout

WHAT
- entry-chrome-flows.test.ts:
  - Rename `entry chrome settings menu toggles pet rail visibility` →
    `pet pill toggle hides and shows the pet rail`. The flow no longer
    goes through an `Open settings` cog + `.avatar-popover` chain;
    instead it clicks the in-pill `.pet-pill-toggle` directly and
    verifies its `aria-label` flips between `Hide` / `Show pet picker`.
  - Replace `.app-chrome-header` / `.app-chrome-brand` assertions with
    `.entry-brand` + `.entry-brand-title` text checks. The global
    chrome strip no longer exists on EntryView.
  - The compact-width overflow guard now measures `.entry-brand` rather
    than `.app-chrome-header`, since the brand row replaced the chrome
    strip as the only top-of-page horizontal stack.

- project-management-flows.test.ts:
  - Drop the `Scroll project types right` arrow click. The
    `.newproj-tabs-arrow` buttons are hidden (the folder-tab pattern
    leans on shadow gradients on `.newproj-tabs-shell::before/::after`
    instead). Playwright's `locator.click()` auto-calls
    `scrollIntoViewIfNeeded()`, so clicking `new-project-tab-image`
    after a tab-switch still reaches the off-screen tab.

WHY
- These selectors / interactions are tied to UI affordances the earlier
  commits in this branch deliberately replaced. The behaviors they pin
  (pet rail toggle reachability, no horizontal overflow at 820px,
  draft preservation across tab switches) are still asserted — only
  the selectors needed to follow the new structure.

VERIFICATION
- `pnpm exec playwright test ui/entry-chrome-flows.test.ts ui/entry-configuration-flows.test.ts ui/project-management-flows.test.ts`
  → 17/17 passed (chromium project, single worker, fresh daemon).

* fix(web): restore .newproj-body as scroll container (P1 regression)

WHAT
Reintroduce `flex: 1 1 auto; min-height: 0; overflow-y: auto;` on
`.newproj-body`, alongside the `display: flex` + `padding` that
commit ba44e396 kept. The parent `.newproj` is still `overflow: hidden`,
so without these three lines the card can clip its own content with
no scroll recovery.

WHY
Reported by @lefarcen (P1) and @Siri-Ray in review on #1360. Before
this commit the slim-form pass made the body shrink-wrap (`flex: 0 0 auto`)
to keep the empty-state caption snug against the card edge. That works
when the form is short, but the card can grow well past the available
sidebar height in real scenarios:
- Compact-height windows (≤ 720 vertical px).
- Image / media tabs that add aspect + model rows.
- Validation / error text after a failed Create.
- Design-system popover opened with many systems.

In all four cases the Create / Import / Open-folder stack — or the
picker's bottom options — were sliding below the visible sidebar with
no scroll bar to recover them. This is a regression against the
behavior that landed in #1038, which made `.newproj-body` the scroll
container precisely to keep the form bounded.

SCOPE
- `apps/web/src/index.css` only, one ruleset.
- Visual cost: the empty-state caption (`.newproj-footer`) now sits at
  the bottom of the available sidebar height instead of hugging the
  card, which matches the behavior before #1167 and before this branch.
- A short comment in CSS now flags the invariant so a future refactor
  doesn't quietly flip the flex semantics again.

* fix(web): restore :focus-visible ring on entry-tab + newproj-tab (a11y)

WHAT
Split the prior `:focus, :focus-visible, :active` group on both tab
selectors so that `:focus-visible` no longer inherits the zero-out
that was added to scrub the orange mouse-focus halo:

- `.newproj-tab:focus-visible` → 2px inset blue ring (`--selected`)
  hugging the folder-tab's 8px top-corner radius, plus `--text`
  foreground so the label reads at full contrast while focused.
- `.entry-tab:focus-visible` → 2px solid outline in `--selected` with
  `outline-offset: 2px` and `border-radius: 4px`. Outline is used
  here instead of inset shadow because the tab has no padding to
  spare against the active 1px bottom border, and outline doesn't
  participate in layout.

Mouse-driven `:focus` and `:active` keep the prior transparent
treatment — there is no orange ring on click, which is the polish
the rest of this branch is going for.

WHY
Flagged by @Siri-Ray (changed-range) and @lefarcen (P2) on #1360:
the polish-tab commits stripped the focus indicator entirely instead
of just suppressing mouse focus, so keyboard users had no way to see
which tab had focus during arrow-key navigation. Re-introducing
`:focus-visible` restores the focus indicator for keyboard users only,
keeping the visuals quiet for pointer users.

SCOPE
- `apps/web/src/index.css` only. Two rulesets touched, one new
  `:focus-visible` rule added per selector.
- No JS, no aria, no test churn — the rules trigger off the existing
  `:focus-visible` pseudo-class, which the same Playwright tests
  already exercise via Tab.

* fix(web): scope quieted input focus to .entry-side, restore global ring (a11y)

WHAT
Split the global input focus rule into two layers:

- `input:focus, textarea:focus, select:focus` now keeps a visible
  focus indicator on every input across the app — but in the new
  `--selected` blue (border + 3px `--selected-soft` ring) instead of
  the original `--accent` orange. This preserves accessibility for
  every settings page, dialog, project workspace, and right-column
  control that was previously losing its focus halo.

- `.entry-side input:focus` keeps the neutral treatment from this
  branch — `border-color: var(--border-strong)`, no ring. The orange
  "Create" CTA on the entry sidebar is already the loudest element in
  that panel, so a competing blue ring on the title / path inputs
  next to it pulled the eye in the wrong direction. Scoping the
  quieter focus to the sidebar keeps that intent without leaking out
  to the rest of the app.

WHY
Flagged by @lefarcen as a P2 a11y regression on #1360: the previous
version of this rule scrubbed the focus indicator (`box-shadow: none`,
border only one shade darker) for every input in the app, not just on
the entry surface this branch is targeting. Keyboard users on
settings forms and dialogs were left without a visible focus state.

SCOPE
- `apps/web/src/index.css` only, one global rule restored and one
  scoped override added. No JS, no template change.
- Color shift global focus orange → blue is intentional: it consumes
  the new `--selected` token introduced in commit 13dc8a65 and
  matches the active-state direction this PR is establishing.

* chore(web): drop dead AppChromeHeader / isMacPlatform imports + document --selected token

WHAT
- Remove the `AppChromeHeader` import from `EntryView.tsx`. The
  component itself is still used (and re-exported) by ProjectView;
  EntryView dropped its render site in commit 5fe5721c and the import
  has been a stale reference ever since.
- Remove the `isMacPlatform` import too. It was only used by the old
  avatar-menu popover (for the `⌘,` / `Ctrl+,` Settings hint) which
  was deleted along with the popover when the pet-pill split button
  replaced it.
- Add a docblock above the `--selected` / `--selected-soft` token pair
  in `index.css` so the cascade has a local explanation for why this
  blue is separate from the brand `--accent`. The note calls out which
  affordances should reach for `--selected` (active option, focused
  input ring, active filter pill) and pins the 16% soft fill role.

WHY
Both flagged by @lefarcen on #1360:
- P3 — dead import: the TS config doesn't fail on unused imports, so
  this was silently shipping as dead code and obscuring the deliberate
  removal of the global chrome header.
- P3 — token doc: the `--accent` vs `--selected` split was only
  explained in the PR body. Putting the rationale next to the token
  makes the contract durable beyond this discussion.

SCOPE
- `apps/web/src/components/EntryView.tsx`: two `import` lines removed.
- `apps/web/src/index.css`: one comment block added directly above the
  token declaration.
- Verified: `pnpm --filter @open-design/web typecheck` → exit 0.
2026-05-12 14:26:39 +08:00
huyhoangnhh98
140a4e1ff6
Improve responsive preview and design handoff outputs (#1224)
* feat: improve responsive design handoff

* feat: refine cross-platform design outputs

Changelog:
- Add auto-fit responsive preview behavior for tablet/mobile frames.
- Add landing page and OS widgets metadata options with project header chips.
- Strengthen prompt contracts for modern breakpoints, app-specific modules, CJX-ready UX, and final product surfaces.
- Require cross-platform outputs to use separate platform files instead of tabbed demo selectors.
- Add DESIGN-MANIFEST.json plus richer handoff guidance to daemon/client exports.
- Update archive/export tests for manifest and responsive viewport matrix.

* feat: enforce screen-file design outputs

Changelog:
- Enforce screen-file-first generation for landing pages, app screens, platform surfaces, and OS widgets.
- Update design handoff and manifest exports so coding tools map each screen file to separate routes/surfaces.
- Strengthen minimal-brief visual guidance to avoid monochrome or unstyled design outputs.

* fix: address responsive handoff review feedback

* fix: address handoff review blockers

* fix: preserve proxy auth and normalized export entry

* fix: narrow frame wrapper filter to directory paths only

* fix: make artifact save failure banner generic

---------

Co-authored-by: Huy Hoàng <macos@MacBook-Pro-Hoang.local>
2026-05-12 14:18:33 +08:00
PerishFire
93865f71e7
fix(daemon): remove opencode stdin dash sentinel (#1365) 2026-05-12 14:15:46 +08:00
Krishna shakula
1ce7d6e8c5
fix: use ACP config options for model selection (#1208) 2026-05-12 14:07:20 +08:00
ashleyashli
a4649dacb3
fix: check contributor tiers on review and comment events (#1248)
* fix: check contributor tiers on review and comment events

Expand the contributor card workflow to run tier checks for PR reviews, issue comments, PR review comments, and discussion activity. The bot now understands pull_request_target directly, so remove the event-name shim.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: drop fork-unsafe triggers (review, issue_comment, review_comment)

Per @mrcfps and @chatgpt-codex-connector review: GitHub withholds
repository secrets on pull_request_review, pull_request_review_comment,
and issue_comment events when they originate on forked PRs, so wiring
those events here would fail-closed exactly for external contributors.

Keep the fork-safe triggers (pull_request_target.closed, issues.opened,
discussion.*, discussion_comment.*) and document why the three are
excluded. They can be re-added later via a workflow_run handoff.

---------

Co-authored-by: ashley li <ashleyli@ashleydeMacBook-Air-2.local>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: qiongyu1999 <qiongyu1999@gmail.com>
2026-05-12 14:06:20 +08:00
lefarcen
43f7fc536a
Add Langfuse telemetry relay (#1296)
* Add Langfuse telemetry relay

* Configure telemetry worker custom domain

* Add telemetry relay health check

* Harden telemetry relay config
2026-05-12 13:59:19 +08:00
nettee
71b4a331ab
spec(web): Token first tailwind (#1201) 2026-05-12 11:42:17 +08:00
Nagendhra Madishetti
dbc94b83ed
feat(web): add thumbs-up/down feedback widget under completed assistant turns (#1288) (#1308)
* feat(web): add thumbs-up/down feedback widget under completed assistant turns (#1288)

Adds a lightweight feedback widget that surfaces under each
assistant turn whose run succeeded. Users can submit positive or
negative feedback in one click; the negative path opens an optional
free-text comment area. The widget never blocks the message
composer and only mounts after the run has produced its final
artifact, matching the acceptance criteria.

What ships

- `<MessageFeedback>` (apps/web/src/components/MessageFeedback.tsx)
  renders the three states: idle (prompt + thumbs), submitted
  positive (confirmation + Change), submitted negative
  (confirmation + optional comment textarea + Send + Change).
- AssistantMessage.tsx slots the widget under AssistantFooter,
  gated on `runSucceeded && !hasEmptyResponse`, so failed and
  empty-response turns don't ask the user to rate something that
  never finished.
- The full record shape leaves room for the future analytics
  metadata the issue calls out (rating, comment, submittedAt;
  artifactRef / runId derivable from the surrounding message
  whenever the analytics pipeline lands).
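The three widget states and the Change / Send transitions can be modelled as a small reducer. This is a hedged sketch, not `MessageFeedback.tsx` itself. The state and action names are hypothetical. Trimming whitespace before saving a comment is an assumption layered on the "blank comment disables Send" rule.

```typescript
type Rating = "up" | "down";

type FeedbackState =
  | { kind: "idle" } // prompt + thumbs
  | { kind: "positive" } // confirmation + Change
  | { kind: "negative"; draft: string; savedComment: string | null }; // + comment box

type FeedbackAction =
  | { type: "rate"; rating: Rating }
  | { type: "editComment"; text: string }
  | { type: "sendComment" }
  | { type: "change" };

function feedbackReducer(state: FeedbackState, action: FeedbackAction): FeedbackState {
  switch (action.type) {
    case "rate":
      return action.rating === "up"
        ? { kind: "positive" }
        : { kind: "negative", draft: "", savedComment: null };
    case "editComment":
      return state.kind === "negative" ? { ...state, draft: action.text } : state;
    case "sendComment":
      // Send is disabled for blank comments, so a whitespace-only draft is a no-op.
      if (state.kind !== "negative" || state.draft.trim() === "") return state;
      return { ...state, savedComment: state.draft.trim() };
    case "change":
      return { kind: "idle" }; // Change unsticks the rating
    default:
      return state;
  }
}

// Drives the disabled state of the Send button.
function canSend(state: FeedbackState): boolean {
  return state.kind === "negative" && state.draft.trim() !== "";
}
```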

Persistence (v1 = localStorage)

Lefarcen's clarifying comment on the issue asked whether v1 should
be daemon-persisted or in-memory while the analytics pipeline is
defined. The daemon's messages table is column-strict, so daemon
persistence would require a SQLite migration plus a contract bump
on `ChatMessage`; locking that shape in before the analytics
pipeline is designed risks reworking it twice. localStorage is the
middle ground: feedback survives reload (so the "feedback state is
visually clear after submission" criterion holds across tabs and
sessions) without committing the wire shape. The hook surface is
just `(value, setter)`, so a future PR can swap the storage layer
for a daemon mirror or an analytics shipper without touching the
React surface.

The store handles corrupted JSON, unknown future rating values,
disabled storage (private-mode browsers), and broadcasts changes
across listeners in the same tab via a CustomEvent so two mounts
of the hook for the same messageId stay in sync.
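The defensive read path handles corrupted JSON, missing or unknown ratings, and disabled storage. It can be sketched against a minimal `getItem` interface so it runs outside a browser. `readFeedback`, `feedbackKey` and the `feedback:<messageId>` key format are hypothetical. The record fields follow the rating / comment / submittedAt shape listed above. Defaulting a missing `submittedAt` to `0` is an assumption.

```typescript
type Rating = "up" | "down";

interface FeedbackRecord {
  rating: Rating;
  comment?: string;
  submittedAt: number;
}

// Minimal subset of the Web Storage API.
interface KeyValueStore {
  getItem(key: string): string | null;
}

// Hypothetical key scheme: one entry per assistant message.
const feedbackKey = (messageId: string): string => `feedback:${messageId}`;

function isRating(value: unknown): value is Rating {
  return value === "up" || value === "down";
}

function readFeedback(store: KeyValueStore, messageId: string): FeedbackRecord | null {
  let raw: string | null;
  try {
    raw = store.getItem(feedbackKey(messageId));
  } catch {
    return null; // storage disabled (e.g. private mode): treat as "no feedback yet"
  }
  if (raw === null) return null;

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // corrupted JSON
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const { rating, comment, submittedAt } = parsed as Record<string, unknown>;
  if (!isRating(rating)) return null; // missing or unknown (e.g. future) rating
  return {
    rating,
    comment: typeof comment === "string" ? comment : undefined,
    submittedAt: typeof submittedAt === "number" ? submittedAt : 0,
  };
}
```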

i18n

11 new keys under `feedback.*` (prompt, thumbsUp/Down, two
confirmation chips, comment label/placeholder/submit/saved, change).
English source values authored alongside the keys; zh-CN
translations added in the same pass so the locale alignment test
stays green and Chinese users see Chinese strings from day one.
The other 16 locales pick up English fallbacks via their existing
`...en` spread.

Test coverage

- `tests/state/message-feedback.test.ts` (8 jsdom cases) — round-trip,
  null-clear, corrupted JSON, missing rating, unknown rating, key
  collision across messages.
- `tests/components/MessageFeedback.test.tsx` (7 jsdom cases) — idle
  state, positive submit, negative submit, comment save, blank-comment
  Send disabled, Change unsticks the rating, rehydration from
  pre-populated storage.

The locale alignment test continues to enforce that every locale
declares the new keys (5/5 across 18 locales).

Validated

- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- tests/i18n/locales.test.ts 5/5
- tests/state/message-feedback.test.ts 8/8
- tests/components/MessageFeedback.test.tsx 7/7
- Full web suite: 98 files, 903 tests

* fix(web): tighten feedback widget gate + storage sync + textarea, add styles (PR #1308 review)

Addresses every P2/P3 from the codex + Siri-Ray + lefarcen reviews
on PR #1308, plus a couple of polish items the review surfaced
indirectly.

Visibility gate (lefarcen P2)

The gate was `runSucceeded && !hasEmptyResponse`, which also matched
text-only acknowledgements and question-form replies. The issue
scopes feedback to turns that produced a final artifact, so the
gate now also requires `produced.length > 0`. New AssistantMessage
suite (5 jsdom cases) pins: artifact -> shown, no-artifact -> hidden,
streaming -> hidden, failed run -> hidden, empty_response -> hidden.
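The tightened gate amounts to a single predicate. This sketch assumes a hypothetical `AssistantTurn` shape. Only `produced`, the empty-response flag and the run-success condition come from the description above.

```typescript
// Hypothetical, trimmed view of an assistant turn; real field names may differ.
interface AssistantTurn {
  streaming: boolean;
  runStatus: "running" | "succeeded" | "failed";
  hasEmptyResponse: boolean;
  produced: readonly string[]; // artifacts produced by this turn
}

// Feedback is offered only on finished, successful turns that produced an artifact.
function shouldShowFeedback(turn: AssistantTurn): boolean {
  const runSucceeded = !turn.streaming && turn.runStatus === "succeeded";
  return runSucceeded && !turn.hasEmptyResponse && turn.produced.length > 0;
}
```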

Storage sync (codex P2 + lefarcen P2)

The previous broadcast contract was: write storage, dispatch a bare
CustomEvent, listeners re-read storage. That had two failure modes:

  - setItem throwing (private mode / quota / disabled storage) left
    the listener seeing null and clobbering the in-memory state the
    user just confirmed.
  - The clear path early-returned after removeItem and never
    dispatched, so a second mount of the same messageId stayed in
    the submitted state when the user clicked Change.

New contract: every successful OR failed write dispatches a
CustomEvent whose `detail.value` carries the new feedback record (or
null). Listeners apply the value directly without re-reading. Same-
tab sync survives storage failures and the clear path no longer
early-returns. Cross-tab still re-reads on the platform `storage`
event since that event has no detail. Two new storage tests pin the
new broadcast contract (positive + null) and the failed-setItem path;
two new component tests pin in-session confirmation under setItem
failure and two-mount Submit + Change synchrony.
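A sketch of the new same-tab broadcast contract, assuming a hypothetical event name, key scheme and `StorageLike` interface. The write always dispatches, even when `setItem` throws or the value is `null`, and listeners apply `detail.value` directly. Cross-tab sync through the platform `storage` event is not shown. The sketch uses the standard `EventTarget`/`CustomEvent` APIs, so outside a browser it needs a runtime that provides them (e.g. Node 19 or later).

```typescript
// Hypothetical event name and key scheme; the real store may use other strings.
const FEEDBACK_EVENT = "od:message-feedback";

interface FeedbackChange<T> {
  messageId: string;
  value: T | null; // null means "cleared"
}

interface StorageLike {
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

function writeFeedback<T>(
  storage: StorageLike | undefined,
  bus: EventTarget,
  messageId: string,
  value: T | null,
): void {
  const key = `od.message-feedback.${messageId}`;
  try {
    if (value === null) storage?.removeItem(key);
    else storage?.setItem(key, JSON.stringify(value));
  } catch {
    // Quota, private mode or disabled storage: persistence fails, but the
    // in-session value below still reaches every same-tab listener.
  }
  // Dispatch unconditionally, including on the clear path.
  bus.dispatchEvent(
    new CustomEvent<FeedbackChange<T>>(FEEDBACK_EVENT, { detail: { messageId, value } }),
  );
}

function subscribeFeedback<T>(
  bus: EventTarget,
  messageId: string,
  apply: (value: T | null) => void,
): () => void {
  const onChange = (event: Event): void => {
    const { detail } = event as CustomEvent<FeedbackChange<T>>;
    // Apply the carried value directly instead of re-reading storage.
    if (detail.messageId === messageId) apply(detail.value);
  };
  bus.addEventListener(FEEDBACK_EVENT, onChange);
  return () => bus.removeEventListener(FEEDBACK_EVENT, onChange);
}
```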

Textarea draft fix (lefarcen P3)

The textarea used `draftComment || feedback.comment || ''` as its
controlled value, so erasing a saved comment snapped it back. The
draft is now exclusively the source of truth; a ref-backed effect
re-seeds the draft from feedback.comment whenever the rating
transitions (mount, idle -> negative, cross-mount sync). Send is now
enabled when `draftComment !== savedComment`, which lets the user
both edit and clear a saved comment. A new component test pins that
erasing a saved comment and pressing Send actually removes it.
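The draft handling can be modeled as two pure helpers. This is a hypothetical, framework-free reading of the fix, not the component's code: in the widget the re-seed runs inside the ref-backed effect, while here `syncDraft` takes the previous state explicitly. `legacyTextareaValue` reproduces the old controlled value for contrast.

```typescript
// Hypothetical, framework-free model of the widget's draft logic.
type Rating = "positive" | "negative" | null;

interface DraftState {
  rating: Rating; // rating last seen from the store
  draft: string;  // textarea value: the only source of truth while editing
}

// The old controlled value: an erased draft ("") fell through to the saved comment.
function legacyTextareaValue(draft: string, savedComment: string | undefined): string {
  return draft || savedComment || "";
}

// Re-seed the draft from the saved comment only when the rating transitions
// (mount, idle -> negative, cross-mount sync); otherwise keep the user's text.
function syncDraft(prev: DraftState, rating: Rating, savedComment: string | undefined): DraftState {
  if (prev.rating === rating) return prev;
  return { rating, draft: savedComment ?? "" };
}

// Send is enabled whenever the draft differs from what is saved,
// which makes clearing a saved comment a valid edit.
function canSend(draft: string, savedComment: string | undefined): boolean {
  return draft !== (savedComment ?? "");
}
```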

Accessibility

The confirmation chip and "Comment saved" tag both gain
`role="status"` + `aria-live="polite"` so screen readers announce the
state transition. The thumb buttons keep their `aria-label`.

CSS (lefarcen P3)

The widget's `.message-feedback*` class set had no rules in
index.css, so it rendered with default browser controls. Added a
~130-line block that mirrors the surrounding chat pill/chip
vocabulary: bg-subtle background, border-pill confirmation chip,
accent-tinted positive state and amber-tinted negative state to
match the assistant-footer's data-unfinished pattern. Comment area
sits below the chip and wraps on narrow widths so the composer
isn't pushed off-screen on small panes.

Validated

- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- tests/state/message-feedback.test.ts 10/10 (was 8, +2 broadcast)
- tests/components/MessageFeedback.test.tsx 10/10 (was 7, +3 sync /
  storage-failure / clear-saved-comment)
- tests/components/AssistantMessage.test.tsx 5/5 (new file)
- tests/i18n/locales.test.ts 5/5
- Full web suite: 866 tests

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-12 11:10:28 +08:00
github-actions[bot]
c9d3358af4
docs(readme): refresh contributors wall (#1330)
Co-authored-by: mrcfps <23410977+mrcfps@users.noreply.github.com>
2026-05-12 10:49:57 +08:00
Nagendhra Madishetti
1df3eca161
feat(web): Critique Theater Phase 7 — reducer + useCritiqueStream + useCritiqueReplay (#1307)
* feat(web): pure reducer for Critique Theater states (Phase 7.1)

Pure CritiqueState reducer driven by the contracts-level PanelEvent
(the same shape both the live SSE stream and the recorded transcript
emit), so a single reducer powers both the in-flight panel and the
rerun replay. Lifecycle covers run_started → running → (shipped /
degraded / interrupted / failed), with panelist_open / dim /
must_fix / close / round_end events building per-round
CritiquePanelistView entries as they arrive.

Defensive behaviour that surfaced while writing the spec tests:
- Terminal phases (shipped / degraded / interrupted / failed) are
  sticky against further lifecycle events for the same run, except
  for parser_warning which can land late and is recorded in a side
  channel without changing phase.
- A new run_started for a different runId at any time discards the
  prior state and reboots, so the UI can launch consecutive runs
  without an explicit reset action.
- Events whose runId does not match the active run return the same
  state reference, so React's useReducer doesn't re-render
  subscribers on stray traffic.
- Round bookkeeping keys by round number rather than "always last",
  so an out-of-order panelist_dim for round 1 arriving after a
  round 2 dim does not corrupt the round 2 bucket.

Test coverage: 18 cases covering each transition, the runId guard,
sticky-terminal behaviour, the out-of-order round invariant, and
the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire
SSE + replay into the same reducer.
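A trimmed sketch of the reducer's guard order, assuming simplified `PanelEvent` and `CritiqueState` shapes (the real ones carry per-panelist views, must-fix lists and more event types). It applies the defensive rules above in order: reboot on a new runId, same-reference return for stray runIds, late `parser_warning` into a side channel, sticky terminal phases, and round-keyed bookkeeping.

```typescript
// Hypothetical, trimmed shapes; the contracts-level PanelEvent has more variants and fields.
type TerminalEventType = "ship" | "degraded" | "interrupted" | "failed";

type PanelEvent =
  | { type: "run_started"; runId: string }
  | { type: "panelist_dim"; runId: string; round: number; panelist: string; score: number }
  | { type: "parser_warning"; runId: string; message: string }
  | { type: TerminalEventType; runId: string };

type Phase = "idle" | "running" | "shipped" | "degraded" | "interrupted" | "failed";

interface CritiqueState {
  runId: string | null;
  phase: Phase;
  rounds: Record<number, Record<string, number>>; // round -> panelist -> score
  warnings: string[];
}

const initialState: CritiqueState = { runId: null, phase: "idle", rounds: {}, warnings: [] };

const TERMINAL_PHASE: Record<TerminalEventType, Phase> = {
  ship: "shipped",
  degraded: "degraded",
  interrupted: "interrupted",
  failed: "failed",
};

function isTerminal(phase: Phase): boolean {
  return phase !== "idle" && phase !== "running";
}

function critiqueReducer(state: CritiqueState, event: PanelEvent): CritiqueState {
  if (event.type === "run_started") {
    // A new runId reboots from scratch, even out of a terminal phase.
    return event.runId === state.runId ? state : { ...initialState, runId: event.runId, phase: "running" };
  }
  // Stray traffic for another run returns the same reference, so useReducer bails out.
  if (event.runId !== state.runId) return state;
  if (event.type === "parser_warning") {
    // Late warnings go to a side channel and never change the phase.
    return { ...state, warnings: [...state.warnings, event.message] };
  }
  // Terminal phases are sticky against further lifecycle events.
  if (isTerminal(state.phase)) return state;
  if (event.type === "panelist_dim") {
    // Key by round number, so an out-of-order round-1 event never lands in round 2.
    const round = { ...state.rounds[event.round], [event.panelist]: event.score };
    return { ...state, rounds: { ...state.rounds, [event.round]: round } };
  }
  return { ...state, phase: TERMINAL_PHASE[event.type] };
}
```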

* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2)

createCritiqueEventsConnection is a pure connection manager that
mirrors apps/web/src/providers/project-events.ts: opens an
EventSource at /api/projects/:id/events, listens for every name in
CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent
(stripping the critique. prefix and merging the data payload), and
hands it to the caller's onEvent. Reconnect uses exponential
backoff (1s → 30s) and resets on `ready`; malformed payloads drop
with a dev-mode warning rather than tearing the stream.
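The 1s → 30s reconnect schedule, sketched with plain doubling. The text gives only the bounds, so the growth factor and the `Backoff` helper are assumptions.

```typescript
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 30_000;

// Delay before reconnect attempt `attempt` (0-based): 1s, 2s, 4s, ... capped at 30s.
function reconnectDelay(attempt: number): number {
  return Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
}

// Hypothetical helper owned by the connection manager.
class Backoff {
  private attempt = 0;

  // Called on each error: returns the delay to wait, then grows the next one.
  next(): number {
    const delay = reconnectDelay(this.attempt);
    this.attempt += 1;
    return delay;
  }

  // Called when the server emits `ready`: the next failure starts at 1s again.
  reset(): void {
    this.attempt = 0;
  }
}
```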

useCritiqueStream wraps the manager in a useReducer that owns the
CritiqueState. enabled=false or a null projectId tears down the
connection cleanly; switching projectId closes the old connection
and opens a fresh one. The returned dispatch lets local UI
synthesise actions (e.g. an Esc keypress firing a synthetic
interrupted while a kill request is in flight); production traffic
comes from the SSE stream.

Test coverage:
- sse.test.ts (10 cases, node env): subscription set covers every
  CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire
  shape back to PanelEvent; malformed JSON is swallowed and does
  not stop the stream; exponential backoff schedule and ready-reset
  semantics are pinned with a setTimeout seam; close() cancels
  pending reconnects and shuts the live source; no-op fallback
  when EventSource is unavailable.
- useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event,
  reducer driven by synthetic actions, no connection when disabled
  or projectId is null, clean close on unmount, projectId change
  reopens cleanly.

* feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3)

Fetches the per-run NDJSON transcript (one PanelEvent per line),
parses every line via the shared isPanelEvent predicate, and
dispatches into the same CritiqueState reducer the live SSE stream
uses. A single reducer means the UI rendering a replay can be
identical to the live panel, and a UI mounting both
useCritiqueStream and useCritiqueReplay in parallel does not have
to reconcile two state shapes.
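A sketch of the line-by-line parse. A deliberately loose stand-in predicate (`looksLikePanelEvent`, hypothetical) takes the place of the contracts-level `isPanelEvent`.

```typescript
interface PanelEventLike {
  type: string;
  runId: string;
}

// Loose stand-in; the real isPanelEvent checks the full event vocabulary.
function looksLikePanelEvent(value: unknown): value is PanelEventLike {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { type?: unknown; runId?: unknown };
  return typeof v.type === "string" && typeof v.runId === "string" && v.runId.length > 0;
}

// One PanelEvent per line; blank, malformed or wrong-shape lines are skipped
// so the valid events around them still replay.
function parseTranscript(text: string): PanelEventLike[] {
  const events: PanelEventLike[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (looksLikePanelEvent(parsed)) events.push(parsed);
    } catch {
      // malformed JSON line: skip it
    }
  }
  return events;
}
```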

speed knob is `paused | instant | live | { intervalMs: N }`.
- instant flushes every event synchronously, useful for opening a
  finished run already at its terminal state.
- intervalMs paces dispatches at a fixed cadence so the reviewer
  can watch the run unfold.
- paused parses the transcript but holds events back until the
  caller advances speed (consumers can drive a scrubber later).
- live is reserved for the future "playback at original cadence"
  feature, currently treated as instant; replay timestamps are not
  yet persisted with each event so honest pacing requires a
  follow-up Phase 7+ task.
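One way to express the speed knob is to map it to a pacing decision. The `Pacing` type and `pacingFor` helper are hypothetical, and `live` follows this commit's description (treated as instant).

```typescript
// The speed union as described above.
type ReplaySpeed = "paused" | "instant" | "live" | { intervalMs: number };

type Pacing =
  | { kind: "hold" }                  // parse, but dispatch nothing yet
  | { kind: "flush" }                 // dispatch every remaining event synchronously
  | { kind: "interval"; ms: number }; // one dispatch per tick

function pacingFor(speed: ReplaySpeed): Pacing {
  if (speed === "paused") return { kind: "hold" };
  // No per-event timestamps are persisted yet, so `live` cannot pace honestly
  // and is treated like `instant` for now.
  if (speed === "instant" || speed === "live") return { kind: "flush" };
  return { kind: "interval", ms: Math.max(0, speed.intervalMs) };
}
```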

gunzip seam handles `.ndjson.gz` transcripts via
DecompressionStream when present; the production fetch path picks
between text and arrayBuffer based on the URL extension. Both seams
are injectable so the unit tests don't need to spin up a real
network or a real gzip pipeline.

Test coverage (8 cases, jsdom env):
- Idle status before any URL is provided.
- speed=instant flushes the full transcript synchronously to
  shipped state.
- speed={intervalMs:N} paces with the setTimeout seam, reaching
  done after the last tick.
- speed=paused leaves status=playing with no dispatches.
- Empty transcript reports done with state still idle.
- Fetch rejection surfaces an error status with the message.
- Malformed NDJSON lines are skipped; valid events around them
  still land.
- .gz transcripts route through the gunzip seam.

Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream +
replay), all on one branch ready for review. Phases 8+ (Theater
components) consume these from this PR.

* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review)

Two P1 fixes from lefarcen's review on PR #1307:

SSE payload override

`sseToPanelEvent` previously spread `data` after the channel-derived
`type`, so a payload-provided `type` could override the channel and
route a `critique.run_started` frame into the reducer as a `ship`
action. Reversed the spread so the channel-derived `type` is
authoritative, and revalidated the resulting object through the
contracts-level `isPanelEvent` predicate before returning. Frames
that fail validation (missing runId, empty runId, unknown type) are
dropped, so a malformed or compromised SSE frame can no longer
dispatch a wrong-shape action into the reducer.

Three new sse.test.ts cases pin the regression: hostile `type:'ship'`
in the payload still resolves to `run_started`, missing runId is
dropped, empty runId is dropped.
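A sketch of the corrected decode order, assuming JSON frame payloads and a stand-in validator over an illustrative subset of event type names. The real check is the contracts-level `isPanelEvent`; `looksLikePanelEvent` and `KNOWN_TYPES` are hypothetical.

```typescript
// Illustrative subset of event types; the contracts package owns the real list.
const KNOWN_TYPES = new Set<string>([
  "run_started", "panelist_open", "panelist_dim", "panelist_must_fix", "panelist_close",
  "round_end", "ship", "degraded", "interrupted", "failed", "parser_warning",
]);

interface PanelEventLike {
  type: string;
  runId: string;
  [key: string]: unknown;
}

const CHANNEL_PREFIX = "critique.";

// Stand-in validator: known type and a non-empty runId.
function looksLikePanelEvent(value: Record<string, unknown>): value is PanelEventLike {
  const { type, runId } = value;
  return typeof type === "string" && KNOWN_TYPES.has(type) && typeof runId === "string" && runId.length > 0;
}

function sseToPanelEvent(channel: string, rawData: string): PanelEventLike | null {
  if (!channel.startsWith(CHANNEL_PREFIX)) return null;
  let data: unknown;
  try {
    data = JSON.parse(rawData);
  } catch {
    return null; // malformed frame: drop it, keep the stream alive
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;
  // Payload first, channel-derived type last: the channel is authoritative.
  const candidate: Record<string, unknown> = {
    ...(data as Record<string, unknown>),
    type: channel.slice(CHANNEL_PREFIX.length),
  };
  // Revalidate the merged object; a missing/empty runId or unknown type is dropped.
  return looksLikePanelEvent(candidate) ? candidate : null;
}
```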

Replay pause/resume

`useCritiqueReplay` had one big effect keyed on `transcriptUrl`
only, so flipping `speed` from `paused` to `instant` never re-fired
and the held events sat undispatched. Split into a parse effect
(depends on URL, fetches and stores events in state) and a pace
effect (depends on parsed-events + speed, owns the cursor + timers).
The playback cursor lives in a ref that survives pause/resume
cycles, so flipping `paused` -> `instant` flushes from the current
position rather than restarting (which would double-dispatch
`run_started` and reset the reducer).

Two new useCritiqueReplay.test.tsx cases:
- paused-then-instant transitions from `playing` to `done` and
  reaches the shipped terminal phase
- intervalMs paced playback dispatches one event, pauses to drain
  the next scheduled timer, flips to instant, and confirms the
  remaining transcript drains exactly once (cursor was preserved)
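A framework-free sketch of the split pace effect. It assumes a hypothetical `ReplayPlayer` class and an injectable `Schedule` seam in the spirit of the setTimeout seam mentioned above. Because the cursor is a field rather than effect-local state, flipping `paused` to `instant` resumes where playback stopped.

```typescript
// In the hook the cursor lives in a useRef and setSpeed corresponds to the
// pace effect re-running when speed changes.
type Speed = "paused" | "instant" | { intervalMs: number };
type Schedule = (fn: () => void, ms: number) => () => void; // returns a cancel function

const realSchedule: Schedule = (fn, ms) => {
  const id = setTimeout(fn, ms);
  return () => clearTimeout(id);
};

class ReplayPlayer<E> {
  private cursor = 0; // survives pause/resume, so nothing is dispatched twice
  private cancel: (() => void) | undefined;
  private readonly events: readonly E[];
  private readonly dispatch: (event: E) => void;
  private readonly schedule: Schedule;

  constructor(events: readonly E[], dispatch: (event: E) => void, schedule: Schedule = realSchedule) {
    this.events = events;
    this.dispatch = dispatch;
    this.schedule = schedule;
  }

  get done(): boolean {
    return this.cursor >= this.events.length;
  }

  setSpeed(speed: Speed): void {
    // Effect cleanup: drop whatever the previous speed had scheduled.
    this.cancel?.();
    this.cancel = undefined;
    if (speed === "paused") return;
    if (speed === "instant") {
      // Flush from the current position, not from 0.
      while (!this.done) this.step();
      return;
    }
    const ms = speed.intervalMs;
    const tick = (): void => {
      this.cancel = undefined;
      if (this.done) return;
      this.step();
      if (!this.done) this.cancel = this.schedule(tick, ms);
    };
    this.cancel = this.schedule(tick, ms);
  }

  private step(): void {
    const event = this.events[this.cursor] as E;
    this.cursor += 1;
    this.dispatch(event);
  }
}
```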

Doc consistency

The earlier source comment in useCritiqueReplay.ts claimed `live`
"paces by recorded timestamps" while the impl used zero-delay
timers and the PR body said it behaves like `instant`. Aligned to
reality: `live` currently behaves like `{ intervalMs: 0 }` (events
drain on successive zero-delay timer tasks via setTimeoutFn) because transcripts
do not yet carry per-event timestamps. Honest timestamp-driven
pacing is queued as a Phase 7+ follow-up.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite
96 files / 888 tests.

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-12 10:45:07 +08:00
Nagendhra Madishetti
64510b790b
fix(web): translate Design Files refresh strings instead of hardcoding English (#1254) (#1300)
* fix(web): translate Design Files / live artifact refresh strings instead of hardcoding English

When the app language was set to Chinese, the Design Files refresh
flow showed Chinese for the surrounding chrome but kept English for
every label and message originating in describeRefreshStatus,
describeEventPhase, and the refresh-event timeline body of
LiveArtifactRefreshHistoryPanel. Same-screen mixed-language UX, the
exact symptom reported in #1254.

Root cause: those three sites bypassed i18n entirely. describeRefreshStatus
returned hardcoded English label + description strings for the
running / succeeded / failed / idle / never statuses;
describeEventPhase returned hardcoded Started / Succeeded / Failed
labels; the timeline body inlined "Refresh started…",
"<n> source(s) updated", and "Refresh failed." string literals; and
the empty-timeline copy ("No refresh activity yet in this session.
Trigger Refresh to record a timeline…") was hardcoded too.

Fix: thread the existing TranslateFn through both helpers, swap every
hardcoded string for a t() lookup, and pull the empty-timeline copy
and the failure-fallback through the same path. Added 13 new keys
under liveArtifact.refresh.* — statusRunning, the five
*Description keys, three event-phase labels (eventStarted/Succeeded/Failed),
eventStartedDetail, sourcesUpdatedOne/Many with an {n} placeholder,
and timelineEmpty. Status labels for succeeded / failed / ready / never
already had keys (statusSucceeded / statusFailed / statusReady /
statusNever) so those are reused unchanged.
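A minimal sketch of threading a `TranslateFn` through a helper. It assumes a hypothetical translator with `{name}` placeholder substitution and placeholder English/Chinese strings for the `sourcesUpdatedOne`/`sourcesUpdatedMany` keys. The real `TranslateFn` signature may differ.

```typescript
// Hypothetical translator shape.
type TranslateFn = (key: string, vars?: Record<string, string | number>) => string;

// Flat dictionary lookup with {name} placeholder substitution; unknown keys echo back.
function makeTranslate(dict: Record<string, string>): TranslateFn {
  return (key, vars) =>
    (dict[key] ?? key).replace(/\{(\w+)\}/g, (match, name: string) =>
      vars && name in vars ? String(vars[name]) : match,
    );
}

// The helper takes `t` instead of returning English literals.
function describeSourcesUpdated(count: number, t: TranslateFn): string {
  const key = count === 1
    ? "liveArtifact.refresh.sourcesUpdatedOne"
    : "liveArtifact.refresh.sourcesUpdatedMany";
  return t(key, { n: count });
}
```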

Locales: full Chinese translations added to zh-CN.ts (the locale
directly named in the issue). The other 16 locales pick up English
fallbacks through their existing ...en spread, so the locale-key
alignment test stays green; native translations for those locales
can land via the usual locale-team passes without re-touching the
source code.

* fix(web): cover the rest of the refresh panel under i18n + add a zh-CN render test

Lefarcen's review on #1254 / PR #1300 surfaced that the first pass
only translated three helpers (describeRefreshStatus,
describeEventPhase, session timeline body) and left the rest of the
panel in English. Under a Chinese UI the panel still mixed
languages, which was exactly the regression the issue was filed for.

This commit threads t() through every user-visible refresh-panel
string the user would see in the Chinese flow:

- Hero block: "Last refreshed" label + "Never" empty state.
- Created / Last updated facts + their "Unknown" empty label.
- Persisted refresh history header, hint, empty-state copy.
- Persisted timeline status badge: succeeded / running / failed /
  cancelled / skipped now resolve through describePersistedStatus,
  which uses an exhaustive switch off LiveArtifactRefreshLogEntry's
  status union so a future contract addition trips tsc.
- Session activity header, hint.
- Document source header, hint, Type / Tool / Connector field
  labels.
- Advanced debug metadata summary + note line.
- "just now" relative-time fallback in the persisted timeline.
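The exhaustive-switch pattern behind `describePersistedStatus` can be sketched like this. The status union mirrors the list above; the `statusCancelled`/`statusSkipped` key names and the `TranslateFn` shape are assumptions.

```typescript
// Hypothetical mirror of LiveArtifactRefreshLogEntry's status union.
type PersistedStatus = "succeeded" | "running" | "failed" | "cancelled" | "skipped";
type TranslateFn = (key: string) => string;

function describePersistedStatus(status: PersistedStatus, t: TranslateFn): string {
  switch (status) {
    case "succeeded": return t("liveArtifact.refresh.statusSucceeded");
    case "running": return t("liveArtifact.refresh.statusRunning");
    case "failed": return t("liveArtifact.refresh.statusFailed");
    case "cancelled": return t("liveArtifact.refresh.statusCancelled");
    case "skipped": return t("liveArtifact.refresh.statusSkipped");
    default: {
      // If the contract gains a status, `status` is no longer `never` here
      // and tsc rejects this assignment until a case is added.
      const unreachable: never = status;
      return unreachable;
    }
  }
}
```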

22 new i18n keys total (23 counting the new heroLastRefreshedNever,
kept distinct from statusNever); zh-CN strings
authored alongside the English source, every other locale picks
them up via its existing ...en spread and the locale-key alignment
test stays green.

Intentionally untranslated surfaces: raw daemon payloads inside
the <details> debug panel (event.step / refreshId / error.message
and the JSON.stringify dump), since those are agent / connector
identifiers and stack-trace style strings, not localised copy.
The debug summary heading itself is translated; if the
debug section should be hidden in localised primary flows, that
is a separate UX call worth its own issue.

Test coverage: new render test wraps LiveArtifactRefreshHistoryPanel
in I18nProvider initial="zh-CN" and pins the Chinese rendering of
every translated label, plus negative assertions that the formerly
hardcoded English literals are NOT present in the markup. With the
no-provider fallback returning English, the existing static-markup
tests can't observe the regression this PR is meant to fix; the
zh-CN render test is the only one that would have caught the
original gap and will catch the next one.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
locales.test.ts (5/5), FileViewer.test.tsx (69/69, +1 new zh-CN
test), full web suite (92 files, 841 tests).

* fix(web): route formatRelativeTime through Intl.RelativeTimeFormat so units localise

Lefarcen's second pass on PR #1300 caught the remaining hardcoded
English path: formatRelativeTime() still emitted units like `5s ago`
and `45m ago`, so Chinese users would see those strings inside the
otherwise-translated refresh panel. The function now takes the
active locale + TranslateFn and routes through
Intl.RelativeTimeFormat with style: 'narrow', numeric: 'always'.
That preserves the historical `5s ago` shape for English while
producing locale-correct output for every other locale (zh-CN gets
`5秒前` / `45分前`, with the right past / future suffix and word
order).

The `just now` carve-out (abs < 5s) keeps using
t('liveArtifact.refresh.justNow') since Intl's narrow output for
zero-delta reads awkwardly. A try/catch around the RTF constructor
falls back to 'en' if the runtime rejects the locale, so the
function is safe on engines with limited ICU data.
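A sketch of the Intl-based formatter, assuming millisecond timestamps, a hypothetical `(thenMs, nowMs, locale, t)` signature, and simple second/minute/hour/day bucketing (the real thresholds may differ). `Intl.RelativeTimeFormat` throws a `RangeError` for a structurally invalid locale tag, which is what the `try`/`catch` guards against. The exact narrow English output depends on the runtime's CLDR data.

```typescript
type TranslateFn = (key: string) => string;

// Assumed bucketing: seconds under a minute, minutes under an hour, and so on.
const UNITS: ReadonlyArray<readonly [Intl.RelativeTimeFormatUnit, number]> = [
  ["second", 60],
  ["minute", 60],
  ["hour", 24],
  ["day", Infinity],
];

const RTF_OPTIONS: Intl.RelativeTimeFormatOptions = { style: "narrow", numeric: "always" };

function formatRelativeTime(thenMs: number, nowMs: number, locale: string, t: TranslateFn): string {
  const deltaMs = thenMs - nowMs; // negative means "in the past"
  // Intl's narrow output for a near-zero delta reads awkwardly, so use a translated string.
  if (Math.abs(deltaMs) < 5_000) return t("liveArtifact.refresh.justNow");

  let rtf: Intl.RelativeTimeFormat;
  try {
    rtf = new Intl.RelativeTimeFormat(locale, RTF_OPTIONS);
  } catch {
    // The runtime rejected the locale tag (RangeError): fall back to English.
    rtf = new Intl.RelativeTimeFormat("en", RTF_OPTIONS);
  }

  let value = Math.round(deltaMs / 1_000);
  for (const [unit, size] of UNITS) {
    if (Math.abs(value) < size) return rtf.format(value, unit);
    value = Math.round(value / size);
  }
  return rtf.format(value, "day"); // not reached: the last bucket is unbounded
}
```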

Callsites threaded through:
- LiveArtifactRefreshHistoryPanel hero metric (`lastRefreshedAt`)
- Session timeline event row (`event.startedAt`)
- Session timeline event time (`event.at`)
- LiveArtifactRefreshFact for the created / last-updated facts;
  the component now accepts optional `locale` + `t` props and the
  panel passes them in.

Test coverage extension:
- The existing zh-CN render test sets a real lastRefreshedAt
  (now - 45s) and real session-event timestamps, then asserts the
  Chinese past-tense suffix `前` appears AND the legacy English
  `Xs ago` / `Xm ago` shapes do NOT. That was the gap lefarcen
  pointed at: setting `lastRefreshedAt: undefined` couldn't see
  the regression because no relative-time formatting ran.
- Added a small second test for the lastRefreshedAt-undefined
  empty hero so the original `从未` coverage still pins.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
FileViewer.test.tsx (70/70, +1 new test), locales.test.ts (5/5),
full web suite (92 files, 842 tests).

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-12 10:38:07 +08:00
github-actions[bot]
5fa861137d
Update docs/assets/github-metrics.svg - [Skip GitHub Action] (#1328)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-05-12 10:32:34 +08:00
PerishFire
819c34fd8f
fix(tools-pr): fall back on reviewDecision for unresolved-changes-requested (#1287)
* fix(tools-pr): fall back on reviewDecision for unresolved-changes-requested

Patrol classify on the live 102-PR queue missed three PRs (#1101, #1127,
#1163) where GitHub's reviewDecision is CHANGES_REQUESTED but the
classify tag did not fire.

Root cause is a divergence between two notions of "latest review state
per reviewer":

- GitHub's reviewDecision keeps a reviewer's CHANGES_REQUESTED in effect
  until that same reviewer submits APPROVED or DISMISSED. A subsequent
  COMMENTED review by the same reviewer does NOT supersede it.
- Our `reduceLatestReviewsByAuthor` collapses every reviewer to their
  latest review with no special-casing of state, so a CHANGES_REQUESTED
  followed by COMMENTED disappears from the reduced view.

`tagUnresolvedChangesRequested` filtered the reduced view for
`state === "CHANGES_REQUESTED"`, so the three PRs above (each had a
reviewer write CHANGES_REQUESTED → COMMENTED) escaped the rule even
though the PR-level reviewDecision was still CHANGES_REQUESTED.

Add a narrow fallback: when the per-reviewer path finds no reviewer
with an outstanding CHANGES_REQUESTED, trust
`facts.reviewDecision === "CHANGES_REQUESTED"` as the source of truth.
The fallback's reason and source token differ from the primary path's,
so report consumers can tell which signal fired.

Reducer semantics left alone on purpose — flipping COMMENTED handling
there would cascade to `bot-only-approval`, `stale-approval`, and
`humanReviewerSignalAt`, each of which has its own correctness story.
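The two-path rule can be sketched as follows. `PrFacts`, `Review` and `Tag` are trimmed, hypothetical shapes and the reason strings are invented; the `source` tokens match the ones named later in this PR. Timestamps are assumed to be ISO-8601 UTC strings, which compare correctly as plain strings.

```typescript
// Hypothetical, trimmed shapes; the real PrFacts and Tag types carry more fields.
type ReviewState = "APPROVED" | "CHANGES_REQUESTED" | "COMMENTED" | "DISMISSED";
interface Review { author: string; state: ReviewState; submittedAt: string }
interface PrFacts { reviewDecision: string | null; reviews: Review[] }
interface Tag { tag: string; source: string; reason: string }

// Latest review per author, with no special-casing of state.
function reduceLatestReviewsByAuthor(reviews: Review[]): Review[] {
  const latest = new Map<string, Review>();
  for (const r of reviews) {
    const prev = latest.get(r.author);
    if (!prev || r.submittedAt > prev.submittedAt) latest.set(r.author, r);
  }
  return Array.from(latest.values());
}

function tagUnresolvedChangesRequested(facts: PrFacts): Tag | null {
  const blocking = reduceLatestReviewsByAuthor(facts.reviews)
    .filter((r) => r.state === "CHANGES_REQUESTED");
  if (blocking.length > 0) {
    return {
      tag: "unresolved-changes-requested",
      source: "gh.latestReviews[].state",
      reason: `changes requested by ${blocking.map((r) => r.author).join(", ")}`,
    };
  }
  // Fallback: no per-reviewer CR survived the reduction, so trust the PR-level decision.
  if (facts.reviewDecision === "CHANGES_REQUESTED") {
    return {
      tag: "unresolved-changes-requested",
      source: "gh.reviewDecision",
      reason: "reviewDecision is CHANGES_REQUESTED",
    };
  }
  return null;
}
```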

* fix(tools-pr): keep fallback reason strictly factual

Codex flagged that the fallback path's reason text asserted a specific
review sequence ("CHANGES_REQUESTED then COMMENTED") that the condition
alone does not prove. The condition only observes:

- `facts.reviewDecision === "CHANGES_REQUESTED"`, and
- after `reduceLatestReviewsByAuthor`, no review carries
  `state === "CHANGES_REQUESTED"`.

Multiple GitHub configurations satisfy that pair — a reviewer's CR
followed by COMMENTED, a CR that sits outside the `reviews(last: 30)`
fetch window, etc. Per `tools/pr/AGENTS.md`'s strictly-factual rule,
the reason must report only what is directly observed, not the most
likely upstream cause.

Drop the inferred-cause clause from `reason`; move the explanation of
possible upstream causes into the code comment above the branch where
it does not show up in classify output.

* docs(tools-pr): document fallback data source for unresolved-changes-requested

Siri-Ray and lefarcen both flagged that the tag dictionary row for
`unresolved-changes-requested` only describes the primary per-reviewer
path. The fallback added earlier in this PR emits the same tag with a
different `source` token (`gh.reviewDecision` vs the original
`gh.latestReviews[].state`), so report consumers need the dictionary to
list both paths to interpret which one fired.

Update the row to call out both: the primary per-reviewer rule, and the
PR-level reviewDecision fallback that fires when no per-reviewer CR
survives the latest-per-author reduction. The two-token source column
mirrors the actual `Tag.source` strings emitted at runtime.

* test(tools-pr): pin both emission paths of unresolved-changes-requested

lefarcen flagged the fallback was validated only by live-PR examples in
the PR body, so a refactor could silently regress the coverage.

Add a deterministic test file `tests/tags-unresolved-cr.test.ts` that
exercises `classifyPr` against crafted `PrFacts` fixtures:

- primary path (per-reviewer CR after reduction) fires with
  source=gh.latestReviews[].state and surfaces the reviewer login
- fallback path fires with source=gh.reviewDecision when no per-reviewer
  CR survives reduction (covers both the COMMENTED-follow-up shape and
  the empty-reviews shape — the latter pins the `reviews(last: 30)`
  out-of-window concern from the factual-reason fix)
- primary wins over fallback when both signals are present (single tag
  emitted, source=gh.latestReviews[].state)
- two negative cases: empty reviewDecision and APPROVED — neither emits

Also extend the fallback's code comment with the observed scale (3 of 102
open PRs hit this gap: #1101, #1127, #1163) so future maintainers can
tell this is a recurring queue pattern, not a theoretical edge case.

This is the first test under `tools/pr/tests/`; the package test script
already ran `node --import tsx --test tests/*.test.ts` against an empty
glob, so no scaffolding changes are needed.
2026-05-12 09:40:50 +08:00
Caprika
5bd9763181
[codex] Improve Claude Code exit diagnostics (#1267)
* fix daemon claude diagnostics

* fix claude custom endpoint auth diagnostics

* fix project view api empty response test props

* fix claude diagnostic review gaps

* fix silent custom endpoint claude diagnostics

* fix claude diagnostic credential redaction

* fix quoted api key redaction

* fix claude diagnostic tail redaction

* fix silent claude configured profile diagnostics
2026-05-12 00:08:31 +08:00
chaoxiaoche
a75d9938c7
feat(design-systems): add structured tokens.css schema (default + kami) (#1231)
* feat(design-systems): add structured tokens.css schema (default + kami)

Compile each brand's DESIGN.md prose into a machine-readable :root
block agents paste verbatim, removing the "Primary → --accent"
translation step where most token misuse happens. Daemon-side prompt
injection of these tokens lands in a follow-up; lint-artifact already
enforces the shared token vocabulary, so no rule changes are needed.

Schema validated across two contrasting aesthetics:
- default (sans-serif, cobalt, B2B utility) — stress test the
  shallow form, 2-level fg / 2-level surface
- kami (serif, parchment, ink-blue, print-first) — stress test the
  rich form, 4-level fg ramp, 3-level surface, ring elevation, i18n
  font stacks, and solid-hex tag tints (print renderers double-paint
  alpha)

Schema growth from kami's stress test (5 new optional slots, all
backward-compatible — default aliases via var() to existing tokens):
- --fg-2 / --meta (4-level fg ramp)
- --surface-warm (3-level surface)
- --border-soft (2-level border)
- --elev-ring (ring elevation as first-class level)

Brand-specific extensions live in tokens.css with explicit "NOT in
shared schema" labels and a documented promotion path (≥2 brands
need it → promote to schema slot).

components.html in each brand is a self-contained reference fixture
that exercises every token through real layouts. Both fixtures lint
clean against apps/daemon/src/lint-artifact.ts.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(design-systems): add token-fixture drift guard

Each design system in design-systems/<brand>/ ships two files agents
consume in tandem: tokens.css (canonical token bindings) and
components.html (a self-contained fixture whose first <style> embeds
the same :root paste so the file renders standalone). The fixture's
:root block is a copy of tokens.css's :root block, kept in sync only
by an inline comment.

This adds scripts/check-tokens-fixture-sync.ts and registers it in
pnpm guard. The check pairs each brand's tokens.css with its
components.html and asserts the unscoped :root block is byte-equivalent
after canonical normalization (CSS comments stripped, whitespace
collapsed, separator spacing normalized). Brands missing one half of
the pair, or with no :root rule in either file, fail the guard.

Scoped overrides like :root[lang="zh-CN"] are not required to appear
in the fixture (per the kami fixture's inline comment they are pasted
only when an artifact's <html lang> matches), so the check only
compares the unscoped :root block.
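A sketch of the comparison, assuming a regex-based extractor that handles flat `:root { ... }` rules without nested braces and reads the fixture's first `<style>` element. The real `check-tokens-fixture-sync.ts` may parse differently; `checkFixtureSync` and its messages are hypothetical.

```typescript
function stripComments(css: string): string {
  return css.replace(/\/\*[\s\S]*?\*\//g, "");
}

// Canonical form: no comments, collapsed whitespace, no spaces around separators.
function normalizeDeclarations(body: string): string {
  return stripComments(body)
    .replace(/\s+/g, " ")
    .replace(/\s*([:;{},])\s*/g, "$1")
    .trim();
}

// Body of the first unscoped `:root { ... }` rule. Scoped variants such as
// `:root[lang="zh-CN"] { ... }` do not match because `[` follows `:root`.
function unscopedRootBody(css: string): string | null {
  const match = /(?:^|[\s};]):root\s*\{([^}]*)\}/.exec(stripComments(css));
  const body = match?.[1];
  return body === undefined ? null : normalizeDeclarations(body);
}

// Returns null when in sync, otherwise a short failure message.
function checkFixtureSync(tokensCss: string, componentsHtml: string): string | null {
  const style = /<style[^>]*>([\s\S]*?)<\/style>/i.exec(componentsHtml)?.[1] ?? "";
  const expected = unscopedRootBody(tokensCss);
  const actual = unscopedRootBody(style);
  if (expected === null || actual === null) return "missing unscoped :root rule";
  return expected === actual ? null : "unscoped :root blocks differ";
}
```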

Verified: pnpm guard passes for default + kami, fails on intentional
value drift, fails on missing token, tolerates whitespace-only
formatting differences.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(design-systems): point fixture CTAs to real files

Both default and kami components.html advertised in-page anchors
(#tokens, #spec, #surface, #accent, #type, #components) but defined
no matching ids, so every CTA was a no-op when the fixture was
opened locally — flagged by mrcfps in #1231.

Re-point each link to a real artifact in the same brand directory:

- "View tokens" / "Inspect tokens" / "Inspect typography" → ./tokens.css
- "Read the spec" / "Read the rule" → ./DESIGN.md

Browsers render these as raw source views, which is the desired UX
for a reference fixture: clicking the CTA shows the underlying
contract instead of jumping to nothing. Agents copying the fixture
also learn the pattern of "buttons link to actual sibling resources".

The :root token block is unchanged, so the token-fixture drift guard
still passes for both brands.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(design-systems): codify token schema (A1/A2/B/C layers)

The two-brand pilot (default + kami) settled the shape of the shared
token schema; this commit codifies it as a machine-readable contract
and enforces it in pnpm guard, addressing lefarcen's review on #1231:

> the optional-vs-required split won't generalize cleanly when brand
> #3 needs different Layer A tokens or when multiple brands converge
> on the same extension (promoting C→B→A). Consider surfacing that
> limitation in the PR narrative or in a future SCHEMA.md.

Schema lives under design-systems/_schema/ as three files:

- tokens.schema.ts   — TypeScript declaration of every shared token
                       with its layer (A1-identity / A1-structure /
                       A2 / B-slot), plus per-brand C-extension
                       allowlists and a global C-prefix allowlist
- defaults.css       — CSS mirror of A2 fallback values, used as the
                       human-readable copy of the contract for reviewers
                       and as the future input to the derive script
- AGENTS.md          — schema layer model, C → B-slot → A2 promotion
                       rules, when-not-to-add-a-token guidance

Layer model:

  A1-identity    8 tokens — bg/surface/fg/muted/border/accent +
                 font-display/font-body. The brand IS these values;
                 no fallback is defensible.

  A1-structure  18 tokens — type scale (8), leading (2), tracking
                 (1), section-y (3), container (4). Structural
                 decisions vary per brand by design and have no
                 cross-brand default.

  A2            26 tokens — accent states, semantic colors, motion,
                 base spacing scale, radius, elevation, focus,
                 font-mono. Required in every tokens.css; fallback
                 lives in defaults.css for the future derive script
                 to inline when DESIGN.md does not specify the value.

  B-slot         4 tokens — fg-2 / meta / surface-warm / border-soft.
                 Brand may bind independently or alias the named
                 sibling via var(...) for components that target the
                 richer ramp.

  C-extension    n tokens — brand-specific names (kami's tag-bg-*,
                 leading-display, accent-light, etc.). Allowlisted
                 per-brand in BRAND_EXTENSIONS or globally by prefix
                 in BRAND_EXTENSION_PREFIXES. Promote when a second
                 brand adopts the same name.

Why a missing A2 token fails the guard today:
  Artifacts are generated by agents pasting one brand's :root block
  into a single <style>; there is no global stylesheet that supplies
  fallbacks at runtime. A tokens.css missing an A2 declaration would
  silently break any var() reference in the fixture. Until the
  derive script (PR-B) lands and inlines defaults, every brand's
  tokens.css must declare every A2 token directly. The guard
  enforces this strictly.

Why --font-mono lands in A2 (not A1):
  149 brands' DESIGN.md files were surveyed: 87 (58%) declare a
  monospace stack, 62 (42%) do not — including major brands like
  bmw / nike / apple / notion / mastercard / meta. Agent paste
  cannot rely on the brand author having written it down; a
  defaultable A2 fallback (with CJK brands like kami overriding) is
  safer than forcing every brand author to add a field they may not
  realize their kbd / code-block components need.

Five guard checks, each registered as its own entry in scripts/guard.ts
so failures attribute to a specific contract:

  1. token-fixture sync       — components.html :root ↔ tokens.css :root
                                 byte-equivalent (existing)
  2. A1 required tokens       — every brand declares every A1 token
  3. A2 required tokens       — every brand declares every A2 token
  4. unknown token allowlist  — every declared token is in schema or
                                 brand-extension allowlist
  5. A2 defaults parity       — defaults.css ↔ tokens.schema.ts
                                 fallback byte-equivalent
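Checks 2 to 4 reduce to set operations over the declared custom properties. This sketch uses a heavily trimmed, hypothetical stand-in for `tokens.schema.ts` (a handful of tokens, no A1-structure entries) and illustrative allowlist contents.

```typescript
// Hypothetical, heavily trimmed stand-in for tokens.schema.ts.
type Layer = "A1-identity" | "A1-structure" | "A2" | "B-slot";

const SHARED_TOKENS: Record<string, Layer> = {
  "--bg": "A1-identity",
  "--accent": "A1-identity",
  "--font-display": "A1-identity",
  "--font-body": "A1-identity",
  "--motion-fast": "A2",
  "--font-mono": "A2",
  "--fg-2": "B-slot",
  "--meta": "B-slot",
};

// Illustrative allowlists for C-extension tokens.
const BRAND_EXTENSIONS: Record<string, readonly string[]> = {
  kami: ["--leading-display", "--accent-light"],
};
const BRAND_EXTENSION_PREFIXES: readonly string[] = ["--tag-bg-"];

// Custom-property names declared (not merely referenced via var()) in a :root body.
function declaredTokens(rootBody: string): Set<string> {
  const names = new Set<string>();
  const re = /(--[\w-]+)\s*:/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(rootBody)) !== null) {
    if (m[1] !== undefined) names.add(m[1]);
  }
  return names;
}

// Checks 2/3: schema tokens of the given layers that the brand does not declare.
function missingTokens(declared: Set<string>, layers: readonly Layer[]): string[] {
  return Object.entries(SHARED_TOKENS)
    .filter(([name, layer]) => layers.includes(layer) && !declared.has(name))
    .map(([name]) => name);
}

// Check 4: declared tokens that are neither shared nor allowlisted for the brand.
function unknownTokens(brand: string, declared: Set<string>): string[] {
  const allowed = new Set(BRAND_EXTENSIONS[brand] ?? []);
  return Array.from(declared).filter(
    (name) =>
      !(name in SHARED_TOKENS) &&
      !allowed.has(name) &&
      !BRAND_EXTENSION_PREFIXES.some((prefix) => name.startsWith(prefix)),
  );
}
```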

Verified on default + kami:
  - 26 A1 tokens declared in both brands
  - 26 A2 tokens declared in both brands
  - 129 total declarations, all match shared schema or brand extensions
  - defaults.css ↔ tokens.schema.ts parity holds
  - sanity test: drifting --motion-fast in defaults.css fails check 5
    with a clear divergence message

The PR description originally listed "Dedicated SCHEMA.md" as
explicitly NOT in this PR ("Once 3+ brands ship, extracting a single
source of truth becomes worthwhile"). That boundary moves: lefarcen's
review surfaced the schema-generalization risk, and the schema must
exist as a machine-enforced contract before the derive script can
read it. The TS file replaces the markdown that was deferred.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web/tests): pass missing designTemplates prop to ProjectView

Pre-existing typecheck regression on main: PR #955 (b5eb8c16,
"generic skills + split skills/design-templates + finalize-design
API") added required `designTemplates: SkillSummary[]` to ProjectView
Props but updated only two of the three test fixtures that render
ProjectView directly. The third — ProjectView.api-empty-response.test.tsx
— was missed, so `pnpm typecheck` (and CI on any PR merging into
main) fails on:

  apps/web/tests/components/ProjectView.api-empty-response.test.tsx
    (168,6): error TS2741: Property 'designTemplates' is missing in
    type ...

The other two ProjectView tests already pass `designTemplates={[]}`,
so this aligns this fixture with the existing pattern. Out of scope
for #1231 strictly, but the regression blocks the merged-state
typecheck CI runs that #1231 triggers, and the one-line fix here
restores main's typecheck health for everyone.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(design-systems): enforce B-slot required tokens in pnpm guard

Closes mrcfps + lefarcen review comment thread on #1231:

> The guard validates A2 required tokens here, but there's no
> sibling check for B-slot aliases (--fg-2, --meta, --surface-warm,
> --border-soft). Per the schema docs, every brand must declare
> A1 + A2 + B-slot names so shared components can safely read
> var(--fg-2) etc. Without a B-slot guard, a brand can omit those
> aliases, pass pnpm guard, and break any artifact that references
> them.

Same artifact-paste constraint as A2: agents render artifacts by
pasting one brand's :root block into a single <style>; there is no
runtime cascade, so a missing B-slot makes any var(--fg-2) reference
resolve to nothing. Until now the schema narrative claimed B-slots
were optional with a var() default, but no machine check enforced
declaration — a contract gap reviewers reasonably refused to merge.

This commit closes the gap in three places so machine and narrative
agree:

1. scripts/check-tokens-fixture-sync.ts
   - Add checkDesignSystemBSlotRequiredTokens, mirroring the A2
     check but using getBSlotNames() from the schema.
   - Failure message names each missing slot AND the schema-suggested
     alias (--fg-2 (default alias: var(--fg))) so a brand author
     fixing the failure has a copy-pasteable resolution.
   - Renumber section comments: 5 checks → 6 checks.

2. scripts/guard.ts
   - Register the new check between A2 required and unknown
     allowlist so failures attribute to a specific contract.

3. design-systems/_schema/AGENTS.md
   - Update the layer table: B-slot row's "If omitted" column
     changes from "resolves via var() to a richer sibling" to
     "guard fails — brand must declare, either as var(--sibling)
     (collapsed) or independent value (richer)".
   - Add a "Why B-slot is required (and what the alias is for)"
     section that distinguishes the schema-suggested alias from a
     runtime fallback, with worked examples for default (alias) and
     kami (independent bind).
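
A hypothetical sketch of the new B-slot check and its per-token alias hint. `BSlot`, `checkBSlotRequiredTokens`, and the `defaultAlias` field are illustrative stand-ins for whatever getBSlotNames() actually returns; only the message shape is taken from the sanity-test output in this commit.

```typescript
// Hypothetical shape of one schema B-slot entry.
interface BSlot {
  name: string; // e.g. "--fg-2"
  defaultAlias: string; // schema-suggested alias, e.g. "var(--fg)"
}

/** Returns null when every B-slot is declared, otherwise a copy-pasteable failure message. */
function checkBSlotRequiredTokens(
  brand: string,
  tokensPath: string,
  declared: ReadonlySet<string>,
  slots: readonly BSlot[],
): string | null {
  const missing = slots.filter((slot) => !declared.has(slot.name));
  if (missing.length === 0) return null;
  const hints = missing.map((slot) => `  ${slot.name} (default alias: ${slot.defaultAlias})`);
  return (
    `[${brand}] ${tokensPath} is missing ${missing.length} B-slot tokens ` +
    `(alias the named sibling via var(...) or bind independently):\n` +
    hints.join(',\n')
  );
}
```

Either form of declaration (an alias such as `var(--fg)` or an independent value) puts the name in `declared`, which is why both default and kami satisfy the contract.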

Verified on default + kami:
- pnpm guard passes all 6 design-system checks
- 4 B-slot tokens declared in both brands (default aliases via var(),
  kami binds independently — both forms satisfy the contract)
- pnpm typecheck clean across the workspace
- Sanity test: removing --fg-2 + --meta from default/tokens.css fires
  the new guard with a precise per-token alias hint:
    [default] design-systems/default/tokens.css is missing 2 B-slot
    tokens (alias the named sibling via var(...) or bind
    independently):
      --fg-2 (default alias: var(--fg)),
      --meta (default alias: var(--muted))

The schema contract is now machine-enforced end-to-end (A1 + A2 +
B-slot all required-with-fixed-form-of-fallback). The derive script
in PR-B can rely on every brand's tokens.css containing every shared
slot name.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(e2e): skip leading-underscore meta-directories under design-systems/

CI for #1231 went red on `Validate workspace` after merging origin/main.
Cause is a clean collision between two recently-landed changes:

- main #1270 (be77dc03 "Default English resource i18n fallback")
  tightened tests/localized-content.test.ts so every directory under
  design-systems/ is run through assertResourceId() with the strict
  RESOURCE_ID_PATTERN /^[a-z0-9][a-z0-9-]*$/.

- this branch #1231 introduced design-systems/_schema/ as the home
  of the shared token contract (tokens.schema.ts, defaults.css,
  AGENTS.md). The leading underscore signals "meta-directory, not
  brand"; SCSS partials, Jekyll, and Hugo use the same convention.

The two changes never met until CI built the merge commit, where
assertResourceId('_schema') deterministically failed:

  Error: Design system directory _schema has malformed resource id: _schema
    at invariant tests/localized-content.test.ts:66:11
    at assertResourceId tests/localized-content.test.ts:71:3
    at readDesignSystemResources tests/localized-content.test.ts:202:8

Fix tightens readDesignSystemResources's directory filter so the
leading-underscore convention is recognised explicitly:

    .filter((entry) => entry.isDirectory() && !entry.name.startsWith('_'))

This aligns with what apps/daemon/src/design-systems.ts:listDesignSystems
already does implicitly — it requires DESIGN.md per directory, so
_schema/ was always invisible at runtime; the test was the only place
that surfaced it.
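
A minimal sketch of the fixed filter combined with the resource-id assertion. To keep it testable without a disk, it takes a structural subset of `fs.Dirent`; `DirEntryLike` and `brandIdsFrom` are hypothetical names, while the pattern and the underscore filter come from this commit.

```typescript
// Pattern the test applies to every brand directory name.
const RESOURCE_ID_PATTERN = /^[a-z0-9][a-z0-9-]*$/;

// Structural subset of fs.Dirent, so readdirSync(dir, { withFileTypes: true }) output fits.
interface DirEntryLike {
  name: string;
  isDirectory(): boolean;
}

function assertResourceId(name: string): void {
  if (!RESOURCE_ID_PATTERN.test(name)) {
    throw new Error(`Design system directory ${name} has malformed resource id: ${name}`);
  }
}

function brandIdsFrom(entries: readonly DirEntryLike[]): string[] {
  return entries
    // A leading underscore marks a meta-directory (_schema/, _archive/, _drafts/), not a brand.
    .filter((entry) => entry.isDirectory() && !entry.name.startsWith('_'))
    .map((entry) => {
      assertResourceId(entry.name);
      return entry.name;
    });
}
```

Filtering before asserting is the point: `_schema` never reaches `assertResourceId`, while a genuinely malformed brand name still fails loudly.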

Verified locally on the post-merge tree:
- pnpm test (e2e vitest) — tests/localized-content.test.ts: 4 passed
- pnpm guard — all 6 design-system checks pass on default + kami
- pnpm typecheck — clean across the workspace (after pnpm install
  to pull deps for tools/pr that arrived with main)

The fix is intentionally narrow (one filter line in one test) and
documents the convention inline so future meta-directories under
design-systems/ (e.g. _archive/, _drafts/) are covered for free.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: chaoxiaoche <chaoxiaoche@192.168.10.16>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 22:23:34 +08:00
nettee
87a95b7fb4
Fix conversation run isolation (#1271) 2026-05-11 21:13:54 +08:00
Kaelz31
3524a43d18
fix: pretty-print JSON file previews (#1206)
* fix: pretty-print JSON file previews

* fix: avoid formatting JSON with unsafe numbers

* fix: preserve precision-sensitive JSON previews

* fix: preserve signed zero in JSON previews

* fix: scan JSON numbers without repeated slicing
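
The titles above only name the constraints. One way to honor them, sketched here as an assumption rather than the shipped code, is to pretty-print only when every number literal survives a `JSON.parse` / `JSON.stringify` round trip byte-for-byte, and otherwise show the raw text. The scanner walks the text once with a sticky regex instead of slicing substrings. `numbersRoundTrip` and `formatJsonPreview` are hypothetical names.

```typescript
// JSON number grammar; the sticky flag anchors matching at lastIndex.
const NUMBER_RE = /-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?/y;

/** True when every number literal outside strings re-serializes to the same text. */
function numbersRoundTrip(text: string): boolean {
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    if (ch === '"') {
      // Skip a string literal, stepping over backslash escapes.
      i++;
      while (i < text.length && text[i] !== '"') i += text[i] === '\\' ? 2 : 1;
      i++; // closing quote
      continue;
    }
    if (ch === '-' || (ch >= '0' && ch <= '9')) {
      NUMBER_RE.lastIndex = i;
      const m = NUMBER_RE.exec(text);
      if (!m) return false;
      // Catches unsafe integers, -0, overflow to Infinity, and also 1e2 or 1.0 (conservative).
      if (JSON.stringify(Number(m[0])) !== m[0]) return false;
      i = NUMBER_RE.lastIndex;
      continue;
    }
    i++;
  }
  return true;
}

function formatJsonPreview(text: string): string {
  try {
    const value: unknown = JSON.parse(text);
    return numbersRoundTrip(text) ? JSON.stringify(value, null, 2) : text;
  } catch {
    return text; // not valid JSON: show as-is
  }
}
```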

---------

Co-authored-by: Kael S <YOUR_GITHUB_EMAIL_HERE>
2026-05-11 20:52:55 +08:00
eggward han
a0316d2599
fix(web): suppress autosave indicator for draft-only Connector key edits (#1232)
When the user typed a replacement Composio API key, the global Settings
autosave loop persisted `buildPersistedConfig(cfg)` — which intentionally
strips the in-flight secret — and then advanced the indicator through
'saving' -> 'saved' despite the key never actually being written. The
"All changes saved" status then contradicted the section-local "Save key"
gesture and eroded trust in the saved-state badge for a sensitive field.

The autosave effect now tracks the snapshot at the last successful save
(or the initial cfg on mount) and compares the next snapshot's persisted
shape against it via a new `isAutosaveDraftOnlyChange` helper. When the
only diffs since last save are fields that `buildPersistedConfig` strips
(today the Composio API key, generalizing to any future
save-on-explicit-confirm secret), the persist call is skipped and the
indicator settles to 'idle' instead of flashing 'saved'. The forced
media-provider sync path still runs because that is a real outbound
effect even when the persisted shape hasn't changed.
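
A minimal sketch of the draft-only comparison, assuming a two-field config and JSON.stringify with stable key order as the equality check. `SettingsConfig` and both function bodies are hypothetical; only the names `buildPersistedConfig` and `isAutosaveDraftOnlyChange` come from the commit.

```typescript
// Hypothetical config shape; the real settings config has many more fields.
interface SettingsConfig {
  theme: string;
  composioApiKey?: string;
}

// Strips save-on-explicit-confirm secrets, as buildPersistedConfig is described to do.
function buildPersistedConfig(cfg: SettingsConfig): Omit<SettingsConfig, 'composioApiKey'> {
  const { composioApiKey: _secret, ...persisted } = cfg;
  return persisted;
}

// True when `next` differs from the last saved snapshot ONLY in stripped fields,
// so autosave should skip the persist call and settle to 'idle'.
function isAutosaveDraftOnlyChange(lastSaved: SettingsConfig, next: SettingsConfig): boolean {
  const rawChanged = JSON.stringify(lastSaved) !== JSON.stringify(next);
  const persistedSame =
    JSON.stringify(buildPersistedConfig(lastSaved)) === JSON.stringify(buildPersistedConfig(next));
  return rawChanged && persistedSame;
}
```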

Refs #1187
2026-05-11 20:52:45 +08:00
donglrd
19f1ff7995
Reject filesystem root folder imports (#1266) 2026-05-11 20:52:35 +08:00
이용진
bbd14bd6fb
replace time-specific Orbit greetings with neutral defaults (#1291)
* replace time-specific Orbit greetings with neutral defaults

Issue: the Orbit default greeting is hard-coded to a morning-specific phrase, which is not suitable as generic copy.

* fix skill trigger mistake 每日简报 ("daily briefing") -> 早安简报 ("good-morning briefing")
2026-05-11 20:52:24 +08:00
PerishFire
c3d41c7d45
fix(tools-pr): chunk stats fetch through cursor-paginated GraphQL (#1285)
`fetchOpenPrs` was reading the stats chunk via
`gh pr list --limit 1000 --json mergeStateStatus,...`. With the default
limit raised to 1000 in #1259, this 502s reliably on the live open
queue (107 PRs): GitHub's GraphQL gateway has to recompute
mergeStateStatus for every PR up front, and the resulting query exceeds
the gateway budget once the requested page passes ~60 PRs.

Switch the stats chunk to `fetchPaginatedPrList`, the same cursor-
paginated GraphQL helper that already drives reviews / comments /
commits / assignment-timelines. Page size stays at PR_LIST_PAGE_SIZE
(30), well within the gateway budget, and the heavy stats fetch is now
consistent with the other heavy chunks.
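
A generic sketch of cursor pagination in the spirit of `fetchPaginatedPrList`. The `fetchPage` callback hides the actual `gh` / GraphQL request, and `Page` and `fetchAllPages` are hypothetical names; the `pageInfo { hasNextPage endCursor }` shape follows GitHub's GraphQL connection convention.

```typescript
// One page of a GraphQL connection.
interface Page<T> {
  nodes: T[];
  pageInfo: { hasNextPage: boolean; endCursor: string | null };
}

const PR_LIST_PAGE_SIZE = 30;

// Requests small pages in sequence so no single query recomputes stats for the whole queue.
async function fetchAllPages<T>(
  fetchPage: (first: number, after: string | null) => Promise<Page<T>>,
): Promise<T[]> {
  const all: T[] = [];
  let after: string | null = null;
  for (;;) {
    const page: Page<T> = await fetchPage(PR_LIST_PAGE_SIZE, after);
    all.push(...page.nodes);
    if (!page.pageInfo.hasNextPage || page.pageInfo.endCursor === null) return all;
    after = page.pageInfo.endCursor;
  }
}
```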

Verified locally: `pnpm tools-pr list` now completes against the live
107-PR queue without a 502.
2026-05-11 20:51:29 +08:00
nettee
be77dc0394
Default English resource i18n fallback (#1270) 2026-05-11 20:29:05 +08:00
Joey-nexu
12ac2e988e
docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290)
Adds a public set of rules for the External Maintainer role: who qualifies,
how nominations work, what permissions Maintainers gain, what's expected
of them, and how step-down works.

The Core Team's individual roster is intentionally not enumerated. What's
public is the rules everyone plays by.

- New file: MAINTAINERS.md (English authoritative version) + 5 locale variants
  matching the existing CONTRIBUTING.md i18n surface (de, fr, ja-JP, pt-BR,
  zh-CN). Non-EN/non-zh-CN variants are machine-translated drafts marked at
  the top — native-speaker review is welcome via follow-up PRs.
- CONTRIBUTING.md (and its 5 locale variants): adds a short
  Becoming a Maintainer section that points at MAINTAINERS.md, so the
  rules live in one place and translation drift is bounded.

Decisions not in this PR (intentional):
- No internal Core Team roster.
- No internal observability dashboards.
- No nomination PR / public voting flow (Core-Team-consensus-driven for now;
  to be revisited once External Maintainers exceed 5).

Co-authored-by: Joey Li <lijinwei@open-design.ai>
2026-05-11 20:19:55 +08:00
Caprika
fb079d8115
Add reliable agent-browser skill (#1284)
* Add reliable agent browser skill

* Fix ProjectView delete conversation test props
2026-05-11 20:09:12 +08:00
PerishFire
1eb20e3807
fix(web): keep tweaks selection usable without annotations (#1268) 2026-05-11 20:06:49 +08:00
Sebastian Westberg
8962088c75
feat(daemon): guard against agent-emitted stub artifact regressions (#1171)
* feat(daemon): guard against agent-emitted stub artifact regressions

When an agent emits an <artifact> block whose body is a placeholder
("see other-file.html in this project", a bare filename string, a tiny
fallback page) instead of the full document, the daemon writes the
placeholder to disk verbatim. Users see a 25-500 byte HTML file where
their previous version had tens of kilobytes of real markup.

Add a structural regression guard in writeProjectFile: before writing
an html/deck artifact whose manifest carries metadata.identifier, scan
the project dir for prior siblings matching <identifier>(-\d+)?\.html?
and compare sizes. If the new body is below minRetainedRatio (default
0.2) of the largest prior sibling >= minPriorBytes (default 4096),
flag a regression. Three modes via env:

- OD_ARTIFACT_STUB_GUARD=warn (default) writes the file and attaches
  stubGuardWarning to the response so the frontend can surface it.
- OD_ARTIFACT_STUB_GUARD=reject throws ArtifactRegressionError before
  fs.writeFile; the route returns 422 ARTIFACT_REGRESSION with the
  prior sibling's name and size in error.details.
- OD_ARTIFACT_STUB_GUARD=off skips the guard entirely.
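
A minimal sketch of the size check and the three modes, assuming prior sibling sizes are already collected. `PriorSibling`, `readStubGuardMode`, and `evaluateStubGuard` are hypothetical names; the thresholds (0.2 and 4096) and `ArtifactRegressionError` come from the description above.

```typescript
type StubGuardMode = 'warn' | 'reject' | 'off';

interface PriorSibling {
  name: string;
  size: number; // bytes on disk
}

// Thrown in reject mode before any write, so nothing partial reaches disk.
class ArtifactRegressionError extends Error {
  readonly prior: PriorSibling;
  readonly newSize: number;
  constructor(prior: PriorSibling, newSize: number) {
    super(`artifact shrank to ${newSize} bytes; prior ${prior.name} was ${prior.size} bytes`);
    this.name = 'ArtifactRegressionError';
    this.prior = prior;
    this.newSize = newSize;
  }
}

function readStubGuardMode(env: Record<string, string | undefined>): StubGuardMode {
  const raw = env.OD_ARTIFACT_STUB_GUARD;
  return raw === 'reject' || raw === 'off' ? raw : 'warn'; // warn is the default
}

/** Returns a warning in warn mode, throws in reject mode, null when there is no regression. */
function evaluateStubGuard(
  newSize: number,
  priors: readonly PriorSibling[],
  mode: StubGuardMode,
  minRetainedRatio = 0.2,
  minPriorBytes = 4096,
): string | null {
  if (mode === 'off') return null;
  const eligible = priors.filter((p) => p.size >= minPriorBytes);
  if (eligible.length === 0) return null;
  const largest = eligible.reduce((a, b) => (b.size > a.size ? b : a));
  if (newSize >= largest.size * minRetainedRatio) return null;
  if (mode === 'reject') throw new ArtifactRegressionError(largest, newSize);
  return `new body is ${newSize} bytes; ${largest.name} was ${largest.size} bytes`;
}
```

Comparing against the largest eligible prior, rather than the most recent one, means a chain of slightly smaller stubs cannot ratchet the baseline down.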

Cross-agent by design: anchored on size delta + identifier match,
no agent-specific stub-phrase regex, so works for any agent backend
behind the agent-adapter abstraction.

The body-then-manifest write order pre-dates this change; the reject
path throws before fs.writeFile so rejections never leave a partial
state behind.

24 unit + 8 HTTP tests cover happy paths, all three modes, deck kind,
.htm extension sibling detection, ratio=1 edge case, and verify
rejected writes leave neither the html nor its manifest sidecar on
disk.

* fix(stub-guard): close same-name, nested-dir, and non-slug bypasses

Code review on PR #1171 (lefarcen, Codex, mrcfps) found three holes
where the stub guard could be silently bypassed. All three are now
closed with HTTP test coverage.

Same-name overwrite (lefarcen P1): the writer's prior-sibling scan
deliberately skipped the file at safeName, but for an in-session
overwrite (persistArtifact reuses the same fileName when
savedArtifactRef.current matches) that file is the prior content,
not the new entry. Drop the exclude-by-name filter; the current
on-disk size at scan time is always the prior because the overwrite
happens after this check.

Subdirectory scoping (Codex/mrcfps P2): writeProjectFile creates
parent directories for nested paths like reports/overview.html, but
the guard only scanned the project root. Pass path.dirname(target)
as scanDir so nested artifacts are evaluated against their real
sibling set.

Non-slug identifier (Codex/lefarcen/mrcfps P2): the web's
persistArtifact slugifies the filename basename but stores the raw
identifier in the manifest, so an identifier like "Landing Page"
yields filename landing-page.html with metadata.identifier="Landing
Page". Build the sibling regex from both the raw identifier and a
slugified variant (mirroring the frontend's slugifier) so either
form matches the same priors.
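
A sketch of the dual-form sibling pattern. The slugifier here is a hypothetical approximation of the web's (lowercase, collapse non-alphanumeric runs to `-`, trim dashes, cap at 60 characters); `siblingPattern` is an illustrative name.

```typescript
// Hypothetical approximation of the frontend slugifier.
function slugifyArtifactIdentifier(identifier: string): string {
  return identifier
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '')
    .slice(0, 60);
}

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Matches <base>(-N)?.html or .htm for the raw identifier OR its slug,
// so "Landing Page" also finds landing-page.html and landing-page-2.html.
function siblingPattern(identifier: string): RegExp {
  const bases = new Set(
    [identifier, slugifyArtifactIdentifier(identifier)].filter((base) => base.length > 0),
  );
  const alternation = [...bases].map(escapeRegExp).join('|');
  return new RegExp(`^(?:${alternation})(?:-\\d+)?\\.html?$`);
}
```

Escaping matters because raw identifiers are user-controlled: without it, `a.b` would also match `aXb.html`.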

Also surface warn-mode warnings in the web UI: ProjectView now
checks file.stubGuardWarning after writeProjectTextFile and renders
the warning via setError. Reject-mode 422 surfacing requires
restructuring writeProjectTextFile's return contract and is
deferred.

API change inside the daemon: evaluateArtifactStubGuard /
findPriorArtifactSiblings drop excludeSafeName and rename projectDir
to scanDir. Tests updated.

Tests: 4 new HTTP cases (same-name overwrite preserves prior body,
nested subdir rejects, slug-form match rejects, plus the existing
warn/off/deck/.htm cases) and 1 new unit case (slug-form sibling
match). 44 tests pass.

* fix(stub-guard): empty-slug fallback + reject-mode UI surface

Round 3 review on PR #1171 (lefarcen, mrcfps) found two remaining
holes after 9cc82430 closed the same-name / subdir / non-slug bypasses.

Empty-slug fallback bypass (lefarcen P2): an identifier like "测试"
(all-non-ASCII) strips to empty through the web slugifier, and
persistArtifact's `slice(0,60) || 'artifact'` falls back to the
literal "artifact" basename. The guard searched for raw identifier +
slug only, so a later artifact-2.html stub bypassed the prior. Add
EMPTY_SLUG_FALLBACK_NAME = 'artifact' as a sibling-name candidate
when the slug is empty, mirroring the frontend fallback exactly.

Reject-mode UI silence (mrcfps P2 + lefarcen P2): writeProjectTextFile
collapses any non-OK response (including 422 ARTIFACT_REGRESSION) to
null, and persistArtifact previously had no else branch. Users in
reject mode saw the daemon log fire but the UI was silent. Add an
else branch that surfaces a generic banner pointing at the most
likely cause and mentions checking the daemon logs for structured
details. Also clear savedArtifactRef.current on failure so retries
re-enter the persistence path.

Plumbing the structured 422 details through writeProjectTextFile
itself remains out of scope (cross-cutting client contract change
affecting 5+ call sites). The generic banner is the "at minimum"
path mrcfps suggested.

Tests: 1 new unit case (artifact.html sibling discovery for non-ASCII
identifier) + 1 new HTTP case (empty-slug stub regression rejected
end-to-end). 46 tests pass across stub-guard suites (was 44).

* fix(stub-guard): verify sidecar identity to avoid cross-identifier false positives

Round 4 review on PR #1171 (mrcfps inline + lefarcen review) caught
a false-positive introduced by the round-3 empty-slug fallback. Two
distinct identifiers that both slugify to empty (e.g. "测试" and
"首页") share the artifact*.html basename, so a brand-new save under
the second identifier was being compared against — and falsely
rejected because of — the unrelated first.

The same shape exists symmetrically: a non-empty-slug identifier
literally named "artifact" would falsely match empty-slug fallback
files written under any other identifier.

Fix: filename pattern matching is now a candidate generator, not
the source of truth. For every candidate sibling, read its
.artifact.json sidecar and verify metadata.identifier matches the
input via artifactIdentifiersMatch (raw equality OR shared non-empty
slug). Files without a sidecar are skipped — they weren't written
through the artifact-tag path this guard targets, and treating them
as priors was always a stretch.

Empty-slug equivalence is intentionally NOT honored: 测试 != 首页
even though both slugify to empty. The whole bug was conflating
distinct identifiers via the fallback name; slug-equivalence kicks
in only for non-empty slugs (Landing Page <-> landing-page).

Tests: unit fixtures now write file+sidecar pairs (mirrors prod);
new artifactIdentifiersMatch suite covers the 5 equivalence cases;
new HTTP test "does NOT cross-reject distinct empty-slug identifiers"
asserts the second save returns 200 instead of 422; a new unit test
verifies that files without a sidecar are skipped.

42 tests pass across stub-guard suites.

* fix(stub-guard): require canonical-form anchor in identifier match to avoid 60-char truncation collisions

Round 5 review on PR #1171 (mrcfps) caught another false-positive in
artifactIdentifiersMatch: slugifyArtifactIdentifier truncates at 60
chars, so two distinct >60-char identifiers that share their first
60 chars (e.g. "A...A1" and "A...A2", 70 chars each) slugify to the
same string and would falsely bridge. Same shape as the empty-slug
fallback bug from round 4, just at the other end of the input range.

Tighten the rule: slug-equivalence requires at least one input to BE
its own canonical slug form. That keeps the legitimate bridge
("Landing Page" <-> "landing-page" — second input IS the slug) but
rejects truncation collisions ("A...A1" <-> "A...A2" — neither is in
canonical form).

Side effect: two non-canonical forms that slugify to the same value
no longer bridge (e.g. "Landing Page" vs "LANDING-PAGE"). This is
correct: without one canonical anchor we can't safely call them the
same lineage. Updated the slug-equivalence test to assert the new
semantics explicitly with both directions and a negative case.
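
The round-4 and round-5 rules combined, as a sketch: raw equality, or a shared non-empty slug where at least one input is already its own canonical slug. The slugifier is the same hypothetical approximation of the web's (60-character cap included); `artifactIdentifiersMatch` is the name from the commit, but this body is illustrative.

```typescript
// Hypothetical approximation of the frontend slugifier (truncates at 60 chars).
function slugifyArtifactIdentifier(identifier: string): string {
  return identifier
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '')
    .slice(0, 60);
}

function artifactIdentifiersMatch(a: string, b: string): boolean {
  if (a === b) return true;
  const slugA = slugifyArtifactIdentifier(a);
  const slugB = slugifyArtifactIdentifier(b);
  // Empty slugs never bridge: "测试" and "首页" both slugify to "".
  if (slugA === '' || slugA !== slugB) return false;
  // Canonical-form anchor: one side must already BE its slug, which rules out
  // two >60-char identifiers that only collide after truncation.
  return a === slugA || b === slugB;
}
```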

Tests: 2 new cases (no bridge for >60-char truncation collision; raw
70-char to its 60-char truncated slug still bridges) + 1 negative
test for the non-canonical-pair case. 45 tests pass.

* fix(stub-guard): cover legacy sidecar-less HTML priors

Round 6 review on PR #1171 (mrcfps, non-blocking) caught a real
legacy bypass: round 4's sidecar-required policy skipped any HTML
file without an .artifact.json companion, but readManifestForPath
(projects.ts) treats those same files as legitimate artifacts via
inferLegacyManifest. So a project with an older sidecar-less
dashboard.html (pre-sidecar era, Write-tool-emitted, paste-text,
manual import, etc.) let its first stub rewrite through as a
supposed "first emission".

Fix: when the sidecar is missing, derive a synthetic identifier
from the filename (strip the (-N)?\.html? suffix) and run it
through the same artifactIdentifiersMatch rules. Synthetic
identifiers come from already-slugified filenames, so they bridge
raw inputs only via the canonical-form rule established in round
5 — no truncation collisions, no empty-slug conflation, no
unrelated cross-identifier matches.

Tests: 3 new unit cases (legacy fallback finds the prior; bridges
raw->slug under the same rules; does NOT bridge unrelated slug
forms via inference) + 1 new HTTP test that seeds a sidecar-less
prior via the artifact-manifest-less write path and asserts the
stub rewrite is rejected with 422 ARTIFACT_REGRESSION.

48 tests pass across stub-guard suites (was 45).

* fix(stub-guard): try both interpretations for legacy filename inference

Round 7 review on PR #1171 (mrcfps, non-blocking) caught a real
ambiguity in the round-6 legacy fallback: a filename like
`phase-2.html` is genuinely ambiguous without a sidecar. It could
be the identifier "phase" with a -2 collision suffix, OR the
standalone identifier "phase-2". The round-6 helper only stripped
the suffix, so a sidecar-less `phase-2.html` followed by a stub
emission with metadata.identifier="phase-2" bypassed the guard
("phase-2" doesn't match the inferred "phase").

Fix: when the sidecar is missing, generate both candidate
identifiers (full basename and suffix-stripped basename) and
accept the file as a prior if either matches. Visible false
positives are preferable to silent false negatives — and the
canonical-form anchor in artifactIdentifiersMatch still rules out
truncation collisions and empty-slug conflations regardless of
which candidate matched.
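
A minimal sketch of the two-interpretation inference for sidecar-less files. `legacyIdentifierCandidates` and `isLegacyPrior` are hypothetical names; the `matches` parameter stands in for the real artifactIdentifiersMatch rules so the block stays self-contained.

```typescript
// "phase-2.html" could be identifier "phase-2", or identifier "phase" with a -2 collision
// suffix, so both readings become candidates when no sidecar disambiguates.
function legacyIdentifierCandidates(fileName: string): string[] {
  const base = fileName.replace(/\.html?$/, '');
  const stripped = base.replace(/-\d+$/, '');
  return stripped !== base && stripped.length > 0 ? [base, stripped] : [base];
}

// The file counts as a prior if EITHER candidate matches the incoming identifier.
function isLegacyPrior(
  fileName: string,
  identifier: string,
  matches: (a: string, b: string) => boolean,
): boolean {
  return legacyIdentifierCandidates(fileName).some((candidate) => matches(candidate, identifier));
}
```

Usage with plain equality as the matcher: `isLegacyPrior('phase-2.html', 'phase-2', (a, b) => a === b)` is true, and so is the same call with `'phase'`.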

Tests: 2 new unit cases (full-basename interpretation finds
"phase-2"; suffix-stripped interpretation also finds "phase") and
1 new HTTP test that seeds a sidecar-less `phase-2.html` and
asserts the stub rewrite is rejected with 422 ARTIFACT_REGRESSION.

51 tests pass across stub-guard suites (was 48).

---------

Co-authored-by: Sebastian Westberg <sebastianwestberg@users.noreply.github.com>
2026-05-11 19:59:37 +08:00
初晨
0f0d214298
fix(web): render static previews for sketch json files (#1060)
* fix(web): render static previews for sketch json files

* fix(web): tolerate malformed sketch text items

* fix(web): harden sketch preview parsing

* fix(web): preserve sketch items on round-trip

* fix(web): clear sketch files destructively

* fix(web): unblock unsupported sketch saves
2026-05-11 19:29:46 +08:00
Dongsen
fd67b680d7
fix(contracts): pin API-mode override above discovery layer (#313) (#1207)
* fix(contracts): pin API-mode override above discovery layer (#313)

The old streamFormat='plain' rule was appended at the BOTTOM of the
composed prompt, but DISCOVERY_AND_PHILOSOPHY is pinned at the TOP with
its own 'these override anything later' header — so its hard rules
('TodoWrite on turn 3', 'brand-spec extraction via Bash + Read +
WebFetch') still won precedence in API mode. With no real tools wired
through to the Anthropic Messages path, the agent narrated pseudo-tool
markup (<todo-list>...</todo-list>, [读取 X]) instead of emitting
structured tool_use events the UI could render.

Move the API-mode override to the absolute top of the prompt so it
beats the discovery layer, name every unavailable tool, and explicitly
forbid the pseudo-tool / fake-protocol markup observed in #313.
<artifact> output and <question-form> discovery are still allowed —
both are markup the UI parses, not tool calls.
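
A minimal sketch of the precedence fix, assuming the prompt is assembled by joining layer strings in order. The override text, the layer contents, and this `composeSystemPrompt` signature are hypothetical; the test uses an indexOf-based positional check.

```typescript
// Hypothetical override text; only its position in the prompt matters here.
const API_MODE_OVERRIDE = [
  '## API mode override (takes precedence over every section below)',
  'No tools are available: do not call TodoWrite, Bash, Read, or WebFetch,',
  'and do not write pseudo-tool markup such as <todo-list>...</todo-list>.',
  '<artifact> output and <question-form> discovery are still allowed.',
].join('\n');

const DISCOVERY_AND_PHILOSOPHY = '## Discovery and philosophy (these override anything later)';

function composeSystemPrompt(layers: readonly string[], streamFormat: string): string {
  // Plain-stream adapters get the override first, ahead of the discovery layer,
  // so the discovery header's "override anything later" cannot outrank it.
  const sections = streamFormat === 'plain' ? [API_MODE_OVERRIDE, ...layers] : [...layers];
  return sections.join('\n\n');
}
```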

* fix(daemon): mirror API-mode override above discovery layer (#313)

Address Codex + mrcfps review on #1207: the daemon has its own copy of
composeSystemPrompt that is hit by any adapter declaring streamFormat:
'plain' (e.g. DeepSeek) via server.ts:6190. That copy still appended
the obsolete bottom '## API mode rule', which loses the precedence war
against DISCOVERY_AND_PHILOSOPHY's 'these override anything later'
header — so plain-stream daemon agents could still narrate
<todo-list> / [读取 X] pseudo-tool markup.

Mirror the same top-anchored API_MODE_OVERRIDE here (byte-identical to
the contracts copy) so both code paths produce the same behaviour.
Adds 8 daemon-side tests including the indexOf-based positional
assertion that pins the override above the discovery layer header.
2026-05-11 19:29:34 +08:00
Dongsen
12ce5ad38b
fix(web): ignore <artifact> tags inside markdown code spans and fences (#1132)
* test(web): add failing parser cases for <artifact> recitation in markdown code

Cover the three real-world prose contexts where the model legitimately
quotes the artifact tag without intending to emit one:

- inside an inline backtick span
- inside a fenced code block
- spread across streaming chunks crossing the fence boundary

Establishes the RED baseline before parser code-fence awareness lands.

* fix(web): ignore <artifact> tags inside markdown code spans and fences

The streaming artifact parser scanned the buffer with a raw indexOf,
guarded only by 'next char must be whitespace'. That meant any literal
<artifact ...> the model recited while documenting the protocol — even
inside backticks or a ```html fence — flipped the parser into artifact
mode, swallowed the rest of the reply from the chat UI, and (when a
matching </artifact> appeared in the recitation) silently wrote a
spurious file to disk via persistArtifact.

Replace findOpenTag with a linear scan that tracks fenced code blocks
(```) and inline code spans (`), skipping any <artifact prefix found
inside either. If the buffer ends mid-fence, return a partial match
anchored at the fence start so the next streaming chunk can resolve
the boundary without losing fence context.

Closes #1130.

* fix(web): match renderer fence/inline-code rules in artifact parser

Codex review on PR #1132 caught that the previous fix toggled inFence on
any triple-backtick run anywhere in the buffer, including mid-line, while
the chat renderer (apps/web/src/runtime/markdown.tsx) only treats ``` as
a fence when it occupies a whole line matching /^[ ]{0,3}```(\w[\w+-]*)?\s*$/.
That asymmetry would suppress a real <artifact> tag emitted after a prose
sentence like "the opening marker is ```html and the response then writes:".

Rework findOpenTag in three passes that mirror the renderer:

  1. Walk \n-terminated lines; only a line that matches FENCE_LINE_RE
     toggles fence state. Open fences without a close (or with an
     unterminated tail line) return partial so the next chunk can resolve.
  2. Collect inline code spans with /`[^`]+`/g — the same regex used by
     renderInline — so what the parser skips matches what the user sees as
     code. Unmatched trailing backticks after the last \n hold back.
  3. Find the first <artifact …> outside any skip range; preserve the
     existing partial-prefix tail handling.

Adds a regression test covering the exact case Codex reported.
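
A simplified, non-streaming sketch of the three passes, using the two regexes quoted above. It omits the partial-match hold-back for chunk boundaries (an unclosed fence simply skips to the end of the buffer) and does not yet scope inline spans per block; `computeSkipRanges` and `findOpenTag` reuse the commit's names, but these bodies are illustrative.

```typescript
const FENCE_LINE_RE = /^[ ]{0,3}```(\w[\w+-]*)?\s*$/;
const INLINE_CODE_RE = /`[^`]+`/g;

// Half-open [start, end) character ranges the parser must ignore.
type Range = [start: number, end: number];

function computeSkipRanges(buffer: string): Range[] {
  const ranges: Range[] = [];
  // Pass 1: only whole \n-terminated lines that match FENCE_LINE_RE toggle fence state.
  let fenceStart: number | null = null;
  let lineStart = 0;
  for (let nl = buffer.indexOf('\n'); nl !== -1; nl = buffer.indexOf('\n', lineStart)) {
    if (FENCE_LINE_RE.test(buffer.slice(lineStart, nl))) {
      if (fenceStart === null) {
        fenceStart = lineStart;
      } else {
        ranges.push([fenceStart, nl + 1]);
        fenceStart = null;
      }
    }
    lineStart = nl + 1;
  }
  if (fenceStart !== null) ranges.push([fenceStart, buffer.length]);

  // Pass 2: inline code spans (renderer regex), ignoring spans that start inside a fence.
  const fences = [...ranges];
  for (const m of buffer.matchAll(INLINE_CODE_RE)) {
    const start = m.index ?? 0;
    if (!fences.some(([s, e]) => start >= s && start < e)) {
      ranges.push([start, start + m[0].length]);
    }
  }
  return ranges;
}

// Pass 3: first "<artifact" followed by whitespace that lies outside every skip range.
function findOpenTag(buffer: string): number {
  const ranges = computeSkipRanges(buffer);
  const skipped = (i: number) => ranges.some(([s, e]) => i >= s && i < e);
  const prefix = '<artifact';
  for (let i = buffer.indexOf(prefix); i !== -1; i = buffer.indexOf(prefix, i + 1)) {
    if (/\s/.test(buffer.charAt(i + prefix.length)) && !skipped(i)) return i;
  }
  return -1;
}
```

The Codex case is the third test: a mid-line ```html does not open a fence, so the real tag on the next line is found.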

* test(web): pin parser behavior on double-backtick and in-fence string literal recitation

Two cases raised in PR #1132 review:

- a real artifact tag wrapped in '``<artifact …>``' (double-backtick
  inline code span) should not be treated as a real artifact
- a fenced JS example whose body contains a string literal like
  'const fence = "```";' should not pop fence state early and let a
  later literal <artifact> be parsed as real

Both already pass on 96e88ca because the line-anchored fence regex and
the renderer-aligned inline regex handle them correctly. Pinning the
behavior so future regressions surface as test failures.

* fix(web): make stripArtifact markdown-aware to stop truncating literal recitations

The streaming artifact parser was hardened in 96e88ca to skip <artifact>
recitations inside backticks and fences, but the post-stream stripper at
AssistantMessage.tsx still ran a naive 'content.indexOf("<artifact")' over
the same text events. As reported by lefarcen on PR #1132, that meant
chat replies with literal protocol recitations could still get silently
truncated mid-explanation — even though the parser preserved them in the
text stream and the file panel was no longer polluted with ghost files.

Extract the renderer-aligned classification (FENCE_LINE_RE, INLINE_CODE_RE,
computeSkipRanges, rangeContains) into a single source of truth at
apps/web/src/artifacts/markdown-context.ts so the parser and the stripper
agree on what counts as code. Add apps/web/src/artifacts/strip.ts with a
markdown-aware stripArtifact that:

- ignores any <artifact open inside a fenced block or inline code span
- looks for </artifact> with the same skip-range filter, so a real open
  paired with a literal close inside backticks does not strip a literal
  body that is meant to render
- returns content unchanged when an open exists with no matching real
  close (the previous implementation sliced to end-of-string, which would
  nuke trailing prose on a malformed or still-streaming tag)

Refactor parser.ts to import the shared helpers; behavior preserved (all
seven existing parser tests still pass). New strip.test.ts covers six
cases including the empirically-verified inline-backtick regression.

* fix(web): align artifact stripper/parser fence rules with renderer exactly

Two gaps surfaced in review at a0bf05f:

- markdown-context.ts used a single FENCE_LINE_RE that allowed 0-3 leading
  spaces and reused the same pattern for opening and closing fences. The
  chat renderer (runtime/markdown.tsx:44 and :49) is asymmetric — opens
  with /^```(\w[\w+-]*)?\s*$/, closes with /^```\s*$/, and rejects any
  leading indentation on either side. Indented "   ```html" was being
  treated as a code fence even though the renderer keeps it as a paragraph,
  and a literal "```html" line inside an open fenced example was closing
  the skip range early — both could expose a real or literal <artifact …>
  to the wrong handler.
- stripArtifact discarded computeSkipRanges' unclosedFenceStart, so a
  fenced literal that ends at EOF without a trailing newline (very common
  for chat output) leaked the inner <artifact …> recitation to the
  stripper, reproducing the original #1130 truncation symptom on a
  narrower input shape.

Split FENCE_LINE_RE into FENCE_OPEN_RE / FENCE_CLOSE_RE with no leading
indentation, gate the fence state machine on the right side of the toggle,
and have stripArtifact extend skip ranges to end-of-content when a fence
is left open. Also tightened the parser's tail-line hold-back regex to
match the renderer's no-leading-space rule. Added regression tests for the
EOF-unclosed-fence case, the indented pseudo-fence (renderer treats as
paragraph, stripper must strip the real artifact), and a "```html" line
inside an open fence.
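
A minimal sketch of the asymmetric fence state machine: the opener regex is only tried outside a fence and the closer regex only inside one, neither allows leading indentation, and an unclosed fence extends to end-of-content. `fenceRanges` is a hypothetical helper name; the two regexes are the ones quoted above.

```typescript
const FENCE_OPEN_RE = /^```(\w[\w+-]*)?\s*$/;
const FENCE_CLOSE_RE = /^```\s*$/;

/** [start, end) ranges of fenced blocks; an unclosed fence runs to end-of-content. */
function fenceRanges(content: string): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  let open: number | null = null;
  let pos = 0; // offset of the current line's first character
  for (const line of content.split('\n')) {
    // Outside a fence only an opener counts; inside, only a bare ``` closes.
    const re = open === null ? FENCE_OPEN_RE : FENCE_CLOSE_RE;
    if (re.test(line)) {
      if (open === null) {
        open = pos;
      } else {
        ranges.push([open, pos + line.length]);
        open = null;
      }
    }
    pos += line.length + 1;
  }
  if (open !== null) ranges.push([open, content.length]);
  return ranges;
}
```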

Refs nexu-io/open-design#1130

* refactor(web): align streaming tail-line fence guard with FENCE_OPEN_RE

The streaming parser's tail-line hold-back used a stricter local regex
(/^```\w*$/) than the renderer's FENCE_OPEN_RE (/^```(\w[\w+-]*)?\s*$/),
missing valid opener tails like ```c++, ```ts-, or ``` (trailing space).

In practice these tails are still held back by the unmatched-backtick
parity scan that runs immediately after — three backticks in a tail line
are odd, so firstUnmatched stays set and the parser holds from that
position. So this wasn't a runtime correctness bug, just a regex
divergence that future readers could trip on.

Drop the local regex and reuse FENCE_OPEN_RE so the tail check matches
the same shape the rest of the pipeline already uses. Pinned the
behavior with three new parser tests (`+`/`-` info-string suffix and
trailing-space tails arriving as the first chunk) — they pass at HEAD,
proving the parity scan was already covering these cases.

Refs nexu-io/open-design#1132 (lefarcen polish P2)

* fix(web): scope inline-code skip ranges per block and reject <artifact prefix-shared opens

INLINE_CODE_RE previously ran over the whole buffer, so an unmatched
backtick in one paragraph could pair with a backtick in a later
paragraph and create a phantom inline span that swallowed any real
<artifact …> between them. Mirror runtime/markdown.tsx by splitting the
buffer on fence / blank / heading / list / hr boundaries and running
INLINE_CODE_RE per block region instead.

stripArtifact accepted any unskipped `<artifact` substring as a real
open, while the streaming parser already required a following whitespace
character — so prose like `<artifactual>demo</artifact>` was being
truncated to `prefix  suffix`. Extract the parser's real-open guard into
isRealArtifactOpenAt and reuse it from both sides.
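Below is a sketch of the shared guard. It assumes the guard only checks that `<artifact` sits at the given index and is immediately followed by a whitespace character. The real parser guard may check more than this.

```typescript
const OPEN_PREFIX = "<artifact";

// True when `text` holds a real `<artifact …>` opener at `index`: the prefix
// must match and the next character must be whitespace, which rejects
// prefix-shared words such as `<artifactual>`.
function isRealArtifactOpenAt(text: string, index: number): boolean {
  if (!text.startsWith(OPEN_PREFIX, index)) return false;
  const next = text.charAt(index + OPEN_PREFIX.length); // "" past the end
  return /\s/.test(next);
}
```

Calling the same function from both the stripper and the streaming parser keeps the two sides from drifting apart again.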

While reordering findOpenTag for the shared guard, also fix the related
hold-back ordering issue tracked at #1141: a stray tail-line backtick or
fence-opener prefix used to suppress an artifact already complete
earlier in the buffer. Scan for the earliest complete real open first,
then pick the earliest hold-back position only when no complete tag was
found.
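The reordered scan can be sketched as follows. The commit does not show `findOpenTag`'s real signature, so several pieces are assumptions: the result union, the `holdBackCandidates` parameter, and the simplified completeness test (a `>` after a real open).

```typescript
type OpenTagResult =
  | { kind: "open"; index: number }
  | { kind: "hold"; index: number }
  | { kind: "none" };

// "<artifact" followed by whitespace ("<artifact".length === 9).
function isRealOpenAt(text: string, i: number): boolean {
  return text.startsWith("<artifact", i) && /\s/.test(text.charAt(i + 9));
}

// Hypothetical: earliest real `<artifact …>` whose closing `>` has arrived.
function earliestCompleteOpen(text: string): number {
  for (let i = text.indexOf("<artifact"); i !== -1; i = text.indexOf("<artifact", i + 1)) {
    if (isRealOpenAt(text, i) && text.indexOf(">", i) !== -1) return i;
  }
  return -1;
}

// A complete open wins. Hold-back positions (stray tail backtick, partial
// fence opener) only matter when no complete open was found.
function findOpenTag(text: string, holdBackCandidates: number[]): OpenTagResult {
  const open = earliestCompleteOpen(text);
  if (open !== -1) return { kind: "open", index: open };
  if (holdBackCandidates.length > 0) {
    return { kind: "hold", index: Math.min(...holdBackCandidates) };
  }
  return { kind: "none" };
}
```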

Regressions pinned in parser.test.ts and strip.test.ts for both new
finding shapes.

* fix(web): keep HR-shaped lines inside paragraph regions for inline-code scanning

The previous walker closed inline-scan regions on lines matching the HR
regex, but `parseBlocks()` in runtime/markdown.tsx does not break a
paragraph on HR — its inner accumulation loop only breaks on blank /
fence / heading / ul / ol (runtime/markdown.tsx:95-104). HR is only an
HR block in the outer loop's first-look, never mid-paragraph.

So inputs like `intro \`\n---\n<artifact …>…</artifact>\n---\nclosing \``
are one paragraph in the renderer, whose two stray backticks pair to
cover the literal artifact recitation — but the walker was splitting on
the `---` lines, leaving the recitation outside skip ranges, and the
parser/stripper would treat it as a real tag.

Drop HR from the paragraph-break list (HR-shaped lines carry no
backticks of their own, so keeping them inside the surrounding region
is benign either way) and document the renderer-mirror rationale.
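Here is a simplified sketch of the region walker after this change. The break predicates only approximate the renderer's blank / fence / heading / ul / ol checks; the exact regexes in runtime/markdown.tsx are not reproduced. Fence bodies, headings, and list items are not modeled separately. The point is that an HR-shaped line stays inside the current region, and inline-code spans pair only within a region.

```typescript
type Range = [number, number];

// Approximate paragraph-break rules. An HR-shaped line (`---`) is deliberately
// absent, so it stays inside the surrounding paragraph region.
const PARAGRAPH_BREAKS: RegExp[] = [
  /^\s*$/,        // blank line
  /^```/,         // fence line
  /^#{1,6}\s/,    // heading
  /^\s*[-*+]\s+/, // unordered list item
  /^\s*\d+\.\s+/, // ordered list item
];

// Maximal runs of non-break lines, as [start, end) offsets into `text`.
function paragraphRegions(text: string): Range[] {
  const regions: Range[] = [];
  let offset = 0;
  let start: number | null = null;
  for (const line of text.split("\n")) {
    if (PARAGRAPH_BREAKS.some((re) => re.test(line))) {
      if (start !== null) regions.push([start, offset - 1]); // end of previous line
      start = null;
    } else if (start === null) {
      start = offset;
    }
    offset += line.length + 1;
  }
  if (start !== null) regions.push([start, text.length]);
  return regions;
}

// Pair single backticks only within a region, so a stray backtick cannot
// reach into a later paragraph and hide a real <artifact …> there.
function inlineCodeSpans(text: string): Range[] {
  const spans: Range[] = [];
  for (const [start, end] of paragraphRegions(text)) {
    for (const m of text.slice(start, end).matchAll(/`[^`]*`/g)) {
      const at = start + (m.index ?? 0);
      spans.push([at, at + m[0].length]);
    }
  }
  return spans;
}
```

For the `intro \`` / `---` / `closing \`` input, the whole text is one region. The two stray backticks therefore pair and cover the recitation, matching the renderer.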

Regressions pinned on both sides.
2026-05-11 19:29:22 +08:00
Sid
156bf5a34e
fix(web): refresh home projects after deleting a conversation (#1202) (#1219)
The home design cards render their `Needs input` badge from the
cached `/api/projects` payload — App.tsx owns the `projects` state
and exposes a `refreshProjects` callback that ProjectView already
fires from every other state-changing branch (run end, live-artifact
events, project rename, etc.). The conversation-delete branch
silently skipped it: deleting a conversation that owned an unanswered
`<question-form>` flips the daemon-side flag, but the home view kept
showing the stale badge until the next manual reload.

Call `onProjectsRefresh()` immediately after a successful
`deleteConversation` API response (and only then — if the request
fails the cached state is still the truth and we must not pretend
otherwise). Adds `onProjectsRefresh` to the useCallback deps for
exhaustive-deps correctness; matches the pattern at the four
existing call sites in this file.
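The control flow reduces to the sketch below. The names `deleteConversation` and `onProjectsRefresh` come from the commit. The `DeleteResult` shape is an assumption, and so is the plain async function standing in for the component's `useCallback`.

```typescript
type DeleteResult = { ok: boolean };

// Refresh the cached /api/projects payload only after the daemon confirms the
// delete. On failure the cached state is still accurate, so leave it alone.
async function handleDeleteConversation(
  conversationId: string,
  deleteConversation: (id: string) => Promise<DeleteResult>,
  onProjectsRefresh: () => void,
): Promise<boolean> {
  const result = await deleteConversation(conversationId);
  if (!result.ok) return false;
  onProjectsRefresh();
  return true;
}
```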

New regression coverage in
`apps/web/tests/components/ProjectView.deleteConversation.test.tsx`:
- triggers onProjectsRefresh after deleting a conversation
  (verified RED before this fix, GREEN after)
- does not trigger onProjectsRefresh when the delete request fails
  (defensive complement so a future "always refresh" refactor
  doesn't paper over a real failure with a stale-but-confident UI)
2026-05-11 19:29:09 +08:00
shangxinyu1
10802bb0b0
test: expand nightly UI and desktop regression coverage (#1256)
* e2e(ui): cover examples preview flows

* e2e(ui): cover Codex local CLI fallback UX

* test: expand desktop and connector regression coverage

* e2e(ui): cover workspace restoration flows

* e2e(ui): cover retry recovery workspace flow

* test: cover artifact and connector recovery flows

* e2e(ui): cover Continue in CLI stale provenance flow

* e2e(ui): cover BYOK model fetch caching

* test: expand Orbit and desktop connector coverage

* e2e(ui): cover workspace quick switcher recovery flows

* e2e(ui): cover connector pending authorization recovery

* e2e(ui): cover workspace and conversation restoration routes

* e2e(ui): cover conversation draft and attachment restoration

* e2e(ui): cover conversation history selection recovery

* e2e(ui): cover workspace surface conversation selection

* test: cover artifact presentation and orbit link behavior

* test: cover artifact external link restoration

* e2e(ui): cover root-route deep-link restoration

* e2e(specs): cover Orbit open-artifact desktop click

* e2e(specs): cover desktop artifact open link

* test: fix Orbit settings fixture type drift

* test: split Playwright critical and extended suites

* test: fix ProjectView design template fixtures

* ci: split workspace test stages

* guard: allow split Playwright suite scripts

* test: shrink Playwright critical suite

* test: restore omitted Playwright suites
2026-05-11 19:23:13 +08:00
PerishFire
8c0fb8dc01
feat(tools-pr): add maintainer PR-duty workspace (#1259)
* feat(tools-pr): add maintainer PR-duty workspace

Adds `tools/pr` as the maintainer-only control plane for PR-duty work on
this repo. Thin `gh` wrapper that encodes repo-specific knowledge:
review lanes, forbidden surfaces, lane-specific checklists, validation
command derivation from touched packages.

Subcommands:
- `list` — triage open queue by lane and review-state bucket.
- `view <num>` — agent-friendly review brief for a single PR.
- `classify [num]` — emit script-level tags for one PR or the whole
  open queue; full-queue JSON output lands under `.tmp/tools-pr/classify/`
  with rate-limit telemetry per run.
- `assignment` — assigner-perspective view of PR ownership, idle time,
  and blockers (derived from existing tags; no new judgments).

Tag dictionary (13 tags) covers: bot-only-approval, needs-rebase,
forbidden-surface, unlabeled, duplicate-title, non-ascii-slug,
maintainer-edits-disabled, org-member, unresolved-changes-requested,
stale-approval, and three awaiting-* timing tags. Each rule is
expressible as one factual sentence over `gh` data + repo paths — see
`tools/pr/AGENTS.md` for the full dictionary plus precision rules.

Templates in `tools/pr/templates/*.md` are aesthetic references for
recurring maintainer comments (duplicate-title ask, awaiting-author
nudge, agent-review brief shape). `templates/examples/` holds
frozen-in-time agent-review snapshots for three PR shapes.

Infrastructure:
- `gh()` wraps `execFile` with minimum-touch retry (2 attempts at 1s + 2s
  backoff) on transient 5xx / network errors. Persistent failures still
  surface — retry is anti-jitter, not an exponential-backoff resilience
  layer.
- Heavy chunks (`reviews`, `comments`, `commits`, assignment timelines)
  use cursor-paginated `gh api graphql` via `fetchPaginatedPrList` to
  stay under GitHub's GraphQL server-side timeout. Light chunks stay on
  `gh pr list --json`.
- `fetchOrgMembers` cached per process via `gh api orgs/<owner>/members
  --paginate`.
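Below is one reading of the `gh()` retry policy as a TypeScript sketch: the first call plus up to two retries, sleeping 1 s and then 2 s, and only when the error looks transient. The names `withMinimalRetry` and `isTransient` are hypothetical, and so is the error-text heuristic. The real wrapper classifies `execFile` failures its own way.

```typescript
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical transient check: a 5xx status or a common network error code.
function isTransient(err: unknown): boolean {
  const text = String((err as { message?: unknown } | null)?.message ?? err);
  return /\b5\d\d\b|ECONNRESET|ETIMEDOUT|EAI_AGAIN/.test(text);
}

// Run `fn`. On a transient failure, wait 1s and retry, then wait 2s and retry
// once more. Persistent or non-transient errors are rethrown unchanged.
async function withMinimalRetry<T>(
  fn: () => Promise<T>,
  delaysMs: number[] = [1000, 2000],
  sleep: Sleep = defaultSleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= delaysMs.length || !isTransient(err)) throw err;
      await sleep(delaysMs[attempt]);
    }
  }
}
```

Injecting `sleep` keeps tests fast. The short, fixed delay list reflects the anti-jitter intent rather than exponential backoff.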

Wiring:
- Root `package.json` adds `pnpm tools-pr` to the allowed root entry
  points.
- `scripts/postinstall.mjs` builds `tools/pr` alongside other workspace
  packages.
- `scripts/guard.ts` allowlists `tools/pr/bin/tools-pr.mjs` and
  `tools/pr/esbuild.config.mjs`, and adds `pr/` to the `tools/` top-level
  layout allowlist.
- Root `AGENTS.md` and `tools/AGENTS.md` document the new command
  surface, root-command-boundary update, and per-tool ownership.

* docs(agents): brief tools-pr in root AGENTS.md, link to tools/pr/AGENTS.md

Adds a `PR-duty tooling` section to the root AGENTS.md summarising what
`pnpm tools-pr` is, listing the four common subcommands (list / view /
classify / assignment), and pointing readers to `tools/pr/AGENTS.md` for
the full tag dictionary, operational playbook, templates, and design
rules. The section keeps root-level guidance to high-level orientation
while details stay local to the tool's own AGENTS.md.

* fix(tools-pr): drop overly broad touches-root-package.json forbidden hit

`deriveForbidden` was flagging any change to root `package.json` as a
forbidden-surface hit, but AGENTS.md §Root command boundary only forbids
specific *lifecycle* aliases (pnpm dev / test / build / daemon / preview
/ start) — tools-control-plane entrypoints like `pnpm tools-pr` are
explicitly allowed. Distinguishing "forbidden alias" from "allowed
entry" requires reading the diff content, which is `pnpm guard`'s job
rather than a path-derived classify tag.

Dogfooded on this branch's own PR (#1259), which added the `pnpm
tools-pr` script and was incorrectly flagged. Removing the hit aligns
the `forbidden-surface` tag with what tools-pr can mechanically detect
from file paths alone (apps/nextjs/, packages/shared/).
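After this change the tag is purely path-derived. The sketch below assumes the two prefixes named above are the complete list. The real `deriveForbidden` signature and return type may differ.

```typescript
// Path prefixes that tools-pr can flag without reading diff content.
const FORBIDDEN_PREFIXES = ["apps/nextjs/", "packages/shared/"];

// Return the touched files that fall under a forbidden surface. Root
// package.json is deliberately absent: telling a forbidden lifecycle alias
// apart from an allowed entry point needs the diff, which is `pnpm guard`'s job.
function deriveForbidden(touchedPaths: string[]): string[] {
  return touchedPaths.filter((p) =>
    FORBIDDEN_PREFIXES.some((prefix) => p.startsWith(prefix)),
  );
}
```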

* fix(tools-pr): paginate commits fetch, recognise ready-to-merge, escape title-index separator

Three review follow-ups on #1259, all factual fixes:

- `fetchOpenPrCommits` now uses `fetchPaginatedPrList` instead of a
  one-shot `pullRequests(first: $first)` query. GitHub GraphQL caps
  connection page size at 100, so the previous implementation would
  fail at runtime when callers passed `--limit > 100`. The paginated
  path makes the commits fetch consistent with the other heavy chunks
  (reviews, comments, assignment timelines) and removes the artificial
  ceiling entirely. The `limit` parameter is dropped from
  `fetchOpenPrCommits`; the CLI `--limit` continues to bound the
  `gh pr list --json` chunks.
- `deriveStatus` in `assignment.ts` now reads `facts.reviewDecision`
  and `facts.mergeStateStatus`. When the PR is `APPROVED` with merge
  state `CLEAN` or `UNSTABLE` and carries no blockers, status renders
  as `ready to merge` instead of falling through to `in review`. The
  assignment view loses its main triage signal without this — a clean
  human-approved PR rendered identical to a REVIEW_REQUIRED one.
- `tags.ts:tagDuplicateTitle` and `tags.ts:buildContext` both
  constructed the title-index key with a literal NUL byte between
  author and title, which made the file appear as binary in `git diff`
  / review tooling. Replaced the literal byte with a Unicode escape
  sequence in source; the runtime string value is identical, the
  source stays plain text and round-trips through review tooling
  cleanly.
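The ready-to-merge rule and the escaped separator can be sketched like this. The `reviewDecision` and `mergeStateStatus` values follow GitHub's GraphQL enums. The `PrFacts` shape, the `blockers` field, and the `"blocked"` status string are assumptions.

```typescript
type ReviewDecision = "APPROVED" | "CHANGES_REQUESTED" | "REVIEW_REQUIRED" | null;

interface PrFacts {
  reviewDecision: ReviewDecision;
  mergeStateStatus: string; // e.g. "CLEAN", "UNSTABLE", "BLOCKED", "DIRTY"
  blockers: string[];       // hypothetical: tags already derived as blocking
}

function deriveStatus(facts: PrFacts): string {
  if (facts.blockers.length > 0) return "blocked"; // hypothetical status label
  const mergeable = facts.mergeStateStatus === "CLEAN" || facts.mergeStateStatus === "UNSTABLE";
  if (facts.reviewDecision === "APPROVED" && mergeable) return "ready to merge";
  return "in review";
}

// "\u0000" is the NUL separator written as an escape. The runtime key is the
// same as with a literal byte, but the source file stays plain text.
function titleIndexKey(author: string, title: string): string {
  return `${author}\u0000${title}`;
}
```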

* fix(tools-pr): raise default --limit to 1000 to cover the live open queue

mrcfps flagged that `tools-pr list` (and `classify --all`, `assignment`)
defaults to `--limit 100`, which silently drops every PR past the first
100 in the open queue. The repo currently sits at 104 open PRs, so the
out-of-the-box run was already omitting four PRs.

Raise the default to 1000 in `list.ts`, `classify.ts`, and `assignment.ts`,
and remove the now-pointless 200 ceiling — `gh pr list --limit N` paginates
internally, so a high cap is cheap. Users can still pass `--limit <small>`
for a truncated preview. CLI help text on the three subcommands updated to
match.

* fix(web): pass designTemplates to ProjectView render helper

#955 made `designTemplates` a required prop on ProjectView, but the test
helper added in #1244 (`renderProjectView` in
`ProjectView.api-empty-response.test.tsx`) was never updated. The two
PRs landed on main without conflicting, leaving `apps/web` typecheck red
for every PR that rebases past b5eb8c16.

Pass `designTemplates={[] as SkillSummary[]}` alongside the existing
`skills={[] as SkillSummary[]}` so the helper compiles. The component
already treats the array shape (empty included) as a no-op fallback in
the empty-response paths the test exercises.

* fix(tools-pr): correct author signal + merge inline review comments

Two correctness gaps in the awaiting-* signal pipeline surfaced during
review of the new tools-pr commands:

1. `authorSignalAt` iterated every PR commit unconditionally. On
   `maintainerCanModify=true` PRs a maintainer's follow-up push would
   advance the author timestamp, masking a stalled author response.
   Filter commits to those whose `authorLogin` matches `facts.author`,
   mirroring the same filter already applied to comments.

2. `fetchOpenPrComments` (and `fetchView`) only fetched
   `pullRequest.comments` / `gh pr view --json comments`, which is the
   issue-conversation thread. Inline review-thread replies — where
   authors and reviewers actually exchange most fix-up replies — live in
   `reviewThreads.comments` / REST `pulls/{n}/comments`. Missing them let
   `humanReviewerSignalAt` / `authorSignalAt` and the `view` brief point
   at the wrong side after someone replied inline. Extend the list-mode
   GraphQL to also sweep `reviewThreads(last: 20).comments(first: 20)`,
   and add a parallel REST inline-comments fetch in `fetchView` that
   merges into `GhView.comments`.
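Below is a sketch of the corrected author signal. It assumes commits and comments arrive as simple records carrying a login and an ISO timestamp. It also assumes issue-thread and inline review-thread comments are already merged into one list. The record shapes are hypothetical.

```typescript
interface CommitFact { authorLogin: string | null; committedAt: string }
interface CommentFact { authorLogin: string | null; createdAt: string }

interface SignalFacts {
  author: string;
  commits: CommitFact[];
  comments: CommentFact[]; // issue-thread + inline review-thread comments
}

// Latest activity by the PR author. Maintainer pushes and comments are
// filtered out so they cannot mask a stalled author response.
function authorSignalAt(facts: SignalFacts): Date | null {
  const events = [
    ...facts.commits.map((c) => ({ login: c.authorLogin, at: c.committedAt })),
    ...facts.comments.map((c) => ({ login: c.authorLogin, at: c.createdAt })),
  ];
  let latest = Number.NEGATIVE_INFINITY;
  for (const e of events) {
    if (e.login !== facts.author) continue;
    const t = Date.parse(e.at); // NaN never wins the comparison below
    if (t > latest) latest = t;
  }
  return Number.isFinite(latest) ? new Date(latest) : null;
}
```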
2026-05-11 19:17:21 +08:00
Tom Huang
b5eb8c1647
feat: generic skills + split skills/design-templates + finalize-design API (#955)
* feat: general-purpose skills with @-mention composition and user import

Lift skills from "one mode-bound skill per project" to a generic capability
the user can compose per turn:

- Daemon: scan multiple skill roots (user-skills under runtime data, then
  the bundled `skills/`); user-imported skills can shadow built-ins by id.
- New `POST /api/skills/import` and `DELETE /api/skills/:id` endpoints,
  with CONFLICT/BAD_REQUEST/NOT_FOUND error codes and built-in delete
  protection.
- ChatRequest gains `skillIds: string[]`; the chat run concatenates each
  picked skill's body (and merges craftRequires) into the system prompt
  for that turn only — the project's persistent `skillId` is untouched.
- Web composer: `@` popover now lists skills alongside project files;
  picks render as removable chips above the textarea and ride along with
  the request as `skillIds`.
- Settings → Library: import form (name/description/triggers/body),
  per-card delete for user skills, "user" origin badge.
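The shadowing rule and per-turn composition can be sketched as below. `SkillInfo` here is a pared-down hypothetical shape. Only two behaviors come from the description above: the root order (user-skills first, bundled `skills/` second, first id wins) and the per-turn `skillIds` composition. Merging `craftRequires` is left out.

```typescript
interface SkillInfo { id: string; body: string; origin: "user" | "builtin" }

// Roots are scanned in priority order and the first root to claim an id wins,
// so a user-imported skill shadows a bundled one with the same id.
function mergeSkillRoots(roots: SkillInfo[][]): SkillInfo[] {
  const byId = new Map<string, SkillInfo>();
  for (const root of roots) {
    for (const skill of root) {
      if (!byId.has(skill.id)) byId.set(skill.id, skill);
    }
  }
  return [...byId.values()];
}

// Append each picked skill's body to the base prompt for this turn only.
// The project's persistent skillId is neither read nor modified here.
function composeTurnPrompt(base: string, skills: SkillInfo[], skillIds: string[]): string {
  const byId = new Map(skills.map((s) => [s.id, s] as const));
  const bodies = skillIds
    .map((id) => byId.get(id)?.body)
    .filter((b): b is string => typeof b === "string");
  return [base, ...bodies].join("\n\n");
}
```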

* chore(web): drop welcome pet teaser + add ds→prompt-template mapping util

- SettingsDialog: remove the inline pet adoption teaser from the welcome
  panel so the first-run modal stays focused on configuration.
- New `inferPromptTemplateCategoriesForDs(ds)` helper that maps a design
  system's authored metadata to prompt-template gallery categories.
  Imported by the design-system gallery wiring on a sibling branch; no
  callers in this branch yet.

* feat: split skills/design-templates and add finalize-design API

Phase 0 of the skills/design-templates refactor (specs/current/
skills-and-design-templates.md):

- Move ~104 rendering catalogue entries from skills/ to design-templates/
  and keep skills/ for the small set of functional skills that *do work*
  on user input (utilities, briefs, packagers).
- Add design-templates/AGENTS.md and skills/AGENTS.md describing the
  contract, and a brand-agnostic craft/ surface for opt-in craft rules.
- Daemon: add DESIGN_TEMPLATES_DIR / USER_DESIGN_TEMPLATES_DIR roots and
  an /api/design-templates surface mirroring /api/skills. Asset/example
  routes still span both registries so existing srcdoc URLs keep
  resolving across the rename.
- Web: split LibrarySection into SkillsSection + DesignSystemsSection,
  rename the EntryView "Examples" tab to "Templates", and update locales
  + the New-project picker accordingly.

Adds the finalize-design endpoint:

- New apps/daemon/src/finalize-design.ts and packages/contracts/src/api/
  finalize.ts — one-shot synthesis of a project's transcript + active
  design system + current artifact into <projectDir>/DESIGN.md via the
  Anthropic Messages API. Per-project .finalize.lock mirrors the
  transcript-export hygiene from PR #493; provider credentials are not
  persisted by the daemon.
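The commit does not say how `.finalize.lock` is held. One common way to get a per-project exclusive lock in Node is an exclusive-create open with the `'wx'` flag, sketched below. `withProjectLock` is a hypothetical name, and the real implementation may handle stale locks differently.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Run `fn` while holding <projectDir>/.finalize.lock. The 'wx' flag makes the
// open fail with EEXIST if another finalize is already in flight.
async function withProjectLock<T>(projectDir: string, fn: () => Promise<T>): Promise<T> {
  const lockPath = path.join(projectDir, ".finalize.lock");
  let fd: number;
  try {
    fd = fs.openSync(lockPath, "wx");
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") {
      throw new Error(`finalize already running for ${projectDir}`);
    }
    throw err;
  }
  try {
    fs.writeSync(fd, String(process.pid)); // record the holder for debugging
    return await fn();
  } finally {
    fs.closeSync(fd);
    fs.rmSync(lockPath, { force: true });
  }
}
```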

Other supporting changes:

- README + AGENTS.md updates to document the new directory split and
  craft/ surface, plus i18n strings across 13 locales.
- Test refactors and new coverage (finalize-design, runs, sidecar
  server, plus refreshed daemon integration tests).
- .gitignore: scope the *.exe ignore to /OpenDesign.exe so legitimate
  vendor binaries are no longer hidden.

* fix(merge): move clinical-case-report to design-templates/

Origin/main added the clinical-case-report skill under skills/ before
the skills/design-templates split landed. Its od.mode is prototype, so
per specs/current/skills-and-design-templates.md it is a design template
and belongs alongside the other rendering catalogue entries — not under
the slimmed-down functional skills/ root. Moving it keeps the EntryView
Templates tab consistent with origin/main's intent.

* feat(skills): curated design/creative catalogue + collapsible Settings rows

Seed ~100 curated design/creative skill stubs under skills/ sourced from
awesome-claude-skills (ComposioHQ) and awesome-agent-skills (VoltAgent).
Each stub carries an od.category tag so the new filter pill row in
Settings -> Skills can group them. The seed script
(scripts/seed-curated-design-skills.ts, pnpm seed:curated-design-skills)
is idempotent: it only creates folders that don't already exist, so
hand-edited stubs are never overwritten.

- Daemon: parse and surface od.category on SkillInfo with a strict slug
  normaliser; mirror the field on SkillSummary in @open-design/contracts.
  Category is purely a UI hint — system-prompt composition is unchanged.
- Web: rewrite SkillsSection from a left-list / right-detail grid into a
  vertical stack of collapsible rows mirroring the External MCP panel
  (header always visible with name + mode/source/category pills + per-row
  enable toggle; SKILL.md preview, file tree and inline edit form expand
  on demand). Add a Category filter row above the list. Reorder Settings
  nav so Skills + External MCP sit above the Composio/MCP cluster. Update
  composer placeholder/hint across 17 locales to advertise '@ files or
  skills · / for commands'.
- Docs: extend skills/AGENTS.md with the curated catalogue rules
  (idempotency, category vocabulary, no upstream vendoring).

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(skills): teach localized-content + system-prompt tests about the skills/design-templates split

mrcfps blocking review on PR #955: the skills/design-templates split
(b5993385) moved ~110 SKILL.md entries out of `skills/` and into
`design-templates/`, but two repo-level tests still hard-coded the
single-root layout, so CI gates went red on the merged branch:

- `e2e/tests/localized-content.test.ts` only scanned `<repo>/skills`
  while the locale `skillCopy` map keeps id-keyed entries spanning
  both roots (ExamplesTab/Templates uses one lookup regardless of
  origin). Teach the helper to read both `skills/` and
  `design-templates/`, deduplicating ids so the union matches the
  localized claim.
- `apps/daemon/tests/prompts/system.test.ts` read
  `skills/live-artifact/SKILL.md`, which now lives under
  `design-templates/live-artifact/`. Update the absolute path so
  composeSystemPrompt's coverage of the live-artifact preamble is
  exercised again.

Also enroll the curated design/creative catalogue (PR #955, ~91
stubs sourced from awesome-claude-skills / awesome-agent-skills) in
the DE / FR / RU `_SKILL_IDS_WITH_EN_FALLBACK` lists. The stubs are
English-only by design (frontmatter advertises an upstream URL); the
fallback list is exactly the place to acknowledge "we know this id
exists, English copy is fine here" so the localized-content coverage
gate passes without forcing a translation task per locale.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(skills): always quote frontmatter name so importUserSkill round-trips numeric / boolean ids

mrcfps PR #955 review: `buildSkillMarkdown` emitted
`name: ${escapeYamlString(name)}` without quotes, so YAML coerced names
like `123`, `true`, `false`, or `null` into non-string scalars on
re-parse. listSkills() then read `data.name` as a number/boolean
and the import flow's follow-up `findSkillById(skills, result.id)`
missed it, falling into `/api/skills/import`'s "imported skill
could not be re-read" 500 path for those ids.

Switch the emitter to a quoted scalar (`name: "..."`) — the
double-escape already in `escapeYamlString` makes the quoted form
safe — and add a round-trip test covering `123`, `true`, `false`,
`null`, and `0` to lock in the contract.
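Below is a sketch of the quoted `name` line. Here `escapeYamlString` is a hypothetical stand-in that escapes backslashes, double quotes, and newlines for a YAML double-quoted scalar. The real helper's rules may differ.

```typescript
// Hypothetical escaper for a YAML double-quoted scalar: backslashes first,
// then double quotes and newlines.
function escapeYamlString(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Always emit a quoted scalar so ids such as 123, true, false, null or 0 are
// read back as strings rather than numbers, booleans or null.
function frontmatterNameLine(name: string): string {
  return `name: "${escapeYamlString(name)}"`;
}

for (const id of ["123", "true", "false", "null", "0"]) {
  console.log(frontmatterNameLine(id));
}
```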

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): drop staged-skill chips when the matching @<id> token leaves the draft

mrcfps PR #955 review: `submit()` always forwarded every id in
`stagedSkills`, but that state was only mutated on picker click and
chip removal. Hand-deleting an `@<id>` token from the textarea left
the chip staged, so the request still carried `skillIds: [<id>]` and
the daemon composed a skill the prompt no longer referenced.

Sync the chips with the draft inside `handleChange()` by pruning
`stagedSkills` whenever the new value no longer contains the
`@<id>` token (using the same whitespace boundary as
`removeStagedSkill`'s strip regex). Comment explains why this
prune does not run for `staged` file attachments — users frequently
add files via the upload button without leaving an `@<path>` token,
so a symmetric prune there would erase legitimate uploads.
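The pruning rule can be sketched as a pure function. `pruneStagedSkills` is a hypothetical name. The boundary regex (token preceded by start-of-text or whitespace and followed by whitespace or end-of-text) is an assumption modeled on the description of `removeStagedSkill`'s strip regex.

```typescript
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Keep a staged skill only while its `@<id>` token is still in the draft.
// File attachments are deliberately not pruned this way.
function pruneStagedSkills(draft: string, stagedSkills: string[]): string[] {
  return stagedSkills.filter((id) =>
    new RegExp(`(^|\\s)@${escapeRegExp(id)}(?=\\s|$)`).test(draft),
  );
}
```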

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(daemon): stage @-composed skills' side files alongside the active skill

codex PR #955 review: composing a per-turn `@`-picked skill into the
system prompt appended its body (with the `withSkillRootPreamble`
guidance pointing at relative paths under `<cwd>/.od-skills/<folder>/`)
but never staged the actual folder. `startChatRun` only copied
`activeSkillDir`, so when the project's primary skill was different
(or absent) the composed skill's references/, examples/, and scripts/
files lived only at their absolute repo path — agents that honour
the cwd-relative form (or that don't get `--add-dir`, e.g. Codex with
allowlisted gpt-image projects) couldn't reach them.

Thread the composed skills' dirs out of `composeDaemonSystemPrompt`
as `extraSkillDirs` and stage each one through the same
`stageActiveSkill` API used for the primary skill. Dedupe by folder
basename so a project whose primary skill is also `@`-composed isn't
copied twice. Each preamble already advertises its own folder, so the
prompt and the staged tree stay aligned without further changes.
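The dedupe step can be sketched like this. `collectExtraSkillDirs` is a hypothetical helper, and the staging itself (`stageActiveSkill`) is left out.

```typescript
import * as path from "node:path";

// Folders to stage for @-composed skills. Skip any whose basename matches the
// primary skill (already staged) or an earlier composed skill.
function collectExtraSkillDirs(activeSkillDir: string | null, composedDirs: string[]): string[] {
  const seen = new Set<string>();
  if (activeSkillDir) seen.add(path.basename(activeSkillDir));
  const extra: string[] = [];
  for (const dir of composedDirs) {
    const folder = path.basename(dir);
    if (seen.has(folder)) continue;
    seen.add(folder);
    extra.push(dir);
  }
  return extra;
}
```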

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): respect the Library disable toggle in the project @-mention picker

codex PR #955 review: only `EntryView` received `enabledSkills`
(filtered against `config.disabledSkills`); active projects still
got `skills={skills}` raw, so a skill the user disabled in Settings
kept appearing in the project's `@`-mention popover and could ride
along to the daemon via `skillIds`. That broke the Library toggle
for any project opened on the post-split branch.

Compute a functional-skills-only enabled subset
(`enabledFunctionalSkills`) and pass it into `<ProjectView>` instead.
Templates stay separate — design-templates are filtered through their
own `enabledDesignTemplates` memo for the Templates gallery — so
ProjectView's chat composer still only sees skills, never templates,
matching the pre-split prop surface.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(e2e): mock /api/design-templates for example-use-prompt flow

The Templates tab in EntryView fetches from /api/design-templates after
the skills/design-templates split (specs/current/skills-and-design-templates.md).
The example-use-prompt Playwright scenario only mocked /api/skills, so the
gallery card never appeared and the test timed out waiting on
example-card-warm-utility-example. Serve the same fixture summary on both
endpoints so the templates gallery renders the card the test clicks.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(tools-pack): create design-templates fixture for resources test

The packaging resources copy now bundles the new design-templates tree
alongside skills (see resources.ts BUNDLED_RESOURCE_TREES). The
copyBundledResourceTrees fixture only created skills, design-systems,
craft, etc., so the recursive copy crashed with ENOENT on
design-templates before it could check the prompt-templates assertion.
Add the missing fixture directory so the test exercises the same set
of resource trees the packaged build does.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(skills): clone built-in side files into the shadow on first edit

mrcfps PR #955 review: editing a built-in skill wrote a USER_SKILLS_DIR
shadow folder that contained only a new SKILL.md. The next listSkills()
pass surfaced the shadow as the active dir, but every side-file resolver
(/api/skills/:id/files, /example, /assets/*, the system-prompt preamble,
and the per-turn cwd staging) reads through skill.dir. With nothing but
SKILL.md in the shadow, the bundled assets/, references/, scripts/, and
examples/ disappeared the moment the user hit save — a built-in like
last30days or live-artifact would break immediately after edit instead
of just having its body overridden.

Teach updateUserSkill() to take a `sourceDir` and clone every entry
except SKILL.md / dotfiles into the shadow on the very first edit. The
shadow stays self-contained, so all the resolvers keep working without
fallback bookkeeping. Subsequent edits detect the existing shadow and
skip the clone, so user tweaks under the side tree survive a re-save.
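Below is a sketch of the first-edit clone using Node's `fs.cpSync`, available since Node 16.7. The name `saveShadowSkill` is hypothetical. "First edit" is approximated as "the shadow has no SKILL.md yet"; the real `updateUserSkill()` may detect an existing shadow differently.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// On the first edit, copy every top-level entry except SKILL.md and dotfiles
// from the built-in skill into the new shadow, then write the edited SKILL.md.
// Later edits find the shadow already populated and leave side files alone.
function saveShadowSkill(sourceDir: string, shadowDir: string, skillMd: string): void {
  const isFirstEdit = !fs.existsSync(path.join(shadowDir, "SKILL.md"));
  fs.mkdirSync(shadowDir, { recursive: true });
  if (isFirstEdit) {
    for (const entry of fs.readdirSync(sourceDir)) {
      if (entry === "SKILL.md" || entry.startsWith(".")) continue;
      fs.cpSync(path.join(sourceDir, entry), path.join(shadowDir, entry), { recursive: true });
    }
  }
  fs.writeFileSync(path.join(shadowDir, "SKILL.md"), skillMd);
}
```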

Wire `sourceDir: skill.dir` from server.ts's PUT /api/skills/:id handler
and add two regression tests:
- 'clones built-in side files into the shadow on the first edit' walks
  the file tree after save and asserts assets/template.html, references/
  notes.md, and scripts/helper.sh all round-trip from the built-in.
- 'preserves user-edited side files on subsequent edits' edits the
  staged assets/template.html, re-saves, and confirms the user content
  is still there.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(e2e): rename home tab from Examples to Templates

The Examples tab was renamed to Templates in EntryView (b5993385's
skills/design-templates split — entry.tabExamples became entry.tabTemplates
and the tab value moved from 'examples' to 'templates'), but
entry-chrome-flows still asserted the old label and testId. Update both.

* fix(skills+web): preserve template body in API mode and dir-based skill delete

Two follow-ups from PR #955 review:

1. ProjectView only received `enabledFunctionalSkills`, but
   `composedSystemPrompt()` still resolved `project.skillId` through that
   prop and `fetchSkill()`. Projects created from the new
   `/api/design-templates` surface keep a template id in `project.skillId`,
   so opening one in API mode dropped the template body from the system
   prompt and the upstream request ran without the project's primary
   template instructions. Now ProjectView takes a separate
   `designTemplates` prop (the unfiltered template list, so a
   later-disabled template still loads for projects already created from
   it) and `composedSystemPrompt()` plus the metadata / `isDeck` lookups
   fall back to that list, with `fetchDesignTemplate()` as the body-fetch
   fallback to `fetchSkill()`. The chat composer's `@`-picker keeps
   receiving only the enabled functional skills.

2. `DELETE /api/skills/:id` used `deleteUserSkill(USER_SKILLS_DIR, skill.id)`
   which re-slugified the frontmatter id and removed
   `<userSkillsDir>/<slug>/`. That matched the import shape but missed the
   install shape — `installFromTarget` writes the folder at
   `sanitizeRepoName(url)` (GitHub) or `path.basename(realpath)` (local
   symlink), neither of which is guaranteed to equal the slugified
   frontmatter `name`. A duplicate `app.delete('/api/skills/:id', ...)`
   handler at the install routes never fired because Express resolved the
   earlier registration first, leaving the install/uninstall path without
   working teardown. The handler now removes `skill.dir` (the absolute
   path listSkills already discovered) under a USER_SKILLS_DIR safety
   check, using `lstat` + `unlinkSync` so symlinked local installs unlink
   cleanly without recursing into the user's source tree. The dead
   duplicate handler is removed; `deleteUserSkill` is dropped from the
   server.ts import set (still exported and unit-tested in skills.ts).
   Regression coverage in `apps/daemon/tests/skills-delete-route.test.ts`
   pins both shapes plus the symlink-preserves-source case.
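Below is a sketch of the dir-based teardown. `removeUserSkillDir` is a hypothetical name. The `path.relative` containment test is one reasonable way to express the USER_SKILLS_DIR safety check, not necessarily the one server.ts uses.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Remove a discovered user skill folder. Refuse anything outside
// userSkillsDir (or the root itself). A symlinked local install is unlinked
// rather than recursed into, so the user's source tree survives.
function removeUserSkillDir(userSkillsDir: string, skillDir: string): void {
  const rel = path.relative(path.resolve(userSkillsDir), path.resolve(skillDir));
  if (rel === "" || rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`refusing to delete outside ${userSkillsDir}: ${skillDir}`);
  }
  const stat = fs.lstatSync(skillDir); // lstat: inspect the link, not its target
  if (stat.isSymbolicLink()) {
    fs.unlinkSync(skillDir);
  } else {
    fs.rmSync(skillDir, { recursive: true, force: true });
  }
}
```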

* test(daemon): point hyperframes system-prompt test at design-templates

The merge with main brought in a hyperframes system-prompt test that
reads `skills/hyperframes/SKILL.md`, but this branch's split moved
`hyperframes` into `design-templates/` (same migration as `live-artifact`
already handled above in this file). CI was failing with ENOENT on the
old path.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 17:48:34 +08:00
PerishFire
f2db5a749c
chore: enforce PR→issue linking discipline (#1263)
PRs that omit Fixes #N break the release-time reverse lookup
(issue → closing PR → merge sha → first containing tag), since the
auto-link only fires on the explicit closing keywords. We've been doing
this by hand on recent fixes; codify it so future PRs don't drift.

- Add .github/pull_request_template.md with a Fixes # placeholder so
  the link surface is in front of the author by default.
- Add a corresponding bullet to the Bug follow-up workflow in the root
  AGENTS.md so the discipline lives next to the methodology that
  produces issue-linked work.
2026-05-11 17:24:24 +08:00
PerishFire
a797e079b1
fix(desktop): exit fullscreen before hiding window on macOS close (#1249)
* fix(desktop): exit fullscreen before hiding window on macOS close (#1215)

When a preview is in Present → Fullscreen (演示 → 全屏) mode, the macOS close handler called
window.hide() directly, leaving the OS fullscreen Space orphaned as a
black screen — the window vanished but the Space stayed up.

Extract hideWindowExitingFullscreen as the named invariant ("hide,
but first leave fullscreen so the OS Space tears down with the window")
and route the darwin close handler through it. The hide is deferred
until 'leave-full-screen' fires so we don't race the OS Space teardown.

Bootstraps Vitest on apps/desktop with a single test under
tests/main/hide-window-exiting-fullscreen.test.ts that exercises the
helper through a structural mock — the bug shape is pure logic, no real
Electron window required. Spec was red against a hide-only helper and
green after the leave-full-screen sequencing.

* docs(agents): codify bug follow-up workflow

Distill the spec-first / cheapest-layer / scope-discipline /
invariant-shaped-fix / baseline-diff playbook used recently on #135 and
#1215 into a top-level subsection of root AGENTS.md, framed as a default
action shape with explicit room for case-by-case judgment rather than a
hard rule. Includes a single pointer back to the worked example spec.

* docs(agents): require staged human verification for visible bugs

Add the human-verification gate as a sixth bullet in the Bug follow-up
workflow. UI / platform-native / animation symptoms can pass green specs
and still ship the visible regression — proven by #1215, where the
desktop unit test green-lighted the helper logic but only a side-by-side
buggy-vs-fix run on a real macOS Space proved the black-screen actually
went away.

Reinforces the production-API-only seed constraint while we're there:
source-level backdoors prove a fake flow, not the real one, so they
invalidate the verification.

* fix(desktop): defer hide across the fullscreen-enter transition (#1215)

mrcfps observed on PR #1249 that the close handler only catches windows
already in fullscreen — Electron's enter-full-screen event is async on
macOS, so isFullScreen() can still read false during the OS Space
transition triggered by requestFullscreen(). A close in that window
took the plain hide() path and stranded the same black Space the fix
was meant to eliminate.

Track in-flight fullscreen entry from webContents.enter-html-full-screen
(set) and BrowserWindow.{enter,leave}-full-screen (clear), and surface
it through WindowFullscreenSurface.isEnteringFullscreen. The helper now
parks on enter-full-screen until the OS confirms the Space, then runs
the existing exit-then-hide path.
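The sequencing can be sketched against a structural interface, in the same spirit as the Vitest mock. The commit names `hideWindowExitingFullscreen`, `WindowFullscreenSurface`, and `isEnteringFullscreen`. The other members are assumptions that mirror Electron's `BrowserWindow` (`isFullScreen()`, `setFullScreen()`, `hide()`, and the `enter-full-screen` / `leave-full-screen` events), and the helper's exact signature is also assumed.

```typescript
type FullscreenEvent = "enter-full-screen" | "leave-full-screen";

interface WindowFullscreenSurface {
  isFullScreen(): boolean;
  isEnteringFullscreen(): boolean;
  setFullScreen(flag: boolean): void;
  hide(): void;
  once(event: FullscreenEvent, listener: () => void): void;
}

// Never hide while a fullscreen Space is up or still being created. Wait for
// the OS to confirm entry, leave fullscreen, and hide only after
// 'leave-full-screen' so the Space tears down with the window.
function hideWindowExitingFullscreen(win: WindowFullscreenSurface): void {
  const exitThenHide = () => {
    win.once("leave-full-screen", () => win.hide()); // subscribe before exiting
    win.setFullScreen(false);
  };
  if (win.isEnteringFullscreen()) {
    win.once("enter-full-screen", exitThenHide);
  } else if (win.isFullScreen()) {
    exitThenHide();
  } else {
    win.hide();
  }
}
```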

Adds a regression test ("waits out a fullscreen-enter transition before
exiting and hiding") that goes red against the previous helper.
2026-05-11 17:04:42 +08:00
Caprika
f7f2661bda
[codex] Handle empty API responses as no output (#1244)
* Handle empty API responses as no output

* Fix empty API response comment cleanup

* Stabilize API empty response detection
2026-05-11 16:57:02 +08:00
PerishFire
421ddf553c
fix(pack/win): close running app before silent reinstall (#1238)
2026-05-11 16:35:07 +08:00
nettee
e859c31574
fix(web): complete finished tool calls missing results (#1240)
2026-05-11 15:54:11 +08:00