open-design

mirror of https://github.com/nexu-io/open-design.git synced 2026-06-01 03:14:35 +07:00

Author	SHA1	Message	Date
Marc Chan	f294ab4915	chore(ci): add visual regression PR workflow (#2372 ) * Add visual regression PR workflow * Allow manual visual PR comments * Post visual comments for same-repo PRs * fix(ci): surface R2 lookup failures in visual report Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) * Align visual workflow names	2026-05-20 15:05:59 +08:00
kami	c85da3eb40	fix: sync landing source-of-truth paths (#2066 )	2026-05-20 11:44:04 +08:00
PerishFire	2c128e0e91	refactor desktop host bridge (#2246 )	2026-05-19 18:27:05 +08:00
ashleyashli	07659b7272	feat(seo): add Search Console reporting workflows (#2229 ) * feat(blog): daily 3-day Search Console traffic digest Adds `blog-3day-report.yml` (cron 09:00 Asia/Shanghai) and a companion `report-3day.ts` script that refreshes `docs/blog-traffic-digest.md` once per day. The digest has two sections: - T-3 spotlight: posts published exactly three days ago, with their 3-day Search Analytics window plus current URL Inspection coverage state. - Rolling 30-day cohort: every post 1–30 days old with its latest 3-day Search Analytics window, sorted by impressions descending. The workflow is read-only against Google APIs (no Indexing API, no "request indexing" automation) and mirrors the secret / config plumbing already used by `blog-indexing-monitor.yml`. Output lands in a reviewable `automation/blog-traffic-digest` PR opened by the open-design bot. Also widens `querySearchAnalytics` to accept `windowDays: 3 | 7 | 28` and updates `docs/blog-indexing-automation.md` with the new pipeline. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(seo): post daily Search Console report to Feishu Co-authored-by: Cursor <cursoragent@cursor.com> * feat(blog): push traffic digest to Feishu Emit a compact JSON summary from the daily 3-day traffic digest and add a Feishu custom bot sender for the summary card. Wire the workflow to send the card when `FEISHU_BLOG_DIGEST_WEBHOOK` is configured while keeping Markdown PR output as the source of truth. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(landing-page): add Discord routing CTAs Add a lightweight Discord pill to the landing hero and Discord links in the landing and blog footers so community routing is visible without displacing the primary GitHub and download CTAs. Add a blog-ending conversion card that points guide and use-case readers to the internal workflows library, while keeping Discord as a secondary support path. 
Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: ashley li <ashleyli@ashleydeMacBook-Air-2.local> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-19 18:09:44 +08:00
ashleyashli	e702a6a49f	Fix blog indexing status PR base (#2106 ) Set an explicit base branch for generated indexing status PRs so create-pull-request works after SHA-based checkouts. Co-authored-by: ashley li <ashleyli@ashleydeMacBook-Air-2.local> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-19 18:08:01 +08:00
PerishFire	bb13eee765	chore: optimize CI and beta release runtime (#2231 ) * chore(ci): add runtime trace summaries * chore(ci): tighten measured workspace steps * chore(release): tighten beta setup steps * chore(release): slim beta windows smoke * chore(ci): shard daemon tests * chore(ci): harden runtime trace lookup * chore(release): avoid mac pnpm cache in beta * chore(ci): split critical playwright checks * chore(release): publish beta platforms from builders * test(e2e): update beta release workflow expectation * chore(ci): stop gating PRs on nix check * fix(release): keep beta latest complete	2026-05-19 18:06:28 +08:00
Yuhao Chen	a1e8ce480a	fix(ci): include bundled resources in Windows cache key (#2034 )	2026-05-19 16:50:39 +08:00
Marc Chan	4f116d9eaf	fix(ci): anchor PR inactivity clock to author responses (#2185 ) * fix(ci): anchor PR inactivity clock to author responses * fix(ci): add dry-run mode to PR inactivity workflow * fix(ci): read workflow dry-run input from event payload * fix(ci): log PR inactivity dry-run diagnostics * fix(ci): accept both review association field names * fix(ci): log PR 642 feedback payload shapes * fix(ci): trust PR reviewers by repo permission * fix(ci): remove temporary inactivity debug logs	2026-05-19 13:59:15 +08:00
PerishFire	bd48c597b0	chore: pin dependency versions and harden CI caches (#2189 ) * chore: pin dependency versions * ci: enforce pinned dependency specs * ci: fix pnpm executable invocation	2026-05-19 13:58:27 +08:00
PerishFire	99b42726b8	Simplify CI PR gate (#2183 )	2026-05-19 13:18:41 +08:00
shangxinyu1	f650a043d9	test(e2e): align entry coverage with redesigned flows (#2101 ) * Migrate entry E2E coverage and split CI * test(e2e): relax connectors auth error assertions * ci: route scenario registry changes to extended e2e * ci: decouple packaged smoke jobs from validate gate * ci: restore pre-split workflow * test(e2e): slim critical ui smoke coverage * test(e2e): move broader entry flows out of critical * test(e2e): restore entry chrome coverage to ci * ci: parallelize workspace validation jobs * test(web): stabilize media palette bridge assertion * ci: cache e2e playwright browsers	2026-05-19 11:26:40 +08:00
Marc Chan	f403ffbfce	ci: add PR-author and stale-issue inactivity workflows (#2055 ) * ci: add PR-author and stale-issue inactivity workflows Adds two queue-management automations: - pr-author-inactivity: reminds PR authors after 72h of inactivity following human reviewer/maintainer feedback (issue comments, non-approval reviews, or inline review comments) and closes after 120h. Author response is detected via issue comments, inline review replies, or commit/force-push events. Bot-authored reviews are intentionally excluded so authors are not pressured by automated nits alone. - stale-issues: marks issues stale after 30 days of inactivity and closes after a further 7 days. Exempts good first issue, help wanted, and security labels. A pre-step also auto-applies 'exempt-from-stale' to issues opened by org members/owners/ collaborators, since actions/stale only supports label-based exemptions. PR processing is disabled (handled by the workflow above). * fix: limit PR inactivity feedback to trusted reviewers Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix: count author PR reviews as inactivity responses Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)	2026-05-18 16:45:37 +08:00
ashleyashli	0163fa9d84	fix(landing): stabilize blog indexing workflows (#2042 ) Use full workspace installs for blog indexing workflows so root postinstall can resolve workspace build dependencies, and enrich sitemap metadata for blog URLs from frontmatter dates. Co-authored-by: ashley li <ashleyli@ashleydeMacBook-Air-2.local> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-18 15:52:29 +08:00
leessju	7766582f0b	chore(ci): scope nix-check workflow permissions to contents:read (#1870 ) The other workflows under .github/workflows declare explicit `permissions:` blocks that scope their GITHUB_TOKEN to the minimum required (contents: read for build-only flows). `nix-check.yml` was the lone outlier and inherited the repository's default token permissions instead. Add `permissions: { contents: read }` to align with the rest of the workflow suite and follow GitHub's least-privilege workflow guidance. No behavior change: the job only reads the repo, runs `nix flake check`, and uploads a logs artifact on failure (which uses an action that already declares its own permissions internally). Co-authored-by: nicejames <nicejames@gmail.com>	2026-05-17 11:28:18 +08:00
Marc Chan	6bf865a43b	fix(ci): avoid duplicate nix-check runs on PR branches (#1917 ) * fix(ci): avoid duplicate nix-check runs on PR branches * fix(ci): keep nix-check on main pushes Restore the nix-check push trigger for main-only updates while still avoiding duplicate PR-branch runs. Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)	2026-05-16 23:38:50 +08:00
Marc Chan	1f728ac8e3	fix(ci): only run docker image workflow for release tags (#1916 )	2026-05-16 22:32:33 +08:00
Marc Chan	d6515643ef	fix(ci): use open-design-bot for metrics PRs (#1910 ) * fix(ci): use open-design-bot for metrics PRs * fix(ci): restore metrics workflow app scopes Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)	2026-05-16 21:52:37 +08:00
Marc Chan	fd82384b5e	fix(ci): dispatch metrics branch validation (#1878 ) * fix(ci): dispatch metrics branch validation * fix(ci): dispatch metrics branch validation Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): dispatch metrics branch validation Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): dispatch metrics branch validation Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): dispatch metrics branch validation Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): dispatch metrics branch validation Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): dispatch metrics branch validation Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)	2026-05-16 16:15:44 +08:00
lefarcen	7d1adf9fd7	docs: point 0.8.0 preview contributors at main (#1846 ) * docs: point 0.8.0 preview contributors at main, not preview/v0.8.0 0.8.0 has been merged into main (#1832). Anywhere we used to tell contributors to checkout / PR against preview/v0.8.0 was actively mis-routing new PRs. Update: - docs/preview-v0.8.0-announcement.md + zh-CN: status line, Branch row, source-build checkout, and 'open a PR against' guidance now point at main - .github/ISSUE_TEMPLATE/bug-report.yml + feature-request.yml: phrase the 'use the preview template' nudge as 'about the 0.8.0 preview features (now on main)' instead of 'about the preview/v0.8.0 branch' - .github/ISSUE_TEMPLATE/config.yml: same rewording for the contact link - .github/ISSUE_TEMPLATE/preview-v0.8.0-feedback.yml: refresh the description and the intro body so it reads as 'preview features pre-tag', not 'features pre-merge' The preview-v0.8.0-feedback template and preview/v0.8.0 label are intentionally kept: 0.8.0 isn't tagged yet, so we still want a dedicated lane for preview-features feedback. * chore: stop treating preview/v0.8.0 as a live branch Earlier in this PR we kept the preview-v0.8.0 surface area intact — that was the wrong call. 0.8.0 is now on main; pretending there's a parallel 'preview' branch in the templates, labels, and copy was going to keep mis-routing contributors. Drop: - .github/ISSUE_TEMPLATE/preview-v0.8.0-feedback.yml (the dedicated template that auto-applied the preview/v0.8.0 label and prefix) - .github/ISSUE_TEMPLATE/config.yml contact_links entry pointing at it - bug-report.yml + feature-request.yml nudges that sent users there - The Preview-v0.8.0-feedback link block from both announcement docs (replaced with normal bug-report / feature-request links) Rename: - docs/preview-v0.8.0-announcement.{md,zh-CN.md} -> docs/v0.8.0-announcement.{md,zh-CN.md} so the on-disk doc title reads as a 0.8.0 announcement, not a branch-specific one. No other repo file referenced the old paths. 
The preview/v0.8.0 label and branch themselves are intentionally untouched — those are separate ops the maintainer will decide on later. This PR only removes mentions inside the repo. * chore: keep 0.8.0 preview-feedback template as a chooser-level ad The previous commit deleted preview-v0.8.0-feedback.yml entirely. Bring it back, but reframe it: it's now the dedicated 0.8.0 lane in the issue chooser — a high-visibility surface that tells visitors "0.8.0 is here as a preview, please share what you noticed." - Renamed in the chooser to "Open Design 0.8.0 — preview feedback" - Title prefix shortened from "[preview/v0.8.0] " to "[0.8.0] " so the branch slug no longer leaks into issue titles - label preview/v0.8.0 still auto-applied (the label entity is still in use across 26 issues; maintainer will decide on its fate separately) - Area dropdown widened from "Skills + Automations" to cover the actual 0.8.0 surface (plugins, headless, agent flow, desktop shell) - Intro body rewritten to read as a preview-release ad, not a feature-branch tester request Announcement docs (English + Chinese) also routed their "open an issue" CTA back through this template instead of the generic bug-report / feature-request links — same advertising goal.	2026-05-15 22:37:04 +08:00
lefarcen	e40399d39a	Merge pull request #1832 from nexu-io/sync/main-into-preview-v0.8.0 Release preview/v0.8.0 into main	2026-05-15 20:44:27 +08:00
lefarcen	64139db375	ci(diag): capture packaged-linux runtime logs into the headless artifact Currently the artifact only contains vitest.log + the tools-pack build log. The packaged daemon's own stdout/stderr lands in $RUNNER_TEMP/tools-pack/runtime/linux/namespaces/<ns>/logs/desktop/latest.log, outside the artifact path, so when 'headless-root.json not written within 35s' fires we have no signal on what the spawned child actually did. Copy the runtime logs/ and runtime/ subdirectories into the report dir before upload so the artifact captures daemon/web sidecar startup output, identity markers, and per-app state JSON.	2026-05-15 19:58:21 +08:00
lefarcen	41bae9b7a5	ci: mark packaged-linux-headless smoke as non-blocking until Linux ships The v0.8.0 launch doc explicitly lists Linux as "coming soon" — the packaged Linux headless runtime is known-incomplete in this release. Keep the smoke job running so we keep collecting signal, but don't block PRs on it until the Linux client lands. Re-enable (drop continue-on-error) once the Linux packaged client is ready in a future release.	2026-05-15 19:45:23 +08:00
ashleyashli	772ef97476	feat(landing): automate blog indexing monitoring (#1825 ) * feat(landing): add blog indexing automation Automate supported blog discovery checks through sitemap submission, URL Inspection monitoring, IndexNow notifications, and guarded SEO CI checks. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(landing): support oauth for blog indexing Use OAuth refresh-token auth as the preferred Search Console path while keeping service-account auth as a fallback, so the indexing workflows can run despite GSC service-account invite issues. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(landing): tighten blog indexing observability Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: ashley li <ashleyli@ashleydeMacBook-Air-2.local> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-15 18:32:30 +08:00
lefarcen	22a3b99a47	Merge origin/main into preview/v0.8.0 Sync 49 commits from main. Conflicts resolved: - .github/workflows/ci.yml: kept v0.8.0 granular per-area gating, added main's linux specs + release-stable.yml + release-preview.yml triggers - .github/workflows/release-preview.yml: kept v0.8.0's full workflow over main's placeholder - apps/web/src/components/AssistantMessage.tsx: combined v0.8.0 file-ops summary with main's stripTodoToolGroups + suppressAskUserQuestionFallbackText - apps/web/src/components/ChatPane.tsx: kept both new imports - apps/web/src/index.css: kept both .msg-plugin-chip and .user-copy-btn blocks - e2e/ui/*.test.ts: kept v0.8.0 openEntrySettingsDialog helper over main's inline dialog navigation (UI was redesigned in v0.8.0) - nix/package-{daemon,web}.nix: kept v0.8.0 pnpmDepsHash; rerun nix build to refresh	2026-05-15 18:23:33 +08:00
nettee	30821f3a73	fix(ci): let metrics PRs trigger required checks (#1801 )	2026-05-15 17:01:10 +08:00
Olin Hendershot	74637f1cb5	Add Linux packaged client parity smoke coverage (#1204 ) * docs: plan linux client issue 709 * fix: complete linux headless lifecycle routing * feat: add linux packaged inspect * test: add linux headless packaged smoke * ci: add linux headless packaged smoke * ci: smoke linux AppImage release artifacts * docs: document linux packaged client status * chore: finalize linux client audit remediation * docs: add linux client publication packet * test: harden linux client smoke coverage * ci: preserve linux smoke audit evidence * refactor: consolidate linux e2e helpers Move pathExists and the desktop/web/daemon app-key array out of linux.spec.ts into linux-helpers.ts, where expectPathInside and linuxUserHome already live. Keeps the spec file focused on tests and the helpers file as the canonical home for shared Linux e2e utilities. * fix: move linux e2e helpers to lib * fix: address linux release review blockers * fix: drop npm dependency from containerized linux build writeAssembledApp() previously called runNpmInstall() which executed `npm install` directly. Inside the containerized build path, electronuserland/builder:base strips npm/npx/corepack, so the inner tools-pack build would fail at the assembled-app install step. Route the install through OD_TOOLS_PACK_PNPM_BIN: buildDockerArgs sets the env to the standalone pnpm binary it bootstraps, and the new resolveProductionInstallCommand helper consumes that env to run `<bin> install --prod --no-lockfile --config.node-linker=hoisted`. Host invocations with no env set keep the prior npm behavior. --config.node-linker=hoisted preserves the flat node_modules layout that electron-builder packs the same way as npm-installed trees. New tests cover the resolver branches and assert the docker-arg-to- resolver chain end-to-end so reviewers can see the container's inner build receives the env that switches its install away from npm. 
* fix: harden linux container bootstrap * fix: validate desktop marker liveness in headless cleanup cleanup --headless previously skipped on any parseable desktop-root.json, trapping recovery when the AppImage had crashed and left a stale marker. Validate the marker the same way stopPackedLinuxApp does: if the PID is not in the live snapshot list, proceed through cleanup instead of skipping. Extract the validation into validateDesktopAppImageMarker so the stop and cleanup paths share one definition of live and owned. Tests cover both branches: a stale marker drives cleanup to remove the runtime/output roots, while a live marker drives cleanup to skip and preserve them.	2026-05-15 16:38:29 +08:00
lefarcen	75498838a9	chore: align issue templates to preview/v0.8.0 naming (#1723 ) Following the rename of the feature branch from preview/0.8.0 to preview/v0.8.0 (to match the release/v0.7.0 convention), update all issue-template references so the label, filename, and deep-link URL stay consistent. Changes: - git mv preview-0.8.0-feedback.yml → preview-v0.8.0-feedback.yml - update labels reference, title prefix, display name, body copy - update version placeholder example to 0.8.0-preview.2 (current build) - update cross-references in bug-report.yml and feature-request.yml - update config.yml first contact_link URL + about text	2026-05-14 23:21:37 +08:00
PerishCode	4f15c33595	Merge remote-tracking branch 'origin/preview/0.8.0' into preview/v0.8.0	2026-05-14 21:10:03 +08:00
sukumarp2022	9218fd649e	feat(ui): add copy to clipboard functionality for user messages with … (#1669 ) * feat(ui): add copy to clipboard functionality for user messages with localization support * fix(web): use setTimeout instead of window.setTimeout for correct Timeout type * docs: add copy prompt button screenshot for PR #1669 * docs: add copy button hover screenshot for PR #1669 * docs: add copy button copied state screenshot for PR #1669 * fix(ui): reset button border/background on copy prompt button The .user-copy-btn inherited border and background from the base button CSS, rendering as a bordered gray box instead of a clean icon overlay. This was especially visible in the Electron desktop app. Add border: none and background: none to the button, and a subtle hover background for feedback.	2026-05-14 20:19:20 +08:00
lefarcen	4693ddb00d	chore: add issue templates (bug, feature, preview/0.8.0) + chooser config (#1708 ) * chore: add issue template for preview/0.8.0 feedback Adds a guided issue form so community testers of the preview/0.8.0 branch (Skills tab + Automations) can submit structured feedback. The template auto-applies the preview/0.8.0 label, which lets maintainers filter all preview-related reports in one view: https://github.com/nexu-io/open-design/issues?q=is%3Aopen+label%3A%22preview%2F0.8.0%22 * chore: add generic bug-report issue template Pairs with the preview/0.8.0 template added in the previous commit. Until now the repo had no issue templates at all, which meant New Issue opened a blank textarea by default. The bug-report template: - Pre-applies the 'bug' label - Guides users through repro steps, version, platform, logs - Includes a callout pointing preview/0.8.0 testers to the dedicated feedback template so the two flows stay separate * chore: add feature-request template + chooser config Rounds out the issue-template basics: - feature-request.yml — 'what problem are you trying to solve' framing, willing-to-contribute dropdown so maintainers can route PRs - config.yml — disables blank-issue entry, redirects Q&A / Ideas / Show-and-tell / general chat to Discussions, points preview/0.8.0 reporters at the dedicated template After merge, the chooser at /issues/new/choose will be: Template 1. 🐛 Bug report 2. 💡 Feature request 3. 🧪 Preview 0.8.0 feedback Contact → Preview 0.8.0 feedback (dup, easy-access) → Ask a question (Discussions Q&A) → Discuss an idea (Discussions Ideas) → Show what you've made (Discussions Show-and-tell) → General discussion (Discussions General)	2026-05-14 20:13:36 +08:00
ashleyashli	1e9bcbf20d	fix(contributor-bot): serialize runs to avoid state.json races and duplicate cards (#1707 )	2026-05-14 20:01:13 +08:00
PerishCode	43b1b94c8e	Add preview release channel	2026-05-14 19:15:16 +08:00
PerishFire	3fa12f71be	Add release preview workflow placeholder (#1705 )	2026-05-14 18:55:08 +08:00
PerishCode	5b1e49ac8a	Add release preview workflow placeholder	2026-05-14 18:49:05 +08:00
PerishCode	cba8bf151d	chore: align namespace lifecycle packaging	2026-05-14 16:35:46 +08:00
lefarcen	6c16283850	Merge origin/main (post-7c8305f4) into reconcile branch Brings in 10 new main commits: routine deep-link to specific conversations (#1508), Windows resource cache fix for Orbit templates, collapsible comment side panel (#1607), routines project radio polish, Copilot logo swap, and minor UI fixes. Conflicts resolved: - router.ts: garnet's home/view + marketplace routes + main's per-project conversationId deep-link field coexist on Route union - ProjectView.tsx: garnet's isPhantomDaemonRunMessage helper + main's isStoppableAssistantMessage helper both kept - ProjectView.run-cleanup.test.tsx: accepted HEAD (garnet's phantom-row regression test); main's three new tests for finalizeActiveAssistantMessagesOnStop / clearStreamingConversationMarker / shouldClearActiveRunRefs are queued as a follow-up TODO inline.	2026-05-14 15:13:38 +08:00
shangxinyu1	2976c76fc3	test: expand Memory and Routines coverage (#1521 ) * test: expand settings and packaged coverage * test: extend memory settings coverage * test: cover routine settings failure states * test: cover routine operation failures * test: fix daemon test typing on CI * test: decouple packaged smoke from orbit bug * test: avoid live memory LLM calls in route tests * test: fix daemon fetch typing in CI * fix: restore preview comment and inspect toggles * test: align manual edit flow with current inspector UX * test: align comment attachment flow with current preview comments UI * fix: probe resolved Codex launch path during detection * fix: remove duplicate board activation helper after rebase * test: update ghost cli detection mock * test: align FileViewer toolbar expectation * ci: move full app tests to extended lane * ci: run app tests by changed scope * ci: cover shared app inputs in test scopes * ci: avoid setup-node cache in windows packaged smoke * test: align extended settings and manual edit flows	2026-05-14 14:48:40 +08:00
lakatos	51d1c4e287	ci: skip upstream-only workflows on forks (#1586 )	2026-05-14 14:27:23 +08:00
Nagendhra Madishetti	ff569fa50c	feat(daemon): Critique Theater Phase 16 (M-phase rollout ratchet + /api/critique/conformance) (#1499 ) * feat(web): pure reducer for Critique Theater states (Phase 7.1) Pure CritiqueState reducer driven by the contracts-level PanelEvent (the same shape both the live SSE stream and the recorded transcript emit), so a single reducer powers both the in-flight panel and the rerun replay. Lifecycle covers run_started → running → (shipped / degraded / interrupted / failed), with panelist_open / dim / must_fix / close / round_end events building per-round CritiquePanelistView entries as they arrive. Defensive behaviour that surfaced while writing the spec tests: - Terminal phases (shipped / degraded / interrupted / failed) are sticky against further lifecycle events for the same run, except for parser_warning which can land late and is recorded in a side channel without changing phase. - A new run_started for a different runId at any time discards the prior state and reboots, so the UI can launch consecutive runs without an explicit reset action. - Events whose runId does not match the active run return the same state reference, so React's useReducer doesn't re-render subscribers on stray traffic. - Round bookkeeping keys by round number rather than "always last", so an out-of-order panelist_dim for round 1 arriving after a round 2 dim does not corrupt the round 2 bucket. Test coverage: 18 cases covering each transition, the runId guard, sticky-terminal behaviour, the out-of-order round invariant, and the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire SSE + replay into the same reducer. 
* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2) createCritiqueEventsConnection is a pure connection manager that mirrors apps/web/src/providers/project-events.ts: opens an EventSource at /api/projects/:id/events, listens for every name in CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent (stripping the critique. prefix and merging the data payload), and hands it to the caller's onEvent. Reconnect uses exponential backoff (1s → 30s) and resets on `ready`; malformed payloads drop with a dev-mode warning rather than tearing the stream. useCritiqueStream wraps the manager in a useReducer that owns the CritiqueState. enabled=false or a null projectId tears down the connection cleanly; switching projectId closes the old connection and opens a fresh one. The returned dispatch lets local UI synthesise actions (e.g. an Esc keypress firing a synthetic interrupted while a kill request is in flight); production traffic comes from the SSE stream. Test coverage: - sse.test.ts (10 cases, node env): subscription set covers every CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire shape back to PanelEvent; malformed JSON is swallowed and does not stop the stream; exponential backoff schedule and ready-reset semantics are pinned with a setTimeout seam; close() cancels pending reconnects and shuts the live source; no-op fallback when EventSource is unavailable. - useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event, reducer driven by synthetic actions, no connection when disabled or projectId is null, clean close on unmount, projectId change reopens cleanly. * feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3) Fetches the per-run NDJSON transcript (one PanelEvent per line), parses every line via the shared isPanelEvent predicate, and dispatches into the same CritiqueState reducer the live SSE stream uses. 
A single reducer means the UI rendering a replay can be identical to the live panel, and a UI mounting both useCritiqueStream and useCritiqueReplay in parallel does not have to reconcile two state shapes. speed knob is `paused | instant | live | { intervalMs: N }`. - instant flushes every event synchronously, useful for opening a finished run already at its terminal state. - intervalMs paces dispatches at a fixed cadence so the reviewer can watch the run unfold. - paused parses the transcript but holds events back until the caller advances speed (consumers can drive a scrubber later). - live is reserved for the future "playback at original cadence" feature, currently treated as instant; replay timestamps are not yet persisted with each event so honest pacing requires a follow-up Phase 7+ task. gunzip seam handles `.ndjson.gz` transcripts via DecompressionStream when present; the production fetch path picks between text and arrayBuffer based on the URL extension. Both seams are injectable so the unit tests don't need to spin up a real network or a real gzip pipeline. Test coverage (8 cases, jsdom env): - Idle status before any URL is provided. - speed=instant flushes the full transcript synchronously to shipped state. - speed={intervalMs:N} paces with the setTimeout seam, reaching done after the last tick. - speed=paused leaves status=playing with no dispatches. - Empty transcript reports done with state still idle. - Fetch rejection surfaces an error status with the message. - Malformed NDJSON lines are skipped; valid events around them still land. - .gz transcripts route through the gunzip seam. Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream + replay), all on one branch ready for review. Phases 8+ (Theater components) consume these from this PR. 
* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review) Two P1 fixes from lefarcen's review on PR #1307: SSE payload override `sseToPanelEvent` previously spread `data` after the channel-derived `type`, so a payload-provided `type` could override the channel and route a `critique.run_started` frame into the reducer as a `ship` action. Reversed the spread so the channel-derived `type` is authoritative, and revalidated the resulting object through the contracts-level `isPanelEvent` predicate before returning. Frames that fail validation (missing runId, empty runId, unknown type) are dropped, so a malformed or compromised SSE frame can no longer dispatch a wrong-shape action into the reducer. Three new sse.test.ts cases pin the regression: hostile `type:'ship'` in the payload still resolves to `run_started`, missing runId is dropped, empty runId is dropped. Replay pause/resume `useCritiqueReplay` had one big effect keyed on `transcriptUrl` only, so flipping `speed` from `paused` to `instant` never re-fired and the held events sat undispatched. Split into a parse effect (depends on URL, fetches and stores events in state) and a pace effect (depends on parsed-events + speed, owns the cursor + timers). The playback cursor lives in a ref that survives pause/resume cycles, so flipping `paused` -> `instant` flushes from the current position rather than restarting (which would double-dispatch `run_started` and reset the reducer). 
Two new useCritiqueReplay.test.tsx cases:
- paused-then-instant transitions from `playing` to `done` and reaches the shipped terminal phase
- intervalMs paced playback dispatches one event, pauses to drain the next scheduled timer, flips to instant, and confirms the remaining transcript drains exactly once (cursor was preserved)

Doc consistency

The earlier source comment in useCritiqueReplay.ts claimed `live` "paces by recorded timestamps" while the impl used zero-delay timers and the PR body said it behaves like `instant`. Aligned to reality: `live` currently behaves like `{ intervalMs: 0 }` (events drain on successive zero-delay timer callbacks via setTimeoutFn) because transcripts do not yet carry per-event timestamps. Honest timestamp-driven pacing is queued as a Phase 7+ follow-up.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck, Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite 96 files / 888 tests.

* feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread)
* feat(web): Theater PanelistLane component (Phase 8.1)
* feat(web): Theater ScoreTicker component (Phase 8.2)
* feat(web): Theater RoundDivider component (Phase 8.3)
* feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4)
* feat(web): Theater TheaterDegraded chip (Phase 8.5)
* feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6)
* feat(web): Theater TheaterTranscript replay surface (Phase 8.7)
* feat(web): Theater TheaterStage top-level container (Phase 8.8)
* feat(web): Theater CSS using existing semantic tokens (no hex literals)
* feat(web): Theater public exports barrel
* fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314)

Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen.

State-lifecycle fixes (3 x P2)

1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`).
Host hooks dispatch it when their gating prop changes so a stale run from a prior project / transcript cannot bleed into the next context. Reset is idempotent on idle (returns the same reference). 2. `useCritiqueStream` dispatches `__reset__` at the top of its connection effect, so a workspace switch from project A (which streamed a critique) to project B clears the reducer before the new EventSource opens. enabled=false also clears. 3. `useCritiqueReplay` dispatches `__reset__` at the top of its parse effect, so transcriptUrl swaps (including swap-to-null after a replay reached `shipped`) lift the reducer back to idle before the new fetch starts. SSE validation (1 x P2) 4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape` check after the cheap `isPanelEvent` predicate. A `critique.ship` frame missing `composite` / `round` / `status` / `artifactRef` is rejected before reaching the reducer, so TheaterCollapsed can no longer crash on `undefined.toFixed(1)`. Every variant's required fields are validated: run_started (protocolVersion, non-empty cast, maxRounds, threshold, scale), panelist_* (round, role, plus variant-specific shape), round_end (round, composite, mustFix, decision in {continue,ship}, reason), ship (round, composite, status, artifactRef.{projectId,artifactId}, summary), degraded (reason, adapter), interrupted (bestRound, composite), failed (cause), parser_warning (kind, position). Reducer correctness (1 x P2) 5. `panelist_open` now materializes the round + an empty panelist view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight the in-progress lane the instant the tag opens. Before this, a stream that emitted only `panelist_open` after `run_started` left `rounds = []` and the UI rendered no current round until a later `panelist_dim` arrived. Polish (3 x P3) 6. Brand role tint swaps from `var(--magenta, var(--accent))` to `var(--purple, var(--accent))`. 
`--purple` is actually defined across the design systems; `--magenta` is not, so Brand was silently falling through to `--accent` and looking identical to Designer. 7. New i18n key `critiqueTheater.interruptedSummary` for the interrupted-collapse copy ("Interrupted at round N, best composite X.X"). Previously the interrupted branch reused `shippedSummary` and the UI read "Shipped at round..." for a run that specifically did not ship. Native value in en + zh-CN; other locales fall back via `...en` spread. 8. `TheaterDegraded` heading id comes from `useId()` instead of a hardcoded `theater-degraded-heading`, so two chips rendered on the same page (chat history with multiple completed runs) keep their aria-labelledby references unambiguous. Tests (15 new cases) - reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data. - sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship. - useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false. - useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped. - TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...". - TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`. 
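The two reducer behaviours pinned by the tests above (an idempotent `__reset__` and `panelist_open` materializing the round) can be sketched as follows. This is an illustrative reducer, not the real one: the `TheaterState` and `RoundView` shapes and the trimmed `Action` union are assumptions; only the `__reset__` action name and the empty `{ dims: [], mustFixes: [] }` view come from the commit message.

```typescript
// Hypothetical, trimmed-down reducer; state shapes are assumptions.
interface PanelistView { dims: string[]; mustFixes: string[] }
interface RoundView { n: number; panelists: Record<string, PanelistView> }
interface TheaterState { phase: 'idle' | 'running'; runId: string | null; rounds: RoundView[] }

type Action =
  | { type: '__reset__' }
  | { type: 'run_started'; runId: string }
  | { type: 'panelist_open'; round: number; role: string };

const IDLE: TheaterState = { phase: 'idle', runId: null, rounds: [] };

function reduce(state: TheaterState, action: Action): TheaterState {
  switch (action.type) {
    case '__reset__':
      // Idempotent on idle: hand back the same reference so React can bail out.
      return state.phase === 'idle' ? state : IDLE;
    case 'run_started':
      return { phase: 'running', runId: action.runId, rounds: [] };
    case 'panelist_open': {
      const { round: n, role } = action;
      const existing = state.rounds.find((r) => r.n === n);
      const base: RoundView = existing ?? { n, panelists: {} };
      // Materialize an empty view, but never stomp data that is already there.
      const view: PanelistView = base.panelists[role] ?? { dims: [], mustFixes: [] };
      const updated: RoundView = { ...base, panelists: { ...base.panelists, [role]: view } };
      const rounds = existing
        ? state.rounds.map((r) => (r === existing ? updated : r))
        : [...state.rounds, updated];
      return { ...state, rounds };
    }
    default:
      return state;
  }
}
```

A host hook would dispatch `{ type: '__reset__' }` whenever its gating prop (project id, transcript URL, enabled flag) changes, before opening the next stream.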
Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new) - tests/i18n/locales.test.ts 5 of 5 across 18 locales * feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1) * feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2) * fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315) Addresses every blocker from codex, Siri-Ray, and lefarcen. The three state-lifecycle and SSE-validation issues they also flagged inherit fixes from PR #1314's review pass that this branch now sits on top of after rebase. Real daemon kill on Interrupt (P1) - CritiqueTheaterMount now POSTs to /api/projects/:id/critique/:runId/interrupt alongside the optimistic local dispatch. Before this fix, clicking Interrupt only flipped the React state to interrupted while the daemon job kept running. The fetch is best-effort: a 404 (endpoint not wired yet, lands in Phase 15) is swallowed with a dev-mode console.warn so the UI still moves to the collapsed badge. - New fetchInterrupt test seam lets RTL assert on the URL / method and simulate the "daemon not ready yet" path. Two tests pin both: the happy URL proj-42/critique/run-abc/interrupt POSTs, and a rejected fetch still flips the UI. interruptPending reset on new run (P2) - A ref-backed effect compares the current runId against the last one we saw; when it changes, interruptPending is cleared. A user who interrupts run-1 and then triggers run-2 from the same mount now gets a fresh, enabled kill button instead of one stuck in "Interrupting…". Pinned by a new mount test. Escape keybind scope (P2) - InterruptButton now checks the keydown target. Escape inside an input, textarea, select, or contenteditable element is ignored (and any ancestor of those via closest() is treated the same way). Body-level focus still fires the keybind so the Theater area's affordance keeps working. 
Four new tests cover textarea, input, contenteditable, and the body-focus positive case.

userFacingName i18n key (P2)
- The spec at specs/current/critique-theater.md:6 mandates a single critiqueTheater.userFacingName key so the "Design Jury" label can be renamed without touching code. Phase 8 introduced critiqueTheater.title by mistake; renamed across types.ts, en.ts, zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer TheaterStage.tsx. The locale alignment test stays green.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 14 files, 112 tests (was 101 before, +11 new for the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope; the rest were already in #1314's review fix).
- tests/i18n/locales.test.ts 5 of 5 across 18 locales.

* feat(daemon): adapter-degraded registry with TTL (Phase 10.1)

In-memory registry recording adapters that produced malformed or oversize transcripts so the orchestrator can skip them for a TTL window (default 24h) instead of cycling through known-bad providers on every run. Records carry reason (malformed_block | oversize_block | missing_artifact), source label, and expiresAt. The test-only clock seam lets the suite advance time deterministically and prove that an expired entry stops counting as degraded without anyone calling clearDegraded. 7/7 vitest cases green.

* feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2)

Two test-only adapters that read the existing v1 transcript fixtures (happy-3-rounds and malformed-unbalanced) and replay them as either a full string or a 512-byte chunked stream. The chunked form is what the conformance harness uses to prove the parser holds together when the transcript arrives in arbitrary network slices, not as one buffered blob.
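A minimal sketch of a TTL registry with a clock seam, in the spirit of the Phase 10.1 description above. The class name, `markDegraded`, and `isDegraded` are assumptions for illustration; only `clearDegraded`, the three reason tokens, the record fields, and the 24h default come from the commit message.

```typescript
// Hypothetical sketch; the real registry's API surface may differ.
type DegradedReason = 'malformed_block' | 'oversize_block' | 'missing_artifact';

interface DegradedRecord {
  reason: DegradedReason;
  source: string;
  expiresAt: number; // epoch milliseconds
}

const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000; // 24h

class AdapterDegradedRegistry {
  private readonly records = new Map<string, DegradedRecord>();
  private readonly now: () => number;
  private readonly ttlMs: number;

  // `now` is the clock seam: tests pass a fake clock they can advance.
  constructor(now: () => number = () => Date.now(), ttlMs: number = DEFAULT_TTL_MS) {
    this.now = now;
    this.ttlMs = ttlMs;
  }

  markDegraded(adapterId: string, reason: DegradedReason, source: string): void {
    this.records.set(adapterId, { reason, source, expiresAt: this.now() + this.ttlMs });
  }

  isDegraded(adapterId: string): boolean {
    const record = this.records.get(adapterId);
    if (record === undefined) return false;
    if (this.now() >= record.expiresAt) {
      // Expired entries stop counting without anyone calling clearDegraded.
      this.records.delete(adapterId);
      return false;
    }
    return true;
  }

  clearDegraded(adapterId: string): void {
    this.records.delete(adapterId);
  }
}
```

Checking expiry lazily on read keeps the registry free of background timers, which is one simple way to make the behaviour deterministic under a fake clock.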
* feat(daemon): adapter conformance harness (Phase 10.3) runAdapterConformance pulls a transcript through the same parseCritiqueStream pipeline the orchestrator uses and classifies the outcome as shipped, degraded, or failed. On a degraded outcome it forwards the matched reason to the adapter-degraded registry, so a single nightly conformance run is what populates the skip list rather than the orchestrator learning each adapter is broken at request time. 5/5 vitest cases green covering shipped, malformed degraded, oversize degraded, no-ship failure, and the harness-thrown failure path. * test(e2e): Critique Theater Playwright suite (Phase 11) Six tests, one viewport per visual case, deterministic SSE fixtures stubbed via page.route(). Adds the suite to test:ui:extended so the existing extended-UI lane picks it up. Coverage: 1. Happy path: a single mounted theater plays the full fixture (1 run_started, 5 panelists open / dim / must_fix / close, 1 round_end, 1 ship) and ends on the score badge. 2. Interrupt mid-run: the panelist that is open at the time the interrupt button is clicked closes with an interrupted marker and the transcript freezes there. 3. Visual regression at 375x720 mobile. 4. Visual regression at 768x1024 tablet. 5. Visual regression at 1280x800 desktop. 6. A11y role tree: the theater region exposes a labelled landmark, each panelist lane is a group with an accessible name, the score is a status live region. All SSE traffic is stubbed by page.route so the suite runs in CI without a daemon. The toggle is seeded via localStorage by bootAppWithCritiqueEnabled so the gate behaves as if Settings flipped it on. typecheck clean; playwright --list reports 6. * test(web): reducer p99 bench at 10k iterations (Phase 13.1) Locks the documented 2ms budget for the Critique Theater reducer on a representative SSE script (27 actions, one full happy run) behind a regression gate. 
Asserts p99 stays under 4ms (2x the documented budget) so CI runners with a noisy neighbour do not flake while a real regression to 20ms or 200ms still trips. The bench is a vitest case rather than a bare microbenchmark so it runs in the same CI lane as every other web test and does not need a parallel runner. * test(web): critique surface coverage walker (Phase 13.2) Walks the public critique surface (11 SSE event names, 5 panelist roles, 6 lifecycle phases, 9 named i18n keys) and asserts each named symbol appears in both the src corpus and the test corpus. The walker is the gate that catches a rename in one half of the codebase without a matching update in the other half: a future PR that drops 'panelist_must_fix' from the reducer without also removing its test reference fails this suite. 62 assertions, one per symbol per corpus. * docs: Critique Theater user guide (Phase 14.1) Seven sections aimed at end users (not contributors): 1. What is Design Jury 2. How it works (the five panelists, auto-converging rounds, the composite formula) 3. Settings (the M1 toggle and what it does) 4. Reading the score badge 5. Replay surface 6. Troubleshooting (degraded, interrupted, failed) 7. FAQ The composite formula is documented as designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2 because anyone trying to reverse-engineer the score is going to search for those weights and the docs are the place they should land first. * docs(daemon): critique module AGENTS map (Phase 14.2) Daemon-side wayfinder for the apps/daemon/src/critique directory. Tables every file, what owns what invariant, and the 'when you change anything here' guide so a future contributor does not have to reverse-engineer the rollout resolver before adding a new SSE event. * docs(web): Theater module AGENTS map (Phase 14.3) Web-side mirror of the daemon AGENTS map. Same file table, same invariants section, same change-impact guide, sized to the Theater component package. 
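The composite weights quoted in the user-guide commit above can be written out as a small function. This is a sketch that assumes each role contributes a single numeric score and that the composite is the plain weighted sum; the `WEIGHTS` and `compositeScore` names are hypothetical.

```typescript
// Weights as documented in the commit message: designer 0, critic 0.4, others 0.2.
const WEIGHTS = { designer: 0, critic: 0.4, brand: 0.2, a11y: 0.2, copy: 0.2 } as const;
type Role = keyof typeof WEIGHTS;

// Hypothetical helper: weighted sum of per-role scores.
function compositeScore(scores: Record<Role, number>): number {
  let total = 0;
  for (const role of Object.keys(WEIGHTS) as Role[]) {
    total += WEIGHTS[role] * scores[role];
  }
  return total;
}

const example = compositeScore({ designer: 10, critic: 9, brand: 7, a11y: 8, copy: 6 });
console.log(example.toFixed(1));
```

The weights sum to 1.0, so the composite stays on the same scale as the inputs, and with a designer weight of 0 the designer's own score does not move the result.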
* feat(daemon): rollout flag resolver (Phase 15.1)

Single decision point every caller consults to know whether the orchestrator should wire the critique pipeline for a given run. Priority:
1. Skill-level policy (required wins, opt-out wins inversely)
2. Per-project override from the Settings toggle
3. OD_CRITIQUE_ENABLED env override
4. Rollout phase default
   M0 dark-launch false
   M1 settings only false (toggle is off until the user flips it)
   M2 per-skill true if skill opted in
   M3 global default true

OD_CRITIQUE_ROLLOUT_PHASE parser defaults to M0 on unknown input so a fresh install never surprises a user with the feature on. 10/10 vitest cases green covering every cell of the matrix.

* feat(web): Settings toggle hook for Critique Theater (Phase 15.2)

React hook that reads critiqueTheaterEnabled from the existing open-design:config localStorage blob and stays in sync via:
- the platform storage event (cross-tab)
- an open-design:critique-theater-toggle CustomEvent (same-tab)

Same-tab event is the one that fires when the Settings panel saves in the current window: the toggle and every mounted theater update without a page reload.

setCritiqueTheaterEnabled(next) is the imperative setter the Settings panel calls. It preserves the rest of the stored config (mode, apiKey, etc.) and dispatches the same-tab event after the localStorage write.

The web hook reflects what the user toggled; the daemon-side isCritiqueEnabled is the final routing authority (project override, env, rollout phase). When they disagree, the daemon wins for backend gating and the web reflects the toggle state.

6/6 vitest cases green covering first read, stored read, same-tab event flip, config preservation, corrupted JSON tolerance, and cross-tab storage event.
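The four-tier priority from the Phase 15.1 resolver commit above can be sketched as a pure function. The `RolloutInput` shape, the `parseRolloutPhase` name, and the exact-match parsing are assumptions; only `isCritiqueEnabled`, the tier order, and the per-phase defaults are taken from the commit message.

```typescript
// Hypothetical sketch of the priority matrix; input shape is an assumption.
type RolloutPhase = 'M0' | 'M1' | 'M2' | 'M3';
type SkillPolicy = 'required' | 'opt-in' | 'opt-out' | null;

interface RolloutInput {
  skillPolicy: SkillPolicy;
  projectOverride: boolean | null;
  envOverride: boolean | null;
  phase: RolloutPhase;
}

// Unknown or missing input falls back to M0 so the feature stays off.
function parseRolloutPhase(raw: string | undefined): RolloutPhase {
  switch (raw) {
    case 'M1': return 'M1';
    case 'M2': return 'M2';
    case 'M3': return 'M3';
    default: return 'M0';
  }
}

function isCritiqueEnabled(input: RolloutInput): boolean {
  // 1. Skill-level policy: required forces on, opt-out forces off.
  if (input.skillPolicy === 'required') return true;
  if (input.skillPolicy === 'opt-out') return false;
  // 2. Per-project override from the Settings toggle.
  if (input.projectOverride !== null) return input.projectOverride;
  // 3. Env override.
  if (input.envOverride !== null) return input.envOverride;
  // 4. Rollout phase default.
  switch (input.phase) {
    case 'M2': return input.skillPolicy === 'opt-in';
    case 'M3': return true;
    default: return false; // M0 and M1
  }
}
```

Keeping the resolver pure, with `null` meaning "no opinion" at each tier, makes every cell of the matrix checkable with a plain table-driven test.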
* test(web): Phase 15 toggle hook failure-mode coverage (PR #1320) lefarcen P2 on PR #1320 flagged that the PR body claimed safe behavior for disabled localStorage, non-object JSON, and missing CustomEvent shim, but the suite only covered corrupt JSON plus happy-path storage events. Added four failure-mode tests so the swallowed errors are not silently traded for a throw in a future refactor: 1. Returns false on a stored JSON value that parses to an array (non-object). Catches a regression where the guard treats anything truthy as a config blob. 2. Returns false on a stored JSON value of literal 'null'. typeof null === 'object' in JS, so the guard has to check null explicitly; this test pins that check. 3. Returns false when localStorage.getItem throws (private mode / disabled storage / SecurityError). The hook must swallow and return false so the rest of the app keeps rendering. 4. setCritiqueTheaterEnabled still dispatches the same-tab CustomEvent when localStorage.setItem throws (quota exceeded / disabled storage). The dispatch path is the in-session broadcast that keeps every mounted hook coherent even when persistence is unavailable; verified by mounting two probes and asserting both flip after the setter is called with a throwing setItem. 10/10 vitest cases green (6 existing + 4 new). * fix(web): honor CustomEvent payload in toggle hook listener (PR #1320) Both Siri-Ray (blocking) and lefarcen (P2 new) caught the same real bug in the failure-mode test I added in `affcdd27`: the test asserts the in-session UI flips when localStorage.setItem throws, but the CustomEvent listener was ignoring the event's typed detail and just calling readToggle(). Under a throwing setItem the localStorage value is stale (or absent), so the listener would see the OLD value and the test would fail (or worse, the production claim 'in-session event keeps mounts coherent' was hollow). 
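The read-side guards that the four failure-mode tests above pin down can be sketched as one defensive reader. This is an assumption-laden illustration: the real `readToggle` reads `localStorage` directly, whereas this version takes a hypothetical `getItem` seam so it runs without a DOM; the storage key and field name are the ones named in the commit message.

```typescript
// Hypothetical sketch; `getItem` stands in for localStorage.getItem.
function readToggle(getItem: (key: string) => string | null): boolean {
  try {
    const raw = getItem('open-design:config');
    if (raw === null) return false;
    const parsed: unknown = JSON.parse(raw);
    // typeof null === 'object', and arrays are objects too, so both need explicit checks.
    if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) return false;
    return (parsed as Record<string, unknown>)['critiqueTheaterEnabled'] === true;
  } catch {
    // Corrupt JSON or a throwing getItem (private mode, SecurityError) reads as "off".
    return false;
  }
}
```

Returning `false` on every failure path keeps the rest of the app rendering when storage is unavailable, which is the behaviour the tests are there to protect.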
Fixed the hook, not the test: the listener now reads event.detail.enabled when it is a boolean, falling back to readToggle() only for malformed events or for cross-tab storage events (which do not carry a typed payload). The setter already dispatched the detail; the listener just was not consuming it.

Test changes:
- The existing 'setItem throws' test now asserts the right behavior for the right reason. Updated the inline comment to say the listener reads from detail, not localStorage.
- New test 'falls back to readToggle when the CustomEvent carries no usable detail' pins the fallback path: a malformed dispatcher (no detail, or detail.enabled not a boolean) degrades cleanly instead of throwing or being silently ignored.

11 / 11 vitest cases green (10 prior + 1 new fallback).

* feat(daemon): route critique spawn-path eligibility through the rollout resolver

The wireup edit Phase 10 and Phase 15 carved out: today server.ts gates the critique pipeline on critiqueCfg.enabled, which is just the OD_CRITIQUE_ENABLED env var. After this commit it gates on isCritiqueEnabled(...) from the Phase 15 resolver, so the full priority matrix is live:
1. Per-skill od.critique.policy veto (opt-out / required)
2. Per-project override (M1 Settings toggle, written through the existing Phase 6 settings endpoint)
3. OD_CRITIQUE_ENABLED env override (power-user lane / CI fixtures)
4. OD_CRITIQUE_ROLLOUT_PHASE default
   M0 dark-launch false
   M1 settings only false
   M2 per-skill only when skillPolicy === 'opt-in'
   M3 global default true

Default behaviour on a fresh install is unchanged: the resolver returns false at M0 without an env override or a project override, so prod traffic falls through to the legacy single-pass path exactly the way it did before.

Inputs threaded today: phase from OD_CRITIQUE_ROLLOUT_PHASE, envOverride from OD_CRITIQUE_ENABLED.
skillPolicy and projectOverride are passed as null for the v1 cutover; the daemon-side handler that round-trips critiqueTheaterEnabled on the project settings row and the od.critique.policy frontmatter resolver land as the next two commits in this branch. The three call sites that used critiqueCfg.enabled (the brand-thread guard, the skill-thread guard, the top-line critiqueShouldRun compound) now read from a single locally-scoped critiqueEnabledForRun boolean, so the eligibility check is computed exactly once per spawn and the prompt composer + orchestrator stay in lockstep the way the existing comment already promised. Tests still green: daemon vitest 22 / 22 across rollout + conformance + adapter-degraded. Daemon typecheck clean. * feat(web): mount CritiqueTheaterMount in ProjectView The web counterpart of the daemon wireup. ProjectView now renders <CritiqueTheaterMount projectId={project.id} enabled={...} /> as a sibling of <AppChromeHeader> inside the top-level <div className="app">. The mount is the drop-in from the Phase 9 stack: it owns the SSE subscription, the kill-request handshake, and the phase-aware swap from the live <TheaterStage> to the collapsed badge once a run settles. The mount returns null until the daemon emits a critique.run_started for the active project, so the visual surface is byte-for-byte unchanged for users who have not opted in. Enabled wiring: useCritiqueTheaterEnabled() reads the M1 Settings toggle from the existing open-design:config localStorage blob and stays in sync with both the platform storage event (cross-tab) and the same-tab open-design:critique-theater-toggle CustomEvent the Phase 15 setter dispatches. The hook honors the event payload directly so a private-mode browser that cannot persist the toggle still updates the in-session UI correctly. The daemon-side gate (isCritiqueEnabled in apps/daemon/src/server.ts) remains the authority for whether a run is actually wired through the critique pipeline. 
This hook only governs whether the web layer renders the resulting SSE stream when the daemon emits one. The two-layer gate is intentional: an integrator embedding the Theater in a custom UI can flip the web visibility independent of the daemon's routing decision, and a daemon-side env override flips backend gating without touching the web's localStorage. Tests still green: web Theater suite 181 / 181 across 16 files. Web typecheck clean. * feat(daemon): resolve od.critique.policy frontmatter at the spawn site The next step in the wireup branch's ladder: replace the placeholder `skillPolicy: null` with the actual value parsed from the active skill's SKILL.md frontmatter. Three small edits, one new field on a public type: 1. SkillInfo gains a `critiquePolicy: SkillCritiquePolicy` field carrying the parsed `od.critique.policy` token (required / opt-in / opt-out / null). The field is null when the skill has no opinion, which lets the lower-priority resolver tiers (projectOverride, envOverride, phase default) decide. 2. listSkills() populates the new field via a small `normalizeCritiquePolicy` helper that tolerates the YAML scalar's casing and trims whitespace. Unknown tokens collapse to null so a typo in SKILL.md cannot accidentally force the panel on or off; it just falls through. Derived example cards inherit the parent's policy. 3. server.ts captures `skill.critiquePolicy` into a hoisted `skillCritiquePolicy` variable inside the existing skill-load block, then threads it into the isCritiqueEnabled call as the skillPolicy input. The hoisting keeps the variable in scope at the resolver call site without restructuring the spawn handler. After this commit, the priority matrix the rollout resolver was designed for is live for its top tier. The previous commit wired env + phase; this one wires skill. The projectOverride input remains null pending the next commit that extends the Phase 6 settings endpoint. Daemon vitest: 10 / 10 rollout cases pass against the new wiring. 
Daemon typecheck: clean. * feat(daemon): feed projectOverride into the rollout resolver from project metadata Replaces the placeholder `projectOverride: null` in the spawn handler with the actual value the Settings panel writes onto the project's metadata blob: `critiqueTheaterEnabled?: boolean`. The read is defensive at the boundary: the metadata object is typed loosely (it round-trips through SQLite as a free-form JSON blob), so the spawn handler narrows to `boolean` and falls through to `null` for any other shape. A missing key, a malformed value, or a project that has never visited Settings collapses to `null`, which is exactly the resolver's "no opinion, fall through to env / phase" signal. The `critique` frontmatter slot also gets typed on the SkillFrontmatter shape so the `od.critique.policy` chain the previous commit introduced no longer needs a bracket-access cast. Same pattern as the existing `craft`, `preview`, and `design_system` nested-record slots. After this commit, every tier of the rollout resolver's priority matrix is wired: 1. skillPolicy (from SKILL.md od.critique.policy) 2. projectOverride (from project metadata critiqueTheaterEnabled) 3. envOverride (from OD_CRITIQUE_ENABLED) 4. rollout phase (from OD_CRITIQUE_ROLLOUT_PHASE) The write path for projectOverride still flows through the existing project-update handler the Settings panel already uses to persist project metadata; no new endpoint is needed. The Settings UI button that calls setCritiqueTheaterEnabled and posts the new field is the next commit on this branch. Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases still green against the new wiring. * fix(daemon): forward critique events to project sinks + align composer gate (PR #1338) Two codex review items addressed in one commit since they share the same root cause (resolver-enabled run hits a transport / prompt contract that was still env-gated): P1 (transport mismatch). 
The daemon emits critique.* SSE frames through critiqueBus -> design.runs.emit, which fans out on /api/runs/:runId/events. The web CritiqueTheaterMount subscribes to /api/projects/:projectId/events (it's project-scoped, not run-scoped, because the mount lives at the project workspace and follows the user across runs). Result: in production the mount never sees a real frame, and the e2e tests' stubbed routes hide the mismatch.

Fixed by extending critiqueBus.emit to fan out to BOTH sinks: the existing runs.emit transport AND the per-project event-sinks map. The project-events route emits via sse.send(payload.type, payload), so we pack the SSE channel name onto payload.type and let the sink push the right channel. The web sseToPanelEvent overwrites type from the channel name on the way back into a PanelEvent, so the round-trip stays correct.

P2 (prompt gate misalignment). composeSystemPrompt reads cfg.enabled to decide whether to append the panel addendum, but critiqueCfg.enabled is loaded from OD_CRITIQUE_ENABLED only. A run the resolver enabled via phase / project / skill (env unset) would have critiqueShouldRun = true while critiqueCfg.enabled remained false. That dropped the panel prompt while still routing through runOrchestrator, so the parser waited for tags that never arrived and the run degraded. Fixed by passing a derived config { ...critiqueCfg, enabled: true } to the composer when critiqueShouldRun is true. The composer's own gate now agrees with the resolver decision on every input the spec defines.

Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases still green against the new wiring.

* fix: address PerishCode P1 + P2 follow-ups on PR #1338

Two follow-up items PerishCode flagged on the activation PR. Non-blocking but both are real:

1. Phase 11 e2e suite was wired into test:ui:extended but lands the user on '/' (home route) where ProjectView (and therefore CritiqueTheaterMount) is never rendered.
With the suite as written, every assertion would time out the first time the lane runs in CI, contradicting the PR body's claim that the suite stays parked behind test.describe.fixme. The state diverged from my earlier Phase 11 work because the merge from main on commit `4ab719c6` brought in #1307's squash-merged version of the e2e file (the pre-fixme shape). Re-applied test.describe.fixme to the describe block plus removed ui/critique-theater.test.ts from the test:ui:extended script in e2e/package.json. Added a file-header docblock explaining what the follow-up commit needs to do: replace goto('/') with /projects/:id navigation similar to app-design-files.test.ts, split the SSE fixture into a live prefix and terminal suffix (Codex P2 on PR #1320), and commit the first PNG baselines. 2. bestRoundOf in CritiqueTheaterMount returned the LAST round with a numeric composite, not the round with the HIGHEST composite, while bestCompositeOf correctly returned the max. A run that closed round 1 at 8.5 and round 2 at 6.0 would dispatch interrupted { bestRound: 2, composite: 8.5 } on a user-clicked interrupt. Folded the two helpers into a single bestRoundAndComposite that walks state.rounds once and returns the matching pair so the two values cannot drift. The onInterrupt callback now destructures from one helper instead of two independent reads. Falls back to (state.activeRound, 0) when no round has closed with a composite yet. Web typecheck: clean. CritiqueTheaterMount.test.tsx: 7 / 7 cases still green against the new helper. * fix: wire M1 project override end-to-end + correct deferred-surface doc claims (PR #1338) Three lefarcen P2s on the latest review pass, all real: 1. M1 project override was half-wired: the daemon read metadata.critiqueTheaterEnabled but the web setter only wrote localStorage. 
A user opt-in would render the Theater on the web (localStorage was set) while the daemon resolved projectOverride=null and skipped critique unless env / phase already permitted. Two halves talking past each other. Extended setCritiqueTheaterEnabled to accept an optional { projectId, fetchProjectSettings } options bag. When a projectId is supplied, the setter ALSO sends a PATCH /api/projects/:id with { metadata: { critiqueTheaterEnabled } } so the daemon's spawn-time resolver picks the same value up on the next generation. The existing project-routes endpoint already accepts arbitrary metadata patches, so no new endpoint is needed. The local write + the CustomEvent dispatch still fire before the PATCH, so a network failure does not unwind the in-session UI flip. Three new vitest cases pin the new path: PATCHes when projectId is provided, skips when it is not, swallows a rejected PATCH so the in-session UI still flips. 2. Rollout docs (docs/critique-theater.md section 3) claimed the Settings toggle persists into the daemon settings store, but the previous implementation only had a localStorage reader / writer plus a daemon read of project metadata, with no round-trip. Rewrote the section to lead with the four-tier resolver (skill policy / project override / env / phase), document that the setter now round-trips via the existing PATCH endpoint when given a projectId, and call out the Settings panel UI control as a deliberate follow-up. 3. Troubleshooting table pointed users at /api/metrics/critique (Phase 12, deferred) and 'od adapters clear-degraded <id>' (CLI wrapper that does not exist). Replaced the metrics reference with the local conformance harness command (pnpm --filter @open-design/daemon vitest run tests/critique-conformance.test.ts) that ships today, with a note that the Phase 12 dashboard surfaces this status as a series once that PR lands. 
Replaced the CLI command with the programmatic clearDegraded() helper that exists today and flagged the CLI wrapper as planned follow-up.

Web typecheck: clean. Toggle hook tests: 14 / 14 green (11 existing + 3 new for the round-trip path).

* test(web): multi-round interrupt regression for bestRoundAndComposite (PR #1338)

lefarcen P3 follow-up to the previous bestRoundAndComposite fix: the existing CritiqueTheaterMount.test.tsx interrupt cases only exercised a single-round state, so a future refactor back to two independent helpers wouldn't be caught by the test suite even though it'd reintroduce the round / composite drift bug.

Added a regression case that:
1. Drives the reducer through two complete rounds with the full 5-role cast closing at distinct composites: round 1 at 8.5, round 2 at 6.0 (the high-composite round is NOT the most recent one).
2. Clicks Interrupt + waits for the daemon ack via the test seam fetcher returning 204.
3. Asserts the collapsed badge displays "round 1" (the correct best-composite round), and queryByText for "round 2 ... 8.5" returns null (the buggy pairing would have produced that string).

The bestRoundAndComposite helper walks state.rounds in one pass and returns the matching pair, so the round number and the composite cannot drift apart. This test locks the fix in: a refactor that splits the helpers back into independent walks will be caught here.

8 / 8 vitest cases green on the file.

* fix(web): read-merge-write the project metadata in setCritiqueTheaterEnabled (PerishCode P2 on PR #1338)

The previous round-trip sent { metadata: { critiqueTheaterEnabled: next } } as the entire PATCH body. The daemon's project-routes handler only re-stamps three immutable fields (baseDir, importedFrom, fromTrustedPicker) before calling updateProject(db, id, patch), which then does a shallow { ...existing, ...patch } in apps/daemon/src/db.ts.
So patch.metadata replaces the row's metadata wholesale, dropping kind, templateId, linkedDirs, and every other field the rest of the app reads. No in-tree caller passes projectId today (only vitest cases), so the bug had not surfaced yet. But the surface is documented in docs/critique-theater.md section 3 and the function's own JSDoc as the M1 round-trip path, so it would have shipped as a latent footgun for the next integrator: a Settings UI follow-up, or any third party that wires the setter into a project-aware surface. Fix: read-merge-write rather than a bare patch. - GET /api/projects/:id to read the row's current metadata. - Spread that metadata into the PATCH body and overlay critiqueTheaterEnabled: next on top, mirroring the partial-metadata pattern already used in ChatComposer.tsx for linkedDirs. - PATCH the merged object. Failure handling: - GET fails: skip the PATCH entirely. We cannot construct a safe merged body without the current state, and a bare patch would wipe other metadata. The in-session CustomEvent fired earlier in the setter still keeps every mounted hook consistent; the next save retries the round-trip. - PATCH fails: log in dev. The in-session UI is already correct via the CustomEvent. Tests (TDD, red-first): - 'GETs the project then PATCHes with merged metadata when a projectId is supplied': stubs a GET that returns { kind: 'template', templateId: 'modern-blog', linkedDirs: [...] } and asserts the PATCH body equals the merge plus the toggle. - 'PATCHes with just the toggle when the project has no prior metadata': stubs a GET that returns no metadata block. - 'skips the PATCH (does not stomp metadata) when the prefetch GET fails': stubs a rejecting GET and asserts only the GET fires. - 'swallows a rejected PATCH after a successful prefetch': stubs a successful GET and a rejecting PATCH; asserts the in-session UI still flips via the CustomEvent. 
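The read-merge-write flow and its two failure modes described above can be sketched roughly as follows. This is a minimal illustration, not the code in the repository: the function names (`mergeToggle`, `persistCritiqueToggle`), the injected `Fetcher` type, the return labels, and the assumption that the GET response exposes a top-level `metadata` object are all hypothetical.

```typescript
// Hypothetical sketch of a read-merge-write PATCH. All names and response
// shapes here are assumptions for illustration, not the project's real API.
type Metadata = Record<string, unknown>;

interface HttpResponse {
  ok: boolean;
  json(): Promise<unknown>;
}

// Injected so the flow can be exercised without a network.
type Fetcher = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<HttpResponse>;

// Overlay the toggle on top of whatever metadata the row already carries,
// so fields such as kind / templateId / linkedDirs survive the PATCH.
function mergeToggle(current: Metadata, enabled: boolean): Metadata {
  return { ...current, critiqueTheaterEnabled: enabled };
}

async function persistCritiqueToggle(
  projectId: string,
  enabled: boolean,
  fetcher: Fetcher,
): Promise<"patched" | "skipped" | "patch_failed"> {
  const url = `/api/projects/${encodeURIComponent(projectId)}`;

  // Step 1: read. If this fails, skip the PATCH entirely, because a bare
  // patch would replace the stored metadata wholesale.
  let current: Metadata = {};
  try {
    const res = await fetcher(url);
    if (!res.ok) return "skipped";
    const body = (await res.json()) as { metadata?: Metadata } | null;
    current = body?.metadata ?? {};
  } catch {
    return "skipped";
  }

  // Steps 2 and 3: merge, then write. A failed PATCH is reported, not thrown,
  // since the in-session UI state was already updated before this call.
  try {
    const res = await fetcher(url, {
      method: "PATCH",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ metadata: mergeToggle(current, enabled) }),
    });
    return res.ok ? "patched" : "patch_failed";
  } catch {
    return "patch_failed";
  }
}
```

Injecting the fetcher mirrors the test seam the commit message mentions and keeps both failure branches easy to stub.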
Doc updated on the setter's JSDoc to describe the new three-step flow (localStorage, CustomEvent, read-merge-write PATCH) and the two failure modes. Verified: - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web test: 111 files / 1055 tests green (was 1052, +3 from the new merge-flow cases). * fix(web): restore wait-for-daemon-ack pattern on Theater interrupt Same regression as flagged on PR #1316 post-main-merge: the optimistic local dispatch fired before the POST resolved, so a daemon 404 / 409 still terminalized the UI and the real SSE terminal event got ignored by the sticky interrupted phase. Snapshot runId / bestRound / composite at click time, dispatch interrupted only on res.ok, clear interruptPending on rejection or non-2xx so the user can retry. Tests cover rejection + 404 leaving the run on the live stage; the 204 path waits for the ack. * feat(daemon): Critique Theater Phase 12 observability foundations Lands the metrics registry, the structured logger, the /api/metrics route, and the adapter-degraded bump that wires up the first data point. The orchestrator-side bumps for runs / rounds / composite / must-fix / interrupted / parser_errors / protocol_version land in a follow-up commit on this branch (kept separate so the wiring diff reads cleanly against the registry shape). Surfaces added: - apps/daemon/src/metrics/index.ts: 9 Prometheus series under the open_design_critique_* namespace with the histogram buckets the spec calls out (round_duration_ms at 100 / 250 / 500 / 1000 / 2500 / 5000 / 10000 / 30000 / 60000 ms; composite_score at 0-10 integer steps). - apps/daemon/src/logging/critique.ts: 6 typed events, one JSON line per call on stdout, namespaced critique. Matches the JSON-per-line convention cli.ts already uses; no new logger framework. - apps/daemon/src/server.ts: GET /api/metrics route. Honors OD_METRICS_ENDPOINT=disabled to opt out for air-gapped installs. 
- apps/daemon/src/critique/adapter-degraded.ts: markDegraded now bumps degraded_total so the adapter-health dashboard panel reflects every TTL refresh and every fresh mark. Deps: prom-client ^15.1.0, @opentelemetry/api ^1.9.0 added to apps/daemon/package.json. Both are zero-config no-ops without an exporter wired; daemon bundle size impact is ~150 KB uncompressed. The @opentelemetry/api dep is in place ahead of the OTel-spans follow-up commit; it adds no behavior on this commit. Tests: - tests/metrics/critique.test.ts (3 cases): registry shape + exposition text + reset-between-tests - tests/logging/critique.test.ts (4 cases): event shape + ordering + newline framing + namespace stamping Verification (Windows-local): - pnpm --filter @open-design/daemon typecheck: clean - New metrics + logging suites: 7 / 7 green - Existing adapter-degraded + conformance + rollout suites: 22 / 22 green; the bump is non-breaking * feat(daemon): wire Critique Theater metrics + structured logs from the orchestrator Lights up the bump sites the Phase 12 foundations PR registered the series for. Every panel event the parser surfaces now reaches the matching Prometheus counter / histogram and the matching JSON log line on stdout. Switch-loop bumps + logs: - run_started: log run_started, set protocol_version gauge to the observed protocol version (small-integer cardinality). - panelist_open: record the first-open wall-clock per round so round_end can compute round_duration_ms; subsequent opens in the same round leave the start time untouched. - panelist_must_fix: bump must_fix_total with the panelist role. The wire event does not yet carry a dim name, so the label is 'unspecified' for now; a future parser revision can drop in the real dim without a metric rename. - round_end: bump rounds_total, observe composite_score, observe round_duration_ms (current ms minus the tracked start), log round_closed with the composite / mustFix / decision triple. 
- parser_warning (parser-yielded): bump parser_errors_total with the kind label, log parser_recover with kind + position. Orchestrator-side parser warnings (composite_mismatch and duplicate_ship from the daemon-authoritative scoring checks) go through a new emitParserWarning helper so the bus emit, the collectedEvents push, the metric bump, and the log line stay in lockstep. Three inline emission sites collapse to one-line helper calls. After the try/catch, a single terminal-status switch bumps runs_total{status, adapter, skill} once per run, with branch- specific log + counter: - shipped / below_threshold: log run_shipped - interrupted: bump interrupted_total, log run_failed{cause: interrupted} - timed_out: log run_failed{cause: timed_out} - failed: log run_failed{cause: orchestrator_internal} - degraded: log degraded{reason: orchestrator_classified} OrchestratorParams gains optional skill: string for the label; defaults to 'unknown' so spawn sites that have not yet threaded it keep working without a metric shape change. Tests: - The new metrics + logging suites (7 / 7) verify registry shape and event framing; orchestrator-side metric integration is exercised through the existing critique-conformance and critique-adapter-degraded suites (22 / 22 still green). - Logger test reassigns process.stdout.write directly instead of vi.spyOn so the Node overloaded write signature does not collide with MockInstance<unknown>. * feat(observability): Grafana dashboard JSON for Critique Theater Three default rows mapping to the metrics this branch wires up: 1. Fleet quality: composite score p50 / p90 / p99 line graph by adapter, plus a heatmap of the composite distribution. The line graph answers 'are my agents getting better over time'; the heatmap answers 'are the bad runs clustered around one adapter or smeared across the fleet'. 2. Adapter health: stacked bar charts for degraded marks (by adapter / reason) and parser errors (by adapter / kind) over a 5-minute window. 
The two queries together let an operator see 'is this adapter degraded because of malformed wire output or because of oversize blocks' without flipping panels. 3. Brief throughput: runs-per-hour by terminal status, an average rounds-per-run stat per adapter, and a round-duration ms p50 / p90 / p99 line. Throughput numbers fall straight out of the runs_total / rounds_total counters; the duration histogram is the same one the runs feed. The dashboard uses a templated $datasource var (defaults to 'prometheus') so an operator with multiple Prometheus instances can switch without editing JSON. Schema version 39 (Grafana 11). Operators import via: pnpm dlx @grafana/cli dashboard import tools/dev/dashboards/critique.json or paste into a provisioned dashboards directory. The file is checked into the repo as a starting artifact; alert rules and SLO panels ship after the first 1000 runs inform the right thresholds. JSON validates with node -e 'JSON.parse(...)' (sanity checked locally). * feat(daemon): OpenTelemetry outer span around the critique run Wraps each runOrchestrator call in a 'critique.run' span via the existing @opentelemetry/api dep added in the Phase 12 foundations commit. Attributes set on the span: - critique.run_id, critique.adapter, critique.skill at start - critique.final_status, critique.final_composite on terminal resolution - span status flipped to ERROR for failed / timed_out runs so a Tempo / Honeycomb / Jaeger filter on traces.status=error surfaces the right slice without joining back to Prometheus No exporter is wired by default; @opentelemetry/api is the API package and intentionally splits from @opentelemetry/sdk-, so the span is zero-overhead until an operator attaches an SDK through their runtime config. 
Inner per-round / parse_chunk / scoreboard_eval / persist_round / ship.persist spans defined in the Phase 12 plan are a follow-up: the outer span alone gives the trace a duration + final status + adapter/skill labels, which is the 80% value for dashboards that correlate runs across services. Adding child spans inside the existing 600-line orchestrator without restructuring is a separate careful change. Verification: - pnpm --filter @open-design/daemon typecheck: clean - 29 / 29 critique + metrics + logging tests still green fix(nix): bump pnpmDepsHash for prom-client + @opentelemetry/api lockfile bump nix-check failed on PR #1485 with hash mismatch in open-design-daemon-pnpm-deps and open-design-web-pnpm-deps after the Phase 12 foundations commit (`2b8b7445`) added prom-client and @opentelemetry/api to apps/daemon/package.json and refreshed pnpm-lock.yaml. CI reported the new sha: specified: HFLm+8hv3o5x3Xem4MXNsNclIgiVRc70+EBafL0rVn8= got: 7R1sQC38gOT0gsZ2oNOviCZ486cbbGJGJCis6WI8z9s= Both nix files pin the same workspace lockfile, so both flip in lockstep. No other Nix surface changes required. * fix(daemon): four Phase 12 review findings (Codex P2 x2 + Siri-Ray P2 + lefarcen P2) 1. Siri-Ray P2 in orchestrator.ts (round metric / log used untrusted agent values). The new observability path now records rs.composite and rs.mustFix (daemon-authoritative) instead of event.composite and event.mustFix when rs exists, and skips the bumps + log entirely when rs is missing (a degenerate round_end without any matching panelist_open). The dashboard p50 / p90 / p99 now agrees with persistence and ship decisions; an adapter reporting <ROUND_END composite='10'> while the daemon computed 6 logs 6 and still emits the composite_mismatch parser warning the prior block was already producing. 2. Codex P2 in server.ts (skill label always 'unknown'). 
The spawn path called runOrchestrator without passing the resolved skill id, so every live run bumped open_design_critique_{skill='unknown'} and the per-skill dashboard breakdown was always empty. Threaded effectiveSkillId (already computed at the same handler scope as the project skill fallback) through skill: . . . so the metric reflects the real skill when one is assigned, and the orchestrator default of 'unknown' only fires for runs that genuinely have none. 3. Codex P2 in conformance.ts (protocol-version mismatch let through). An adapter that emitted <CRITIQUE_RUN version='2'> followed by a valid SHIP classified as shipped because the harness only watched for terminal events. Added a guard inside the parse loop: if a run_started carries protocolVersion !== CRITIQUE_PROTOCOL_VERSION, mark the adapter degraded with reason 'protocol_version_mismatch' (already in DEGRADED_REASONS) and return early. ConformanceOutcome union widened to accept the new reason. 4. lefarcen P2 in tools/dev/dashboards/critique.json (runs-per-hour panel under-reported by 3600x). 'rate(...[1h])' returns per-second. Multiplied by 3600 so the panel title and unit match the actual value rendered. Verification: - pnpm --filter @open-design/daemon typecheck: clean - New metrics + logging suites (7), existing adapter-degraded (7), conformance (5), rollout (10): 29 / 29 green - Grafana JSON re-parses with node -e 'JSON.parse(...)' feat(daemon): Critique Theater Phase 16 (M-phase rollout ratchet) The PR that takes the rollout out of operator-flips-env-vars-by-hand and into the-fleet-conformance-numbers-decide. Stacks on Phase 12 (#1485): the ratchet reads from the conformance harness's daily output, which only exists once Phase 12's metrics + history surface land. Five surfaces: 1. apps/daemon/src/critique/ratchet.ts (new) Pure evaluator. Takes the current RolloutPhase plus a rolling window of ConformanceDay rows and returns one of three decisions: promote, hold, or demote. 
Spec defaults (14-day window, 0.90 shipped, 0.95 clean-parse) match specs/current/critique-theater.md. Demote floor is half the promote threshold so a single noisy day does not bounce the rollout back; only sustained breakage walks things back. M0 cannot demote and M3 cannot promote, both collapse to hold with an explicit reason string. 2. apps/daemon/src/critique/conformance-history.ts (new) JSON-lines persistence at dataDir/conformance/adapter/date.jsonl. Append-only writer + windowed reader. Last entry per (adapter, date) wins so a retry-after-failure cron writes the right answer without a read-modify-write at write time. Malformed lines, missing files, and missing adapter directories all collapse to skip-this-row since a missing day is data missing, not data wrong. 3. apps/daemon/src/server.ts GET /api/critique/conformance returns { window, decision }. Tunables come from query string (windowDays, shippedThreshold, cleanParseThreshold) with spec defaults. The recommendation does not auto-flip OD_CRITIQUE_ROLLOUT_PHASE; an operator-driven follow-up consumes the JSON and decides whether to flip or alert. 4. .github/workflows/critique-conformance.yml (new) Nightly cron at 03:00 UTC. Builds the daemon, drives the conformance harness against the synthetic-good and synthetic-bad fixtures, and uploads the .od/conformance/ snapshot as a workflow artifact. The schedule sits outside the busy generation window so the cron does not contend with user runs for adapter rate-limit budgets. 5. apps/daemon/tests/critique-ratchet.test.ts + critique-conformance-history.test.ts 17 cases. Ratchet: 10 cells of the promote / hold / demote matrix. History: 7 round-trip cases. Verification: - pnpm --filter @open-design/daemon typecheck: clean - 17 / 17 new tests green - Phase 12 metrics + logging + adapter-degraded + conformance + rollout suites (29) untouched and still green * fix(daemon): three Phase 16 review findings (Codex P1/P2 + lefarcen P1 x3) 1. 
Duplicate parseRolloutPhase import in server.ts. The new standalone import collided with the existing grouped import; ESM would fail to parse at module load on every daemon startup path. Removed the standalone import; the grouped one already exports parseRolloutPhase. 2. Validation gap in evaluateRollout. A request like ?windowDays=0 fed passingDays >= windowDays = 0 >= 0 = true, returning promote with zero observed days. Now the evaluator rejects non-positive windowDays and out-of-range thresholds at the function entry with an explicit hold reason. The route also clamps query strings before they reach the evaluator (belt + suspenders so a future caller bypassing the route hits the same defense). 3. Missing nightly runner. The workflow called apps/daemon/src/critique/__fixtures__/run-nightly.ts, which the prior PR did not actually add, and || echo masked the failure. Added the runner: drives every synthetic adapter through runAdapterConformance, walks the resulting events for parser_warning to compute cleanParseRate, and writes one ConformanceDay row per adapter via appendConformanceDay. Removed the || echo mask so the workflow fails loudly when the runner throws. Tests for the validation fix: four new ratchet cases (windowDays=0 holds with no evidence, windowDays=-7 holds, shippedThreshold > 1 holds, cleanParseThreshold < 0 holds). Ratchet suite goes from 10 -> 14 cases. Verification: - pnpm --filter @open-design/daemon typecheck: clean - 33 / 33 critique tests green (14 ratchet, 7 conformance-history, 7 adapter-degraded, 5 conformance) * test(daemon): explicit NaN regression cases for the ratchet evaluator (PerishCode follow-up on PR #1499) The Number.isFinite() guard already rejects NaN on every numeric input, so this is belt-and-suspenders: pinning the behavior so a future refactor of the guard (a typed parser, a clamp helper, a relaxed range check) cannot accidentally let NaN through and surface a zero-evidence promote signal. 
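To make the validation gap concrete, here is a hedged sketch of a pure ratchet evaluator with entry guards. The spec defaults (14 days, 0.90, 0.95), the half-threshold demote floor, and the M0 / M3 edge rules come from the commit messages above; the field names (`shippedRate`, `cleanParseRate`), the reason strings other than the 'invalid X' pattern, and the rule that demotion needs a full window of below-floor days are assumptions made for illustration.

```typescript
// Hypothetical sketch of a promote / hold / demote evaluator. Field names and
// the exact demote rule are assumptions; only the defaults and the guard
// behavior are taken from the commit text.
type RolloutPhase = "M0" | "M1" | "M2" | "M3";
type Decision = { decision: "promote" | "hold" | "demote"; reason: string };

interface ConformanceDay {
  date: string;
  shippedRate: number;
  cleanParseRate: number;
}

interface Tunables {
  windowDays: number;
  shippedThreshold: number;
  cleanParseThreshold: number;
}

const SPEC_DEFAULTS: Tunables = {
  windowDays: 14,
  shippedThreshold: 0.9,
  cleanParseThreshold: 0.95,
};

const hold = (reason: string): Decision => ({ decision: "hold", reason });
const isRate = (x: number): boolean => Number.isFinite(x) && x >= 0 && x <= 1;

function evaluateRollout(
  phase: RolloutPhase,
  days: ConformanceDay[],
  overrides: Partial<Tunables> = {},
): Decision {
  const t: Tunables = { ...SPEC_DEFAULTS, ...overrides };

  // Entry guards: without these, windowDays=0 makes `passing >= windowDays`
  // vacuously true and yields a promote with zero observed days.
  if (!Number.isFinite(t.windowDays) || t.windowDays < 1) return hold("invalid windowDays");
  if (!isRate(t.shippedThreshold)) return hold("invalid shippedThreshold");
  if (!isRate(t.cleanParseThreshold)) return hold("invalid cleanParseThreshold");

  const windowDays = Math.floor(t.windowDays);
  const recent = days.slice(-windowDays);

  const passing = recent.filter(
    (d) => d.shippedRate >= t.shippedThreshold && d.cleanParseRate >= t.cleanParseThreshold,
  ).length;
  // Demote floor is half the promote threshold (assumed to apply per day).
  const belowFloor = recent.filter(
    (d) => d.shippedRate < t.shippedThreshold / 2 || d.cleanParseRate < t.cleanParseThreshold / 2,
  ).length;

  if (passing >= windowDays) {
    return phase === "M3"
      ? hold("already at M3; cannot promote")
      : { decision: "promote", reason: "full window above thresholds" };
  }
  if (belowFloor >= windowDays) {
    return phase === "M0"
      ? hold("already at M0; cannot demote")
      : { decision: "demote", reason: "full window below demote floor" };
  }
  return hold("window not conclusive");
}
```

Because the guards sit inside the evaluator rather than only in the route, a caller that bypasses the HTTP layer still gets a hold instead of a zero-evidence promote.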
Three new assertions inside one case (windowDays=NaN, shippedThreshold=NaN, cleanParseThreshold=NaN), each asserting hold + the matching 'invalid X' reason string. Ratchet suite goes from 14 -> 15 cases. * fix(nix): regenerate lockfile + pin pnpmDepsHash for prom-client + @opentelemetry/api (lefarcen P1 on PR #1499) --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-14 11:05:57 +08:00
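The conformance-history behavior from the Phase 16 commit above (append-only JSON lines, last entry per (adapter, date) wins, malformed lines skipped) could look roughly like the reader below. It is a sketch over an in-memory string rather than the real file layout; the function name `readConformanceRows` and the row fields are assumptions, not the repository's actual types.

```typescript
// Hypothetical sketch of a "last entry wins" JSON-lines reader. The row shape
// and function names are assumptions for illustration.
interface ConformanceDay {
  adapter: string;
  date: string; // ISO date, e.g. "2026-05-14"
  shippedRate: number;
  cleanParseRate: number;
}

function isConformanceDay(x: unknown): x is ConformanceDay {
  if (typeof x !== "object" || x === null) return false;
  const r = x as Record<string, unknown>;
  return (
    typeof r["adapter"] === "string" &&
    typeof r["date"] === "string" &&
    typeof r["shippedRate"] === "number" &&
    typeof r["cleanParseRate"] === "number"
  );
}

function readConformanceRows(jsonl: string): ConformanceDay[] {
  const latest = new Map<string, ConformanceDay>();
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let row: unknown;
    try {
      row = JSON.parse(trimmed);
    } catch {
      continue; // malformed line: treat as missing data, not as an error
    }
    if (!isConformanceDay(row)) continue;
    // Later lines overwrite earlier ones, so a retried cron run can simply
    // append without a read-modify-write at write time.
    latest.set(row.adapter + "\u0000" + row.date, row);
  }
  return Array.from(latest.values()).sort((a, b) =>
    a.date < b.date ? -1 : a.date > b.date ? 1 : a.adapter < b.adapter ? -1 : a.adapter > b.adapter ? 1 : 0,
  );
}
```

Resolving duplicates at read time keeps the writer a plain append, which is the property the commit message calls out.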
lefarcen	53997990b7	Merge origin/main (post-0.7.0) into reconciled garnet branch Second-pass merge layering 41+ new commits from origin/main on top of the first reconcile commit. Headline upstream additions absorbed: - 0.7.0 release: redesigned chat bubble user-text styling, neutralised palette, lucide icons, ElevenLabs audio voice option discovery in the prompt composer, analytics tracking (PostHog) wired across home / studio / create surfaces, Prometheus `/api/metrics` endpoint, critique-theater drop-in mount with a settings toggle. - Misc upstream fixes (titlebar padding, release header layout, deck preview chrome, feedback form auto-scroll, conversation-created SSE on routine runs, etc.) Conflict resolutions (12 files, ~22 hunks): - contracts barrel + prompts/system: union of both sides; new analytics exports (`./analytics/events`, `./analytics/public-params`) added alongside garnet's plugin/atom/genui exports. Both ElevenLabs voice fields (audioVoiceOptions/audioVoiceOptionsError, main) and pluginBlock/activeStageBlocks (garnet) preserved on ComposeInput. - daemon/server.ts: Prometheus `/api/metrics` route inserted after garnet's `/api/daemon/shutdown`. main's `createAnalyticsService` call added before the chat-run service init alongside the prior reconcile note about the dropped legacy POST /api/projects body. - App.tsx: handleCreateProject now consumes both garnet's plugin fields (pluginId / appliedPluginSnapshotId / pluginInputs / autoSendFirstMessage) and main's analytics requestId. Tracking fires success + failure paths; PluginLoopHome auto-send sessionStorage flag is preserved. - ProjectView.tsx: the garnet auto-send useEffect coexists with main's `useCritiqueTheaterEnabled()` hook. - ChatComposer.tsx: imports merged (drop now-unused fetchSkills, add analytics provider + tracking + buildVisualAnnotationAttachment). 
- index.css: main's redesigned `.msg.user .user-text` chat bubble styling wins over garnet's plain text rule; garnet's `.msg-plugin-chip*` rules preserved alongside. - EntryView.tsx: accepted HEAD (garnet wrapper) — consistent with reconcile decision #2. main's added PetRail / TopTab / analytics view tracking is intentionally NOT brought into the wrapper; the follow-up to re-integrate PetRail / image-templates / video-templates into EntryShell still stands and now also covers analytics view-tracking hooks. - daemon/package.json + pnpm-lock: merged dep set (tar + posthog-node + prom-client coexist). - Test fixtures (FileWorkspace.test): kept garnet's plugin-folders describe block intact; main's projectKind="prototype" addition is dropped where it conflicted with garnet's plugin-folder fixture files. Verification: `pnpm install` (after lockfile reconciled), `pnpm typecheck` exits 0 across all workspace packages. Follow-up not done in this commit: - PetRail / image-templates / video-templates / 0.7.0 analytics view-tracking hooks need to be added to EntryShell. - Critique-theater settings toggle UX (added on main) lives in the SettingsDialog hierarchy; the reconcile state preserves the SettingsDialog so this should work without changes, but no end-to-end verification yet.	2026-05-13 23:29:56 +08:00
lefarcen	d3602be666	Merge origin/main into garnet-hemisphere (reconcile) Merge of `origin/main` (`03ed3960`, 2026-05-13 pre-0.7.0) into the 161-commit garnet-hemisphere line, reconciling the product-vibe-coded plugin/marketplace/EntryShell surfaces from garnet with the routines / skills / live-artifacts feature work landed on main since the fork point. Headline decisions (full rationale + side-by-side screenshots in `specs/change/20260513-garnet-skills-automations/reconcile-result-vs-garnet.md`): - #1 SettingsDialog: keep main's Memory / Skills / External MCP / Connectors / Routines / MCP server nav items even though the top-level /integrations + /automations routes also cover them. Two entries coexist for now; revisit once Track A/B fill in the placeholder content. - #2 EntryView: accept garnet's thin wrapper delegating to EntryShell. Main's PetRail sidebar + image-templates/video-templates tabs are intentionally deferred to a follow-up that re-integrates them into the new EntryShell layout. - #3 /integrations + /automations top-level routes: kept (garnet's product intent). Skills tab is still a "Coming soon" placeholder awaiting Track A; Routines/Schedules/Live-artifacts cards on /automations are still mock awaiting Track B. - #5 DesignFilesPanel: hybrid — main's pagination as primary list, garnet's Plugin folders section preserved between the live-artifacts block and the pagination block. (by-kind sections drop in favour of pagination; plugin-folders rendering stays because it is a garnet-specific product addition.) - #7 server.ts (10 hunks, ~5400 conflict lines): manual hunk-by-hunk merge. Both daemon admin routes + plugin/genui routes (garnet) and routines/memory/skills upgrades (main) preserved. Garnet's inline project route block kept alongside main's `registerProjectRoutes` / `registerProjectUploadRoutes` modular wiring — duplicate route audit is a follow-up. 
Garnet's POST /api/projects plugin-snapshot resolution + default-scenario fallback is intentionally dropped from the inline body (now handled by registerProjectRoutes) and listed for follow-up re-integration into `project-routes.ts`. Verification (worktree at /Users/elian/Documents/open-design-garnet): - `pnpm typecheck` exits 0 across all workspace packages - daemon (`pnpm tools-dev run web --namespace reconcile-shots`) boots, serves `/api/daemon/status` healthy, and survives a Playwright walkthrough of /integrations / /automations / home / projects / design-systems / plugins / settings dialog - `@open-design/plugin-runtime` package built (was missing dist/ on garnet); without it the daemon's plugins/* imports fail at boot Track A (Skills tab → real SkillsSection) and Track B (Automations cards → real routines / live-artifacts backend) are the two remaining follow-ups blocking the placeholder/mock content from going live. See `spec.md` and `track-skills.md` in the same directory.	2026-05-13 22:29:21 +08:00
lefarcen	5172e37217	Merge origin/main into release/v0.7.0 to prepare merge-back PR Resolves 7 conflicts via hybrid strategy: - apps/web/src/components/EntryView.tsx: take main (Discord+X pills are forward feature) - apps/web/src/components/Icon.tsx: take main (switch-case refactor) - apps/web/src/components/NewProjectPanel.tsx: take release (preserve #1514 dropdown UX validated in 0.7.0 acceptance) - apps/web/src/index.css: take main (project-target-platforms / instructions chip styles) - apps/web/tests/components/FileViewer.inspect-empty-hint.test.tsx: accept main's deletion - nix/package-daemon.nix, nix/package-web.nix: take main pnpmDepsHash Non-conflicting hunks from #1519 (AppChromeHeader), #1428 (PostHog analytics call sites), and #1540 (release light background) are preserved via auto-merge.	2026-05-13 18:19:47 +08:00
lefarcen	6341b2677a	docs(pr): require user-perspective description and surface area (#1520 ) * docs(pr): require user-perspective description and surface area The previous template asked for Summary + Validation, which encouraged code-perspective descriptions and let user-visible surface changes slip past review unnoticed. Replace with: - "Problem" — issue link + motivation - "What users will see" — first-person user-visible effect - "Surface area" — 9-item checklist (UI, shortcut, CLI/env, API/contract, extension point, i18n, top-level dependency, default behavior change, none) - "Screenshots" — required when UI surface is checked, focused on the entry point users discover rather than the feature in isolation - "Validation" — kept, retitled away from "Summary" Authoritative rules added to AGENTS.md under a new "Pull request expectations" section so external contributors' agents (Claude Code, Cursor, etc.) pick up the requirement when reading the repo. CONTRIBUTING.md gets one pointer line in "Commits & pull requests"; localized CONTRIBUTING variants (zh-CN, de, fr, ja-JP, pt-BR) are left for follow-up translation PRs per the existing docs-update workflow. The existing "Fixes #" prompt is preserved verbatim — that template addition from #1263 enforces PR-to-issue auto-linking and stays load-bearing. * docs(pr): broaden dependency surface to dev deps as well The "New top-level dependency" checkbox narrowed scope to runtime deps, but CONTRIBUTING.md L239 says "No new top-level dependencies" without limiting to runtime, and the AGENTS.md rule uses the same broad phrase. A new devDependency (tool/test/build package) belongs in the same bytes-vs-benefit explanation, so the checklist item should match. * docs(pr): add Why and Bug fix verification; scope deps to root package.json Three follow-up tweaks from review feedback: 1. 
Rename `## Problem` to `## Why` with a broader prompt that asks contributors to cover both their own use case (what made them write this PR today) and the pain being addressed. The old "Problem" framing only covered user-facing motivations and left no slot for the contributor's stake — a key signal for triaging external PRs. 2. New `## Bug fix verification` section between Screenshots and Validation, conditional on the PR being a bug fix. Surfaces the AGENTS.md "Bug follow-up workflow" red-spec requirement at PR-authoring time instead of leaving it implicit; asks for the test path and the red-on-main / green-on-branch confirmation. 3. Clarify the "New top-level dependency" checkbox to specify the root `package.json`. Without that word, contributors in a monorepo could read the check as applying to any workspace `package.json` (e.g. adding `react` to `apps/web/package.json` would be in scope) when CONTRIBUTING.md L239's "small on purpose" rule clearly meant root-level deps only. AGENTS.md `## Pull request expectations` and CONTRIBUTING.md's pointer line are updated to match the new section names and add the bug-fix red-spec expectation.	2026-05-13 15:28:05 +08:00
lefarcen	e1bc83a476	feat(analytics): PostHog product analytics (P0 events, consent-gated, packaged) (#1428 ) * feat(analytics): scaffold PostHog product-analytics integration - Add @open-design/contracts/analytics subpath with the 17 P0 event payload types, header constants, and code↔CSV enum mapping helpers. - Add apps/daemon/src/analytics.ts with env-gated posthog-node client, request-scoped analytics context reader, and artifact-id anonymizer. - Expose GET /api/analytics/config so the web bundle never embeds the PostHog key at build time; daemon owns POSTHOG_KEY / POSTHOG_HOST. - Add apps/web/src/analytics module (identity + lazy posthog-js client + React provider) and mount it under <I18nProvider> in app/layout. No event wiring yet — that lands in the next commit alongside trigger points (App.tsx, EntryView, NewProjectPanel, SettingsDialog, FileViewer, runs.ts). * feat(analytics): wire app_launch, home_view, home_click, project_create_result - App.tsx: fire app_launch once after first effect tick. handleCreateProject now emits project_create_result on both success and failure paths. - EntryView.tsx: home_view (page) gated on agents loading so has_available_cli isn't transiently false; home_view (asset_panel) fires per top-tab change with the right result_count. - NewProjectPanel.tsx: home_click create_button fires before delegating to the parent; a fresh request_id is generated here and threaded through onCreate so the matching project_create_result stitches via $insert_id. - contracts/analytics: tighten createTabToTracking and topTabToTracking for the worktree branch's renamed tabs (live-artifact, templates). * feat(analytics): wire settings_view + 3 settings_click events - settings_view fires on dialog mount and on every section switch, carrying the active section (mapped via settingsSectionToTracking for the 16-section worktree layout), execution_mode, and the selected CLI provider id when present. 
- settings_click execution_mode_tab: setMode now emits before/after values whenever the user toggles between Local CLI and BYOK. - settings_click cli_provider_card: agent card onClick reports cli_provider_id via agentIdToTracking (kiro → other). - settings_click byok_field: onFocus added to api_key, model select, and base_url inputs; provider_id widened to include google so the worktree's Gemini protocol slot type-checks. * feat(analytics): wire studio_view + studio_click chat, studio_view artifact - packages/contracts/src/analytics/artifact-id.ts: FNV-1a 64-bit helper produces a 16-hex anonymized id for (projectId, fileName). Stable cross-platform so the daemon and the web bundle resolve the same id without a Web Crypto round-trip; daemon now re-exports it. - ChatComposer: studio_view chat_panel fires once per project mount, studio_click chat_composer fires on attachment + send buttons with estimated user_query_tokens (length/4) and has_attachment. - FileViewer: studio_view artifact fires once per (project, file) at the dispatcher level, before any sub-viewer renders, with artifact_kind derived from the renderer registry / file.kind table. - Widen TrackingExportFormat to include markdown and cloudflare_pages so the worktree branch's full share menu can emit verbatim. * feat(analytics): wire studio_click share_option + artifact_export_result HtmlViewer's share menu now emits both events per click via a fireShareExport helper: - studio_click share_option fires immediately on click with the chosen export_format and a fresh request_id. - artifact_export_result fires when the export resolves — success for sync exporters (html, markdown, template) the moment the call returns, success/failed for async exporters (pdf, zip, deploy) via .then/.catch. The same request_id threads both events so PostHog stitches click → result via $insert_id. DEPLOY_PROVIDER_OPTIONS maps to the CSV's vercel / cloudflare_pages slots; markdown is now a first-class export_format value. 
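For readers unfamiliar with the FNV-1a 64-bit helper mentioned above, a minimal sketch follows. The hash constants are the standard FNV-1a 64-bit offset basis and prime; how the real helper combines `projectId` and `fileName` is not stated in the commit, so the NUL separator and the function names (`fnv1a64Hex`, `anonymizeArtifactId`) are assumptions for illustration only.

```typescript
// Hypothetical sketch of a 16-hex FNV-1a 64-bit id. The input framing
// (projectId + NUL + fileName) is an assumption, not the project's scheme.
const FNV_OFFSET_BASIS = BigInt("0xcbf29ce484222325");
const FNV_PRIME = BigInt("0x100000001b3");

function fnv1a64Hex(input: string): string {
  let hash = FNV_OFFSET_BASIS;
  // Hash UTF-8 bytes so the result does not depend on platform string encoding.
  new TextEncoder().encode(input).forEach((byte) => {
    hash ^= BigInt(byte); // FNV-1a: xor first...
    hash = BigInt.asUintN(64, hash * FNV_PRIME); // ...then multiply, mod 2^64
  });
  return hash.toString(16).padStart(16, "0");
}

function anonymizeArtifactId(projectId: string, fileName: string): string {
  // The separator avoids ("ab", "c") and ("a", "bc") colliding trivially.
  return fnv1a64Hex(projectId + "\u0000" + fileName);
}
```

A pure arithmetic hash like this runs synchronously in both Node and the browser, which matches the stated motivation of avoiding a Web Crypto round-trip. Note that FNV-1a is not a cryptographic hash, so the id obscures rather than protects the underlying names.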
Also ignore .env.local so local POSTHOG_KEY / .env-style secrets don't get committed. * feat(analytics): emit run_created and run_finished from the daemon POST /api/runs now reads the analytics context off the x-od-analytics-* headers the web client sets on every fetch, then: - Captures run_created with project_id, conversation_id, run_id, model_id, agent_provider_id (mapped via agentIdToTracking), skill_id, design_system_id, plus the token_count_source marker. - Schedules a run_finished capture on runs.wait(run) resolution, mapping succeeded/canceled/failed to success/cancelled/failed and reporting total_duration_ms. Both events use a stable insert_id derived from the same uuid so PostHog dedupes the daemon-side mirror against any future web-side capture without double-counting. Token sub-fields (user_query_tokens/system_prompt_tokens/...) stay omitted in v1, because the claude-stream parser only exposes input/output totals today. See tracking-doc-issues.md §3.2. * feat(analytics): emit settings_cli_test_result + settings_byok_test_result The original BLOCKING-list assumed these CSV P0 events were not implementable in this branch because main lacked Test buttons. The worktree HEAD actually wires `handleTestAgent` and `handleTestProvider` in SettingsDialog, so both events are now in scope. - handleTestAgent emits settings_cli_test_result on success and failure paths with cli_provider_id mapped via agentIdToTracking, result drawn from result.ok / catch branch, error_code from result.kind or the thrown error name, and duration_ms timed via performance.now(). - handleTestProvider emits settings_byok_test_result analogously, using apiProtocol (anthropic|openai|azure|ollama|google) directly as provider_id. This is wider than the CSV's 5-value enum and is documented in tracking-doc-issues.md §2.5. Contracts: add SettingsCliTestResultProps / SettingsByokTestResultProps plus matching track* helpers. AnalyticsEventName union now covers all 14 P0 events this branch supports. 
* feat(analytics): gate PostHog on the existing telemetry.metrics consent The integration now reuses the same first-launch privacy banner + Settings → Privacy toggle that gates Langfuse, so a single user decision controls both telemetry sinks. - /api/analytics/config now consults the persisted AppConfigPrefs: it returns enabled=true only when POSTHOG_KEY is set AND the user has chosen "Share usage data" (telemetry.metrics === true). The response also echoes installationId so the web client uses the same anonymous id Langfuse keys off of — one identity per install, shared across both sinks. - Web AnalyticsProvider: - Bootstrap fetch resolves installationId and threads it through the x-od-analytics-anonymous-id header on every /api/* fetch, so daemon-side captures (run_created / run_finished / project_create_result) land on the same person record. - Exposes a setConsent(granted) method that calls posthog-js's opt_in_capturing / opt_out_capturing, wired from App.tsx via a useEffect watching config.telemetry?.metrics. Toggling Privacy → metrics now stops/resumes events immediately, no reload. - app_launch additionally gates on telemetry.metrics so a freshly- declined user fires nothing, and a freshly-opted-in user fires on the next reload. * feat(packaging): bake POSTHOG_KEY into packaged daemon spawn env Wires PostHog product analytics through the same Langfuse-style build- secret pipeline so official Open Design builds ship with the key while fork builds compile without it (the integration short-circuits cleanly when POSTHOG_KEY is absent). tools/pack - resolveToolPackConfig reads POSTHOG_KEY / POSTHOG_HOST from process.env at packaging time, validates them (no whitespace in the key, http(s) URL for host, trailing-slash strip), and stamps them on ToolPackConfig. Fork builds without the env vars simply omit the fields; the daemon-side gate keeps things off in that case. 
- Mac, Windows, and Linux packaged-config writers each append the two fields to open-design-config.json next to the existing telemetryRelayUrl entry. apps/packaged - RawPackagedConfig / PackagedConfig surface posthogKey / posthogHost so the Electron entry and headless entry both forward them to the daemon sidecar. - buildPackagedDaemonSpawnEnv emits POSTHOG_KEY / POSTHOG_HOST into the daemon child env when present. The daemon's existing analytics module reads these via process.env — no daemon-side changes needed. - The headless packaged path falls back to process.env for fields the builder hasn't injected, mirroring how OPEN_DESIGN_TELEMETRY_RELAY_URL is read there. CI - release-beta.yml and release-stable.yml expose POSTHOG_KEY (secret) and POSTHOG_HOST (var) at workflow-env scope so every packaging job inherits them. PR / fork builds without these set simply skip the bake step. Tests - tools/pack: config.test.ts covers bake-through, fork-build omission, whitespace rejection, invalid-URL rejection, and trailing-slash normalization. - apps/packaged: sidecars.test.ts covers buildPackagedDaemonSpawnEnv forwarding the keys when present and omitting them when null. * feat(analytics): enable PostHog autocapture + perf + exceptions Flip on the PostHog SDK's automatic diagnostic features so we capture click paths, page transitions, web vitals, dead clicks, and browser exceptions without scattering instrumentation through the codebase. Privacy defense lives in one place — apps/web/src/analytics/scrub.ts — wired in via posthog-js's `before_send` hook so every outgoing event passes through the same audit point: - $autocapture / $rageclick / $dead_click / $copy_autocapture: strips $el_text and value/placeholder/aria-label attrs from any input, textarea, password input, or contenteditable element. 
PostHog autocapture does not capture input.value by default, but $el_text on a <textarea> reflects the typed content — that's the prompt body for us, so it has to be scrubbed every time. - $pageview / $pageleave: drops query string and fragment from $current_url / $referrer so any future ?q=… can't leak. - $exception: rewrites file:// and absolute filesystem paths in stack frames to app://apps/<repo-relative> so we don't ship the user's home directory. - Suppresses $opt_in entirely — duplicate of our explicit setConsent toggle in App.tsx. Element-level defense in depth is limited to the single most sensitive surface: the chat composer textarea gets `ph-no-capture` so PostHog never even generates an event for clicks inside that subtree. Every other input relies on scrub.ts — sprinkling the class through every form would be noisy and easy to forget on new surfaces. The existing Privacy → "Share usage data" toggle continues to gate every new feature: posthog-js's opt_out_capturing() halts autocapture, $pageview, $exception, web vitals, and dead clicks alongside the explicit capture() calls — one global switch. 11 unit tests pin the scrub rules in apps/web/tests/analytics-scrub.test.ts. * ci(nix): bump pnpmDepsHash for posthog-js + posthog-node additions Adding posthog-js to apps/web and posthog-node to apps/daemon changed pnpm-lock.yaml, which Nix's fixed-output pnpmDeps derivation pins by sha256. The CI nix flake check failed with: specified: sha256-KF3Mld72/iau+pJmA7HvnanRx8VLtDP0N624SKrtrrc= got: sha256-PGFgX4lYyeH2TRAXfUq52A3EOa6bb1gO59hPsXhEk3s= Copy the new hash into both nix/package-web.nix and nix/package-daemon.nix per the procedure documented in nix/README.md §"First-build hash pinning". * feat(analytics): unify PostHog identity with Langfuse installationId PostHog's distinct_id is the installationId stamped by /api/analytics/ config; Langfuse already reads the same id off app-config.json to populate trace.userId. 
With both sinks keying off the same anonymous identity, dashboards can correlate user actions (PostHog events) with LLM runs (Langfuse traces) without re-identifying. Two gaps closed: 1. applyConsent(false) — clear posthog-js's persisted ph__posthog localStorage entry on opt-out via posthog.reset(). Without this, a user who opts out, then clicks Delete my data, then re-opts in would see PostHog stitch their new session to the deleted identity because bootstrap.distinctID only takes effect on first init. 2. applyIdentity(newInstallationId) — Delete my data rotates the installationId in app-config; App.tsx now watches config.installationId and calls posthog.reset() then identify(newId) so the next event batch is fully decoupled from the deleted one. Idempotent on same-id re-renders so benign config refreshes don't churn PostHog identities. The fetch wrapper's x-od-analytics-anonymous-id header also flips to the new id on rotation so daemon-side captures (run_created / run_finished) land on the same person record from the very next API call, not after a reload. The end-to-end rotation flow is verified against a live PostHog project; these unit tests pin the safety guards (no-client paths, null inputs) since stubbing posthog-js's init-loaded callback chain is brittle. fix(langfuse): require both metrics AND content consent for trace reports Tightens the Langfuse gate so a user who shares anonymous metrics but NOT conversation content stops emitting Langfuse traces entirely — Langfuse is used for turn-quality evals which only make sense with prompt/output bodies. PostHog (product analytics, content-free) stays gated on `metrics` alone and is unaffected. i18n: "Conversation content" → "Conversation and tool content" with hints expanded to mention tool inputs/outputs so the consent surface matches what the trace actually carries (en + zh-CN). 
Bundled here per PR scope — change originated outside this PostHog PR but lands cleanly on the same files; gating Langfuse strictly on `content` makes the dual-sink consent model (PostHog = metrics, Langfuse = metrics + content) symmetric across both i18n locales and the daemon-side gate. * feat(analytics): wire byok_provider_option + fix PR review P1s Adds the BYOK protocol-chip click event (5-value provider_id mirroring the apiProtocol Settings UI) and resolves four P1 review threads on PR #1428. byok_provider_option: - New SettingsClickByokProviderOptionProps in contracts (provider_id = anthropic\|openai\|azure\|google\|ollama; maps to CSV's 5 values per tracking-doc-issues.md §2.5). - trackSettingsClickByokProviderOption helper in apps/web/src/analytics. - SettingsDialog hooks it on the protocol-chip onClick alongside the existing setApiProtocol call; is_selected reflects whether the chip was already active. Review fixes: 1. client.ts (Siri-Ray): clear `initPromise` when the resolution is null so a Privacy → metrics opt-in after a previous decline triggers a fresh /api/analytics/config fetch. Without this, the disabled response was cached forever — first-session opt-in needed a reload to start sending PostHog events. 2. provider.tsx (Siri-Ray): replace `url.includes('/api/')` with a strict same-origin + /api/ pathname check (shared `isSameOriginApiCall` helper). Outbound third-party URLs containing `/api/` (e.g. provider.example.com/api/x) no longer receive our x-od-analytics-* headers. 3. provider.tsx (codex-connector, lefarcen): gate header injection on `resolvedAnonId` being non-null. When Privacy → metrics is off, /api/analytics/config returns enabled=false → resolvedAnonId stays null → wrapper never installs → daemon can't read consent-bearing headers → no daemon-side PostHog event. setConsent now also clears resolvedAnonId on opt-out and re-fetches on opt-in. 4. 
daemon/analytics.ts (defense in depth): createAnalyticsService now takes dataDir and capture() re-reads app-config to check telemetry.metrics inside the fire-and-forget wrapper. Even if a stale header somehow reaches the daemon after opt-out, the capture is dropped before posthog-node.capture is called. * fix(web): place "Share usage data" on the right in privacy consent banner Swap button order in PrivacyConsentModal and the in-settings ConsentCard so the affirmative "Share usage data" lands on the right and "Not now" on the left. Matches the OK-on-the-right pattern users expect for primary actions. Both buttons keep equal visual prominence (same .privacy-consent-action styling) so the swap doesn't change the EDPB equal-prominence stance called out in the original Langfuse telemetry spec. * feat(analytics): populate run_finished token totals from claude-stream usage Daemon's claude-stream parser already emits agent usage events with input_tokens / output_tokens totals; the run service buffers them in run.events and Langfuse reads them out the same way. The run_finished PostHog event was leaving these fields empty. Scan run.events for the most recent agent usage frame on terminal transition and emit input_tokens / output_tokens / total_tokens when present. token_count_source flips to 'provider_usage' only when at least one count landed; runs without provider-side usage data keep 'unknown'. Provider does not break the input down into the 7 sub-fields the tracking doc lists (memory / context / attachment / system_prompt / …); those stay omitted until a parser change exposes them. * feat(analytics): estimate user_query_tokens from prompt length The user_query_tokens field for run_created / run_finished was hardcoded to 0. We can't tokenize without bundling a model-specific tokenizer, but the character/4 heuristic is the industry-standard estimate when one isn't available and is enough for funnel analysis (prompt-length cohorts, short-vs-long-query conversion rates). 
Extracted from req.body via the same telemetryPromptFromRunRequest pattern the daemon already uses for langfuse-bridge (currentPrompt then message fallback). Only the integer count goes to PostHog — the prompt text itself never leaves the daemon. token_count_source flips appropriately: - run_created with a prompt: 'estimated' (was 'unknown') - run_created with no prompt: 'unknown' - run_finished with provider usage: 'provider_usage' (overrides baseProps' 'estimated' value) - run_finished without provider usage: inherits 'estimated' or 'unknown' from baseProps so input/output absent doesn't mask the estimate.	2026-05-12 22:32:42 +08:00
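The token accounting described at the end of this commit can be sketched roughly as follows. This is a minimal illustration, not the daemon's code: the function names, the `ProviderUsage` shape, and the round-up rule are assumptions. Only the length/4 heuristic and the three `token_count_source` values come from the commit message.

```typescript
// Sketch only: names and the rounding rule are assumptions, not the daemon's real code.
type TokenCountSource = "unknown" | "estimated" | "provider_usage";

interface ProviderUsage {
  input_tokens?: number;
  output_tokens?: number;
}

// character/4 heuristic; rounding up is an assumption (the commit only says "length/4").
function estimateUserQueryTokens(prompt: string | undefined): number {
  if (!prompt) return 0;
  return Math.ceil(prompt.length / 4);
}

// run_created: 'estimated' when a prompt is present, otherwise 'unknown'.
function sourceForRunCreated(prompt: string | undefined): TokenCountSource {
  return prompt !== undefined && prompt.length > 0 ? "estimated" : "unknown";
}

// run_finished: provider usage wins when at least one count landed;
// otherwise the value computed for run_created is inherited unchanged.
function sourceForRunFinished(
  base: TokenCountSource,
  usage: ProviderUsage | undefined
): TokenCountSource {
  const hasCount =
    usage !== undefined &&
    (typeof usage.input_tokens === "number" ||
      typeof usage.output_tokens === "number");
  return hasCount ? "provider_usage" : base;
}
```

Only the integer returned by `estimateUserQueryTokens` would be attached to the event; the prompt string itself is never part of the payload in this sketch.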
Joey-nexu	5077a1cd38	feat(landing-page): split catalog into per-facet pages + auto-deploy on content changes (#1158 ) * feat(landing-page): split catalog into per-facet pages + auto-deploy on content changes Convert the single-page landing into a content-driven multi-page site sourced directly from the canonical Markdown bundles in the repo root, and close the deploy loop so contributor edits go live without manual follow-up. ## What's new - `/skills/`, `/systems/`, `/craft/`, `/templates/` index + detail pages, generated from `skills/<slug>/SKILL.md`, `design-systems/<slug>/DESIGN.md`, `craft/.md`, and `templates/live-artifacts/<slug>/README.md` via Astro content collections (`app/content.config.ts`). No mirroring of content into the landing-page package — `glob` re-scans on every build. - Faceted sub-routes generated from frontmatter: - `/skills/mode/<slug>/` — 8 pages (deck, prototype, image, …) - `/skills/scenario/<slug>/` — 18 pages after alias collapse - `/systems/category/<slug>/` — 21 pages Each page owns its own `<title>`, meta description, and `CollectionPage` JSON-LD; chips on the parent index pages are now real anchors that link to these facet routes. - Updated top-bar nav (`_components/header.tsx`) to point at the new internal routes with live counts pulled from the catalog. Counts in the homepage hero meta description likewise driven by `getCatalogCounts()` so they never drift. - Per-skill / per-template thumbnails. A Playwright generator (`scripts/generate-previews.ts`) walks every `example.html` and `templates/live-artifacts/<slug>/index.html`, screenshots them at 1440×900@2x, and writes PNGs to `public/previews/`. The catalog data layer auto-detects presence and degrades gracefully when an artifact has no renderable HTML. ## Plumbing the auto-update loop - `landing-page-deploy.yml` and `landing-page-ci.yml` now trigger on changes under `skills/`, `design-systems/`, `craft/`, and `templates/`. 
Without this, a contributor adding a new SKILL.md to `main` would silently skip the deploy and the published site would fall behind. - Both workflows now install Playwright Chromium (cached by version) and run `pnpm previews` before `astro build`, so generated thumbnails ship in `out/previews/` automatically. Preview generation is `continue-on-error: true` — a single broken example.html should not block the deploy of the rest of the catalog. - `apps/landing-page/public/previews/` is gitignored: the directory is owned by CI and would otherwise add ~70MB of binary churn to the repo on every regeneration. ## Tag canonicalization - `app/_lib/catalog.ts` adds a small per-scope alias table so authoring drift like `od.scenario: operation` vs `operations`, or `live` vs `live-artifacts`, collapses to a single canonical route instead of leaking two near-empty pages. Mode and category alias tables are scaffolded but currently empty. ## Validation - `pnpm --filter @open-design/landing-page typecheck` — 0 errors, 0 warnings, 0 hints across 25 Astro files - `pnpm --filter @open-design/landing-page build` — 341 pages built (1 home + 8 mode + 18 scenario + 21 category + N detail pages + sitemap + RSS), zero external JS, ≥16 Cloudflare-resized hero image URLs intact ## Why this matters After merge, any push to `main` that adds, removes, or edits a skill, design system, craft principle, or live-artifact template automatically triggers a fresh build that: 1. picks up the new Markdown via the content-collection glob, 2. regenerates thumbnails for any matching example.html, 3. emits new sitemap entries and JSON-LD, 4. and ships to Cloudflare Pages — no landing-page-side change required. fix(landing-page): address review feedback on PR #1158 Five fixes from the review pass — none change scope, all close the "contradictory totals" / "stale data" / "silent CI failure" gaps the reviewers flagged. 
## Hero / catalog claims now read live counts everywhere `apps/landing-page/app/page.tsx` previously hardcoded `31` skills and `72` systems in the hero copy and stat rings, while the nav and meta description had already moved to `getCatalogCounts()`. After this PR every visible "X skills / Y systems" claim — hero lead, hero stat rings, capabilities cards body copy, labs section meta + filter pills, selected-work fractions, the labs CTA, and the footer Library — reads from a single `counts` prop. `Header` and `Page` now both require `counts` (no optional fallback) so a future caller can never silently publish stale numbers. The labs-section filter pills also stop being decorative buttons: they now link to the actual `/skills/mode/<slug>/` and `/skills/` catalog routes the new multi-page architecture exposes. ## Craft README no longer publishes `apps/landing-page/app/_lib/catalog.ts` filtered out `e.id !== 'README'`, but Astro normalizes `craft/README.md`'s id to lowercase `readme`, so the published site shipped `/craft/readme/` as a public craft principle and the nav badge counted 12 instead of 11. Compare case-insensitively (`e.id.toLowerCase() !== 'readme'`) so any future README casing is also filtered out. Verified locally: `apps/landing-page/out/craft/` now contains exactly 11 entries. ## Preview URL preserves actual file extension `listPreviews()` was already discovering `.png`, `.webp`, `.jpg`, and `.jpeg`, but `previewUrlFor()` always emitted `.png`, so a future sharp/webp post-processor (or a manually committed template asset) would mark the record as available while the rendered `<img src>` 404'd. Switched the structure from `Set<slug>` to `Map<slug, filename>` and emit the actual on-disk filename verbatim. ## Preview script: per-artifact soft, systemic hard Previously any single failed `example.html` capture exited the script non-zero, which forced both workflows to mark the entire preview step `continue-on-error: true`. 
That blanket tolerance also masked systemic generator failures — a chromium launch that never finds the browser binary would silently ship a deploy with zero thumbnails. `scripts/generate-previews.ts` now distinguishes: - per-artifact failures → logged and skipped, exit 0 (catalog degrades gracefully for those skills), - discoverJobs / chromium.launch / 100%-failure run → exit 1 (systemic, must fail the build). Both workflows drop their `continue-on-error: true` flags so a real problem actually surfaces. ## AGENTS.md reflects the multi-page architecture `apps/landing-page/AGENTS.md` previously declared the landing page single-route ("Not multi-page. There is exactly one route ('/')"). That guidance is now wrong — there are six top-level route groups (`/`, `/skills/`, `/systems/`, `/craft/`, `/templates/`, plus their facet variants). Updated to describe content-collection sourcing, the no-mirror rule, the auto-deploy workflow contract, and the "never hardcode catalog claims" boundary. ## Validation - `pnpm --filter @open-design/landing-page typecheck` — 0 errors, 0 warnings, 0 hints across 25 Astro files - `pnpm --filter @open-design/landing-page build` — 340 pages built (was 341 before the README filter; the README route is now correctly absent), live counts visible in the built `out/index.html`: `driven by 125 composable skills and 149 brand-grade design systems` - Verified `out/craft/` no longer contains `readme/` - Verified preview URLs resolve to the actual on-disk filename via the regenerated catalog index page * fix(landing-page): clean up live-artifact template name + summary parsing Address @mrcfps's follow-up review on `0715d8c`. The `shapeLiveArtifactTemplate()` parser was passing the README's H1 verbatim (literal backticks intact) and using the first non-empty post-H1 line as the summary, even when that line was the `> Category: Live Artifacts` editorial blockquote. 
Result: `/templates/live-otd-operations-brief/` was shipping a `<meta name="description" content=">">` and a card title with raw Markdown noise — a regression for both SEO snippets and the templates catalog at-a-glance scan. ## Two new shared helpers - `stripMarkdownInline()` — strip backticks, asterisks, and link wrappers so `# \`otd-operations-brief\` · live-artifact template` becomes `otd-operations-brief · live-artifact template` before any further trimming. - `extractFirstProseParagraph()` — walk the body after the H1 and skip blockquotes (`>`), list markers, table rows, fenced code, and HR rules. Stop at the first contiguous prose paragraph and pass it through `stripMarkdownInline()` so the result is human-readable. Both helpers live next to `titleizeSlug()` and are used by `shapeCraft()` and `shapeLiveArtifactTemplate()` so they share one implementation. ## Live-artifact title boilerplate trim Live-artifact READMEs commonly title themselves `# \`<slug>\` · live-artifact template`. After stripping the inline backticks the trailing `· live-artifact template` is redundant ("Templates" already groups them) and adds a wide noisy suffix on catalog cards. Removed it via a narrow regex tail-strip. ## Result on the existing fixture Verified locally for `templates/live-artifacts/otd-operations-brief/`: - before: `<title>\`otd-operations-brief\` · live-artifact template …</title>`, `<meta name="description" content=">">` - after: `<title>otd-operations-brief — Open Design template</title>`, `<meta name="description" content="A drop-in html_template_v1 live-artifact template for an editorial On-Time Delivery brief. It ships:">` Typecheck 0/0/0, build 340 pages. --------- Co-authored-by: Joey <joey@cursor.so> Co-authored-by: Joey-nexu <236967869+joeylee12629-star@users.noreply.github.com>	2026-05-12 19:24:50 +08:00
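The two helpers named in this commit could look roughly like the sketch below. The helper names come from the commit message, but the bodies are assumptions written from its description (strip backticks, asterisks, and link wrappers; skip blockquotes, list markers, table rows, fenced code, and HR rules). `stripTemplateSuffix` is a hypothetical name for the tail-strip step, and the real code in `app/_lib/catalog.ts` may differ.

```typescript
// Sketch under assumptions: the real helpers may be implemented differently.
function stripMarkdownInline(text: string): string {
  return text
    .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1") // [label](url) -> label
    .replace(/`/g, "") // inline code ticks
    .replace(/\*/g, "") // emphasis asterisks
    .trim();
}

// Hypothetical name for the "narrow regex tail-strip" of the boilerplate suffix.
function stripTemplateSuffix(title: string): string {
  return title.replace(/\s*·\s*live-artifact template\s*$/i, "");
}

// Returns the first contiguous prose paragraph after the H1, or "" if none.
function extractFirstProseParagraph(body: string): string {
  const paragraph: string[] = [];
  let inFence = false;
  let seenH1 = false;
  for (const raw of body.split(/\r?\n/)) {
    const line = raw.trim();
    if (/^(`{3}|~{3})/.test(line)) {
      // A fence toggles code mode and also ends any paragraph in progress.
      if (paragraph.length > 0) break;
      inFence = !inFence;
      continue;
    }
    if (inFence) continue;
    if (!seenH1) {
      if (/^#\s/.test(line)) seenH1 = true;
      continue;
    }
    const isNonProse =
      line === "" ||
      /^>/.test(line) || // blockquote
      /^([-*+]|\d+[.)])\s/.test(line) || // list markers
      /^\|/.test(line) || // table rows
      /^(-{3,}|\*{3,}|_{3,})$/.test(line) || // horizontal rules
      /^#/.test(line); // further headings
    if (isNonProse) {
      if (paragraph.length > 0) break;
      continue;
    }
    paragraph.push(line);
  }
  return stripMarkdownInline(paragraph.join(" "));
}
```

Skipping the `> Category: …` blockquote is what keeps a bare `>` out of the meta description in this sketch.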
nettee	03da01a56f	ci: use open-design bot for contributors wall refresh (#1349 )	2026-05-12 14:35:28 +08:00
ashleyashli	a4649dacb3	fix: check contributor tiers on review and comment events (#1248 ) * fix: check contributor tiers on review and comment events Expand the contributor card workflow to run tier checks for PR reviews, issue comments, PR review comments, and discussion activity. The bot now understands pull_request_target directly, so remove the event-name shim. Co-authored-by: Cursor <cursoragent@cursor.com> * fix: drop fork-unsafe triggers (review, issue_comment, review_comment) Per @mrcfps and @chatgpt-codex-connector review: GitHub withholds repository secrets on pull_request_review, pull_request_review_comment, and issue_comment events when they originate on forked PRs, so wiring those events here would fail-closed exactly for external contributors. Keep the fork-safe triggers (pull_request_target.closed, issues.opened, discussion.*, discussion_comment.*) and document why the three are excluded. They can be re-added later via a workflow_run handoff. --------- Co-authored-by: ashley li <ashleyli@ashleydeMacBook-Air-2.local> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: qiongyu1999 <qiongyu1999@gmail.com>	2026-05-12 14:06:20 +08:00
lefarcen	43f7fc536a	Add Langfuse telemetry relay (#1296 ) * Add Langfuse telemetry relay * Configure telemetry worker custom domain * Add telemetry relay health check * Harden telemetry relay config	2026-05-12 13:59:19 +08:00
shangxinyu1	10802bb0b0	test: expand nightly UI and desktop regression coverage (#1256 ) * e2e(ui): cover examples preview flows * e2e(ui): cover Codex local CLI fallback UX * test: expand desktop and connector regression coverage * e2e(ui): cover workspace restoration flows * e2e(ui): cover retry recovery workspace flow * test: cover artifact and connector recovery flows * e2e(ui): cover Continue in CLI stale provenance flow * e2e(ui): cover BYOK model fetch caching * test: expand Orbit and desktop connector coverage * e2e(ui): cover workspace quick switcher recovery flows * e2e(ui): cover connector pending authorization recovery * e2e(ui): cover workspace and conversation restoration routes * e2e(ui): cover conversation draft and attachment restoration * e2e(ui): cover conversation history selection recovery * e2e(ui): cover workspace surface conversation selection * test: cover artifact presentation and orbit link behavior * test: cover artifact external link restoration * e2e(ui): cover root-route deep-link restoration * e2e(specs): cover Orbit open-artifact desktop click * e2e(specs): cover desktop artifact open link * test: fix Orbit settings fixture type drift * test: split Playwright critical and extended suites * test: fix ProjectView design template fixtures * ci: split workspace test stages * guard: allow split Playwright suite scripts * test: shrink Playwright critical suite * test: restore omitted Playwright suites	2026-05-11 19:23:13 +08:00
PerishFire	f2db5a749c	chore: enforce PR→issue linking discipline (#1263 ) PRs that omit Fixes #N break the release-time reverse lookup (issue → closing PR → merge sha → first containing tag), since the auto-link only fires on the explicit closing keywords. We've been doing this by hand on recent fixes; codify it so future PRs don't drift. - Add .github/pull_request_template.md with a Fixes # placeholder so the link surface is in front of the author by default. - Add a corresponding bullet to the Bug follow-up workflow in the root AGENTS.md so the discipline lives next to the methodology that produces issue-linked work.	2026-05-11 17:24:24 +08:00
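The linking rule in this commit relies on GitHub's closing keywords (close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved). As a hedged illustration, a check for whether a PR body carries such a link might look like the sketch below. `linkedIssueNumbers` is a hypothetical helper, not part of the repository, and it only handles the same-repository `KEYWORD #N` form.

```typescript
// Hypothetical helper: extracts issue numbers linked through closing keywords.
// Simplification: only the same-repository "KEYWORD #N" form is recognized.
const CLOSING_KEYWORD = /\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)\b/gi;

function linkedIssueNumbers(prBody: string): number[] {
  const found: number[] = [];
  let match: RegExpExecArray | null;
  CLOSING_KEYWORD.lastIndex = 0; // reset the shared global regex before scanning
  while ((match = CLOSING_KEYWORD.exec(prBody)) !== null) {
    const issue = parseInt(match[1], 10);
    if (found.indexOf(issue) === -1) found.push(issue);
  }
  return found;
}
```

An untouched template placeholder such as `Fixes #` yields an empty array, so a caller could treat an empty result as a missing link.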
PerishFire	976edaf38e	test: harden e2e smoke and release reports (#1140 ) * test: harden e2e inspect specs * test: wire e2e release reports * chore: bump packaged beta base to 0.6.1 * test: run release smoke vitest directly * test: add suite-owned tools-dev lifecycle * ci: harden stable release packaging * fix(release,e2e): gate stable signing on verify and harden suite cleanup - restore `needs: [metadata, verify]` on the stable release `build_mac`, `build_mac_intel`, `build_win`, and `build_linux` jobs so Apple signing/notarization and Windows release builds cannot run before pnpm guard, typecheck, and layout checks complete on the metadata commit. - in `runToolsDevSuite`, drop the `started` flag and always attempt `stopToolsDevWeb` in `finally`; record stop errors in diagnostics, and when the test body succeeded, escalate the stop failure to the suite result and rethrow — so orphan daemon/web processes from an interrupted `startToolsDevWeb` or a broken shutdown can no longer pass silently. Addresses PR #1140 review feedback from lefarcen and mrcfps.	2026-05-11 13:11:16 +08:00
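The cleanup behaviour described for `runToolsDevSuite` (always attempt the stop in `finally`, record stop errors, and escalate them only when the body itself passed) can be sketched as below. All names and shapes here are hypothetical stand-ins, since the real e2e helpers are not shown in this log.

```typescript
// Hypothetical shapes; the real suite helpers are not shown in this log.
interface SuiteDiagnostics {
  stopError?: string;
}

// Pure decision step: which error, if any, the suite should surface.
function pickSuiteError(bodyError: unknown, stopError: unknown): unknown {
  if (bodyError !== undefined) return bodyError; // a body failure stays the primary error
  return stopError; // body passed: a stop failure fails the suite
}

async function runSuiteSketch(
  start: () => Promise<void>,
  body: () => Promise<void>,
  stop: () => Promise<void>,
  diagnostics: SuiteDiagnostics
): Promise<void> {
  let bodyError: unknown;
  let stopError: unknown;
  try {
    await start();
    await body();
  } catch (err) {
    bodyError = err;
  } finally {
    // No `started` flag: cleanup is attempted even if start() was interrupted.
    try {
      await stop();
    } catch (err) {
      stopError = err;
      diagnostics.stopError = String(err); // always recorded, even when not rethrown
    }
  }
  const toThrow = pickSuiteError(bodyError, stopError);
  if (toThrow !== undefined) throw toThrow;
}
```

Keeping the decision in a small pure function makes the escalation rule easy to test without spawning any processes.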
PerishFire	cc343f8828	ci: optimize beta release packaging cache (#1095 ) * ci: optimize beta release packaging cache * fix: version windows builder cache * fix: forward linux app version in container	2026-05-10 10:11:05 +08:00
Chris Tam	c61ba320fd	feat(nix): Add official flake with home-manager and NixOS support (#402 ) * nix: add official flake with home-manager and nixos modules * Pin pnpm version * Format README.md * Populate PATH files to discover installed CLIs * Revert "Populate PATH files to discover installed CLIs" This reverts commit 18d88781a88b8781913cf5a8b680dfb38eabf7e4. * Fix missing sqlite issue * Fix system issue * Reapply "Populate PATH files to discover installed CLIs" This reverts commit `d02ea994e6`. * Handle different ports for web frontend * Provide documentation for getting pnpm hash * Enable nix flake checks for code changes * Set `OD_WEB_PORT` on daemon when declared * fix: Fix environmentFile for macOS targets * chore: Ignore nix and direnv related files * fix: Read version directly from `package.json` * feat: Make nix shell entry prettier * chore: Update pnpm hashes * chore: Bump `pnpm` hashes * docs: Add blurb about dev shell in `README.md` * Address review comments * Add support for `OD_WEB_ORIGINS` * Fix `isLocalSameOrigin` * Update pnpm checksums * docs: Update documentation on host origins * Move allowedOrigins mapping out of the webFrontend.enable guard * fix: Bump pnpm hashes * Remove changes to `daemon` with `main` changes `main` merged a feature that addressed our need for allowed origins. Since this feature branch no longer needs it, remove any remaining changes in `daemon` code so that this is a pure Nix change. * Update documentation around `OD_DAEMON_URL` * Rewrite option docs to match same-origin proxy contract The port, webFrontend, and webFrontend.port option descriptions still described OD_DAEMON_URL as the runtime contract for the SPA, but the SPA issues relative /api/*, /artifacts/*, /frames/* requests and there is no runtime daemon-URL injection. Rewrite the three blocks to describe what the caddy / custom proxy must actually do. 
* Document daemon-side requirements for custom-server proxy paths The bring-your-own-server path in section (3) and the same-origin contract in section (4) understated what the daemon needs: any proxy whose origin differs from the daemon's bind (including loopback split-port like 127.0.0.1:8080 while the daemon stays on :7457) is 403'd by the daemon's same-origin gate until told about that origin. Add a callout under section (3)'s table, expand section (4) with a decision table covering same-port, loopback split-port (OD_WEB_PORT or webFrontend.allowedOrigins), and non-loopback (webFrontend.allowedOrigins) cases, and rewrite the webFrontend.allowedOrigins option description to enumerate the cases where it's required and surface OD_WEB_PORT as an alternative for the loopback split-port case. --------- Co-authored-by: lefarcen <935902669@qq.com>	2026-05-09 23:50:16 +08:00
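The decision table mentioned above (same-port, loopback split-port, non-loopback) can be illustrated with a small sketch of an origin gate. This is a hypothetical reconstruction from the commit text only: `GateConfig`, `isOriginAllowed`, and the loopback host list are assumptions, and the daemon's actual gate may work differently. A `false` result corresponds to the 403 the commit describes.

```typescript
// Hypothetical sketch of a same-origin gate; not the daemon's actual code.
interface GateConfig {
  daemonPort: number; // port the daemon binds, e.g. 7457
  webPort?: number; // value of OD_WEB_PORT when set
  allowedOrigins: string[]; // e.g. rendered from webFrontend.allowedOrigins
}

// Assumed set of hosts treated as loopback.
const LOOPBACK_HOSTS = ["127.0.0.1", "localhost", "[::1]"];

function isOriginAllowed(origin: string, cfg: GateConfig): boolean {
  // Explicitly allowed origins always pass (the only route for non-loopback proxies).
  if (cfg.allowedOrigins.indexOf(origin) !== -1) return true;

  const m = /^(https?):\/\/(\[[^\]]+\]|[^:\/]+)(?::(\d+))?$/.exec(origin);
  if (m === null) return false;
  const host = m[2].toLowerCase();
  const port =
    m[3] !== undefined ? parseInt(m[3], 10) : m[1] === "https" ? 443 : 80;

  if (LOOPBACK_HOSTS.indexOf(host) === -1) return false;
  // Same-port case, or a loopback split-port announced through OD_WEB_PORT.
  return (
    port === cfg.daemonPort ||
    (cfg.webPort !== undefined && port === cfg.webPort)
  );
}
```

Under these assumptions a proxy on `127.0.0.1:8080` is rejected while the daemon stays on `:7457`, until either `webPort` is 8080 or the origin is listed in `allowedOrigins`.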
Marc Chan	3bcb3547d1	fix(actions): map pull_request_target to contributor bot event (#1092 )	2026-05-09 21:49:13 +08:00
Cursor Agent	708f37dddb	feat(deploy): GitHub Actions workflow for multi-arch image push Plan K4 / spec §15.5 / spec §16 Phase 5. .github/workflows/docker-image.yml builds and pushes ghcr.io/<owner>/od on three triggers: - push to main → :edge + :sha-<short> + :main-<UTC> - tag vX.Y.Z → :X.Y.Z + :latest + :sha-<short> - pull request → smoke build only (no push) - workflow_dispatch → manual trigger Multi-arch via QEMU + Buildx (linux/amd64 + linux/arm64). Authenticates against GHCR via GITHUB_TOKEN with packages:write. Uses GitHub Actions cache (type=gha) to keep rebuilds fast. The build-args override to node:24-bookworm-slim that spec §15.1 nominates is intentionally NOT applied yet — the in-tree deploy/Dockerfile uses alpine + apk for build tooling, and switching the base needs the apk lines re-cast as apt-get. That's a follow-up; the canonical alpine image is functionally equivalent for v1. Co-authored-by: Tom Huang <1043269994@qq.com>	2026-05-09 13:39:17 +00:00
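The trigger-to-tag mapping listed in this commit can be written out as a small function. The sketch below is illustrative only: the `Trigger` type, the 7-character short SHA, and the `YYYYMMDDHHmmss` form of the UTC stamp are assumptions, since the commit states only the tag shapes (`:edge`, `:sha-<short>`, `:main-<UTC>`, `:X.Y.Z`, `:latest`). The `workflow_dispatch` case is left out because its tag set is not stated.

```typescript
// Illustrative mapping; the workflow itself is YAML and may compute tags differently.
type Trigger =
  | { kind: "push-main"; sha: string; now: Date }
  | { kind: "tag"; tag: string; sha: string }
  | { kind: "pull-request" };

function shortSha(sha: string): string {
  return sha.slice(0, 7); // 7 characters is an assumption
}

function imageTags(t: Trigger): string[] {
  if (t.kind === "push-main") {
    // Assumed UTC stamp format: YYYYMMDDHHmmss derived from the ISO string.
    const utc = t.now.toISOString().replace(/[-:T]/g, "").slice(0, 14);
    return ["edge", "sha-" + shortSha(t.sha), "main-" + utc];
  }
  if (t.kind === "tag") {
    // vX.Y.Z -> X.Y.Z, plus the moving "latest" tag.
    return [t.tag.replace(/^v/, ""), "latest", "sha-" + shortSha(t.sha)];
  }
  return []; // pull request: smoke build only, nothing is pushed
}
```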
Gavin Zeng	7518cfc107	feat: add macOS Intel (x64) build support to release workflows (#759 ) * feat: add macOS Intel (x64) build support to release workflows Add build_mac_intel job to both release-beta.yml and release-stable.yml using macos-13 runners (last Intel-based GitHub Actions runner). Key changes: - release-beta.yml: add enable_mac_intel input (default false), build job, and wire into publish/verify/summary - release-stable.yml: add always-on build_mac_intel job, wire into publish (downloads + copies to GitHub Release), verify, and summary - publish.sh: add ENABLE_MAC_INTEL uploads, outputs, and metadata entry - verify.sh: add mac-intel URL verification when enabled - summary.sh: add macOS x64 (Intel) row to platform/report tables - mac-intel.sh: new asset script for unsigned DMG+ZIP production Intel builds are unsigned (like Windows). No auto-update feed. Artifact naming: open-design-<ver>.unsigned-mac-x64.{dmg,zip} Closes #746 * fix: resolve beta macIntel asset name mismatch (P1) Add MAC_INTEL_ASSET_SUFFIX to publish.sh (mirroring existing WIN_ASSET_SUFFIX / LINUX_ASSET_SUFFIX pattern) so that the beta publish job can correctly locate unsigned Intel artifacts. - publish.sh: add mac_intel_asset_suffix variable with fallback - release-beta.yml: pass MAC_INTEL_ASSET_SUFFIX: .unsigned to publish --------- Co-authored-by: ZengGanghui <zghui0@gmail.com>	2026-05-09 19:50:50 +08:00
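The artifact naming in this commit (`open-design-<ver>.unsigned-mac-x64.{dmg,zip}`, with the suffix supplied through `MAC_INTEL_ASSET_SUFFIX`) can be illustrated as below. The real logic lives in shell (`publish.sh`); this TypeScript sketch uses a hypothetical function name, and the empty-string fallback is an assumption because the commit does not state the fallback value.

```typescript
// Hypothetical illustration of the naming scheme; publish.sh is the real source.
// The empty default suffix is an assumption.
function macIntelAssetNames(version: string, suffix: string = ""): string[] {
  const base = "open-design-" + version + suffix + "-mac-x64";
  return [base + ".dmg", base + ".zip"];
}
```

With the beta workflow's `.unsigned` suffix, version `0.6.1` would map to `open-design-0.6.1.unsigned-mac-x64.dmg` and the matching `.zip`.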
ashleyashli	84c788e7bb	feat: add contributor card bot workflow (#932 ) * feat: add contributor card bot workflow Adds a production workflow that triggers on merged pull requests and opened issues, checks out the standalone contributor bot, and runs it with the Open Design bot GitHub App credentials. The main repository only owns the workflow trigger; card rendering and Vaunt contribution lookup remain isolated in open-design-bot-sandbox. Co-authored-by: Cursor <cursoragent@cursor.com> * fix: run contributor bot on pull_request_target Use pull_request_target for closed merged PRs so the workflow receives repository secrets when external fork PRs are merged. The job only checks out the trusted contributor-bot repository, not contributor PR code, preserving the intended security model while allowing cards to post for fork contributors. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: ashley li <ashleyli@ashleydeMacBook-Air-2.local> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-09 19:40:07 +08:00
PerishFire	dcfab797c2	[codex] Add stable nightly promotion gate (#962 ) * Upload beta e2e spec reports to R2 * Expose beta report URLs in summary * Complete Indonesian deploy locale keys * chore: factor release workflow scripts * chore: bump packaged beta base version * test: wait for mac packaged runtime health * fix: capture mac packaged startup logs * chore: improve mac release build observability * fix: ad-hoc sign unsigned mac builds * chore: diagnose mac packaged startup * fix: relax unsigned mac launch signing * chore: improve mac launch diagnostics * chore: simplify beta mac release artifacts * fix: align packaged mac smoke launch config * fix: externalize mac daemon wasm dependency * chore: require signed stable mac releases * fix: use stable app version for nightly package builds * chore: clean release artifacts after publish * chore: publish beta reports as zip * ci: disable beta mac tools-pack cache * fix: skip mac framework binary symlinks when signing * fix: sign mac framework version bundles * ci: disable beta mac pnpm cache * chore: align stable release reports * ci: require matching nightly before stable release * ci: avoid mac pnpm cache for packaged smoke	2026-05-08 21:48:54 +08:00
Marc Chan	b06f26a5fd	test: strengthen e2e PR coverage (#796 ) * test: strengthen e2e PR coverage * fix: address e2e PR feedback Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * ci: cache Windows packaged smoke builds * test: fake additional agent runtimes * fix: address e2e PR feedback Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Route tools-pack mac starts through a launch-time packaged config override so portable packaged smoke runs keep using the namespace runtime root that inspect and logs expect. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Fall back to the packaged app's embedded config when the build output config is missing so installed mac starts still work. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: align packaged mac PR smoke with tools-pack runtime mode Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Keep blake3-wasm out of the packaged mac daemon prebundle so the standalone runtime loads the Cloudflare asset hasher from node_modules instead of crashing in ESM. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Skip the portable mac launch override when the bundled packaged config is missing so installed fallback app targets can still boot with packaged defaults. Add a regression test covering the missing-config start path. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(pack): remove duplicate mac prebundle dependency key	2026-05-08 16:48:10 +08:00
lefarcen	2bb029cb58	release: Open Design 0.5.0 (#820 ) 0.5.0 has already been released from `c21cbc6` (https://github.com/nexu-io/open-design/releases/tag/open-design-v0.5.0); this squash brings the version bump and the CHANGELOG [0.5.0] entry into main's history, so that the upcoming 0.5.1 release can follow the standard dispatch flow on main.	2026-05-08 00:41:01 +08:00
PerishFire	a383b4bd3a	Preserve beta e2e spec reports in R2 (#812 ) * Upload beta e2e spec reports to R2 * Expose beta report URLs in summary * Complete Indonesian deploy locale keys	2026-05-07 20:55:03 +08:00
PerishFire	cb92c93ae0	Migrate beta release publishing to R2 (#805 ) * Prebundle standalone web packaged runtime * Harden mac standalone prebundle policy * Prebundle mac daemon packaged runtime * Prune mac Electron locales * Maximize mac release artifact compression * Publish beta mac artifacts to R2 * Use remote R2 uploads for beta releases * Fail fast on beta R2 access issues * Use S3-compatible uploads for beta R2 releases * Decouple beta versioning from GitHub releases * Remove legacy beta metadata source * Address release beta review notes	2026-05-07 19:13:52 +08:00
PerishFire	6efac8887e	Improve Windows beta packaging and installer flow (#768 ) * Optimize Windows packaged web output * Fix packaged contracts runtime build * Optimize Windows packaged size pruning * Prune Windows root Next payload * Remove Windows bundled Node runtime * Prune Windows standalone duplicate Next * Add tools-pack cache foundation * Cache Windows packaged build layers * Cache Windows workspace builds * Cache Electron-ready Windows app * Split Windows tools-pack module * Cache Windows dir build outputs * Split Windows pack build modules * Document Windows NSIS smoke namespace limits * Move Windows NSIS smoke note to agents guide * Optimize Windows beta packaging * Bump packaged beta base version * Improve Windows installer namespace UX * Improve Windows tools-pack cache keys * Stabilize Windows beta cache version keys * Cache Windows workspace build outputs * Optimize windows release beta cache layers * Cache windows release dependencies * Trim windows release cache before save * Refresh windows tools-pack cache key * Improve windows installer preflight prompts * Fallback NSIS installer strings to English * Fix Windows installer cleanup and preflight * Improve Windows NSIS state logging * Fix system NSIS Persian language alias * Use long-path removal for Windows uninstall * Fix mac tools-pack tests on Windows * Address Windows packaging review feedback * Fix Windows installer cache namespace isolation * Include web output mode in Windows tarball cache key * Use unique Windows release cache save keys	2026-05-07 16:44:15 +08:00
Joey-nexu	9af288652c	ci: notify Discord #resolved when an issue is closed by a merged PR (#685 ) * ci: notify Discord #resolved on issue close-via-merged-PR * ci: address review feedback on Discord #resolved workflow P1: - Add contents:read permission (required by listPullRequestsAssociatedWithCommit) - Drop cross-referenced timeline fallback to eliminate false positives from plain mentions; closed-event+commit_id is now the only resolver path (also fixes the cross-repo number-collision concern Codex raised) P2: - Validate webhook URL prefix before POST (reject misconfigured secrets) - Retry on Discord 429 up to 3 times honouring Retry-After header, bounded 1..60s, with sane default if header missing P3: - allowed_mentions: { parse: [] } so issue/PR titles can't @everyone or ping roles/users in #resolved	2026-05-06 21:56:46 +08:00
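The review fixes listed in the Discord #resolved commit above (webhook prefix check, bounded Retry-After handling, disabled mention parsing) can be sketched roughly as below. This is a minimal illustration, not the workflow's actual script: every identifier, the 5-second default delay, and the `https://discord.com/api/webhooks/` prefix are assumptions; only the 3-retry limit, the 1..60s bound, and `allowed_mentions: { parse: [] }` come from the commit message.

```typescript
// Sketch only: all names here are hypothetical, not taken from the workflow.
const MAX_RETRIES = 3; // "up to 3 times" per the commit message
const DEFAULT_RETRY_SECONDS = 5; // assumed value for the "sane default"
const WEBHOOK_PREFIX = "https://discord.com/api/webhooks/"; // assumed prefix

interface SendResult {
  status: number;
  retryAfter: string | null; // raw Retry-After header value, if present
}
type Sender = (url: string, body: string) => Promise<SendResult>;

// Reject misconfigured secrets before any request is made.
function isAllowedWebhookUrl(url: string): boolean {
  return url.startsWith(WEBHOOK_PREFIX);
}

// Clamp the server-provided delay to 1..60 seconds; fall back when it is missing or unparsable.
function retryDelaySeconds(retryAfter: string | null): number {
  const parsed = retryAfter === null ? Number.NaN : Number.parseFloat(retryAfter);
  if (!Number.isFinite(parsed)) return DEFAULT_RETRY_SECONDS;
  return Math.min(60, Math.max(1, parsed));
}

// An empty parse list stops titles from pinging @everyone, roles, or users.
function buildPayload(content: string): string {
  return JSON.stringify({ content, allowed_mentions: { parse: [] } });
}

// Post once, then retry only on HTTP 429, at most MAX_RETRIES times.
async function postResolved(
  url: string,
  content: string,
  send: Sender,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<number> {
  if (!isAllowedWebhookUrl(url)) {
    throw new Error("webhook URL has an unexpected prefix");
  }
  const body = buildPayload(content);
  let result = await send(url, body);
  for (let retry = 0; retry < MAX_RETRIES && result.status === 429; retry++) {
    await sleep(retryDelaySeconds(result.retryAfter) * 1000);
    result = await send(url, body);
  }
  return result.status;
}
```

Passing `send` and `sleep` in as parameters keeps the retry loop testable without network access or real delays.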
PerishFire	f1cdb2844a	test(e2e): gate beta packaged runtime (#637 ) * test(e2e): gate beta mac packaged runtime * test(e2e): separate ui automation layout * test(e2e): move localized content coverage * chore(release): prepare packaged 0.4.1 beta validation * test(e2e): keep ui lane playwright-only * fix(web): keep chat recoverable after conversation load failure * fix(desktop): honor native mac quit	2026-05-06 17:44:29 +08:00
nettee	8762f06297	Add i18n structure checks (#608 )	2026-05-06 11:55:59 +08:00
lefarcen	c69dee74a5	fix(release): defer Linux artifact from 0.4.0 stable	2026-05-06 01:12:26 +08:00
PerishFire	bbdd4e84b5	chore: enforce test directory conventions (#496 ) * chore: enforce test directory conventions Move package, app, and tool tests out of src and add guard enforcement so source directories stay source-only. * ci: use guard and package-scoped tests Run the new repository guard in CI and keep test execution aligned with package-scoped commands after removing root aliases. * ci: align stable release guard check Use the new repository guard in stable release verification after replacing the residual-JS-only script. * chore: tighten test layout enforcement Enforce sibling tests directories, typecheck moved test suites with dedicated configs, and refresh remaining guidance that pointed at src-based tests. * chore: clarify no-emit test tsconfigs Explicitly disable declaration-only emit in test tsconfigs so review tooling sees they are no-emit typecheck configs.	2026-05-05 15:34:22 +08:00
PerishFire	3935aeb421	Optimize packaged mac artifact size (#424 ) * optimize mac package payload reporting * optimize(pack): package standalone web runtime * optimize(pack): default to standalone web runtime * chore(release): bump beta base version * fix(pack): compress mac artifacts and report packaged version * fix(pack): preserve Next server fallback * fix(pack): clarify standalone startup failures * fix(release): gate beta platform builds * fix(web): bind standalone backend to parent * fix(pack): harden standalone and beta publishing	2026-05-05 10:37:19 +08:00
Marc Chan	653c506b10	fix(landing-page): deploy wrangler with npm (#421 ) * fix(landing-page): deploy wrangler with npm * fix(landing-page): deploy pages with pnpm dlx * fix(landing-page): deploy wrangler from app workspace Generated-By: looper 0.5.1 (runner=fixer, agent=opencode)	2026-05-04 14:02:32 +08:00
Tom Huang	6c2a8ba09f	feat(editorial-collage): introduce Atelier Zero style landing page as… (#366 ) * feat(editorial-collage): introduce Atelier Zero style landing page assets and documentation - Added new design system for Atelier Zero, including a detailed `DESIGN.md` file. - Created an `editorial-collage` skill with associated assets for a magazine-grade landing page. - Included example HTML and image assets for various sections (hero, about, capabilities, etc.). - Updated README files to guide usage and customization of the new skill and design system. - Introduced a new image generation prompt pack for consistent visual style across the landing page. * fix(i18n): cover atelier-zero design system and editorial-collage skill in German content Generated-By: looper 0.4.0 (runner=fixer, agent=claude-code) * fix(editorial-collage): align manifest with shipped assets and address PR review - Update image-manifest.json widths/heights/ratios to match the actual PNGs on disk: hero/about/cap/testimonial/cta = 1024x1024 (1:1), method-1..4 = 816x816 (1:1), lab-1..5 and work-1..2 = 768x1024 (3:4). Mirror the new dimensions in imagegen-prompts.md headings and in README.md. - Mark testimonial.png as rekey_on_brand_change so the manifest agrees with SKILL.md's "regenerate at minimum testimonial.png" guidance, and add work-1/work-2 to the rekey list in SKILL.md and README.md. - Add a Hero (I.) sec-rule and renumber every following section II..VIII in example.html so the eight sections walk sequentially I -> VIII and the page-of-008 counter starts at 001. - Delete editorial-artifact-system/ (16 duplicate PNGs + index.html + skills.md draft) — the canonical version is skills/editorial-collage/ and the duplicate had no consumer references. - DESIGN.md: spell out which dimensions of each magazine reference (Monocle/Apartamento/IDEA), document the rationale for single-accent vs multi-accent, and extend the anti-pattern list with AI-image-gen artifacts the system explicitly rejects. 
- SKILL.md: add italic_words validation guidance (trim, cap at 4, verb->noun rewrite, punctuation strip) and replace the broken-image fallback with an inline SVG placeholder sized to the slot's manifest aspect ratio. Generated-By: looper 0.4.0 (runner=fixer, agent=claude-code) * fix(daemon): serve skill example assets via stable API route Skill example HTML such as `skills/editorial-collage/example.html` references shipped images via `./assets/.png`. The web app loads the example into a sandboxed iframe via `srcdoc`, where relative URLs resolve against `about:srcdoc` and the PNGs render as broken images in the Examples preview. Add a `GET /api/skills/:id/assets/` route that serves files under the skill's `assets/` directory with path-traversal guards, and rewrite `src='./assets/<file>'` / `href='./assets/<file>'` in the example response to point at that route. The disk preview keeps working because the on-disk files are unchanged. Generated-By: looper 0.4.0 (runner=fixer, agent=claude-code) * feat(landing-page): add new static Next.js 16 site for Open Design marketing - Introduced a new landing page application using Next.js 16, featuring a static export setup. - Added essential files including `package.json`, `next.config.ts`, and TypeScript configuration. - Implemented global styles in `globals.css` to match the Atelier Zero design system. - Created a detailed `AGENTS.md` for module-level boundaries and purpose. - Included various image assets for the landing page, ensuring a visually cohesive experience. - Established a root layout and main page structure to support the marketing content. * style(landing-page): enhance topbar layout and improve responsiveness - Added nowrap styling to topbar elements to prevent text overflow. - Introduced media query to hide mid text in the topbar for screen widths between 1200px and 1280px. - Updated layout.tsx to suppress hydration warnings for better rendering consistency. 
- Removed redundant "Compiled by Open Design" text from the page component. * feat(landing-page): implement scroll-reveal animations for enhanced user experience - Added a new `RevealRoot` component to manage scroll-triggered reveal animations. - Updated `globals.css` with styles for elements using the `data-reveal` attribute, including opacity, translation, and scaling effects. - Modified `layout.tsx` to include the `RevealRoot` component for managing animations. - Enhanced `page.tsx` by adding `data-reveal` attributes to various elements for staggered reveal effects. - Implemented reduced motion support to ensure accessibility for users with motion sensitivity. * fix(landing-page): update import paths and enhance link styles - Changed the import path in `next-env.d.ts` to reference the correct routes type definition. - Enhanced `globals.css` with new styles for topbar links, work cards, and partner elements, improving hover effects and transitions. - Updated `page.tsx` to include canonical project URLs and made various links point to these URLs for better navigation and accessibility. * feat(landing-page): implement headroom-style sticky header with live GitHub star count - Introduced a new `Header` component to manage sticky navigation behavior on scroll, enhancing user experience. - Updated `globals.css` to style the sticky header, including transitions and visibility toggling based on scroll direction. - Modified `page.tsx` to replace the static header with the new `Header` component, which fetches and displays the live GitHub star count. - Ensured accessibility by providing a fallback for users who prefer reduced motion. * feat(landing-page): enhance editorial landing page with global ticker and new styles - Updated `next-env.d.ts` to reference the correct routes type definition for development. - Enhanced `globals.css` with new styles for the global ticker, including responsive design and improved overflow handling. 
- Introduced a new `WIRE_CITIES` and `WIRE_CONTRIBS` data structure in `page.tsx` to display a counter-scrolling marquee of cities and contributors. - Added a ghost button style for the navigation call-to-action in the header. - Updated various sections in `page.tsx` to integrate the new ticker and improve overall layout and accessibility. * refactor(landing-page): update paper texture overlay and remove multica-ai link - Enhanced comments in `globals.css` to clarify the purpose and behavior of the paper texture overlay. - Adjusted z-index of the overlay to ensure proper layering with other elements. - Removed the `multica-ai` partner link from `page.tsx` to streamline the partner section. * feat(landing-page): implement dynamic contributor marquee with GitHub integration - Added a new `Wire` component to display a counter-scrolling marquee of cities and contributors. - The contributor list is fetched live from the GitHub API, ensuring up-to-date information. - Updated `page.tsx` to integrate the `Wire` component, replacing the static contributor list with dynamic content. - Enhanced comments for clarity regarding the functionality and purpose of the global wire. * fix(i18n): add German display copy for editorial-collage-deck skill The Validate workspace test asserts that GERMAN_CONTENT_IDS.skills covers every curated skill on disk; the new editorial-collage-deck skill was missing from DE_SKILL_COPY, causing src/i18n/content.test.ts to fail. Generated-By: looper 0.4.0 (runner=fixer, agent=claude-code) * feat(landing-page): migrate marketing site to Astro * perf(landing-page): remove React client runtime * perf(landing-page): serve images from Cloudflare resizing * fix(pr): address landing page review feedback --------- Co-authored-by: mrcfps <mrc@powerformer.com>	2026-05-04 13:39:58 +08:00
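The `fix(daemon): serve skill example assets via stable API route` item in the commit above describes two pieces: rewriting `./assets/` references in the example HTML to an API route, and guarding that route against path traversal. A rough sketch of both follows. It is an illustration under assumptions, not the daemon's code: the function names are hypothetical, and the specific checks in the guard are one plausible reading of "path-traversal guards".

```typescript
// Sketch only: function names and the exact guard rules are hypothetical.

// Accept only plain relative paths whose segments are never empty, ".", or "..".
function isSafeAssetPath(relative: string): boolean {
  if (
    relative.length === 0 ||
    relative.startsWith("/") ||
    relative.includes("\\") ||
    relative.includes("\0")
  ) {
    return false;
  }
  return relative
    .split("/")
    .every((segment) => segment !== "" && segment !== "." && segment !== "..");
}

// Point src/href attributes that reference ./assets/<file> at the API route,
// so they resolve even when the HTML is loaded through an iframe srcdoc.
function rewriteAssetUrls(html: string, skillId: string): string {
  const base = `/api/skills/${encodeURIComponent(skillId)}/assets/`;
  return html.replace(
    /\b(src|href)=(['"])\.\/assets\/([^'"]+)\2/g,
    (_match: string, attr: string, quote: string, file: string) =>
      `${attr}=${quote}${base}${file}${quote}`,
  );
}
```

Only the response body is rewritten in this sketch, which matches the commit's note that the on-disk files stay unchanged and the disk preview keeps working.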
iulian	02638af353	Add linux x64 AppImage to tools-pack and release workflows (#369 ) * feat(tools-pack): extend config types for linux platform * feat(tools-pack): add linux resource files (icon, .desktop template) * feat(tools-pack): export linuxResources paths * feat(tools-pack): scaffold linux.ts module * chore(tools-pack): add vitest devdep for linux lane unit tests * feat(tools-pack): add buildDockerArgs helper for containerized linux builds * chore: update pnpm lockfile after adding vitest dep * feat(tools-pack): add renderDesktopTemplate helper * fix(tools-pack): use @@ICON_PATH@@ token in linux .desktop template Reviewer flagged the third .replace() in renderDesktopTemplate as dead code because the template hardcoded Icon=open-design-@@NAMESPACE@@ instead of using @@ICON_PATH@@. Switch the template to @@ICON_PATH@@ so install logic controls the icon stem name independent of namespace, and move the sanitizeNamespace assertion out of the renderDesktopTemplate describe block into its own describe. * feat(tools-pack): add matchesAppImageProcess helper * test(tools-pack): cover matchesAppImageProcess missing APPIMAGE env case Closes a coverage gap flagged by code review: a process whose executable matches /tmp/.mount_/AppRun but has no APPIMAGE env should be rejected. The implementation already returned false for this case (undefined === installPath is false); this test pins the behavior explicitly. feat(tools-pack): implement packLinux native build path * fix(tools-pack): packLinux extra resources, output pre-clear, publish never Code review flagged three plan-level omissions in packLinux that mac.ts handles correctly: 1. writeAssembledApp now writes packagedConfigPath (open-design-config.json) with namespace, nodeCommandRelative, and namespaceBaseRoot. Without it apps/packaged falls back to defaults at runtime and cannot find the namespace runtime tree. 2. 
writeLinuxBuilderConfig now bundles the resource tree and packaged config into the AppImage via extraResources. Without it the running app cannot find skills/, design-systems/, craft/, frames/, or the bundled bin/node. 3. runElectronBuilderLinux now pre-clears appBuilderOutputRoot and passes --publish never to electron-builder, preventing stale-artifact bleed between runs and accidental publish attempts in CI when env tokens are present. Also aligns appId with mac/win (io.open-design.desktop) and drops a no-op productNameSafe template-literal. * feat(tools-pack): implement containerized linux build via Docker * feat(tools-pack): register linux CLI commands * fix(tools-pack): align linux electron-builder config with mac.ts Smoke testing the AppImage revealed the daemon sidecar was missing from the bundled app.asar: Cannot find module '@open-design/daemon/dist/sidecar/index.js' Root cause: writeLinuxBuilderConfig was missing the 'files' field, so electron-builder used defaults that excluded transitive workspace deps from the asar. Plus several other mac.ts patterns that I dropped from the plan: artifactName, executableName, extraMetadata.main/name/ productName/version, npmRebuild=false, nodeGypRebuild=false, buildDependenciesFromSource=false, compression=maximum, top-level icon. Switch asar:true → asar:false to match mac.ts (easier to debug missing files; perf difference negligible for dev installs). * feat(tools-pack): implement linux install * feat(tools-pack): implement linux start with extract-and-run The packaged sidecar's 35-second wait timeout is exceeded when the AppImage runs from a FUSE-mounted SquashFS (Node module loads + daemon init are slow through FUSE). Pass --appimage-extract-and-run as the first arg so AppImage extracts to /tmp first; subsequent file reads go through a real filesystem and daemon boot completes in time. 
Wait for apps/packaged to write desktop-root.json (60s ceiling, generous to cover AppImage extraction overhead), then fetch desktop status via sidecar IPC, return the merged LinuxStartResult. * fix(tools-pack): align linux start helper with mac.ts (log echo + write semantics) Code review flagged two unjustified divergences from mac.ts in startPackedLinuxApp: 1. Missing OD_DESKTOP_LOG_ECHO=0 in spawn extraEnv. Without it the packaged logger echoes to the spawned process's stdout, which goes nowhere (logFd: null). Added the suppression to match mac.ts. 2. The desktop log truncate writeFile() was wrapped in .catch(() => undefined), silently swallowing fs errors that would later surface as confusing missing-log symptoms. Removed the .catch so errors propagate per mac.ts. Also added an inline comment explaining the 60s waitForMarker timeout (vs mac's tighter ceiling) so the rationale is preserved at the call site. * feat(tools-pack): implement linux stop with marker validation * fix(tools-pack): align linux stop with mac.ts (graceful shutdown + reason strings) Code review flagged divergences from mac.ts in stopPackedLinuxApp: 1. No graceful IPC SHUTDOWN attempt before SIGTERM/SIGKILL. Mac's pattern lets Electron renderers + sidecars flush state (SQLite WAL, logs) first. `gracefulRequested: true` was hardcoded, lying to callers about what actually happened. Now attempts SHUTDOWN with a 1500ms timeout and reports the actual outcome. 2. The dead-PID-but-marker-exists branch returned reason 'ok' (the neutral placeholder from readDesktopRootIdentityMarker), which says nothing useful. Override to 'marker-pid-not-running' to match mac.ts. 3. After a clean stop, remove the desktop-root.json marker so a subsequent start has a fresh slate (mac.ts does this too). 
* fix(tools-pack): clear stale desktop-root.json before linux start Smoke-testing the install/start/stop loop revealed waitForMarker returns instantly when a stale marker from a previous run still exists on disk (e.g., the previous AppImage was killed without going through 'tools-pack linux stop'). The start function then reports success without actually waiting for the new spawn to write its own marker. Defensively remove the marker file before spawning. mac.ts removes it in stop, so a clean stop->start sequence has nothing to remove here. This only matters for crash-recovery. * fix(tools-pack): linux stop validates extract-and-run AppImages Smoke testing exposed a gap from Task 7: matchesAppImageProcess only recognized FUSE-mode (/tmp/.mount_/AppRun) but Task 13 launches with --appimage-extract-and-run, which puts the live executable at /tmp/appimage_extracted_<hex>/<binary>. Stop's cmdOk validation returned false, marker validation failed, the running app was classified as 'unmanaged' and stop refused to kill it. Fix: 1. matchesAppImageProcess accepts both runner patterns. Extract-and-run regex matches /^\/tmp\/appimage_extracted_[^/]+\/[^/]+$/. 2. stopPackedLinuxApp now passes paths.installAppImagePath (or the built fallback) as the canonical install path, not marker.appPath (which apps/packaged unhelpfully writes as '/' on Linux). 3. linux.test.ts gains 2 new tests covering the extract-and-run mode (both positive and the wrong-APPIMAGE-env negative case). fix(tools-pack): resolve linux paths in stop (typecheck regression from previous commit) * feat(tools-pack): implement linux logs * feat(tools-pack): implement linux uninstall * feat(tools-pack): implement linux cleanup * docs(tools-pack): document linux lane in READMEs and AGENTS files * ci(release): add linux x64 AppImage to release-beta and release-stable Mirrors the existing build_mac/build_win pattern with a build_linux job in both release workflows. 
Builds via `tools-pack linux build --containerized --to appimage` so the AppImage is linked against the electronuserland/builder glibc 2.27 baseline (portable across distros) rather than the ubuntu-latest glibc 2.39. The linux asset is uploaded to the immutable version release tag alongside mac/win. The beta channel-feed release (latest-mac.yml, latest.yml) is intentionally not extended with latest-linux.yml because tools/pack/src/linux.ts has no electron-builder publish block wired, so the auto-update feed would point users at a feed that never updates. AppImage auto-update is a separate follow-on. Linux is unsigned (no signing path in tools-pack yet), so the beta asset uses the .unsigned suffix matching the windows convention; the stable asset uses no suffix, matching the stable windows convention. * fix(tools-pack): propagate --dir/--portable into containerized linux build The inner `pnpm tools-pack linux build` invocation in `buildDockerArgs` only forwarded `--to` and `--namespace`. Callers passing `--dir` (e.g. the new release workflows using `--dir $RUNNER_TEMP/tools-pack`) had their flag silently dropped: the container defaulted to writing under /project/.tmp/tools-pack while the host's `findBuiltAppImage` looked at the caller's chosen `--dir`, producing "expected AppImage not found" on any non-default tool-pack root. Callers passing `--portable` had the same drop, baking build-machine runtime roots into shipped artifacts. Fix: - Mount `${config.roots.toolPackRoot}:/tools-pack` (new third volume, alongside the existing /project, /home/builder, and cache mounts). - Forward `--dir /tools-pack` to the inner build so its output lands inside the mounted host dir. - Forward `--portable` when `config.portable` is true. The mount overlaps harmlessly with /project when toolPackRoot lives under workspaceRoot (default case): Docker exposes the same host inode at both paths. 
The existing .docker-home and .docker-cache/* mounts continue to shadow the parent at their specific /home/builder paths. Document the shell-interpolation safety invariant on the inner command: config.namespace is sanitized at config-time, config.to is enum-validated, config.portable is boolean -- none can carry shell metacharacters. Tests: add coverage for the new /tools-pack mount, --dir forwarding, and --portable propagation (both true and false branches). Resolves the P1 review feedback from the Codex bot on PR #369. * docs(tools-pack): polish linux README based on PR review Addresses non-blocking P2/P3 review feedback on PR #369: - AppImage launch mode: name the test distros (Ubuntu 24.04, Arch Linux) and frame the FUSE-vs-extract-and-run gap as an order-of-magnitude improvement instead of an unspecified slowdown. - Optional system tools: add a libfuse2 paragraph distinguishing FUSE launch (needs libfuse2) from extract-and-run (does not), with the Ubuntu-24-vs-pre-24 package name caveat. - New section "Format choice: why AppImage first" anchoring the AppImage-only decision against industry precedent (VS Code, Discord, Slack, Cursor, Obsidian) so the rationale survives without a reviewer. - Out of scope: convert the dense one-liner into a bulleted list, mark AppImage signing as gated on GPG infra + verification flow design (no ETA), explain the latest-linux.yml gap, and remove the now-stale "release lane" entry since this PR adds it. * fix(tools-pack): add --appimage-extract-and-run to installed .desktop launcher The XDG .desktop file installed by `tools-pack linux install` invoked the AppImage directly via `Exec=env OD_NAMESPACE=<ns> <exec> %U`. That bypassed the extract-and-run flag that `tools-pack linux start` applies, so menu launches and `od://` desktop activations could hit the FUSE slow path that was already shown to make the daemon sidecar exceed apps/packaged's 35-second startup timeout. 
CLI-spawned starts succeeded while menu-launched starts could fail with the same artifact. Add `--appimage-extract-and-run` to the template's `Exec=` line and update the renderDesktopTemplate test expectation. New regression test locks the flag into place so a future template edit can't silently drop it. Resolves a P1 review finding from mrcfps/Looper on PR #369. * fix(tools-pack): treat signal-terminated container builds as failures `runBuildInContainer` resolved the build promise on `code === null`, which in Node's child-process `exit` event means the child was terminated by a signal (SIGTERM, SIGKILL, OOM-killer, parent process death). A killed Docker build could therefore make `packLinux` report a containerized build as complete even though the artifact was partial or missing. Accept the `signal` argument on the exit handler. Resolve only when `code === 0 && signal == null`. Otherwise reject with a message naming either the non-zero code or the terminating signal so the failure mode is visible in CI logs and `tools-pack linux build --json` output. Resolves a P1 review finding from mrcfps/Looper on PR #369. * fix(tools-pack): tear down orphaned process tree on failed linux start If `startPackedLinuxApp` spawned the AppImage but the post-spawn readiness path then failed -- either because the 60s waitForMarker ceiling elapsed without the daemon writing desktop-root.json, or because fetchDesktopStatus threw -- the detached child was left running. Because the marker is the only persistent identity source used by `stopPackedLinuxApp`, future lifecycle commands could not associate the orphan with the namespace, leaving stale Electron and sidecar processes plus stale IPC sockets that would interfere with subsequent starts. Wrap the readiness wait + status fetch in try/catch. 
On failure, collect the spawned child's process tree via listProcessSnapshots + collectProcessTreePids and stopProcesses() it (the same path stopPackedLinuxApp uses for its tree teardown), then rethrow the original error. Cleanup errors are swallowed so the original failure is preserved in the rejection. Extract the tree-teardown helper as `teardownOrphanedStart` so the intent is documented at the call site without inlining 4 imports of implementation detail. Resolves a P2 review finding from mrcfps/Looper on PR #369. * fix(tools-pack): use `corepack pnpm` in containerized linux build The inner command in `buildDockerArgs` started with `corepack enable`, which writes pnpm/yarn/npm shims into the directory containing the node binary. In `electronuserland/builder:base`, that directory is owned by root, but the container runs as the host's non-root uid via `--user` (so build artifacts come out owned by the caller, not root). The `corepack enable` step therefore fails with EACCES before `pnpm install` ever runs, blocking the new release `build_linux` job from publishing the Linux AppImage. Switch to `corepack pnpm install --frozen-lockfile && corepack pnpm tools-pack linux build ...`, which resolves and runs the version of pnpm pinned in package.json's `packageManager` field directly. No shims, no global mutation, no root writes — corepack just dispatches to the pinned binary as the unprivileged user. Update the existing inner-command test to match the new corepack invocation, and add a regression test that asserts the inner command contains `corepack pnpm` and never `corepack enable` so a future edit can't reintroduce the root-write requirement. Resolves a P1 review finding from mrcfps/Looper on PR #369. * fix(tools-pack): accept menu-launched processes in linux stop/uninstall stopPackedLinuxApp validated the live process via matchesStampedProcess against the process command line, requiring a SIDECAR_SOURCES.TOOLS_PACK stamp. 
That worked for `tools-pack linux start` (which spawns with createProcessStampArgs), but rejected menu launches: the installed .desktop entry only sets OD_NAMESPACE and does not pass stamp args, so apps/packaged falls back to a SIDECAR_SOURCES.PACKAGED stamp written into desktop-root.json -- a perfectly valid identity, just not the one the validator accepted. Symptoms with the old behavior: - `tools-pack linux stop` reported `unmanaged` for menu-launched apps and refused to stop them. - `tools-pack linux uninstall` would happily remove the AppImage, .desktop entry, and icon while the packaged app was still running, breaking handles to the AppImage's mounted/extracted contents. Switch the validator to read marker.stamp directly (the file content written by apps/packaged itself, not the process command) and accept either TOOLS_PACK or PACKAGED. The expected app/mode/namespace/ipc fields are still required to match. Mirrors the dual-source acceptance pattern in mac.ts:709-714. The matchesAppImageProcess (cmdOk) and namespaceRoot checks are preserved -- the marker still has to point at our AppImage at a path in our namespace's runtime root. Drop the now-unused matchesStampedProcess import. Resolves a P1 review finding from mrcfps/Looper on PR #369. * fix(tools-pack): per-platform --to help text in CLI addBuildOptions is shared across mac/win/linux but its --to help text hard-coded the mac targets (all\|app\|dmg\|zip), so: - tools-pack linux --help advertised --to all\|app\|dmg\|zip even though resolveToolPackBuildOutput accepts only all\|appimage\|dir, sending users at invalid targets and hiding the AppImage option. - tools-pack win --help had the same problem (advertised mac targets while accepting all\|dir\|nsis with default nsis). Parameterize addBuildOptions(command, platform) and back it with a TO_HELP_BY_PLATFORM table that mirrors the resolver's accepted targets in config.ts. Update the three call sites. 
Smoke verified by running --help for each platform: linux: all|appimage|dir (default: all) mac: all|app|dmg|zip (default: all) win: all|dir|nsis (default: nsis) The misleading "--signed: build a signed/notarized mac artifact" line on win/linux is left alone -- out of scope for this fix and not part of the review feedback. Resolves a P3 review finding from mrcfps/Looper on PR #369. * fix(tools-pack): use OD_PACKAGED_NAMESPACE in installed .desktop launcher The installed .desktop entry's Exec= line set OD_NAMESPACE=<ns>, but apps/packaged/src/config.ts:9 reads namespace overrides from OD_PACKAGED_NAMESPACE, not OD_NAMESPACE. The env assignment was a silent no-op for menu launches: the packaged app fell back to whatever namespace was baked into open-design-config.json at install time, ignoring the namespace advertised in the .desktop file. Practical effect: a .desktop launcher created for namespace "foo" could end up running as the namespace baked into the AppImage's shipped config (typically "default"), so installs created across multiple namespaces could collide silently from menu launches. CLI launches via `tools-pack linux start` were unaffected because they pass the namespace through createSidecarLaunchEnv which targets the correct env var. Switch the template to OD_PACKAGED_NAMESPACE. Update the existing renderDesktopTemplate test fixture/expectation, and add a regression test that asserts the Exec= line uses OD_PACKAGED_NAMESPACE and never the wrong OD_NAMESPACE name. Resolves a P1 review finding from mrcfps/Looper on PR #369. * fix(tools-pack): gate linux uninstall + cleanup on stop status uninstallPackedLinuxApp called stopPackedLinuxApp first, then deleted the AppImage / .desktop entry / icon unconditionally. cleanupPackedLinuxNamespace did the same with the output and runtime namespace roots. 
Both ignored stop.status -- so when stop returned "partial" (some processes survived SIGTERM->SIGKILL) or "unmanaged" (the running PID failed marker validation), uninstall would yank the install files out from under a still-running packaged app, breaking handles to the mounted/extracted AppImage contents and leaving an orphan with stale SQLite WAL files / log handles / IPC sockets. Extract a small `isSafeToRemoveInstallFiles(stop)` helper that returns true only for "stopped" or "not-running". Both uninstall and cleanup short-circuit when it returns false: - uninstall reports "skipped-process-running" for each removal slot and "skipped" for the post-install hooks. Existing "ok" / "already-removed" / "ok"|"missing"|"failed" paths are unchanged. - cleanup leaves both removed* booleans false and adds a new `skipped: boolean` field set to true. Old consumers that only read the booleans see the same "nothing was removed" signal they would have seen for an already-clean namespace; new consumers can distinguish "nothing to remove" from "refused to remove." LinuxUninstallResult.removed.{appImage,desktop,icon} now also accepts "skipped-process-running"; LinuxUninstallResult.postUninstall.* now also accepts "skipped". LinuxCleanupResult gains the `skipped` field. Workspace typecheck clean -- the only consumer is the CLI's printJson, which doesn't constrain the wire shape. Resolves a P1 review finding from mrcfps/Looper on PR #369.	2026-05-04 00:49:00 +08:00
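The stop-status gate described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: only the helper name `isSafeToRemoveInstallFiles`, the four status strings, and the "skipped-process-running" outcome come from the commit message. The `StopResult` and `RemovedSlots` shapes and the `planUninstall` function with its `remove` callback are hypothetical stand-ins.

```typescript
// Status values named in the commit message; the union type itself is an assumption.
type StopStatus = "stopped" | "not-running" | "partial" | "unmanaged";

// Hypothetical minimal shape of the stop result.
interface StopResult {
  status: StopStatus;
}

type RemovalOutcome = "ok" | "already-removed" | "skipped-process-running";

// Hypothetical result shape covering the three removal slots.
interface RemovedSlots {
  appImage: RemovalOutcome;
  desktop: RemovalOutcome;
  icon: RemovalOutcome;
}

// True only when no packaged process can still be holding the install files.
function isSafeToRemoveInstallFiles(stop: StopResult): boolean {
  return stop.status === "stopped" || stop.status === "not-running";
}

// Hypothetical uninstall skeleton: `remove` stands in for the real filesystem work.
function planUninstall(
  stop: StopResult,
  remove: (slot: keyof RemovedSlots) => RemovalOutcome,
): RemovedSlots {
  if (!isSafeToRemoveInstallFiles(stop)) {
    // Short-circuit: report every slot as skipped and touch nothing on disk.
    return {
      appImage: "skipped-process-running",
      desktop: "skipped-process-running",
      icon: "skipped-process-running",
    };
  }
  return {
    appImage: remove("appImage"),
    desktop: remove("desktop"),
    icon: remove("icon"),
  };
}
```

Because the gate runs before any removal callback, a "partial" or "unmanaged" stop leaves the install untouched and the caller can tell "refused to remove" apart from "nothing to remove".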
nettee	f33a7ecb0e	docs(readme): refresh contributors wall (#360)	2026-05-03 15:39:32 +08:00
nettee	ba4055e804	Refresh contributors wall daily (#294) * docs(readme): refresh contributors wall daily * docs(readme): refresh contributors wall daily Generated-By: looper 0.2.7 (runner=fixer, agent=gpt-5.5)	2026-05-02 22:06:11 +08:00
Marc Chan	a93246d892	chore(ci): add GitHub CI workflow (#271) * Add GitHub CI workflow * Address CI workflow review feedback Generated-By: looper 0.3.0 (runner=fixer, agent=openai/gpt-5.5)	2026-05-02 16:14:33 +08:00
Marc Chan	06eac21cd8	Fix metrics workflow protected branch updates (#219) * fix github metrics workflow for org repo * fix metrics workflow protected branch updates * fix metrics workflow review notes Generated-By: looper 0.3.0 (runner=fixer, agent=openai/gpt-5.5)	2026-05-01 22:45:54 +08:00
lefarcen	38bdb59d86	fix(release-stable): build desktop before typecheck, drop workspace tests (#216)	2026-05-01 20:53:10 +08:00
Marc Chan	1d6d86fa69	fix github metrics workflow for org repo (#217)	2026-05-01 20:49:22 +08:00
lefarcen	913a6c3ea7	fix(release-stable): build daemon before workspace typecheck (#215) The verify job ran `pnpm typecheck` (root script) which executes `pnpm -r run typecheck` before the daemon build. The e2e workspace's typecheck imports types from `apps/daemon/dist/.js`, so on a fresh clone (every CI run) it fails with TS2307 cannot-find-module. Drive the order explicitly inside the verify job: 1. install deps 2. build daemon (produces dist/.js + .d.ts) 3. workspace typecheck 4. check:residual-js 5. workspace tests This keeps the root `typecheck` script untouched (which other dev / contributor workflows may depend on); the workflow simply imposes the correct order itself. The atomic publish job already prevented orphan tags/releases when the first dispatch failed at typecheck. Co-authored-by: Elian <elian@EliandeMacBook-Pro.local> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 20:26:50 +08:00
lefarcen	451ae983db	release: Open Design 0.1.0 - first public release (#206)	2026-05-01 20:15:18 +08:00
PerishFire	f604ff1ec2	Add Windows beta packaging and release assets (#191)	2026-05-01 16:46:15 +08:00
Tom Huang	d25a7aaf42	docs(readme): refresh stats, agents, skills and add metrics workflow (#173 ) * Refactor project name from "Open Claude Design" to "Open Design" - Updated project name in package.json, package-lock.json, and README files. - Changed CLI commands and references from "ocd" to "od". - Adjusted file structure references in documentation and code to reflect new naming conventions. - Enhanced .gitignore to include new runtime data files. - Updated metadata in LICENSE file to match new project name. * chore: update next-env.d.ts route types path Made-with: Cursor * docs(readme): refresh stats, agents, skills and add metrics workflow Make all three READMEs (en / zh-CN / ko) tell the truth about what the project actually ships, and add lightweight community signals at the top and bottom. - Hero block: live for-the-badge GitHub stats (stars, forks, issues, PRs, contributors, commit activity, last commit) sit directly under the banner, with the smaller flat-square project-meta row (License, Agents, Design systems, Skills, Quickstart) below them and the language switcher below that. - Counts updated to reality: 31 skills (was 19), 72 design systems (was 71), 10 coding-agent CLIs + OpenAI-compatible BYOK (was 7). - "At a glance", architecture, and prompt-stack tables updated to cover /api/templates, /api/import/claude-design, /api/proxy/stream, /api/artifacts/lint, sidecar IPC, and per-namespace runtime data. - New "Beyond chat — what else ships" section covering Claude Design ZIP import, BYOK proxy with SSRF block, saved templates, tab persistence, artifact lint, sidecar protocol + headless desktop, and Windows-friendly spawning. - Skills tables rebuilt by mode (prototype, deck) and scenario; the "template" mode claim is removed. - Supported coding agents table expanded to all 10 CLIs (Claude Code, Codex, Gemini, OpenCode, Cursor Agent, Qwen, Copilot, Hermes, Kimi, Pi) plus a BYOK row, with accurate stream formats and argv shapes. 
- Roadmap re-flowed to mark shipped vs pending items. - Contributors wall (contrib.rocks), Repository activity (lowlighter metrics SVG), and Star History added to all three READMEs, with cache_bust=2026-04-30 on the contrib.rocks and star-history image URLs to bypass GitHub camo caching. - Korean README harmonised end-to-end with the English/Chinese ones. - New .github/workflows/metrics.yml regenerates docs/assets/github-metrics.svg daily; ship a placeholder SVG so the image works before the first scheduled run. Made-with: Cursor * docs(readme): address PR #173 review feedback - Replace invalid 0x14 control character in github-metrics.svg with an em-dash so the placeholder is well-formed XML and renders as an image (P1: was breaking SVG parse before the first metrics run). - Clarify the placeholder SVG subtitle to spell out the token model: GITHUB_TOKEN gives core stats; METRICS_TOKEN unlocks richer plugins (traffic, follow-up). Reduces "do I need a secret?" confusion. - Rewrite the metrics.yml inline auth comment to match: METRICS_TOKEN is optional and only enables richer plugins; GITHUB_TOKEN is enough for core metrics. Previous comment read as if METRICS_TOKEN was mandatory. - Soften the BYOK fallback row in all three READMEs (EN / zh-CN / ko) with a catch-all phrase ("or any other OpenAI-compatible provider") so the listed vendors don't read as exhaustive.	2026-04-30 23:57:19 +08:00
PerishFire	a40d817d28	Add mac packaged runtime and beta release flow (#170) * feat(pack): add mac packaged runtime control plane * feat(pack): harden mac packaged runtime lifecycle Keep packaged state namespace-scoped, make daemon paths explicit through sidecar launch env, and add conservative desktop identity/logging fallbacks for local mac package validation. * feat(pack): add mac beta release flow * fix(pack): generate mac update feed fallback * fix(pack): write portable beta checksums * fix(pack): make beta artifacts portable * fix(pack): clean up mac install visuals * fix(pack): address packaged runtime review feedback	2026-04-30 20:25:49 +08:00
PerishFire	3447af23f4	chore: add release beta workflow placeholder (#36)	2026-04-29 16:00:24 +08:00
Tom Huang	d243b37d74	fix: allow Claude Code to read skill seeds and design-system specs (#6) (#7) * Allow Claude Code to read skill seeds and design-system specs (#6) The skill body's preamble points the agent at absolute paths like `<repo>/skills/guizang-ppt/assets/template.html`, but the agent's cwd is `.od/projects/<id>/`. Without an explicit allowlist Claude Code blocks Read on those paths and the user sees a permission error mid-conversation. Pass `SKILLS_DIR` and `DESIGN_SYSTEMS_DIR` through `buildArgs` and emit them as `--add-dir` for Claude so the seed template, references, and design-system DESIGN.md are all readable. Other agents ignore the extra dirs (no equivalent flag). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: add verification screenshot for issue #6 fix Captures the agent successfully Read-ing skills/guizang-ppt/ side files through the new --add-dir allowlist, confirming the permission error from issue #6 is gone. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 22:25:32 +08:00
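The `--add-dir` allowlist described in the commit above might look roughly like the sketch below. It is an illustration under stated assumptions: the commit message names `buildArgs`, `SKILLS_DIR`, `DESIGN_SYSTEMS_DIR`, and the `--add-dir` flag, but the function signature, the `BuildArgsOptions` shape, and the `"claude"` agent id used here are hypothetical.

```typescript
// Hypothetical options shape; the real buildArgs signature is not shown in the commit.
interface BuildArgsOptions {
  agent: string; // assumed agent identifier, e.g. "claude"
  extraDirs: string[]; // e.g. the values of SKILLS_DIR and DESIGN_SYSTEMS_DIR
}

// Returns a new argv array; extra directories are emitted only for Claude,
// since the commit notes that other agents have no equivalent flag.
function buildArgs(baseArgs: string[], options: BuildArgsOptions): string[] {
  const args = baseArgs.slice();
  if (options.agent === "claude") {
    for (const dir of options.extraDirs) {
      args.push("--add-dir", dir);
    }
  }
  return args;
}
```

Emitting one `--add-dir` pair per directory keeps the allowlist explicit, so the agent can read the skill seeds and design-system specs even though its working directory is the per-project folder.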


135 commits