Commit graph

96 commits

Author SHA1 Message Date
Marc Chan
135c6b42d8
fix(ci): lint workflow changes with actionlint (#2742)
* fix(ci): lint workflow changes with actionlint

* fix(ci): lint workflow changes with actionlint

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): lint workflow changes with actionlint

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): lint workflow changes with actionlint

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)
2026-05-23 12:12:55 +08:00
Marc Chan
866661ac65
fix(ci): run merge queue checks on merge groups (#2745) 2026-05-23 11:59:30 +08:00
Marc Chan
6592d638ce
ci: gate fork PR workflow auto-approval (#2683)
* ci: gate fork PR workflow auto-approval

* ci: rename fork PR approval workflow

* ci: normalize fork workflow paths

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): match action_required workflow runs

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): denylist tool config paths

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): retry action_required workflow lookup

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): restrict fork workflow approvals to target PR

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): keep polling fork workflow approvals

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): revalidate fork workflow approvals before approving

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): poll longer for first fork approval run

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): make fork approval poll budget configurable

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): drop stale fork approval runs

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): deny dotted tsconfig variants in fork approvals

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): run fork approval regression in guard

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): refresh Nix pnpm deps hash

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* test(web): mock useI18n in reattach restore test

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): accept status-only fork approvals

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): rerun fork approval on retarget

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): ignore base tip churn in PR association

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): broaden pending approval run fetch

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): skip non-retarget fork approval edits

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): checkout visual comment workflow head

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): paginate workflow approval run lookup

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): harden fork workflow follow-ups

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): honor full post-appearance settling window

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): validate manual visual comment checkout

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)
2026-05-23 11:48:36 +08:00
Marc Chan
1c7233ef10
fix(landing-page): speed up landing-page CI builds (#2734)
* fix(landing-page): speed up landing-page CI builds

* fix(landing-page): disable dev-only landing caches

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(landing-page): reuse previews across CI runs

* fix(landing-page): hash shared preview dependencies

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(landing-page): skip missing preview html reads

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(landing-page): rerun previews on cache hits

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): repair landing-page workflow cache keys
2026-05-23 00:30:31 +08:00
Marc Chan
a5b47c5f76
fix(ci): narrow workflow scope and reuse setup steps (#2708)
* fix(ci): narrow workflow scope and reuse setup steps

* fix(ci): narrow workflow scope and reuse setup steps

Repair Nix fixed-output hashes for the filtered daemon and web source trees.

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): narrow workflow scope and reuse setup steps

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): narrow workflow scope and reuse setup steps

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): repair daemon and nix checks

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)
2026-05-22 18:58:53 +08:00
ashleyashli
558fedd207
fix(landing): wire GA4 rollout config (#2615)
Co-authored-by: ashley li <ashleyli@ashleydeMacBook-Air-2.local>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-22 14:56:58 +08:00
Marc Chan
10192dcc52
fix(ci): catch nix hash drift before merge (#2530)
* fix(ci): catch nix hash drift before merge

* fix(nix): add pnpm hash refresh helper

* chore(nix): drop redundant hash alias

* fix(nix): raise update-hash output buffer

Generated-By: looper 0.8.1 (runner=fixer, agent=opencode)

* fix(nix): handle current pnpm deps hash

Generated-By: looper 0.8.1 (runner=fixer, agent=opencode)

* fix(nix): reject non-mismatch hash updates

Generated-By: looper 0.8.1 (runner=fixer, agent=opencode)
2026-05-21 16:08:13 +08:00
Marc Chan
23d28682da
fix(ci): fall back to manifest PR number for visual comments (#2506)
* fix(ci): fall back to manifest PR number for visual comments

* fix(ci): restore authoritative visual PR lookup

Generated-By: looper 0.8.1 (runner=fixer, agent=opencode)
2026-05-21 12:06:41 +08:00
Marc Chan
c45c5c9764
fix(ci): align visual selectors and nix hashes (#2471)
* fix(ci): align visual selectors and nix hashes

* fix(ci): add strict PR visual verification

* fix(ci): repair visual-home captures

Generated-By: looper 0.8.1 (runner=fixer, agent=opencode)
2026-05-21 10:45:37 +08:00
shangxinyu1
71044bd3d6
test(e2e): harden extended coverage state assertions (#2245)
* test(e2e): harden extended coverage contracts

* docs(testing): add e2e hardening status

* fix(web): persist artifact chips after daemon runs

* ci: install playwright browsers for e2e vitest

* Fix daemon run recovery across reloads

Pin daemon-created runs to assistant messages immediately so hard reloads before the create response can reattach.

Replay terminal and active run events from the beginning on reload so restored turns keep assistant text, thinking events, produced files, and artifacts.

Fixes #2366

Fixes #2368

Fixes #2371

* test(e2e): preserve fake runtime selection across reload

* fix(web): scope daemon run recovery to daemon mode

* fix(e2e): remove duplicate delayed smoke flag

* fix(web): scope replay artifact recovery to current run

* fix(daemon): remove duplicate run-create pin
2026-05-20 16:21:01 +08:00
ashleyashli
65e760b88a
feat(seo): add GSC report opportunities (#2388)
Co-authored-by: ashley li <ashleyli@ashleydeMacBook-Air-2.local>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-20 16:19:14 +08:00
Marc Chan
f294ab4915
chore(ci): add visual regression PR workflow (#2372)
* Add visual regression PR workflow

* Allow manual visual PR comments

* Post visual comments for same-repo PRs

* fix(ci): surface R2 lookup failures in visual report

Generated-By: looper 0.8.1 (runner=fixer, agent=opencode)

* Align visual workflow names
2026-05-20 15:05:59 +08:00
kami
c85da3eb40
fix: sync landing source-of-truth paths (#2066) 2026-05-20 11:44:04 +08:00
PerishFire
2c128e0e91
refactor desktop host bridge (#2246) 2026-05-19 18:27:05 +08:00
ashleyashli
07659b7272
feat(seo): add Search Console reporting workflows (#2229)
* feat(blog): daily 3-day Search Console traffic digest

Adds `blog-3day-report.yml` (cron 09:00 Asia/Shanghai) and a
companion `report-3day.ts` script that refreshes
`docs/blog-traffic-digest.md` once per day. The digest has two
sections:

- T-3 spotlight: posts published exactly three days ago, with their
  3-day Search Analytics window plus current URL Inspection coverage
  state.
- Rolling 30-day cohort: every post 1–30 days old with its latest
  3-day Search Analytics window, sorted by impressions descending.

The workflow is read-only against Google APIs (no Indexing API,
no "request indexing" automation) and mirrors the secret / config
plumbing already used by `blog-indexing-monitor.yml`. Output lands
in a reviewable `automation/blog-traffic-digest` PR opened by the
open-design bot.

Also widens `querySearchAnalytics` to accept `windowDays: 3 | 7 | 28`
and updates `docs/blog-indexing-automation.md` with the new pipeline.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(seo): post daily Search Console report to Feishu

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(blog): push traffic digest to Feishu

Emit a compact JSON summary from the daily 3-day traffic digest and add a Feishu custom bot sender for the summary card. Wire the workflow to send the card when `FEISHU_BLOG_DIGEST_WEBHOOK` is configured while keeping Markdown PR output as the source of truth.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(landing-page): add Discord routing CTAs

Add a lightweight Discord pill to the landing hero and Discord links in the landing and blog footers so community routing is visible without displacing the primary GitHub and download CTAs.

Add a blog-ending conversion card that points guide and use-case readers to the internal workflows library, while keeping Discord as a secondary support path.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: ashley li <ashleyli@ashleydeMacBook-Air-2.local>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-19 18:09:44 +08:00
ashleyashli
e702a6a49f
Fix blog indexing status PR base (#2106)
Set an explicit base branch for generated indexing status PRs so create-pull-request works after SHA-based checkouts.

Co-authored-by: ashley li <ashleyli@ashleydeMacBook-Air-2.local>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-19 18:08:01 +08:00
PerishFire
bb13eee765
chore: optimize CI and beta release runtime (#2231)
* chore(ci): add runtime trace summaries

* chore(ci): tighten measured workspace steps

* chore(release): tighten beta setup steps

* chore(release): slim beta windows smoke

* chore(ci): shard daemon tests

* chore(ci): harden runtime trace lookup

* chore(release): avoid mac pnpm cache in beta

* chore(ci): split critical playwright checks

* chore(release): publish beta platforms from builders

* test(e2e): update beta release workflow expectation

* chore(ci): stop gating PRs on nix check

* fix(release): keep beta latest complete
2026-05-19 18:06:28 +08:00
Yuhao Chen
a1e8ce480a
fix(ci): include bundled resources in Windows cache key (#2034) 2026-05-19 16:50:39 +08:00
Marc Chan
4f116d9eaf
fix(ci): anchor PR inactivity clock to author responses (#2185)
* fix(ci): anchor PR inactivity clock to author responses

* fix(ci): add dry-run mode to PR inactivity workflow

* fix(ci): read workflow dry-run input from event payload

* fix(ci): log PR inactivity dry-run diagnostics

* fix(ci): accept both review association field names

* fix(ci): log PR 642 feedback payload shapes

* fix(ci): trust PR reviewers by repo permission

* fix(ci): remove temporary inactivity debug logs
2026-05-19 13:59:15 +08:00
PerishFire
bd48c597b0
chore: pin dependency versions and harden CI caches (#2189)
* chore: pin dependency versions

* ci: enforce pinned dependency specs

* ci: fix pnpm executable invocation
2026-05-19 13:58:27 +08:00
PerishFire
99b42726b8
Simplify CI PR gate (#2183) 2026-05-19 13:18:41 +08:00
shangxinyu1
f650a043d9
test(e2e): align entry coverage with redesigned flows (#2101)
* Migrate entry E2E coverage and split CI

* test(e2e): relax connectors auth error assertions

* ci: route scenario registry changes to extended e2e

* ci: decouple packaged smoke jobs from validate gate

* ci: restore pre-split workflow

* test(e2e): slim critical ui smoke coverage

* test(e2e): move broader entry flows out of critical

* test(e2e): restore entry chrome coverage to ci

* ci: parallelize workspace validation jobs

* test(web): stabilize media palette bridge assertion

* ci: cache e2e playwright browsers
2026-05-19 11:26:40 +08:00
Marc Chan
f403ffbfce
ci: add PR-author and stale-issue inactivity workflows (#2055)
* ci: add PR-author and stale-issue inactivity workflows

Adds two queue-management automations:

- pr-author-inactivity: reminds PR authors after 72h of inactivity
  following human reviewer/maintainer feedback (issue comments,
  non-approval reviews, or inline review comments) and closes after
  120h. Author response is detected via issue comments, inline review
  replies, or commit/force-push events. Bot-authored reviews are
  intentionally excluded so authors are not pressured by automated
  nits alone.

- stale-issues: marks issues stale after 30 days of inactivity and
  closes after a further 7 days. Exempts good first issue, help
  wanted, and security labels. A pre-step also auto-applies
  'exempt-from-stale' to issues opened by org members/owners/
  collaborators, since actions/stale only supports label-based
  exemptions. PR processing is disabled (handled by the workflow
  above).

* fix: limit PR inactivity feedback to trusted reviewers

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix: count author PR reviews as inactivity responses

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)
2026-05-18 16:45:37 +08:00
ashleyashli
0163fa9d84
fix(landing): stabilize blog indexing workflows (#2042)
Use full workspace installs for blog indexing workflows so root postinstall can resolve workspace build dependencies, and enrich sitemap metadata for blog URLs from frontmatter dates.

Co-authored-by: ashley li <ashleyli@ashleydeMacBook-Air-2.local>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-18 15:52:29 +08:00
leessju
7766582f0b
chore(ci): scope nix-check workflow permissions to contents:read (#1870)
Some checks failed
ci / Packaged mac smoke (push) Blocked by required conditions
ci / Packaged windows smoke (push) Blocked by required conditions
ci / Detect PR change scopes (push) Failing after 2s
ci / Validate workspace (push) Has been skipped
nix-check / build (push) Failing after 1s
ci / Packaged linux headless smoke (push) Has been skipped
The other workflows under .github/workflows declare explicit
`permissions:` blocks that scope their GITHUB_TOKEN to the minimum
required (contents: read for build-only flows). `nix-check.yml` was
the lone outlier and inherited the repository's default token
permissions instead.

Add `permissions: { contents: read }` to align with the rest of the
workflow suite and follow GitHub's least-privilege workflow guidance.
No behavior change: the job only reads the repo, runs `nix flake
check`, and uploads a logs artifact on failure (which uses an action
that already declares its own permissions internally).

Co-authored-by: nicejames <nicejames@gmail.com>
2026-05-17 11:28:18 +08:00
Marc Chan
6bf865a43b
fix(ci): avoid duplicate nix-check runs on PR branches (#1917)
Some checks failed
ci / Packaged mac smoke (push) Blocked by required conditions
ci / Packaged windows smoke (push) Blocked by required conditions
ci / Detect PR change scopes (push) Failing after 2s
ci / Validate workspace (push) Has been skipped
landing-page-ci / Validate landing page (push) Failing after 1s
landing-page-deploy / Deploy landing page (push) Has been skipped
github-metrics / Generate repository metrics SVG (push) Has been skipped
nix-check / build (push) Failing after 2s
ci / Packaged linux headless smoke (push) Has been skipped
* fix(ci): avoid duplicate nix-check runs on PR branches

* fix(ci): keep nix-check on main pushes

Restore the nix-check push trigger for main-only updates while still avoiding duplicate PR-branch runs.

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)
2026-05-16 23:38:50 +08:00
Marc Chan
1f728ac8e3
fix(ci): only run docker image workflow for release tags (#1916) 2026-05-16 22:32:33 +08:00
Marc Chan
d6515643ef
fix(ci): use open-design-bot for metrics PRs (#1910)
* fix(ci): use open-design-bot for metrics PRs

* fix(ci): restore metrics workflow app scopes

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)
2026-05-16 21:52:37 +08:00
Marc Chan
fd82384b5e
fix(ci): dispatch metrics branch validation (#1878)
* fix(ci): dispatch metrics branch validation

* fix(ci): dispatch metrics branch validation

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): dispatch metrics branch validation

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): dispatch metrics branch validation

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): dispatch metrics branch validation

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): dispatch metrics branch validation

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(ci): dispatch metrics branch validation

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)
2026-05-16 16:15:44 +08:00
lefarcen
7d1adf9fd7
docs: point 0.8.0 preview contributors at main (#1846)
* docs: point 0.8.0 preview contributors at main, not preview/v0.8.0

0.8.0 has been merged into main (#1832). Anywhere we used to tell
contributors to checkout / PR against preview/v0.8.0 was actively
mis-routing new PRs. Update:

- docs/preview-v0.8.0-announcement.md + zh-CN: status line, Branch row,
  source-build checkout, and 'open a PR against' guidance now point at
  main
- .github/ISSUE_TEMPLATE/bug-report.yml + feature-request.yml: phrase
  the 'use the preview template' nudge as 'about the 0.8.0 preview
  features (now on main)' instead of 'about the preview/v0.8.0 branch'
- .github/ISSUE_TEMPLATE/config.yml: same rewording for the contact link
- .github/ISSUE_TEMPLATE/preview-v0.8.0-feedback.yml: refresh the
  description and the intro body so it reads as 'preview features
  pre-tag', not 'features pre-merge'

The preview-v0.8.0-feedback template and preview/v0.8.0 label are
intentionally kept: 0.8.0 isn't tagged yet, so we still want a
dedicated lane for preview-features feedback.

* chore: stop treating preview/v0.8.0 as a live branch

Earlier in this PR we kept the preview-v0.8.0 surface area intact —
that was the wrong call. 0.8.0 is now on main; pretending there's a
parallel 'preview' branch in the templates, labels, and copy was going
to keep mis-routing contributors.

Drop:
- .github/ISSUE_TEMPLATE/preview-v0.8.0-feedback.yml (the dedicated
  template that auto-applied the preview/v0.8.0 label and prefix)
- .github/ISSUE_TEMPLATE/config.yml contact_links entry pointing at it
- bug-report.yml + feature-request.yml nudges that sent users there
- The Preview-v0.8.0-feedback link block from both announcement docs
  (replaced with normal bug-report / feature-request links)

Rename:
- docs/preview-v0.8.0-announcement.{md,zh-CN.md}
    -> docs/v0.8.0-announcement.{md,zh-CN.md}
  so the on-disk doc title reads as a 0.8.0 announcement, not a
  branch-specific one. No other repo file referenced the old paths.

The preview/v0.8.0 label and branch themselves are intentionally
untouched — those are separate ops the maintainer will decide on
later. This PR only removes mentions inside the repo.

* chore: keep 0.8.0 preview-feedback template as a chooser-level ad

The previous commit deleted preview-v0.8.0-feedback.yml entirely. Bring
it back, but reframe it: it's now the dedicated 0.8.0 lane in the
issue chooser — a high-visibility surface that tells visitors "0.8.0
is here as a preview, please share what you noticed."

- Renamed in the chooser to "Open Design 0.8.0 — preview feedback"
- Title prefix shortened from "[preview/v0.8.0] " to "[0.8.0] " so the
  branch slug no longer leaks into issue titles
- label preview/v0.8.0 still auto-applied (the label entity is still in
  use across 26 issues; maintainer will decide on its fate separately)
- Area dropdown widened from "Skills + Automations" to cover the
  actual 0.8.0 surface (plugins, headless, agent flow, desktop shell)
- Intro body rewritten to read as a preview-release ad, not a
  feature-branch tester request

Announcement docs (English + Chinese) also routed their "open an
issue" CTA back through this template instead of the generic bug-report
/ feature-request links — same advertising goal.
2026-05-15 22:37:04 +08:00
lefarcen
e40399d39a
Merge pull request #1832 from nexu-io/sync/main-into-preview-v0.8.0
Release preview/v0.8.0 into main
2026-05-15 20:44:27 +08:00
lefarcen
64139db375 ci(diag): capture packaged-linux runtime logs into the headless artifact
Currently the artifact only contains vitest.log + the tools-pack build
log. The packaged daemon's own stdout/stderr lands in
$RUNNER_TEMP/tools-pack/runtime/linux/namespaces/<ns>/logs/desktop/latest.log,
outside the artifact path, so when 'headless-root.json not written
within 35s' fires we have no signal on what the spawned child actually
did.

Copy the runtime logs/ and runtime/ subdirectories into the report dir
before upload so the artifact captures daemon/web sidecar startup
output, identity markers, and per-app state JSON.
2026-05-15 19:58:21 +08:00
lefarcen
41bae9b7a5 ci: mark packaged-linux-headless smoke as non-blocking until Linux ships
The v0.8.0 launch doc explicitly lists Linux as "coming soon" — the
packaged Linux headless runtime is known-incomplete in this release.
Keep the smoke job running so we keep collecting signal, but don't block
PRs on it until the Linux client lands.

Re-enable (drop continue-on-error) once the Linux packaged client is
ready in a future release.
2026-05-15 19:45:23 +08:00
ashleyashli
772ef97476
feat(landing): automate blog indexing monitoring (#1825)
* feat(landing): add blog indexing automation

Automate supported blog discovery checks through sitemap submission, URL Inspection monitoring, IndexNow notifications, and guarded SEO CI checks.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(landing): support oauth for blog indexing

Use OAuth refresh-token auth as the preferred Search Console path while keeping service-account auth as a fallback, so the indexing workflows can run despite GSC service-account invite issues.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(landing): tighten blog indexing observability

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: ashley li <ashleyli@ashleydeMacBook-Air-2.local>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-15 18:32:30 +08:00
lefarcen
22a3b99a47 Merge origin/main into preview/v0.8.0
Sync 49 commits from main. Conflicts resolved:
- .github/workflows/ci.yml: kept v0.8.0 granular per-area gating, added main's
  linux specs + release-stable.yml + release-preview.yml triggers
- .github/workflows/release-preview.yml: kept v0.8.0's full workflow over main's placeholder
- apps/web/src/components/AssistantMessage.tsx: combined v0.8.0 file-ops
  summary with main's stripTodoToolGroups + suppressAskUserQuestionFallbackText
- apps/web/src/components/ChatPane.tsx: kept both new imports
- apps/web/src/index.css: kept both .msg-plugin-chip and .user-copy-btn blocks
- e2e/ui/*.test.ts: kept v0.8.0 openEntrySettingsDialog helper over main's
  inline dialog navigation (UI was redesigned in v0.8.0)
- nix/package-{daemon,web}.nix: kept v0.8.0 pnpmDepsHash; rerun nix build to refresh
2026-05-15 18:23:33 +08:00
nettee
30821f3a73
fix(ci): let metrics PRs trigger required checks (#1801) 2026-05-15 17:01:10 +08:00
Olin Hendershot
74637f1cb5
Add Linux packaged client parity smoke coverage (#1204)
* docs: plan linux client issue 709

* fix: complete linux headless lifecycle routing

* feat: add linux packaged inspect

* test: add linux headless packaged smoke

* ci: add linux headless packaged smoke

* ci: smoke linux AppImage release artifacts

* docs: document linux packaged client status

* chore: finalize linux client audit remediation

* docs: add linux client publication packet

* test: harden linux client smoke coverage

* ci: preserve linux smoke audit evidence

* refactor: consolidate linux e2e helpers

Move pathExists and the desktop/web/daemon app-key array out of
linux.spec.ts into linux-helpers.ts, where expectPathInside and
linuxUserHome already live. Keeps the spec file focused on tests and
the helpers file as the canonical home for shared Linux e2e utilities.

* fix: move linux e2e helpers to lib

* fix: address linux release review blockers

* fix: drop npm dependency from containerized linux build

writeAssembledApp() previously called runNpmInstall() which executed
`npm install` directly. Inside the containerized build path,
electronuserland/builder:base strips npm/npx/corepack, so the inner
tools-pack build would fail at the assembled-app install step.

Route the install through OD_TOOLS_PACK_PNPM_BIN: buildDockerArgs sets
the env to the standalone pnpm binary it bootstraps, and the new
resolveProductionInstallCommand helper consumes that env to run
`<bin> install --prod --no-lockfile --config.node-linker=hoisted`.
Host invocations with no env set keep the prior npm behavior.
--config.node-linker=hoisted preserves the flat node_modules layout
that electron-builder packs the same way as npm-installed trees.

New tests cover the resolver branches and assert the docker-arg-to-
resolver chain end-to-end so reviewers can see the container's inner
build receives the env that switches its install away from npm.

* fix: harden linux container bootstrap

* fix: validate desktop marker liveness in headless cleanup

cleanup --headless previously skipped on any parseable desktop-root.json, trapping recovery when the AppImage had crashed and left a stale marker. Validate the marker the same way stopPackedLinuxApp does: if the PID is not in the live snapshot list, proceed through cleanup instead of skipping.

Extract the validation into validateDesktopAppImageMarker so the stop and cleanup paths share one definition of live and owned. Tests cover both branches: a stale marker drives cleanup to remove the runtime/output roots, while a live marker drives cleanup to skip and preserve them.
2026-05-15 16:38:29 +08:00
lefarcen
75498838a9
chore: align issue templates to preview/v0.8.0 naming (#1723)
Some checks failed
ci / Packaged mac smoke (push) Blocked by required conditions
ci / Packaged windows smoke (push) Blocked by required conditions
ci / Detect PR change scopes (push) Failing after 3s
ci / Validate workspace (push) Has been skipped
landing-page-ci / Validate landing page (push) Failing after 1s
landing-page-deploy / Deploy landing page (push) Has been skipped
nix-check / build (push) Failing after 1s
Following the rename of the feature branch from preview/0.8.0 to preview/v0.8.0
(to match the release/v0.7.0 convention), update all issue-template
references so the label, filename, and deep-link URL stay consistent.

Changes:
- git mv preview-0.8.0-feedback.yml → preview-v0.8.0-feedback.yml
- update labels reference, title prefix, display name, body copy
- update version placeholder example to 0.8.0-preview.2 (current build)
- update cross-references in bug-report.yml and feature-request.yml
- update config.yml first contact_link URL + about text
2026-05-14 23:21:37 +08:00
PerishCode
4f15c33595 Merge remote-tracking branch 'origin/preview/0.8.0' into preview/v0.8.0 2026-05-14 21:10:03 +08:00
sukumarp2022
9218fd649e
feat(ui): add copy to clipboard functionality for user messages with … (#1669)
* feat(ui): add copy to clipboard functionality for user messages with localization support

* fix(web): use setTimeout instead of window.setTimeout for correct Timeout type

* docs: add copy prompt button screenshot for PR #1669

* docs: add copy button hover screenshot for PR #1669

* docs: add copy button copied state screenshot for PR #1669

* fix(ui): reset button border/background on copy prompt button

The .user-copy-btn inherited border and background from the base
button CSS, rendering as a bordered gray box instead of a clean icon
overlay. This was especially visible in the Electron desktop app.

Add border: none and background: none to the button, and a subtle
hover background for feedback.
2026-05-14 20:19:20 +08:00
lefarcen
4693ddb00d
chore: add issue templates (bug, feature, preview/0.8.0) + chooser config (#1708)
* chore: add issue template for preview/0.8.0 feedback

Adds a guided issue form so community testers of the preview/0.8.0
branch (Skills tab + Automations) can submit structured feedback.

The template auto-applies the preview/0.8.0 label, which lets
maintainers filter all preview-related reports in one view:
https://github.com/nexu-io/open-design/issues?q=is%3Aopen+label%3A%22preview%2F0.8.0%22

* chore: add generic bug-report issue template

Pairs with the preview/0.8.0 template added in the previous commit.
Until now the repo had no issue templates at all, which meant New
Issue opened a blank textarea by default.

The bug-report template:
- Pre-applies the 'bug' label
- Guides users through repro steps, version, platform, logs
- Includes a callout pointing preview/0.8.0 testers to the
  dedicated feedback template so the two flows stay separate

* chore: add feature-request template + chooser config

Rounds out the issue-template basics:
- feature-request.yml — 'what problem are you trying to solve' framing,
  willing-to-contribute dropdown so maintainers can route PRs
- config.yml — disables blank-issue entry, redirects Q&A / Ideas / Show-and-tell
  / general chat to Discussions, points preview/0.8.0 reporters at the
  dedicated template

After merge, the chooser at /issues/new/choose will be:
  Template     1. 🐛 Bug report
              2. 💡 Feature request
              3. 🧪 Preview 0.8.0 feedback
  Contact      → Preview 0.8.0 feedback (dup, easy-access)
              → Ask a question (Discussions Q&A)
              → Discuss an idea (Discussions Ideas)
              → Show what you've made (Discussions Show-and-tell)
              → General discussion (Discussions General)
2026-05-14 20:13:36 +08:00
ashleyashli
1e9bcbf20d
fix(contributor-bot): serialize runs to avoid state.json races and duplicate cards (#1707) 2026-05-14 20:01:13 +08:00
PerishCode
43b1b94c8e Add preview release channel 2026-05-14 19:15:16 +08:00
PerishFire
3fa12f71be
Add release preview workflow placeholder (#1705)
Some checks failed
ci / Packaged mac smoke (push) Blocked by required conditions
ci / Packaged windows smoke (push) Blocked by required conditions
ci / Detect PR change scopes (push) Failing after 11s
ci / Validate workspace (push) Has been skipped
nix-check / build (push) Failing after 2s
2026-05-14 18:55:08 +08:00
PerishCode
5b1e49ac8a Add release preview workflow placeholder 2026-05-14 18:49:05 +08:00
PerishCode
cba8bf151d chore: align namespace lifecycle packaging 2026-05-14 16:35:46 +08:00
lefarcen
6c16283850 Merge origin/main (post-7c8305f4) into reconcile branch
Brings in 10 new main commits: routine deep-link to specific
conversations (#1508), Windows resource cache fix for Orbit templates,
collapsible comment side panel (#1607), routines project radio polish,
Copilot logo swap, and minor UI fixes.

Conflicts resolved:
- router.ts: garnet's home/view + marketplace routes + main's
  per-project conversationId deep-link field coexist on Route union
- ProjectView.tsx: garnet's isPhantomDaemonRunMessage helper +
  main's isStoppableAssistantMessage helper both kept
- ProjectView.run-cleanup.test.tsx: accepted HEAD (garnet's
  phantom-row regression test); main's three new tests for
  finalizeActiveAssistantMessagesOnStop / clearStreamingConversationMarker
  / shouldClearActiveRunRefs are queued as a follow-up TODO inline.
2026-05-14 15:13:38 +08:00
shangxinyu1
2976c76fc3
test: expand Memory and Routines coverage (#1521)
* test: expand settings and packaged coverage

* test: extend memory settings coverage

* test: cover routine settings failure states

* test: cover routine operation failures

* test: fix daemon test typing on CI

* test: decouple packaged smoke from orbit bug

* test: avoid live memory LLM calls in route tests

* test: fix daemon fetch typing in CI

* fix: restore preview comment and inspect toggles

* test: align manual edit flow with current inspector UX

* test: align comment attachment flow with current preview comments UI

* fix: probe resolved Codex launch path during detection

* fix: remove duplicate board activation helper after rebase

* test: update ghost cli detection mock

* test: align FileViewer toolbar expectation

* ci: move full app tests to extended lane

* ci: run app tests by changed scope

* ci: cover shared app inputs in test scopes

* ci: avoid setup-node cache in windows packaged smoke

* test: align extended settings and manual edit flows
2026-05-14 14:48:40 +08:00
lakatos
51d1c4e287
ci: skip upstream-only workflows on forks (#1586) 2026-05-14 14:27:23 +08:00
Nagendhra Madishetti
ff569fa50c
feat(daemon): Critique Theater Phase 16 (M-phase rollout ratchet + /api/critique/conformance) (#1499)
* feat(web): pure reducer for Critique Theater states (Phase 7.1)

Pure CritiqueState reducer driven by the contracts-level PanelEvent
(the same shape both the live SSE stream and the recorded transcript
emit), so a single reducer powers both the in-flight panel and the
rerun replay. Lifecycle covers run_started → running → (shipped /
degraded / interrupted / failed), with panelist_open / dim /
must_fix / close / round_end events building per-round
CritiquePanelistView entries as they arrive.

Defensive behaviour that surfaced while writing the spec tests:
- Terminal phases (shipped / degraded / interrupted / failed) are
  sticky against further lifecycle events for the same run, except
  for parser_warning which can land late and is recorded in a side
  channel without changing phase.
- A new run_started for a different runId at any time discards the
  prior state and reboots, so the UI can launch consecutive runs
  without an explicit reset action.
- Events whose runId does not match the active run return the same
  state reference, so React's useReducer doesn't re-render
  subscribers on stray traffic.
- Round bookkeeping keys by round number rather than "always last",
  so an out-of-order panelist_dim for round 1 arriving after a
  round 2 dim does not corrupt the round 2 bucket.

Test coverage: 18 cases covering each transition, the runId guard,
sticky-terminal behaviour, the out-of-order round invariant, and
the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire
SSE + replay into the same reducer.

* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2)

createCritiqueEventsConnection is a pure connection manager that
mirrors apps/web/src/providers/project-events.ts: opens an
EventSource at /api/projects/:id/events, listens for every name in
CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent
(stripping the critique. prefix and merging the data payload), and
hands it to the caller's onEvent. Reconnect uses exponential
backoff (1s → 30s) and resets on `ready`; malformed payloads drop
with a dev-mode warning rather than tearing the stream.

useCritiqueStream wraps the manager in a useReducer that owns the
CritiqueState. enabled=false or a null projectId tears down the
connection cleanly; switching projectId closes the old connection
and opens a fresh one. The returned dispatch lets local UI
synthesise actions (e.g. an Esc keypress firing a synthetic
interrupted while a kill request is in flight); production traffic
comes from the SSE stream.

Test coverage:
- sse.test.ts (10 cases, node env): subscription set covers every
  CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire
  shape back to PanelEvent; malformed JSON is swallowed and does
  not stop the stream; exponential backoff schedule and ready-reset
  semantics are pinned with a setTimeout seam; close() cancels
  pending reconnects and shuts the live source; no-op fallback
  when EventSource is unavailable.
- useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event,
  reducer driven by synthetic actions, no connection when disabled
  or projectId is null, clean close on unmount, projectId change
  reopens cleanly.

* feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3)

Fetches the per-run NDJSON transcript (one PanelEvent per line),
parses every line via the shared isPanelEvent predicate, and
dispatches into the same CritiqueState reducer the live SSE stream
uses. A single reducer means the UI rendering a replay can be
identical to the live panel, and a UI mounting both
useCritiqueStream and useCritiqueReplay in parallel does not have
to reconcile two state shapes.

speed knob is `paused | instant | live | { intervalMs: N }`.
- instant flushes every event synchronously, useful for opening a
  finished run already at its terminal state.
- intervalMs paces dispatches at a fixed cadence so the reviewer
  can watch the run unfold.
- paused parses the transcript but holds events back until the
  caller advances speed (consumers can drive a scrubber later).
- live is reserved for the future "playback at original cadence"
  feature, currently treated as instant; replay timestamps are not
  yet persisted with each event so honest pacing requires a
  follow-up Phase 7+ task.

gunzip seam handles `.ndjson.gz` transcripts via
DecompressionStream when present; the production fetch path picks
between text and arrayBuffer based on the URL extension. Both seams
are injectable so the unit tests don't need to spin up a real
network or a real gzip pipeline.

Test coverage (8 cases, jsdom env):
- Idle status before any URL is provided.
- speed=instant flushes the full transcript synchronously to
  shipped state.
- speed={intervalMs:N} paces with the setTimeout seam, reaching
  done after the last tick.
- speed=paused leaves status=playing with no dispatches.
- Empty transcript reports done with state still idle.
- Fetch rejection surfaces an error status with the message.
- Malformed NDJSON lines are skipped; valid events around them
  still land.
- .gz transcripts route through the gunzip seam.

Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream +
replay), all on one branch ready for review. Phases 8+ (Theater
components) consume these from this PR.

* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review)

Two P1 fixes from lefarcen's review on PR #1307:

SSE payload override

`sseToPanelEvent` previously spread `data` after the channel-derived
`type`, so a payload-provided `type` could override the channel and
route a `critique.run_started` frame into the reducer as a `ship`
action. Reversed the spread so the channel-derived `type` is
authoritative, and revalidated the resulting object through the
contracts-level `isPanelEvent` predicate before returning. Frames
that fail validation (missing runId, empty runId, unknown type) are
dropped, so a malformed or compromised SSE frame can no longer
dispatch a wrong-shape action into the reducer.

Three new sse.test.ts cases pin the regression: hostile `type:'ship'`
in the payload still resolves to `run_started`, missing runId is
dropped, empty runId is dropped.

Replay pause/resume

`useCritiqueReplay` had one big effect keyed on `transcriptUrl`
only, so flipping `speed` from `paused` to `instant` never re-fired
and the held events sat undispatched. Split into a parse effect
(depends on URL, fetches and stores events in state) and a pace
effect (depends on parsed-events + speed, owns the cursor + timers).
The playback cursor lives in a ref that survives pause/resume
cycles, so flipping `paused` -> `instant` flushes from the current
position rather than restarting (which would double-dispatch
`run_started` and reset the reducer).

Two new useCritiqueReplay.test.tsx cases:
- paused-then-instant transitions from `playing` to `done` and
  reaches the shipped terminal phase
- intervalMs paced playback dispatches one event, pauses to drain
  the next scheduled timer, flips to instant, and confirms the
  remaining transcript drains exactly once (cursor was preserved)

Doc consistency

The earlier source comment in useCritiqueReplay.ts claimed `live`
"paces by recorded timestamps" while the impl used zero-delay
timers and the PR body said it behaves like `instant`. Aligned to
reality: `live` currently behaves like `{ intervalMs: 0 }` (events
drain on successive microtasks via setTimeoutFn) because transcripts
do not yet carry per-event timestamps. Honest timestamp-driven
pacing is queued as a Phase 7+ follow-up.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite
96 files / 888 tests.

* feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread)

* feat(web): Theater PanelistLane component (Phase 8.1)

* feat(web): Theater ScoreTicker component (Phase 8.2)

* feat(web): Theater RoundDivider component (Phase 8.3)

* feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4)

* feat(web): Theater TheaterDegraded chip (Phase 8.5)

* feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6)

* feat(web): Theater TheaterTranscript replay surface (Phase 8.7)

* feat(web): Theater TheaterStage top-level container (Phase 8.8)

* feat(web): Theater CSS using existing semantic tokens (no hex literals)

* feat(web): Theater public exports barrel

* fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314)

Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen.

State-lifecycle fixes (3 x P2)
1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`).
   Host hooks dispatch it when their gating prop changes so a stale
   run from a prior project / transcript cannot bleed into the next
   context. Reset is idempotent on idle (returns the same reference).
2. `useCritiqueStream` dispatches `__reset__` at the top of its
   connection effect, so a workspace switch from project A (which
   streamed a critique) to project B clears the reducer before the
   new EventSource opens. enabled=false also clears.
3. `useCritiqueReplay` dispatches `__reset__` at the top of its
   parse effect, so transcriptUrl swaps (including swap-to-null after
   a replay reached `shipped`) lift the reducer back to idle before
   the new fetch starts.

SSE validation (1 x P2)
4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape`
   check after the cheap `isPanelEvent` predicate. A
   `critique.ship` frame missing `composite` / `round` / `status` /
   `artifactRef` is rejected before reaching the reducer, so
   TheaterCollapsed can no longer crash on `undefined.toFixed(1)`.
   Every variant's required fields are validated: run_started
   (protocolVersion, non-empty cast, maxRounds, threshold, scale),
   panelist_* (round, role, plus variant-specific shape), round_end
   (round, composite, mustFix, decision in {continue,ship}, reason),
   ship (round, composite, status, artifactRef.{projectId,artifactId},
   summary), degraded (reason, adapter), interrupted (bestRound,
   composite), failed (cause), parser_warning (kind, position).

Reducer correctness (1 x P2)
5. `panelist_open` now materializes the round + an empty panelist
   view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight
   the in-progress lane the instant the tag opens. Before this, a
   stream that emitted only `panelist_open` after `run_started` left
   `rounds = []` and the UI rendered no current round until a later
   `panelist_dim` arrived.

Polish (3 x P3)
6. Brand role tint swaps from `var(--magenta, var(--accent))` to
   `var(--purple, var(--accent))`. `--purple` is actually defined
   across the design systems; `--magenta` is not, so Brand was
   silently falling through to `--accent` and looking identical to
   Designer.
7. New i18n key `critiqueTheater.interruptedSummary` for the
   interrupted-collapse copy ("Interrupted at round N, best
   composite X.X"). Previously the interrupted branch reused
   `shippedSummary` and the UI read "Shipped at round..." for a run
   that specifically did not ship. Native value in en + zh-CN; other
   locales fall back via `...en` spread.
8. `TheaterDegraded` heading id comes from `useId()` instead of a
   hardcoded `theater-degraded-heading`, so two chips rendered on
   the same page (chat history with multiple completed runs) keep
   their aria-labelledby references unambiguous.

Tests (15 new cases)
- reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data.
- sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship.
- useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false.
- useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped.
- TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...".
- TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new)
- tests/i18n/locales.test.ts 5 of 5 across 18 locales

* feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1)

* feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2)

* fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315)

Addresses every blocker from codex, Siri-Ray, and lefarcen. The
three state-lifecycle and SSE-validation issues they also flagged
inherit fixes from PR #1314's review pass that this branch now sits
on top of after rebase.

Real daemon kill on Interrupt (P1)
- CritiqueTheaterMount now POSTs to
  /api/projects/:id/critique/:runId/interrupt alongside the
  optimistic local dispatch. Before this fix, clicking Interrupt
  only flipped the React state to interrupted while the daemon job
  kept running. The fetch is best-effort: a 404 (endpoint not wired
  yet, lands in Phase 15) is swallowed with a dev-mode console.warn
  so the UI still moves to the collapsed badge.
- New fetchInterrupt test seam lets RTL assert on the URL / method
  and simulate the "daemon not ready yet" path. Two tests pin both:
  the happy URL proj-42/critique/run-abc/interrupt POSTs, and a
  rejected fetch still flips the UI.

interruptPending reset on new run (P2)
- A ref-backed effect compares the current runId against the last
  one we saw; when it changes, interruptPending is cleared. A user
  who interrupts run-1 and then triggers run-2 from the same mount
  now gets a fresh, enabled kill button instead of one stuck in
  "Interrupting…". Pinned by a new mount test.

Escape keybind scope (P2)
- InterruptButton now checks the keydown target. Escape inside an
  input, textarea, select, or contenteditable element is ignored
  (and any ancestor of those via closest() is treated the same
  way). Body-level focus still fires the keybind so the Theater
  area's affordance keeps working. Four new tests cover textarea,
  input, contenteditable, and the body-focus positive case.

userFacingName i18n key (P2)
- The spec at specs/current/critique-theater.md:6 mandates a single
  critiqueTheater.userFacingName key so the "Design Jury" label can
  be renamed without touching code. Phase 8 introduced
  critiqueTheater.title by mistake; renamed across types.ts, en.ts,
  zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer
  TheaterStage.tsx. The locale alignment test stays green.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 14 files, 112 tests (was 101 before, +11 new for
  the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope;
  the rest were already in #1314's review fix).
- tests/i18n/locales.test.ts 5 of 5 across 18 locales.

* feat(daemon): adapter-degraded registry with TTL (Phase 10.1)

In-memory registry recording adapters that produced malformed or
oversize transcripts so the orchestrator can skip them for a TTL
window (default 24h) instead of cycling through known-bad providers
on every run.

Records carry reason (malformed_block | oversize_block |
missing_artifact), source label, and expiresAt. The test-only
clock seam lets the suite advance time deterministically and prove
that an expired entry stops counting as degraded without anyone
calling clearDegraded.

7/7 vitest cases green.

* feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2)

Two test-only adapters that read the existing v1 transcript
fixtures (happy-3-rounds and malformed-unbalanced) and replay them
as either a full string or a 512-byte chunked stream. The chunked
form is what the conformance harness uses to prove the parser
holds together when the transcript arrives in arbitrary network
slices, not as one buffered blob.

* feat(daemon): adapter conformance harness (Phase 10.3)

runAdapterConformance pulls a transcript through the same
parseCritiqueStream pipeline the orchestrator uses and classifies
the outcome as shipped, degraded, or failed. On a degraded
outcome it forwards the matched reason to the adapter-degraded
registry, so a single nightly conformance run is what populates
the skip list rather than the orchestrator learning each adapter
is broken at request time.

5/5 vitest cases green covering shipped, malformed degraded,
oversize degraded, no-ship failure, and the harness-thrown
failure path.

* test(e2e): Critique Theater Playwright suite (Phase 11)

Six tests, one viewport per visual case, deterministic SSE
fixtures stubbed via page.route(). Adds the suite to
test:ui:extended so the existing extended-UI lane picks it up.

Coverage:

  1. Happy path: a single mounted theater plays the full
     fixture (1 run_started, 5 panelists open / dim / must_fix /
     close, 1 round_end, 1 ship) and ends on the score badge.
  2. Interrupt mid-run: the panelist that is open at the time
     the interrupt button is clicked closes with an interrupted
     marker and the transcript freezes there.
  3. Visual regression at 375x720 mobile.
  4. Visual regression at 768x1024 tablet.
  5. Visual regression at 1280x800 desktop.
  6. A11y role tree: the theater region exposes a labelled
     landmark, each panelist lane is a group with an accessible
     name, the score is a status live region.

All SSE traffic is stubbed by page.route so the suite runs in CI
without a daemon. The toggle is seeded via localStorage by
bootAppWithCritiqueEnabled so the gate behaves as if Settings
flipped it on. typecheck clean; playwright --list reports 6.

* test(web): reducer p99 bench at 10k iterations (Phase 13.1)

Locks the documented 2ms budget for the Critique Theater reducer
on a representative SSE script (27 actions, one full happy run)
behind a regression gate. Asserts p99 stays under 4ms (2x the
documented budget) so CI runners with a noisy neighbour do not
flake while a real regression to 20ms or 200ms still trips.

The bench is a vitest case rather than a bare microbenchmark so
it runs in the same CI lane as every other web test and does not
need a parallel runner.

* test(web): critique surface coverage walker (Phase 13.2)

Walks the public critique surface (11 SSE event names, 5 panelist
roles, 6 lifecycle phases, 9 named i18n keys) and asserts each
named symbol appears in both the src corpus and the test corpus.
The walker is the gate that catches a rename in one half of the
codebase without a matching update in the other half: a future
PR that drops 'panelist_must_fix' from the reducer without also
removing its test reference fails this suite.

62 assertions, one per symbol per corpus.

* docs: Critique Theater user guide (Phase 14.1)

Seven sections aimed at end users (not contributors):

  1. What is Design Jury
  2. How it works (the five panelists, auto-converging rounds,
     the composite formula)
  3. Settings (the M1 toggle and what it does)
  4. Reading the score badge
  5. Replay surface
  6. Troubleshooting (degraded, interrupted, failed)
  7. FAQ

The composite formula is documented as
    designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2
because anyone trying to reverse-engineer the score is going to
search for those weights and the docs are the place they should
land first.

* docs(daemon): critique module AGENTS map (Phase 14.2)

Daemon-side wayfinder for the apps/daemon/src/critique directory.
Tables every file, what owns what invariant, and the 'when you
change anything here' guide so a future contributor does not
have to reverse-engineer the rollout resolver before adding a
new SSE event.

* docs(web): Theater module AGENTS map (Phase 14.3)

Web-side mirror of the daemon AGENTS map. Same file table, same
invariants section, same change-impact guide, sized to the
Theater component package.

* feat(daemon): rollout flag resolver (Phase 15.1)

Single decision point every caller consults to know whether the
orchestrator should wire the critique pipeline for a given run.
Priority:

  1. Skill-level policy (required wins, opt-out wins inversely)
  2. Per-project override from the Settings toggle
  3. OD_CRITIQUE_ENABLED env override
  4. Rollout phase default
       M0 dark-launch      false
       M1 settings only    false (toggle is off until the user flips it)
       M2 per-skill        true if skill opted in
       M3 global default   true

OD_CRITIQUE_ROLLOUT_PHASE parser defaults to M0 on unknown input
so a fresh install never surprises a user with the feature on.

10/10 vitest cases green covering every cell of the matrix.

* feat(web): Settings toggle hook for Critique Theater (Phase 15.2)

React hook that reads critiqueTheaterEnabled from the existing
open-design:config localStorage blob and stays in sync via:

  - the platform storage event (cross-tab)
  - a open-design:critique-theater-toggle CustomEvent (same-tab)

Same-tab event is the one that fires when the Settings panel saves
in the current window: the toggle and every mounted theater update
without a page reload.

setCritiqueTheaterEnabled(next) is the imperative setter the Settings
panel calls. It preserves the rest of the stored config (mode, apiKey,
etc.) and dispatches the same-tab event after the localStorage write.

The web hook reflects what the user toggled; the daemon-side
isCritiqueEnabled is the final routing authority (project override,
env, rollout phase). When they disagree, the daemon wins for backend
gating and the web reflects the toggle state.

6/6 vitest cases green covering first read, stored read, same-tab
event flip, config preservation, corrupted JSON tolerance, and
cross-tab storage event.

* test(web): Phase 15 toggle hook failure-mode coverage (PR #1320)

lefarcen P2 on PR #1320 flagged that the PR body claimed safe
behavior for disabled localStorage, non-object JSON, and missing
CustomEvent shim, but the suite only covered corrupt JSON plus
happy-path storage events. Added four failure-mode tests so the
swallowed errors are not silently traded for a throw in a future
refactor:

1. Returns false on a stored JSON value that parses to an array
   (non-object). Catches a regression where the guard treats
   anything truthy as a config blob.
2. Returns false on a stored JSON value of literal 'null'.
   typeof null === 'object' in JS, so the guard has to check null
   explicitly; this test pins that check.
3. Returns false when localStorage.getItem throws (private mode /
   disabled storage / SecurityError). The hook must swallow and
   return false so the rest of the app keeps rendering.
4. setCritiqueTheaterEnabled still dispatches the same-tab
   CustomEvent when localStorage.setItem throws (quota exceeded /
   disabled storage). The dispatch path is the in-session
   broadcast that keeps every mounted hook coherent even when
   persistence is unavailable; verified by mounting two probes
   and asserting both flip after the setter is called with a
   throwing setItem.

10/10 vitest cases green (6 existing + 4 new).

* fix(web): honor CustomEvent payload in toggle hook listener (PR #1320)

Both Siri-Ray (blocking) and lefarcen (P2 new) caught the same
real bug in the failure-mode test I added in affcdd27: the test
asserts the in-session UI flips when localStorage.setItem throws,
but the CustomEvent listener was ignoring the event's typed
detail and just calling readToggle(). Under a throwing setItem
the localStorage value is stale (or absent), so the listener
would see the OLD value and the test would fail (or worse, the
production claim 'in-session event keeps mounts coherent' was
hollow).

Fixed the hook, not the test: the listener now reads
event.detail.enabled when it is a boolean, falling back to
readToggle() only for malformed events or for cross-tab storage
events (which do not carry a typed payload). The setter already
dispatched the detail; the listener just was not consuming it.

Test changes:

  - The existing 'setItem throws' test now asserts the right
    behavior for the right reason. Updated the inline comment to
    say the listener reads from detail, not localStorage.
  - New test 'falls back to readToggle when the CustomEvent
    carries no usable detail' pins the fallback path: a
    malformed dispatcher (no detail, or detail.enabled not a
    boolean) degrades cleanly instead of throwing or being
    silently ignored.

11 / 11 vitest cases green (10 prior + 1 new fallback).

* feat(daemon): route critique spawn-path eligibility through the rollout resolver

The wireup edit Phase 10 and Phase 15 carved out: today server.ts gates
the critique pipeline on critiqueCfg.enabled, which is just the
OD_CRITIQUE_ENABLED env var. After this commit it gates on
isCritiqueEnabled(...) from the Phase 15 resolver, so the full
priority matrix is live:

  1. Per-skill od.critique.policy veto (opt-out / required)
  2. Per-project override (M1 Settings toggle, written through the
     existing Phase 6 settings endpoint)
  3. OD_CRITIQUE_ENABLED env override (power-user lane / CI fixtures)
  4. OD_CRITIQUE_ROLLOUT_PHASE default
       M0 dark-launch      false
       M1 settings only    false
       M2 per-skill        only when skillPolicy === 'opt-in'
       M3 global default   true

Default behaviour on a fresh install is unchanged: the resolver
returns false at M0 without an env override or a project override,
so prod traffic falls through to the legacy single-pass path
exactly the way it did before.

Inputs threaded today: phase from OD_CRITIQUE_ROLLOUT_PHASE,
envOverride from OD_CRITIQUE_ENABLED. skillPolicy and projectOverride
are passed as null for the v1 cutover; the daemon-side handler that
round-trips critiqueTheaterEnabled on the project settings row and
the od.critique.policy frontmatter resolver land as the next two
commits in this branch.

The three call sites that used critiqueCfg.enabled (the brand-thread
guard, the skill-thread guard, the top-line critiqueShouldRun
compound) now read from a single locally-scoped critiqueEnabledForRun
boolean, so the eligibility check is computed exactly once per spawn
and the prompt composer + orchestrator stay in lockstep the way
the existing comment already promised.

Tests still green: daemon vitest 22 / 22 across rollout +
conformance + adapter-degraded. Daemon typecheck clean.

* feat(web): mount CritiqueTheaterMount in ProjectView

The web counterpart of the daemon wireup. ProjectView now renders
<CritiqueTheaterMount projectId={project.id} enabled={...} /> as a
sibling of <AppChromeHeader> inside the top-level <div className="app">.

The mount is the drop-in from the Phase 9 stack: it owns the SSE
subscription, the kill-request handshake, and the phase-aware swap
from the live <TheaterStage> to the collapsed badge once a run
settles. The mount returns null until the daemon emits a
critique.run_started for the active project, so the visual surface
is byte-for-byte unchanged for users who have not opted in.

Enabled wiring: useCritiqueTheaterEnabled() reads the M1 Settings
toggle from the existing open-design:config localStorage blob and
stays in sync with both the platform storage event (cross-tab) and
the same-tab open-design:critique-theater-toggle CustomEvent the
Phase 15 setter dispatches. The hook honors the event payload
directly so a private-mode browser that cannot persist the toggle
still updates the in-session UI correctly.

The daemon-side gate (isCritiqueEnabled in apps/daemon/src/server.ts)
remains the authority for whether a run is actually wired through
the critique pipeline. This hook only governs whether the web layer
renders the resulting SSE stream when the daemon emits one. The
two-layer gate is intentional: an integrator embedding the Theater
in a custom UI can flip the web visibility independent of the
daemon's routing decision, and a daemon-side env override flips
backend gating without touching the web's localStorage.

Tests still green: web Theater suite 181 / 181 across 16 files.
Web typecheck clean.

* feat(daemon): resolve od.critique.policy frontmatter at the spawn site

The next step in the wireup branch's ladder: replace the placeholder
`skillPolicy: null` with the actual value parsed from the active
skill's SKILL.md frontmatter.

Three small edits, one new field on a public type:

1. SkillInfo gains a `critiquePolicy: SkillCritiquePolicy` field
   carrying the parsed `od.critique.policy` token (required /
   opt-in / opt-out / null). The field is null when the skill has
   no opinion, which lets the lower-priority resolver tiers
   (projectOverride, envOverride, phase default) decide.

2. listSkills() populates the new field via a small
   `normalizeCritiquePolicy` helper that tolerates the YAML
   scalar's casing and trims whitespace. Unknown tokens collapse
   to null so a typo in SKILL.md cannot accidentally force the
   panel on or off; it just falls through. Derived example cards
   inherit the parent's policy.

3. server.ts captures `skill.critiquePolicy` into a hoisted
   `skillCritiquePolicy` variable inside the existing skill-load
   block, then threads it into the isCritiqueEnabled call as the
   skillPolicy input. The hoisting keeps the variable in scope at
   the resolver call site without restructuring the spawn handler.

After this commit, the priority matrix the rollout resolver was
designed for is live for its top tier. The previous commit wired
env + phase; this one wires skill. The projectOverride input
remains null pending the next commit that extends the Phase 6
settings endpoint.

Daemon vitest: 10 / 10 rollout cases pass against the new wiring.
Daemon typecheck: clean.

* feat(daemon): feed projectOverride into the rollout resolver from project metadata

Replaces the placeholder `projectOverride: null` in the spawn
handler with the actual value the Settings panel writes onto the
project's metadata blob: `critiqueTheaterEnabled?: boolean`.

The read is defensive at the boundary: the metadata object is
typed loosely (it round-trips through SQLite as a free-form JSON
blob), so the spawn handler narrows to `boolean` and falls
through to `null` for any other shape. A missing key, a malformed
value, or a project that has never visited Settings collapses to
`null`, which is exactly the resolver's "no opinion, fall
through to env / phase" signal.

The `critique` frontmatter slot also gets typed on the
SkillFrontmatter shape so the `od.critique.policy` chain the
previous commit introduced no longer needs a bracket-access
cast. Same pattern as the existing `craft`, `preview`, and
`design_system` nested-record slots.

After this commit, every tier of the rollout resolver's priority
matrix is wired:

  1. skillPolicy   (from SKILL.md od.critique.policy)
  2. projectOverride (from project metadata critiqueTheaterEnabled)
  3. envOverride   (from OD_CRITIQUE_ENABLED)
  4. rollout phase (from OD_CRITIQUE_ROLLOUT_PHASE)

The write path for projectOverride still flows through the
existing project-update handler the Settings panel already uses
to persist project metadata; no new endpoint is needed. The
Settings UI button that calls setCritiqueTheaterEnabled and
posts the new field is the next commit on this branch.

Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases
still green against the new wiring.

* fix(daemon): forward critique events to project sinks + align composer gate (PR #1338)

Two codex review items addressed in one commit since they share the
same root cause (resolver-enabled run hits a transport / prompt
contract that was still env-gated):

P1 (transport mismatch). The daemon emits critique.* SSE frames
through critiqueBus -> design.runs.emit, which fans out on
/api/runs/:runId/events. The web CritiqueTheaterMount subscribes to
/api/projects/:projectId/events (it's project-scoped, not run-
scoped, because the mount lives at the project workspace and
follows the user across runs). Result: in production the mount
never sees a real frame and the e2e tests' stubbed routes hide the
mismatch.

Fixed by extending critiqueBus.emit to fan out to BOTH sinks: the
existing runs.emit transport, AND the per-project event-sinks map.
The project-events route emits via sse.send(payload.type, payload),
so we pack the SSE channel name onto payload.type and let the sink
push the right channel. The web sseToPanelEvent overwrites type
from the channel name on the way back into a PanelEvent, so the
round-trip stays correct.

P2 (prompt gate misalignment). composeSystemPrompt reads
cfg.enabled to decide whether to append the panel addendum, but
critiqueCfg.enabled is loaded from OD_CRITIQUE_ENABLED only. A run
the resolver enabled via phase / project / skill (env unset) would
have critiqueShouldRun = true while critiqueCfg.enabled remained
false, dropping the panel prompt while still routing through
runOrchestrator -> parser waits for tags that never arrive -> run
degrades.

Fixed by passing a derived config { ...critiqueCfg, enabled: true }
to the composer when critiqueShouldRun is true. The composer's own
gate now agrees with the resolver decision on every input the
spec defines.

Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases
still green against the new wiring.

* fix: address PerishCode P1 + P2 follow-ups on PR #1338

Two follow-up items PerishCode flagged on the activation PR.
Non-blocking but both are real:

1. Phase 11 e2e suite was wired into test:ui:extended but lands
   the user on '/' (home route) where ProjectView (and therefore
   CritiqueTheaterMount) is never rendered. With the suite as
   written, every assertion would time out the first time the
   lane runs in CI, contradicting the PR body's claim that the
   suite stays parked behind test.describe.fixme.

   The state diverged from my earlier Phase 11 work because the
   merge from main on commit 4ab719c6 brought in #1307's
   squash-merged version of the e2e file (the pre-fixme shape).

   Re-applied test.describe.fixme to the describe block plus
   removed ui/critique-theater.test.ts from the test:ui:extended
   script in e2e/package.json. Added a file-header docblock
   explaining what the follow-up commit needs to do: replace
   goto('/') with /projects/:id navigation similar to
   app-design-files.test.ts, split the SSE fixture into a live
   prefix and terminal suffix (Codex P2 on PR #1320), and commit
   the first PNG baselines.

2. bestRoundOf in CritiqueTheaterMount returned the LAST round
   with a numeric composite, not the round with the HIGHEST
   composite, while bestCompositeOf correctly returned the max.
   A run that closed round 1 at 8.5 and round 2 at 6.0 would
   dispatch interrupted { bestRound: 2, composite: 8.5 } on a
   user-clicked interrupt.

   Folded the two helpers into a single bestRoundAndComposite
   that walks state.rounds once and returns the matching pair so
   the two values cannot drift. The onInterrupt callback now
   destructures from one helper instead of two independent reads.
   Falls back to (state.activeRound, 0) when no round has closed
   with a composite yet.

Web typecheck: clean. CritiqueTheaterMount.test.tsx: 7 / 7 cases
still green against the new helper.

* fix: wire M1 project override end-to-end + correct deferred-surface doc claims (PR #1338)

Three lefarcen P2s on the latest review pass, all real:

1. M1 project override was half-wired: the daemon read
   metadata.critiqueTheaterEnabled but the web setter only
   wrote localStorage. A user opt-in would render the Theater
   on the web (localStorage was set) while the daemon resolved
   projectOverride=null and skipped critique unless env / phase
   already permitted. Two halves talking past each other.

   Extended setCritiqueTheaterEnabled to accept an optional
   { projectId, fetchProjectSettings } options bag. When a
   projectId is supplied, the setter ALSO sends a
   PATCH /api/projects/:id with { metadata: { critiqueTheaterEnabled
   } } so the daemon's spawn-time resolver picks the same value up
   on the next generation. The existing project-routes endpoint
   already accepts arbitrary metadata patches, so no new endpoint
   is needed. The local write + the CustomEvent dispatch still
   fire before the PATCH, so a network failure does not unwind
   the in-session UI flip. Three new vitest cases pin the new
   path: PATCHes when projectId is provided, skips when it is
   not, swallows a rejected PATCH so the in-session UI still
   flips.

2. Rollout docs (docs/critique-theater.md section 3) claimed the
   Settings toggle persists into the daemon settings store, but
   the previous implementation only had a localStorage reader /
   writer plus a daemon read of project metadata, with no
   round-trip. Rewrote the section to lead with the four-tier
   resolver (skill policy / project override / env / phase),
   document that the setter now round-trips via the existing
   PATCH endpoint when given a projectId, and call out the
   Settings panel UI control as a deliberate follow-up.

3. Troubleshooting table pointed users at /api/metrics/critique
   (Phase 12, deferred) and 'od adapters clear-degraded <id>'
   (CLI wrapper that does not exist). Replaced the metrics
   reference with the local conformance harness command
   (pnpm --filter @open-design/daemon vitest run
   tests/critique-conformance.test.ts) that ships today, with a
   note that the Phase 12 dashboard surfaces this status as a
   series once that PR lands. Replaced the CLI command with the
   programmatic clearDegraded() helper that exists today and
   flagged the CLI wrapper as planned follow-up.

Web typecheck: clean. Toggle hook tests: 14 / 14 green (11
existing + 3 new for the round-trip path).

* test(web): multi-round interrupt regression for bestRoundAndComposite (PR #1338)

lefarcen P3 follow-up to the previous bestRoundAndComposite fix:
the existing CritiqueTheaterMount.test.tsx interrupt cases only
exercised a single-round state, so a future refactor back to two
independent helpers wouldn't be caught by the test suite even
though it'd reintroduce the round / composite drift bug.

Added a regression case that:

  1. Drives the reducer through two complete rounds with the
     full 5-role cast closing at distinct composites: round 1
     at 8.5, round 2 at 6.0 (the high-composite round is NOT the
     most recent one).
  2. Clicks Interrupt + waits for the daemon ack via the test
     seam fetcher returning 204.
  3. Asserts the collapsed badge displays "round 1" (the
     correct best-composite round), and queryByText for
     "round 2 ... 8.5" returns null (the buggy pairing
     would have produced that string).

The bestRoundAndComposite helper walks state.rounds in one pass
and returns the matching pair, so the round number and the
composite cannot drift apart. This test locks the fix in: a
refactor that splits the helpers back into independent walks
will be caught here.

8 / 8 vitest cases green on the file.

* fix(web): read-merge-write the project metadata in setCritiqueTheaterEnabled (PerishCode P2 on PR #1338)

The previous round-trip sent { metadata: { critiqueTheaterEnabled: next } }
as the entire PATCH body. The daemon's project-routes handler only
re-stamps three immutable fields (baseDir, importedFrom,
fromTrustedPicker) before calling updateProject(db, id, patch),
which then does a shallow { ...existing, ...patch } in apps/daemon/
src/db.ts. So patch.metadata replaces the row's metadata wholesale,
dropping kind, templateId, linkedDirs, and every other field the rest
of the app reads.

No in-tree caller passes projectId today (only vitest cases), so the
bug had not surfaced yet. But the surface is documented in
docs/critique-theater.md section 3 and the function's own JSDoc as
the M1 round-trip path, so it would have shipped as a latent footgun
for the next integrator: a Settings UI follow-up, or any third party
that wires the setter into a project-aware surface.

Fix: read-merge-write rather than a bare patch.

- GET /api/projects/:id to read the row's current metadata.
- Spread that metadata into the PATCH body and overlay
  critiqueTheaterEnabled: next on top, mirroring the partial-metadata
  pattern already used in ChatComposer.tsx for linkedDirs.
- PATCH the merged object.

Failure handling:
- GET fails: skip the PATCH entirely. We cannot construct a safe
  merged body without the current state, and a bare patch would
  wipe other metadata. The in-session CustomEvent fired earlier in
  the setter still keeps every mounted hook consistent; the next
  save retries the round-trip.
- PATCH fails: log in dev. The in-session UI is already correct via
  the CustomEvent.

Tests (TDD, red-first):

- 'GETs the project then PATCHes with merged metadata when a
  projectId is supplied': stubs a GET that returns
  { kind: 'template', templateId: 'modern-blog', linkedDirs: [...] }
  and asserts the PATCH body equals the merge plus the toggle.
- 'PATCHes with just the toggle when the project has no prior
  metadata': stubs a GET that returns no metadata block.
- 'skips the PATCH (does not stomp metadata) when the prefetch GET
  fails': stubs a rejecting GET and asserts only the GET fires.
- 'swallows a rejected PATCH after a successful prefetch': stubs a
  successful GET and a rejecting PATCH; asserts the in-session UI
  still flips via the CustomEvent.

Doc updated on the setter's JSDoc to describe the new three-step
flow (localStorage, CustomEvent, read-merge-write PATCH) and the
two failure modes.

Verified:
- pnpm --filter @open-design/web typecheck clean.
- pnpm --filter @open-design/web test: 111 files / 1055 tests green
  (was 1052, +3 from the new merge-flow cases).

* fix(web): restore wait-for-daemon-ack pattern on Theater interrupt

Same regression as flagged on PR #1316 post-main-merge: the
optimistic local dispatch fired before the POST resolved, so a
daemon 404 / 409 still terminalized the UI and the real SSE
terminal event got ignored by the sticky interrupted phase.

Snapshot runId / bestRound / composite at click time, dispatch
interrupted only on res.ok, clear interruptPending on rejection or
non-2xx so the user can retry. Tests cover rejection + 404 leaving
the run on the live stage; the 204 path waits for the ack.

* feat(daemon): Critique Theater Phase 12 observability foundations

Lands the metrics registry, the structured logger, the /api/metrics
route, and the adapter-degraded bump that wires up the first data
point. The orchestrator-side bumps for runs / rounds / composite /
must-fix / interrupted / parser_errors / protocol_version land in a
follow-up commit on this branch (kept separate so the wiring diff
reads cleanly against the registry shape).

Surfaces added:

- apps/daemon/src/metrics/index.ts: 9 Prometheus series under the
  open_design_critique_* namespace with the histogram buckets the
  spec calls out (round_duration_ms at 100 / 250 / 500 / 1000 /
  2500 / 5000 / 10000 / 30000 / 60000 ms; composite_score at
  0-10 integer steps).
- apps/daemon/src/logging/critique.ts: 6 typed events, one JSON line
  per call on stdout, namespaced critique. Matches the JSON-per-line
  convention cli.ts already uses; no new logger framework.
- apps/daemon/src/server.ts: GET /api/metrics route. Honors
  OD_METRICS_ENDPOINT=disabled to opt out for air-gapped installs.
- apps/daemon/src/critique/adapter-degraded.ts: markDegraded now
  bumps degraded_total so the adapter-health dashboard panel
  reflects every TTL refresh and every fresh mark.

Deps: prom-client ^15.1.0, @opentelemetry/api ^1.9.0 added to
apps/daemon/package.json. Both are zero-config no-ops without an
exporter wired; daemon bundle size impact is ~150 KB uncompressed.
The @opentelemetry/api dep is in place ahead of the OTel-spans
follow-up commit; it adds no behavior on this commit.

Tests:
- tests/metrics/critique.test.ts (3 cases): registry shape +
  exposition text + reset-between-tests
- tests/logging/critique.test.ts (4 cases): event shape + ordering
  + newline framing + namespace stamping

Verification (Windows-local):
- pnpm --filter @open-design/daemon typecheck: clean
- New metrics + logging suites: 7 / 7 green
- Existing adapter-degraded + conformance + rollout suites:
  22 / 22 green; the bump is non-breaking

* feat(daemon): wire Critique Theater metrics + structured logs from the orchestrator

Lights up the bump sites the Phase 12 foundations PR registered the
series for. Every panel event the parser surfaces now reaches the
matching Prometheus counter / histogram and the matching JSON log
line on stdout.

Switch-loop bumps + logs:

- run_started: log run_started, set protocol_version gauge to the
  observed protocol version (small-integer cardinality).
- panelist_open: record the first-open wall-clock per round so
  round_end can compute round_duration_ms; subsequent opens in the
  same round leave the start time untouched.
- panelist_must_fix: bump must_fix_total with the panelist role.
  The wire event does not yet carry a dim name, so the label is
  'unspecified' for now; a future parser revision can drop in the
  real dim without a metric rename.
- round_end: bump rounds_total, observe composite_score, observe
  round_duration_ms (current ms minus the tracked start), log
  round_closed with the composite / mustFix / decision triple.
- parser_warning (parser-yielded): bump parser_errors_total with
  the kind label, log parser_recover with kind + position.

Orchestrator-side parser warnings (composite_mismatch and
duplicate_ship from the daemon-authoritative scoring checks) go
through a new emitParserWarning helper so the bus emit, the
collectedEvents push, the metric bump, and the log line stay in
lockstep. Three inline emission sites collapse to one-line helper
calls.

After the try/catch, a single terminal-status switch bumps
runs_total{status, adapter, skill} once per run, with branch-
specific log + counter:

- shipped / below_threshold: log run_shipped
- interrupted: bump interrupted_total, log run_failed{cause: interrupted}
- timed_out: log run_failed{cause: timed_out}
- failed: log run_failed{cause: orchestrator_internal}
- degraded: log degraded{reason: orchestrator_classified}

OrchestratorParams gains optional skill: string for the label;
defaults to 'unknown' so spawn sites that have not yet threaded it
keep working without a metric shape change.

Tests:
- The new metrics + logging suites (7 / 7) verify registry shape
  and event framing; orchestrator-side metric integration is
  exercised through the existing critique-conformance and
  critique-adapter-degraded suites (22 / 22 still green).
- Logger test reassigns process.stdout.write directly instead of
  vi.spyOn so the Node overloaded write signature does not
  collide with MockInstance<unknown>.

* feat(observability): Grafana dashboard JSON for Critique Theater

Three default rows mapping to the metrics this branch wires up:

1. Fleet quality: composite score p50 / p90 / p99 line graph by
   adapter, plus a heatmap of the composite distribution. The
   line graph answers 'are my agents getting better over time';
   the heatmap answers 'are the bad runs clustered around one
   adapter or smeared across the fleet'.

2. Adapter health: stacked bar charts for degraded marks (by
   adapter / reason) and parser errors (by adapter / kind) over
   a 5-minute window. The two queries together let an operator
   see 'is this adapter degraded because of malformed wire output
   or because of oversize blocks' without flipping panels.

3. Brief throughput: runs-per-hour by terminal status, an average
   rounds-per-run stat per adapter, and a round-duration ms p50 /
   p90 / p99 line. Throughput numbers fall straight out of the
   runs_total / rounds_total counters; the duration histogram is
   the same one the runs feed.

The dashboard uses a templated $datasource var (defaults to
'prometheus') so an operator with multiple Prometheus instances
can switch without editing JSON. Schema version 39 (Grafana 11).

Operators import via:

  pnpm dlx @grafana/cli dashboard import     tools/dev/dashboards/critique.json

or paste into a provisioned dashboards directory. The file is
checked into the repo as a starting artifact; alert rules and
SLO panels ship after the first 1000 runs inform the right
thresholds. JSON validates with node -e 'JSON.parse(...)' (sanity
checked locally).

* feat(daemon): OpenTelemetry outer span around the critique run

Wraps each runOrchestrator call in a 'critique.run' span via the
existing @opentelemetry/api dep added in the Phase 12 foundations
commit. Attributes set on the span:

- critique.run_id, critique.adapter, critique.skill at start
- critique.final_status, critique.final_composite on terminal
  resolution
- span status flipped to ERROR for failed / timed_out runs so a
  Tempo / Honeycomb / Jaeger filter on traces.status=error
  surfaces the right slice without joining back to Prometheus

No exporter is wired by default; @opentelemetry/api is the API
package and intentionally splits from @opentelemetry/sdk-*, so
the span is zero-overhead until an operator attaches an SDK
through their runtime config.

Inner per-round / parse_chunk / scoreboard_eval / persist_round /
ship.persist spans defined in the Phase 12 plan are a follow-up:
the outer span alone gives the trace a duration + final status +
adapter/skill labels, which is the 80% value for dashboards that
correlate runs across services. Adding child spans inside the
existing 600-line orchestrator without restructuring is a separate
careful change.

Verification:
- pnpm --filter @open-design/daemon typecheck: clean
- 29 / 29 critique + metrics + logging tests still green

* fix(nix): bump pnpmDepsHash for prom-client + @opentelemetry/api lockfile bump

nix-check failed on PR #1485 with hash mismatch in
open-design-daemon-pnpm-deps and open-design-web-pnpm-deps after
the Phase 12 foundations commit (2b8b7445) added prom-client and
@opentelemetry/api to apps/daemon/package.json and refreshed
pnpm-lock.yaml.

CI reported the new sha:
  specified: HFLm+8hv3o5x3Xem4MXNsNclIgiVRc70+EBafL0rVn8=
  got:       7R1sQC38gOT0gsZ2oNOviCZ486cbbGJGJCis6WI8z9s=

Both nix files pin the same workspace lockfile, so both flip in
lockstep. No other Nix surface changes required.

* fix(daemon): four Phase 12 review findings (Codex P2 x2 + Siri-Ray P2 + lefarcen P2)

1. Siri-Ray P2 in orchestrator.ts (round metric / log used untrusted
   agent values). The new observability path now records rs.composite
   and rs.mustFix (daemon-authoritative) instead of event.composite
   and event.mustFix when rs exists, and skips the bumps + log
   entirely when rs is missing (a degenerate round_end without any
   matching panelist_open). The dashboard p50 / p90 / p99 now agrees
   with persistence and ship decisions; an adapter reporting <ROUND_END
   composite='10'> while the daemon computed 6 logs 6 and still emits
   the composite_mismatch parser warning the prior block was already
   producing.

2. Codex P2 in server.ts (skill label always 'unknown'). The spawn
   path called runOrchestrator without passing the resolved skill id,
   so every live run bumped open_design_critique_*{skill='unknown'}
   and the per-skill dashboard breakdown was always empty. Threaded
   effectiveSkillId (already computed at the same handler scope as
   the project skill fallback) through skill: . . . so the metric
   reflects the real skill when one is assigned, and the orchestrator
   default of 'unknown' only fires for runs that genuinely have none.

3. Codex P2 in conformance.ts (protocol-version mismatch let through).
   An adapter that emitted <CRITIQUE_RUN version='2'> followed by a
   valid SHIP classified as shipped because the harness only watched
   for terminal events. Added a guard inside the parse loop: if a
   run_started carries protocolVersion !== CRITIQUE_PROTOCOL_VERSION,
   mark the adapter degraded with reason 'protocol_version_mismatch'
   (already in DEGRADED_REASONS) and return early. ConformanceOutcome
   union widened to accept the new reason.

4. lefarcen P2 in tools/dev/dashboards/critique.json (runs-per-hour
   panel under-reported by 3600x). 'rate(...[1h])' returns per-second.
   Multiplied by 3600 so the panel title and unit match the actual
   value rendered.

Verification:
- pnpm --filter @open-design/daemon typecheck: clean
- New metrics + logging suites (7), existing adapter-degraded (7),
  conformance (5), rollout (10): 29 / 29 green
- Grafana JSON re-parses with node -e 'JSON.parse(...)'

* feat(daemon): Critique Theater Phase 16 (M-phase rollout ratchet)

The PR that takes the rollout out of operator-flips-env-vars-by-hand and into the-fleet-conformance-numbers-decide. Stacks on Phase 12 (#1485): the ratchet reads from the conformance harness's daily output, which only exists once Phase 12's metrics + history surface land.

Five surfaces:

1. apps/daemon/src/critique/ratchet.ts (new)
   Pure evaluator. Takes the current RolloutPhase plus a rolling window of ConformanceDay rows and returns one of three decisions: promote, hold, or demote. Spec defaults (14-day window, 0.90 shipped, 0.95 clean-parse) match specs/current/critique-theater.md. Demote floor is half the promote threshold so a single noisy day does not bounce the rollout back; only sustained breakage walks things back. M0 cannot demote and M3 cannot promote, both collapse to hold with an explicit reason string.

2. apps/daemon/src/critique/conformance-history.ts (new)
   JSON-lines persistence at dataDir/conformance/adapter/date.jsonl. Append-only writer + windowed reader. Last entry per (adapter, date) wins so a retry-after-failure cron writes the right answer without a read-modify-write at write time. Malformed lines, missing files, and missing adapter directories all collapse to skip-this-row since a missing day is data missing, not data wrong.

3. apps/daemon/src/server.ts
   GET /api/critique/conformance returns { window, decision }. Tunables come from query string (windowDays, shippedThreshold, cleanParseThreshold) with spec defaults. The recommendation does not auto-flip OD_CRITIQUE_ROLLOUT_PHASE; an operator-driven follow-up consumes the JSON and decides whether to flip or alert.

4. .github/workflows/critique-conformance.yml (new)
   Nightly cron at 03:00 UTC. Builds the daemon, drives the conformance harness against the synthetic-good and synthetic-bad fixtures, and uploads the .od/conformance/ snapshot as a workflow artifact. The schedule sits outside the busy generation window so the cron does not contend with user runs for adapter rate-limit budgets.

5. apps/daemon/tests/critique-ratchet.test.ts + critique-conformance-history.test.ts
   17 cases. Ratchet: 10 cells of the promote / hold / demote matrix. History: 7 round-trip cases.

Verification:
- pnpm --filter @open-design/daemon typecheck: clean
- 17 / 17 new tests green
- Phase 12 metrics + logging + adapter-degraded + conformance + rollout suites (29) untouched and still green

* fix(daemon): three Phase 16 review findings (Codex P1/P2 + lefarcen P1 x3)

1. Duplicate parseRolloutPhase import in server.ts. The new standalone import collided with the existing grouped import; ESM would fail to parse at module load on every daemon startup path. Removed the standalone import; the grouped one already exports parseRolloutPhase.

2. Validation gap in evaluateRollout. A request like ?windowDays=0 fed passingDays >= windowDays = 0 >= 0 = true, returning promote with zero observed days. Now the evaluator rejects non-positive windowDays and out-of-range thresholds at the function entry with an explicit hold reason. The route also clamps query strings before they reach the evaluator (belt + suspenders so a future caller bypassing the route hits the same defense).

3. Missing nightly runner. The workflow called apps/daemon/src/critique/__fixtures__/run-nightly.ts, which the prior PR did not actually add, and || echo masked the failure. Added the runner: drives every synthetic adapter through runAdapterConformance, walks the resulting events for parser_warning to compute cleanParseRate, and writes one ConformanceDay row per adapter via appendConformanceDay. Removed the || echo mask so the workflow fails loudly when the runner throws.

Tests for the validation fix: four new ratchet cases (windowDays=0 holds with no evidence, windowDays=-7 holds, shippedThreshold > 1 holds, cleanParseThreshold < 0 holds). Ratchet suite goes from 10 -> 14 cases.

Verification:
- pnpm --filter @open-design/daemon typecheck: clean
- 33 / 33 critique tests green (14 ratchet, 7 conformance-history, 7 adapter-degraded, 5 conformance)

* test(daemon): explicit NaN regression cases for the ratchet evaluator (PerishCode follow-up on PR #1499)

The Number.isFinite() guard already rejects NaN on every numeric
input, so this is belt-and-suspenders: pinning the behavior so a
future refactor of the guard (a typed parser, a clamp helper, a
relaxed range check) cannot accidentally let NaN through and
surface a zero-evidence promote signal.

Three new assertions inside one case (windowDays=NaN, shippedThreshold=NaN,
cleanParseThreshold=NaN), each asserting hold + the matching
'invalid X' reason string. Ratchet suite goes from 14 -> 15 cases.

* fix(nix): regenerate lockfile + pin pnpmDepsHash for prom-client + @opentelemetry/api (lefarcen P1 on PR #1499)

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-14 11:05:57 +08:00