mirror of
https://github.com/nexu-io/open-design.git
synced 2026-06-01 03:14:35 +07:00
15 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
c45c5c9764
|
fix(ci): align visual selectors and nix hashes (#2471)
* fix(ci): align visual selectors and nix hashes * fix(ci): add strict PR visual verification * fix(ci): repair visual-home captures Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) |
||
|
|
7b1cc16988
|
fix(nix): force http:// scheme on bundled caddy site address (#2485)
A bare `host:port` site address lets Caddy pick the listener scheme by port heuristic, which fights `auto_https off` and surfaces as TLS errors when the browser hits plain HTTP on a non-standard port. Hardcode the `http://` prefix in both the Home Manager and NixOS Caddyfile templates — the bundled proxy is plaintext-only by design, so users who need TLS run their own front-end with `webFrontend.enable = false`. |
||
|
|
80d305858b
|
feat(diagnostics): add one-click log export from Settings → About (#798)
* feat(diagnostics): add one-click log export from Settings → About
Adds a new "Export diagnostics" entry under the About section that bundles
daemon/web/desktop logs, machine info, and recent macOS crash reports into
a zip the user can share when reporting issues.
- Browser hits a new daemon HTTP endpoint and triggers a download.
- Electron uses an IPC bridge with the native save dialog and reveals the
saved file in Finder/Explorer; the Help menu also exposes it as a
fallback when the daemon is unresponsive.
Packaging + redaction lives in a new @open-design/diagnostics package so
both surfaces share it. Sensitive JSON keys, URL query secrets, and the
current user's home path are redacted before packaging.
* build(nix): include packages/diagnostics in daemon build targets
The Nix daemon derivation builds workspace siblings in dependency order
before compiling apps/daemon. Without @open-design/diagnostics in that
list, the daemon TypeScript build fails inside the Nix sandbox with
`Cannot find module '@open-design/diagnostics'` because pnpm install
only creates the symlink — the dist output that the package.json
exports point at isn't produced until each sibling's build script runs.
* build(tools-pack): include @open-design/diagnostics in packaged INTERNAL_PACKAGES
Without this, packaged win/mac/linux builds fail with `npm error 404` when
the post-build `npm install --omit=dev --no-package-lock` step in the
assembled app tries to resolve `@open-design/diagnostics@0.2.0` from the
public npm registry. The package is workspace-private, so it has to be
tarballed via `pnpm pack` and file:-referenced from the assembled
package.json like every other internal workspace dep that daemon/desktop
depend on.
Also wires the package's `pnpm --filter ... build` into the pre-pack
workspace build step so the dist/ exists before pnpm pack runs, and
updates the two test fixtures (`win-app.test.ts`, `workspace-build.test.ts`)
that mirror INTERNAL_PACKAGES.
The diagnostics package itself is repinned to exact dependency versions
already used elsewhere in the workspace (`jszip 3.10.1`, `@types/node
20.19.39`, `esbuild 0.28.0`, `typescript 5.9.3`, `vitest 4.1.6`) so it
passes the new `pnpm guard` exact-version rule and produces a minimal
lockfile diff vs main (additions only, no resolution-string churn).
* fix(diagnostics): include `~` in bearer-token redaction char class
RFC 6750 token68 syntax allows `~`, so tokens like `Authorization: Bearer
abcd~efgh` were only partially matched by `HTTP_AUTH_SCHEME_RE`. The
regex stopped at the first `~`, leaving the tail (`~efgh`) un-redacted in
the exported diagnostics zip — a clear leak since this feature explicitly
generates support bundles for external sharing.
Add `~` to the character class and a regression test.
* fix(diagnostics): only collect renderer.log from desktop
`buildSidecarLogSources` unconditionally added `logs/${app}/renderer.log`
for daemon/web/desktop, but only the desktop runtime writes a renderer
log (see apps/desktop/src/main/runtime.ts) — daemon and web are pure
Node services with no Electron renderer. Every export therefore produced
missing-file placeholders and manifest warnings for the two phantom
paths, polluting the bundle.
Gate the renderer.log source on APP_KEYS.DESKTOP so the daemon-side
collector matches the desktop-side collector in apps/desktop/src/main/
diagnostics.ts:63.
* fix(diagnostics): mirror desktop-side renderer.log gate
The previous fix only updated the daemon-side `buildSidecarLogSources`
in `apps/daemon/src/diagnostics-export.ts`. The desktop-side collector
at `apps/desktop/src/main/diagnostics.ts` had an identical copy of the
same bug that I overlooked: it also unconditionally added
`logs/${appKey}/renderer.log` for daemon/web/desktop, producing
missing-file placeholders + manifest warnings for the two phantom paths
on every desktop-initiated export.
Apply the same `appKey === APP_KEYS.DESKTOP` gate here so both export
entry points (browser via daemon HTTP, Electron via native save dialog)
emit the same clean manifest.
* feat(diagnostics): add `od diagnostics export` CLI subcommand
AGENTS.md's dual-track capability-exposure contract requires every
user-facing feature to ship on both the web UI and the `od` CLI. The
diagnostics export was only reachable through Settings → About and the
desktop Help menu; this commit closes the loop with an `od diagnostics
export [<path>] [--json]` subcommand registered in SUBCOMMAND_MAP.
The CLI is a thin shell over the existing GET /api/diagnostics/export
endpoint — same zip output, same redaction, same crash-report scope.
Defaults to writing `open-design-diagnostics-<timestamp>.zip` in the
current directory; `--output <path>` or a positional arg overrides.
`--json` prints `{path, sizeBytes}` for shell pipelines.
Use cases this unlocks:
- A CI script can `od diagnostics export ~/artifacts/bundle.zip` after
a failed run.
- Bug reporters on headless boxes can grab a bundle without booting
the web UI.
- `od doctor` follow-ups can collect a full snapshot when a probe fails.
* fix(diagnostics): surface non-sidecar launch in manifest warnings
`buildSidecarLogSources()` returns `[]` when the daemon has no sidecar
runtime context, which is the standard `od` (plain) launch path —
`runDaemonCliStartup()` -> `startDaemonRuntime()` does not pass a
runtime. Settings → About and the new `od diagnostics export` previously
reported success but produced a bundle with only the summary JSONs, so
operators could not tell "no logs because plain launch" from "no logs
because something genuinely broke."
- Extend `DiagnosticsContext` with an optional upstream `warnings:
string[]` that `buildManifest` merges into the manifest warnings.
- Emit STANDALONE_LAUNCH_WARNING from the daemon handler when
`options.runtime == null`. The warning names the limitation and
points the user at the sidecar entry points that DO capture logs.
- Add a regression spec at `apps/daemon/tests/diagnostics-export.test.ts`
that drives the handler with `runtime: null` and asserts the warning
surfaces in `summary/manifest.json` (and that `files` is empty so a
user reading the bundle does not confuse "no log sources" with
"missing files").
|
||
|
|
60bd58a1f3
|
fix(nix): prebuild host package for web build (#2280)
Some checks failed
ci / Detect CI change scopes (push) Successful in 0s
landing-page-ci / Validate landing page (push) Failing after 2s
landing-page-deploy / Deploy landing page (push) Has been skipped
nix-check / build (push) Failing after 2s
ci / Preflight (push) Failing after 2s
ci / Core package tests (push) Failing after 1s
ci / Tools workspace tests (push) Failing after 1s
ci / Daemon workspace tests (1/2) (push) Failing after 1s
ci / Daemon workspace tests (2/2) (push) Failing after 2s
ci / Web workspace tests (push) Failing after 1s
ci / E2E vitest (push) Failing after 2s
ci / Playwright critical (starters) (push) Failing after 1s
ci / Playwright critical (core) (push) Failing after 2s
ci / Build workspaces (push) Failing after 1s
ci / App workspace tests (push) Failing after 0s
ci / Validate workspace (push) Failing after 0s
ci / Runtime trace (push) Has been skipped
|
||
|
|
bb13eee765
|
chore: optimize CI and beta release runtime (#2231)
* chore(ci): add runtime trace summaries * chore(ci): tighten measured workspace steps * chore(release): tighten beta setup steps * chore(release): slim beta windows smoke * chore(ci): shard daemon tests * chore(ci): harden runtime trace lookup * chore(release): avoid mac pnpm cache in beta * chore(ci): split critical playwright checks * chore(release): publish beta platforms from builders * test(e2e): update beta release workflow expectation * chore(ci): stop gating PRs on nix check * fix(release): keep beta latest complete |
||
|
|
bd48c597b0
|
chore: pin dependency versions and harden CI caches (#2189)
* chore: pin dependency versions * ci: enforce pinned dependency specs * ci: fix pnpm executable invocation |
||
|
|
14cff69435 |
fix(nix): refresh pnpmDepsHash after merging main
The merge brought new entries into pnpm-lock.yaml (Leonardo.ai provider, custom select primitive, Critique Theater wireup, etc.), so the vendored pnpm store hash changed. CI computed the new hash from the freshly built fixed-output derivation. specified: sha256-mzGiD5K1l5SEFpQ++L4wq775ne+ViG4ruXYZYEdT6zQ= (pre-merge) got: sha256-lROdH5HgKFf3R7DYGbc8n/GrmINwLbfVwC4Xp7SrHN4= (post-merge) |
||
|
|
545aed642e |
Merge remote-tracking branch 'origin/preview/0.8.0' into preview/v0.8.0
# Conflicts: # nix/package-daemon.nix # scripts/postinstall.mjs |
||
|
|
883598f556 | Build registry protocol in packaged workspaces | ||
|
|
dbfe08eaf3 | Update Nix pnpm deps hash | ||
|
|
76defffb93
|
Garnet hemisphere (#1702)
* feat(chat-composer): enhance mention handling and input overlay - Introduced a new overlay for inline mentions in the chat composer, improving user experience by visually indicating mentions as users type. - Updated the `ChatComposer` component to manage mention entities and integrate them into the input field, allowing for better context and interaction. - Enhanced the `AssistantMessage` component to support the display of plugin action panels based on the current project context, facilitating easier plugin management. - Refactored related components to ensure consistent handling of project files and mentions across the application. This update significantly improves the chat interaction model, making it more intuitive for users to engage with mentions and plugins. * feat(plugin-management): enhance plugin action panels and UI components - Updated the `AssistantMessage` component to include plugin action panels based on the latest project context, improving user interaction with generated plugins. - Refactored the `PluginsView` to support detailed views for available marketplace entries, allowing users to access more information and actions for each plugin. - Introduced new CSS styles for improved visual representation of plugin-related UI elements, enhancing overall user experience. - Enhanced the `listPlugins` function to include an option for fetching hidden plugins, providing more flexibility in plugin management. This update significantly improves the usability and functionality of the plugin management system, making it easier for users to interact with and manage their plugins. * fix(assistant-message): refine plugin folder candidate selection logic - Updated the `pluginFoldersTouchedThisTurn` function to improve the logic for selecting plugin folder candidates based on touched paths and message content. - Introduced a new helper function, `pathMatchesFolderFileBasename`, to enhance the matching criteria for folder candidates. - Added a check for explicit folder matches before falling back to a single candidate, improving accuracy in folder selection. - Modified the `shouldRenderSlotAsText` function in `HomeHero` to include the name parameter, refining the rendering logic for slot text. These changes enhance the functionality and reliability of the assistant message component in managing plugin folder candidates. * feat(plugin-folder-actions): implement agent-routed CLI actions for plugin management - Introduced a new `PluginFolderAgentAction` type to streamline actions related to plugin folders, including install, publish, and contribute. - Updated the `DesignFilesPanel`, `FileWorkspace`, and `AssistantMessage` components to utilize the new agent action handling, improving user interaction with generated plugins. - Refactored the action handling logic to send commands to the agent, enhancing the workflow for managing plugin folders. - Added corresponding tests to ensure the new functionality works as expected and integrates seamlessly with existing components. This update significantly enhances the plugin management experience by routing actions through the agent, allowing for a more cohesive and interactive user experience. * Fix PR 1702 CI blockers * Fix PR 1702 remaining CI checks * Prebuild AGUI adapter after install * Restore plugin project snapshot wiring * feat(marketplace): refactor marketplace URL handling and enhance fetching logic - Introduced new functions to normalize marketplace URLs and manage fetching of marketplace manifests, improving the reliability of marketplace integrations. - Updated the server and plugin logic to utilize the new fetching mechanisms, ensuring consistent handling of marketplace data. - Enhanced tests to cover new URL normalization and fetching scenarios, ensuring robustness in marketplace management. This update significantly improves the marketplace experience by streamlining URL handling and enhancing data fetching capabilities. * Fix project auto-send cleanup spec |
||
|
|
38a5ab69e6
|
feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485)
* feat(web): pure reducer for Critique Theater states (Phase 7.1)
Pure CritiqueState reducer driven by the contracts-level PanelEvent
(the same shape both the live SSE stream and the recorded transcript
emit), so a single reducer powers both the in-flight panel and the
rerun replay. Lifecycle covers run_started → running → (shipped /
degraded / interrupted / failed), with panelist_open / dim /
must_fix / close / round_end events building per-round
CritiquePanelistView entries as they arrive.
Defensive behaviour that surfaced while writing the spec tests:
- Terminal phases (shipped / degraded / interrupted / failed) are
sticky against further lifecycle events for the same run, except
for parser_warning which can land late and is recorded in a side
channel without changing phase.
- A new run_started for a different runId at any time discards the
prior state and reboots, so the UI can launch consecutive runs
without an explicit reset action.
- Events whose runId does not match the active run return the same
state reference, so React's useReducer doesn't re-render
subscribers on stray traffic.
- Round bookkeeping keys by round number rather than "always last",
so an out-of-order panelist_dim for round 1 arriving after a
round 2 dim does not corrupt the round 2 bucket.
Test coverage: 18 cases covering each transition, the runId guard,
sticky-terminal behaviour, the out-of-order round invariant, and
the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire
SSE + replay into the same reducer.
* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2)
createCritiqueEventsConnection is a pure connection manager that
mirrors apps/web/src/providers/project-events.ts: opens an
EventSource at /api/projects/:id/events, listens for every name in
CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent
(stripping the critique. prefix and merging the data payload), and
hands it to the caller's onEvent. Reconnect uses exponential
backoff (1s → 30s) and resets on `ready`; malformed payloads drop
with a dev-mode warning rather than tearing the stream.
useCritiqueStream wraps the manager in a useReducer that owns the
CritiqueState. enabled=false or a null projectId tears down the
connection cleanly; switching projectId closes the old connection
and opens a fresh one. The returned dispatch lets local UI
synthesise actions (e.g. an Esc keypress firing a synthetic
interrupted while a kill request is in flight); production traffic
comes from the SSE stream.
Test coverage:
- sse.test.ts (10 cases, node env): subscription set covers every
CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire
shape back to PanelEvent; malformed JSON is swallowed and does
not stop the stream; exponential backoff schedule and ready-reset
semantics are pinned with a setTimeout seam; close() cancels
pending reconnects and shuts the live source; no-op fallback
when EventSource is unavailable.
- useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event,
reducer driven by synthetic actions, no connection when disabled
or projectId is null, clean close on unmount, projectId change
reopens cleanly.
* feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3)
Fetches the per-run NDJSON transcript (one PanelEvent per line),
parses every line via the shared isPanelEvent predicate, and
dispatches into the same CritiqueState reducer the live SSE stream
uses. A single reducer means the UI rendering a replay can be
identical to the live panel, and a UI mounting both
useCritiqueStream and useCritiqueReplay in parallel does not have
to reconcile two state shapes.
speed knob is `paused | instant | live | { intervalMs: N }`.
- instant flushes every event synchronously, useful for opening a
finished run already at its terminal state.
- intervalMs paces dispatches at a fixed cadence so the reviewer
can watch the run unfold.
- paused parses the transcript but holds events back until the
caller advances speed (consumers can drive a scrubber later).
- live is reserved for the future "playback at original cadence"
feature, currently treated as instant; replay timestamps are not
yet persisted with each event so honest pacing requires a
follow-up Phase 7+ task.
gunzip seam handles `.ndjson.gz` transcripts via
DecompressionStream when present; the production fetch path picks
between text and arrayBuffer based on the URL extension. Both seams
are injectable so the unit tests don't need to spin up a real
network or a real gzip pipeline.
Test coverage (8 cases, jsdom env):
- Idle status before any URL is provided.
- speed=instant flushes the full transcript synchronously to
shipped state.
- speed={intervalMs:N} paces with the setTimeout seam, reaching
done after the last tick.
- speed=paused leaves status=playing with no dispatches.
- Empty transcript reports done with state still idle.
- Fetch rejection surfaces an error status with the message.
- Malformed NDJSON lines are skipped; valid events around them
still land.
- .gz transcripts route through the gunzip seam.
Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream +
replay), all on one branch ready for review. Phases 8+ (Theater
components) consume these from this PR.
* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review)
Two P1 fixes from lefarcen's review on PR #1307:
SSE payload override
`sseToPanelEvent` previously spread `data` after the channel-derived
`type`, so a payload-provided `type` could override the channel and
route a `critique.run_started` frame into the reducer as a `ship`
action. Reversed the spread so the channel-derived `type` is
authoritative, and revalidated the resulting object through the
contracts-level `isPanelEvent` predicate before returning. Frames
that fail validation (missing runId, empty runId, unknown type) are
dropped, so a malformed or compromised SSE frame can no longer
dispatch a wrong-shape action into the reducer.
Three new sse.test.ts cases pin the regression: hostile `type:'ship'`
in the payload still resolves to `run_started`, missing runId is
dropped, empty runId is dropped.
Replay pause/resume
`useCritiqueReplay` had one big effect keyed on `transcriptUrl`
only, so flipping `speed` from `paused` to `instant` never re-fired
and the held events sat undispatched. Split into a parse effect
(depends on URL, fetches and stores events in state) and a pace
effect (depends on parsed-events + speed, owns the cursor + timers).
The playback cursor lives in a ref that survives pause/resume
cycles, so flipping `paused` -> `instant` flushes from the current
position rather than restarting (which would double-dispatch
`run_started` and reset the reducer).
Two new useCritiqueReplay.test.tsx cases:
- paused-then-instant transitions from `playing` to `done` and
reaches the shipped terminal phase
- intervalMs paced playback dispatches one event, pauses to drain
the next scheduled timer, flips to instant, and confirms the
remaining transcript drains exactly once (cursor was preserved)
Doc consistency
The earlier source comment in useCritiqueReplay.ts claimed `live`
"paces by recorded timestamps" while the impl used zero-delay
timers and the PR body said it behaves like `instant`. Aligned to
reality: `live` currently behaves like `{ intervalMs: 0 }` (events
drain on successive microtasks via setTimeoutFn) because transcripts
do not yet carry per-event timestamps. Honest timestamp-driven
pacing is queued as a Phase 7+ follow-up.
Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite
96 files / 888 tests.
* feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread)
* feat(web): Theater PanelistLane component (Phase 8.1)
* feat(web): Theater ScoreTicker component (Phase 8.2)
* feat(web): Theater RoundDivider component (Phase 8.3)
* feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4)
* feat(web): Theater TheaterDegraded chip (Phase 8.5)
* feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6)
* feat(web): Theater TheaterTranscript replay surface (Phase 8.7)
* feat(web): Theater TheaterStage top-level container (Phase 8.8)
* feat(web): Theater CSS using existing semantic tokens (no hex literals)
* feat(web): Theater public exports barrel
* fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314)
Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen.
State-lifecycle fixes (3 x P2)
1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`).
Host hooks dispatch it when their gating prop changes so a stale
run from a prior project / transcript cannot bleed into the next
context. Reset is idempotent on idle (returns the same reference).
2. `useCritiqueStream` dispatches `__reset__` at the top of its
connection effect, so a workspace switch from project A (which
streamed a critique) to project B clears the reducer before the
new EventSource opens. enabled=false also clears.
3. `useCritiqueReplay` dispatches `__reset__` at the top of its
parse effect, so transcriptUrl swaps (including swap-to-null after
a replay reached `shipped`) lift the reducer back to idle before
the new fetch starts.
SSE validation (1 x P2)
4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape`
check after the cheap `isPanelEvent` predicate. A
`critique.ship` frame missing `composite` / `round` / `status` /
`artifactRef` is rejected before reaching the reducer, so
TheaterCollapsed can no longer crash on `undefined.toFixed(1)`.
Every variant's required fields are validated: run_started
(protocolVersion, non-empty cast, maxRounds, threshold, scale),
panelist_* (round, role, plus variant-specific shape), round_end
(round, composite, mustFix, decision in {continue,ship}, reason),
ship (round, composite, status, artifactRef.{projectId,artifactId},
summary), degraded (reason, adapter), interrupted (bestRound,
composite), failed (cause), parser_warning (kind, position).
Reducer correctness (1 x P2)
5. `panelist_open` now materializes the round + an empty panelist
view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight
the in-progress lane the instant the tag opens. Before this, a
stream that emitted only `panelist_open` after `run_started` left
`rounds = []` and the UI rendered no current round until a later
`panelist_dim` arrived.
Polish (3 x P3)
6. Brand role tint swaps from `var(--magenta, var(--accent))` to
`var(--purple, var(--accent))`. `--purple` is actually defined
across the design systems; `--magenta` is not, so Brand was
silently falling through to `--accent` and looking identical to
Designer.
7. New i18n key `critiqueTheater.interruptedSummary` for the
interrupted-collapse copy ("Interrupted at round N, best
composite X.X"). Previously the interrupted branch reused
`shippedSummary` and the UI read "Shipped at round..." for a run
that specifically did not ship. Native value in en + zh-CN; other
locales fall back via `...en` spread.
8. `TheaterDegraded` heading id comes from `useId()` instead of a
hardcoded `theater-degraded-heading`, so two chips rendered on
the same page (chat history with multiple completed runs) keep
their aria-labelledby references unambiguous.
Tests (15 new cases)
- reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data.
- sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship.
- useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false.
- useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped.
- TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...".
- TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`.
Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new)
- tests/i18n/locales.test.ts 5 of 5 across 18 locales
* feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1)
* feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2)
* fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315)
Addresses every blocker from codex, Siri-Ray, and lefarcen. The
three state-lifecycle and SSE-validation issues they also flagged
inherit fixes from PR #1314's review pass that this branch now sits
on top of after rebase.
Real daemon kill on Interrupt (P1)
- CritiqueTheaterMount now POSTs to
/api/projects/:id/critique/:runId/interrupt alongside the
optimistic local dispatch. Before this fix, clicking Interrupt
only flipped the React state to interrupted while the daemon job
kept running. The fetch is best-effort: a 404 (endpoint not wired
yet, lands in Phase 15) is swallowed with a dev-mode console.warn
so the UI still moves to the collapsed badge.
- New fetchInterrupt test seam lets RTL assert on the URL / method
and simulate the "daemon not ready yet" path. Two tests pin both:
the happy URL proj-42/critique/run-abc/interrupt POSTs, and a
rejected fetch still flips the UI.
interruptPending reset on new run (P2)
- A ref-backed effect compares the current runId against the last
one we saw; when it changes, interruptPending is cleared. A user
who interrupts run-1 and then triggers run-2 from the same mount
now gets a fresh, enabled kill button instead of one stuck in
"Interrupting…". Pinned by a new mount test.
Escape keybind scope (P2)
- InterruptButton now checks the keydown target. Escape inside an
input, textarea, select, or contenteditable element is ignored
(and any ancestor of those via closest() is treated the same
way). Body-level focus still fires the keybind so the Theater
area's affordance keeps working. Four new tests cover textarea,
input, contenteditable, and the body-focus positive case.
userFacingName i18n key (P2)
- The spec at specs/current/critique-theater.md:6 mandates a single
critiqueTheater.userFacingName key so the "Design Jury" label can
be renamed without touching code. Phase 8 introduced
critiqueTheater.title by mistake; renamed across types.ts, en.ts,
zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer
TheaterStage.tsx. The locale alignment test stays green.
Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 14 files, 112 tests (was 101 before, +11 new for
the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope;
the rest were already in #1314's review fix).
- tests/i18n/locales.test.ts 5 of 5 across 18 locales.
* feat(daemon): adapter-degraded registry with TTL (Phase 10.1)
In-memory registry recording adapters that produced malformed or
oversize transcripts so the orchestrator can skip them for a TTL
window (default 24h) instead of cycling through known-bad providers
on every run.
Records carry reason (malformed_block | oversize_block |
missing_artifact), source label, and expiresAt. The test-only
clock seam lets the suite advance time deterministically and prove
that an expired entry stops counting as degraded without anyone
calling clearDegraded.
7/7 vitest cases green.
* feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2)
Two test-only adapters that read the existing v1 transcript
fixtures (happy-3-rounds and malformed-unbalanced) and replay them
as either a full string or a 512-byte chunked stream. The chunked
form is what the conformance harness uses to prove the parser
holds together when the transcript arrives in arbitrary network
slices, not as one buffered blob.
* feat(daemon): adapter conformance harness (Phase 10.3)
runAdapterConformance pulls a transcript through the same
parseCritiqueStream pipeline the orchestrator uses and classifies
the outcome as shipped, degraded, or failed. On a degraded
outcome it forwards the matched reason to the adapter-degraded
registry, so a single nightly conformance run is what populates
the skip list rather than the orchestrator learning each adapter
is broken at request time.
5/5 vitest cases green covering shipped, malformed degraded,
oversize degraded, no-ship failure, and the harness-thrown
failure path.
* test(e2e): Critique Theater Playwright suite (Phase 11)
Six tests, one viewport per visual case, deterministic SSE
fixtures stubbed via page.route(). Adds the suite to
test:ui:extended so the existing extended-UI lane picks it up.
Coverage:
1. Happy path: a single mounted theater plays the full
fixture (1 run_started, 5 panelists open / dim / must_fix /
close, 1 round_end, 1 ship) and ends on the score badge.
2. Interrupt mid-run: the panelist that is open at the time
the interrupt button is clicked closes with an interrupted
marker and the transcript freezes there.
3. Visual regression at 375x720 mobile.
4. Visual regression at 768x1024 tablet.
5. Visual regression at 1280x800 desktop.
6. A11y role tree: the theater region exposes a labelled
landmark, each panelist lane is a group with an accessible
name, the score is a status live region.
All SSE traffic is stubbed by page.route so the suite runs in CI
without a daemon. The toggle is seeded via localStorage by
bootAppWithCritiqueEnabled so the gate behaves as if Settings
flipped it on. typecheck clean; playwright --list reports 6.
* test(web): reducer p99 bench at 10k iterations (Phase 13.1)
Locks the documented 2ms budget for the Critique Theater reducer
on a representative SSE script (27 actions, one full happy run)
behind a regression gate. Asserts p99 stays under 4ms (2x the
documented budget) so CI runners with a noisy neighbour do not
flake while a real regression to 20ms or 200ms still trips.
The bench is a vitest case rather than a bare microbenchmark so
it runs in the same CI lane as every other web test and does not
need a parallel runner.
* test(web): critique surface coverage walker (Phase 13.2)
Walks the public critique surface (11 SSE event names, 5 panelist
roles, 6 lifecycle phases, 9 named i18n keys) and asserts each
named symbol appears in both the src corpus and the test corpus.
The walker is the gate that catches a rename in one half of the
codebase without a matching update in the other half: a future
PR that drops 'panelist_must_fix' from the reducer without also
removing its test reference fails this suite.
62 assertions, one per symbol per corpus.
* docs: Critique Theater user guide (Phase 14.1)
Seven sections aimed at end users (not contributors):
1. What is Design Jury
2. How it works (the five panelists, auto-converging rounds,
the composite formula)
3. Settings (the M1 toggle and what it does)
4. Reading the score badge
5. Replay surface
6. Troubleshooting (degraded, interrupted, failed)
7. FAQ
The composite formula is documented as
designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2
because anyone trying to reverse-engineer the score is going to
search for those weights and the docs are the place they should
land first.
* docs(daemon): critique module AGENTS map (Phase 14.2)
Daemon-side wayfinder for the apps/daemon/src/critique directory.
Tables every file, what owns what invariant, and the 'when you
change anything here' guide so a future contributor does not
have to reverse-engineer the rollout resolver before adding a
new SSE event.
* docs(web): Theater module AGENTS map (Phase 14.3)
Web-side mirror of the daemon AGENTS map. Same file table, same
invariants section, same change-impact guide, sized to the
Theater component package.
* feat(daemon): rollout flag resolver (Phase 15.1)
Single decision point every caller consults to know whether the
orchestrator should wire the critique pipeline for a given run.
Priority:
1. Skill-level policy (required wins, opt-out wins inversely)
2. Per-project override from the Settings toggle
3. OD_CRITIQUE_ENABLED env override
4. Rollout phase default
M0 dark-launch false
M1 settings only false (toggle is off until the user flips it)
M2 per-skill true if skill opted in
M3 global default true
OD_CRITIQUE_ROLLOUT_PHASE parser defaults to M0 on unknown input
so a fresh install never surprises a user with the feature on.
10/10 vitest cases green covering every cell of the matrix.
* feat(web): Settings toggle hook for Critique Theater (Phase 15.2)
React hook that reads critiqueTheaterEnabled from the existing
open-design:config localStorage blob and stays in sync via:
- the platform storage event (cross-tab)
- a open-design:critique-theater-toggle CustomEvent (same-tab)
Same-tab event is the one that fires when the Settings panel saves
in the current window: the toggle and every mounted theater update
without a page reload.
setCritiqueTheaterEnabled(next) is the imperative setter the Settings
panel calls. It preserves the rest of the stored config (mode, apiKey,
etc.) and dispatches the same-tab event after the localStorage write.
The web hook reflects what the user toggled; the daemon-side
isCritiqueEnabled is the final routing authority (project override,
env, rollout phase). When they disagree, the daemon wins for backend
gating and the web reflects the toggle state.
6/6 vitest cases green covering first read, stored read, same-tab
event flip, config preservation, corrupted JSON tolerance, and
cross-tab storage event.
* test(web): Phase 15 toggle hook failure-mode coverage (PR #1320)
lefarcen P2 on PR #1320 flagged that the PR body claimed safe
behavior for disabled localStorage, non-object JSON, and missing
CustomEvent shim, but the suite only covered corrupt JSON plus
happy-path storage events. Added four failure-mode tests so the
swallowed errors are not silently traded for a throw in a future
refactor:
1. Returns false on a stored JSON value that parses to an array
(non-object). Catches a regression where the guard treats
anything truthy as a config blob.
2. Returns false on a stored JSON value of literal 'null'.
typeof null === 'object' in JS, so the guard has to check null
explicitly; this test pins that check.
3. Returns false when localStorage.getItem throws (private mode /
disabled storage / SecurityError). The hook must swallow and
return false so the rest of the app keeps rendering.
4. setCritiqueTheaterEnabled still dispatches the same-tab
CustomEvent when localStorage.setItem throws (quota exceeded /
disabled storage). The dispatch path is the in-session
broadcast that keeps every mounted hook coherent even when
persistence is unavailable; verified by mounting two probes
and asserting both flip after the setter is called with a
throwing setItem.
10/10 vitest cases green (6 existing + 4 new).
* fix(web): honor CustomEvent payload in toggle hook listener (PR #1320)
Both Siri-Ray (blocking) and lefarcen (P2 new) caught the same
real bug in the failure-mode test I added in
|
||
|
|
a1859b7f40 |
fix(nix): update pnpmDepsHash for merged lockfile
The merged pnpm-lock.yaml (release/v0.7.0 contents on top of main) has a different hash than either parent. Adopt the value Nix computed in CI. |
||
|
|
f621dbbfea
|
feat(web): Add Tailwind foundation (#1388) | ||
|
|
c61ba320fd
|
feat(nix): Add official flake with home-manager and NixOS support (#402)
* nix: add official flake with home-manager and nixos modules
* Pin pnpm version
* Format README.md
* Populate PATH files to discover installed CLIs
* Revert "Populate PATH files to discover installed CLIs"
This reverts commit 18d88781a88b8781913cf5a8b680dfb38eabf7e4.
* Fix missing sqlite issue
* Fix system issue
* Reapply "Populate PATH files to discover installed CLIs"
This reverts commit
|