🎨 Local-first, open-source alternative to Anthropic's Claude Design. 19 Skills · 71 brand-grade Design Systems 🖼 Generate web · desktop · mobile prototypes · slides · images · videos · HyperFrames 📦 Sandboxed preview · HTML/PDF/PPTX/MP4 export 🤖 Runs on Claude Code / Codex / Cursor / Gemini / OpenCode / Qwen / Copilot / Hermes / Kimi CLI.
Find a file
Nagendhra Madishetti 38a5ab69e6
feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485)
* feat(web): pure reducer for Critique Theater states (Phase 7.1)

Pure CritiqueState reducer driven by the contracts-level PanelEvent
(the same shape both the live SSE stream and the recorded transcript
emit), so a single reducer powers both the in-flight panel and the
rerun replay. Lifecycle covers run_started → running → (shipped /
degraded / interrupted / failed), with panelist_open / dim /
must_fix / close / round_end events building per-round
CritiquePanelistView entries as they arrive.

Defensive behaviour that surfaced while writing the spec tests:
- Terminal phases (shipped / degraded / interrupted / failed) are
  sticky against further lifecycle events for the same run, except
  for parser_warning which can land late and is recorded in a side
  channel without changing phase.
- A new run_started for a different runId at any time discards the
  prior state and reboots, so the UI can launch consecutive runs
  without an explicit reset action.
- Events whose runId does not match the active run return the same
  state reference, so React's useReducer doesn't re-render
  subscribers on stray traffic.
- Round bookkeeping keys by round number rather than "always last",
  so an out-of-order panelist_dim for round 1 arriving after a
  round 2 dim does not corrupt the round 2 bucket.

Test coverage: 18 cases covering each transition, the runId guard,
sticky-terminal behaviour, the out-of-order round invariant, and
the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire
SSE + replay into the same reducer.

* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2)

createCritiqueEventsConnection is a pure connection manager that
mirrors apps/web/src/providers/project-events.ts: opens an
EventSource at /api/projects/:id/events, listens for every name in
CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent
(stripping the critique. prefix and merging the data payload), and
hands it to the caller's onEvent. Reconnect uses exponential
backoff (1s → 30s) and resets on `ready`; malformed payloads drop
with a dev-mode warning rather than tearing the stream.

useCritiqueStream wraps the manager in a useReducer that owns the
CritiqueState. enabled=false or a null projectId tears down the
connection cleanly; switching projectId closes the old connection
and opens a fresh one. The returned dispatch lets local UI
synthesise actions (e.g. an Esc keypress firing a synthetic
interrupted while a kill request is in flight); production traffic
comes from the SSE stream.

Test coverage:
- sse.test.ts (10 cases, node env): subscription set covers every
  CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire
  shape back to PanelEvent; malformed JSON is swallowed and does
  not stop the stream; exponential backoff schedule and ready-reset
  semantics are pinned with a setTimeout seam; close() cancels
  pending reconnects and shuts the live source; no-op fallback
  when EventSource is unavailable.
- useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event,
  reducer driven by synthetic actions, no connection when disabled
  or projectId is null, clean close on unmount, projectId change
  reopens cleanly.

* feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3)

Fetches the per-run NDJSON transcript (one PanelEvent per line),
parses every line via the shared isPanelEvent predicate, and
dispatches into the same CritiqueState reducer the live SSE stream
uses. A single reducer means the UI rendering a replay can be
identical to the live panel, and a UI mounting both
useCritiqueStream and useCritiqueReplay in parallel does not have
to reconcile two state shapes.

speed knob is `paused | instant | live | { intervalMs: N }`.
- instant flushes every event synchronously, useful for opening a
  finished run already at its terminal state.
- intervalMs paces dispatches at a fixed cadence so the reviewer
  can watch the run unfold.
- paused parses the transcript but holds events back until the
  caller advances speed (consumers can drive a scrubber later).
- live is reserved for the future "playback at original cadence"
  feature, currently treated as instant; replay timestamps are not
  yet persisted with each event so honest pacing requires a
  follow-up Phase 7+ task.

gunzip seam handles `.ndjson.gz` transcripts via
DecompressionStream when present; the production fetch path picks
between text and arrayBuffer based on the URL extension. Both seams
are injectable so the unit tests don't need to spin up a real
network or a real gzip pipeline.

Test coverage (8 cases, jsdom env):
- Idle status before any URL is provided.
- speed=instant flushes the full transcript synchronously to
  shipped state.
- speed={intervalMs:N} paces with the setTimeout seam, reaching
  done after the last tick.
- speed=paused leaves status=playing with no dispatches.
- Empty transcript reports done with state still idle.
- Fetch rejection surfaces an error status with the message.
- Malformed NDJSON lines are skipped; valid events around them
  still land.
- .gz transcripts route through the gunzip seam.

Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream +
replay), all on one branch ready for review. Phases 8+ (Theater
components) consume these from this PR.

* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review)

Two P1 fixes from lefarcen's review on PR #1307:

SSE payload override

`sseToPanelEvent` previously spread `data` after the channel-derived
`type`, so a payload-provided `type` could override the channel and
route a `critique.run_started` frame into the reducer as a `ship`
action. Reversed the spread so the channel-derived `type` is
authoritative, and revalidated the resulting object through the
contracts-level `isPanelEvent` predicate before returning. Frames
that fail validation (missing runId, empty runId, unknown type) are
dropped, so a malformed or compromised SSE frame can no longer
dispatch a wrong-shape action into the reducer.

Three new sse.test.ts cases pin the regression: hostile `type:'ship'`
in the payload still resolves to `run_started`, missing runId is
dropped, empty runId is dropped.

Replay pause/resume

`useCritiqueReplay` had one big effect keyed on `transcriptUrl`
only, so flipping `speed` from `paused` to `instant` never re-fired
and the held events sat undispatched. Split into a parse effect
(depends on URL, fetches and stores events in state) and a pace
effect (depends on parsed-events + speed, owns the cursor + timers).
The playback cursor lives in a ref that survives pause/resume
cycles, so flipping `paused` -> `instant` flushes from the current
position rather than restarting (which would double-dispatch
`run_started` and reset the reducer).

Two new useCritiqueReplay.test.tsx cases:
- paused-then-instant transitions from `playing` to `done` and
  reaches the shipped terminal phase
- intervalMs paced playback dispatches one event, pauses to drain
  the next scheduled timer, flips to instant, and confirms the
  remaining transcript drains exactly once (cursor was preserved)

Doc consistency

The earlier source comment in useCritiqueReplay.ts claimed `live`
"paces by recorded timestamps" while the impl used zero-delay
timers and the PR body said it behaves like `instant`. Aligned to
reality: `live` currently behaves like `{ intervalMs: 0 }` (events
drain on successive microtasks via setTimeoutFn) because transcripts
do not yet carry per-event timestamps. Honest timestamp-driven
pacing is queued as a Phase 7+ follow-up.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite
96 files / 888 tests.

* feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread)

* feat(web): Theater PanelistLane component (Phase 8.1)

* feat(web): Theater ScoreTicker component (Phase 8.2)

* feat(web): Theater RoundDivider component (Phase 8.3)

* feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4)

* feat(web): Theater TheaterDegraded chip (Phase 8.5)

* feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6)

* feat(web): Theater TheaterTranscript replay surface (Phase 8.7)

* feat(web): Theater TheaterStage top-level container (Phase 8.8)

* feat(web): Theater CSS using existing semantic tokens (no hex literals)

* feat(web): Theater public exports barrel

* fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314)

Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen.

State-lifecycle fixes (3 x P2)
1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`).
   Host hooks dispatch it when their gating prop changes so a stale
   run from a prior project / transcript cannot bleed into the next
   context. Reset is idempotent on idle (returns the same reference).
2. `useCritiqueStream` dispatches `__reset__` at the top of its
   connection effect, so a workspace switch from project A (which
   streamed a critique) to project B clears the reducer before the
   new EventSource opens. enabled=false also clears.
3. `useCritiqueReplay` dispatches `__reset__` at the top of its
   parse effect, so transcriptUrl swaps (including swap-to-null after
   a replay reached `shipped`) lift the reducer back to idle before
   the new fetch starts.

SSE validation (1 x P2)
4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape`
   check after the cheap `isPanelEvent` predicate. A
   `critique.ship` frame missing `composite` / `round` / `status` /
   `artifactRef` is rejected before reaching the reducer, so
   TheaterCollapsed can no longer crash on `undefined.toFixed(1)`.
   Every variant's required fields are validated: run_started
   (protocolVersion, non-empty cast, maxRounds, threshold, scale),
   panelist_* (round, role, plus variant-specific shape), round_end
   (round, composite, mustFix, decision in {continue,ship}, reason),
   ship (round, composite, status, artifactRef.{projectId,artifactId},
   summary), degraded (reason, adapter), interrupted (bestRound,
   composite), failed (cause), parser_warning (kind, position).

Reducer correctness (1 x P2)
5. `panelist_open` now materializes the round + an empty panelist
   view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight
   the in-progress lane the instant the tag opens. Before this, a
   stream that emitted only `panelist_open` after `run_started` left
   `rounds = []` and the UI rendered no current round until a later
   `panelist_dim` arrived.

Polish (3 x P3)
6. Brand role tint swaps from `var(--magenta, var(--accent))` to
   `var(--purple, var(--accent))`. `--purple` is actually defined
   across the design systems; `--magenta` is not, so Brand was
   silently falling through to `--accent` and looking identical to
   Designer.
7. New i18n key `critiqueTheater.interruptedSummary` for the
   interrupted-collapse copy ("Interrupted at round N, best
   composite X.X"). Previously the interrupted branch reused
   `shippedSummary` and the UI read "Shipped at round..." for a run
   that specifically did not ship. Native value in en + zh-CN; other
   locales fall back via `...en` spread.
8. `TheaterDegraded` heading id comes from `useId()` instead of a
   hardcoded `theater-degraded-heading`, so two chips rendered on
   the same page (chat history with multiple completed runs) keep
   their aria-labelledby references unambiguous.

Tests (15 new cases)
- reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data.
- sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship.
- useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false.
- useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped.
- TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...".
- TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new)
- tests/i18n/locales.test.ts 5 of 5 across 18 locales

* feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1)

* feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2)

* fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315)

Addresses every blocker from codex, Siri-Ray, and lefarcen. The
three state-lifecycle and SSE-validation issues they also flagged
inherit fixes from PR #1314's review pass that this branch now sits
on top of after rebase.

Real daemon kill on Interrupt (P1)
- CritiqueTheaterMount now POSTs to
  /api/projects/:id/critique/:runId/interrupt alongside the
  optimistic local dispatch. Before this fix, clicking Interrupt
  only flipped the React state to interrupted while the daemon job
  kept running. The fetch is best-effort: a 404 (endpoint not wired
  yet, lands in Phase 15) is swallowed with a dev-mode console.warn
  so the UI still moves to the collapsed badge.
- New fetchInterrupt test seam lets RTL assert on the URL / method
  and simulate the "daemon not ready yet" path. Two tests pin both:
  the happy URL proj-42/critique/run-abc/interrupt POSTs, and a
  rejected fetch still flips the UI.

interruptPending reset on new run (P2)
- A ref-backed effect compares the current runId against the last
  one we saw; when it changes, interruptPending is cleared. A user
  who interrupts run-1 and then triggers run-2 from the same mount
  now gets a fresh, enabled kill button instead of one stuck in
  "Interrupting…". Pinned by a new mount test.

Escape keybind scope (P2)
- InterruptButton now checks the keydown target. Escape inside an
  input, textarea, select, or contenteditable element is ignored
  (and any ancestor of those via closest() is treated the same
  way). Body-level focus still fires the keybind so the Theater
  area's affordance keeps working. Four new tests cover textarea,
  input, contenteditable, and the body-focus positive case.

userFacingName i18n key (P2)
- The spec at specs/current/critique-theater.md:6 mandates a single
  critiqueTheater.userFacingName key so the "Design Jury" label can
  be renamed without touching code. Phase 8 introduced
  critiqueTheater.title by mistake; renamed across types.ts, en.ts,
  zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer
  TheaterStage.tsx. The locale alignment test stays green.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 14 files, 112 tests (was 101 before, +11 new for
  the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope;
  the rest were already in #1314's review fix).
- tests/i18n/locales.test.ts 5 of 5 across 18 locales.

* feat(daemon): adapter-degraded registry with TTL (Phase 10.1)

In-memory registry recording adapters that produced malformed or
oversize transcripts so the orchestrator can skip them for a TTL
window (default 24h) instead of cycling through known-bad providers
on every run.

Records carry reason (malformed_block | oversize_block |
missing_artifact), source label, and expiresAt. The test-only
clock seam lets the suite advance time deterministically and prove
that an expired entry stops counting as degraded without anyone
calling clearDegraded.

7/7 vitest cases green.

* feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2)

Two test-only adapters that read the existing v1 transcript
fixtures (happy-3-rounds and malformed-unbalanced) and replay them
as either a full string or a 512-byte chunked stream. The chunked
form is what the conformance harness uses to prove the parser
holds together when the transcript arrives in arbitrary network
slices, not as one buffered blob.

* feat(daemon): adapter conformance harness (Phase 10.3)

runAdapterConformance pulls a transcript through the same
parseCritiqueStream pipeline the orchestrator uses and classifies
the outcome as shipped, degraded, or failed. On a degraded
outcome it forwards the matched reason to the adapter-degraded
registry, so a single nightly conformance run is what populates
the skip list rather than the orchestrator learning each adapter
is broken at request time.

5/5 vitest cases green covering shipped, malformed degraded,
oversize degraded, no-ship failure, and the harness-thrown
failure path.

* test(e2e): Critique Theater Playwright suite (Phase 11)

Six tests, one viewport per visual case, deterministic SSE
fixtures stubbed via page.route(). Adds the suite to
test:ui:extended so the existing extended-UI lane picks it up.

Coverage:

  1. Happy path: a single mounted theater plays the full
     fixture (1 run_started, 5 panelists open / dim / must_fix /
     close, 1 round_end, 1 ship) and ends on the score badge.
  2. Interrupt mid-run: the panelist that is open at the time
     the interrupt button is clicked closes with an interrupted
     marker and the transcript freezes there.
  3. Visual regression at 375x720 mobile.
  4. Visual regression at 768x1024 tablet.
  5. Visual regression at 1280x800 desktop.
  6. A11y role tree: the theater region exposes a labelled
     landmark, each panelist lane is a group with an accessible
     name, the score is a status live region.

All SSE traffic is stubbed by page.route so the suite runs in CI
without a daemon. The toggle is seeded via localStorage by
bootAppWithCritiqueEnabled so the gate behaves as if Settings
flipped it on. typecheck clean; playwright --list reports 6.

* test(web): reducer p99 bench at 10k iterations (Phase 13.1)

Locks the documented 2ms budget for the Critique Theater reducer
on a representative SSE script (27 actions, one full happy run)
behind a regression gate. Asserts p99 stays under 4ms (2x the
documented budget) so CI runners with a noisy neighbour do not
flake while a real regression to 20ms or 200ms still trips.

The bench is a vitest case rather than a bare microbenchmark so
it runs in the same CI lane as every other web test and does not
need a parallel runner.

* test(web): critique surface coverage walker (Phase 13.2)

Walks the public critique surface (11 SSE event names, 5 panelist
roles, 6 lifecycle phases, 9 named i18n keys) and asserts each
named symbol appears in both the src corpus and the test corpus.
The walker is the gate that catches a rename in one half of the
codebase without a matching update in the other half: a future
PR that drops 'panelist_must_fix' from the reducer without also
removing its test reference fails this suite.

62 assertions, one per symbol per corpus.

* docs: Critique Theater user guide (Phase 14.1)

Seven sections aimed at end users (not contributors):

  1. What is Design Jury
  2. How it works (the five panelists, auto-converging rounds,
     the composite formula)
  3. Settings (the M1 toggle and what it does)
  4. Reading the score badge
  5. Replay surface
  6. Troubleshooting (degraded, interrupted, failed)
  7. FAQ

The composite formula is documented as
    designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2
because anyone trying to reverse-engineer the score is going to
search for those weights and the docs are the place they should
land first.

* docs(daemon): critique module AGENTS map (Phase 14.2)

Daemon-side wayfinder for the apps/daemon/src/critique directory.
Tables every file, what owns what invariant, and the 'when you
change anything here' guide so a future contributor does not
have to reverse-engineer the rollout resolver before adding a
new SSE event.

* docs(web): Theater module AGENTS map (Phase 14.3)

Web-side mirror of the daemon AGENTS map. Same file table, same
invariants section, same change-impact guide, sized to the
Theater component package.

* feat(daemon): rollout flag resolver (Phase 15.1)

Single decision point every caller consults to know whether the
orchestrator should wire the critique pipeline for a given run.
Priority:

  1. Skill-level policy (required wins, opt-out wins inversely)
  2. Per-project override from the Settings toggle
  3. OD_CRITIQUE_ENABLED env override
  4. Rollout phase default
       M0 dark-launch      false
       M1 settings only    false (toggle is off until the user flips it)
       M2 per-skill        true if skill opted in
       M3 global default   true

OD_CRITIQUE_ROLLOUT_PHASE parser defaults to M0 on unknown input
so a fresh install never surprises a user with the feature on.

10/10 vitest cases green covering every cell of the matrix.

* feat(web): Settings toggle hook for Critique Theater (Phase 15.2)

React hook that reads critiqueTheaterEnabled from the existing
open-design:config localStorage blob and stays in sync via:

  - the platform storage event (cross-tab)
  - a open-design:critique-theater-toggle CustomEvent (same-tab)

Same-tab event is the one that fires when the Settings panel saves
in the current window: the toggle and every mounted theater update
without a page reload.

setCritiqueTheaterEnabled(next) is the imperative setter the Settings
panel calls. It preserves the rest of the stored config (mode, apiKey,
etc.) and dispatches the same-tab event after the localStorage write.

The web hook reflects what the user toggled; the daemon-side
isCritiqueEnabled is the final routing authority (project override,
env, rollout phase). When they disagree, the daemon wins for backend
gating and the web reflects the toggle state.

6/6 vitest cases green covering first read, stored read, same-tab
event flip, config preservation, corrupted JSON tolerance, and
cross-tab storage event.

* test(web): Phase 15 toggle hook failure-mode coverage (PR #1320)

lefarcen P2 on PR #1320 flagged that the PR body claimed safe
behavior for disabled localStorage, non-object JSON, and missing
CustomEvent shim, but the suite only covered corrupt JSON plus
happy-path storage events. Added four failure-mode tests so the
swallowed errors are not silently traded for a throw in a future
refactor:

1. Returns false on a stored JSON value that parses to an array
   (non-object). Catches a regression where the guard treats
   anything truthy as a config blob.
2. Returns false on a stored JSON value of literal 'null'.
   typeof null === 'object' in JS, so the guard has to check null
   explicitly; this test pins that check.
3. Returns false when localStorage.getItem throws (private mode /
   disabled storage / SecurityError). The hook must swallow and
   return false so the rest of the app keeps rendering.
4. setCritiqueTheaterEnabled still dispatches the same-tab
   CustomEvent when localStorage.setItem throws (quota exceeded /
   disabled storage). The dispatch path is the in-session
   broadcast that keeps every mounted hook coherent even when
   persistence is unavailable; verified by mounting two probes
   and asserting both flip after the setter is called with a
   throwing setItem.

10/10 vitest cases green (6 existing + 4 new).

* fix(web): honor CustomEvent payload in toggle hook listener (PR #1320)

Both Siri-Ray (blocking) and lefarcen (P2 new) caught the same
real bug in the failure-mode test I added in affcdd27: the test
asserts the in-session UI flips when localStorage.setItem throws,
but the CustomEvent listener was ignoring the event's typed
detail and just calling readToggle(). Under a throwing setItem
the localStorage value is stale (or absent), so the listener
would see the OLD value and the test would fail (or worse, the
production claim 'in-session event keeps mounts coherent' was
hollow).

Fixed the hook, not the test: the listener now reads
event.detail.enabled when it is a boolean, falling back to
readToggle() only for malformed events or for cross-tab storage
events (which do not carry a typed payload). The setter already
dispatched the detail; the listener just was not consuming it.

Test changes:

  - The existing 'setItem throws' test now asserts the right
    behavior for the right reason. Updated the inline comment to
    say the listener reads from detail, not localStorage.
  - New test 'falls back to readToggle when the CustomEvent
    carries no usable detail' pins the fallback path: a
    malformed dispatcher (no detail, or detail.enabled not a
    boolean) degrades cleanly instead of throwing or being
    silently ignored.

11 / 11 vitest cases green (10 prior + 1 new fallback).

* feat(daemon): route critique spawn-path eligibility through the rollout resolver

The wireup edit Phase 10 and Phase 15 carved out: today server.ts gates
the critique pipeline on critiqueCfg.enabled, which is just the
OD_CRITIQUE_ENABLED env var. After this commit it gates on
isCritiqueEnabled(...) from the Phase 15 resolver, so the full
priority matrix is live:

  1. Per-skill od.critique.policy veto (opt-out / required)
  2. Per-project override (M1 Settings toggle, written through the
     existing Phase 6 settings endpoint)
  3. OD_CRITIQUE_ENABLED env override (power-user lane / CI fixtures)
  4. OD_CRITIQUE_ROLLOUT_PHASE default
       M0 dark-launch      false
       M1 settings only    false
       M2 per-skill        only when skillPolicy === 'opt-in'
       M3 global default   true

Default behaviour on a fresh install is unchanged: the resolver
returns false at M0 without an env override or a project override,
so prod traffic falls through to the legacy single-pass path
exactly the way it did before.

Inputs threaded today: phase from OD_CRITIQUE_ROLLOUT_PHASE,
envOverride from OD_CRITIQUE_ENABLED. skillPolicy and projectOverride
are passed as null for the v1 cutover; the daemon-side handler that
round-trips critiqueTheaterEnabled on the project settings row and
the od.critique.policy frontmatter resolver land as the next two
commits in this branch.

The three call sites that used critiqueCfg.enabled (the brand-thread
guard, the skill-thread guard, the top-line critiqueShouldRun
compound) now read from a single locally-scoped critiqueEnabledForRun
boolean, so the eligibility check is computed exactly once per spawn
and the prompt composer + orchestrator stay in lockstep the way
the existing comment already promised.

Tests still green: daemon vitest 22 / 22 across rollout +
conformance + adapter-degraded. Daemon typecheck clean.

* feat(web): mount CritiqueTheaterMount in ProjectView

The web counterpart of the daemon wireup. ProjectView now renders
<CritiqueTheaterMount projectId={project.id} enabled={...} /> as a
sibling of <AppChromeHeader> inside the top-level <div className="app">.

The mount is the drop-in from the Phase 9 stack: it owns the SSE
subscription, the kill-request handshake, and the phase-aware swap
from the live <TheaterStage> to the collapsed badge once a run
settles. The mount returns null until the daemon emits a
critique.run_started for the active project, so the visual surface
is byte-for-byte unchanged for users who have not opted in.

Enabled wiring: useCritiqueTheaterEnabled() reads the M1 Settings
toggle from the existing open-design:config localStorage blob and
stays in sync with both the platform storage event (cross-tab) and
the same-tab open-design:critique-theater-toggle CustomEvent the
Phase 15 setter dispatches. The hook honors the event payload
directly so a private-mode browser that cannot persist the toggle
still updates the in-session UI correctly.

The daemon-side gate (isCritiqueEnabled in apps/daemon/src/server.ts)
remains the authority for whether a run is actually wired through
the critique pipeline. This hook only governs whether the web layer
renders the resulting SSE stream when the daemon emits one. The
two-layer gate is intentional: an integrator embedding the Theater
in a custom UI can flip the web visibility independent of the
daemon's routing decision, and a daemon-side env override flips
backend gating without touching the web's localStorage.

Tests still green: web Theater suite 181 / 181 across 16 files.
Web typecheck clean.

* feat(daemon): resolve od.critique.policy frontmatter at the spawn site

The next step in the wireup branch's ladder: replace the placeholder
`skillPolicy: null` with the actual value parsed from the active
skill's SKILL.md frontmatter.

Three small edits, one new field on a public type:

1. SkillInfo gains a `critiquePolicy: SkillCritiquePolicy` field
   carrying the parsed `od.critique.policy` token (required /
   opt-in / opt-out / null). The field is null when the skill has
   no opinion, which lets the lower-priority resolver tiers
   (projectOverride, envOverride, phase default) decide.

2. listSkills() populates the new field via a small
   `normalizeCritiquePolicy` helper that tolerates the YAML
   scalar's casing and trims whitespace. Unknown tokens collapse
   to null so a typo in SKILL.md cannot accidentally force the
   panel on or off; it just falls through. Derived example cards
   inherit the parent's policy.

3. server.ts captures `skill.critiquePolicy` into a hoisted
   `skillCritiquePolicy` variable inside the existing skill-load
   block, then threads it into the isCritiqueEnabled call as the
   skillPolicy input. The hoisting keeps the variable in scope at
   the resolver call site without restructuring the spawn handler.

After this commit, the priority matrix the rollout resolver was
designed for is live for its top tier. The previous commit wired
env + phase; this one wires skill. The projectOverride input
remains null pending the next commit that extends the Phase 6
settings endpoint.

Daemon vitest: 10 / 10 rollout cases pass against the new wiring.
Daemon typecheck: clean.

* feat(daemon): feed projectOverride into the rollout resolver from project metadata

Replaces the placeholder `projectOverride: null` in the spawn
handler with the actual value the Settings panel writes onto the
project's metadata blob: `critiqueTheaterEnabled?: boolean`.

The read is defensive at the boundary: the metadata object is
typed loosely (it round-trips through SQLite as a free-form JSON
blob), so the spawn handler narrows to `boolean` and falls
through to `null` for any other shape. A missing key, a malformed
value, or a project that has never visited Settings collapses to
`null`, which is exactly the resolver's "no opinion, fall
through to env / phase" signal.

The `critique` frontmatter slot also gets typed on the
SkillFrontmatter shape so the `od.critique.policy` chain the
previous commit introduced no longer needs a bracket-access
cast. Same pattern as the existing `craft`, `preview`, and
`design_system` nested-record slots.

After this commit, every tier of the rollout resolver's priority
matrix is wired:

  1. skillPolicy   (from SKILL.md od.critique.policy)
  2. projectOverride (from project metadata critiqueTheaterEnabled)
  3. envOverride   (from OD_CRITIQUE_ENABLED)
  4. rollout phase (from OD_CRITIQUE_ROLLOUT_PHASE)

The write path for projectOverride still flows through the
existing project-update handler the Settings panel already uses
to persist project metadata; no new endpoint is needed. The
Settings UI button that calls setCritiqueTheaterEnabled and
posts the new field is the next commit on this branch.

Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases
still green against the new wiring.

* fix(daemon): forward critique events to project sinks + align composer gate (PR #1338)

Two codex review items addressed in one commit since they share the
same root cause (resolver-enabled run hits a transport / prompt
contract that was still env-gated):

P1 (transport mismatch). The daemon emits critique.* SSE frames
through critiqueBus -> design.runs.emit, which fans out on
/api/runs/:runId/events. The web CritiqueTheaterMount subscribes to
/api/projects/:projectId/events (it's project-scoped, not run-
scoped, because the mount lives at the project workspace and
follows the user across runs). Result: in production the mount
never sees a real frame and the e2e tests' stubbed routes hide the
mismatch.

Fixed by extending critiqueBus.emit to fan out to BOTH sinks: the
existing runs.emit transport, AND the per-project event-sinks map.
The project-events route emits via sse.send(payload.type, payload),
so we pack the SSE channel name onto payload.type and let the sink
push the right channel. The web sseToPanelEvent overwrites type
from the channel name on the way back into a PanelEvent, so the
round-trip stays correct.

P2 (prompt gate misalignment). composeSystemPrompt reads
cfg.enabled to decide whether to append the panel addendum, but
critiqueCfg.enabled is loaded from OD_CRITIQUE_ENABLED only. A run
the resolver enabled via phase / project / skill (env unset) would
have critiqueShouldRun = true while critiqueCfg.enabled remained
false, dropping the panel prompt while still routing through
runOrchestrator -> parser waits for tags that never arrive -> run
degrades.

Fixed by passing a derived config { ...critiqueCfg, enabled: true }
to the composer when critiqueShouldRun is true. The composer's own
gate now agrees with the resolver decision on every input the
spec defines.

Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases
still green against the new wiring.

* fix: address PerishCode P1 + P2 follow-ups on PR #1338

Two follow-up items PerishCode flagged on the activation PR.
Non-blocking but both are real:

1. Phase 11 e2e suite was wired into test:ui:extended but lands
   the user on '/' (home route) where ProjectView (and therefore
   CritiqueTheaterMount) is never rendered. With the suite as
   written, every assertion would time out the first time the
   lane runs in CI, contradicting the PR body's claim that the
   suite stays parked behind test.describe.fixme.

   The state diverged from my earlier Phase 11 work because the
   merge from main on commit 4ab719c6 brought in #1307's
   squash-merged version of the e2e file (the pre-fixme shape).

   Re-applied test.describe.fixme to the describe block plus
   removed ui/critique-theater.test.ts from the test:ui:extended
   script in e2e/package.json. Added a file-header docblock
   explaining what the follow-up commit needs to do: replace
   goto('/') with /projects/:id navigation similar to
   app-design-files.test.ts, split the SSE fixture into a live
   prefix and terminal suffix (Codex P2 on PR #1320), and commit
   the first PNG baselines.

2. bestRoundOf in CritiqueTheaterMount returned the LAST round
   with a numeric composite, not the round with the HIGHEST
   composite, while bestCompositeOf correctly returned the max.
   A run that closed round 1 at 8.5 and round 2 at 6.0 would
   dispatch interrupted { bestRound: 2, composite: 8.5 } on a
   user-clicked interrupt.

   Folded the two helpers into a single bestRoundAndComposite
   that walks state.rounds once and returns the matching pair so
   the two values cannot drift. The onInterrupt callback now
   destructures from one helper instead of two independent reads.
   Falls back to (state.activeRound, 0) when no round has closed
   with a composite yet.

Web typecheck: clean. CritiqueTheaterMount.test.tsx: 7 / 7 cases
still green against the new helper.

* fix: wire M1 project override end-to-end + correct deferred-surface doc claims (PR #1338)

Three lefarcen P2s on the latest review pass, all real:

1. M1 project override was half-wired: the daemon read
   metadata.critiqueTheaterEnabled but the web setter only
   wrote localStorage. A user opt-in would render the Theater
   on the web (localStorage was set) while the daemon resolved
   projectOverride=null and skipped critique unless env / phase
   already permitted. Two halves talking past each other.

   Extended setCritiqueTheaterEnabled to accept an optional
   { projectId, fetchProjectSettings } options bag. When a
   projectId is supplied, the setter ALSO sends a
   PATCH /api/projects/:id with { metadata: { critiqueTheaterEnabled
   } } so the daemon's spawn-time resolver picks the same value up
   on the next generation. The existing project-routes endpoint
   already accepts arbitrary metadata patches, so no new endpoint
   is needed. The local write + the CustomEvent dispatch still
   fire before the PATCH, so a network failure does not unwind
   the in-session UI flip. Three new vitest cases pin the new
   path: PATCHes when projectId is provided, skips when it is
   not, swallows a rejected PATCH so the in-session UI still
   flips.

2. Rollout docs (docs/critique-theater.md section 3) claimed the
   Settings toggle persists into the daemon settings store, but
   the previous implementation only had a localStorage reader /
   writer plus a daemon read of project metadata, with no
   round-trip. Rewrote the section to lead with the four-tier
   resolver (skill policy / project override / env / phase),
   document that the setter now round-trips via the existing
   PATCH endpoint when given a projectId, and call out the
   Settings panel UI control as a deliberate follow-up.

3. Troubleshooting table pointed users at /api/metrics/critique
   (Phase 12, deferred) and 'od adapters clear-degraded <id>'
   (CLI wrapper that does not exist). Replaced the metrics
   reference with the local conformance harness command
   (pnpm --filter @open-design/daemon vitest run
   tests/critique-conformance.test.ts) that ships today, with a
   note that the Phase 12 dashboard surfaces this status as a
   series once that PR lands. Replaced the CLI command with the
   programmatic clearDegraded() helper that exists today and
   flagged the CLI wrapper as planned follow-up.

Web typecheck: clean. Toggle hook tests: 14 / 14 green (11
existing + 3 new for the round-trip path).

* test(web): multi-round interrupt regression for bestRoundAndComposite (PR #1338)

lefarcen P3 follow-up to the previous bestRoundAndComposite fix:
the existing CritiqueTheaterMount.test.tsx interrupt cases only
exercised a single-round state, so a future refactor back to two
independent helpers wouldn't be caught by the test suite even
though it'd reintroduce the round / composite drift bug.

Added a regression case that:

  1. Drives the reducer through two complete rounds with the
     full 5-role cast closing at distinct composites: round 1
     at 8.5, round 2 at 6.0 (the high-composite round is NOT the
     most recent one).
  2. Clicks Interrupt + waits for the daemon ack via the test
     seam fetcher returning 204.
  3. Asserts the collapsed badge displays "round 1" (the
     correct best-composite round), and queryByText for
     "round 2 ... 8.5" returns null (the buggy pairing
     would have produced that string).

The bestRoundAndComposite helper walks state.rounds in one pass
and returns the matching pair, so the round number and the
composite cannot drift apart. This test locks the fix in: a
refactor that splits the helpers back into independent walks
will be caught here.

8 / 8 vitest cases green on the file.

* fix(web): read-merge-write the project metadata in setCritiqueTheaterEnabled (PerishCode P2 on PR #1338)

The previous round-trip sent { metadata: { critiqueTheaterEnabled: next } }
as the entire PATCH body. The daemon's project-routes handler only
re-stamps three immutable fields (baseDir, importedFrom,
fromTrustedPicker) before calling updateProject(db, id, patch),
which then does a shallow { ...existing, ...patch } in apps/daemon/
src/db.ts. So patch.metadata replaces the row's metadata wholesale,
dropping kind, templateId, linkedDirs, and every other field the rest
of the app reads.

No in-tree caller passes projectId today (only vitest cases), so the
bug had not surfaced yet. But the surface is documented in
docs/critique-theater.md section 3 and the function's own JSDoc as
the M1 round-trip path, so it would have shipped as a latent footgun
for the next integrator: a Settings UI follow-up, or any third party
that wires the setter into a project-aware surface.

Fix: read-merge-write rather than a bare patch.

- GET /api/projects/:id to read the row's current metadata.
- Spread that metadata into the PATCH body and overlay
  critiqueTheaterEnabled: next on top, mirroring the partial-metadata
  pattern already used in ChatComposer.tsx for linkedDirs.
- PATCH the merged object.

Failure handling:
- GET fails: skip the PATCH entirely. We cannot construct a safe
  merged body without the current state, and a bare patch would
  wipe other metadata. The in-session CustomEvent fired earlier in
  the setter still keeps every mounted hook consistent; the next
  save retries the round-trip.
- PATCH fails: log in dev. The in-session UI is already correct via
  the CustomEvent.

Tests (TDD, red-first):

- 'GETs the project then PATCHes with merged metadata when a
  projectId is supplied': stubs a GET that returns
  { kind: 'template', templateId: 'modern-blog', linkedDirs: [...] }
  and asserts the PATCH body equals the merge plus the toggle.
- 'PATCHes with just the toggle when the project has no prior
  metadata': stubs a GET that returns no metadata block.
- 'skips the PATCH (does not stomp metadata) when the prefetch GET
  fails': stubs a rejecting GET and asserts only the GET fires.
- 'swallows a rejected PATCH after a successful prefetch': stubs a
  successful GET and a rejecting PATCH; asserts the in-session UI
  still flips via the CustomEvent.

Doc updated on the setter's JSDoc to describe the new three-step
flow (localStorage, CustomEvent, read-merge-write PATCH) and the
two failure modes.

Verified:
- pnpm --filter @open-design/web typecheck clean.
- pnpm --filter @open-design/web test: 111 files / 1055 tests green
  (was 1052, +3 from the new merge-flow cases).

* fix(web): restore wait-for-daemon-ack pattern on Theater interrupt

Same regression as flagged on PR #1316 post-main-merge: the
optimistic local dispatch fired before the POST resolved, so a
daemon 404 / 409 still terminalized the UI and the real SSE
terminal event got ignored by the sticky interrupted phase.

Snapshot runId / bestRound / composite at click time, dispatch
interrupted only on res.ok, clear interruptPending on rejection or
non-2xx so the user can retry. Tests cover rejection + 404 leaving
the run on the live stage; the 204 path waits for the ack.

* feat(daemon): Critique Theater Phase 12 observability foundations

Lands the metrics registry, the structured logger, the /api/metrics
route, and the adapter-degraded bump that wires up the first data
point. The orchestrator-side bumps for runs / rounds / composite /
must-fix / interrupted / parser_errors / protocol_version land in a
follow-up commit on this branch (kept separate so the wiring diff
reads cleanly against the registry shape).

Surfaces added:

- apps/daemon/src/metrics/index.ts: 9 Prometheus series under the
  open_design_critique_* namespace with the histogram buckets the
  spec calls out (round_duration_ms at 100 / 250 / 500 / 1000 /
  2500 / 5000 / 10000 / 30000 / 60000 ms; composite_score at
  0-10 integer steps).
- apps/daemon/src/logging/critique.ts: 6 typed events, one JSON line
  per call on stdout, namespaced critique. Matches the JSON-per-line
  convention cli.ts already uses; no new logger framework.
- apps/daemon/src/server.ts: GET /api/metrics route. Honors
  OD_METRICS_ENDPOINT=disabled to opt out for air-gapped installs.
- apps/daemon/src/critique/adapter-degraded.ts: markDegraded now
  bumps degraded_total so the adapter-health dashboard panel
  reflects every TTL refresh and every fresh mark.

Deps: prom-client ^15.1.0, @opentelemetry/api ^1.9.0 added to
apps/daemon/package.json. Both are zero-config no-ops without an
exporter wired; daemon bundle size impact is ~150 KB uncompressed.
The @opentelemetry/api dep is in place ahead of the OTel-spans
follow-up commit; it adds no behavior on this commit.

Tests:
- tests/metrics/critique.test.ts (3 cases): registry shape +
  exposition text + reset-between-tests
- tests/logging/critique.test.ts (4 cases): event shape + ordering
  + newline framing + namespace stamping

Verification (Windows-local):
- pnpm --filter @open-design/daemon typecheck: clean
- New metrics + logging suites: 7 / 7 green
- Existing adapter-degraded + conformance + rollout suites:
  22 / 22 green; the bump is non-breaking

* feat(daemon): wire Critique Theater metrics + structured logs from the orchestrator

Lights up the bump sites the Phase 12 foundations PR registered the
series for. Every panel event the parser surfaces now reaches the
matching Prometheus counter / histogram and the matching JSON log
line on stdout.

Switch-loop bumps + logs:

- run_started: log run_started, set protocol_version gauge to the
  observed protocol version (small-integer cardinality).
- panelist_open: record the first-open wall-clock per round so
  round_end can compute round_duration_ms; subsequent opens in the
  same round leave the start time untouched.
- panelist_must_fix: bump must_fix_total with the panelist role.
  The wire event does not yet carry a dim name, so the label is
  'unspecified' for now; a future parser revision can drop in the
  real dim without a metric rename.
- round_end: bump rounds_total, observe composite_score, observe
  round_duration_ms (current ms minus the tracked start), log
  round_closed with the composite / mustFix / decision triple.
- parser_warning (parser-yielded): bump parser_errors_total with
  the kind label, log parser_recover with kind + position.

Orchestrator-side parser warnings (composite_mismatch and
duplicate_ship from the daemon-authoritative scoring checks) go
through a new emitParserWarning helper so the bus emit, the
collectedEvents push, the metric bump, and the log line stay in
lockstep. Three inline emission sites collapse to one-line helper
calls.

After the try/catch, a single terminal-status switch bumps
runs_total{status, adapter, skill} once per run, with branch-
specific log + counter:

- shipped / below_threshold: log run_shipped
- interrupted: bump interrupted_total, log run_failed{cause: interrupted}
- timed_out: log run_failed{cause: timed_out}
- failed: log run_failed{cause: orchestrator_internal}
- degraded: log degraded{reason: orchestrator_classified}

OrchestratorParams gains optional skill: string for the label;
defaults to 'unknown' so spawn sites that have not yet threaded it
keep working without a metric shape change.

Tests:
- The new metrics + logging suites (7 / 7) verify registry shape
  and event framing; orchestrator-side metric integration is
  exercised through the existing critique-conformance and
  critique-adapter-degraded suites (22 / 22 still green).
- Logger test reassigns process.stdout.write directly instead of
  vi.spyOn so the Node overloaded write signature does not
  collide with MockInstance<unknown>.

* feat(observability): Grafana dashboard JSON for Critique Theater

Three default rows mapping to the metrics this branch wires up:

1. Fleet quality: composite score p50 / p90 / p99 line graph by
   adapter, plus a heatmap of the composite distribution. The
   line graph answers 'are my agents getting better over time';
   the heatmap answers 'are the bad runs clustered around one
   adapter or smeared across the fleet'.

2. Adapter health: stacked bar charts for degraded marks (by
   adapter / reason) and parser errors (by adapter / kind) over
   a 5-minute window. The two queries together let an operator
   see 'is this adapter degraded because of malformed wire output
   or because of oversize blocks' without flipping panels.

3. Brief throughput: runs-per-hour by terminal status, an average
   rounds-per-run stat per adapter, and a round-duration ms p50 /
   p90 / p99 line. Throughput numbers fall straight out of the
   runs_total / rounds_total counters; the duration histogram is
   the same one the runs feed.

The dashboard uses a templated $datasource var (defaults to
'prometheus') so an operator with multiple Prometheus instances
can switch without editing JSON. Schema version 39 (Grafana 11).

Operators import via:

  pnpm dlx @grafana/cli dashboard import     tools/dev/dashboards/critique.json

or paste into a provisioned dashboards directory. The file is
checked into the repo as a starting artifact; alert rules and
SLO panels ship after the first 1000 runs inform the right
thresholds. JSON validates with node -e 'JSON.parse(...)' (sanity
checked locally).

* feat(daemon): OpenTelemetry outer span around the critique run

Wraps each runOrchestrator call in a 'critique.run' span via the
existing @opentelemetry/api dep added in the Phase 12 foundations
commit. Attributes set on the span:

- critique.run_id, critique.adapter, critique.skill at start
- critique.final_status, critique.final_composite on terminal
  resolution
- span status flipped to ERROR for failed / timed_out runs so a
  Tempo / Honeycomb / Jaeger filter on traces.status=error
  surfaces the right slice without joining back to Prometheus

No exporter is wired by default; @opentelemetry/api is the API
package and intentionally splits from @opentelemetry/sdk-*, so
the span is zero-overhead until an operator attaches an SDK
through their runtime config.

Inner per-round / parse_chunk / scoreboard_eval / persist_round /
ship.persist spans defined in the Phase 12 plan are a follow-up:
the outer span alone gives the trace a duration + final status +
adapter/skill labels, which is the 80% value for dashboards that
correlate runs across services. Adding child spans inside the
existing 600-line orchestrator without restructuring is a separate
careful change.

Verification:
- pnpm --filter @open-design/daemon typecheck: clean
- 29 / 29 critique + metrics + logging tests still green

* fix(nix): bump pnpmDepsHash for prom-client + @opentelemetry/api lockfile bump

nix-check failed on PR #1485 with hash mismatch in
open-design-daemon-pnpm-deps and open-design-web-pnpm-deps after
the Phase 12 foundations commit (2b8b7445) added prom-client and
@opentelemetry/api to apps/daemon/package.json and refreshed
pnpm-lock.yaml.

CI reported the new sha:
  specified: HFLm+8hv3o5x3Xem4MXNsNclIgiVRc70+EBafL0rVn8=
  got:       7R1sQC38gOT0gsZ2oNOviCZ486cbbGJGJCis6WI8z9s=

Both nix files pin the same workspace lockfile, so both flip in
lockstep. No other Nix surface changes required.

* fix(daemon): four Phase 12 review findings (Codex P2 x2 + Siri-Ray P2 + lefarcen P2)

1. Siri-Ray P2 in orchestrator.ts (round metric / log used untrusted
   agent values). The new observability path now records rs.composite
   and rs.mustFix (daemon-authoritative) instead of event.composite
   and event.mustFix when rs exists, and skips the bumps + log
   entirely when rs is missing (a degenerate round_end without any
   matching panelist_open). The dashboard p50 / p90 / p99 now agrees
   with persistence and ship decisions; an adapter reporting <ROUND_END
   composite='10'> while the daemon computed 6 logs 6 and still emits
   the composite_mismatch parser warning the prior block was already
   producing.

2. Codex P2 in server.ts (skill label always 'unknown'). The spawn
   path called runOrchestrator without passing the resolved skill id,
   so every live run bumped open_design_critique_*{skill='unknown'}
   and the per-skill dashboard breakdown was always empty. Threaded
   effectiveSkillId (already computed at the same handler scope as
   the project skill fallback) through skill: . . . so the metric
   reflects the real skill when one is assigned, and the orchestrator
   default of 'unknown' only fires for runs that genuinely have none.

3. Codex P2 in conformance.ts (protocol-version mismatch let through).
   An adapter that emitted <CRITIQUE_RUN version='2'> followed by a
   valid SHIP classified as shipped because the harness only watched
   for terminal events. Added a guard inside the parse loop: if a
   run_started carries protocolVersion !== CRITIQUE_PROTOCOL_VERSION,
   mark the adapter degraded with reason 'protocol_version_mismatch'
   (already in DEGRADED_REASONS) and return early. ConformanceOutcome
   union widened to accept the new reason.

4. lefarcen P2 in tools/dev/dashboards/critique.json (runs-per-hour
   panel under-reported by 3600x). 'rate(...[1h])' returns per-second.
   Multiplied by 3600 so the panel title and unit match the actual
   value rendered.

Verification:
- pnpm --filter @open-design/daemon typecheck: clean
- New metrics + logging suites (7), existing adapter-degraded (7),
  conformance (5), rollout (10): 29 / 29 green
- Grafana JSON re-parses with node -e 'JSON.parse(...)'

* fix(nix): set pnpmDepsHash to fakeHash so CI surfaces the real hash for the regenerated lockfile (lefarcen P1 on PR #1485)

* fix(nix): pin pnpmDepsHash to sha256-NtXbiRU0YZ4EVJVNC6N3sR1S0ozA3BvCwgXI0L0OMH4= from CI nix-check output

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-13 22:11:27 +08:00
.github Merge origin/main into release/v0.7.0 to prepare merge-back PR 2026-05-13 18:19:47 +08:00
.vaunt feat: enable Vaunt contributor recognition with 5-tier system (#908) 2026-05-08 17:44:12 +08:00
apps feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485) 2026-05-13 22:11:27 +08:00
assets fix: remove Trump pet from bundled community pets (#1103) 2026-05-10 12:11:00 +08:00
craft refine typography-hierarchy craft docs — clarify edge cases and make lint measurable (#979) 2026-05-09 08:13:35 +08:00
deploy docs(deploy): document Colima build swap helper (#967) 2026-05-09 02:17:22 +08:00
design-systems feat(design-systems): add structured tokens.css schema (default + kami) (#1231) 2026-05-11 22:23:34 +08:00
design-templates feat(landing-page): split catalog into per-facet pages + auto-deploy on content changes (#1158) 2026-05-12 19:24:50 +08:00
docs feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485) 2026-05-13 22:11:27 +08:00
e2e feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485) 2026-05-13 22:11:27 +08:00
nix feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485) 2026-05-13 22:11:27 +08:00
packages [codex] Add Cursor Agent auth diagnostics (#1538) 2026-05-13 20:25:34 +08:00
prompt-templates feat(hyperframes): land HTML-in-Canvas across web + skills (#866) 2026-05-11 15:45:12 +08:00
scripts Revert "fix(web): restore consistent app header layout (#1432)" 2026-05-13 11:20:16 +08:00
skills Implement manual edit inspector (#1448) 2026-05-13 13:25:58 +08:00
specs Make de/fr/ru content i18n optional (#1511) 2026-05-13 12:17:17 +08:00
story Refactor project name from "Open Claude Design" to "Open Design" (#1) 2026-04-28 16:03:35 +08:00
templates add otd-operations-brief live-artifact template (#794) 2026-05-08 12:53:24 +08:00
tools feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485) 2026-05-13 22:11:27 +08:00
.dockerignore Add Docker Compose deployment workflow (#65) 2026-05-08 11:51:51 +08:00
.gitignore Merge origin/main into release/v0.7.0 to prepare merge-back PR 2026-05-13 18:19:47 +08:00
.node-version add support for VP_HOME environment variable in agent resolution (#859) 2026-05-08 15:14:37 +08:00
AGENTS.md docs(pr): require user-perspective description and surface area (#1520) 2026-05-13 15:28:05 +08:00
CHANGELOG.md Revert "chore(changelog): note #1432 app header layout fix" 2026-05-13 11:20:16 +08:00
CLAUDE.md docs: add git commit co-author policy (#131) 2026-04-30 15:08:22 +08:00
CONTRIBUTING.de.md docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290) 2026-05-11 20:19:55 +08:00
CONTRIBUTING.fr.md docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290) 2026-05-11 20:19:55 +08:00
CONTRIBUTING.ja-JP.md docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290) 2026-05-11 20:19:55 +08:00
CONTRIBUTING.md docs(pr): require user-perspective description and surface area (#1520) 2026-05-13 15:28:05 +08:00
CONTRIBUTING.pt-BR.md docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290) 2026-05-11 20:19:55 +08:00
CONTRIBUTING.zh-CN.md docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290) 2026-05-11 20:19:55 +08:00
edited_image.png feat(editorial-collage): introduce Atelier Zero style landing page as… (#366) 2026-05-04 13:39:58 +08:00
flake.lock feat(nix): Add official flake with home-manager and NixOS support (#402) 2026-05-09 23:50:16 +08:00
flake.nix fix: set writable OD_DATA_DIR default for nix run (#1159) 2026-05-11 10:52:53 +08:00
LICENSE Refactor project name from "Open Claude Design" to "Open Design" (#1) 2026-04-28 16:03:35 +08:00
MAINTAINERS.de.md docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290) 2026-05-11 20:19:55 +08:00
MAINTAINERS.fr.md docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290) 2026-05-11 20:19:55 +08:00
MAINTAINERS.ja-JP.md docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290) 2026-05-11 20:19:55 +08:00
MAINTAINERS.md docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290) 2026-05-11 20:19:55 +08:00
MAINTAINERS.pt-BR.md docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290) 2026-05-11 20:19:55 +08:00
MAINTAINERS.zh-CN.md docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290) 2026-05-11 20:19:55 +08:00
package.json Merge origin/main into release/v0.7.0 to prepare merge-back PR 2026-05-13 18:19:47 +08:00
pnpm-lock.yaml feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485) 2026-05-13 22:11:27 +08:00
pnpm-workspace.yaml Refresh desktop integration control plane (#123) 2026-04-30 14:23:53 +08:00
QUICKSTART.de.md docs: add Traditional Chinese QUICKSTART and fix Chinese doc links (#753) 2026-05-09 21:19:08 +08:00
QUICKSTART.fr.md docs: add Traditional Chinese QUICKSTART and fix Chinese doc links (#753) 2026-05-09 21:19:08 +08:00
QUICKSTART.ja-JP.md docs: add Traditional Chinese QUICKSTART and fix Chinese doc links (#753) 2026-05-09 21:19:08 +08:00
QUICKSTART.md Revert "fix(web): restore consistent app header layout (#1432)" 2026-05-13 11:20:16 +08:00
QUICKSTART.pt-BR.md docs: add Traditional Chinese QUICKSTART and fix Chinese doc links (#753) 2026-05-09 21:19:08 +08:00
QUICKSTART.zh-CN.md docs: add Traditional Chinese QUICKSTART and fix Chinese doc links (#753) 2026-05-09 21:19:08 +08:00
QUICKSTART.zh-TW.md docs: add Traditional Chinese QUICKSTART and fix Chinese doc links (#753) 2026-05-09 21:19:08 +08:00
README.ar.md docs(readme): refresh contributors wall (#1494) 2026-05-13 13:26:31 +08:00
README.de.md docs(readme): refresh contributors wall (#1494) 2026-05-13 13:26:31 +08:00
README.es.md docs(readme): refresh contributors wall (#1494) 2026-05-13 13:26:31 +08:00
README.fr.md docs(readme): refresh contributors wall (#1494) 2026-05-13 13:26:31 +08:00
README.ja-JP.md docs(readme): refresh contributors wall (#1494) 2026-05-13 13:26:31 +08:00
README.ko.md docs(readme): refresh contributors wall (#1494) 2026-05-13 13:26:31 +08:00
README.md docs: add Windows launcher note (#1546) 2026-05-13 16:46:42 +08:00
README.pt-BR.md docs(readme): refresh contributors wall (#1494) 2026-05-13 13:26:31 +08:00
README.ru.md docs(readme): refresh contributors wall (#1494) 2026-05-13 13:26:31 +08:00
README.tr.md docs(readme): refresh contributors wall (#1494) 2026-05-13 13:26:31 +08:00
README.uk.md docs(readme): refresh contributors wall (#1494) 2026-05-13 13:26:31 +08:00
README.zh-CN.md docs(readme): refresh contributors wall (#1494) 2026-05-13 13:26:31 +08:00
README.zh-TW.md docs(readme): refresh contributors wall (#1494) 2026-05-13 13:26:31 +08:00
TRANSLATIONS.md i18n: add full Thai translation (th) (#1018) 2026-05-09 15:19:47 +08:00
vercel.json feat: add vercel.json configuration file (#169) 2026-04-30 22:54:47 +08:00

Open Design

The open-source alternative to Claude Design. Local-first, web-deployable, BYOK at every layer — 16 coding-agent CLIs auto-detected on your PATH (Claude Code, Codex, Devin for Terminal, Cursor Agent, Gemini CLI, OpenCode, Qwen, Qoder CLI, GitHub Copilot CLI, Hermes, Kimi, Pi, Kiro, Kilo, Mistral Vibe, DeepSeek TUI) become the design engine, driven by 31 composable Skills and 72 brand-grade Design Systems. No CLI? An OpenAI-compatible BYOK proxy is the same loop minus the spawn.

Open Design — editorial cover: design with the agent on your laptop

Stars Forks Issues Pull Requests Contributors Commit activity Last commit

Download Latest release License Agents Design systems Skills Discord Follow @nexudotio on X Quickstart

English · Español · Português (Brasil) · Deutsch · Français · 简体中文 · 繁體中文 · 한국어 · 日本語 · العربية · Русский · Українська · Türkçe


Why this exists

Anthropic's Claude Design (released 2026-04-17, Opus 4.7) showed what happens when an LLM stops writing prose and starts shipping design artifacts. It went viral — and stayed closed-source, paid-only, cloud-only, locked to Anthropic's model and Anthropic's skills. There is no checkout, no self-host, no Vercel deploy, no swap-in-your-own-agent.

Open Design (OD) is the open-source alternative. Same loop, same artifact-first mental model, none of the lock-in. We don't ship an agent — the strongest coding agents already live on your laptop. We wire them into a skill-driven design workflow that runs locally with pnpm tools-dev, can deploy the web layer to Vercel, and stays BYOK at every layer.

Type make me a magazine-style pitch deck for our seed round. The interactive question form pops up before the model improvises a single pixel. The agent picks one of five curated visual directions. A live TodoWrite plan streams into the UI. The daemon builds a real on-disk project folder with a seed template, layout library, and self-check checklist. The agent reads them — pre-flight enforced — runs a five-dimensional critique against its own output, and emits a single <artifact> that renders in a sandboxed iframe seconds later.

That's not "AI tries to design something". That's an AI that has been trained, by the prompt stack, to behave like a senior designer with a working filesystem, a deterministic palette library, and a checklist culture — exactly the bar Claude Design set, but open and yours.

OD stands on four open-source shoulders:

  • alchaincyf/huashu-design — the design-philosophy compass. Junior-Designer workflow, the 5-step brand-asset protocol, the anti-AI-slop checklist, the 5-dimensional self-critique, and the "5 schools × 20 design philosophies" idea behind our direction picker — all distilled into apps/daemon/src/prompts/discovery.ts.
  • op7418/guizang-ppt-skill — the deck mode. Bundled verbatim under skills/guizang-ppt/ with original LICENSE preserved; magazine-style layouts, WebGL hero, P0/P1/P2 checklists.
  • OpenCoworkAI/open-codesign — the UX north star and our closest peer. The first open-source Claude-Design alternative. We borrow its streaming-artifact loop, its sandboxed-iframe preview pattern (vendored React 18 + Babel), its live agent panel (todos + tool calls + interruptible generation), and its five-format export list (HTML / PDF / PPTX / ZIP / Markdown). We deliberately diverge on form factor — they are a desktop Electron app bundling pi-ai; we are a web app + local daemon that delegates to your existing CLI.
  • multica-ai/multica — the daemon-and-runtime architecture. PATH-scan agent detection, the local daemon as the only privileged process, the agent-as-teammate worldview.

At a glance

What you get
Coding-agent CLIs (16) Claude Code · Codex CLI · Devin for Terminal · Cursor Agent · Gemini CLI · OpenCode · Qwen Code · Qoder CLI · GitHub Copilot CLI · Hermes (ACP) · Kimi CLI (ACP) · Pi (RPC) · Kiro CLI (ACP) · Kilo (ACP) · Mistral Vibe CLI (ACP) · DeepSeek TUI — auto-detected on PATH, swap with one click
BYOK fallback Protocol-specific API proxy at /api/proxy/{anthropic,openai,azure,google}/stream — paste baseUrl + apiKey + model, choose Anthropic / OpenAI / Azure OpenAI / Google Gemini, and the daemon normalizes SSE back to the same chat stream. Internal-IP/SSRF blocked at the daemon edge.
Design systems built-in 129 — 2 hand-authored starters + 70 product systems (Linear, Stripe, Vercel, Airbnb, Tesla, Notion, Anthropic, Apple, Cursor, Supabase, Figma, Xiaohongshu, …) from awesome-design-md, plus 57 design skills from awesome-design-skills added directly under design-systems/
Skills built-in 31 — 27 in prototype mode (web-prototype, saas-landing, dashboard, mobile-app, gamified-app, social-carousel, magazine-poster, dating-web, sprite-animation, motion-frames, critique, tweaks, wireframe-sketch, pm-spec, eng-runbook, finance-report, hr-onboarding, invoice, kanban-board, team-okrs, …) + 4 in deck mode (guizang-ppt · simple-deck · replit-deck · weekly-update). Grouped in the picker by scenario: design / marketing / operation / engineering / product / finance / hr / sale / personal.
Media generation Image · video · audio surfaces ship alongside the design loop. gpt-image-2 (Azure / OpenAI) for posters, avatars, infographics, illustrated maps · Seedance 2.0 (ByteDance) for cinematic 15s text-to-video and image-to-video · HyperFrames (heygen-com/hyperframes) for HTML→MP4 motion graphics (product reveals, kinetic typography, data charts, social overlays, logo outros). 93 ready-to-replicate prompts gallery — 43 gpt-image-2 + 39 Seedance + 11 HyperFrames — under prompt-templates/, with preview thumbnails and source attribution. Same chat surface as code; outputs a real .mp4 / .png chip into the project workspace.
Visual directions 5 curated schools (Editorial Monocle · Modern Minimal · Warm Soft · Tech Utility · Brutalist Experimental) — each ships a deterministic OKLch palette + font stack (apps/daemon/src/prompts/directions.ts)
Device frames iPhone 15 Pro · Pixel · iPad Pro · MacBook · Browser Chrome — pixel-accurate, shared across skills under assets/frames/
Agent runtime Local daemon spawns the CLI in your project folder — agent gets real Read, Write, Bash, WebFetch against a real on-disk environment, with Windows ENAMETOOLONG fallbacks (stdin / prompt-file) on every adapter
Imports Drop a Claude Design export ZIP onto the welcome dialog — POST /api/import/claude-design parses it into a real project so your agent can keep editing where Anthropic left off
Persistence SQLite at .od/app.sqlite: projects · conversations · messages · tabs · saved templates. Reopen tomorrow, todo card and open files are exactly where you left them.
Lifecycle One entry point: pnpm tools-dev (start / stop / run / status / logs / inspect / check) — boots daemon + web (+ desktop) under typed sidecar stamps
Desktop Optional Electron shell with sandboxed renderer + sidecar IPC (STATUS / EVAL / SCREENSHOT / CONSOLE / CLICK / SHUTDOWN) — drives tools-dev inspect desktop screenshot for E2E
Deployable to Local (pnpm tools-dev) · Vercel web layer · packaged Electron desktop app for macOS (Apple Silicon) and Windows (x64) — download from open-design.ai or the latest release
License Apache-2.0

Demo

01 · Entry view
Entry view — pick a skill, pick a design system, type the brief. The same surface for prototypes, decks, mobile apps, dashboards, and editorial pages.
02 · Turn-1 discovery form
Turn-1 discovery form — before the model writes a pixel, OD locks the brief: surface, audience, tone, brand context, scale. 30 seconds of radios beats 30 minutes of redirects.
03 · Direction picker
Direction picker — when the user has no brand, the agent emits a second form with 5 curated directions (Monocle / Modern Minimal / Tech Utility / Brutalist / Soft Warm). One radio click → a deterministic palette + font stack, no model freestyle.
04 · Live todo progress
Live todo progress — the agent's plan streams as a live card. in_progresscompleted updates land in real time. The user can redirect cheaply, mid-flight.
05 · Sandboxed preview
Sandboxed preview — every <artifact> renders in a clean srcdoc iframe. Editable in place via the file workspace; downloadable as HTML, PDF, ZIP.
06 · 72-system library
72-system library — every product system shows its 4-color signature. Click for the full DESIGN.md, swatch grid, and live showcase.
07 · Magazine deck
Deck mode (guizang-ppt) — the bundled guizang-ppt-skill drops in unchanged. Magazine layouts, WebGL hero backgrounds, single-file HTML output, PDF export.
08 · Mobile prototype
Mobile prototype — pixel-accurate iPhone 15 Pro chrome (Dynamic Island, status bar SVGs, home indicator). Multi-screen prototypes use the shared /frames/ assets so the agent never re-draws a phone.

Skills

31 skills ship in the box. Each is a folder under skills/ following the Claude Code SKILL.md convention with an extended od: frontmatter that the daemon parses verbatim — mode, platform, scenario, preview.type, design_system.requires, default_for, featured, fidelity, speaker_notes, animations, example_prompt (apps/daemon/src/skills.ts).

Two top-level modes carry the catalog: prototype (27 skills — anything that renders as a single-page artifact, from a magazine landing to a phone screen to a PM spec doc) and deck (4 skills — horizontal-swipe presentations with deck-framework chrome). The scenario field is what the picker groups them by: design · marketing · operation · engineering · product · finance · hr · sale · personal.

Showcase examples

The visually distinctive skills you'll most likely run first. Each ships a real example.html you can open straight from the repo to see exactly what the agent will produce — no auth, no setup.

dating-web
dating-web · prototype
Consumer dating / matchmaking dashboard — left rail nav, ticker bar, KPIs, 30-day mutual-matches chart, editorial typography.
digital-eguide
digital-eguide · template
Two-spread digital e-guide — cover (title, author, TOC teaser) + lesson spread with pull-quote and step list. Creator / lifestyle tone.
email-marketing
email-marketing · prototype
Brand product-launch HTML email — masthead, hero image, headline lockup, CTA, specs grid. Centered single-column, table-fallback safe.
gamified-app
gamified-app · prototype
Three-frame gamified mobile-app prototype on a dark showcase stage — cover, today's quests with XP ribbons + level bar, quest detail.
mobile-onboarding
mobile-onboarding · prototype
Three-frame mobile onboarding flow — splash, value-prop, sign-in. Status bar, swipe dots, primary CTA.
motion-frames
motion-frames · prototype
Single-frame motion-design hero with looping CSS animations — rotating type ring, animated globe, ticking timer. Hand-off ready for HyperFrames.
social-carousel
social-carousel · prototype
Three-card 1080×1080 social-media carousel — cinematic panels with display headlines that connect across the series, brand mark, loop affordance.
sprite-animation
sprite-animation · prototype
Pixel / 8-bit animated explainer slide — full-bleed cream stage, animated pixel mascot, kinetic Japanese display type, looping CSS keyframes.

Design & marketing surfaces (prototype mode)

Skill Platform Scenario What it produces
web-prototype desktop design Single-page HTML — landings, marketing, hero pages (default for prototype)
saas-landing desktop marketing Hero / features / pricing / CTA marketing layout
dashboard desktop operation Admin / analytics with sidebar + dense data layout
pricing-page desktop sale Standalone pricing + comparison tables
docs-page desktop engineering 3-column documentation layout
blog-post desktop marketing Editorial long-form
mobile-app mobile design iPhone 15 Pro / Pixel framed app screen(s)
mobile-onboarding mobile design Multi-screen mobile onboarding flow (splash · value-prop · sign-in)
gamified-app mobile personal Three-frame gamified mobile-app prototype
email-marketing desktop marketing Brand product-launch HTML email (table-fallback safe)
social-carousel desktop marketing 3-card 1080×1080 social carousel
magazine-poster desktop marketing Single-page magazine-style poster
motion-frames desktop marketing Motion-design hero with looping CSS animations
sprite-animation desktop marketing Pixel / 8-bit animated explainer slide
dating-web desktop personal Consumer dating dashboard mockup
digital-eguide desktop marketing Two-spread digital e-guide (cover + lesson)
wireframe-sketch desktop design Hand-drawn ideation sketch — for the "show something visible early" pass
critique desktop design Five-dimensional self-critique scoresheet (Philosophy · Hierarchy · Detail · Function · Innovation)
tweaks desktop design AI-emitted tweaks panel — the model surfaces the parameters worth nudging

Deck surfaces (deck mode)

Skill Default for What it produces
guizang-ppt default for deck Magazine-style web PPT — bundled verbatim from op7418/guizang-ppt-skill, original LICENSE preserved
simple-deck Minimal horizontal-swipe deck
replit-deck Product-walkthrough deck (Replit-style)
weekly-update Team weekly cadence as a swipe deck (progress · blockers · next)

Office & operations surfaces (prototype mode, document-flavored scenarios)

Skill Scenario What it produces
pm-spec product PM specification doc with TOC + decision log
team-okrs product OKR scoresheet
meeting-notes operation Meeting decision log
kanban-board operation Board snapshot
eng-runbook engineering Incident runbook
finance-report finance Exec finance summary
invoice finance Single-page invoice
hr-onboarding hr Role onboarding plan

Adding a skill takes one folder. Read docs/skills-protocol.md for the extended frontmatter, fork an existing skill, restart the daemon, it appears in the picker. The catalog endpoint is GET /api/skills; per-skill seed assembly (template + side-file references) lives at GET /api/skills/:id/example.

Six load-bearing ideas

1 · We don't ship an agent. Yours is good enough.

The daemon scans your PATH for claude, codex, devin, cursor-agent, gemini, opencode, qwen, qodercli, copilot, hermes, kimi, pi, kiro-cli, kilo, vibe-acp, and deepseek on startup. Whichever ones it finds become candidate design engines — driven over stdio with one adapter per CLI, swappable from the model picker. Inspired by multica and cc-switch. No CLI installed? The API mode is the same pipeline minus the spawn — choose Anthropic, OpenAI-compatible, Azure OpenAI, or Google Gemini and the daemon forwards normalized SSE chunks back. Loopback is allowed for local LLM providers such as Ollama and LM Studio; non-loopback private, link-local, CGNAT, multicast, reserved, and redirect targets are rejected at the daemon edge.

2 · Skills are files, not plugins.

Following Claude Code's SKILL.md convention, each skill is SKILL.md + assets/ + references/. Drop a folder into skills/, restart the daemon, it appears in the picker. The bundled magazine-web-ppt is op7418/guizang-ppt-skill committed verbatim — original license preserved, attribution preserved.

3 · Design Systems are portable Markdown, not theme JSON.

The 9-section DESIGN.md schema from VoltAgent/awesome-design-md — color, typography, spacing, layout, components, motion, voice, brand, anti-patterns. Every artifact reads from the active system. Switch system → next render uses the new tokens. The dropdown ships with Linear, Stripe, Vercel, Airbnb, Tesla, Notion, Apple, Anthropic, Cursor, Supabase, Figma, Resend, Raycast, Lovable, Cohere, Mistral, ElevenLabs, X.AI, Spotify, Webflow, Sanity, PostHog, Sentry, MongoDB, ClickHouse, Cal, Replicate, Clay, Composio, Xiaohongshu… — plus 57 design skills sourced from awesome-design-skills.

4 · The interactive question form prevents 80% of redirects.

OD's prompt stack hard-codes a RULE 1: every fresh design brief begins with a <question-form id="discovery"> instead of code. Surface · audience · tone · brand context · scale · constraints. A long brief still leaves design decisions open — visual tone, color stance, scale — exactly the things the form locks down in 30 seconds. The cost of a wrong direction is one chat round, not one finished deck.

This is the Junior-Designer mode distilled from huashu-design: batch the questions up front, show something visible early (even a wireframe with grey blocks), let the user redirect cheaply. Combined with the brand-asset protocol (locate · download · grep hex · write brand-spec.md · vocalise), it's the single biggest reason output stops feeling like AI freestyle and starts feeling like a designer who paid attention before painting.

5 · The daemon makes the agent feel like it's on your laptop, because it is.

The daemon spawns the CLI with cwd set to the project's artifact folder under .od/projects/<id>/. The agent gets Read, Write, Bash, WebFetch — real tools against a real filesystem. It can Read the skill's assets/template.html, grep your CSS for hex values, write a brand-spec.md, drop generated images, and produce .pptx / .zip / .pdf files that show up in the file workspace as download chips when the turn ends. Sessions, conversations, messages, tabs persist in a local SQLite DB — pop the project open tomorrow and the agent's todo card is right where you left it.

6 · The prompt stack is the product.

What you compose at send time isn't "system + user". It's:

DISCOVERY directives  (turn-1 form, turn-2 brand branch, TodoWrite, 5-dim critique)
  + identity charter   (OFFICIAL_DESIGNER_PROMPT, anti-AI-slop, junior-pass)
  + active DESIGN.md   (72 systems available)
  + active SKILL.md    (31 skills available)
  + project metadata   (kind, fidelity, speakerNotes, animations, inspiration ids)
  + skill side files   (auto-injected pre-flight: read assets/template.html + references/*.md)
  + (deck kind, no skill seed) DECK_FRAMEWORK_DIRECTIVE   (nav / counter / scroll / print)

Every layer is composable. Every layer is a file you can edit. Read apps/daemon/src/prompts/system.ts and apps/daemon/src/prompts/discovery.ts to see the actual contract.

Architecture

┌────────────────────── browser (Next.js 16) ──────────────────────┐
│  chat · file workspace · iframe preview · settings · imports     │
└──────────────┬───────────────────────────────────┬───────────────┘
               │ /api/* (rewritten in dev)          │
               ▼                                    ▼
   ┌──────────────────────────────────┐   /api/proxy/{provider}/stream (SSE)
   │  Local daemon (Express + SQLite) │   ─→ any OpenAI-compat
   │                                  │       endpoint (BYOK)
   │  /api/agents          /api/skills│       w/ SSRF blocking
   │  /api/design-systems  /api/projects/…
   │  /api/chat (SSE)      /api/proxy/{provider}/stream (SSE)
   │  /api/templates       /api/import/claude-design
   │  /api/artifacts/save  /api/artifacts/lint
   │  /api/upload          /api/projects/:id/files…
   │  /artifacts (static)  /frames (static)
   │
   │  optional: sidecar IPC at /tmp/open-design/ipc/<ns>/<app>.sock
   │  (STATUS · EVAL · SCREENSHOT · CONSOLE · CLICK · SHUTDOWN)
   └─────────┬────────────────────────┘
             │ spawn(cli, [...], { cwd: .od/projects/<id> })
             ▼
   ┌──────────────────────────────────────────────────────────────────┐
   │  claude · codex · devin (ACP) · gemini · opencode · cursor-agent │
   │  qwen · qoder · copilot · hermes (ACP) · kimi (ACP) · pi (RPC) · kiro (ACP) · kilo (ACP) · vibe (ACP) · deepseek  │
   │  reads SKILL.md + DESIGN.md, writes artifacts to disk            │
   └──────────────────────────────────────────────────────────────────┘
Layer Stack
Frontend Next.js 16 App Router + React 18 + TypeScript, Vercel-deployable
Daemon Node 24 · Express · SSE streaming · better-sqlite3; tables: projects · conversations · messages · tabs · templates
Agent transport child_process.spawn; typed-event parsers for claude-stream-json (Claude Code), qoder-stream-json (Qoder CLI), copilot-stream-json (Copilot), json-event-stream per-CLI parsers (Codex / Gemini / OpenCode / Cursor Agent), acp-json-rpc (Devin / Hermes / Kimi / Kiro / Kilo / Mistral Vibe via Agent Client Protocol), pi-rpc (Pi via stdio JSON-RPC), plain (Qwen Code / DeepSeek TUI)
BYOK proxy POST /api/proxy/{anthropic,openai,azure,google}/stream → provider-specific upstream APIs, normalized delta/end/error SSE; allows loopback local LLM providers, rejects non-loopback private/link-local/CGNAT/multicast/reserved hosts, and disables upstream redirects at the daemon edge
Storage Plain files in .od/projects/<id>/ + SQLite at .od/app.sqlite + credentials at .od/media-config.json (gitignored, auto-created). OD_DATA_DIR=<dir> relocates all daemon data (used for test isolation and read-only-install setups); OD_MEDIA_CONFIG_DIR=<dir> further narrows the override to just media-config.json for setups that want to keep API keys outside the data dir
Preview Sandboxed iframe via srcdoc + per-skill <artifact> parser (apps/web/src/artifacts/parser.ts)
Export HTML (inline assets) · PDF (browser print, deck-aware) · PPTX (agent-driven via skill) · ZIP (archiver) · Markdown
Lifecycle pnpm tools-dev start | stop | run | status | logs | inspect | check; ports via --daemon-port / --web-port, namespaces via --namespace
Desktop (optional) Electron shell — discovers the web URL through sidecar IPC, no port guessing; same STATUS/EVAL/SCREENSHOT/CONSOLE/CLICK/SHUTDOWN channel powers tools-dev inspect desktop … for E2E

Quickstart

Download the desktop app (no build required)

The fastest way to try Open Design is the prebuilt desktop app — no Node, no pnpm, no clone:

Run with Docker

Run Open Design without installing Node.js or pnpm locally.

Requirements

  • Docker Desktop
  • Docker Compose v2

Verify Docker:

docker compose version

Start Open Design

git clone https://github.com/nexu-io/open-design.git
cd open-design/deploy
docker compose up -d

Open in your browser:

http://localhost:7456

Common Commands

# View logs
docker compose logs -f

# Restart containers
docker compose restart

# Stop containers
docker compose down

# Pull latest image
docker compose pull
docker compose up -d

For advanced Docker configuration and environment variables, see QUICKSTART.md.

Run from source

git clone https://github.com/nexu-io/open-design.git
cd open-design
corepack enable
corepack pnpm --version   # should print 10.33.2
pnpm install
pnpm tools-dev run web
# open the web URL printed by tools-dev

Environment requirements: Node ~24 and pnpm 10.33.x. nvm/fnm are optional helpers only; if you use one, run nvm install 24 && nvm use 24 or fnm install 24 && fnm use 24 before pnpm install.

Windows users can follow docs/windows-troubleshooting.md for the native setup path and a tiny double-click launcher.

For desktop/background startup, fixed-port restarts, and media generation dispatcher checks (OD_BIN, OD_DAEMON_URL, apps/daemon/dist/cli.js), see QUICKSTART.md.

The first load:

  1. Detects which agent CLIs you have on PATH and picks one automatically.
  2. Loads 31 skills + 72 design systems.
  3. Pops the welcome dialog so you can paste an Anthropic key (only needed for the BYOK fallback path).
  4. Auto-creates ./.od/ — the local runtime folder for the SQLite project DB, per-project artifacts, and saved renders. There is no od init step; the daemon mkdirs everything it needs on boot.

Type a prompt, hit Send, watch the question form arrive, fill it, watch the todo card stream, watch the artifact render. Click Save to disk or download as a project ZIP.

First-run state (./.od/)

The daemon owns one hidden folder at the repo root. Everything in it is gitignored and machine-local — never commit it.

.od/
├── app.sqlite                 ← projects · conversations · messages · open tabs
├── artifacts/                 ← one-off "Save to disk" renders (timestamped)
└── projects/<id>/             ← per-project working dir, also the agent's cwd
Want to… Do this
Inspect what's in there ls -la .od && sqlite3 .od/app.sqlite '.tables'
Reset to a clean slate pnpm tools-dev stop, rm -rf .od, run pnpm tools-dev run web again
Move it elsewhere OD_DATA_DIR=<absolute-or-relative-path> pnpm tools-dev run web — the daemon resolves ~/ and anchors relative paths to the repo root. OD_MEDIA_CONFIG_DIR=<dir> narrows the override to just media-config.json if you want credentials in a separate location.

Migrating a pre-desktop-app .od/ into the installed Desktop app

If you ran the repo first and only later installed the packaged Desktop app, the two writers point at different roots:

  • Repo dev-server (pnpm tools-dev start web) writes to <repo-root>/.od/.

  • Installed Desktop app writes under <appData>/Open Design/namespaces/<channel>/data/, where <appData> is Electron's per-OS app-data base (everything before the Open Design segment that app.getPath("userData") already includes). The channel suffix is platform-specific — the release workflows append -win/-linux:

    Platform <appData> (Electron appData base) Stable channel Beta channel
    macOS ~/Library/Application Support release-stable release-beta
    Windows %APPDATA% (= %USERPROFILE%\AppData\Roaming) release-stable-win release-beta-win
    Linux $XDG_CONFIG_HOME (default ~/.config) release-stable-linux release-beta-linux

    Example resolved paths:

    • macOS beta: ~/Library/Application Support/Open Design/namespaces/release-beta/data/
    • Windows beta: %APPDATA%\Open Design\namespaces\release-beta-win\data\
    • Linux beta: ~/.config/Open Design/namespaces/release-beta-linux/data/

    If unsure, inspect the packaged daemon log right after the app boots; it logs the resolved daemonDataRoot.

⚠️ Do this in a clean state. Migration replaces (not merges) the Desktop app's data dir with your repo .od/. Both writers must be fully stopped before copying — quit the Desktop app and stop the repo dev-server. SQLite-WAL needs to flush cleanly on both sides; if either daemon is still running it can write SQLite/WAL pages or project/artifact files mid-snapshot, leaving the staged copy inconsistent. If the Desktop app already has projects you care about, decide which side is authoritative before continuing — the steps below back up the Desktop's current data/ to a sibling but do not merge.

Option A: one-shot auto-migration via OD_LEGACY_DATA_DIR

Use this when the Desktop app's data/ is still empty, which is the typical state right after the upgrade that surfaced #710. Quit the Desktop app first (so its daemon is not holding app.sqlite), then re-launch with OD_LEGACY_DATA_DIR pointed at your old repo .od/. The daemon stages your payload into a sibling tmp directory and only promotes it into data/ on success; on any failure the staging directory is removed so the next boot retries cleanly.

The daemon refuses, with a visible startup error, when:

  • the path in OD_LEGACY_DATA_DIR does not contain app.sqlite (typo, deleted source, wrong path), or
  • the Desktop's data/ already contains any of app.sqlite, projects/, artifacts/, media-config.json, etc. SQLite/WAL pairs and project trees cannot be safely interleaved, so the daemon refuses to merge instead of silently corrupting either side. If the Desktop has already booted and seeded its own data/, use Option B and decide explicitly which side wins.

A .migrated-from marker is written on success so subsequent boots no-op.

Quit the Desktop app first, then re-launch with this env set. The launcher must put the variable into the app process environment, not just the shell that runs open / xdg-open.

macOS (LaunchServices does not inherit shell env, so use the direct binary):

OD_LEGACY_DATA_DIR="/path/to/old/repo/.od" \
  "/Applications/Open Design.app/Contents/MacOS/Open Design"

If you prefer the Dock launcher, set the variable in launchctl first, open the app, then unset it:

launchctl setenv OD_LEGACY_DATA_DIR "/path/to/old/repo/.od"
open "/Applications/Open Design.app"
# After the migration log line appears:
launchctl unsetenv OD_LEGACY_DATA_DIR

Linux (run the binary directly so the env var actually reaches it):

OD_LEGACY_DATA_DIR="/path/to/old/repo/.od" /path/to/open-design
# (e.g. the AppImage you launched, or the unpacked binary under /opt)

Windows (PowerShell):

$env:OD_LEGACY_DATA_DIR="C:\path\to\old\repo\.od"
& "$env:LOCALAPPDATA\Programs\Open Design\Open Design.exe"

The daemon log records [od-migrate] migration complete: copied N entries (...). After the first launch you can clear the env variable; the marker prevents re-migration even on subsequent runs.

Option B: manual copy

To carry your existing projects, SQLite, artifacts, and media-config.json over to the Desktop app, when Option A is not viable (Desktop already has its own data and you want to replace it explicitly).

macOS / Linux (bash):

set -euo pipefail
# 1. Stop both writers so the source and target are quiescent.
#    - Quit the Desktop app (Cmd+Q on macOS, File → Exit on Linux).
#    - Stop the repo dev-server: `pnpm tools-dev stop` from the repo root.
# 2. Set REPO and APP_DATA to your actual paths; the example below is macOS + beta.
REPO="/path/to/open-design"
APP_DATA="$HOME/Library/Application Support/Open Design/namespaces/release-beta/data"

# 3. Preflight: see what (if anything) the Desktop app already has.
ls "$APP_DATA/projects" 2>/dev/null && echo "Desktop already has projects, confirm this is a replace, not a merge."

# 4. Stage into a sibling first, then atomically swap into place. `set -e` plus
#    the explicit rsync exit check guarantee a non-zero copy aborts before any
#    `mv` runs, so the Desktop data dir cannot end up half-populated.
STAGE="${APP_DATA}.staged-$(date +%F-%H%M)"
mkdir -p "$STAGE"
rsync -a --exclude='backup-*' "$REPO/.od/" "$STAGE/" || { echo "rsync failed, aborting before swap"; exit 1; }

# 5. Backup the Desktop's current data, then promote the staged copy.
mv "$APP_DATA" "${APP_DATA}.fresh-baseline-$(date +%F-%H%M)"
mv "$STAGE" "$APP_DATA"

# 6. Relaunch the Desktop app. The daemon applies forward schema changes on boot.

Windows (PowerShell):

$ErrorActionPreference = 'Stop'
# 1. Stop both writers so the source and target are quiescent.
#    - Quit the Desktop app (File > Exit).
#    - Stop the repo dev-server: `pnpm tools-dev stop` from the repo root.
# 2. Set $Repo and $AppData to your actual paths; the example below is stable channel.
$Repo    = 'C:\path\to\open-design'
$AppData = Join-Path $env:APPDATA 'Open Design\namespaces\release-stable-win\data'

# 3. Preflight: see what (if anything) the Desktop app already has.
if (Test-Path (Join-Path $AppData 'projects')) {
  Write-Host 'Desktop already has projects, confirm this is a replace, not a merge.'
}

# 4. Stage into a sibling first. Robocopy /MIR mirrors source to staging, and
#    its exit codes >= 8 are real errors (0..7 are success/info), so we guard
#    explicitly before promoting.
$Stamp = Get-Date -Format 'yyyy-MM-dd-HHmm'
$Stage = "$AppData.staged-$Stamp"
robocopy "$Repo\.od" $Stage /MIR /XD 'backup-*' | Out-Null
if ($LASTEXITCODE -ge 8) { throw "robocopy failed (exit $LASTEXITCODE), aborting before swap" }

# 5. Backup the Desktop's current data, then promote the staged copy.
if (Test-Path $AppData) { Rename-Item $AppData "$AppData.fresh-baseline-$Stamp" }
Rename-Item $Stage $AppData

# 6. Relaunch the Desktop app. The daemon applies forward schema changes on boot.

If anything looks wrong after relaunch, restore the original Desktop data by deleting $APP_DATA (or $AppData on Windows) and renaming the .fresh-baseline-* directory back into place.

⚠️ Schema migrations are forward-only. The daemon applies CREATE TABLE IF NOT EXISTS / ALTER TABLE changes on boot; there is no version guard. After migrating, do not open the same data dir with an older repo checkout — unsupported columns or behavior mismatches can leave the workspace inconsistent. Back up app.sqlite* before the first launch with the new app.

⚠️ Advanced: sharing one data dir between repo dev-server and Desktop app. Pointing both at the same dir via OD_DATA_DIR is possible but only safe one-at-a-time. The daemon opens app.sqlite in WAL mode and writes uncoordinated files under projects/ and artifacts/; running both writers concurrently can corrupt SQLite or clobber artifacts. Always stop the Desktop app before starting the dev-server, and stop the dev-server before opening the Desktop app:

OD_DATA_DIR="$HOME/Library/Application Support/Open Design/namespaces/release-beta/data" \
  pnpm tools-dev start web

Full file map, scripts, and troubleshooting → QUICKSTART.md.

Running the Project

Open Design can run as a web app in your browser or as an Electron desktop application. Both modes share the same local daemon + web architecture.

Web / Localhost (Default)

# Foreground mode — keeps the lifecycle command in the foreground (logs written to files)
pnpm tools-dev run web

# View recent logs:
pnpm tools-dev logs

# Background mode — daemon + web run as background processes
pnpm tools-dev start web

By default, tools-dev binds to available ephemeral ports and prints the actual URLs on startup. To use fixed ports from a stopped state:

pnpm tools-dev run web --daemon-port 17456 --web-port 17573

If daemon/web are already running, use restart to switch ports in the existing session:

pnpm tools-dev restart --daemon-port 17456 --web-port 17573

Desktop / Electron

# Start daemon + web + desktop in the background
pnpm tools-dev

# Check desktop status
pnpm tools-dev inspect desktop status

# Take a screenshot of the desktop app
pnpm tools-dev inspect desktop screenshot --path /tmp/open-design.png

The desktop app discovers the web URL automatically via sidecar IPC — no port guessing required.

Other Useful Commands

Command What it does
pnpm tools-dev status Show running sidecar statuses
pnpm tools-dev logs Show daemon/web/desktop log tails
pnpm tools-dev stop Stop all running sidecars
pnpm tools-dev restart Stop then restart all sidecars
pnpm tools-dev check Status + recent logs + common diagnostics

For fixed-port restarts, background startup, and full troubleshooting see QUICKSTART.md.

Nix

A flake is published at the repo root. Home Manager is the recommended path for individual developers; a NixOS module is also exposed for shared/server installs. See nix/README.md for the full surface (data dir, secrets, webFrontend vs. bringing your own server, OD_DAEMON_URL).

# Home Manager
inputs.open-design.url = "github:nexu-io/open-design";
# then: imports = [ inputs.open-design.homeManagerModules.default ];
nix run github:nexu-io/open-design       # boot the daemon (`od`) without installing

For developers, a Nix dev shell is available and can be used with direnv too:

nix develop   # dev shell with required dependencies to work on Open Design

Use Open Design from your coding agent

Open Design ships a stdio MCP server. Wire it into Claude Code, Codex, Cursor, VS Code, Antigravity, Zed, Windsurf, or any MCP-compatible client and the agent in another repo can read files from your local Open Design projects directly. Replaces the export-then-attach loop. When the agent calls search_files, get_file, or get_artifact without a project argument, the MCP defaults to whatever project (and file) you have open in Open Design right now, so prompts like "build this in my app" or "match these styles" just work.

Why MCP? Exporting and re-attaching a zip every design iteration breaks flow. The MCP server exposes your design source directly -- tokens CSS, JSX components, entry HTML -- as a structured API the agent can query by name. The agent always sees the live file, not a stale copy from the last export.

Open Settings → MCP server in the Open Design app for a per-client install flow. The panel bakes the absolute path to your node binary and the daemon's built cli.js into every snippet, so it works on a fresh source clone where od is not on your PATH. Cursor gets a one-click deeplink; the rest get a copy-paste JSON snippet in the schema their config file expects (Claude Code includes a claude mcp add-json one-liner so you do not have to hand-edit ~/.claude.json). Restart or reload your client after install for the server to show up.

The daemon must be running locally for MCP tool calls to succeed. If the agent was started before Open Design, restart the agent after Open Design is up so it can reach the live daemon. Tool calls made while the daemon is offline return a clear "daemon not reachable" error rather than a crash.

Security model. The MCP server is read-only; it exposes file reads, file metadata, and search -- nothing that writes to disk or calls an external service. It runs as a child process of the coding agent over stdio, so any MCP client you register inherits read access to your local Open Design projects. Treat it like installing a VS Code extension: only register clients you trust. The daemon binds to 127.0.0.1 by default; LAN-wide exposure requires an explicit OD_BIND_HOST opt-in. If you also front the SPA with a non-loopback static server, set OD_ALLOWED_ORIGINS=<origin1>,<origin2>,... (comma-separated scheme://host[:port] entries) so the daemon's same-origin gate accepts API writes from those origins on both the Origin and Host checks; without it the browser will see 403s on every PUT/POST (Caddy v2 reverse_proxy preserves the original Host header upstream by default, so loopback alone is not enough). Connector-credential and live-artifact preview routes stay loopback-only regardless.

Repository structure

open-design/
├── README.md                      ← this file
├── README.de.md                   ← Deutsch
├── README.ru.md                   ← Русский
├── README.zh-CN.md                ← 简体中文
├── QUICKSTART.md                  ← run / build / deploy guide
├── package.json                   ← pnpm workspace, single bin: od
│
├── apps/
│   ├── daemon/                    ← Node + Express, the only server
│   │   ├── src/                   ← TypeScript daemon source
│   │   │   ├── cli.ts             ← `od` bin source, compiled to dist/cli.js
│   │   │   ├── server.ts          ← /api/* routes (projects, chat, files, exports)
│   │   │   ├── agents.ts          ← PATH scanner + per-CLI argv builders
│   │   │   ├── claude-stream.ts   ← streaming JSON parser for Claude Code stdout
│   │   │   ├── skills.ts          ← SKILL.md frontmatter loader
│   │   │   └── db.ts              ← SQLite schema (projects/messages/templates/tabs)
│   │   ├── sidecar/               ← tools-dev daemon sidecar wrapper
│   │   └── tests/                 ← daemon package tests
│   │
│   └── web/                       ← Next.js 16 App Router + React client
│       ├── app/                   ← App Router entrypoints
│       ├── next.config.ts         ← dev rewrites + prod static export to out/
│       └── src/                   ← React + TypeScript client modules
│           ├── App.tsx            ← routing, bootstrap, settings
│           ├── components/        ← chat, composer, picker, preview, sketch, …
│           ├── prompts/
│           │   ├── system.ts      ← composeSystemPrompt(base, skill, DS, metadata)
│           │   ├── discovery.ts   ← turn-1 form + turn-2 branch + 5-dim critique
│           │   └── directions.ts  ← 5 visual directions × OKLch palette + font stack
│           ├── artifacts/         ← streaming <artifact> parser + manifests
│           ├── runtime/           ← iframe srcdoc, markdown, export helpers
│           ├── providers/         ← daemon SSE + BYOK API transports
│           └── state/             ← config + projects (localStorage + daemon-backed)
│
├── e2e/                           ← Playwright UI + external integration/Vitest harness
│
├── packages/
│   ├── contracts/                 ← shared web/daemon app contracts
│   ├── sidecar-proto/             ← Open Design sidecar protocol contract
│   ├── sidecar/                   ← generic sidecar runtime primitives
│   └── platform/                  ← generic process/platform primitives
│
├── skills/                        ← 31 SKILL.md skill bundles (27 prototype + 4 deck)
│   ├── web-prototype/             ← default for prototype mode
│   ├── saas-landing/  dashboard/  pricing-page/  docs-page/  blog-post/
│   ├── mobile-app/  mobile-onboarding/  gamified-app/
│   ├── email-marketing/  social-carousel/  magazine-poster/
│   ├── motion-frames/  sprite-animation/  digital-eguide/  dating-web/
│   ├── critique/  tweaks/  wireframe-sketch/
│   ├── pm-spec/  team-okrs/  meeting-notes/  kanban-board/
│   ├── eng-runbook/  finance-report/  invoice/  hr-onboarding/
│   ├── simple-deck/  replit-deck/  weekly-update/   ← deck mode
│   └── guizang-ppt/               ← bundled magazine-web-ppt (default for deck)
│       ├── SKILL.md
│       ├── assets/template.html   ← seed
│       └── references/{themes,layouts,components,checklist}.md
│
├── design-systems/                ← 72 DESIGN.md systems
│   ├── default/                   ← Neutral Modern (starter)
│   ├── warm-editorial/            ← Warm Editorial (starter)
│   ├── linear-app/  vercel/  stripe/  airbnb/  notion/  cursor/  apple/  …
│   └── README.md                  ← catalog overview
│
├── assets/
│   └── frames/                    ← shared device frames (used cross-skill)
│       ├── iphone-15-pro.html
│       ├── android-pixel.html
│       ├── ipad-pro.html
│       ├── macbook.html
│       └── browser-chrome.html
│
├── templates/
│   ├── deck-framework.html        ← deck baseline (nav / counter / print)
│   └── kami-deck.html             ← kami-flavored deck starter (parchment / ink-blue serif)
│
├── scripts/
│   └── sync-design-systems.ts     ← re-import upstream awesome-design-md tarball
│
├── docs/
│   ├── spec.md                    ← product spec, scenarios, differentiation
│   ├── architecture.md            ← topologies, data flow, components
│   ├── skills-protocol.md         ← extended SKILL.md od: frontmatter
│   ├── agent-adapters.md          ← per-CLI detection + dispatch
│   ├── modes.md                   ← prototype / deck / template / design-system
│   ├── references.md              ← long-form provenance
│   ├── roadmap.md                 ← phased delivery
│   ├── schemas/                   ← JSON schemas
│   └── examples/                  ← canonical artifact examples
│
└── .od/                           ← runtime data, gitignored, auto-created
    ├── app.sqlite                 ← projects / conversations / messages / tabs
    ├── projects/<id>/             ← per-project working folder (agent's cwd)
    └── artifacts/                 ← saved one-off renders

Design Systems

The 72 design systems library — style guide spread

72 systems out of the box, each as a single DESIGN.md:

Full catalog (click to expand)

AI & LLMclaude · cohere · mistral-ai · minimax · together-ai · replicate · runwayml · elevenlabs · ollama · x-ai

Developer Toolscursor · vercel · linear-app · framer · expo · clickhouse · mongodb · supabase · hashicorp · posthog · sentry · warp · webflow · sanity · mintlify · lovable · composio · opencode-ai · voltagent

Productivitynotion · figma · miro · airtable · superhuman · intercom · zapier · cal · clay · raycast

Fintechstripe · coinbase · binance · kraken · mastercard · revolut · wise

E-Commerceshopify · airbnb · uber · nike · starbucks · pinterest

Mediaspotify · playstation · wired · theverge · meta

Automotivetesla · bmw · ferrari · lamborghini · bugatti · renault

Otherapple · ibm · nvidia · vodafone · sentry · resend · spacex

Startersdefault (Neutral Modern) · warm-editorial

The product-system library is imported via scripts/sync-design-systems.ts from VoltAgent/awesome-design-md. Re-run to refresh. The 57 design skills are sourced from bergside/awesome-design-skills and added directly in design-systems/.

Visual directions

When the user has no brand spec, the agent emits a second form with five curated directions — the OD adaptation of huashu-design's "5 schools × 20 design philosophies" fallback. Each direction is a deterministic spec — palette in OKLch, font stack, layout posture cues, references — that the agent binds verbatim into the seed template's :root. One radio click → a fully specified visual system. No improvisation, no AI-slop.

Direction Mood Refs
Editorial — Monocle / FT Print magazine, ink + cream + warm rust Monocle · FT Weekend · NYT Magazine
Modern minimal — Linear / Vercel Cool, structured, minimal accent Linear · Vercel · Stripe
Tech utility Information density, monospace, terminal Bloomberg · Bauhaus tools
Brutalist Raw, oversized type, no shadows, harsh accents Bloomberg Businessweek · Achtung
Soft warm Generous, low contrast, peachy neutrals Notion marketing · Apple Health

Full spec → apps/daemon/src/prompts/directions.ts.

Media generation

OD doesn't stop at code. The same chat surface that produces <artifact> HTML also drives image, video, and audio generation, with model adapters wired into the daemon's media pipeline (apps/daemon/src/media-models.ts, apps/web/src/media/models.ts). Every render lands as a real file in the project workspace — .png for image, .mp4 for video — and shows up as a download chip when the turn ends.

Three model families carry the load today:

Surface Model Provider What it's for
Image gpt-image-2 Azure / OpenAI Posters, profile avatars, illustrated maps, infographics, magazine-style social cards, photo restoration, exploded-view product art
Video seedance-2.0 ByteDance Volcengine 15s cinematic t2v + i2v with audio — narrative shorts, character close-ups, product films, MV-style choreography
Video hyperframes-html HeyGen / OSS HTML→MP4 motion graphics — product reveals, kinetic typography, data charts, social overlays, logo outros, TikTok-style verticals with karaoke captions

A growing prompt gallery at prompt-templates/ ships 93 ready-to-replicate prompts — 43 image (prompt-templates/image/*.json), 39 Seedance (prompt-templates/video/*.json excluding hyperframes-*), 11 HyperFrames (prompt-templates/video/hyperframes-*.json). Each carries a preview thumbnail, the prompt body verbatim, the target model, the aspect ratio, and a source block for license + attribution. The daemon serves them at GET /api/prompt-templates, the web app surfaces them as a card grid in the Image templates and Video templates tabs of the entry view; one click drops a prompt into the composer with the right model preselected.

3D Stone Staircase Evolution
3D Stone Staircase Evolution Infographic
3-step infographic, hewn-stone aesthetic
Illustrated City Food Map
Illustrated City Food Map
Editorial hand-illustrated travel poster
Cinematic Elevator Scene
Cinematic Elevator Scene
Single-frame editorial fashion still
Cyberpunk Anime Portrait
Cyberpunk Anime Portrait
Profile avatar — neon face text
Glamorous Woman in Black
Glamorous Woman in Black Portrait
Editorial studio portrait

Full set → prompt-templates/image/. Sources: most pull from YouMind-OpenLab/awesome-gpt-image-prompts (CC-BY-4.0) with author attribution preserved per template.

Music Podcast Guitar
Music Podcast & Guitar Technique
4K cinematic studio film
Emotional Face
Emotional Face Close-up
Cinematic micro-expression study
Luxury Supercar
Luxury Supercar Cinematic
Narrative product film
Forbidden City Cat
Forbidden City Cat Satire
Stylised satire short
Japanese Romance
Japanese Romance Short Film
15s Seedance 2.0 narrative

Click any thumbnail to play the actual rendered MP4. Full set → prompt-templates/video/ (the *-seedance-* and Cinematic-tagged entries). Sources: YouMind-OpenLab/awesome-seedance-2-prompts (CC-BY-4.0) with original tweet links and author handles preserved.

HyperFrames — HTML→MP4 motion graphics (11 ready-to-replicate templates)

heygen-com/hyperframes is HeyGen's open-source agent-native video framework — you (or the agent) write HTML + CSS + GSAP, HyperFrames renders it to a deterministic MP4 via headless Chrome + FFmpeg. Open Design ships HyperFrames as a first-class video model (hyperframes-html) wired into the daemon dispatch, plus the skills/hyperframes/ skill that teaches the agent the timeline contract, scene-transition rules, audio-reactive patterns, captions/TTS, and the catalog blocks (npx hyperframes add <slug>).

Eleven hyperframes prompts ship under prompt-templates/video/hyperframes-*.json, each one a concrete brief that produces a specific archetype:

Product reveal
5s minimal product reveal · 16:9 · push-in title card with shader transition
SaaS promo
30s SaaS product promo · 16:9 · Linear/ClickUp-style with UI 3D reveals
TikTok karaoke
TikTok karaoke talking-head · 9:16 · TTS + word-synced captions
Brand sizzle
30s brand sizzle reel · 16:9 · beat-synced kinetic typography, audio-reactive
Data chart
Animated bar-chart race · 16:9 · NYT-style data infographic
Flight map
Flight map (origin → dest) · 16:9 · Apple-style cinematic route reveal
Logo outro
4s cinematic logo outro · 16:9 · piece-by-piece assembly + bloom
Money counter
$0 → $10K money counter · 9:16 · Apple-style hype with green flash + burst
App showcase
3-phone app showcase · 16:9 · floating phones with feature callouts
Social overlay
Social overlay stack · 9:16 · X · Reddit · Spotify · Instagram in sequence
Website to video
Website-to-video pipeline · 16:9 · captures site at 3 viewports + transitions
 

Pattern is the same as the rest: pick a template, edit the brief, send. The agent reads the bundled skills/hyperframes/SKILL.md (which carries the OD-specific render workflow — composition source files into a .hyperframes-cache/ so they don't clutter the file workspace, daemon dispatches npx hyperframes render to dodge the macOS sandbox-exec / Puppeteer hang, only the final .mp4 lands as a project chip), authors the composition, and ships an MP4. Catalog block thumbnails © HeyGen, served from their CDN; the OSS framework itself is Apache-2.0.

Also wired but not surfaced as templates yet: Kling 2.0 / 1.6 / 1.5, Veo 3 / Veo 2, Sora 2 / Sora 2-Pro (via Fal), MiniMax video-01 — all live in VIDEO_MODELS (apps/web/src/media/models.ts). Suno v5 / v4.5, Udio v2, Lyria 2 (music) and gpt-4o-mini-tts, MiniMax TTS (speech) cover the audio surface. Templates for these are open contributions — drop a JSON into prompt-templates/video/ or prompt-templates/audio/ and it shows up in the picker.

Beyond chat — what else ships

The chat / artifact loop gets the spotlight, but a handful of less-visible capabilities are already wired and worth knowing before you compare OD to anything else:

  • Claude Design ZIP import. Drop an export from claude.ai onto the welcome dialog. POST /api/import/claude-design extracts it into a real .od/projects/<id>/, opens the entry file as a tab, and stages a continue-where-Anthropic-left-off prompt for your local agent. No re-prompting, no "ask the model to re-create what we just had". (apps/daemon/src/server.ts/api/import/claude-design)
  • Multi-provider BYOK proxy. POST /api/proxy/{anthropic,openai,azure,google}/stream takes { baseUrl, apiKey, model, messages }, builds the provider-specific upstream request, normalizes SSE chunks into delta/end/error, and allows loopback local LLM providers while rejecting non-loopback private, link-local, CGNAT, multicast, reserved, and redirect targets to head off SSRF. OpenAI-compatible covers OpenAI, Azure AI Foundry /openai/v1, DeepSeek, Groq, MiMo, OpenRouter, Ollama, LM Studio, and self-hosted vLLM; Azure OpenAI adds deployment URL + api-version; Google uses Gemini :streamGenerateContent.
  • User-saved templates. Once you like a render, POST /api/templates snapshots the HTML + metadata into the SQLite templates table. The next project picks it from a "your templates" row in the picker — same surface as the shipped 31, but yours.
  • Tab persistence. Every project remembers its open files and active tab in the tabs table. Reopen the project tomorrow and the workspace looks exactly the way you left it.
  • Artifact lint API. POST /api/artifacts/lint runs structural checks on a generated artifact (broken <artifact> framing, missing required side files, stale palette tokens) and returns findings the agent can read back into its next turn. The five-dim self-critique uses this to ground its score in real evidence, not vibes.
  • Sidecar protocol + desktop automation. Daemon, web, and desktop processes carry typed five-field stamps (app · mode · namespace · ipc · source) and expose a JSON-RPC IPC channel at /tmp/open-design/ipc/<namespace>/<app>.sock. tools-dev inspect desktop status \| eval \| screenshot drives that channel, so headless E2E works against a real Electron shell without bespoke harnesses (packages/sidecar-proto/, apps/desktop/src/main/).
  • Windows-friendly spawning. Every adapter that would otherwise blow CreateProcess's ~32 KB argv limit on long composed prompts (Codex, Gemini, OpenCode, Cursor Agent, Qwen, Qoder CLI, Pi) feeds the prompt over stdin instead. Claude Code and Copilot keep -p; the daemon falls back to a temp prompt-file when even that overflows.
  • Per-namespace runtime data. OD_DATA_DIR and --namespace give you fully isolated .od/-style trees, so Playwright, beta channels, and your real projects never share a SQLite file.

Anti-AI-slop machinery

The whole machinery below is the huashu-design playbook, ported into OD's prompt-stack and made enforceable per-skill via the side-file pre-flight. Read apps/daemon/src/prompts/discovery.ts for the live wording:

  • Question form first. Turn 1 is <question-form> only — no thinking, no tools, no narration. The user chooses defaults at radio speed.
  • Brand-spec extraction. When the user attaches a screenshot or URL, the agent runs a five-step protocol (locate · download · grep hex · codify brand-spec.md · vocalise) before writing CSS. Never guesses brand colors from memory.
  • Five-dim critique. Before emitting <artifact>, the agent silently scores its output 15 across philosophy / hierarchy / execution / specificity / restraint. Anything under 3/5 is a regression — fix and rescore. Two passes is normal.
  • P0/P1/P2 checklist. Every skill ships a references/checklist.md with hard P0 gates. The agent must pass P0 before emitting.
  • Slop blacklist. Aggressive purple gradients, generic emoji icons, rounded card with left-border accent, hand-drawn SVG humans, Inter as a display face, invented metrics — explicitly forbidden in the prompt.
  • Honest placeholders > fake stats. When the agent doesn't have a real number, it writes or a labelled grey block, not "10× faster".

Comparison

Axis Claude Design (Anthropic) Open CoDesign Open Design
License Closed MIT Apache-2.0
Form factor Web (claude.ai) Desktop (Electron) Web app + local daemon
Deployable on Vercel
Agent runtime Bundled (Opus 4.7) Bundled (pi-ai) Delegated to user's existing CLI
Skills Proprietary 12 custom TS modules + SKILL.md 31 file-based SKILL.md bundles, droppable
Design system Proprietary DESIGN.md (v0.2 roadmap) DESIGN.md × 129 systems shipped
Provider flexibility Anthropic only 7+ via pi-ai 16 CLI adapters + OpenAI-compatible BYOK proxy
Init question form Hard rule, turn 1
Direction picker 5 deterministic directions
Live todo progress + tool stream (UX pattern from open-codesign)
Sandboxed iframe preview (pattern from open-codesign)
Claude Design ZIP import n/a POST /api/import/claude-design — keep editing where Anthropic left off
Comment-mode surgical edits 🟡 partial — preview element comments + chat attachments; surgical patch reliability still in progress
AI-emitted tweaks panel 🚧 roadmap — dedicated chat-side panel UX is not implemented yet
Filesystem-grade workspace partial (Electron sandbox) Real cwd, real tools, persisted SQLite (projects · conversations · messages · tabs · templates)
5-dim self-critique Pre-emit gate
Artifact lint POST /api/artifacts/lint — findings fed back to the agent
Sidecar IPC + headless desktop Stamped processes + tools-dev inspect desktop status | eval | screenshot
Export formats Limited HTML / PDF / PPTX / ZIP / Markdown HTML / PDF / PPTX (agent-driven) / ZIP / Markdown
PPT skill reuse N/A Built-in guizang-ppt-skill drops in (default for deck mode)
Minimum billing Pro / Max / Team BYOK BYOK — paste any OpenAI-compatible baseUrl

Supported coding agents

Auto-detected from PATH on daemon boot. No config required. Streaming dispatch lives in apps/daemon/src/agents.ts (AGENT_DEFS); per-CLI parsers live alongside it. Models are populated either by probing <bin> --list-models / <bin> models / ACP handshake, or from a curated fallback list when the CLI doesn't expose a list.

Agent Bin Stream format Argv shape (composed prompt path)
Claude Code claude claude-stream-json (typed events) claude -p <prompt> --output-format stream-json --verbose [--include-partial-messages] [--add-dir …] --permission-mode bypassPermissions
Codex CLI codex json-event-stream + codex parser codex exec --json --skip-git-repo-check --sandbox workspace-write -c sandbox_workspace_write.network_access=true [-C cwd] [--add-dir …] [--model …] [-c model_reasoning_effort=…] (prompt on stdin)
Devin for Terminal devin acp-json-rpc devin --permission-mode dangerous --respect-workspace-trust false acp
Gemini CLI gemini json-event-stream + gemini parser GEMINI_CLI_TRUST_WORKSPACE=true gemini --output-format stream-json --yolo [--model …] (prompt on stdin)
OpenCode opencode json-event-stream + opencode parser opencode run --format json --dangerously-skip-permissions [--model …] - (prompt on stdin)
Cursor Agent cursor-agent json-event-stream + cursor-agent parser cursor-agent --print --output-format stream-json --stream-partial-output --force --trust [--workspace cwd] [--model …] - (prompt on stdin)
Qwen Code qwen plain (raw stdout chunks) qwen --yolo [--model …] - (prompt on stdin)
Qoder CLI qodercli qoder-stream-json (typed events) qodercli -p --output-format stream-json --permission-mode bypass_permissions [--cwd cwd] [--model …] [--add-dir …] (prompt on stdin)
GitHub Copilot CLI copilot copilot-stream-json (typed events) copilot -p <prompt> --allow-all-tools --output-format json [--model …] [--add-dir …]
Hermes hermes acp-json-rpc (Agent Client Protocol) hermes acp --accept-hooks
Kimi CLI kimi acp-json-rpc kimi acp
Kiro CLI kiro-cli acp-json-rpc kiro-cli acp
Kilo kilo acp-json-rpc kilo acp
Mistral Vibe CLI vibe-acp acp-json-rpc vibe-acp
DeepSeek TUI deepseek plain (raw stdout chunks) deepseek exec --auto [--model …] <prompt> (prompt as positional arg)
Pi pi pi-rpc (stdio JSON-RPC) pi --mode rpc [--model …] [--thinking …] (prompt sent as RPC prompt command)
Multi-provider BYOK n/a SSE normalization POST /api/proxy/{provider}/stream → Anthropic / OpenAI-compatible / Azure OpenAI / Gemini; SSRF-guarded with loopback local providers allowed, non-loopback internal ranges blocked, and upstream redirects disabled

Adding a new CLI is one entry in apps/daemon/src/agents.ts. Streaming format is one of claude-stream-json, qoder-stream-json, copilot-stream-json, json-event-stream (with a per-CLI eventParser), acp-json-rpc, pi-rpc, or plain.

References & lineage

Every external project this repo borrows from. Each link goes to the source so you can verify the provenance.

Project Role here
Claude Design The closed-source product this repo is the open-source alternative to.
alchaincyf/huashu-design The design-philosophy core. Junior-Designer workflow, the 5-step brand-asset protocol, anti-AI-slop checklist, 5-dimensional self-critique, and the "5 schools × 20 design philosophies" library behind our direction picker — all distilled into apps/daemon/src/prompts/discovery.ts and apps/daemon/src/prompts/directions.ts.
op7418/guizang-ppt-skill Magazine-web-PPT skill bundled verbatim under skills/guizang-ppt/ with original LICENSE preserved. Default for deck mode. P0/P1/P2 checklist culture borrowed for every other skill.
multica-ai/multica The daemon + adapter architecture. PATH-scan agent detection, local daemon as the only privileged process, agent-as-teammate worldview. We adopt the model; we do not vendor the code.
OpenCoworkAI/open-codesign The first open-source Claude-Design alternative and our closest peer. UX patterns adopted: streaming-artifact loop, sandboxed-iframe preview (vendored React 18 + Babel), live agent panel (todos + tool calls + interruptible), five-format export list (HTML/PDF/PPTX/ZIP/Markdown), local-first storage hub, SKILL.md taste-injection, and the first pass of comment-mode preview annotations. UX patterns still on our roadmap: full surgical-edit reliability and AI-emitted tweaks panel. We deliberately do not vendor pi-ai — open-codesign bundles it as the agent runtime; we delegate to whichever CLI the user already has.
VoltAgent/awesome-claude-design / awesome-design-md Source of the 9-section DESIGN.md schema and 70 product systems imported via scripts/sync-design-systems.ts.
bergside/awesome-design-skills Source of 57 design skills added directly as normalized DESIGN.md files under design-systems/.
farion1231/cc-switch Inspiration for symlink-based skill distribution across multiple agent CLIs.
Claude Code skills The SKILL.md convention adopted verbatim — any Claude Code skill drops into skills/ and is picked up by the daemon.

Long-form provenance write-up — what we take from each, what we deliberately don't — lives at docs/references.md.

Roadmap

  • Daemon + agent detection (16 CLI adapters) + skill registry + design-system catalog
  • Web app + chat + question form + 5-direction picker + todo progress + sandboxed preview
  • 31 skills + 72 design systems + 5 visual directions + 5 device frames
  • SQLite-backed projects · conversations · messages · tabs · templates
  • Multi-provider BYOK proxy (/api/proxy/{anthropic,openai,azure,google}/stream) with SSRF guard
  • Claude Design ZIP import (/api/import/claude-design)
  • Sidecar protocol + Electron desktop with IPC automation (STATUS / EVAL / SCREENSHOT / CONSOLE / CLICK / SHUTDOWN)
  • Artifact lint API + 5-dim self-critique pre-emit gate
  • Comment-mode surgical edits — partial shipped: preview element comments and chat attachments; reliable targeted patching remains in progress
  • AI-emitted tweaks panel UX — not implemented yet
  • Vercel + tunnel deployment recipe (Topology B)
  • One-command npx od init to scaffold a project with DESIGN.md
  • Skill marketplace (od skills install <github-repo>) and od skill add | list | remove | test CLI surface (drafted in docs/skills-protocol.md, implementation pending)
  • Packaged Electron build out of apps/packaged/ — macOS (Apple Silicon) and Windows (x64) downloads on open-design.ai and the GitHub releases page

Phased delivery → docs/roadmap.md.

Status

This is an early implementation — the closed loop (detect → pick skill + design system → chat → parse <artifact> → preview → save) runs end-to-end. The prompt stack and skill library are where most of the value lives, and they're stable. The component-level UI is shipping daily.

Stay in the loop

Follow @nexudotio on X for release notes, new skills, new design systems, and the occasional behind-the-scenes thread on what's shipping next. Discord is for chat, X is for the milestones — both links are in the badges above.

Star us

Star Open Design on GitHub — github.com/nexu-io/open-design

If this saved you thirty minutes — give it a ★. Stars don't pay rent, but they tell the next designer, agent, and contributor that this experiment is worth their attention. One click, three seconds, real signal: github.com/nexu-io/open-design.

Contributing

Issues, PRs, new skills, and new design systems are all welcome. The highest-leverage contributions are usually one folder, one Markdown file, or one PR-sized adapter:

Full walkthrough, bar-for-merging, code style, and what we don't accept → CONTRIBUTING.md (Deutsch, Français, 简体中文).

Contributors

Thanks to everyone who has helped move Open Design forward — through code, docs, feedback, new skills, new design systems, or even a sharp issue. Every real contribution counts, and the wall below is the easiest way to say so out loud.

Open Design contributors

If you've shipped your first PR — welcome. The good-first-issue/help-wanted label is the entry point.

Repository activity

Open Design — repository metrics

The SVG above is regenerated daily by .github/workflows/metrics.yml using lowlighter/metrics. Trigger a manual refresh from the Actions tab if you want it sooner; for richer plugins (traffic, follow-up time), add a METRICS_TOKEN repository secret with a fine-grained PAT.

Star History

Open Design star history

If the curve bends up, that's the signal we look for. ★ this repo to push it.

Credits

The HTML PPT Studio family of skills — the master skills/html-ppt/ and the per-template wrappers under skills/html-ppt-*/ (15 full-deck templates, 36 themes, 31 single-page layouts, 27 CSS animations + 20 canvas FX, the keyboard runtime, and the magnetic-card presenter mode) — are integrated from the open-source project lewislulu/html-ppt-skill (MIT). The upstream LICENSE ships in-tree at skills/html-ppt/LICENSE and authorship credit goes to @lewislulu. Each per-template Examples card (html-ppt-pitch-deck, html-ppt-tech-sharing, html-ppt-presenter-mode, html-ppt-xhs-post, …) delegates authoring guidance to the master skill so the upstream's prompt → output behavior is preserved end-to-end when you click Use this prompt.

The magazine / horizontal-swipe deck flow under skills/guizang-ppt/ is integrated from op7418/guizang-ppt-skill (MIT). Authorship credit goes to @op7418.

License

Apache-2.0. The bundled skills/guizang-ppt/ retains its original LICENSE (MIT) and authorship attribution to op7418. The bundled skills/html-ppt/ retains its original LICENSE (MIT) and authorship attribution to lewislulu.