mirror of https://github.com/nexu-io/open-design.git synced 2026-06-01 03:14:35 +07:00

🎨 Local-first, open-source alternative to Anthropic's Claude Design. ⚡ 19 Skills · ✨ 71 brand-grade Design Systems 🖼 Generate web · desktop · mobile prototypes · slides · images · videos · HyperFrames 📦 Sandboxed preview · HTML/PDF/PPTX/MP4 export 🤖 Runs on Claude Code / Codex / Cursor / Gemini / OpenCode / Qwen / Copilot / Hermes / Kimi CLI.


Nagendhra Madishetti 38a5ab69e6 feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485)

* feat(web): pure reducer for Critique Theater states (Phase 7.1)

Pure CritiqueState reducer driven by the contracts-level PanelEvent (the same shape both the live SSE stream and the recorded transcript emit), so a single reducer powers both the in-flight panel and the rerun replay. Lifecycle covers run_started → running → (shipped / degraded / interrupted / failed), with panelist_open / dim / must_fix / close / round_end events building per-round CritiquePanelistView entries as they arrive.

Defensive behaviour that surfaced while writing the spec tests:

- Terminal phases (shipped / degraded / interrupted / failed) are sticky against further lifecycle events for the same run, except for parser_warning which can land late and is recorded in a side channel without changing phase.
- A new run_started for a different runId at any time discards the prior state and reboots, so the UI can launch consecutive runs without an explicit reset action.
- Events whose runId does not match the active run return the same state reference, so React's useReducer doesn't re-render subscribers on stray traffic.
- Round bookkeeping keys by round number rather than "always last", so an out-of-order panelist_dim for round 1 arriving after a round 2 dim does not corrupt the round 2 bucket.

Test coverage: 18 cases covering each transition, the runId guard, sticky-terminal behaviour, the out-of-order round invariant, and the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire SSE + replay into the same reducer.
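The defensive rules listed for the Phase 7.1 reducer can be illustrated with a reduced sketch. It is not the project's reducer: `SketchState`, `SketchEvent`, and `reduceSketch` are hypothetical names, and only three of the lifecycle events are modelled so the guards stay visible.

```typescript
// Hypothetical, trimmed-down model of the reducer guards described above.
type Phase = "idle" | "running" | "shipped" | "degraded" | "interrupted" | "failed";

interface SketchState {
  phase: Phase;
  runId: string | null;
  warnings: string[]; // side channel for late parser_warning events
}

type SketchEvent =
  | { type: "run_started"; runId: string }
  | { type: "ship"; runId: string }
  | { type: "failed"; runId: string }
  | { type: "parser_warning"; runId: string; kind: string };

const TERMINAL: ReadonlySet<Phase> = new Set<Phase>([
  "shipped", "degraded", "interrupted", "failed",
]);

const initialState: SketchState = { phase: "idle", runId: null, warnings: [] };

function reduceSketch(state: SketchState, event: SketchEvent): SketchState {
  // A run_started for a different runId discards the prior state and reboots.
  if (event.type === "run_started" && event.runId !== state.runId) {
    return { phase: "running", runId: event.runId, warnings: [] };
  }
  // Stray traffic for another run returns the same reference (no re-render).
  if (event.runId !== state.runId) return state;
  // parser_warning may land late; record it without touching the phase.
  if (event.type === "parser_warning") {
    return { ...state, warnings: [...state.warnings, event.kind] };
  }
  // Terminal phases are sticky against further lifecycle events.
  if (TERMINAL.has(state.phase)) return state;
  switch (event.type) {
    case "ship":
      return { ...state, phase: "shipped" };
    case "failed":
      return { ...state, phase: "failed" };
    default:
      return state; // duplicate run_started for the active run
  }
}
```

Returning the identical `state` object on the guard paths is what lets a `useReducer` consumer skip re-rendering, which is the stable-identity guarantee the tests pin.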
* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2) createCritiqueEventsConnection is a pure connection manager that mirrors apps/web/src/providers/project-events.ts: opens an EventSource at /api/projects/:id/events, listens for every name in CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent (stripping the critique. prefix and merging the data payload), and hands it to the caller's onEvent. Reconnect uses exponential backoff (1s → 30s) and resets on `ready`; malformed payloads drop with a dev-mode warning rather than tearing the stream. useCritiqueStream wraps the manager in a useReducer that owns the CritiqueState. enabled=false or a null projectId tears down the connection cleanly; switching projectId closes the old connection and opens a fresh one. The returned dispatch lets local UI synthesise actions (e.g. an Esc keypress firing a synthetic interrupted while a kill request is in flight); production traffic comes from the SSE stream. Test coverage: - sse.test.ts (10 cases, node env): subscription set covers every CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire shape back to PanelEvent; malformed JSON is swallowed and does not stop the stream; exponential backoff schedule and ready-reset semantics are pinned with a setTimeout seam; close() cancels pending reconnects and shuts the live source; no-op fallback when EventSource is unavailable. - useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event, reducer driven by synthetic actions, no connection when disabled or projectId is null, clean close on unmount, projectId change reopens cleanly. * feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3) Fetches the per-run NDJSON transcript (one PanelEvent per line), parses every line via the shared isPanelEvent predicate, and dispatches into the same CritiqueState reducer the live SSE stream uses. 
A single reducer means the UI rendering a replay can be identical to the live panel, and a UI mounting both useCritiqueStream and useCritiqueReplay in parallel does not have to reconcile two state shapes.

The speed knob is `paused | instant | live | { intervalMs: N }`:

- instant flushes every event synchronously, useful for opening a finished run already at its terminal state.
- intervalMs paces dispatches at a fixed cadence so the reviewer can watch the run unfold.
- paused parses the transcript but holds events back until the caller advances speed (consumers can drive a scrubber later).
- live is reserved for the future "playback at original cadence" feature, currently treated as instant; replay timestamps are not yet persisted with each event so honest pacing requires a follow-up Phase 7+ task.

The gunzip seam handles `.ndjson.gz` transcripts via DecompressionStream when present; the production fetch path picks between text and arrayBuffer based on the URL extension. Both seams are injectable so the unit tests don't need to spin up a real network or a real gzip pipeline.

Test coverage (8 cases, jsdom env):

- Idle status before any URL is provided.
- speed=instant flushes the full transcript synchronously to shipped state.
- speed={intervalMs:N} paces with the setTimeout seam, reaching done after the last tick.
- speed=paused leaves status=playing with no dispatches.
- Empty transcript reports done with state still idle.
- Fetch rejection surfaces an error status with the message.
- Malformed NDJSON lines are skipped; valid events around them still land.
- .gz transcripts route through the gunzip seam.

Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream + replay), all on one branch ready for review. Phases 8+ (Theater components) consume these from this PR.
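The transcript parsing step of the replay hook (one event per NDJSON line, malformed lines skipped) might look roughly like the sketch below. `PanelEventLike`, `isPanelEventLike`, and `parseTranscript` are hypothetical stand-ins; the real `isPanelEvent` predicate lives in the contracts package and is likely stricter.

```typescript
// Hypothetical minimal event shape; the real PanelEvent carries more fields.
interface PanelEventLike {
  type: string;
  runId: string;
  [key: string]: unknown;
}

// Stand-in for the shared isPanelEvent predicate (assumed, simplified).
function isPanelEventLike(value: unknown): value is PanelEventLike {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record["type"] === "string" &&
    typeof record["runId"] === "string" &&
    record["runId"] !== ""
  );
}

// Parse an NDJSON transcript, keeping only lines that decode to a valid event.
function parseTranscript(ndjson: string): PanelEventLike[] {
  const events: PanelEventLike[] = [];
  for (const line of ndjson.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // tolerate blank lines and a trailing newline
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (isPanelEventLike(parsed)) events.push(parsed);
    } catch {
      // Malformed line: skip it so the valid events around it still land.
    }
  }
  return events;
}
```

Because the parser only produces a list of events, the pacing mode (`instant`, `paused`, or a fixed interval) can be decided separately by whatever drains that list into the reducer.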
* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review) Two P1 fixes from lefarcen's review on PR #1307: SSE payload override `sseToPanelEvent` previously spread `data` after the channel-derived `type`, so a payload-provided `type` could override the channel and route a `critique.run_started` frame into the reducer as a `ship` action. Reversed the spread so the channel-derived `type` is authoritative, and revalidated the resulting object through the contracts-level `isPanelEvent` predicate before returning. Frames that fail validation (missing runId, empty runId, unknown type) are dropped, so a malformed or compromised SSE frame can no longer dispatch a wrong-shape action into the reducer. Three new sse.test.ts cases pin the regression: hostile `type:'ship'` in the payload still resolves to `run_started`, missing runId is dropped, empty runId is dropped. Replay pause/resume `useCritiqueReplay` had one big effect keyed on `transcriptUrl` only, so flipping `speed` from `paused` to `instant` never re-fired and the held events sat undispatched. Split into a parse effect (depends on URL, fetches and stores events in state) and a pace effect (depends on parsed-events + speed, owns the cursor + timers). The playback cursor lives in a ref that survives pause/resume cycles, so flipping `paused` -> `instant` flushes from the current position rather than restarting (which would double-dispatch `run_started` and reset the reducer). 
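The payload-override fix described above comes down to spread order plus revalidation. The following is a minimal sketch under assumptions: `sseToPanelEventSketch` is a hypothetical name, and the set of known types is reconstructed from event names mentioned in this log rather than copied from `CRITIQUE_SSE_EVENT_NAMES`.

```typescript
// Event names as they appear in this commit log (assumed to match the contract).
const KNOWN_TYPES = new Set<string>([
  "run_started", "panelist_open", "panelist_dim", "panelist_must_fix",
  "panelist_close", "round_end", "ship", "degraded", "interrupted",
  "failed", "parser_warning",
]);

interface PanelEventLike {
  type: string;
  runId: string;
  [key: string]: unknown;
}

const CHANNEL_PREFIX = "critique.";

function sseToPanelEventSketch(
  channel: string,
  data: Record<string, unknown>,
): PanelEventLike | null {
  if (!channel.startsWith(CHANNEL_PREFIX)) return null;
  const type = channel.slice(CHANNEL_PREFIX.length);
  if (!KNOWN_TYPES.has(type)) return null; // unknown type: drop the frame
  const runId = data["runId"];
  if (typeof runId !== "string" || runId === "") return null; // missing or empty runId
  // Payload is spread first, so the channel-derived type always wins.
  return { ...data, type, runId };
}
```

With the payload spread before `type`, a frame on `critique.run_started` that carries `type: "ship"` in its data still resolves to `run_started`.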
Two new useCritiqueReplay.test.tsx cases:

- paused-then-instant transitions from `playing` to `done` and reaches the shipped terminal phase
- intervalMs paced playback dispatches one event, pauses to drain the next scheduled timer, flips to instant, and confirms the remaining transcript drains exactly once (cursor was preserved)

Doc consistency

The earlier source comment in useCritiqueReplay.ts claimed `live` "paces by recorded timestamps" while the impl used zero-delay timers and the PR body said it behaves like `instant`. Aligned to reality: `live` currently behaves like `{ intervalMs: 0 }` (events drain on successive zero-delay timer ticks via setTimeoutFn) because transcripts do not yet carry per-event timestamps. Honest timestamp-driven pacing is queued as a Phase 7+ follow-up.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck, Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite 96 files / 888 tests.

* feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread)
* feat(web): Theater PanelistLane component (Phase 8.1)
* feat(web): Theater ScoreTicker component (Phase 8.2)
* feat(web): Theater RoundDivider component (Phase 8.3)
* feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4)
* feat(web): Theater TheaterDegraded chip (Phase 8.5)
* feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6)
* feat(web): Theater TheaterTranscript replay surface (Phase 8.7)
* feat(web): Theater TheaterStage top-level container (Phase 8.8)
* feat(web): Theater CSS using existing semantic tokens (no hex literals)
* feat(web): Theater public exports barrel
* fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314)

Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen.

State-lifecycle fixes (3 x P2)

1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`).
Host hooks dispatch it when their gating prop changes so a stale run from a prior project / transcript cannot bleed into the next context. Reset is idempotent on idle (returns the same reference). 2. `useCritiqueStream` dispatches `__reset__` at the top of its connection effect, so a workspace switch from project A (which streamed a critique) to project B clears the reducer before the new EventSource opens. enabled=false also clears. 3. `useCritiqueReplay` dispatches `__reset__` at the top of its parse effect, so transcriptUrl swaps (including swap-to-null after a replay reached `shipped`) lift the reducer back to idle before the new fetch starts. SSE validation (1 x P2) 4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape` check after the cheap `isPanelEvent` predicate. A `critique.ship` frame missing `composite` / `round` / `status` / `artifactRef` is rejected before reaching the reducer, so TheaterCollapsed can no longer crash on `undefined.toFixed(1)`. Every variant's required fields are validated: run_started (protocolVersion, non-empty cast, maxRounds, threshold, scale), panelist_* (round, role, plus variant-specific shape), round_end (round, composite, mustFix, decision in {continue,ship}, reason), ship (round, composite, status, artifactRef.{projectId,artifactId}, summary), degraded (reason, adapter), interrupted (bestRound, composite), failed (cause), parser_warning (kind, position). Reducer correctness (1 x P2) 5. `panelist_open` now materializes the round + an empty panelist view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight the in-progress lane the instant the tag opens. Before this, a stream that emitted only `panelist_open` after `run_started` left `rounds = []` and the UI rendered no current round until a later `panelist_dim` arrived. Polish (3 x P3) 6. Brand role tint swaps from `var(--magenta, var(--accent))` to `var(--purple, var(--accent))`. 
`--purple` is actually defined across the design systems; `--magenta` is not, so Brand was silently falling through to `--accent` and looking identical to Designer. 7. New i18n key `critiqueTheater.interruptedSummary` for the interrupted-collapse copy ("Interrupted at round N, best composite X.X"). Previously the interrupted branch reused `shippedSummary` and the UI read "Shipped at round..." for a run that specifically did not ship. Native value in en + zh-CN; other locales fall back via `...en` spread. 8. `TheaterDegraded` heading id comes from `useId()` instead of a hardcoded `theater-degraded-heading`, so two chips rendered on the same page (chat history with multiple completed runs) keep their aria-labelledby references unambiguous. Tests (15 new cases) - reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data. - sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship. - useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false. - useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped. - TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...". - TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`. 
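The per-variant validation added for `ship` frames can be pictured with the sketch below. It is an illustration only: `hasValidShipShape` and `ShipFrame` are hypothetical names, and the field types (numeric `round` and `composite`, string `status` and `summary`) are assumptions inferred from the required-field list in this commit.

```typescript
// Assumed shape of a fully-formed ship frame, based on the fields listed above.
interface ShipFrame {
  type: "ship";
  runId: string;
  round: number;
  composite: number;
  status: string;
  artifactRef: { projectId: string; artifactId: string };
  summary: string;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Reject a ship frame unless every required field is present with a usable type.
function hasValidShipShape(frame: unknown): frame is ShipFrame {
  if (!isRecord(frame) || frame["type"] !== "ship") return false;
  const ref = frame["artifactRef"];
  return (
    typeof frame["runId"] === "string" &&
    frame["runId"] !== "" &&
    Number.isFinite(frame["round"]) &&
    Number.isFinite(frame["composite"]) &&
    typeof frame["status"] === "string" &&
    typeof frame["summary"] === "string" &&
    isRecord(ref) &&
    typeof ref["projectId"] === "string" &&
    typeof ref["artifactId"] === "string"
  );
}
```

Once a frame passes this guard, a consumer can call `frame.composite.toFixed(1)` without risking the `undefined.toFixed(1)` crash the review flagged.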
Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new) - tests/i18n/locales.test.ts 5 of 5 across 18 locales * feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1) * feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2) * fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315) Addresses every blocker from codex, Siri-Ray, and lefarcen. The three state-lifecycle and SSE-validation issues they also flagged inherit fixes from PR #1314's review pass that this branch now sits on top of after rebase. Real daemon kill on Interrupt (P1) - CritiqueTheaterMount now POSTs to /api/projects/:id/critique/:runId/interrupt alongside the optimistic local dispatch. Before this fix, clicking Interrupt only flipped the React state to interrupted while the daemon job kept running. The fetch is best-effort: a 404 (endpoint not wired yet, lands in Phase 15) is swallowed with a dev-mode console.warn so the UI still moves to the collapsed badge. - New fetchInterrupt test seam lets RTL assert on the URL / method and simulate the "daemon not ready yet" path. Two tests pin both: the happy URL proj-42/critique/run-abc/interrupt POSTs, and a rejected fetch still flips the UI. interruptPending reset on new run (P2) - A ref-backed effect compares the current runId against the last one we saw; when it changes, interruptPending is cleared. A user who interrupts run-1 and then triggers run-2 from the same mount now gets a fresh, enabled kill button instead of one stuck in "Interrupting…". Pinned by a new mount test. Escape keybind scope (P2) - InterruptButton now checks the keydown target. Escape inside an input, textarea, select, or contenteditable element is ignored (and any ancestor of those via closest() is treated the same way). Body-level focus still fires the keybind so the Theater area's affordance keeps working. 
Four new tests cover textarea, input, contenteditable, and the body-focus positive case.

userFacingName i18n key (P2)

- The spec at specs/current/critique-theater.md:6 mandates a single critiqueTheater.userFacingName key so the "Design Jury" label can be renamed without touching code. Phase 8 introduced critiqueTheater.title by mistake; renamed across types.ts, en.ts, zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer TheaterStage.tsx. The locale alignment test stays green.

Validated

- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 14 files, 112 tests (was 101 before, +11 new for the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope; the rest were already in #1314's review fix).
- tests/i18n/locales.test.ts 5 of 5 across 18 locales.

* feat(daemon): adapter-degraded registry with TTL (Phase 10.1)

In-memory registry recording adapters that produced malformed or oversize transcripts so the orchestrator can skip them for a TTL window (default 24h) instead of cycling through known-bad providers on every run. Records carry reason (malformed_block | oversize_block | missing_artifact), source label, and expiresAt. The test-only clock seam lets the suite advance time deterministically and prove that an expired entry stops counting as degraded without anyone calling clearDegraded. 7/7 vitest cases green.

* feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2)

Two test-only adapters that read the existing v1 transcript fixtures (happy-3-rounds and malformed-unbalanced) and replay them as either a full string or a 512-byte chunked stream. The chunked form is what the conformance harness uses to prove the parser holds together when the transcript arrives in arbitrary network slices, not as one buffered blob.
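A TTL-bounded degraded registry with an injectable clock, as described for Phase 10.1, could be sketched as follows. Only `clearDegraded`, the three reason tokens, and the 24h default come from the commit text; `createDegradedRegistry`, `markDegraded`, and `isDegraded` are hypothetical names.

```typescript
type DegradedReason = "malformed_block" | "oversize_block" | "missing_artifact";

interface DegradedRecord {
  reason: DegradedReason;
  source: string;
  expiresAt: number; // epoch milliseconds
}

const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000; // 24h default from the commit text

// `now` is the clock seam: tests pass a fake clock, production uses Date.now.
function createDegradedRegistry(
  now: () => number = Date.now,
  ttlMs: number = DEFAULT_TTL_MS,
) {
  const records = new Map<string, DegradedRecord>();
  return {
    markDegraded(adapterId: string, reason: DegradedReason, source: string): void {
      records.set(adapterId, { reason, source, expiresAt: now() + ttlMs });
    },
    // An expired entry stops counting without anyone calling clearDegraded.
    isDegraded(adapterId: string): boolean {
      const record = records.get(adapterId);
      return record !== undefined && now() < record.expiresAt;
    },
    clearDegraded(adapterId: string): void {
      records.delete(adapterId);
    },
  };
}
```

Checking expiry at read time, rather than sweeping on a timer, keeps the registry free of background work and makes the TTL behaviour fully deterministic under a fake clock.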
* feat(daemon): adapter conformance harness (Phase 10.3) runAdapterConformance pulls a transcript through the same parseCritiqueStream pipeline the orchestrator uses and classifies the outcome as shipped, degraded, or failed. On a degraded outcome it forwards the matched reason to the adapter-degraded registry, so a single nightly conformance run is what populates the skip list rather than the orchestrator learning each adapter is broken at request time. 5/5 vitest cases green covering shipped, malformed degraded, oversize degraded, no-ship failure, and the harness-thrown failure path. * test(e2e): Critique Theater Playwright suite (Phase 11) Six tests, one viewport per visual case, deterministic SSE fixtures stubbed via page.route(). Adds the suite to test:ui:extended so the existing extended-UI lane picks it up. Coverage: 1. Happy path: a single mounted theater plays the full fixture (1 run_started, 5 panelists open / dim / must_fix / close, 1 round_end, 1 ship) and ends on the score badge. 2. Interrupt mid-run: the panelist that is open at the time the interrupt button is clicked closes with an interrupted marker and the transcript freezes there. 3. Visual regression at 375x720 mobile. 4. Visual regression at 768x1024 tablet. 5. Visual regression at 1280x800 desktop. 6. A11y role tree: the theater region exposes a labelled landmark, each panelist lane is a group with an accessible name, the score is a status live region. All SSE traffic is stubbed by page.route so the suite runs in CI without a daemon. The toggle is seeded via localStorage by bootAppWithCritiqueEnabled so the gate behaves as if Settings flipped it on. typecheck clean; playwright --list reports 6. * test(web): reducer p99 bench at 10k iterations (Phase 13.1) Locks the documented 2ms budget for the Critique Theater reducer on a representative SSE script (27 actions, one full happy run) behind a regression gate. 
Asserts p99 stays under 4ms (2x the documented budget) so CI runners with a noisy neighbour do not flake while a real regression to 20ms or 200ms still trips. The bench is a vitest case rather than a bare microbenchmark so it runs in the same CI lane as every other web test and does not need a parallel runner. * test(web): critique surface coverage walker (Phase 13.2) Walks the public critique surface (11 SSE event names, 5 panelist roles, 6 lifecycle phases, 9 named i18n keys) and asserts each named symbol appears in both the src corpus and the test corpus. The walker is the gate that catches a rename in one half of the codebase without a matching update in the other half: a future PR that drops 'panelist_must_fix' from the reducer without also removing its test reference fails this suite. 62 assertions, one per symbol per corpus. * docs: Critique Theater user guide (Phase 14.1) Seven sections aimed at end users (not contributors): 1. What is Design Jury 2. How it works (the five panelists, auto-converging rounds, the composite formula) 3. Settings (the M1 toggle and what it does) 4. Reading the score badge 5. Replay surface 6. Troubleshooting (degraded, interrupted, failed) 7. FAQ The composite formula is documented as designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2 because anyone trying to reverse-engineer the score is going to search for those weights and the docs are the place they should land first. * docs(daemon): critique module AGENTS map (Phase 14.2) Daemon-side wayfinder for the apps/daemon/src/critique directory. Tables every file, what owns what invariant, and the 'when you change anything here' guide so a future contributor does not have to reverse-engineer the rollout resolver before adding a new SSE event. * docs(web): Theater module AGENTS map (Phase 14.3) Web-side mirror of the daemon AGENTS map. Same file table, same invariants section, same change-impact guide, sized to the Theater component package. 
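The composite formula quoted in the user-guide commit above can be worked through with a small sketch. The weights are taken verbatim from that commit message; `composite` and the sample scores are hypothetical and only illustrate the arithmetic.

```typescript
// Weights exactly as documented: designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2
const WEIGHTS = { designer: 0, critic: 0.4, brand: 0.2, a11y: 0.2, copy: 0.2 } as const;

type Role = keyof typeof WEIGHTS;

// Weighted sum of the five panelist scores.
function composite(scores: Record<Role, number>): number {
  return (Object.keys(WEIGHTS) as Role[]).reduce(
    (sum, role) => sum + WEIGHTS[role] * scores[role],
    0,
  );
}

// Sample scores (made up for illustration):
// 0*9 + 0.4*8 + 0.2*7 + 0.2*6 + 0.2*9 = 3.2 + 1.4 + 1.2 + 1.8 = 7.6
const example = composite({ designer: 9, critic: 8, brand: 7, a11y: 6, copy: 9 });
console.log(example.toFixed(1)); // prints "7.6"
```

The non-designer weights sum to 1.0, and the designer weight of 0 means the designer's own score does not move the composite under this formula.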
* feat(daemon): rollout flag resolver (Phase 15.1)

Single decision point every caller consults to know whether the orchestrator should wire the critique pipeline for a given run. Priority:

1. Skill-level policy (required wins, opt-out wins inversely)
2. Per-project override from the Settings toggle
3. OD_CRITIQUE_ENABLED env override
4. Rollout phase default
   M0  dark-launch     false
   M1  settings only   false (toggle is off until the user flips it)
   M2  per-skill       true if skill opted in
   M3  global default  true

OD_CRITIQUE_ROLLOUT_PHASE parser defaults to M0 on unknown input so a fresh install never surprises a user with the feature on. 10/10 vitest cases green covering every cell of the matrix.

* feat(web): Settings toggle hook for Critique Theater (Phase 15.2)

React hook that reads critiqueTheaterEnabled from the existing open-design:config localStorage blob and stays in sync via:

- the platform storage event (cross-tab)
- an open-design:critique-theater-toggle CustomEvent (same-tab)

The same-tab event is the one that fires when the Settings panel saves in the current window: the toggle and every mounted theater update without a page reload.

setCritiqueTheaterEnabled(next) is the imperative setter the Settings panel calls. It preserves the rest of the stored config (mode, apiKey, etc.) and dispatches the same-tab event after the localStorage write.

The web hook reflects what the user toggled; the daemon-side isCritiqueEnabled is the final routing authority (project override, env, rollout phase). When they disagree, the daemon wins for backend gating and the web reflects the toggle state.

6/6 vitest cases green covering first read, stored read, same-tab event flip, config preservation, corrupted JSON tolerance, and cross-tab storage event.
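The four-tier priority of the Phase 15.1 rollout resolver can be sketched as a pure function. This is a reading of the priority list in the commit message, not the daemon's code: `isCritiqueEnabledSketch`, `parseRolloutPhaseSketch`, and the `RolloutInputs` shape are hypothetical, and the exact-match phase parsing is an assumption.

```typescript
type SkillPolicy = "required" | "opt-in" | "opt-out" | null;
type RolloutPhase = "M0" | "M1" | "M2" | "M3";

interface RolloutInputs {
  skillPolicy: SkillPolicy;
  projectOverride: boolean | null; // null means "no opinion"
  envOverride: boolean | null;     // null means OD_CRITIQUE_ENABLED is unset
  phase: RolloutPhase;
}

// Unknown input falls back to M0 so a fresh install stays dark.
function parseRolloutPhaseSketch(raw: string | undefined): RolloutPhase {
  return raw === "M1" || raw === "M2" || raw === "M3" ? raw : "M0";
}

function isCritiqueEnabledSketch(inputs: RolloutInputs): boolean {
  // 1. Skill-level policy: required forces on, opt-out forces off.
  if (inputs.skillPolicy === "required") return true;
  if (inputs.skillPolicy === "opt-out") return false;
  // 2. Per-project override from the Settings toggle.
  if (inputs.projectOverride !== null) return inputs.projectOverride;
  // 3. Environment override.
  if (inputs.envOverride !== null) return inputs.envOverride;
  // 4. Rollout phase default.
  switch (inputs.phase) {
    case "M0":
      return false;
    case "M1":
      return false;
    case "M2":
      return inputs.skillPolicy === "opt-in";
    case "M3":
      return true;
  }
}
```

Using `null` for "no opinion" at each tier lets the higher tiers fall through cleanly, so a fresh install with every input unset resolves to `false` at M0.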
* test(web): Phase 15 toggle hook failure-mode coverage (PR #1320) lefarcen P2 on PR #1320 flagged that the PR body claimed safe behavior for disabled localStorage, non-object JSON, and missing CustomEvent shim, but the suite only covered corrupt JSON plus happy-path storage events. Added four failure-mode tests so the swallowed errors are not silently traded for a throw in a future refactor: 1. Returns false on a stored JSON value that parses to an array (non-object). Catches a regression where the guard treats anything truthy as a config blob. 2. Returns false on a stored JSON value of literal 'null'. typeof null === 'object' in JS, so the guard has to check null explicitly; this test pins that check. 3. Returns false when localStorage.getItem throws (private mode / disabled storage / SecurityError). The hook must swallow and return false so the rest of the app keeps rendering. 4. setCritiqueTheaterEnabled still dispatches the same-tab CustomEvent when localStorage.setItem throws (quota exceeded / disabled storage). The dispatch path is the in-session broadcast that keeps every mounted hook coherent even when persistence is unavailable; verified by mounting two probes and asserting both flip after the setter is called with a throwing setItem. 10/10 vitest cases green (6 existing + 4 new). * fix(web): honor CustomEvent payload in toggle hook listener (PR #1320) Both Siri-Ray (blocking) and lefarcen (P2 new) caught the same real bug in the failure-mode test I added in `affcdd27`: the test asserts the in-session UI flips when localStorage.setItem throws, but the CustomEvent listener was ignoring the event's typed detail and just calling readToggle(). Under a throwing setItem the localStorage value is stale (or absent), so the listener would see the OLD value and the test would fail (or worse, the production claim 'in-session event keeps mounts coherent' was hollow). 
Fixed the hook, not the test: the listener now reads event.detail.enabled when it is a boolean, falling back to readToggle() only for malformed events or for cross-tab storage events (which do not carry a typed payload). The setter already dispatched the detail; the listener just was not consuming it. Test changes: - The existing 'setItem throws' test now asserts the right behavior for the right reason. Updated the inline comment to say the listener reads from detail, not localStorage. - New test 'falls back to readToggle when the CustomEvent carries no usable detail' pins the fallback path: a malformed dispatcher (no detail, or detail.enabled not a boolean) degrades cleanly instead of throwing or being silently ignored. 11 / 11 vitest cases green (10 prior + 1 new fallback). * feat(daemon): route critique spawn-path eligibility through the rollout resolver The wireup edit Phase 10 and Phase 15 carved out: today server.ts gates the critique pipeline on critiqueCfg.enabled, which is just the OD_CRITIQUE_ENABLED env var. After this commit it gates on isCritiqueEnabled(...) from the Phase 15 resolver, so the full priority matrix is live: 1. Per-skill od.critique.policy veto (opt-out / required) 2. Per-project override (M1 Settings toggle, written through the existing Phase 6 settings endpoint) 3. OD_CRITIQUE_ENABLED env override (power-user lane / CI fixtures) 4. OD_CRITIQUE_ROLLOUT_PHASE default M0 dark-launch false M1 settings only false M2 per-skill only when skillPolicy === 'opt-in' M3 global default true Default behaviour on a fresh install is unchanged: the resolver returns false at M0 without an env override or a project override, so prod traffic falls through to the legacy single-pass path exactly the way it did before. Inputs threaded today: phase from OD_CRITIQUE_ROLLOUT_PHASE, envOverride from OD_CRITIQUE_ENABLED. 
skillPolicy and projectOverride are passed as null for the v1 cutover; the daemon-side handler that round-trips critiqueTheaterEnabled on the project settings row and the od.critique.policy frontmatter resolver land as the next two commits in this branch. The three call sites that used critiqueCfg.enabled (the brand-thread guard, the skill-thread guard, the top-line critiqueShouldRun compound) now read from a single locally-scoped critiqueEnabledForRun boolean, so the eligibility check is computed exactly once per spawn and the prompt composer + orchestrator stay in lockstep the way the existing comment already promised. Tests still green: daemon vitest 22 / 22 across rollout + conformance + adapter-degraded. Daemon typecheck clean. * feat(web): mount CritiqueTheaterMount in ProjectView The web counterpart of the daemon wireup. ProjectView now renders <CritiqueTheaterMount projectId={project.id} enabled={...} /> as a sibling of <AppChromeHeader> inside the top-level <div className="app">. The mount is the drop-in from the Phase 9 stack: it owns the SSE subscription, the kill-request handshake, and the phase-aware swap from the live <TheaterStage> to the collapsed badge once a run settles. The mount returns null until the daemon emits a critique.run_started for the active project, so the visual surface is byte-for-byte unchanged for users who have not opted in. Enabled wiring: useCritiqueTheaterEnabled() reads the M1 Settings toggle from the existing open-design:config localStorage blob and stays in sync with both the platform storage event (cross-tab) and the same-tab open-design:critique-theater-toggle CustomEvent the Phase 15 setter dispatches. The hook honors the event payload directly so a private-mode browser that cannot persist the toggle still updates the in-session UI correctly. The daemon-side gate (isCritiqueEnabled in apps/daemon/src/server.ts) remains the authority for whether a run is actually wired through the critique pipeline. 
This hook only governs whether the web layer renders the resulting SSE stream when the daemon emits one. The two-layer gate is intentional: an integrator embedding the Theater in a custom UI can flip the web visibility independent of the daemon's routing decision, and a daemon-side env override flips backend gating without touching the web's localStorage. Tests still green: web Theater suite 181 / 181 across 16 files. Web typecheck clean. * feat(daemon): resolve od.critique.policy frontmatter at the spawn site The next step in the wireup branch's ladder: replace the placeholder `skillPolicy: null` with the actual value parsed from the active skill's SKILL.md frontmatter. Three small edits, one new field on a public type: 1. SkillInfo gains a `critiquePolicy: SkillCritiquePolicy` field carrying the parsed `od.critique.policy` token (required / opt-in / opt-out / null). The field is null when the skill has no opinion, which lets the lower-priority resolver tiers (projectOverride, envOverride, phase default) decide. 2. listSkills() populates the new field via a small `normalizeCritiquePolicy` helper that tolerates the YAML scalar's casing and trims whitespace. Unknown tokens collapse to null so a typo in SKILL.md cannot accidentally force the panel on or off; it just falls through. Derived example cards inherit the parent's policy. 3. server.ts captures `skill.critiquePolicy` into a hoisted `skillCritiquePolicy` variable inside the existing skill-load block, then threads it into the isCritiqueEnabled call as the skillPolicy input. The hoisting keeps the variable in scope at the resolver call site without restructuring the spawn handler. After this commit, the priority matrix the rollout resolver was designed for is live for its top tier. The previous commit wired env + phase; this one wires skill. The projectOverride input remains null pending the next commit that extends the Phase 6 settings endpoint. Daemon vitest: 10 / 10 rollout cases pass against the new wiring. 
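The frontmatter normalization described in this commit (tolerate casing and whitespace, collapse unknown tokens to null) might look like the sketch below. `normalizeCritiquePolicySketch` is a hypothetical stand-in for the `normalizeCritiquePolicy` helper, whose actual implementation is not shown in this log.

```typescript
type SkillCritiquePolicy = "required" | "opt-in" | "opt-out" | null;

// Accepts whatever the YAML parser produced for od.critique.policy.
function normalizeCritiquePolicySketch(raw: unknown): SkillCritiquePolicy {
  if (typeof raw !== "string") return null; // missing key or non-scalar value
  const token = raw.trim().toLowerCase();
  // A typo falls through to null, so it cannot force the panel on or off.
  return token === "required" || token === "opt-in" || token === "opt-out"
    ? token
    : null;
}
```

Returning null for anything unrecognized hands the decision to the lower resolver tiers (project override, env override, phase default) instead of guessing at the author's intent.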
Daemon typecheck: clean. * feat(daemon): feed projectOverride into the rollout resolver from project metadata Replaces the placeholder `projectOverride: null` in the spawn handler with the actual value the Settings panel writes onto the project's metadata blob: `critiqueTheaterEnabled?: boolean`. The read is defensive at the boundary: the metadata object is typed loosely (it round-trips through SQLite as a free-form JSON blob), so the spawn handler narrows to `boolean` and falls through to `null` for any other shape. A missing key, a malformed value, or a project that has never visited Settings collapses to `null`, which is exactly the resolver's "no opinion, fall through to env / phase" signal. The `critique` frontmatter slot also gets typed on the SkillFrontmatter shape so the `od.critique.policy` chain the previous commit introduced no longer needs a bracket-access cast. Same pattern as the existing `craft`, `preview`, and `design_system` nested-record slots. After this commit, every tier of the rollout resolver's priority matrix is wired: 1. skillPolicy (from SKILL.md od.critique.policy) 2. projectOverride (from project metadata critiqueTheaterEnabled) 3. envOverride (from OD_CRITIQUE_ENABLED) 4. rollout phase (from OD_CRITIQUE_ROLLOUT_PHASE) The write path for projectOverride still flows through the existing project-update handler the Settings panel already uses to persist project metadata; no new endpoint is needed. The Settings UI button that calls setCritiqueTheaterEnabled and posts the new field is the next commit on this branch. Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases still green against the new wiring. * fix(daemon): forward critique events to project sinks + align composer gate (PR #1338) Two codex review items addressed in one commit since they share the same root cause (resolver-enabled run hits a transport / prompt contract that was still env-gated): P1 (transport mismatch). 
The daemon emits critique.* SSE frames through critiqueBus -> design.runs.emit, which fans out on /api/runs/:runId/events. The web CritiqueTheaterMount subscribes to /api/projects/:projectId/events (it's project-scoped, not run- scoped, because the mount lives at the project workspace and follows the user across runs). Result: in production the mount never sees a real frame and the e2e tests' stubbed routes hide the mismatch. Fixed by extending critiqueBus.emit to fan out to BOTH sinks: the existing runs.emit transport, AND the per-project event-sinks map. The project-events route emits via sse.send(payload.type, payload), so we pack the SSE channel name onto payload.type and let the sink push the right channel. The web sseToPanelEvent overwrites type from the channel name on the way back into a PanelEvent, so the round-trip stays correct. P2 (prompt gate misalignment). composeSystemPrompt reads cfg.enabled to decide whether to append the panel addendum, but critiqueCfg.enabled is loaded from OD_CRITIQUE_ENABLED only. A run the resolver enabled via phase / project / skill (env unset) would have critiqueShouldRun = true while critiqueCfg.enabled remained false, dropping the panel prompt while still routing through runOrchestrator -> parser waits for tags that never arrive -> run degrades. Fixed by passing a derived config { ...critiqueCfg, enabled: true } to the composer when critiqueShouldRun is true. The composer's own gate now agrees with the resolver decision on every input the spec defines. Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases still green against the new wiring. * fix: address PerishCode P1 + P2 follow-ups on PR #1338 Two follow-up items PerishCode flagged on the activation PR. Non-blocking but both are real: 1. Phase 11 e2e suite was wired into test:ui:extended but lands the user on '/' (home route) where ProjectView (and therefore CritiqueTheaterMount) is never rendered. 
With the suite as written, every assertion would time out the first time the lane runs in CI, contradicting the PR body's claim that the suite stays parked behind test.describe.fixme. The state diverged from my earlier Phase 11 work because the merge from main on commit `4ab719c6` brought in #1307's squash-merged version of the e2e file (the pre-fixme shape). Re-applied test.describe.fixme to the describe block plus removed ui/critique-theater.test.ts from the test:ui:extended script in e2e/package.json. Added a file-header docblock explaining what the follow-up commit needs to do: replace goto('/') with /projects/:id navigation similar to app-design-files.test.ts, split the SSE fixture into a live prefix and terminal suffix (Codex P2 on PR #1320), and commit the first PNG baselines. 2. bestRoundOf in CritiqueTheaterMount returned the LAST round with a numeric composite, not the round with the HIGHEST composite, while bestCompositeOf correctly returned the max. A run that closed round 1 at 8.5 and round 2 at 6.0 would dispatch interrupted { bestRound: 2, composite: 8.5 } on a user-clicked interrupt. Folded the two helpers into a single bestRoundAndComposite that walks state.rounds once and returns the matching pair so the two values cannot drift. The onInterrupt callback now destructures from one helper instead of two independent reads. Falls back to (state.activeRound, 0) when no round has closed with a composite yet. Web typecheck: clean. CritiqueTheaterMount.test.tsx: 7 / 7 cases still green against the new helper. * fix: wire M1 project override end-to-end + correct deferred-surface doc claims (PR #1338) Three lefarcen P2s on the latest review pass, all real: 1. M1 project override was half-wired: the daemon read metadata.critiqueTheaterEnabled but the web setter only wrote localStorage. 
A user opt-in would render the Theater on the web (localStorage was set) while the daemon resolved projectOverride=null and skipped critique unless env / phase already permitted. Two halves talking past each other. Extended setCritiqueTheaterEnabled to accept an optional { projectId, fetchProjectSettings } options bag. When a projectId is supplied, the setter ALSO sends a PATCH /api/projects/:id with { metadata: { critiqueTheaterEnabled } } so the daemon's spawn-time resolver picks the same value up on the next generation. The existing project-routes endpoint already accepts arbitrary metadata patches, so no new endpoint is needed. The local write + the CustomEvent dispatch still fire before the PATCH, so a network failure does not unwind the in-session UI flip. Three new vitest cases pin the new path: PATCHes when projectId is provided, skips when it is not, swallows a rejected PATCH so the in-session UI still flips. 2. Rollout docs (docs/critique-theater.md section 3) claimed the Settings toggle persists into the daemon settings store, but the previous implementation only had a localStorage reader / writer plus a daemon read of project metadata, with no round-trip. Rewrote the section to lead with the four-tier resolver (skill policy / project override / env / phase), document that the setter now round-trips via the existing PATCH endpoint when given a projectId, and call out the Settings panel UI control as a deliberate follow-up. 3. Troubleshooting table pointed users at /api/metrics/critique (Phase 12, deferred) and 'od adapters clear-degraded <id>' (CLI wrapper that does not exist). Replaced the metrics reference with the local conformance harness command (pnpm --filter @open-design/daemon vitest run tests/critique-conformance.test.ts) that ships today, with a note that the Phase 12 dashboard surfaces this status as a series once that PR lands. 
Replaced the CLI command with the programmatic clearDegraded() helper that exists today and flagged the CLI wrapper as planned follow-up. Web typecheck: clean. Toggle hook tests: 14 / 14 green (11 existing + 3 new for the round-trip path). * test(web): multi-round interrupt regression for bestRoundAndComposite (PR #1338) lefarcen P3 follow-up to the previous bestRoundAndComposite fix: the existing CritiqueTheaterMount.test.tsx interrupt cases only exercised a single-round state, so a future refactor back to two independent helpers wouldn't be caught by the test suite even though it'd reintroduce the round / composite drift bug. Added a regression case that: 1. Drives the reducer through two complete rounds with the full 5-role cast closing at distinct composites: round 1 at 8.5, round 2 at 6.0 (the high-composite round is NOT the most recent one). 2. Clicks Interrupt + waits for the daemon ack via the test seam fetcher returning 204. 3. Asserts the collapsed badge displays "round 1" (the correct best-composite round), and queryByText for "round 2 ... 8.5" returns null (the buggy pairing would have produced that string). The bestRoundAndComposite helper walks state.rounds in one pass and returns the matching pair, so the round number and the composite cannot drift apart. This test locks the fix in: a refactor that splits the helpers back into independent walks will be caught here. 8 / 8 vitest cases green on the file. * fix(web): read-merge-write the project metadata in setCritiqueTheaterEnabled (PerishCode P2 on PR #1338) The previous round-trip sent { metadata: { critiqueTheaterEnabled: next } } as the entire PATCH body. The daemon's project-routes handler only re-stamps three immutable fields (baseDir, importedFrom, fromTrustedPicker) before calling updateProject(db, id, patch), which then does a shallow { ...existing, ...patch } in apps/daemon/ src/db.ts. 
So patch.metadata replaces the row's metadata wholesale, dropping kind, templateId, linkedDirs, and every other field the rest of the app reads. No in-tree caller passes projectId today (only vitest cases), so the bug had not surfaced yet. But the surface is documented in docs/critique-theater.md section 3 and the function's own JSDoc as the M1 round-trip path, so it would have shipped as a latent footgun for the next integrator: a Settings UI follow-up, or any third party that wires the setter into a project-aware surface. Fix: read-merge-write rather than a bare patch. - GET /api/projects/:id to read the row's current metadata. - Spread that metadata into the PATCH body and overlay critiqueTheaterEnabled: next on top, mirroring the partial-metadata pattern already used in ChatComposer.tsx for linkedDirs. - PATCH the merged object. Failure handling: - GET fails: skip the PATCH entirely. We cannot construct a safe merged body without the current state, and a bare patch would wipe other metadata. The in-session CustomEvent fired earlier in the setter still keeps every mounted hook consistent; the next save retries the round-trip. - PATCH fails: log in dev. The in-session UI is already correct via the CustomEvent. Tests (TDD, red-first): - 'GETs the project then PATCHes with merged metadata when a projectId is supplied': stubs a GET that returns { kind: 'template', templateId: 'modern-blog', linkedDirs: [...] } and asserts the PATCH body equals the merge plus the toggle. - 'PATCHes with just the toggle when the project has no prior metadata': stubs a GET that returns no metadata block. - 'skips the PATCH (does not stomp metadata) when the prefetch GET fails': stubs a rejecting GET and asserts only the GET fires. - 'swallows a rejected PATCH after a successful prefetch': stubs a successful GET and a rejecting PATCH; asserts the in-session UI still flips via the CustomEvent. 
Doc updated on the setter's JSDoc to describe the new three-step flow (localStorage, CustomEvent, read-merge-write PATCH) and the two failure modes. Verified: - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web test: 111 files / 1055 tests green (was 1052, +3 from the new merge-flow cases). * fix(web): restore wait-for-daemon-ack pattern on Theater interrupt Same regression as flagged on PR #1316 post-main-merge: the optimistic local dispatch fired before the POST resolved, so a daemon 404 / 409 still terminalized the UI and the real SSE terminal event got ignored by the sticky interrupted phase. Snapshot runId / bestRound / composite at click time, dispatch interrupted only on res.ok, clear interruptPending on rejection or non-2xx so the user can retry. Tests cover rejection + 404 leaving the run on the live stage; the 204 path waits for the ack. * feat(daemon): Critique Theater Phase 12 observability foundations Lands the metrics registry, the structured logger, the /api/metrics route, and the adapter-degraded bump that wires up the first data point. The orchestrator-side bumps for runs / rounds / composite / must-fix / interrupted / parser_errors / protocol_version land in a follow-up commit on this branch (kept separate so the wiring diff reads cleanly against the registry shape). Surfaces added: - apps/daemon/src/metrics/index.ts: 9 Prometheus series under the open_design_critique_* namespace with the histogram buckets the spec calls out (round_duration_ms at 100 / 250 / 500 / 1000 / 2500 / 5000 / 10000 / 30000 / 60000 ms; composite_score at 0-10 integer steps). - apps/daemon/src/logging/critique.ts: 6 typed events, one JSON line per call on stdout, namespaced critique. Matches the JSON-per-line convention cli.ts already uses; no new logger framework. - apps/daemon/src/server.ts: GET /api/metrics route. Honors OD_METRICS_ENDPOINT=disabled to opt out for air-gapped installs. 
- apps/daemon/src/critique/adapter-degraded.ts: markDegraded now bumps degraded_total so the adapter-health dashboard panel reflects every TTL refresh and every fresh mark. Deps: prom-client ^15.1.0, @opentelemetry/api ^1.9.0 added to apps/daemon/package.json. Both are zero-config no-ops without an exporter wired; daemon bundle size impact is ~150 KB uncompressed. The @opentelemetry/api dep is in place ahead of the OTel-spans follow-up commit; it adds no behavior on this commit. Tests: - tests/metrics/critique.test.ts (3 cases): registry shape + exposition text + reset-between-tests - tests/logging/critique.test.ts (4 cases): event shape + ordering + newline framing + namespace stamping Verification (Windows-local): - pnpm --filter @open-design/daemon typecheck: clean - New metrics + logging suites: 7 / 7 green - Existing adapter-degraded + conformance + rollout suites: 22 / 22 green; the bump is non-breaking * feat(daemon): wire Critique Theater metrics + structured logs from the orchestrator Lights up the bump sites the Phase 12 foundations PR registered the series for. Every panel event the parser surfaces now reaches the matching Prometheus counter / histogram and the matching JSON log line on stdout. Switch-loop bumps + logs: - run_started: log run_started, set protocol_version gauge to the observed protocol version (small-integer cardinality). - panelist_open: record the first-open wall-clock per round so round_end can compute round_duration_ms; subsequent opens in the same round leave the start time untouched. - panelist_must_fix: bump must_fix_total with the panelist role. The wire event does not yet carry a dim name, so the label is 'unspecified' for now; a future parser revision can drop in the real dim without a metric rename. - round_end: bump rounds_total, observe composite_score, observe round_duration_ms (current ms minus the tracked start), log round_closed with the composite / mustFix / decision triple. 
- parser_warning (parser-yielded): bump parser_errors_total with the kind label, log parser_recover with kind + position. Orchestrator-side parser warnings (composite_mismatch and duplicate_ship from the daemon-authoritative scoring checks) go through a new emitParserWarning helper so the bus emit, the collectedEvents push, the metric bump, and the log line stay in lockstep. Three inline emission sites collapse to one-line helper calls. After the try/catch, a single terminal-status switch bumps runs_total{status, adapter, skill} once per run, with branch- specific log + counter: - shipped / below_threshold: log run_shipped - interrupted: bump interrupted_total, log run_failed{cause: interrupted} - timed_out: log run_failed{cause: timed_out} - failed: log run_failed{cause: orchestrator_internal} - degraded: log degraded{reason: orchestrator_classified} OrchestratorParams gains optional skill: string for the label; defaults to 'unknown' so spawn sites that have not yet threaded it keep working without a metric shape change. Tests: - The new metrics + logging suites (7 / 7) verify registry shape and event framing; orchestrator-side metric integration is exercised through the existing critique-conformance and critique-adapter-degraded suites (22 / 22 still green). - Logger test reassigns process.stdout.write directly instead of vi.spyOn so the Node overloaded write signature does not collide with MockInstance<unknown>. * feat(observability): Grafana dashboard JSON for Critique Theater Three default rows mapping to the metrics this branch wires up: 1. Fleet quality: composite score p50 / p90 / p99 line graph by adapter, plus a heatmap of the composite distribution. The line graph answers 'are my agents getting better over time'; the heatmap answers 'are the bad runs clustered around one adapter or smeared across the fleet'. 2. Adapter health: stacked bar charts for degraded marks (by adapter / reason) and parser errors (by adapter / kind) over a 5-minute window. 
The two queries together let an operator see 'is this adapter degraded because of malformed wire output or because of oversize blocks' without flipping panels. 3. Brief throughput: runs-per-hour by terminal status, an average rounds-per-run stat per adapter, and a round-duration ms p50 / p90 / p99 line. Throughput numbers fall straight out of the runs_total / rounds_total counters; the duration histogram is the same one the runs feed. The dashboard uses a templated $datasource var (defaults to 'prometheus') so an operator with multiple Prometheus instances can switch without editing JSON. Schema version 39 (Grafana 11). Operators import via: pnpm dlx @grafana/cli dashboard import tools/dev/dashboards/critique.json or paste into a provisioned dashboards directory. The file is checked into the repo as a starting artifact; alert rules and SLO panels ship after the first 1000 runs inform the right thresholds. JSON validates with node -e 'JSON.parse(...)' (sanity checked locally). * feat(daemon): OpenTelemetry outer span around the critique run Wraps each runOrchestrator call in a 'critique.run' span via the existing @opentelemetry/api dep added in the Phase 12 foundations commit. Attributes set on the span: - critique.run_id, critique.adapter, critique.skill at start - critique.final_status, critique.final_composite on terminal resolution - span status flipped to ERROR for failed / timed_out runs so a Tempo / Honeycomb / Jaeger filter on traces.status=error surfaces the right slice without joining back to Prometheus No exporter is wired by default; @opentelemetry/api is the API package and intentionally splits from @opentelemetry/sdk-, so the span is zero-overhead until an operator attaches an SDK through their runtime config. 
Inner per-round / parse_chunk / scoreboard_eval / persist_round / ship.persist spans defined in the Phase 12 plan are a follow-up: the outer span alone gives the trace a duration + final status + adapter/skill labels, which is the 80% value for dashboards that correlate runs across services. Adding child spans inside the existing 600-line orchestrator without restructuring is a separate careful change. Verification: - pnpm --filter @open-design/daemon typecheck: clean - 29 / 29 critique + metrics + logging tests still green fix(nix): bump pnpmDepsHash for prom-client + @opentelemetry/api lockfile bump nix-check failed on PR #1485 with hash mismatch in open-design-daemon-pnpm-deps and open-design-web-pnpm-deps after the Phase 12 foundations commit (`2b8b7445`) added prom-client and @opentelemetry/api to apps/daemon/package.json and refreshed pnpm-lock.yaml. CI reported the new sha: specified: HFLm+8hv3o5x3Xem4MXNsNclIgiVRc70+EBafL0rVn8= got: 7R1sQC38gOT0gsZ2oNOviCZ486cbbGJGJCis6WI8z9s= Both nix files pin the same workspace lockfile, so both flip in lockstep. No other Nix surface changes required. * fix(daemon): four Phase 12 review findings (Codex P2 x2 + Siri-Ray P2 + lefarcen P2) 1. Siri-Ray P2 in orchestrator.ts (round metric / log used untrusted agent values). The new observability path now records rs.composite and rs.mustFix (daemon-authoritative) instead of event.composite and event.mustFix when rs exists, and skips the bumps + log entirely when rs is missing (a degenerate round_end without any matching panelist_open). The dashboard p50 / p90 / p99 now agrees with persistence and ship decisions; an adapter reporting <ROUND_END composite='10'> while the daemon computed 6 logs 6 and still emits the composite_mismatch parser warning the prior block was already producing. 2. Codex P2 in server.ts (skill label always 'unknown'). 
The spawn path called runOrchestrator without passing the resolved skill id, so every live run bumped open_design_critique_{skill='unknown'} and the per-skill dashboard breakdown was always empty. Threaded effectiveSkillId (already computed at the same handler scope as the project skill fallback) through skill: . . . so the metric reflects the real skill when one is assigned, and the orchestrator default of 'unknown' only fires for runs that genuinely have none. 3. Codex P2 in conformance.ts (protocol-version mismatch let through). An adapter that emitted <CRITIQUE_RUN version='2'> followed by a valid SHIP classified as shipped because the harness only watched for terminal events. Added a guard inside the parse loop: if a run_started carries protocolVersion !== CRITIQUE_PROTOCOL_VERSION, mark the adapter degraded with reason 'protocol_version_mismatch' (already in DEGRADED_REASONS) and return early. ConformanceOutcome union widened to accept the new reason. 4. lefarcen P2 in tools/dev/dashboards/critique.json (runs-per-hour panel under-reported by 3600x). 'rate(...[1h])' returns per-second. Multiplied by 3600 so the panel title and unit match the actual value rendered. Verification: - pnpm --filter @open-design/daemon typecheck: clean - New metrics + logging suites (7), existing adapter-degraded (7), conformance (5), rollout (10): 29 / 29 green - Grafana JSON re-parses with node -e 'JSON.parse(...)' fix(nix): set pnpmDepsHash to fakeHash so CI surfaces the real hash for the regenerated lockfile (lefarcen P1 on PR #1485) * fix(nix): pin pnpmDepsHash to sha256-NtXbiRU0YZ4EVJVNC6N3sR1S0ozA3BvCwgXI0L0OMH4= from CI nix-check output --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>		2026-05-13 22:11:27 +08:00
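The `bestRoundAndComposite` fix described in the commit message above (one pass over `state.rounds`, returning the round number and composite as a matching pair, with a fallback to `(state.activeRound, 0)`) can be sketched roughly as follows. This is a minimal illustration, not the code in `CritiqueTheaterMount`: the `RoundSketch` and `StateSketch` shapes and the field name `n` are assumptions, and the tie-breaking rule (earliest round wins) is a guess.

```typescript
// Hypothetical, simplified shapes; the real CritiqueState carries more fields.
interface RoundSketch {
  n: number;          // round number (assumed field name)
  composite?: number; // present only once the round has closed with a score
}

interface StateSketch {
  activeRound: number;
  rounds: RoundSketch[];
}

interface BestPair {
  bestRound: number;
  composite: number;
}

// Walk the rounds once and keep the round number and its composite together,
// so the two values cannot drift apart the way two independent helpers could.
function bestRoundAndComposite(state: StateSketch): BestPair {
  let best: BestPair | null = null;
  for (const round of state.rounds) {
    const score = round.composite;
    if (typeof score !== "number") continue; // round has not closed yet
    // Strictly greater: on a tie the earlier round is kept (an assumption).
    if (best === null || score > best.composite) {
      best = { bestRound: round.n, composite: score };
    }
  }
  // Fallback from the commit message: no closed round yet.
  return best ?? { bestRound: state.activeRound, composite: 0 };
}

// The regression scenario from the commit message: round 1 closes at 8.5,
// round 2 at 6.0, so the best pair is round 1 with 8.5, not round 2.
const pair = bestRoundAndComposite({
  activeRound: 2,
  rounds: [
    { n: 1, composite: 8.5 },
    { n: 2, composite: 6.0 },
  ],
});
console.log(pair.bestRound, pair.composite);
```

Returning a single object makes the pairing a property of the return type, which is the design point the commit message argues for.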
.github	Merge origin/main into release/v0.7.0 to prepare merge-back PR	2026-05-13 18:19:47 +08:00
.vaunt	feat: enable Vaunt contributor recognition with 5-tier system (#908 )	2026-05-08 17:44:12 +08:00
apps	feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485 )	2026-05-13 22:11:27 +08:00
assets	fix: remove Trump pet from bundled community pets (#1103 )	2026-05-10 12:11:00 +08:00
craft	refine typography-hierarchy craft docs — clarify edge cases and make lint measurable (#979 )	2026-05-09 08:13:35 +08:00
deploy	docs(deploy): document Colima build swap helper (#967 )	2026-05-09 02:17:22 +08:00
design-systems	feat(design-systems): add structured tokens.css schema (default + kami) (#1231 )	2026-05-11 22:23:34 +08:00
design-templates	feat(landing-page): split catalog into per-facet pages + auto-deploy on content changes (#1158 )	2026-05-12 19:24:50 +08:00
docs	feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485 )	2026-05-13 22:11:27 +08:00
e2e	feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485 )	2026-05-13 22:11:27 +08:00
nix	feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485 )	2026-05-13 22:11:27 +08:00
packages	[codex] Add Cursor Agent auth diagnostics (#1538 )	2026-05-13 20:25:34 +08:00
prompt-templates	feat(hyperframes): land HTML-in-Canvas across web + skills (#866 )	2026-05-11 15:45:12 +08:00
scripts	Revert "fix(web): restore consistent app header layout (#1432 )"	2026-05-13 11:20:16 +08:00
skills	Implement manual edit inspector (#1448 )	2026-05-13 13:25:58 +08:00
specs	Make de/fr/ru content i18n optional (#1511 )	2026-05-13 12:17:17 +08:00
story	Refactor project name from "Open Claude Design" to "Open Design" (#1 )	2026-04-28 16:03:35 +08:00
templates	add otd-operations-brief live-artifact template (#794 )	2026-05-08 12:53:24 +08:00
tools	feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485 )	2026-05-13 22:11:27 +08:00
.dockerignore	Add Docker Compose deployment workflow (#65 )	2026-05-08 11:51:51 +08:00
.gitignore	Merge origin/main into release/v0.7.0 to prepare merge-back PR	2026-05-13 18:19:47 +08:00
.node-version	add support for VP_HOME environment variable in agent resolution (#859 )	2026-05-08 15:14:37 +08:00
AGENTS.md	docs(pr): require user-perspective description and surface area (#1520 )	2026-05-13 15:28:05 +08:00
CHANGELOG.md	Revert "chore(changelog): note #1432 app header layout fix"	2026-05-13 11:20:16 +08:00
CLAUDE.md	docs: add git commit co-author policy (#131 )	2026-04-30 15:08:22 +08:00
CONTRIBUTING.de.md	docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290 )	2026-05-11 20:19:55 +08:00
CONTRIBUTING.fr.md	docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290 )	2026-05-11 20:19:55 +08:00
CONTRIBUTING.ja-JP.md	docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290 )	2026-05-11 20:19:55 +08:00
CONTRIBUTING.md	docs(pr): require user-perspective description and surface area (#1520 )	2026-05-13 15:28:05 +08:00
CONTRIBUTING.pt-BR.md	docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290 )	2026-05-11 20:19:55 +08:00
CONTRIBUTING.zh-CN.md	docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290 )	2026-05-11 20:19:55 +08:00
edited_image.png	feat(editorial-collage): introduce Atelier Zero style landing page as… (#366 )	2026-05-04 13:39:58 +08:00
flake.lock	feat(nix): Add official flake with home-manager and NixOS support (#402 )	2026-05-09 23:50:16 +08:00
flake.nix	fix: set writable OD_DATA_DIR default for nix run (#1159 )	2026-05-11 10:52:53 +08:00
LICENSE	Refactor project name from "Open Claude Design" to "Open Design" (#1 )	2026-04-28 16:03:35 +08:00
MAINTAINERS.de.md	docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290 )	2026-05-11 20:19:55 +08:00
MAINTAINERS.fr.md	docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290 )	2026-05-11 20:19:55 +08:00
MAINTAINERS.ja-JP.md	docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290 )	2026-05-11 20:19:55 +08:00
MAINTAINERS.md	docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290 )	2026-05-11 20:19:55 +08:00
MAINTAINERS.pt-BR.md	docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290 )	2026-05-11 20:19:55 +08:00
MAINTAINERS.zh-CN.md	docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290 )	2026-05-11 20:19:55 +08:00
package.json	Merge origin/main into release/v0.7.0 to prepare merge-back PR	2026-05-13 18:19:47 +08:00
pnpm-lock.yaml	feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485 )	2026-05-13 22:11:27 +08:00
pnpm-workspace.yaml	Refresh desktop integration control plane (#123 )	2026-04-30 14:23:53 +08:00
QUICKSTART.de.md	docs: add Traditional Chinese QUICKSTART and fix Chinese doc links (#753 )	2026-05-09 21:19:08 +08:00
QUICKSTART.fr.md	docs: add Traditional Chinese QUICKSTART and fix Chinese doc links (#753 )	2026-05-09 21:19:08 +08:00
QUICKSTART.ja-JP.md	docs: add Traditional Chinese QUICKSTART and fix Chinese doc links (#753 )	2026-05-09 21:19:08 +08:00
QUICKSTART.md	Revert "fix(web): restore consistent app header layout (#1432 )"	2026-05-13 11:20:16 +08:00
QUICKSTART.pt-BR.md	docs: add Traditional Chinese QUICKSTART and fix Chinese doc links (#753 )	2026-05-09 21:19:08 +08:00
QUICKSTART.zh-CN.md	docs: add Traditional Chinese QUICKSTART and fix Chinese doc links (#753 )	2026-05-09 21:19:08 +08:00
QUICKSTART.zh-TW.md	docs: add Traditional Chinese QUICKSTART and fix Chinese doc links (#753 )	2026-05-09 21:19:08 +08:00
README.ar.md	docs(readme): refresh contributors wall (#1494 )	2026-05-13 13:26:31 +08:00
README.de.md	docs(readme): refresh contributors wall (#1494 )	2026-05-13 13:26:31 +08:00
README.es.md	docs(readme): refresh contributors wall (#1494 )	2026-05-13 13:26:31 +08:00
README.fr.md	docs(readme): refresh contributors wall (#1494 )	2026-05-13 13:26:31 +08:00
README.ja-JP.md	docs(readme): refresh contributors wall (#1494 )	2026-05-13 13:26:31 +08:00
README.ko.md	docs(readme): refresh contributors wall (#1494 )	2026-05-13 13:26:31 +08:00
README.md	docs: add Windows launcher note (#1546 )	2026-05-13 16:46:42 +08:00
README.pt-BR.md	docs(readme): refresh contributors wall (#1494 )	2026-05-13 13:26:31 +08:00
README.ru.md	docs(readme): refresh contributors wall (#1494 )	2026-05-13 13:26:31 +08:00
README.tr.md	docs(readme): refresh contributors wall (#1494 )	2026-05-13 13:26:31 +08:00
README.uk.md	docs(readme): refresh contributors wall (#1494 )	2026-05-13 13:26:31 +08:00
README.zh-CN.md	docs(readme): refresh contributors wall (#1494 )	2026-05-13 13:26:31 +08:00
README.zh-TW.md	docs(readme): refresh contributors wall (#1494 )	2026-05-13 13:26:31 +08:00
TRANSLATIONS.md	i18n: add full Thai translation (th) (#1018 )	2026-05-09 15:19:47 +08:00
vercel.json	feat: add vercel.json configuration file (#169 )	2026-04-30 22:54:47 +08:00

README.md

Open Design

The open-source alternative to Claude Design. Local-first, web-deployable, BYOK at every layer — 16 coding-agent CLIs auto-detected on your PATH (Claude Code, Codex, Devin for Terminal, Cursor Agent, Gemini CLI, OpenCode, Qwen, Qoder CLI, GitHub Copilot CLI, Hermes, Kimi, Pi, Kiro, Kilo, Mistral Vibe, DeepSeek TUI) become the design engine, driven by 31 composable Skills and 72 brand-grade Design Systems. No CLI? An OpenAI-compatible BYOK proxy is the same loop minus the spawn.

English · Español · Português (Brasil) · Deutsch · Français · 简体中文 · 繁體中文 · 한국어 · 日本語 · العربية · Русский · Українська · Türkçe

Why this exists

Anthropic's Claude Design (released 2026-04-17, Opus 4.7) showed what happens when an LLM stops writing prose and starts shipping design artifacts. It went viral — and stayed closed-source, paid-only, cloud-only, locked to Anthropic's model and Anthropic's skills. There is no checkout, no self-host, no Vercel deploy, no swap-in-your-own-agent.

Open Design (OD) is the open-source alternative. Same loop, same artifact-first mental model, none of the lock-in. We don't ship an agent — the strongest coding agents already live on your laptop. We wire them into a skill-driven design workflow that runs locally with pnpm tools-dev, can deploy the web layer to Vercel, and stays BYOK at every layer.

Type `make me a magazine-style pitch deck for our seed round`. The interactive question form pops up before the model improvises a single pixel. The agent picks one of five curated visual directions. A live TodoWrite plan streams into the UI. The daemon builds a real on-disk project folder with a seed template, layout library, and self-check checklist. The agent reads them (pre-flight enforced), runs a five-dimensional critique against its own output, and emits a single `<artifact>` that renders in a sandboxed iframe seconds later.

That's not "AI tries to design something". That's an AI that has been trained, by the prompt stack, to behave like a senior designer with a working filesystem, a deterministic palette library, and a checklist culture — exactly the bar Claude Design set, but open and yours.

OD stands on four open-source shoulders:

alchaincyf/huashu-design — the design-philosophy compass. Junior-Designer workflow, the 5-step brand-asset protocol, the anti-AI-slop checklist, the 5-dimensional self-critique, and the "5 schools × 20 design philosophies" idea behind our direction picker — all distilled into apps/daemon/src/prompts/discovery.ts.
op7418/guizang-ppt-skill — the deck mode. Bundled verbatim under skills/guizang-ppt/ with original LICENSE preserved; magazine-style layouts, WebGL hero, P0/P1/P2 checklists.
OpenCoworkAI/open-codesign — the UX north star and our closest peer. The first open-source Claude-Design alternative. We borrow its streaming-artifact loop, its sandboxed-iframe preview pattern (vendored React 18 + Babel), its live agent panel (todos + tool calls + interruptible generation), and its five-format export list (HTML / PDF / PPTX / ZIP / Markdown). We deliberately diverge on form factor — they are a desktop Electron app bundling pi-ai; we are a web app + local daemon that delegates to your existing CLI.
multica-ai/multica — the daemon-and-runtime architecture. PATH-scan agent detection, the local daemon as the only privileged process, the agent-as-teammate worldview.

At a glance

	What you get
Coding-agent CLIs (16)	Claude Code · Codex CLI · Devin for Terminal · Cursor Agent · Gemini CLI · OpenCode · Qwen Code · Qoder CLI · GitHub Copilot CLI · Hermes (ACP) · Kimi CLI (ACP) · Pi (RPC) · Kiro CLI (ACP) · Kilo (ACP) · Mistral Vibe CLI (ACP) · DeepSeek TUI — auto-detected on `PATH`, swap with one click
BYOK fallback	Protocol-specific API proxy at `/api/proxy/{anthropic,openai,azure,google}/stream` — paste `baseUrl` + `apiKey` + `model`, choose Anthropic / OpenAI / Azure OpenAI / Google Gemini, and the daemon normalizes SSE back to the same chat stream. Internal-IP/SSRF blocked at the daemon edge.
Design systems built-in	129 — 2 hand-authored starters + 70 product systems (Linear, Stripe, Vercel, Airbnb, Tesla, Notion, Anthropic, Apple, Cursor, Supabase, Figma, Xiaohongshu, …) from `awesome-design-md`, plus 57 design skills from `awesome-design-skills` added directly under `design-systems/`
Skills built-in	31 — 27 in `prototype` mode (web-prototype, saas-landing, dashboard, mobile-app, gamified-app, social-carousel, magazine-poster, dating-web, sprite-animation, motion-frames, critique, tweaks, wireframe-sketch, pm-spec, eng-runbook, finance-report, hr-onboarding, invoice, kanban-board, team-okrs, …) + 4 in `deck` mode (`guizang-ppt` · `simple-deck` · `replit-deck` · `weekly-update`). Grouped in the picker by `scenario`: design / marketing / operation / engineering / product / finance / hr / sale / personal.
Media generation	Image · video · audio surfaces ship alongside the design loop. gpt-image-2 (Azure / OpenAI) for posters, avatars, infographics, illustrated maps · Seedance 2.0 (ByteDance) for cinematic 15s text-to-video and image-to-video · HyperFrames (heygen-com/hyperframes) for HTML→MP4 motion graphics (product reveals, kinetic typography, data charts, social overlays, logo outros). A gallery of 93 ready-to-replicate prompts (43 gpt-image-2 + 39 Seedance + 11 HyperFrames) lives under `prompt-templates/`, with preview thumbnails and source attribution. Media runs share the chat surface with code and drop a real `.mp4` / `.png` chip into the project workspace.
Visual directions	5 curated schools (Editorial Monocle · Modern Minimal · Warm Soft · Tech Utility · Brutalist Experimental) — each ships a deterministic OKLch palette + font stack (`apps/daemon/src/prompts/directions.ts`)
Device frames	iPhone 15 Pro · Pixel · iPad Pro · MacBook · Browser Chrome — pixel-accurate, shared across skills under `assets/frames/`
Agent runtime	Local daemon spawns the CLI in your project folder — agent gets real `Read`, `Write`, `Bash`, `WebFetch` against a real on-disk environment, with Windows `ENAMETOOLONG` fallbacks (stdin / prompt-file) on every adapter
Imports	Drop a Claude Design export ZIP onto the welcome dialog — `POST /api/import/claude-design` parses it into a real project so your agent can keep editing where Anthropic left off
Persistence	SQLite at `.od/app.sqlite`: projects · conversations · messages · tabs · saved templates. Reopen tomorrow, todo card and open files are exactly where you left them.
Lifecycle	One entry point: `pnpm tools-dev` (start / stop / run / status / logs / inspect / check) — boots daemon + web (+ desktop) under typed sidecar stamps
Desktop	Optional Electron shell with sandboxed renderer + sidecar IPC (STATUS / EVAL / SCREENSHOT / CONSOLE / CLICK / SHUTDOWN) — drives `tools-dev inspect desktop screenshot` for E2E
Deployable to	Local (`pnpm tools-dev`) · Vercel web layer · packaged Electron desktop app for macOS (Apple Silicon) and Windows (x64) — download from open-design.ai or the latest release
License	Apache-2.0

Demo

Entry view: pick a skill, pick a design system, type the brief. The same surface for prototypes, decks, mobile apps, dashboards, and editorial pages.	Turn-1 discovery form: before the model writes a pixel, OD locks the brief (surface, audience, tone, brand context, scale). 30 seconds of radios beats 30 minutes of redirects.
Direction picker: when the user has no brand, the agent emits a second form with 5 curated directions (Monocle / Modern Minimal / Tech Utility / Brutalist / Warm Soft). One radio click → a deterministic palette + font stack, no model freestyle.	Live todo progress: the agent's plan streams as a live card. in_progress → completed updates land in real time. The user can redirect cheaply, mid-flight.
Sandboxed preview: every <artifact> renders in a clean srcdoc iframe. Editable in place via the file workspace; downloadable as HTML, PDF, ZIP.	72-system library: every product system shows its 4-color signature. Click for the full DESIGN.md, swatch grid, and live showcase.
Deck mode (guizang-ppt): the bundled guizang-ppt-skill drops in unchanged. Magazine layouts, WebGL hero backgrounds, single-file HTML output, PDF export.	Mobile prototype: pixel-accurate iPhone 15 Pro chrome (Dynamic Island, status bar SVGs, home indicator). Multi-screen prototypes use the shared /frames/ assets so the agent never re-draws a phone.

Skills

31 skills ship in the box. Each is a folder under skills/ following the Claude Code SKILL.md convention with an extended od: frontmatter that the daemon parses verbatim — mode, platform, scenario, preview.type, design_system.requires, default_for, featured, fidelity, speaker_notes, animations, example_prompt (apps/daemon/src/skills.ts).

Two top-level modes carry the catalog: prototype (27 skills — anything that renders as a single-page artifact, from a magazine landing to a phone screen to a PM spec doc) and deck (4 skills — horizontal-swipe presentations with deck-framework chrome). The scenario field is what the picker groups them by: design · marketing · operation · engineering · product · finance · hr · sale · personal.
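
As a rough illustration of how a picker could group skills by the scenario field, here is a minimal TypeScript sketch. The SkillMeta shape and the groupByScenario helper are hypothetical and are not the daemon's actual types; only the field names mode and scenario, and the sample values, come from the text and tables in this README.

```typescript
// Hypothetical shape: only `mode` and `scenario` are taken from the README.
// The real od: frontmatter parsed by the daemon carries more fields.
type SkillMode = "prototype" | "deck";

interface SkillMeta {
  id: string;
  mode: SkillMode;
  scenario: string; // e.g. "design", "marketing", "finance"
}

// Group skill ids by scenario, preserving input order inside each group.
function groupByScenario(skills: SkillMeta[]): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const skill of skills) {
    const bucket = groups[skill.scenario] || (groups[skill.scenario] = []);
    bucket.push(skill.id);
  }
  return groups;
}

// Sample rows copied from the skill tables in this README.
const sampleSkills: SkillMeta[] = [
  { id: "web-prototype", mode: "prototype", scenario: "design" },
  { id: "saas-landing", mode: "prototype", scenario: "marketing" },
  { id: "invoice", mode: "prototype", scenario: "finance" },
  { id: "mobile-app", mode: "prototype", scenario: "design" },
];

const groupedSkills = groupByScenario(sampleSkills);
// groupedSkills.design holds "web-prototype" and "mobile-app", in that order.
```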

Showcase examples

The visually distinctive skills you'll most likely run first. Each ships a real example.html you can open straight from the repo to see exactly what the agent will produce — no auth, no setup.

dating-web · prototype: Consumer dating / matchmaking dashboard with left rail nav, ticker bar, KPIs, 30-day mutual-matches chart, editorial typography.	digital-eguide · template: Two-spread digital e-guide with cover (title, author, TOC teaser) + lesson spread with pull-quote and step list. Creator / lifestyle tone.
email-marketing · prototype: Brand product-launch HTML email with masthead, hero image, headline lockup, CTA, specs grid. Centered single-column, table-fallback safe.	gamified-app · prototype: Three-frame gamified mobile-app prototype on a dark showcase stage (cover, today's quests with XP ribbons + level bar, quest detail).
mobile-onboarding · prototype: Three-frame mobile onboarding flow (splash, value-prop, sign-in). Status bar, swipe dots, primary CTA.	motion-frames · prototype: Single-frame motion-design hero with looping CSS animations (rotating type ring, animated globe, ticking timer). Hand-off ready for HyperFrames.
social-carousel · prototype: Three-card 1080×1080 social-media carousel with cinematic panels, display headlines that connect across the series, brand mark, loop affordance.	sprite-animation · prototype: Pixel / 8-bit animated explainer slide with full-bleed cream stage, animated pixel mascot, kinetic Japanese display type, looping CSS keyframes.

Design & marketing surfaces (prototype mode)

Skill	Platform	Scenario	What it produces
`web-prototype`	desktop	design	Single-page HTML — landings, marketing, hero pages (default for prototype)
`saas-landing`	desktop	marketing	Hero / features / pricing / CTA marketing layout
`dashboard`	desktop	operation	Admin / analytics with sidebar + dense data layout
`pricing-page`	desktop	sale	Standalone pricing + comparison tables
`docs-page`	desktop	engineering	3-column documentation layout
`blog-post`	desktop	marketing	Editorial long-form
`mobile-app`	mobile	design	iPhone 15 Pro / Pixel framed app screen(s)
`mobile-onboarding`	mobile	design	Multi-screen mobile onboarding flow (splash · value-prop · sign-in)
`gamified-app`	mobile	personal	Three-frame gamified mobile-app prototype
`email-marketing`	desktop	marketing	Brand product-launch HTML email (table-fallback safe)
`social-carousel`	desktop	marketing	3-card 1080×1080 social carousel
`magazine-poster`	desktop	marketing	Single-page magazine-style poster
`motion-frames`	desktop	marketing	Motion-design hero with looping CSS animations
`sprite-animation`	desktop	marketing	Pixel / 8-bit animated explainer slide
`dating-web`	desktop	personal	Consumer dating dashboard mockup
`digital-eguide`	desktop	marketing	Two-spread digital e-guide (cover + lesson)
`wireframe-sketch`	desktop	design	Hand-drawn ideation sketch — for the "show something visible early" pass
`critique`	desktop	design	Five-dimensional self-critique scoresheet (Philosophy · Hierarchy · Detail · Function · Innovation)
`tweaks`	desktop	design	AI-emitted tweaks panel — the model surfaces the parameters worth nudging

Deck surfaces (deck mode)

Skill	Default for	What it produces
`guizang-ppt`	default for deck	Magazine-style web PPT — bundled verbatim from op7418/guizang-ppt-skill, original LICENSE preserved
`simple-deck`	—	Minimal horizontal-swipe deck
`replit-deck`	—	Product-walkthrough deck (Replit-style)
`weekly-update`	—	Team weekly cadence as a swipe deck (progress · blockers · next)

Office & operations surfaces (prototype mode, document-flavored scenarios)

Skill	Scenario	What it produces
`pm-spec`	product	PM specification doc with TOC + decision log
`team-okrs`	product	OKR scoresheet
`meeting-notes`	operation	Meeting decision log
`kanban-board`	operation	Board snapshot
`eng-runbook`	engineering	Incident runbook
`finance-report`	finance	Exec finance summary
`invoice`	finance	Single-page invoice
`hr-onboarding`	hr	Role onboarding plan

Adding a skill takes one folder. Read docs/skills-protocol.md for the extended frontmatter, fork an existing skill, and restart the daemon; the new skill then appears in the picker. The catalog endpoint is GET /api/skills; per-skill seed assembly (template + side-file references) lives at GET /api/skills/:id/example.

Six load-bearing ideas

1 · We don't ship an agent. Yours is good enough.

The daemon scans your PATH for claude, codex, devin, cursor-agent, gemini, opencode, qwen, qodercli, copilot, hermes, kimi, pi, kiro-cli, kilo, vibe-acp, and deepseek on startup. Whichever ones it finds become candidate design engines — driven over stdio with one adapter per CLI, swappable from the model picker. Inspired by multica and cc-switch. No CLI installed? The API mode is the same pipeline minus the spawn — choose Anthropic, OpenAI-compatible, Azure OpenAI, or Google Gemini and the daemon forwards normalized SSE chunks back. Loopback is allowed for local LLM providers such as Ollama and LM Studio; non-loopback private, link-local, CGNAT, multicast, reserved, and redirect targets are rejected at the daemon edge.
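
The PATH scan described above can be sketched as a pure function. This is an illustrative sketch, not the code in apps/daemon/src/agents.ts: detectAgents, its parameters, and the injected exists predicate are hypothetical, and only a handful of the binary names listed above are included.

```typescript
// A few binary names from the list above; the function itself is a
// hypothetical sketch, not the daemon's scanner.
const CANDIDATE_BINARIES: string[] = ["claude", "codex", "gemini", "opencode", "qwen"];

// `exists` is injected so the sketch stays free of filesystem calls. A real
// scanner would check the disk and handle Windows extensions such as .cmd.
function detectAgents(
  pathValue: string,
  separator: string,
  exists: (fullPath: string) => boolean
): string[] {
  const dirs = pathValue.split(separator).filter((dir) => dir.length > 0);
  const found: string[] = [];
  for (const bin of CANDIDATE_BINARIES) {
    for (const dir of dirs) {
      if (exists(dir + "/" + bin)) {
        found.push(bin);
        break; // first hit on PATH wins, as in a shell lookup
      }
    }
  }
  return found;
}

// Usage with a fake filesystem:
const fakeFiles: string[] = ["/usr/local/bin/claude", "/opt/bin/gemini"];
const detectedAgents = detectAgents(
  "/usr/local/bin:/opt/bin",
  ":",
  (fullPath) => fakeFiles.indexOf(fullPath) !== -1
);
```

Injecting the existence check keeps the lookup order testable without touching the disk; each detected name would then map to one adapter.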

2 · Skills are files, not plugins.

Following Claude Code's SKILL.md convention, each skill is SKILL.md + assets/ + references/. Drop a folder into skills/, restart the daemon, and it appears in the picker. The bundled guizang-ppt skill is op7418/guizang-ppt-skill committed verbatim, with the original license and attribution preserved.

3 · Design Systems are portable Markdown, not theme JSON.

The 9-section DESIGN.md schema from VoltAgent/awesome-design-md — color, typography, spacing, layout, components, motion, voice, brand, anti-patterns. Every artifact reads from the active system. Switch system → next render uses the new tokens. The dropdown ships with Linear, Stripe, Vercel, Airbnb, Tesla, Notion, Apple, Anthropic, Cursor, Supabase, Figma, Resend, Raycast, Lovable, Cohere, Mistral, ElevenLabs, X.AI, Spotify, Webflow, Sanity, PostHog, Sentry, MongoDB, ClickHouse, Cal, Replicate, Clay, Composio, Xiaohongshu… — plus 57 design skills sourced from awesome-design-skills.

4 · The interactive question form prevents 80% of redirects.

OD's prompt stack hard-codes a RULE 1: every fresh design brief begins with a <question-form id="discovery"> instead of code. Surface · audience · tone · brand context · scale · constraints. A long brief still leaves design decisions open — visual tone, color stance, scale — exactly the things the form locks down in 30 seconds. The cost of a wrong direction is one chat round, not one finished deck.

This is the Junior-Designer mode distilled from huashu-design: batch the questions up front, show something visible early (even a wireframe with grey blocks), let the user redirect cheaply. Combined with the brand-asset protocol (locate · download · grep hex · write brand-spec.md · vocalise), it's the single biggest reason output stops feeling like AI freestyle and starts feeling like a designer who paid attention before painting.

5 · The daemon makes the agent feel like it's on your laptop, because it is.

The daemon spawns the CLI with cwd set to the project's artifact folder under .od/projects/<id>/. The agent gets Read, Write, Bash, WebFetch — real tools against a real filesystem. It can Read the skill's assets/template.html, grep your CSS for hex values, write a brand-spec.md, drop generated images, and produce .pptx / .zip / .pdf files that show up in the file workspace as download chips when the turn ends. Sessions, conversations, messages, tabs persist in a local SQLite DB — pop the project open tomorrow and the agent's todo card is right where you left it.

6 · The prompt stack is the product.

What you compose at send time isn't "system + user". It's:

DISCOVERY directives  (turn-1 form, turn-2 brand branch, TodoWrite, 5-dim critique)
  + identity charter   (OFFICIAL_DESIGNER_PROMPT, anti-AI-slop, junior-pass)
  + active DESIGN.md   (72 systems available)
  + active SKILL.md    (31 skills available)
  + project metadata   (kind, fidelity, speakerNotes, animations, inspiration ids)
  + skill side files   (auto-injected pre-flight: read assets/template.html + references/*.md)
  + (deck kind, no skill seed) DECK_FRAMEWORK_DIRECTIVE   (nav / counter / scroll / print)

Every layer is composable. Every layer is a file you can edit. Read apps/daemon/src/prompts/system.ts and apps/daemon/src/prompts/discovery.ts to see the actual contract.
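
To make the layering concrete, here is a minimal sketch of composing such a stack in order while skipping layers that do not apply. PromptLayer, composeStack, the "## name" heading format, and the sample bodies are all assumptions for illustration; the actual contract lives in the prompt files named above.

```typescript
// Hypothetical layer model: the layer names echo the stack listed above, but
// the function and its output format are illustrative only.
interface PromptLayer {
  name: string;
  body: string | null; // null means the layer does not apply to this turn
}

// Join the applicable layers in order, one headed block per layer.
function composeStack(layers: PromptLayer[]): string {
  return layers
    .filter((layer) => layer.body !== null && layer.body.trim().length > 0)
    .map((layer) => "## " + layer.name + "\n" + (layer.body as string).trim())
    .join("\n\n");
}

const composedPrompt = composeStack([
  { name: "discovery", body: "Ask the turn-1 form first." },
  { name: "design-system", body: "  Use the active DESIGN.md tokens.  " },
  { name: "deck-framework", body: null }, // skipped: not a deck project
]);
```

Because each layer is just a named string, swapping the active design system or skill only changes one entry in the array.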

Architecture

┌────────────────────── browser (Next.js 16) ──────────────────────┐
│  chat · file workspace · iframe preview · settings · imports     │
└──────────────┬───────────────────────────────────┬───────────────┘
               │ /api/* (rewritten in dev)          │
               ▼                                    ▼
   ┌──────────────────────────────────┐   /api/proxy/{provider}/stream (SSE)
   │  Local daemon (Express + SQLite) │   ─→ any OpenAI-compat
   │                                  │       endpoint (BYOK)
   │  /api/agents          /api/skills│       w/ SSRF blocking
   │  /api/design-systems  /api/projects/…
   │  /api/chat (SSE)      /api/proxy/{provider}/stream (SSE)
   │  /api/templates       /api/import/claude-design
   │  /api/artifacts/save  /api/artifacts/lint
   │  /api/upload          /api/projects/:id/files…
   │  /artifacts (static)  /frames (static)
   │
   │  optional: sidecar IPC at /tmp/open-design/ipc/<ns>/<app>.sock
   │  (STATUS · EVAL · SCREENSHOT · CONSOLE · CLICK · SHUTDOWN)
   └─────────┬────────────────────────┘
             │ spawn(cli, [...], { cwd: .od/projects/<id> })
             ▼
   ┌──────────────────────────────────────────────────────────────────┐
   │  claude · codex · devin (ACP) · gemini · opencode · cursor-agent │
   │  qwen · qoder · copilot · hermes (ACP) · kimi (ACP) · pi (RPC) · kiro (ACP) · kilo (ACP) · vibe (ACP) · deepseek  │
   │  reads SKILL.md + DESIGN.md, writes artifacts to disk            │
   └──────────────────────────────────────────────────────────────────┘

Layer	Stack
Frontend	Next.js 16 App Router + React 18 + TypeScript, Vercel-deployable
Daemon	Node 24 · Express · SSE streaming · `better-sqlite3`; tables: `projects` · `conversations` · `messages` · `tabs` · `templates`
Agent transport	`child_process.spawn`; typed-event parsers for `claude-stream-json` (Claude Code), `qoder-stream-json` (Qoder CLI), `copilot-stream-json` (Copilot), `json-event-stream` per-CLI parsers (Codex / Gemini / OpenCode / Cursor Agent), `acp-json-rpc` (Devin / Hermes / Kimi / Kiro / Kilo / Mistral Vibe via Agent Client Protocol), `pi-rpc` (Pi via stdio JSON-RPC), `plain` (Qwen Code / DeepSeek TUI)
BYOK proxy	`POST /api/proxy/{anthropic,openai,azure,google}/stream` → provider-specific upstream APIs, normalized `delta/end/error` SSE; allows loopback local LLM providers, rejects non-loopback private/link-local/CGNAT/multicast/reserved hosts, and disables upstream redirects at the daemon edge
Storage	Plain files in `.od/projects/<id>/` + SQLite at `.od/app.sqlite` + credentials at `.od/media-config.json` (gitignored, auto-created). `OD_DATA_DIR=<dir>` relocates all daemon data (used for test isolation and read-only-install setups); `OD_MEDIA_CONFIG_DIR=<dir>` further narrows the override to just `media-config.json` for setups that want to keep API keys outside the data dir
Preview	Sandboxed iframe via `srcdoc` + per-skill `<artifact>` parser (`apps/web/src/artifacts/parser.ts`)
Export	HTML (inline assets) · PDF (browser print, deck-aware) · PPTX (agent-driven via skill) · ZIP (archiver) · Markdown
Lifecycle	`pnpm tools-dev start \| stop \| run \| status \| logs \| inspect \| check`; ports via `--daemon-port` / `--web-port`, namespaces via `--namespace`
Desktop (optional)	Electron shell — discovers the web URL through sidecar IPC, no port guessing; same `STATUS`/`EVAL`/`SCREENSHOT`/`CONSOLE`/`CLICK`/`SHUTDOWN` channel powers `tools-dev inspect desktop …` for E2E
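
The BYOK proxy row above allows loopback targets and rejects other non-public ranges. A minimal sketch of that kind of check for IPv4 literals follows; classifyIpv4 and its verdict names are hypothetical, the sketch ignores IPv6 and DNS resolution, and the daemon's real rules may differ in detail.

```typescript
type HostVerdict = "loopback" | "blocked" | "public" | "invalid";

// Parse a dotted-quad IPv4 literal; returns null for anything else.
function parseIpv4(host: string): number[] | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets: number[] = [];
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const value = Number(part);
    if (value > 255) return null;
    octets.push(value);
  }
  return octets;
}

// Classify an already-resolved IPv4 address (hypothetical helper).
function classifyIpv4(host: string): HostVerdict {
  const ip = parseIpv4(host);
  if (ip === null) return "invalid";
  const a = ip[0];
  const b = ip[1];
  if (a === 127) return "loopback"; // 127.0.0.0/8, allowed for local LLMs
  if (a === 0) return "blocked"; // 0.0.0.0/8 "this network"
  if (a === 10) return "blocked"; // private 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return "blocked"; // private 172.16.0.0/12
  if (a === 192 && b === 168) return "blocked"; // private 192.168.0.0/16
  if (a === 169 && b === 254) return "blocked"; // link-local 169.254.0.0/16
  if (a === 100 && b >= 64 && b <= 127) return "blocked"; // CGNAT 100.64.0.0/10
  if (a >= 224) return "blocked"; // multicast 224.0.0.0/4 and reserved 240.0.0.0/4
  return "public";
}
```

A production check would also need to classify the address a hostname resolves to, and to refuse redirects, as the table row notes.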

Quickstart

Download the desktop app (no build required)

The fastest way to try Open Design is the prebuilt desktop app — no Node, no pnpm, no clone:

open-design.ai — official download page
GitHub releases

Run with Docker

Run Open Design without installing Node.js or pnpm locally.

Requirements

Docker Desktop
Docker Compose v2

Verify Docker:

docker compose version

Start Open Design

git clone https://github.com/nexu-io/open-design.git
cd open-design/deploy
docker compose up -d

Open in your browser:

http://localhost:7456

Common Commands

# View logs
docker compose logs -f

# Restart containers
docker compose restart

# Stop containers
docker compose down

# Pull latest image
docker compose pull
docker compose up -d

For advanced Docker configuration and environment variables, see QUICKSTART.md.

Run from source

git clone https://github.com/nexu-io/open-design.git
cd open-design
corepack enable
corepack pnpm --version   # should print 10.33.2
pnpm install
pnpm tools-dev run web
# open the web URL printed by tools-dev

Environment requirements: Node ~24 and pnpm 10.33.x. nvm/fnm are optional helpers only; if you use one, run nvm install 24 && nvm use 24 or fnm install 24 && fnm use 24 before pnpm install.

Windows users can follow docs/windows-troubleshooting.md for the native setup path and a tiny double-click launcher.

For desktop/background startup, fixed-port restarts, and media generation dispatcher checks (OD_BIN, OD_DAEMON_URL, apps/daemon/dist/cli.js), see QUICKSTART.md.

The first load:

Detects which agent CLIs you have on PATH and picks one automatically.
Loads 31 skills + 72 design systems.
Pops the welcome dialog so you can paste an Anthropic key (only needed for the BYOK fallback path).
Auto-creates ./.od/ — the local runtime folder for the SQLite project DB, per-project artifacts, and saved renders. There is no od init step; the daemon mkdirs everything it needs on boot.

Type a prompt, hit Send, watch the question form arrive, fill it, watch the todo card stream, watch the artifact render. Click Save to disk or download as a project ZIP.

First-run state (`./.od/`)

The daemon owns one hidden folder at the repo root. Everything in it is gitignored and machine-local — never commit it.

.od/
├── app.sqlite                 ← projects · conversations · messages · open tabs
├── artifacts/                 ← one-off "Save to disk" renders (timestamped)
└── projects/<id>/             ← per-project working dir, also the agent's cwd

Want to…	Do this
Inspect what's in there	`ls -la .od && sqlite3 .od/app.sqlite '.tables'`
Reset to a clean slate	`pnpm tools-dev stop`, `rm -rf .od`, run `pnpm tools-dev run web` again
Move it elsewhere	`OD_DATA_DIR=<absolute-or-relative-path> pnpm tools-dev run web` — the daemon resolves `~/` and anchors relative paths to the repo root. `OD_MEDIA_CONFIG_DIR=<dir>` narrows the override to just `media-config.json` if you want credentials in a separate location.
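
The OD_DATA_DIR resolution rule in the last row can be sketched as follows. This is an assumption-laden illustration: resolveDataDir is a hypothetical helper, it handles POSIX-style paths only, and the home and repo-root values are passed in rather than read from the environment.

```typescript
// Sketch of the rule "resolve ~/ and anchor relative paths to the repo root".
// POSIX-style paths only; not the daemon's implementation.
function resolveDataDir(value: string, home: string, repoRoot: string): string {
  if (value.indexOf("~/") === 0) {
    return home + "/" + value.slice(2); // expand the leading ~/
  }
  if (value.charAt(0) === "/") {
    return value; // already absolute
  }
  return repoRoot + "/" + value.replace(/^\.\//, ""); // relative: anchor to repo root
}

const resolvedFromHome = resolveDataDir("~/od-data", "/home/me", "/src/open-design");
const resolvedRelative = resolveDataDir("./tmp/od", "/home/me", "/src/open-design");
```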

Migrating a pre-desktop-app `.od/` into the installed Desktop app

If you ran the repo first and only later installed the packaged Desktop app, the two writers point at different roots:

Repo dev-server (pnpm tools-dev start web) writes to <repo-root>/.od/.

Installed Desktop app writes under <appData>/Open Design/namespaces/<channel>/data/, where <appData> is Electron's per-OS app-data base (everything before the Open Design segment that app.getPath("userData") already includes). The channel suffix is platform-specific — the release workflows append -win/-linux:

Platform	`<appData>` (Electron `appData` base)	Stable channel	Beta channel
macOS	`~/Library/Application Support`	`release-stable`	`release-beta`
Windows	`%APPDATA%` (= `%USERPROFILE%\AppData\Roaming`)	`release-stable-win`	`release-beta-win`
Linux	`$XDG_CONFIG_HOME` (default `~/.config`)	`release-stable-linux`	`release-beta-linux`

Example resolved paths:

macOS beta: ~/Library/Application Support/Open Design/namespaces/release-beta/data/
Windows beta: %APPDATA%\Open Design\namespaces\release-beta-win\data\
Linux beta: ~/.config/Open Design/namespaces/release-beta-linux/data/
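
The table and example paths above follow one pattern, which this small sketch reproduces. The function and type names are hypothetical; the appData base is passed in as a string rather than detected, so the output is only as resolved as the input.

```typescript
type DesktopPlatform = "macos" | "windows" | "linux";
type ReleaseChannel = "stable" | "beta";

// Channel namespace per the table above: macOS has no suffix, while Windows
// and Linux append "-win" / "-linux".
function channelNamespace(platform: DesktopPlatform, channel: ReleaseChannel): string {
  const base = "release-" + channel;
  if (platform === "windows") return base + "-win";
  if (platform === "linux") return base + "-linux";
  return base;
}

// Build <appData>/Open Design/namespaces/<channel>/data/ with the platform's
// path separator. `appData` is the per-OS base from the table.
function desktopDataDir(
  platform: DesktopPlatform,
  channel: ReleaseChannel,
  appData: string
): string {
  const sep = platform === "windows" ? "\\" : "/";
  const segments = [appData, "Open Design", "namespaces", channelNamespace(platform, channel), "data"];
  return segments.join(sep) + sep;
}

const macBetaDir = desktopDataDir("macos", "beta", "~/Library/Application Support");
```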

If unsure, inspect the packaged daemon log right after the app boots; it logs the resolved daemonDataRoot.

⚠️ Do this in a clean state. Migration replaces (not merges) the Desktop app's data dir with your repo .od/. Both writers must be fully stopped before copying — quit the Desktop app and stop the repo dev-server. SQLite-WAL needs to flush cleanly on both sides; if either daemon is still running it can write SQLite/WAL pages or project/artifact files mid-snapshot, leaving the staged copy inconsistent. If the Desktop app already has projects you care about, decide which side is authoritative before continuing — the steps below back up the Desktop's current data/ to a sibling but do not merge.

Option A: one-shot auto-migration via `OD_LEGACY_DATA_DIR`

Use this when the Desktop app's data/ is still empty, which is the typical state right after the upgrade that surfaced #710. Quit the Desktop app first (so its daemon is not holding app.sqlite), then re-launch with OD_LEGACY_DATA_DIR pointed at your old repo .od/. The daemon stages your payload into a sibling tmp directory and only promotes it into data/ on success; on any failure the staging directory is removed so the next boot retries cleanly.

The daemon refuses, with a visible startup error, when:

the path in OD_LEGACY_DATA_DIR does not contain app.sqlite (typo, deleted source, wrong path), or
the Desktop's data/ already contains any of app.sqlite, projects/, artifacts/, media-config.json, etc. SQLite/WAL pairs and project trees cannot be safely interleaved, so the daemon refuses to merge instead of silently corrupting either side. If the Desktop has already booted and seeded its own data/, use Option B and decide explicitly which side wins.

A .migrated-from marker is written on success so subsequent boots no-op.
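
The refusal rules and the marker can be summarized as a small decision function. This is a sketch under stated assumptions, not the daemon's migration code: decideMigration is hypothetical, the blocking list contains only the entries named above (the real list is longer, per the "etc."), and both the marker living in the Desktop data dir and the marker check running first are assumptions.

```typescript
type MigrationDecision =
  | "skip-already-migrated"
  | "refuse-bad-source"
  | "refuse-target-not-empty"
  | "migrate";

// Only the entries the text names explicitly; the real list may be longer.
const BLOCKING_ENTRIES: string[] = ["app.sqlite", "projects", "artifacts", "media-config.json"];

// `sourceEntries` / `targetEntries` are top-level names in the legacy .od/
// and in the Desktop data/ directory respectively.
function decideMigration(sourceEntries: string[], targetEntries: string[]): MigrationDecision {
  // Assumption: the marker is checked first, so later boots no-op.
  if (targetEntries.indexOf(".migrated-from") !== -1) return "skip-already-migrated";
  if (sourceEntries.indexOf("app.sqlite") === -1) return "refuse-bad-source";
  for (const entry of BLOCKING_ENTRIES) {
    if (targetEntries.indexOf(entry) !== -1) return "refuse-target-not-empty";
  }
  return "migrate";
}
```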

Quit the Desktop app first, then re-launch with this env set. The launcher must put the variable into the app process environment, not just the shell that runs open / xdg-open.

macOS (LaunchServices does not inherit shell env, so use the direct binary):

OD_LEGACY_DATA_DIR="/path/to/old/repo/.od" \
  "/Applications/Open Design.app/Contents/MacOS/Open Design"

If you prefer the Dock launcher, set the variable in launchctl first, open the app, then unset it:

launchctl setenv OD_LEGACY_DATA_DIR "/path/to/old/repo/.od"
open "/Applications/Open Design.app"
# After the migration log line appears:
launchctl unsetenv OD_LEGACY_DATA_DIR

Linux (run the binary directly so the env var actually reaches it):

OD_LEGACY_DATA_DIR="/path/to/old/repo/.od" /path/to/open-design
# (e.g. the AppImage you launched, or the unpacked binary under /opt)

Windows (PowerShell):

$env:OD_LEGACY_DATA_DIR="C:\path\to\old\repo\.od"
& "$env:LOCALAPPDATA\Programs\Open Design\Open Design.exe"

The daemon log records [od-migrate] migration complete: copied N entries (...). After the first launch you can clear the environment variable; the marker already prevents re-migration on later runs.

Option B: manual copy

Use this to carry your existing projects, SQLite, artifacts, and media-config.json over to the Desktop app when Option A is not viable (the Desktop already has its own data and you want to replace it explicitly).

macOS / Linux (bash):

set -euo pipefail
# 1. Stop both writers so the source and target are quiescent.
#    - Quit the Desktop app (Cmd+Q on macOS, File → Exit on Linux).
#    - Stop the repo dev-server: `pnpm tools-dev stop` from the repo root.
# 2. Set REPO and APP_DATA to your actual paths; the example below is macOS + beta.
REPO="/path/to/open-design"
APP_DATA="$HOME/Library/Application Support/Open Design/namespaces/release-beta/data"

# 3. Preflight: see what (if anything) the Desktop app already has.
ls "$APP_DATA/projects" 2>/dev/null && echo "Desktop already has projects, confirm this is a replace, not a merge."

# 4. Stage into a sibling first, then atomically swap into place. `set -e` plus
#    the explicit rsync exit check guarantee a non-zero copy aborts before any
#    `mv` runs, so the Desktop data dir cannot end up half-populated.
STAGE="${APP_DATA}.staged-$(date +%F-%H%M)"
mkdir -p "$STAGE"
rsync -a --exclude='backup-*' "$REPO/.od/" "$STAGE/" || { echo "rsync failed, aborting before swap"; exit 1; }

# 5. Back up the Desktop's current data (if any), then promote the staged copy.
if [ -e "$APP_DATA" ]; then
  mv "$APP_DATA" "${APP_DATA}.fresh-baseline-$(date +%F-%H%M)"
fi
mv "$STAGE" "$APP_DATA"

# 6. Relaunch the Desktop app. The daemon applies forward schema changes on boot.

Windows (PowerShell):

$ErrorActionPreference = 'Stop'
# 1. Stop both writers so the source and target are quiescent.
#    - Quit the Desktop app (File > Exit).
#    - Stop the repo dev-server: `pnpm tools-dev stop` from the repo root.
# 2. Set $Repo and $AppData to your actual paths; the example below is stable channel.
$Repo    = 'C:\path\to\open-design'
$AppData = Join-Path $env:APPDATA 'Open Design\namespaces\release-stable-win\data'

# 3. Preflight: see what (if anything) the Desktop app already has.
if (Test-Path (Join-Path $AppData 'projects')) {
  Write-Host 'Desktop already has projects, confirm this is a replace, not a merge.'
}

# 4. Stage into a sibling first. Robocopy /MIR mirrors source to staging, and
#    its exit codes >= 8 are real errors (0..7 are success/info), so we guard
#    explicitly before promoting.
$Stamp = Get-Date -Format 'yyyy-MM-dd-HHmm'
$Stage = "$AppData.staged-$Stamp"
robocopy "$Repo\.od" $Stage /MIR /XD 'backup-*' | Out-Null
if ($LASTEXITCODE -ge 8) { throw "robocopy failed (exit $LASTEXITCODE), aborting before swap" }

# 5. Back up the Desktop's current data, then promote the staged copy.
if (Test-Path $AppData) { Rename-Item $AppData "$AppData.fresh-baseline-$Stamp" }
Rename-Item $Stage $AppData

# 6. Relaunch the Desktop app. The daemon applies forward schema changes on boot.

If anything looks wrong after relaunch, restore the original Desktop data by deleting $APP_DATA (or $AppData on Windows) and renaming the .fresh-baseline-* directory back into place.

⚠️ Schema migrations are forward-only. The daemon applies CREATE TABLE IF NOT EXISTS / ALTER TABLE changes on boot; there is no version guard. After migrating, do not open the same data dir with an older repo checkout — unsupported columns or behavior mismatches can leave the workspace inconsistent. Back up app.sqlite* before the first launch with the new app.

⚠️ Advanced: sharing one data dir between repo dev-server and Desktop app. Pointing both at the same dir via OD_DATA_DIR is possible but only safe one-at-a-time. The daemon opens app.sqlite in WAL mode and writes uncoordinated files under projects/ and artifacts/; running both writers concurrently can corrupt SQLite or clobber artifacts. Always stop the Desktop app before starting the dev-server, and stop the dev-server before opening the Desktop app:
OD_DATA_DIR="$HOME/Library/Application Support/Open Design/namespaces/release-beta/data" \
  pnpm tools-dev start web

Full file map, scripts, and troubleshooting → QUICKSTART.md.

Running the Project

Open Design can run as a web app in your browser or as an Electron desktop application. Both modes share the same local daemon + web architecture.

Web / Localhost (Default)

# Foreground mode — keeps the lifecycle command in the foreground (logs written to files)
pnpm tools-dev run web

# View recent logs:
pnpm tools-dev logs

# Background mode — daemon + web run as background processes
pnpm tools-dev start web

By default, tools-dev binds to available ephemeral ports and prints the actual URLs on startup. To use fixed ports from a stopped state:

pnpm tools-dev run web --daemon-port 17456 --web-port 17573

If daemon/web are already running, use restart to switch ports in the existing session:

pnpm tools-dev restart --daemon-port 17456 --web-port 17573

Desktop / Electron

# Start daemon + web + desktop in the background
pnpm tools-dev

# Check desktop status
pnpm tools-dev inspect desktop status

# Take a screenshot of the desktop app
pnpm tools-dev inspect desktop screenshot --path /tmp/open-design.png

The desktop app discovers the web URL automatically via sidecar IPC — no port guessing required.

Other Useful Commands

Command	What it does
`pnpm tools-dev status`	Show running sidecar statuses
`pnpm tools-dev logs`	Show daemon/web/desktop log tails
`pnpm tools-dev stop`	Stop all running sidecars
`pnpm tools-dev restart`	Stop then restart all sidecars
`pnpm tools-dev check`	Status + recent logs + common diagnostics

For fixed-port restarts, background startup, and full troubleshooting see QUICKSTART.md.

Nix

A flake is published at the repo root. Home Manager is the recommended path for individual developers; a NixOS module is also exposed for shared/server installs. See nix/README.md for the full surface (data dir, secrets, webFrontend vs. bringing your own server, OD_DAEMON_URL).

# Home Manager
inputs.open-design.url = "github:nexu-io/open-design";
# then: imports = [ inputs.open-design.homeManagerModules.default ];

nix run github:nexu-io/open-design       # boot the daemon (`od`) without installing

For developers, a Nix dev shell is available and can be used with direnv too:

nix develop   # dev shell with required dependencies to work on Open Design

Use Open Design from your coding agent

Open Design ships a stdio MCP server. Wire it into Claude Code, Codex, Cursor, VS Code, Antigravity, Zed, Windsurf, or any MCP-compatible client and the agent in another repo can read files from your local Open Design projects directly. Replaces the export-then-attach loop. When the agent calls search_files, get_file, or get_artifact without a project argument, the MCP defaults to whatever project (and file) you have open in Open Design right now, so prompts like "build this in my app" or "match these styles" just work.

Why MCP? Exporting and re-attaching a zip every design iteration breaks flow. The MCP server exposes your design source directly -- tokens CSS, JSX components, entry HTML -- as a structured API the agent can query by name. The agent always sees the live file, not a stale copy from the last export.

Open Settings → MCP server in the Open Design app for a per-client install flow. The panel bakes the absolute path to your node binary and the daemon's built cli.js into every snippet, so it works on a fresh source clone where od is not on your PATH. Cursor gets a one-click deeplink; the rest get a copy-paste JSON snippet in the schema their config file expects (Claude Code includes a claude mcp add-json one-liner so you do not have to hand-edit ~/.claude.json). Restart or reload your client after install for the server to show up.

The daemon must be running locally for MCP tool calls to succeed. If the agent was started before Open Design, restart the agent after Open Design is up so it can reach the live daemon. Tool calls made while the daemon is offline return a clear "daemon not reachable" error rather than a crash.

Security model. The MCP server is read-only; it exposes file reads, file metadata, and search -- nothing that writes to disk or calls an external service. It runs as a child process of the coding agent over stdio, so any MCP client you register inherits read access to your local Open Design projects. Treat it like installing a VS Code extension: only register clients you trust. The daemon binds to 127.0.0.1 by default; LAN-wide exposure requires an explicit OD_BIND_HOST opt-in. If you also front the SPA with a non-loopback static server, set OD_ALLOWED_ORIGINS=<origin1>,<origin2>,... (comma-separated scheme://host[:port] entries) so the daemon's same-origin gate accepts API writes from those origins on both the Origin and Host checks; without it the browser will see 403s on every PUT/POST (Caddy v2 reverse_proxy preserves the original Host header upstream by default, so loopback alone is not enough). Connector-credential and live-artifact preview routes stay loopback-only regardless.
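
As an illustration of the OD_ALLOWED_ORIGINS format described above, the sketch below parses a comma-separated list of scheme://host[:port] entries and checks a request origin against it. The helper names are hypothetical, restricting schemes to http/https and silently dropping malformed entries are assumptions, and bracketed IPv6 hosts are not handled; the daemon's actual gate also checks the Host header.

```typescript
// Accepts scheme://host[:port] with no path, query, or trailing slash.
// Assumption: only http and https schemes are meaningful here.
const ORIGIN_PATTERN = /^https?:\/\/[A-Za-z0-9.-]+(:\d{1,5})?$/;

// Split, normalize, and keep only well-formed entries (hypothetical policy).
function parseAllowedOrigins(raw: string): string[] {
  return raw
    .split(",")
    .map((entry) => entry.trim().toLowerCase())
    .filter((entry) => ORIGIN_PATTERN.test(entry));
}

// Exact-match check against the parsed allow-list.
function isOriginAllowed(requestOrigin: string, allowed: string[]): boolean {
  return allowed.indexOf(requestOrigin.trim().toLowerCase()) !== -1;
}

const allowedOriginList = parseAllowedOrigins(
  "https://design.example.com, http://10.0.0.5:8080, not-a-url"
);
```

Exact string matching after normalization keeps the gate simple: an origin either appears in the list or the write is refused.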

Repository structure

open-design/
├── README.md                      ← this file
├── README.de.md                   ← Deutsch
├── README.ru.md                   ← Русский
├── README.zh-CN.md                ← 简体中文
├── QUICKSTART.md                  ← run / build / deploy guide
├── package.json                   ← pnpm workspace, single bin: od
│
├── apps/
│   ├── daemon/                    ← Node + Express, the only server
│   │   ├── src/                   ← TypeScript daemon source
│   │   │   ├── cli.ts             ← `od` bin source, compiled to dist/cli.js
│   │   │   ├── server.ts          ← /api/* routes (projects, chat, files, exports)
│   │   │   ├── agents.ts          ← PATH scanner + per-CLI argv builders
│   │   │   ├── claude-stream.ts   ← streaming JSON parser for Claude Code stdout
│   │   │   ├── skills.ts          ← SKILL.md frontmatter loader
│   │   │   └── db.ts              ← SQLite schema (projects/messages/templates/tabs)
│   │   ├── sidecar/               ← tools-dev daemon sidecar wrapper
│   │   └── tests/                 ← daemon package tests
│   │
│   └── web/                       ← Next.js 16 App Router + React client
│       ├── app/                   ← App Router entrypoints
│       ├── next.config.ts         ← dev rewrites + prod static export to out/
│       └── src/                   ← React + TypeScript client modules
│           ├── App.tsx            ← routing, bootstrap, settings
│           ├── components/        ← chat, composer, picker, preview, sketch, …
│           ├── prompts/
│           │   ├── system.ts      ← composeSystemPrompt(base, skill, DS, metadata)
│           │   ├── discovery.ts   ← turn-1 form + turn-2 branch + 5-dim critique
│           │   └── directions.ts  ← 5 visual directions × OKLch palette + font stack
│           ├── artifacts/         ← streaming <artifact> parser + manifests
│           ├── runtime/           ← iframe srcdoc, markdown, export helpers
│           ├── providers/         ← daemon SSE + BYOK API transports
│           └── state/             ← config + projects (localStorage + daemon-backed)
│
├── e2e/                           ← Playwright UI + external integration/Vitest harness
│
├── packages/
│   ├── contracts/                 ← shared web/daemon app contracts
│   ├── sidecar-proto/             ← Open Design sidecar protocol contract
│   ├── sidecar/                   ← generic sidecar runtime primitives
│   └── platform/                  ← generic process/platform primitives
│
├── skills/                        ← 31 SKILL.md skill bundles (27 prototype + 4 deck)
│   ├── web-prototype/             ← default for prototype mode
│   ├── saas-landing/  dashboard/  pricing-page/  docs-page/  blog-post/
│   ├── mobile-app/  mobile-onboarding/  gamified-app/
│   ├── email-marketing/  social-carousel/  magazine-poster/
│   ├── motion-frames/  sprite-animation/  digital-eguide/  dating-web/
│   ├── critique/  tweaks/  wireframe-sketch/
│   ├── pm-spec/  team-okrs/  meeting-notes/  kanban-board/
│   ├── eng-runbook/  finance-report/  invoice/  hr-onboarding/
│   ├── simple-deck/  replit-deck/  weekly-update/   ← deck mode
│   └── guizang-ppt/               ← bundled magazine-web-ppt (default for deck)
│       ├── SKILL.md
│       ├── assets/template.html   ← seed
│       └── references/{themes,layouts,components,checklist}.md
│
├── design-systems/                ← 72 DESIGN.md systems
│   ├── default/                   ← Neutral Modern (starter)
│   ├── warm-editorial/            ← Warm Editorial (starter)
│   ├── linear-app/  vercel/  stripe/  airbnb/  notion/  cursor/  apple/  …
│   └── README.md                  ← catalog overview
│
├── assets/
│   └── frames/                    ← shared device frames (used cross-skill)
│       ├── iphone-15-pro.html
│       ├── android-pixel.html
│       ├── ipad-pro.html
│       ├── macbook.html
│       └── browser-chrome.html
│
├── templates/
│   ├── deck-framework.html        ← deck baseline (nav / counter / print)
│   └── kami-deck.html             ← kami-flavored deck starter (parchment / ink-blue serif)
│
├── scripts/
│   └── sync-design-systems.ts     ← re-import upstream awesome-design-md tarball
│
├── docs/
│   ├── spec.md                    ← product spec, scenarios, differentiation
│   ├── architecture.md            ← topologies, data flow, components
│   ├── skills-protocol.md         ← extended SKILL.md od: frontmatter
│   ├── agent-adapters.md          ← per-CLI detection + dispatch
│   ├── modes.md                   ← prototype / deck / template / design-system
│   ├── references.md              ← long-form provenance
│   ├── roadmap.md                 ← phased delivery
│   ├── schemas/                   ← JSON schemas
│   └── examples/                  ← canonical artifact examples
│
└── .od/                           ← runtime data, gitignored, auto-created
    ├── app.sqlite                 ← projects / conversations / messages / tabs
    ├── projects/<id>/             ← per-project working folder (agent's cwd)
    └── artifacts/                 ← saved one-off renders

Design Systems

The 72 design systems library — style guide spread

72 systems out of the box, each as a single DESIGN.md:

Full catalog

AI & LLM — claude · cohere · mistral-ai · minimax · together-ai · replicate · runwayml · elevenlabs · ollama · x-ai

Developer Tools — cursor · vercel · linear-app · framer · expo · clickhouse · mongodb · supabase · hashicorp · posthog · sentry · warp · webflow · sanity · mintlify · lovable · composio · opencode-ai · voltagent

Productivity — notion · figma · miro · airtable · superhuman · intercom · zapier · cal · clay · raycast

Fintech — stripe · coinbase · binance · kraken · mastercard · revolut · wise

E-Commerce — shopify · airbnb · uber · nike · starbucks · pinterest

Media — spotify · playstation · wired · theverge · meta

Automotive — tesla · bmw · ferrari · lamborghini · bugatti · renault

Other — apple · ibm · nvidia · vodafone · sentry · resend · spacex

Starters — default (Neutral Modern) · warm-editorial

The product-system library is imported via scripts/sync-design-systems.ts from VoltAgent/awesome-design-md. Re-run to refresh. The 57 design skills are sourced from bergside/awesome-design-skills and added directly in design-systems/.

Visual directions

When the user has no brand spec, the agent emits a second form with five curated directions — the OD adaptation of huashu-design's "5 schools × 20 design philosophies" fallback. Each direction is a deterministic spec — palette in OKLch, font stack, layout posture cues, references — that the agent binds verbatim into the seed template's :root. One radio click → a fully specified visual system. No improvisation, no AI-slop.

Direction	Mood	Refs
Editorial — Monocle / FT	Print magazine, ink + cream + warm rust	Monocle · FT Weekend · NYT Magazine
Modern minimal — Linear / Vercel	Cool, structured, minimal accent	Linear · Vercel · Stripe
Tech utility	Information density, monospace, terminal	Bloomberg · Bauhaus tools
Brutalist	Raw, oversized type, no shadows, harsh accents	Bloomberg Businessweek · Achtung
Soft warm	Generous, low contrast, peachy neutrals	Notion marketing · Apple Health

Full spec → apps/daemon/src/prompts/directions.ts.
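
To illustrate what "binds verbatim into the seed template's :root" could look like, here is a small sketch. The `Direction` shape, the `renderRootBlock` helper, and every color and font value are placeholders invented for illustration; the shipped spec in directions.ts may be structured differently.

```typescript
// Hypothetical shape; the real direction spec may differ.
interface Direction {
  id: string;
  palette: Record<string, string>; // token name -> OKLch color
  fontStack: { display: string; body: string };
}

// Renders a direction as a :root block of CSS custom properties.
// Pure string building with no randomness, so the same direction
// always yields the same CSS.
function renderRootBlock(direction: Direction): string {
  const lines: string[] = [];
  for (const [token, value] of Object.entries(direction.palette)) {
    lines.push(`  --${token}: ${value};`);
  }
  lines.push(`  --font-display: ${direction.fontStack.display};`);
  lines.push(`  --font-body: ${direction.fontStack.body};`);
  return `:root {\n${lines.join("\n")}\n}`;
}

// Placeholder values only, not the shipped Editorial palette.
const editorialExample: Direction = {
  id: "editorial",
  palette: {
    bg: "oklch(0.97 0.01 85)",
    ink: "oklch(0.2 0.02 60)",
    accent: "oklch(0.55 0.14 40)",
  },
  fontStack: { display: "Georgia, serif", body: "system-ui, sans-serif" },
};

console.log(renderRootBlock(editorialExample));
```

Because the agent only copies values from the chosen direction, the palette and fonts do not depend on what the model happens to improvise.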

Media generation

OD doesn't stop at code. The same chat surface that produces <artifact> HTML also drives image, video, and audio generation, with model adapters wired into the daemon's media pipeline (apps/daemon/src/media-models.ts, apps/web/src/media/models.ts). Every render lands as a real file in the project workspace — .png for image, .mp4 for video — and shows up as a download chip when the turn ends.

Three model families carry the load today:

Surface	Model	Provider	What it's for
Image	`gpt-image-2`	Azure / OpenAI	Posters, profile avatars, illustrated maps, infographics, magazine-style social cards, photo restoration, exploded-view product art
Video	`seedance-2.0`	ByteDance Volcengine	15s cinematic t2v + i2v with audio — narrative shorts, character close-ups, product films, MV-style choreography
Video	`hyperframes-html`	HeyGen / OSS	HTML→MP4 motion graphics — product reveals, kinetic typography, data charts, social overlays, logo outros, TikTok-style verticals with karaoke captions

A growing prompt gallery at prompt-templates/ ships 93 ready-to-replicate prompts — 43 image (prompt-templates/image/*.json), 39 Seedance (prompt-templates/video/*.json excluding hyperframes-*), 11 HyperFrames (prompt-templates/video/hyperframes-*.json). Each carries a preview thumbnail, the prompt body verbatim, the target model, the aspect ratio, and a source block for license + attribution. The daemon serves them at GET /api/prompt-templates, the web app surfaces them as a card grid in the Image templates and Video templates tabs of the entry view; one click drops a prompt into the composer with the right model preselected.
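
The three-way split above follows directly from file paths, which the sketch below makes explicit. `classifyTemplate` and `countByKind` are hypothetical helpers, and the file names in the example are made up; this is not how the daemon necessarily groups templates.

```typescript
type TemplateKind = "image" | "seedance" | "hyperframes" | "other";

// Classifies a template by its path, following the globs quoted above:
// image/*.json, video/hyperframes-*.json, and the remaining video/*.json.
function classifyTemplate(path: string): TemplateKind {
  const match = /^prompt-templates\/(image|video)\/([^/]+)\.json$/.exec(path);
  if (!match) return "other";
  const [, dir, name] = match;
  if (dir === "image") return "image";
  return name.startsWith("hyperframes-") ? "hyperframes" : "seedance";
}

// Tallies a list of paths per kind.
function countByKind(paths: string[]): Record<TemplateKind, number> {
  const counts: Record<TemplateKind, number> = {
    image: 0,
    seedance: 0,
    hyperframes: 0,
    other: 0,
  };
  for (const p of paths) counts[classifyTemplate(p)] += 1;
  return counts;
}

// Hypothetical file names, for illustration only.
const examplePaths = [
  "prompt-templates/image/example-poster.json",
  "prompt-templates/video/example-short-film.json",
  "prompt-templates/video/hyperframes-example-reveal.json",
];
console.log(countByKind(examplePaths));
```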

gpt-image-2 — image gallery (sample of 43)

3D Stone Staircase Evolution Infographic: 3-step infographic, hewn-stone aesthetic
Illustrated City Food Map: Editorial hand-illustrated travel poster
Cinematic Elevator Scene: Single-frame editorial fashion still
Cyberpunk Anime Portrait: Profile avatar, neon face text
Glamorous Woman in Black Portrait: Editorial studio portrait

Full set → prompt-templates/image/. Sources: most pull from YouMind-OpenLab/awesome-gpt-image-prompts (CC-BY-4.0) with author attribution preserved per template.

Seedance 2.0 — video gallery (sample of 39)

Music Podcast & Guitar Technique: 4K cinematic studio film
Emotional Face Close-up: Cinematic micro-expression study
Luxury Supercar Cinematic: Narrative product film
Forbidden City Cat Satire: Stylised satire short
Japanese Romance Short Film: 15s Seedance 2.0 narrative

Click any thumbnail to play the actual rendered MP4. Full set → prompt-templates/video/ (the *-seedance-* and Cinematic-tagged entries). Sources: YouMind-OpenLab/awesome-seedance-2-prompts (CC-BY-4.0) with original tweet links and author handles preserved.

HyperFrames — HTML→MP4 motion graphics (11 ready-to-replicate templates)

heygen-com/hyperframes is HeyGen's open-source agent-native video framework — you (or the agent) write HTML + CSS + GSAP, HyperFrames renders it to a deterministic MP4 via headless Chrome + FFmpeg. Open Design ships HyperFrames as a first-class video model (hyperframes-html) wired into the daemon dispatch, plus the skills/hyperframes/ skill that teaches the agent the timeline contract, scene-transition rules, audio-reactive patterns, captions/TTS, and the catalog blocks (npx hyperframes add <slug>).

Eleven hyperframes prompts ship under prompt-templates/video/hyperframes-*.json, each one a concrete brief that produces a specific archetype:

5s minimal product reveal · 16:9 · push-in title card with shader transition
30s SaaS product promo · 16:9 · Linear/ClickUp-style with UI 3D reveals
TikTok karaoke talking-head · 9:16 · TTS + word-synced captions
30s brand sizzle reel · 16:9 · beat-synced kinetic typography, audio-reactive
Animated bar-chart race · 16:9 · NYT-style data infographic
Flight map (origin → dest) · 16:9 · Apple-style cinematic route reveal
4s cinematic logo outro · 16:9 · piece-by-piece assembly + bloom
$0 → $10K money counter · 9:16 · Apple-style hype with green flash + burst
3-phone app showcase · 16:9 · floating phones with feature callouts
Social overlay stack · 9:16 · X · Reddit · Spotify · Instagram in sequence
Website-to-video pipeline · 16:9 · captures site at 3 viewports + transitions

The pattern is the same as the rest: pick a template, edit the brief, send. The agent reads the bundled skills/hyperframes/SKILL.md, authors the composition, and ships an MP4. That skill carries the OD-specific render workflow: composition source files go into a .hyperframes-cache/ so they don't clutter the file workspace, the daemon dispatches npx hyperframes render to dodge the macOS sandbox-exec / Puppeteer hang, and only the final .mp4 lands as a project chip. Catalog block thumbnails are © HeyGen and served from their CDN; the OSS framework itself is Apache-2.0.

Also wired but not surfaced as templates yet: Kling 2.0 / 1.6 / 1.5, Veo 3 / Veo 2, Sora 2 / Sora 2-Pro (via Fal), MiniMax video-01 — all live in VIDEO_MODELS (apps/web/src/media/models.ts). Suno v5 / v4.5, Udio v2, Lyria 2 (music) and gpt-4o-mini-tts, MiniMax TTS (speech) cover the audio surface. Templates for these are open contributions — drop a JSON into prompt-templates/video/ or prompt-templates/audio/ and it shows up in the picker.

Beyond chat — what else ships

The chat / artifact loop gets the spotlight, but a handful of less-visible capabilities are already wired and worth knowing before you compare OD to anything else:

Claude Design ZIP import. Drop an export from claude.ai onto the welcome dialog. POST /api/import/claude-design extracts it into a real .od/projects/<id>/, opens the entry file as a tab, and stages a continue-where-Anthropic-left-off prompt for your local agent. No re-prompting, no "ask the model to re-create what we just had". (apps/daemon/src/server.ts — /api/import/claude-design)
Multi-provider BYOK proxy. POST /api/proxy/{anthropic,openai,azure,google}/stream takes { baseUrl, apiKey, model, messages }, builds the provider-specific upstream request, normalizes SSE chunks into delta/end/error, and allows loopback local LLM providers while rejecting non-loopback private, link-local, CGNAT, multicast, reserved, and redirect targets to head off SSRF. OpenAI-compatible covers OpenAI, Azure AI Foundry /openai/v1, DeepSeek, Groq, MiMo, OpenRouter, Ollama, LM Studio, and self-hosted vLLM; Azure OpenAI adds deployment URL + api-version; Google uses Gemini :streamGenerateContent.
User-saved templates. Once you like a render, POST /api/templates snapshots the HTML + metadata into the SQLite templates table. The next project picks it from a "your templates" row in the picker — same surface as the shipped 31, but yours.
Tab persistence. Every project remembers its open files and active tab in the tabs table. Reopen the project tomorrow and the workspace looks exactly the way you left it.
Artifact lint API. POST /api/artifacts/lint runs structural checks on a generated artifact (broken <artifact> framing, missing required side files, stale palette tokens) and returns findings the agent can read back into its next turn. The five-dim self-critique uses this to ground its score in real evidence, not vibes.
Sidecar protocol + desktop automation. Daemon, web, and desktop processes carry typed five-field stamps (app · mode · namespace · ipc · source) and expose a JSON-RPC IPC channel at /tmp/open-design/ipc/<namespace>/<app>.sock. tools-dev inspect desktop status | eval | screenshot drives that channel, so headless E2E works against a real Electron shell without bespoke harnesses (packages/sidecar-proto/, apps/desktop/src/main/).
Windows-friendly spawning. Every adapter that would otherwise blow CreateProcess's ~32 KB argv limit on long composed prompts (Codex, Gemini, OpenCode, Cursor Agent, Qwen, Qoder CLI, Pi) feeds the prompt over stdin instead. Claude Code and Copilot keep -p; the daemon falls back to a temp prompt-file when even that overflows.
Per-namespace runtime data. OD_DATA_DIR and --namespace give you fully isolated .od/-style trees, so Playwright, beta channels, and your real projects never share a SQLite file.
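
The address classes named in the BYOK proxy item above can be sketched as a plain IPv4 classifier. This is an illustration under stated assumptions, not the daemon's guard: `parseIPv4` and `classifyIPv4` are hypothetical names, and the sketch covers IPv4 literals only, with no IPv6 handling and no DNS resolution, both of which a real guard would need.

```typescript
type Verdict = "allow-loopback" | "allow-public" | "reject" | "not-ipv4";

// Parses a dotted-quad string into four octets, or null if malformed.
function parseIPv4(host: string): number[] | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

// Allows loopback (local LLM providers), rejects internal ranges.
function classifyIPv4(host: string): Verdict {
  const ip = parseIPv4(host);
  if (!ip) return "not-ipv4";
  const [a, b] = ip;
  if (a === 127) return "allow-loopback"; // 127.0.0.0/8
  if (a === 10) return "reject"; // 10.0.0.0/8 private
  if (a === 172 && b >= 16 && b <= 31) return "reject"; // 172.16.0.0/12 private
  if (a === 192 && b === 168) return "reject"; // 192.168.0.0/16 private
  if (a === 169 && b === 254) return "reject"; // 169.254.0.0/16 link-local
  if (a === 100 && b >= 64 && b <= 127) return "reject"; // 100.64.0.0/10 CGNAT
  if (a >= 224 && a <= 239) return "reject"; // 224.0.0.0/4 multicast
  if (a === 0 || a >= 240) return "reject"; // 0.0.0.0/8 and 240.0.0.0/4 reserved
  return "allow-public";
}

console.log(classifyIPv4("127.0.0.1"), classifyIPv4("169.254.169.254"));
```

Rejecting redirect targets is a separate step; with the standard fetch API, one way to get that behaviour is to pass `redirect: "error"` so an upstream 3xx fails instead of being followed.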

Anti-AI-slop machinery

The whole machinery below is the huashu-design playbook, ported into OD's prompt-stack and made enforceable per-skill via the side-file pre-flight. Read apps/daemon/src/prompts/discovery.ts for the live wording:

Question form first. Turn 1 is <question-form> only — no thinking, no tools, no narration. The user chooses defaults at radio speed.
Brand-spec extraction. When the user attaches a screenshot or URL, the agent runs a five-step protocol (locate · download · grep hex · codify brand-spec.md · vocalise) before writing CSS. Never guesses brand colors from memory.
Five-dim critique. Before emitting <artifact>, the agent silently scores its output 1–5 across philosophy / hierarchy / execution / specificity / restraint. Anything under 3/5 is a regression — fix and rescore. Two passes is normal.
P0/P1/P2 checklist. Every skill ships a references/checklist.md with hard P0 gates. The agent must pass P0 before emitting.
Slop blacklist. Aggressive purple gradients, generic emoji icons, rounded card with left-border accent, hand-drawn SVG humans, Inter as a display face, invented metrics — explicitly forbidden in the prompt.
Honest placeholders > fake stats. When the agent doesn't have a real number, it writes a literal "—" or a labelled grey block, not "10× faster".
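
The five-dim critique rule above runs inside the agent's prompt, not as daemon code, so the following is only an illustration of the threshold logic. `failingDimensions` and `canEmit` are hypothetical names; the five dimensions and the "under 3/5 is a regression" floor come from the list above.

```typescript
const DIMENSIONS = [
  "philosophy",
  "hierarchy",
  "execution",
  "specificity",
  "restraint",
] as const;

type Dimension = (typeof DIMENSIONS)[number];
type Scores = Record<Dimension, number>; // each scored 1 to 5

// Returns the dimensions that score under the floor (default 3).
function failingDimensions(scores: Scores, floor = 3): Dimension[] {
  return DIMENSIONS.filter((d) => scores[d] < floor);
}

// The artifact may be emitted only when no dimension is a regression.
function canEmit(scores: Scores): boolean {
  return failingDimensions(scores).length === 0;
}

const firstPass: Scores = {
  philosophy: 4,
  hierarchy: 2,
  execution: 3,
  specificity: 5,
  restraint: 1,
};
console.log(failingDimensions(firstPass)); // dimensions to fix before rescoring
```

A score of exactly 3 passes, since the rule only flags scores under 3.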

Comparison

Axis	Claude Design (Anthropic)	Open CoDesign	Open Design
License	Closed	MIT	Apache-2.0
Form factor	Web (claude.ai)	Desktop (Electron)	Web app + local daemon
Deployable on Vercel	❌	❌	✅
Agent runtime	Bundled (Opus 4.7)	Bundled (`pi-ai`)	Delegated to user's existing CLI
Skills	Proprietary	12 custom TS modules + `SKILL.md`	31 file-based `SKILL.md` bundles, droppable
Design system	Proprietary	`DESIGN.md` (v0.2 roadmap)	`DESIGN.md` × 129 systems shipped
Provider flexibility	Anthropic only	7+ via `pi-ai`	16 CLI adapters + OpenAI-compatible BYOK proxy
Init question form	❌	❌	✅ Hard rule, turn 1
Direction picker	❌	❌	✅ 5 deterministic directions
Live todo progress + tool stream	❌	✅	✅ (UX pattern from open-codesign)
Sandboxed iframe preview	❌	✅	✅ (pattern from open-codesign)
Claude Design ZIP import	n/a	❌	✅ `POST /api/import/claude-design` — keep editing where Anthropic left off
Comment-mode surgical edits	❌	✅	🟡 partial — preview element comments + chat attachments; surgical patch reliability still in progress
AI-emitted tweaks panel	❌	✅	🚧 roadmap — dedicated chat-side panel UX is not implemented yet
Filesystem-grade workspace	❌	partial (Electron sandbox)	✅ Real cwd, real tools, persisted SQLite (projects · conversations · messages · tabs · templates)
5-dim self-critique	❌	❌	✅ Pre-emit gate
Artifact lint	❌	❌	✅ `POST /api/artifacts/lint` — findings fed back to the agent
Sidecar IPC + headless desktop	❌	❌	✅ Stamped processes + `tools-dev inspect desktop status | eval | screenshot`
Export formats	Limited	HTML / PDF / PPTX / ZIP / Markdown	HTML / PDF / PPTX (agent-driven) / ZIP / Markdown
PPT skill reuse	N/A	Built-in	`guizang-ppt-skill` drops in (default for deck mode)
Minimum billing	Pro / Max / Team	BYOK	BYOK — paste any OpenAI-compatible `baseUrl`

Supported coding agents

Auto-detected from PATH on daemon boot. No config required. Streaming dispatch lives in apps/daemon/src/agents.ts (AGENT_DEFS); per-CLI parsers live alongside it. Models are populated either by probing <bin> --list-models / <bin> models / ACP handshake, or from a curated fallback list when the CLI doesn't expose a list.

Agent	Bin	Stream format	Argv shape (composed prompt path)
Claude Code	`claude`	`claude-stream-json` (typed events)	`claude -p <prompt> --output-format stream-json --verbose [--include-partial-messages] [--add-dir …] --permission-mode bypassPermissions`
Codex CLI	`codex`	`json-event-stream` + `codex` parser	`codex exec --json --skip-git-repo-check --sandbox workspace-write -c sandbox_workspace_write.network_access=true [-C cwd] [--add-dir …] [--model …] [-c model_reasoning_effort=…]` (prompt on stdin)
Devin for Terminal	`devin`	`acp-json-rpc`	`devin --permission-mode dangerous --respect-workspace-trust false acp`
Gemini CLI	`gemini`	`json-event-stream` + `gemini` parser	`GEMINI_CLI_TRUST_WORKSPACE=true gemini --output-format stream-json --yolo [--model …]` (prompt on stdin)
OpenCode	`opencode`	`json-event-stream` + `opencode` parser	`opencode run --format json --dangerously-skip-permissions [--model …] -` (prompt on stdin)
Cursor Agent	`cursor-agent`	`json-event-stream` + `cursor-agent` parser	`cursor-agent --print --output-format stream-json --stream-partial-output --force --trust [--workspace cwd] [--model …] -` (prompt on stdin)
Qwen Code	`qwen`	`plain` (raw stdout chunks)	`qwen --yolo [--model …] -` (prompt on stdin)
Qoder CLI	`qodercli`	`qoder-stream-json` (typed events)	`qodercli -p --output-format stream-json --permission-mode bypass_permissions [--cwd cwd] [--model …] [--add-dir …]` (prompt on stdin)
GitHub Copilot CLI	`copilot`	`copilot-stream-json` (typed events)	`copilot -p <prompt> --allow-all-tools --output-format json [--model …] [--add-dir …]`
Hermes	`hermes`	`acp-json-rpc` (Agent Client Protocol)	`hermes acp --accept-hooks`
Kimi CLI	`kimi`	`acp-json-rpc`	`kimi acp`
Kiro CLI	`kiro-cli`	`acp-json-rpc`	`kiro-cli acp`
Kilo	`kilo`	`acp-json-rpc`	`kilo acp`
Mistral Vibe CLI	`vibe-acp`	`acp-json-rpc`	`vibe-acp`
DeepSeek TUI	`deepseek`	`plain` (raw stdout chunks)	`deepseek exec --auto [--model …] <prompt>` (prompt as positional arg)
Pi	`pi`	`pi-rpc` (stdio JSON-RPC)	`pi --mode rpc [--model …] [--thinking …]` (prompt sent as RPC `prompt` command)
Multi-provider BYOK	n/a	SSE normalization	`POST /api/proxy/{provider}/stream` → Anthropic / OpenAI-compatible / Azure OpenAI / Gemini; SSRF-guarded with loopback local providers allowed, non-loopback internal ranges blocked, and upstream redirects disabled

Adding a new CLI is one entry in apps/daemon/src/agents.ts. Streaming format is one of claude-stream-json, qoder-stream-json, copilot-stream-json, json-event-stream (with a per-CLI eventParser), acp-json-rpc, pi-rpc, or plain.
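
As a rough picture of what "one entry" might involve, the sketch below models an adapter definition. The `StreamFormat` values are the ones listed above; everything else (the `AgentDef` shape, the `promptVia` field, the `example-cli` binary and its flags) is hypothetical and only loosely modelled on the Qwen row of the table. Check AGENT_DEFS in agents.ts for the real fields.

```typescript
type StreamFormat =
  | "claude-stream-json"
  | "qoder-stream-json"
  | "copilot-stream-json"
  | "json-event-stream"
  | "acp-json-rpc"
  | "pi-rpc"
  | "plain";

// Hypothetical shape; the real AGENT_DEFS entries may differ.
interface AgentDef {
  id: string;
  bin: string; // binary looked up on PATH
  streamFormat: StreamFormat;
  promptVia: "stdin" | "argv";
  buildArgs(opts: { model?: string }): string[];
}

// Hypothetical CLI that reads the prompt from stdin when given "-".
const exampleAgent: AgentDef = {
  id: "example-cli",
  bin: "example-cli",
  streamFormat: "plain",
  promptVia: "stdin",
  buildArgs: ({ model }) => [...(model ? ["--model", model] : []), "-"],
};

console.log(exampleAgent.bin, exampleAgent.buildArgs({ model: "some-model" }));
```

Sending the prompt over stdin rather than argv also sidesteps the Windows command-line length limit mentioned earlier.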

References & lineage

Every external project this repo borrows from. Each link goes to the source so you can verify the provenance.

Project	Role here
`Claude Design`	The closed-source product this repo is the open-source alternative to.
`alchaincyf/huashu-design`	The design-philosophy core. Junior-Designer workflow, the 5-step brand-asset protocol, anti-AI-slop checklist, 5-dimensional self-critique, and the "5 schools × 20 design philosophies" library behind our direction picker — all distilled into `apps/daemon/src/prompts/discovery.ts` and `apps/daemon/src/prompts/directions.ts`.
`op7418/guizang-ppt-skill`	Magazine-web-PPT skill bundled verbatim under `skills/guizang-ppt/` with original LICENSE preserved. Default for deck mode. P0/P1/P2 checklist culture borrowed for every other skill.
`multica-ai/multica`	The daemon + adapter architecture. PATH-scan agent detection, local daemon as the only privileged process, agent-as-teammate worldview. We adopt the model; we do not vendor the code.
`OpenCoworkAI/open-codesign`	The first open-source Claude-Design alternative and our closest peer. UX patterns adopted: streaming-artifact loop, sandboxed-iframe preview (vendored React 18 + Babel), live agent panel (todos + tool calls + interruptible), five-format export list (HTML/PDF/PPTX/ZIP/Markdown), local-first storage hub, `SKILL.md` taste-injection, and the first pass of comment-mode preview annotations. UX patterns still on our roadmap: full surgical-edit reliability and AI-emitted tweaks panel. We deliberately do not vendor `pi-ai` — open-codesign bundles it as the agent runtime; we delegate to whichever CLI the user already has.
`VoltAgent/awesome-claude-design` / `awesome-design-md`	Source of the 9-section `DESIGN.md` schema and 70 product systems imported via `scripts/sync-design-systems.ts`.
`bergside/awesome-design-skills`	Source of 57 design skills added directly as normalized `DESIGN.md` files under `design-systems/`.
`farion1231/cc-switch`	Inspiration for symlink-based skill distribution across multiple agent CLIs.
Claude Code skills	The `SKILL.md` convention adopted verbatim — any Claude Code skill drops into `skills/` and is picked up by the daemon.

Long-form provenance write-up — what we take from each, what we deliberately don't — lives at docs/references.md.

Roadmap

Daemon + agent detection (16 CLI adapters) + skill registry + design-system catalog
Web app + chat + question form + 5-direction picker + todo progress + sandboxed preview
31 skills + 72 design systems + 5 visual directions + 5 device frames
SQLite-backed projects · conversations · messages · tabs · templates
Multi-provider BYOK proxy (/api/proxy/{anthropic,openai,azure,google}/stream) with SSRF guard
Claude Design ZIP import (/api/import/claude-design)
Sidecar protocol + Electron desktop with IPC automation (STATUS / EVAL / SCREENSHOT / CONSOLE / CLICK / SHUTDOWN)
Artifact lint API + 5-dim self-critique pre-emit gate
Comment-mode surgical edits — partial shipped: preview element comments and chat attachments; reliable targeted patching remains in progress
AI-emitted tweaks panel UX — not implemented yet
Vercel + tunnel deployment recipe (Topology B)
One-command npx od init to scaffold a project with DESIGN.md
Skill marketplace (od skills install <github-repo>) and od skill add | list | remove | test CLI surface (drafted in docs/skills-protocol.md, implementation pending)
Packaged Electron build out of apps/packaged/ — macOS (Apple Silicon) and Windows (x64) downloads on open-design.ai and the GitHub releases page

Phased delivery → docs/roadmap.md.

Status

This is an early implementation — the closed loop (detect → pick skill + design system → chat → parse <artifact> → preview → save) runs end-to-end. The prompt stack and skill library are where most of the value lives, and they're stable. The component-level UI is shipping daily.

Stay in the loop

Follow @nexudotio on X for release notes, new skills, new design systems, and the occasional behind-the-scenes thread on what's shipping next. Discord is for chat, X is for the milestones — both links are in the badges above.

Star us

If this saved you thirty minutes — give it a ★. Stars don't pay rent, but they tell the next designer, agent, and contributor that this experiment is worth their attention. One click, three seconds, real signal: github.com/nexu-io/open-design.

Contributing

Issues, PRs, new skills, and new design systems are all welcome. The highest-leverage contributions are usually one folder, one Markdown file, or one PR-sized adapter:

Add a skill — drop a folder into skills/ following the SKILL.md convention.
Add a design system — drop a DESIGN.md into design-systems/<brand>/ using the 9-section schema.
Wire up a new coding-agent CLI — one entry in apps/daemon/src/agents.ts.

Full walkthrough, bar-for-merging, code style, and what we don't accept → CONTRIBUTING.md (Deutsch, Français, 简体中文).

Contributors

Thanks to everyone who has helped move Open Design forward — through code, docs, feedback, new skills, new design systems, or even a sharp issue. Every real contribution counts, and the wall below is the easiest way to say so out loud.

If you've shipped your first PR, welcome. The good-first-issue and help-wanted labels are the entry point.

Repository activity

The SVG above is regenerated daily by .github/workflows/metrics.yml using lowlighter/metrics. Trigger a manual refresh from the Actions tab if you want it sooner; for richer plugins (traffic, follow-up time), add a METRICS_TOKEN repository secret with a fine-grained PAT.

Star History

If the curve bends up, that's the signal we look for. ★ this repo to push it.

Credits

The HTML PPT Studio family of skills — the master skills/html-ppt/ and the per-template wrappers under skills/html-ppt-*/ (15 full-deck templates, 36 themes, 31 single-page layouts, 27 CSS animations + 20 canvas FX, the keyboard runtime, and the magnetic-card presenter mode) — are integrated from the open-source project lewislulu/html-ppt-skill (MIT). The upstream LICENSE ships in-tree at skills/html-ppt/LICENSE and authorship credit goes to @lewislulu. Each per-template Examples card (html-ppt-pitch-deck, html-ppt-tech-sharing, html-ppt-presenter-mode, html-ppt-xhs-post, …) delegates authoring guidance to the master skill so the upstream's prompt → output behavior is preserved end-to-end when you click Use this prompt.

The magazine / horizontal-swipe deck flow under skills/guizang-ppt/ is integrated from op7418/guizang-ppt-skill (MIT). Authorship credit goes to @op7418.

License

Apache-2.0. The bundled skills/guizang-ppt/ retains its original LICENSE (MIT) and authorship attribution to op7418. The bundled skills/html-ppt/ retains its original LICENSE (MIT) and authorship attribution to lewislulu.
