mirror of
https://github.com/nexu-io/open-design.git
synced 2026-06-01 03:14:35 +07:00
35 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
333a62cda6
|
fix: link od bin after fresh install (#2069)
* fix: link od bin after fresh install * test: lock root od bin shim path * test: cover root workspace deps in postinstall scan * chore(nix): refresh pnpm deps hash |
||
|
|
da19ff3ca0
|
feat(mocks): replay-based mock CLIs for 14 of OD's supported agents (opencode/codex/claude/gemini/cursor-agent/deepseek/qwen/grok + ACP family devin/hermes/kilo/kimi/kiro/vibe) (#3241)
* feat(mocks): replay-based mock CLIs for opencode/claude/codex/deepseek/qwen/grok
Drops in a `mocks/` top-level dir that pretends to be the real agent
CLIs by streaming pre-recorded sessions in each CLI's native stdout
protocol. Zero LLM tokens.
## Use cases
- **E2E tests** in `apps/daemon/tests/` — exercise the full chat-server
pipeline against a known trace, assert UI events / artifacts.
- **Self-validation during dev** — iterate on `claude-stream.ts` /
`json-event-stream.ts` parser changes without burning provider budget.
- **Regression harness** — replay the same trace before and after a
charter / parser change; diff the daemon events the UI surfaces.
- **Demo / onboarding** — show what a 17-tool claude editing session
looks like end-to-end, offline.
## How
- 6 bash wrappers (`mocks/bin/`) shadow the real CLIs when PATH-overlaid.
- `mocks/mock-agent.mjs` reads `mocks/recordings/<trace>.jsonl`, picks
one via env var (`SYNCLO_EXPLORE_MOCK_TRACE` / `_POOL` /
`_BY_PROMPT_HASH`), streams the trace in the requested format.
- Each format renderer matches the EXACT JSON shape the OD daemon
parser expects, verified line-by-line against
`apps/daemon/src/{json-event-stream,claude-stream}.ts`:
| CLI | streamFormat | parser source |
| ------------------------- | ------------------------- | ------------------------------------------ |
| `opencode` | `json-event-stream` | `handleOpenCodeEvent` |
| `codex` | `json-event-stream` | `handleCodexEvent` |
| `claude` | `claude-stream-json` | `createClaudeStreamHandler` |
| `deepseek` `qwen` `grok` | `plain` | `server.ts` (raw stdout) |
## Quick start
```bash
export PATH="$PWD/mocks/bin:$PATH"
export SYNCLO_EXPLORE_MOCK_TRACE=04097377 # 8-char prefix OK
export SYNCLO_EXPLORE_MOCK_NO_DELAY=1
echo "any prompt" | opencode run
echo "any prompt" | claude -p --output-format=stream-json
echo "any prompt" | codex exec
```
The mock binary announces the picked trace id on stderr:
`[mock-opencode] picked 04097377… via fixed`.
Recording selection (env, in priority order):
- `SYNCLO_EXPLORE_MOCK_TRACE=<id>` — fixed (prefix OK)
- `SYNCLO_EXPLORE_MOCK_BY_PROMPT_HASH=1` + stdin prompt — `sha256(prompt) % N`
- `SYNCLO_EXPLORE_MOCK_POOL=<tag>` — random within `agent:claude` /
`skill:agent-browser` / `outcome:failed` / etc.
- (default) uniform random
- `SYNCLO_EXPLORE_MOCK_SEED=<str>` — reproducible "random"
- `SYNCLO_EXPLORE_MOCK_NO_DELAY=1` — skip inter-event waits
## Dataset
179 anonymized Langfuse traces from this project's own production
telemetry:
- 9 agents: claude 57 · opencode 41 · codex 38 · gemini 25 ·
cursor-agent 11 · qwen 2 · copilot 2 · deepseek 2 · antigravity 1
- outcomes: succeeded 144 · failed 35
- skills: default 71 · ad-creative 50 · algorithmic-art 30 ·
agent-browser 22 · video-hyperframes 2 · plus magazine-web-ppt /
brainstorming / data-report / penpot-flutter-design-source 1 each
- 124 multi-turn (sessions with ≥2 turns)
- 18 produce `<artifact>` output
- ~4.5 MB on disk total
Anonymization: `/Users/<name>/` → `${HOME}/`,
`C:\Users\<name>\` → `%USERPROFILE%\`, project UUIDs →
stable `proj-001`, `proj-002`, …. Tool input/output payloads
preserved verbatim (templated UI, no cell-level PII).
## Smoke test
`bash mocks/scripts/smoke-test.sh` — 6 checks across all 6 agents.
All pass on this branch (verified locally):
```
✓ opencode first event = step_start
✓ codex first event = thread.started
✓ claude first event = system
✓ deepseek emitted plain text (144 chars on first line)
✓ qwen emitted plain text (144 chars on first line)
✓ grok emitted plain text (144 chars on first line)
All mock CLIs working. ✅
```
## Adding more recordings
The exporter that produced this set lives in
[nexu-io/agent-pr-explore](https://github.com/nexu-io/agent-pr-explore)
(see `cli/src/local/orchestrator/langfuse-import.ts` + the `local
langfuse-import` CLI command). Operators with the Langfuse keys can pull
more by tag / outcome / artifact / multi-turn filter, then run
`local recordings anonymize --out-dir ~/Documents/open-design/mocks/recordings`.
`mocks/README.md` has the full instructions.
## Out of scope (follow-ups)
- **ACP agents** (`devin`, `hermes`, `kilo`, `kimi`, `kiro`, `vibe`) need
a JSON-RPC server on stdio rather than a one-shot stream — separate
`format-acp.mjs` module not yet written.
- **Per-agent json-event-stream variants** (`cursor-agent`, `gemini`,
`qoder`, `copilot`, `pi`) currently fall back to the `plain` renderer;
their parsers are in `apps/daemon/src/json-event-stream.ts` and follow
the same template as `format-codex.mjs`.
## AGENTS.md updates
- Added `mocks/` to the top-level content directories listing
- Added a Validation strategy bullet pointing here for agent-stream /
parser changes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(mocks): add opencode-cli/kiro-cli/vibe-acp bin aliases and unref ACP timeout
- Add mocks/bin/opencode-cli, kiro-cli, vibe-acp wrappers for the primary
RuntimeAgentDef bin names OD resolves before any fallback. Without these,
a PATH-overlaid OD daemon run bypasses the mock entirely (opencode-cli,
kiro-cli) or cannot find the mock at all (vibe-acp, which has no fallback).
- Include opencode-cli, kiro-cli, vibe-acp in the smoke-test ACP/JSON loop
so coverage is verified end-to-end.
- Call .unref() on the 30s safety timeout in format-acp.mjs so a completed
ACP session exits promptly instead of waiting the full 30 seconds.
Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)
* feat(mocks): add vela (AMR) — login / models / ACP with strict set_model gate
Extends mocks/ to cover OD's own AMR runtime. `vela` is the bin name
`apps/daemon/src/runtimes/defs/amr.ts` specifies (`bin: 'vela'`,
`streamFormat: 'acp-json-rpc'`). It's richer than the generic ACP
agents — covers full login + models + chat-session lifecycle.
### What vela does (mirrored from apps/daemon/tests/fixtures/fake-vela.mjs)
1. `vela login` — writes ~/.amr/config.json with a fake profile (controlKey,
runtimeKey, user{email,name,plan}, profile-specific apiUrl/linkUrl).
The on-disk projection is what OD's daemon login route + AmrLoginPill
poller read; production goes through device-auth, the mock skips
straight to the file write.
2. `vela models` — prints the production-shaped public model catalog as
newline-separated `public_model_* vela` lines. Override via
FAKE_VELA_MODELS env.
3. `vela agent run --runtime opencode` — ACP JSON-RPC server with three
vela-specific protocol extensions:
a. `initialize` response carries `agentCapabilities`
(`promptCapabilities.embeddedContext`) + `models`
(`currentModelId` + `availableModels`).
b. `session/new` response carries the same `models` block.
c. **Strict set_model gate**: `session/prompt` is rejected with
JSON-RPC -32602 ("session/set_model must be called before
session/prompt") UNLESS `session/set_model` (or
`session/set_config_option`) has been called for the current
sessionId. Mirrors real vela 0.0.1 contract; catches regressions
in `attachAcpSession` that silently skip set_model.
### Error injection envs (in sync with fake-vela.mjs)
FAKE_VELA_SESSION_ID - sessionId returned by session/new
FAKE_VELA_TEXT - override assistant text
FAKE_VELA_THOUGHT - optional thought_chunk before text
FAKE_VELA_SESSION_NEW_ERROR - fail session/new
FAKE_VELA_SET_MODEL_ERROR - fail session/set_model
FAKE_VELA_PROMPT_ERROR - fail session/prompt
FAKE_VELA_REQUIRE_SET_MODEL='0' - disable the strict gate (legacy)
FAKE_VELA_LOGIN_USER_EMAIL - email written into config profile
FAKE_VELA_LOGIN_USER_PLAN - plan written into config profile
FAKE_VELA_LOGIN_DELAY_MS - sleep before write (test in-flight)
FAKE_VELA_LOGIN_FAIL - print + exit 1
FAKE_VELA_MODELS - override models stdout
VELA_PROFILE - profile slot (prod | test | local)
### Components
`mocks/lib/format-vela.mjs` (~205 LOC)
- Full ACP server with vela protocol extensions
- Strict set_model gate
- Error injection plumbing
`mocks/lib/vela-subcommands.mjs` (~90 LOC)
- runVelaLogin() — writes ~/.amr/config.json
- runVelaModels() — prints catalog
`mocks/bin/vela` — dispatcher wrapper. Forwards `vela <subcmd>` to
mock-agent.mjs which routes to login/models or falls through to ACP.
`mocks/mock-agent.mjs` — parseArgs now collects positionals so the vela
dispatcher can read subcommand from there; switch case added for vela.
`mocks/scripts/smoke-test.sh` — +4 assertions:
vela models prints ≥10 catalog lines
vela login writes ~/.amr/config.json with the requested email
vela agent run ACP roundtrip (initialize+models+set_model+stream+result)
vela strict set_model gate rejects prompt without prior set_model
### Verified locally
✓ vela models printed 15 catalog lines
✓ vela login wrote ~/.amr/config.json with profile.prod.user.email
✓ vela agent run ACP roundtrip (initialize+models, set_model accepted, prompt streamed)
✓ vela strict set_model gate rejects session/prompt without prior set_model
All 21 smoke checks pass (up from 17 with previous P3 ACP commit).
### AGENTS.md + README updates
AGENTS.md — mention `vela (AMR — vela CLI)` alongside ACP agents in
the directory listing entry.
mocks/README.md — protocol table row + dedicated vela section with
subcommand contract, strict gate explanation, env-injection cheat
sheet. Mock-tree listing updated.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(mocks): honor REPORT_FILE env when --report-file flag not given
Harnesses that spawn the mock without translating their report-path
contract to the mock's CLI flag (notably nexu-io/agent-pr-explore's
orchestrator, which passes REPORT_FILE as env per the existing
opencode/claude/codex agent launchers) wouldn't get a report file
written, so the harness's "agent exit 0 but produced no report"
check would always fire and mark mock runs as failure even though the
stdout stream was complete.
Fix: in mock-agent.mjs parseArgs, fall through to process.env.REPORT_FILE
when --report-file wasn't provided on argv. Each format renderer already
accepts opts.reportFile and writes the recording's final assistant text
to it (`format-*.mjs` already had this — only the wiring was missing).
Verified: synclo-explore run with `mock=true, mock_trace=04097377`
against the opencode wrapper now produces a plan.md with the recording's
17-tool claude editing session report. ~1.5s per run vs ~70s real opencode.
* mocks: move recordings to Cloudflare R2; PR→main→Action upload path
The 179-recording corpus (~4.5 MB raw, ~280 KB after compression) has
been moved off git into Cloudflare R2 at the bucket open-design-mocks
under recordings/v1/. The repo now ships:
- mocks/manifest.json — the canonical catalog (renamed from
recordings/index.json) with sha256 + storage hints; consumers
fetch this to discover what exists, then pull individual jsonl
files on demand
- mocks/scripts/fetch-recordings.sh — parallel, sha256-verified,
idempotent puller for the public r2.dev URL
- mocks/scripts/add-recording.sh — local maintainer helper that
validates a new .jsonl and copies it into recordings-staging/
(no R2 calls; no credentials needed)
- mocks/scripts/upload-to-r2.mjs — called only by the CI workflow
- mocks/scripts/lib/manifest-utils.mjs — shared sha256/meta/
rebuild-histograms logic, used by both add-recording (preview)
and upload-to-r2 (actual write) so the entry shape never drifts
- .github/workflows/sync-mocks-to-r2.yml — fires on push to main
when mocks/recordings-staging/ changes; uploads to R2, updates
manifest, commits cleanup back; serialized via concurrency group
Trust model: R2 write credentials (CLOUDFLARE_API_TOKEN,
CLOUDFLARE_ACCOUNT_ID) are repo secrets; nobody can push from a
laptop. Read stays public via the r2.dev URL.
Why not pnpm install integration: contributors who do not touch
agent code do not pay the fetch cost. Fetch happens on first
smoke-test run (auto-fallback) or when a mock spawn needs data.
Repo size: -4.55 MB net (delete 179 jsonl, +280 KB manifest +
scripts). Smoke test (21 checks) still green against the fetched
corpus.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mocks: scope R2 write token to a dedicated secret name
Use CLOUDFLARE_R2_MOCKS_TOKEN (instead of reusing the shared
CLOUDFLARE_API_TOKEN that landing-page-*.yml uses for Pages deploys)
so the R2 write capability can be scoped to just the
open-design-mocks bucket without bleeding extra capability into the
Pages workflows.
Also hardcode the powerformer CF account_id directly in the workflow
(account IDs are not secret and the shared CLOUDFLARE_ACCOUNT_ID
secret may point at a different account).
Workflow now fails fast with an actionable error message + dashboard
link if the secret is unset.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mocks: switch R2 sync to S3-compat API (wrangler getMemberships gate)
wrangler 4.x calls /memberships before any r2 action, requiring
user:read scope. R2 "Object Read & Write" tokens deliberately lack
that scope (defense in depth — a leaked token should not enumerate
account-level resources). The workflow now uses the aws CLI talking
straight to the R2 S3-compatible endpoint with SigV4, no membership
lookup.
Secret rotation: CLOUDFLARE_R2_MOCKS_TOKEN (Bearer) is replaced by
CLOUDFLARE_R2_MOCKS_AK / CLOUDFLARE_R2_MOCKS_SK (matching the
existing CLOUDFLARE_R2_RELEASES_AK/SK naming convention). End-to-end
tested locally: PUT recording → manifest rebuild → manifest PUT →
staging cleanup all green.
aws CLI is pre-installed on ubuntu-latest, so no install step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mocks: scrub synclo namespace; use OD_MOCKS_* env prefix throughout
These mocks were copy-pasted from synclo-explore, where they
originated, and inherited the SYNCLO_EXPLORE_MOCK_* env-var
convention. That brand-bleed is not appropriate in OD: rename the
public env surface to OD_MOCKS_* (matching OD-native prefixes like
OD_MOCKS_CACHE_DIR, OD_TRACE_R2_UPLOAD, OD_EXPECT_TIMEOUT_SECONDS).
Renames:
SYNCLO_EXPLORE_MOCK_TRACE → OD_MOCKS_TRACE
SYNCLO_EXPLORE_MOCK_BY_PROMPT_HASH → OD_MOCKS_BY_PROMPT_HASH
SYNCLO_EXPLORE_MOCK_POOL → OD_MOCKS_POOL
SYNCLO_EXPLORE_MOCK_SEED → OD_MOCKS_SEED
SYNCLO_EXPLORE_MOCK_NO_DELAY → OD_MOCKS_NO_DELAY
SYNCLO_EXPLORE_MOCK_RECORDINGS_DIR → OD_MOCKS_RECORDINGS_DIR
SYNCLO_EXPLORE_MOCK_SMOKE_TRACE → OD_MOCKS_SMOKE_TRACE
SYNCLO_OD_MOCKS_I_KNOW_WHAT_IM_DOING → OD_MOCKS_ALLOW_LOCAL_UPLOAD
Also drop the inline harvester usage from README. The harvester is an
external CLI in nexu-io/agent-pr-explore — its README is the right
place for langfuse-import flags, anonymization options, etc. OD only
documents its own staging→PR→Action workflow.
Smoke test (21 checks) still green; OD_MOCKS_TRACE end-to-end
verified to route correctly.
Consumers of the OLD env names (notably the orchestrator in
nexu-io/agent-pr-explore) need a matching rename. No back-compat
shim here — the explore side has zero external users today and a
one-line follow-up is cleaner than a permanent deprecation layer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* AGENTS.md: align mock env names with mocks/ rename (SYNCLO_* → OD_MOCKS_*)
Missed in the prior commit (
|
||
|
|
df8a0faff6
|
feat(runtimes): register AMR (vela) as an ACP stdio agent (#2355)
* feat(runtimes): register AMR (vela) as an ACP stdio agent
AMR is the vela CLI's ACP runtime mode. `vela agent run --runtime opencode`
speaks ACP JSON-RPC over stdio (see vela's
`specs/current/runtime/manual-agent-run-openrouter.md`); per
`docs/new-agent-runtime-acp.md` we expose it through the same `streamFormat:
'acp-json-rpc'` transport that already powers Hermes, Devin, Kimi, etc.
The new `defs/amr.ts` is the entire wiring — `buildArgs` returns
`['agent', 'run', '--runtime', 'opencode']`, `fetchModels` reuses
`detectAcpModels`, and the fallback list seeds the OpenRouter ids vela's
e2e baseline uses. `executables.ts`/`app-config.ts`/`metadata.ts` get the
matching `VELA_BIN`/`VELA_LINK_URL`/`VELA_RUNTIME_KEY`/`VELA_OPENCODE_BIN`
allowlist + install/docs URLs, so users can configure the per-agent env in
Settings without leaking into other adapters.
Coverage: `tests/fixtures/fake-vela.mjs` is a minimal ACP stub that returns
the documented `initialize` / `session/new` / `session/set_model` /
`session/prompt` shapes; `tests/amr-acp-integration.test.ts` spawns it via
`child_process.spawn` and drives a full turn through `attachAcpSession` and
`detectAcpModels`, so the ACP transport contract for AMR is end-to-end
verified locally even before a real `vela` binary is installed.
Validated:
- pnpm guard
- pnpm typecheck (all workspace projects)
- pnpm --filter @open-design/daemon test (2881/2881)
Deferred: real OpenRouter-backed turn through a built `vela` binary —
the runtime def needs no changes for that path, only `VELA_RUNTIME_KEY`
and `VELA_LINK_URL` in env (or Settings).
* fix(runtimes/amr): pin a concrete default model and bare openai ids
End-to-end validation against a freshly-built `vela` (nexu-io/vela@main)
+ OpenRouter surfaced two contract details the first AMR runtime def
got wrong:
1. vela rejects `session/prompt` with `session/set_model must be called
before session/prompt`. attachAcpSession in apps/daemon/src/acp.ts
skips set_model whenever the picked model is the synthetic 'default'
id, so AMR's fallback list must NOT include DEFAULT_MODEL_OPTION. The
def now ships a concrete `gpt-5.4-mini` as both `fetchModels`'
default option and `fallbackModels[0]`, which makes attachAcpSession
always send a real `session/set_model` for AMR turns.
2. `vela --runtime opencode` auto-prepends `openai/` to whatever modelId
it forwards to opencode's openai provider. With OpenRouter-style ids
like `openai/gpt-5.4-mini`, opencode receives the double-prefixed
`openai/openai/gpt-5.4-mini` and replies `ProviderModelNotFoundError`.
The new fallback list ships the bare ids opencode's openai registry
actually knows about (gpt-5.4, gpt-5.4-mini, gpt-5.4-fast, etc.).
Stub + tests:
- tests/fixtures/fake-vela.mjs now enforces the set_model gate the same
way real vela does, so a regression that silently goes back to
model: 'default' would surface as a fatal error in tests instead of a
hidden production failure.
- tests/amr-acp-integration.test.ts pins both contracts: no 'default' /
no 'openai/' prefix in fallbackModels, and a negative case that
asserts session/prompt fails when no model is set.
Adds `apps/daemon/scripts/verify-amr-real-vela.mjs` — a small dev-time
runner that drives `attachAcpSession` against a real `vela` binary and
prints the daemon's chat events, so future protocol drift can be checked
against an actual OpenRouter call.
Verified locally: `vela agent run --runtime opencode` + OpenRouter
returns the prompted string ("AMR-E2E-PASS") through the full daemon
pipeline; daemon test suite stays 2883/2883.
* fix(runtimes/amr): substitute concrete model when chat run sends 'default'
A plugin-driven AMR run from the UI surfaced a real-world hole in the
prior commit:
json-rpc id 3: session/set_model must be called before session/prompt
The Default-design-router plugin (and any caller that doesn't pin a
real model) sends `model: 'default'` straight through, which the AMR
runtime def cannot accept — vela rejects `session/prompt` without
`session/set_model` and attachAcpSession skips set_model whenever
model === 'default'. Just leaving DEFAULT_MODEL_OPTION out of the
adapter's `fallbackModels` is not enough: the chat-run handler in
server.ts still forwarded 'default' verbatim.
This adds `resolveModelForAgent(def, resolved, env?)` as the
single source of truth for the substitution:
1. If the caller picked a real id, pass it through.
2. Else, if `def.defaultModelEnvVar` is set and the daemon process
env has a non-empty value for it, return that (operator escape
hatch — see below).
3. Else, if the def's `fallbackModels` does NOT contain a 'default'
id, return `fallbackModels[0].id`.
4. Else, return the original value (the historic shape — defs that
list 'default' themselves are untouched).
AMR sets `defaultModelEnvVar: 'VELA_DEFAULT_MODEL'`, so when
opencode's openai-provider registry deprecates `gpt-5.4-mini`
upstream, an operator can swap the fallback id without a code change
by exporting `VELA_DEFAULT_MODEL=gpt-5.5` before launching tools-dev
/ od. Worth noting the env var must live in the daemon's `process.env`
(Settings-UI per-agent env values only reach the spawned child, not
the daemon's resolver) — the new field's docblock spells this out.
Coverage:
- `tests/runtimes/resolve-model.test.ts` — 8 unit tests covering all
four resolver branches plus the env-override happy path / fallback /
ignore-when-user-picked-a-real-id case.
- `pnpm --filter @open-design/daemon typecheck` clean.
* chore(runtimes/amr): move AMR to the top of the base agent list
So `AMR (vela)` shows up first in the agent picker / status views,
ahead of claude / codex. Pure ordering change; no behavior delta.
* feat(amr): Sign-in / Sign-out button on the AMR Settings card
The first half of the AMR work assumed the operator would set
VELA_RUNTIME_KEY / VELA_LINK_URL on the daemon process and never
surfaced login state to users. This adds the missing UX so a fresh
install can drive the full path from Settings:
- GET /api/integrations/vela/status reads ~/.vela/config.json
for the active profile and returns { loggedIn, profile, user }
(without leaking the runtime/control keys themselves).
- POST /api/integrations/vela/login spawns `vela login` once
(409 if one is already in flight). The vela CLI opens the user's
browser to the device-authorization page itself — Open Design
only needs to kick the subprocess off.
- POST /api/integrations/vela/logout removes ~/.vela/config.json
so the next status read returns logged-out.
`AmrAgentCard` is a dedicated agent-card component for AMR because
the existing `<button>` row can't host an interactive sub-control
(nested interactive elements). It polls /status after a login click
until the daemon reports loggedIn=true (or 5 minutes elapse), and
exposes a Sign-out action on hover. Other adapters (claude, codex,
hermes, …) keep their existing `<button>` card.
i18n: 8 new keys (settings.amrLogin / Logout / LoggingIn / etc.)
added to en + zh-CN. Other locales spread `en` and inherit the
English copy until translations land.
Coverage:
- `tests/integrations/vela.test.ts` pins the config.json reader
against a tmp HOME — including the negative case where a profile
has user info but no runtimeKey (still logged-out), and the
secret-leak guard ("rt-secret-*" must not appear in the projection
payload).
- `tests/components/AmrAgentCard.test.tsx` covers all four UI
states (logged-out, logging-in, logged-in, logging-out) plus the
click-propagation invariant the divergent card was built to keep.
`pnpm --filter @open-design/daemon test` 2901 / 2901 passing.
`pnpm --filter @open-design/web test` 1719 / 1719 passing.
`pnpm typecheck` + `pnpm guard` clean.
Dev script side-effects: `apps/daemon/scripts/verify-amr-real-vela.mjs`
no longer requires both VELA_RUNTIME_KEY and VELA_LINK_URL — if
VELA_PROFILE is set, the vela CLI is allowed to resolve credentials
from `~/.vela/config.json`. Added the two AMR `.mjs` fixtures to
`scripts/guard.ts` allowlist with the executable-fixture / dev-runner
rationale.
* fix(connection-test): substitute model for AMR before attachAcpSession
The chat-run path in server.ts already routes the requested model through
`resolveModelForAgent` so AMR / vela (whose CLI demands an explicit
`session/set_model` before `session/prompt`) gets the def's first
concrete fallback id when the chat run ships `model: 'default'`.
`connectionTest.ts` was wiring `attachAcpSession({ ..., model: model ?? null })`
directly, which made the Test Connection button on the AMR Settings
card deadlock with the same `session/set_model must be called before
session/prompt` error the chat-run path already handles — surfaced as a
permanent "Testing connection…" spinner in the UI.
Reuse the same helper here so Test Connection mirrors chat-run behavior.
* test(amr): three-layer end-to-end coverage for the AMR login + turn flow
The PR up to this point shipped runtime + UI code with unit-level Vitest
coverage. This commit adds the cross-layer regression net the live demo
relied on:
1. apps/daemon/tests/integrations/vela.routes.test.ts (HTTP, Vitest)
Spins up the real daemon Express app via `startServer({port:0,...})`,
persists `agentCliEnv.amr.VELA_BIN = <fake>` into app-config.json,
and exercises every /api/integrations/vela/* endpoint against the
extended fake-vela stub:
- status reads ~/.vela/config.json under various states
- login spawns the fake, waits for config.json to appear, returns
pid + startedAt + profile
- 409 already-running guard with the stub's delay knob
- logout removes the file (idempotent)
- secrets (runtimeKey / controlKey) never leak in the projection
- login → status round-trip flips loggedIn=false → true
2. e2e/tests/amr/turn.test.ts (tools-dev orchestrated, Vitest)
Boots a namespaced daemon + web pair through `createSmokeSuite`,
inlines a self-contained fake `vela` binary that handles BOTH
`vela login` (writes ~/.vela/config.json) and
`vela agent run --runtime opencode` (ACP stdio with the
`session/set_model must precede session/prompt` gate the real binary
enforces), then drives a complete /api/runs lifecycle for
`agentId: 'amr', model: 'default'` and asserts the assistant message
captures the fake's streamed text. This is the test that would have
surfaced today's plugin-default-model regression (the `set_model
before prompt` error) at PR time instead of demo time.
3. e2e/ui/amr-login-pill.test.ts (Playwright)
Mocks /api/agents + /api/integrations/vela/{status,login,logout}
to drive the Settings AMR card through the full Sign in → Signed in
→ Sign out cycle. Pins the AmrLoginPill polling contract and the
aria-label semantics (the pill's accessible name is "Sign out" once
logged in, regardless of which label the hover-state text shows).
fake-vela.mjs extensions:
- Handles `vela login` argv by writing
~/.vela/config.json for the active VELA_PROFILE and exiting 0 —
mirrors real vela's on-disk side-effect without the device-auth
loop.
- FAKE_VELA_LOGIN_DELAY_MS knob so route tests can observe the
in-flight state of the spawn lifecycle.
- FAKE_VELA_LOGIN_USER_EMAIL / _USER_PLAN to assert the surfaced
user fields end-to-end.
Validated:
- `pnpm guard` + `pnpm typecheck` (all workspace projects)
- `pnpm --filter @open-design/daemon test`: 2998 / 2998 passing,
including the new 8-test integration suite.
- `cd e2e && pnpm test tests/amr`: 1 / 1 passing.
- `cd e2e && pnpm exec playwright test ui/amr-login-pill.test.ts`:
1 / 1 passing (6.7s).
* feat(amr): package native cli and refine login ui
* feat(amr): wire vela cli beta packaging
* docs(amr): document vela ci packaging review
* docs(amr): refine vela ci integration review
* fix(ci): refresh nix pnpm dependency hashes
* fix(pack): clean up Vela CLI packaging
* fix(pack): bundle Vela CLI support files
* fix(amr): recover login attempts from stale auth state
* test: expand AMR and automations coverage
* fix(amr): address review follow-ups
* test(web): align tasks fixtures with contracts
* fix(daemon): type wildcard route params
* fix(ci): refresh PR merge validation
* fix(amr): clear env credentials on logout
* feat(settings): inline local CLI model configuration
* fix(amr): recognize daemon env credentials
* [codex] Fix Vela companion packaging (#2979)
* Fix Vela companion packaging
* Update Nix pnpm dependency hashes
* [codex] Surface AMR account failures (#2980)
* fix: surface AMR account failures
* fix: cover AMR recovery error guidance
* chore: bump beta base version to 0.8.1 (#2990)
* Fix AMR profile and packaged runtime review issues
* Detect packaged AMR OpenCode companion tree
* feat(web): polish AMR frontend flows
* Polish AMR onboarding card
* fix: read AMR login state from dot-amr config (#3048)
* test: tighten AMR credential and packaging coverage
* test: restore AMR executable test env helper
* [codex] Fix packaged mac Dock identity and AMR label (#3076)
* Fix packaged mac sidecar Dock identity
* Rename AMR assistant label
* Fix AMR live models and dot-amr login state (#3073)
* fix: read AMR login state from dot-amr config
* fix: load live AMR models before runs
* fix: point AMR onboarding link to production wallet
* fix: address AMR model review feedback
* fix: persist live AMR model fallback
* [codex] Fix AMR link catalog model ids (#3088)
* Fix packaged mac sidecar Dock identity
* Rename AMR assistant label
* Fix AMR link catalog model ids
* Fix AMR model normalization typecheck
* Use live AMR model for default runs
* fix: polish AMR runtime settings UI
* Accelerate AMR startup defaults (#3092)
* Surface AMR insufficient balance wallet URL (#3099)
* fix(web): polish onboarding controls (#3112)
* fix(web): show CLI scan loading state
* Avoid duplicate AMR wallet recharge links (#3117)
* Avoid duplicate AMR wallet recharge links
* Use Vela CLI 0.0.3 test package
* chore(nix): refresh pnpm deps hash
* Fix AMR wallet guidance display
---------
Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>
* chore(pack): pin Vela CLI 0.0.3-test.1 (#3127)
* chore(nix): refresh pnpm deps hash
* chore(pack): pin Vela CLI 0.0.3
* chore(nix): refresh pnpm deps hash
* fix(web): suppress AMR exit 130 fallback (#3136)
* feat(web): nudge users to hosted AMR on model/auth/quota failures (#3083)
* feat(web): nudge users to hosted AMR on model/auth/quota failures
When a non-AMR agent run fails with an auth / quota / upstream model
error, surface an inline nudge under the error pill linking to Open
Design's hosted AMR gateway (https://open-design.ai/amr). The nudge
fires `surface_view` (element=run_failed_toast) on impression and
`ui_click` (element=go_amr) on the link.
Also teach the daemon to classify CLI-agent auth/quota/upstream failures
(Claude Code, codex, ...) into specific API error codes
(AGENT_AUTH_REQUIRED / RATE_LIMITED / UPSTREAM_UNAVAILABLE) instead of
the generic AGENT_EXECUTION_FAILED, so both the error message and the
nudge key off accurate codes. AMR's own runs are excluded from the
nudge — they keep the dedicated sign-in / recharge affordances.
* feat(web): rework failed-run AMR guidance into per-case error UI
Replace the single inline nudge with a per-case failed-run experience
driven by the run's error code + agent:
- The error card is now neutral gray (was red) and always carries a
retry button; it is driven by the persisted per-message error event so
it survives a reload.
- Non-AMR agent hitting a model/auth/quota wall: a theme-color promotion
card under the error card offers "switch to AMR & retry" — switches the
run to AMR, opens Settings on the AMR card, and auto-retries once the
account signs in (ProjectView polls vela login status, independent of
the Settings pill lifecycle, with success / 5-min-timeout / unmount
exits).
- AMR agent unauthorized: clearer copy + an "authorize & retry" button.
- AMR agent out of balance: clearer copy + a "top up" button to the AMR
wallet, with manual retry.
- Settings AMR card: when opened from the nudge, it scrolls into view and
pulses, and an authorize-button coachmark (a fake hand cursor that
rises in and dismisses on hover) points at the sign-in control when not
yet authorized.
analytics: surface_view (run_failed_toast) on the promotion card and
ui_click (go_amr) on its action are retained. i18n adds chat.amrCard.*
and chat.amrError.* (en / zh-CN / zh-TW translated; other locales fall
back to en) and drops the old chat.amrErrorGuidance keys.
* fix(daemon): require status context for numeric service-failure codes
Per review on #3083: the model-service classifier matched bare HTTP
status numbers (`500`, `502`, `429`, `401`), so ordinary CLI output like
`line 500`, `read 502 bytes`, or `exit code 401` could be misclassified
as a provider outage / auth wall and wrongly surface the AMR nudge. Now
a status number only counts when it carries explicit context (`HTTP 500`,
`status 503`, `code: 401`, `502 Bad Gateway`); textual provider phrases
(overloaded, bad gateway, service unavailable, rate limit, …) are
unchanged. Adds fixtures proving unrelated numeric output stays null.
* fix(web): keep error pill for failed runs ChatPane's card doesn't cover
Per review on #3083: the per-message gray error pill was suppressed for
every persisted error status event, but ChatPane only renders the
replacement top-level error card for `retryableAssistantMessage` (the
last failed assistant). So a failed turn that is no longer last (after a
follow-up) or an older failed run in history showed neither the pill nor
the card — its error detail vanished, undercutting reload/history
survival. ChatPane now passes `errorCardOwnerId` (the assistant id whose
error the card represents); AssistantMessage suppresses only that one
pill and keeps rendering StatusPill for all other error events.
* fix(daemon): don't treat a process exit code as an HTTP status
Follow-up to review on #3083: the status-context helper accepted a bare
`code` prefix, so `exit code 401` / `process exited with code 429` still
matched and got classified as AGENT_AUTH_REQUIRED / RATE_LIMITED (the
very `exit code 401` case the comment calls out as noise). `code` now
only counts when qualified (`status code` / `error code` / `response
code`) or punctuation-bound (`code: 401`); bare `exit code N` no longer
matches. Adds fixtures for exit-code lines returning null.
* chore(web): translate AMR card / error keys for 16 remaining locales
PR #3083 added 10 new `chat.amrCard.*` / `chat.amrError.*` keys but only
provided en/zh-CN/zh-TW translations; the other 16 locales fell back to
English. Translate the card title/body, three chips, primary CTA, and
the AMR self-error (auth / balance) messages and buttons for ar, de,
es-ES, fa, fr, hu, id, it, ja, ko, pl, pt-BR, ru, th, tr, uk.
* fix(amr): address review feedback on #2355
Targeted fixes for the unresolved review threads on #2355. Each fix
includes / updates a focused test.
- runtimes/executables.ts: `packagedVelaOpenCodeCompanionTree` now
verifies the inner `opencode` executable exists + is runnable, not
just the directory. This closes the false-positive availability path
that let `detectAgents()` surface AMR as available even when the
packaged companion was empty / partially copied (mrcfps, 4 threads).
- runtimes/executables.ts: `resolveAmrOpenCodeExecutable` now prefers
the bundled `<OD_RESOURCE_ROOT>/bin/libexec/opencode/opencode` over a
stale `opencode` on the user's PATH, so packaged AMR builds can't be
hijacked by a global installation.
- web/EntryShell.tsx: when the Local CLI scan returns an available
agent and the previously-selected agent is AMR, switch the selection
to the first available local agent so the runtime and persisted
agent agree before Continue.
- server.ts (model-probe branch): for AMR, check `readVelaLoginStatus`
BEFORE rejecting on an empty live-model catalog — a signed-out user
was getting `AMR_MODEL_UNAVAILABLE` ("choose a model") instead of
the correct `AMR_AUTH_REQUIRED` (sign-in affordance).
- server.ts (default model fallback): if the user asked for the AMR
agent default and the cached id is no longer in the FRESH catalog,
fall back to `liveModels[0]` from the probe instead of rejecting the
run as `AMR_MODEL_UNAVAILABLE`.
- integrations/vela.ts: route `vela login` through
`createCommandInvocation` so an npm/Node-style `vela.cmd` / `.bat`
shim on Windows gets the correct `cmd.exe /d /s /c …` wrapping with
verbatim args (matches `execAgentFile` / chat-run spawning).
- tools/pack/src/linux.ts: in containerized Linux builds, bind-mount
the host directory of `OPEN_DESIGN_VELA_CLI_BIN` and rewrite the env
to the container-side path. The host path was being passed in as-is
even though the default container only mounts /project, /tools-pack
and cache/home — `copyOptionalVelaCliBinary` saw a missing path.
Deferred (out of scope for this PR):
- `od amr status/login/logout/cancel` CLI subcommands (AGENTS.md
UI/CLI dual-track rule, server.ts:5763) — sizable surface; tracked
for a separate focused PR.
- Strict `--require-vela-cli` for Windows + mac-x64 beta builds:
prematurely blocked — `@powerformer/vela-cli` only publishes the
`darwin-arm64` platform binary today; adding the flag elsewhere
would fail the builds. Revisit once win/x64/linux binaries ship.
* fix(amr): hoist sendAmrAccountFailure above the AMR catalog preflight (TDZ)
The new signed-out AMR branch in the catalog preflight at server.ts:10875
calls `sendAmrAccountFailure(...)` to emit AMR_AUTH_REQUIRED, but the
const declaration sat ~100 lines below at the outer function scope. Because
`const` is TDZ-aware, that branch would have thrown `ReferenceError:
Cannot access 'sendAmrAccountFailure' before initialization` for the
exact users it tries to help — defeating the original intent.
Hoist the helper to just above the AMR preflight block so it's available
to every AMR code path in this function. Behavior elsewhere is unchanged.
Also rerun the daemon test suite: `launch.test.ts > resolveAgentLaunch
uses packaged built-in Vela for AMR` was creating the
`<resourceRoot>/bin/libexec/opencode/` companion *directory* only, but
this PR's earlier tightening of `packagedVelaOpenCodeCompanionTree`
also requires the inner `opencode` executable. Add it to that fixture
to match the new contract; the test was a sibling of the executables /
env-and-detection fixtures already updated in
|
||
|
|
34165ff189
|
chore: retire tools-pr (#2867) | ||
|
|
b4e94b0534
|
Harden packaged updater downloads and install handoff (#2677)
* Add managed download package for updater resumes * fix(download): clear stale pid locks * test(e2e): harden windows updater resume smoke * feat(updater): make update downloads silent in ui * fix(updater): keep install handoff prompt visible * fix(ci): build platform before download in postinstall |
||
|
|
f294ab4915
|
chore(ci): add visual regression PR workflow (#2372)
* Add visual regression PR workflow * Allow manual visual PR comments * Post visual comments for same-repo PRs * fix(ci): surface R2 lookup failures in visual report Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) * Align visual workflow names |
||
|
|
80d305858b
|
feat(diagnostics): add one-click log export from Settings → About (#798)
* feat(diagnostics): add one-click log export from Settings → About
Adds a new "Export diagnostics" entry under the About section that bundles
daemon/web/desktop logs, machine info, and recent macOS crash reports into
a zip the user can share when reporting issues.
- Browser hits a new daemon HTTP endpoint and triggers a download.
- Electron uses an IPC bridge with the native save dialog and reveals the
saved file in Finder/Explorer; the Help menu also exposes it as a
fallback when the daemon is unresponsive.
Packaging + redaction lives in a new @open-design/diagnostics package so
both surfaces share it. Sensitive JSON keys, URL query secrets, and the
current user's home path are redacted before packaging.
* build(nix): include packages/diagnostics in daemon build targets
The Nix daemon derivation builds workspace siblings in dependency order
before compiling apps/daemon. Without @open-design/diagnostics in that
list, the daemon TypeScript build fails inside the Nix sandbox with
`Cannot find module '@open-design/diagnostics'` because pnpm install
only creates the symlink — the dist output that the package.json
exports point at isn't produced until each sibling's build script runs.
* build(tools-pack): include @open-design/diagnostics in packaged INTERNAL_PACKAGES
Without this, packaged win/mac/linux builds fail with `npm error 404` when
the post-build `npm install --omit=dev --no-package-lock` step in the
assembled app tries to resolve `@open-design/diagnostics@0.2.0` from the
public npm registry. The package is workspace-private, so it has to be
tarballed via `pnpm pack` and file:-referenced from the assembled
package.json like every other internal workspace dep that daemon/desktop
depend on.
Also wires the package's `pnpm --filter ... build` into the pre-pack
workspace build step so the dist/ exists before pnpm pack runs, and
updates the two test fixtures (`win-app.test.ts`, `workspace-build.test.ts`)
that mirror INTERNAL_PACKAGES.
The diagnostics package itself is repinned to exact dependency versions
already used elsewhere in the workspace (`jszip 3.10.1`, `@types/node
20.19.39`, `esbuild 0.28.0`, `typescript 5.9.3`, `vitest 4.1.6`) so it
passes the new `pnpm guard` exact-version rule and produces a minimal
lockfile diff vs main (additions only, no resolution-string churn).
* fix(diagnostics): include `~` in bearer-token redaction char class
RFC 6750 token68 syntax allows `~`, so tokens like `Authorization: Bearer
abcd~efgh` were only partially matched by `HTTP_AUTH_SCHEME_RE`. The
regex stopped at the first `~`, leaving the tail (`~efgh`) un-redacted in
the exported diagnostics zip — a clear leak since this feature explicitly
generates support bundles for external sharing.
Add `~` to the character class and a regression test.
* fix(diagnostics): only collect renderer.log from desktop
`buildSidecarLogSources` unconditionally added `logs/${app}/renderer.log`
for daemon/web/desktop, but only the desktop runtime writes a renderer
log (see apps/desktop/src/main/runtime.ts) — daemon and web are pure
Node services with no Electron renderer. Every export therefore produced
missing-file placeholders and manifest warnings for the two phantom
paths, polluting the bundle.
Gate the renderer.log source on APP_KEYS.DESKTOP so the daemon-side
collector matches the desktop-side collector in apps/desktop/src/main/
diagnostics.ts:63.
* fix(diagnostics): mirror desktop-side renderer.log gate
The previous fix only updated the daemon-side `buildSidecarLogSources`
in `apps/daemon/src/diagnostics-export.ts`. The desktop-side collector
at `apps/desktop/src/main/diagnostics.ts` had an identical copy of the
same bug that I overlooked: it also unconditionally added
`logs/${appKey}/renderer.log` for daemon/web/desktop, producing
missing-file placeholders + manifest warnings for the two phantom paths
on every desktop-initiated export.
Apply the same `appKey === APP_KEYS.DESKTOP` gate here so both export
entry points (browser via daemon HTTP, Electron via native save dialog)
emit the same clean manifest.
* feat(diagnostics): add `od diagnostics export` CLI subcommand
AGENTS.md's dual-track capability-exposure contract requires every
user-facing feature to ship on both the web UI and the `od` CLI. The
diagnostics export was only reachable through Settings → About and the
desktop Help menu; this commit closes the loop with an `od diagnostics
export [<path>] [--json]` subcommand registered in SUBCOMMAND_MAP.
The CLI is a thin shell over the existing GET /api/diagnostics/export
endpoint — same zip output, same redaction, same crash-report scope.
Defaults to writing `open-design-diagnostics-<timestamp>.zip` in the
current directory; `--output <path>` or a positional arg overrides.
`--json` prints `{path, sizeBytes}` for shell pipelines.
Use cases this unlocks:
- A CI script can `od diagnostics export ~/artifacts/bundle.zip` after
a failed run.
- Bug reporters on headless boxes can grab a bundle without booting
the web UI.
- `od doctor` follow-ups can collect a full snapshot when a probe fails.
* fix(diagnostics): surface non-sidecar launch in manifest warnings
`buildSidecarLogSources()` returns `[]` when the daemon has no sidecar
runtime context, which is the standard `od` (plain) launch path —
`runDaemonCliStartup()` -> `startDaemonRuntime()` does not pass a
runtime. Settings → About and the new `od diagnostics export` previously
reported success but produced a bundle with only the summary JSONs, so
operators could not tell "no logs because plain launch" from "no logs
because something genuinely broke."
- Extend `DiagnosticsContext` with an optional upstream `warnings:
string[]` that `buildManifest` merges into the manifest warnings.
- Emit STANDALONE_LAUNCH_WARNING from the daemon handler when
`options.runtime == null`. The warning names the limitation and
points the user at the sidecar entry points that DO capture logs.
- Add a regression spec at `apps/daemon/tests/diagnostics-export.test.ts`
that drives the handler with `runtime: null` and asserts the warning
surfaces in `summary/manifest.json` (and that `files` is empty so a
user reading the bundle does not confuse "no log sources" with
"missing files").
|
||
|
|
2c128e0e91
|
refactor desktop host bridge (#2246) | ||
|
|
6a08dfe111
|
Add design system package quality guard (#2224)
* Add design system import manifest schema * Generate hybrid design system imports * Read design system usage and cached manifests * Add design system pull-file tool * Show design system package evidence * Wire design system import semantics * Add design system package quality guard --------- Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local> |
||
|
|
bd48c597b0
|
chore: pin dependency versions and harden CI caches (#2189)
* chore: pin dependency versions * ci: enforce pinned dependency specs * ci: fix pnpm executable invocation |
||
|
|
4424f08be0
|
[codex] Add packaged desktop auto-update (#1375)
* Add packaged desktop auto-update * Handle counted beta nightly update versions * Refresh desktop auto-update branch for main * Serialize desktop updater operations * Refresh auto-update branch for packaged paths |
||
|
|
f7eb82d7a5
|
feat(design-systems): import design system projects (#2112)
* feat(design-systems): define project manifest contract * feat(design-systems): add default project manifest * feat(daemon): consume design system manifests * feat(design-systems): import local project systems * feat(design-systems): import from github repositories --------- Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local> |
||
|
|
1f66c53203
|
feat(daemon): consume component manifests (#2053)
* feat(design-systems): extract component manifests * feat(daemon): consume component manifests --------- Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local> |
||
|
|
d1a2f9f07e
|
chore(design-systems): report component fixture coverage (#2049)
Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local> |
||
|
|
b268bbe169 |
Merge origin/garnet-hemisphere (post-9e196d34) — Use Plugin handoff fix
Brings in 11 new garnet commits, most importantly: - |
||
|
|
6c16283850 |
Merge origin/main (post-7c8305f4) into reconcile branch
Brings in 10 new main commits: routine deep-link to specific conversations (#1508), Windows resource cache fix for Orbit templates, collapsible comment side panel (#1607), routines project radio polish, Copilot logo swap, and minor UI fixes. Conflicts resolved: - router.ts: garnet's home/view + marketplace routes + main's per-project conversationId deep-link field coexist on Route union - ProjectView.tsx: garnet's isPhantomDaemonRunMessage helper + main's isStoppableAssistantMessage helper both kept - ProjectView.run-cleanup.test.tsx: accepted HEAD (garnet's phantom-row regression test); main's three new tests for finalizeActiveAssistantMessagesOnStop / clearStreamingConversationMarker / shouldClearActiveRunRefs are queued as a follow-up TODO inline. |
||
|
|
e57e028222
|
feat(daemon): make design-system token channel default-on (PR-D) (#1544)
* feat(daemon): make design-system token channel default-on (PR-D)
Flip `OD_DESIGN_TOKEN_CHANNEL` from default-off to default-on. Every
chat that picks a brand with `tokens.css` + `components.html` siblings
(today: `default`, `kami`) now gets the structured token contract
appended to the system prompt automatically. `OD_DESIGN_TOKEN_CHANNEL=0`
keeps the DESIGN.md-only path as a kill switch.
Adds `scripts/check-design-system-flag-parity.ts`, registered in
`pnpm guard`. The guard walks every brand and asserts:
- 147 prose-only brands produce byte-identical prompts under flag-off
vs flag-on (PR-D's "no-op for legacy brands" promise)
- 2 structured brands diverge as expected (catches a future regression
that silently dropped the structured blocks)
Smoke evidence on #1385 (PR-C):
- `default` — 10/10 brand tokens used byte-for-byte in treatment vs
0/10 invented colors in control
- `kami` — treatment recovers brand name (`Kami · 纸`), the two-tier
surface (`--bg` parchment + `--surface` ivory), the CN font stack
override, and the `components.html` card pattern; control invented
"Replica" as a brand name
Co-authored-by: Cursor <cursoragent@cursor.com>
* review: address @nettee + @lefarcen feedback on parity guard
Two blocking findings from #1544 review:
1. @nettee — guard's inventory walk silently passed on unreadable
filesystem state. `fileExists` swallowed every `stat` error and the
bare `readdir` catch returned `[]` for any failure. A renamed
`design-systems/` tree, a permission-denied DESIGN.md, or a
directory at the brand path would have left `pnpm guard` happy
after checking 0 brands — exactly the silent misconfiguration this
guard exists to catch. Both error paths now treat only ENOENT /
ENOTDIR as absence and rethrow everything else, mirroring the
`readFileOptional` fix already applied to PR-C's
`apps/daemon/src/design-systems.ts`.
2. @nettee — guard exercised `composeSystemPrompt` directly, bypassing
the `process.env.OD_DESIGN_TOKEN_CHANNEL !== '0'` gate in server.ts
that PR-D actually flipped. A regression that restored `=== '1'`,
typo'd the env name, or stopped reading assets when the var is
unset would still leave the guard green. Extracted the predicate
into `isDesignTokenChannelEnabled(env)` next to
`readDesignSystemAssets` and added 6 unit tests pinning every value
that matters: unset / `'1'` / `'true'` / empty / `'0'` /
whitespace-padded. server.ts now calls the predicate. Any
regression on the env-flag semantics fails
`tests/design-system-assets.test.ts` independently of the
composer-level coverage.
Verified: pnpm guard (13/13), tsc -p scripts/tsconfig.json (clean),
@open-design/daemon typecheck (clean), 32/32 prompt + asset tests.
Co-authored-by: Cursor <cursoragent@cursor.com>
* review: pin server-layer asset resolution end-to-end (lefarcen P2)
Round-2 review feedback from @lefarcen on #1544: the predicate suite
in tests/design-system-assets.test.ts pinned the env-flag boolean but
did NOT exercise the server prompt-assembly path that PR-D actually
flipped — the seam where the daemon decides whether to read tokens.css
/ components.html from disk and hand them to composeSystemPrompt. A
regression that, say, restored an inline `=== '1'` gate or stopped
calling isDesignTokenChannelEnabled() from server.ts would still leave
the predicate test green.
Extracted that whole seam into `resolveDesignSystemAssets(id,
builtInRoot, userInstalledRoot, env)` on apps/daemon/src/design-systems.ts.
The function combines:
1. the env-flag gate (kill switch on `OD_DESIGN_TOKEN_CHANNEL=0`)
2. the built-in → user-installed root fallback chain (per-file)
3. the DesignSystemAssets result shape consumed by composeSystemPrompt
server.ts at the prompt-assembly site is now a thin caller of this
function. The previous 13-line inline block (env check + per-file
fallback) collapses to one call, so the whole asset-resolution path
now has a single testable seam.
7 new tests in tests/design-system-assets.test.ts run the full pipeline
end-to-end against real disk fixtures:
- env unset (default-on): returns built-in assets
- env=`'0'` (kill switch): returns undefined even with files on disk
- env=`'1'` (legacy opt-in): still works
- mixed builtin/user-installed: per-file fallback merges correctly
- both halves built-in: skips user-installed roundtrip verbatim
- prose-only brand (no files): undefined / undefined
- nonexistent brand directory: undefined / undefined
Verified: pnpm guard (13/13), tsc -p scripts/tsconfig.json (clean),
@open-design/daemon typecheck (clean), 39/39 prompt + asset tests
(was 32; +7 new server-layer-resolution tests).
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(test): add missing projectKind to FileViewer deck preview test
The deck preview test added in #1556 (
|
||
|
|
3b167b6921 |
feat(plugins): add registry protocol and enhance plugin management features
- Introduced the `@open-design/registry-protocol` package, enabling improved interactions with plugin registries. - Updated the `typecheck` script in the daemon's `package.json` to include the new registry protocol. - Enhanced the CLI with new flags and commands for better plugin management, including `yank` and additional marketplace functionalities. - Implemented a plugin lockfile system to manage installed plugins and their versions, improving reliability during upgrades. - Added new marketplace doctor functionality to validate plugin entries and ensure compliance with registry standards. This update significantly enhances the plugin ecosystem by providing robust registry interactions and improved management capabilities. |
||
|
|
d3602be666 |
Merge origin/main into garnet-hemisphere (reconcile)
Merge of `origin/main` (`03ed3960`, 2026-05-13 pre-0.7.0) into the 161-commit garnet-hemisphere line, reconciling the product-vibe-coded plugin/marketplace/EntryShell surfaces from garnet with the routines / skills / live-artifacts feature work landed on main since the fork point. Headline decisions (full rationale + side-by-side screenshots in `specs/change/20260513-garnet-skills-automations/reconcile-result-vs-garnet.md`): - #1 SettingsDialog: keep main's Memory / Skills / External MCP / Connectors / Routines / MCP server nav items even though the top-level /integrations + /automations routes also cover them. Two entries coexist for now; revisit once Track A/B fill in the placeholder content. - #2 EntryView: accept garnet's thin wrapper delegating to EntryShell. Main's PetRail sidebar + image-templates/video-templates tabs are intentionally deferred to a follow-up that re-integrates them into the new EntryShell layout. - #3 /integrations + /automations top-level routes: kept (garnet's product intent). Skills tab is still a "Coming soon" placeholder awaiting Track A; Routines/Schedules/Live-artifacts cards on /automations are still mock awaiting Track B. - #5 DesignFilesPanel: hybrid — main's pagination as primary list, garnet's Plugin folders section preserved between the live-artifacts block and the pagination block. (by-kind sections drop in favour of pagination; plugin-folders rendering stays because it is a garnet-specific product addition.) - #7 server.ts (10 hunks, ~5400 conflict lines): manual hunk-by-hunk merge. Both daemon admin routes + plugin/genui routes (garnet) and routines/memory/skills upgrades (main) preserved. Garnet's inline project route block kept alongside main's `registerProjectRoutes` / `registerProjectUploadRoutes` modular wiring — duplicate route audit is a follow-up. Garnet's POST /api/projects plugin-snapshot resolution + default-scenario fallback is intentionally dropped from the inline body (now handled by registerProjectRoutes) and listed for follow-up re-integration into `project-routes.ts`. Verification (worktree at /Users/elian/Documents/open-design-garnet): - `pnpm typecheck` exits 0 across all workspace packages - daemon (`pnpm tools-dev run web --namespace reconcile-shots`) boots, serves `/api/daemon/status` healthy, and survives a Playwright walkthrough of /integrations / /automations / home / projects / design-systems / plugins / settings dialog - `@open-design/plugin-runtime` package built (was missing dist/ on garnet); without it the daemon's plugins/* imports fail at boot Track A (Skills tab → real SkillsSection) and Track B (Automations cards → real routines / live-artifacts backend) are the two remaining follow-ups blocking the placeholder/mock content from going live. See `spec.md` and `track-skills.md` in the same directory. |
||
|
|
f621dbbfea
|
feat(web): Add Tailwind foundation (#1388) | ||
|
|
5af84c09af |
feat(web): refactor PluginsHomeSection to use tag-based filtering and introduce PluginCard component
- Replaced the legacy tabbed categorization in `PluginsHomeSection` with a tag-driven approach, allowing dynamic filtering based on plugin tags. - Introduced a new `PluginCard` component to encapsulate the rendering of individual plugin cards, improving separation of concerns and maintainability. - Added a `usePluginCategories` hook to manage plugin visibility and filtering logic, enhancing the overall structure and testability of the component. - Implemented a "More" pill for overflow tags in the filter row, improving user interaction with a cleaner UI. - Updated CSS styles to support the new layout and improve visual consistency across the plugins home section. This update significantly enhances the user experience by providing a more flexible and intuitive way to discover and interact with plugins. |
||
|
|
a75d9938c7
|
feat(design-systems): add structured tokens.css schema (default + kami) (#1231)
* feat(design-systems): add structured tokens.css schema (default + kami) Compile each brand's DESIGN.md prose into a machine-readable :root block agents paste verbatim, removing the "Primary → --accent" translation step where most token misuse happens. Daemon prompt injection lands in a follow-up; lint-artifact already enforces the shared token vocabulary so no rule changes needed. Schema validated across two contrasting aesthetics: - default (sans-serif, cobalt, B2B utility) — stress test the shallow form, 2-level fg / 2-level surface - kami (serif, parchment, ink-blue, print-first) — stress test the rich form, 4-level fg ramp, 3-level surface, ring elevation, i18n font stacks, and solid-hex tag tints (print renderers double-paint alpha) Schema growth from kami's stress test (5 new optional slots, all backward-compatible — default aliases via var() to existing tokens): - --fg-2 / --meta (4-level fg ramp) - --surface-warm (3-level surface) - --border-soft (2-level border) - --elev-ring (ring elevation as first-class level) Brand-specific extensions live in tokens.css with explicit "NOT in shared schema" labels and a documented promotion path (≥2 brands need it → promote to schema slot). components.html in each brand is a self-contained reference fixture that exercises every token through real layouts. Both fixtures lint clean against apps/daemon/src/lint-artifact.ts. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(design-systems): add token-fixture drift guard Each design system in design-systems/<brand>/ ships two files agents consume in tandem: tokens.css (canonical token bindings) and components.html (a self-contained fixture whose first <style> embeds the same :root paste so the file renders standalone). The fixture's :root block is a copy of tokens.css's :root block, kept in sync only by an inline comment. This adds scripts/check-tokens-fixture-sync.ts and registers it in pnpm guard. The check pairs each brand's tokens.css with its components.html and asserts the unscoped :root block is byte-equivalent after canonical normalization (CSS comments stripped, whitespace collapsed, separator spacing normalized). Brands missing one half of the pair, or with no :root rule in either file, fail the guard. Scoped overrides like :root[lang="zh-CN"] are not required to appear in the fixture (per the kami fixture's inline comment they are pasted only when an artifact's <html lang> matches), so the check only compares the unscoped :root block. Verified: pnpm guard passes for default + kami, fails on intentional value drift, fails on missing token, tolerates whitespace-only formatting differences. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(design-systems): point fixture CTAs to real files Both default and kami components.html advertised in-page anchors (#tokens, #spec, #surface, #accent, #type, #components) but defined no matching ids, so every CTA was a no-op when the fixture was opened locally — flagged by mrcfps in #1231. Re-point each link to a real artifact in the same brand directory: - "View tokens" / "Inspect tokens" / "Inspect typography" → ./tokens.css - "Read the spec" / "Read the rule" → ./DESIGN.md Browsers render these as raw source views, which is the desired UX for a reference fixture: clicking the CTA shows the underlying contract instead of jumping to nothing. Agents copying the fixture also learn the pattern of "buttons link to actual sibling resources". The :root token block is unchanged, so the token-fixture drift guard still passes for both brands. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(design-systems): codify token schema (A1/A2/B/C layers) The two-brand pilot (default + kami) settled the shape of the shared token schema; this commit codifies it as a machine-readable contract and enforces it in pnpm guard, addressing lefarcen's review on #1231: > the optional-vs-required split won't generalize cleanly when brand > #3 needs different Layer A tokens or when multiple brands converge > on the same extension (promoting C→B→A). Consider surfacing that > limitation in the PR narrative or in a future SCHEMA.md. Schema lives under design-systems/_schema/ as three files: - tokens.schema.ts — TypeScript declaration of every shared token with its layer (A1-identity / A1-structure / A2 / B-slot), plus per-brand C-extension allowlists and a global C-prefix allowlist - defaults.css — CSS mirror of A2 fallback values, used as the human-readable contract reviewer's-eye copy and the future input to the derive script - AGENTS.md — schema layer model, C → B-slot → A2 promotion rules, when-not-to-add-a-token guidance Layer model: A1-identity 8 tokens — bg/surface/fg/muted/border/accent + font-display/font-body. The brand IS these values; no fallback is defensible. A1-structure 18 tokens — type scale (8), leading (2), tracking (1), section-y (3), container (4). Structural decisions vary per brand by design and have no cross-brand default. A2 26 tokens — accent states, semantic colors, motion, base spacing scale, radius, elevation, focus, font-mono. Required in every tokens.css; fallback lives in defaults.css for the future derive script to inline when DESIGN.md does not specify the value. B-slot 4 tokens — fg-2 / meta / surface-warm / border-soft. Brand may bind independently or alias the named sibling via var(...) for components that target the richer ramp. C-extension n tokens — brand-specific names (kami's tag-bg-*, leading-display, accent-light, etc.). Allowlisted per-brand in BRAND_EXTENSIONS or globally by prefix in BRAND_EXTENSION_PREFIXES. Promote when a second brand adopts the same name. Why A2 fails the guard today: Artifacts are generated by agents pasting one brand's :root block into a single <style>; there is no global stylesheet that supplies fallbacks at runtime. A tokens.css missing an A2 declaration would silently break any var() reference in the fixture. Until the derive script (PR-B) lands and inlines defaults, every brand's tokens.css must declare every A2 token directly. The guard enforces this strictly. Why --font-mono lands in A2 (not A1): 149 brands' DESIGN.md files were surveyed: 87 (58%) declare a monospace stack, 62 (42%) do not — including major brands like bmw / nike / apple / notion / mastercard / meta. Agent paste cannot rely on the brand author having written it down; a defaultable A2 fallback (with CJK brands like kami overriding) is safer than forcing every brand author to add a field they may not realize their kbd / code-block components need. Five guard checks, each registered as its own entry in scripts/guard.ts so failures attribute to a specific contract: 1. token-fixture sync — components.html :root ↔ tokens.css :root byte-equivalent (existing) 2. A1 required tokens — every brand declares every A1 token 3. A2 required tokens — every brand declares every A2 token 4. unknown token allowlist — every declared token is in schema or brand-extension allowlist 5. A2 defaults parity — defaults.css ↔ tokens.schema.ts fallback byte-equivalent Verified on default + kami: - 26 A1 tokens declared in both brands - 26 A2 tokens declared in both brands - 129 total declarations, all match shared schema or brand extensions - defaults.css ↔ tokens.schema.ts parity holds - sanity test: drifting --motion-fast in defaults.css fails check 5 with a clear divergence message The PR description originally listed "Dedicated SCHEMA.md" as explicitly NOT in this PR ("Once 3+ brands ship, extracting a single source of truth becomes worthwhile"). That boundary moves: lefarcen's review surfaced the schema-generalization risk, and the schema must exist as a machine-enforced contract before the derive script can read it. The TS file replaces the markdown that was deferred. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(web/tests): pass missing designTemplates prop to ProjectView Pre-existing typecheck regression on main: PR #955 ( |
||
|
|
10802bb0b0
|
test: expand nightly UI and desktop regression coverage (#1256)
* e2e(ui): cover examples preview flows * e2e(ui): cover Codex local CLI fallback UX * test: expand desktop and connector regression coverage * e2e(ui): cover workspace restoration flows * e2e(ui): cover retry recovery workspace flow * test: cover artifact and connector recovery flows * e2e(ui): cover Continue in CLI stale provenance flow * e2e(ui): cover BYOK model fetch caching * test: expand Orbit and desktop connector coverage * e2e(ui): cover workspace quick switcher recovery flows * e2e(ui): cover connector pending authorization recovery * e2e(ui): cover workspace and conversation restoration routes * e2e(ui): cover conversation draft and attachment restoration * e2e(ui): cover conversation history selection recovery * e2e(ui): cover workspace surface conversation selection * test: cover artifact presentation and orbit link behavior * test: cover artifact external link restoration * e2e(ui): cover root-route deep-link restoration * e2e(specs): cover Orbit open-artifact desktop click * e2e(specs): cover desktop artifact open link * test: fix Orbit settings fixture type drift * test: split Playwright critical and extended suites * test: fix ProjectView design template fixtures * ci: split workspace test stages * guard: allow split Playwright suite scripts * test: shrink Playwright critical suite * test: restore omitted Playwright suites |
||
|
|
8c0fb8dc01
|
feat(tools-pr): add maintainer PR-duty workspace (#1259)
* feat(tools-pr): add maintainer PR-duty workspace
Adds `tools/pr` as the maintainer-only control plane for PR-duty work on
this repo. Thin `gh` wrapper that encodes repo-specific knowledge:
review lanes, forbidden surfaces, lane-specific checklists, validation
command derivation from touched packages.
Subcommands:
- `list` — triage open queue by lane and review-state bucket.
- `view <num>` — agent-friendly review brief for a single PR.
- `classify [num]` — emit script-level tags for one PR or the whole
open queue; full-queue JSON output lands under `.tmp/tools-pr/classify/`
with rate-limit telemetry per run.
- `assignment` — assigner-perspective view of PR ownership, idle time,
and blockers (derived from existing tags; no new judgments).
Tag dictionary (13 tags) covers: bot-only-approval, needs-rebase,
forbidden-surface, unlabeled, duplicate-title, non-ascii-slug,
maintainer-edits-disabled, org-member, unresolved-changes-requested,
stale-approval, and three awaiting-* timing tags. Each rule is
expressible as one factual sentence over `gh` data + repo paths — see
`tools/pr/AGENTS.md` for the full dictionary plus precision rules.
Templates in `tools/pr/templates/*.md` are aesthetic references for
recurring maintainer comments (duplicate-title ask, awaiting-author
nudge, agent-review brief shape). `templates/examples/` holds
frozen-in-time agent-review snapshots for three PR shapes.
Infrastructure:
- `gh()` wraps `execFile` with minimum-touch retry (2 attempts at 1s + 2s
backoff) on transient 5xx / network errors. Persistent failures still
surface — retry is anti-jitter, not an exponential-backoff resilience
layer.
- Heavy chunks (`reviews`, `comments`, `commits`, assignment timelines)
use cursor-paginated `gh api graphql` via `fetchPaginatedPrList` to
stay under GitHub's GraphQL server-side timeout. Light chunks stay on
`gh pr list --json`.
- `fetchOrgMembers` cached per process via `gh api orgs/<owner>/members
--paginate`.
Wiring:
- Root `package.json` adds `pnpm tools-pr` to the allowed root entry
points.
- `scripts/postinstall.mjs` builds `tools/pr` alongside other workspace
packages.
- `scripts/guard.ts` allowlists `tools/pr/bin/tools-pr.mjs` and
`tools/pr/esbuild.config.mjs`, and adds `pr/` to the `tools/` top-level
layout allowlist.
- Root `AGENTS.md` and `tools/AGENTS.md` document the new command
surface, root-command-boundary update, and per-tool ownership.
* docs(agents): brief tools-pr in root AGENTS.md, link to tools/pr/AGENTS.md
Adds a `PR-duty tooling` section to the root AGENTS.md summarising what
`pnpm tools-pr` is, listing the four common subcommands (list / view /
classify / assignment), and pointing readers to `tools/pr/AGENTS.md` for
the full tag dictionary, operational playbook, templates, and design
rules. The section keeps root-level guidance to high-level orientation
while details stay local to the tool's own AGENTS.md.
* fix(tools-pr): drop overly broad touches-root-package.json forbidden hit
`deriveForbidden` was flagging any change to root `package.json` as a
forbidden-surface hit, but AGENTS.md §Root command boundary only forbids
specific *lifecycle* aliases (pnpm dev / test / build / daemon / preview
/ start) — tools-control-plane entrypoints like `pnpm tools-pr` are
explicitly allowed. Distinguishing "forbidden alias" from "allowed
entry" requires reading the diff content, which is `pnpm guard`'s job
rather than a path-derived classify tag.
Dogfooded on this branch's own PR (#1259), which added the `pnpm
tools-pr` script and was incorrectly flagged. Removing the hit aligns
the `forbidden-surface` tag with what tools-pr can mechanically detect
from file paths alone (apps/nextjs/, packages/shared/).
* fix(tools-pr): paginate commits fetch, recognise ready-to-merge, escape title-index separator
Three review follow-ups on #1259, all factual fixes:
- `fetchOpenPrCommits` now uses `fetchPaginatedPrList` instead of a
one-shot `pullRequests(first: $first)` query. GitHub GraphQL caps
connection page size at 100, so the previous implementation would
fail at runtime when callers passed `--limit > 100`. The paginated
path makes the commits fetch consistent with the other heavy chunks
(reviews, comments, assignment timelines) and removes the artificial
ceiling entirely. The `limit` parameter is dropped from
`fetchOpenPrCommits`; the CLI `--limit` continues to bound the
`gh pr list --json` chunks.
- `deriveStatus` in `assignment.ts` now reads `facts.reviewDecision`
and `facts.mergeStateStatus`. When the PR is `APPROVED` with merge
state `CLEAN` or `UNSTABLE` and carries no blockers, status renders
as `ready to merge` instead of falling through to `in review`. The
assignment view loses its main triage signal without this — a clean
human-approved PR rendered identical to a REVIEW_REQUIRED one.
- `tags.ts:tagDuplicateTitle` and `tags.ts:buildContext` both
constructed the title-index key with a literal NUL byte between
author and title, which made the file appear as binary in `git diff`
/ review tooling. Replaced the literal byte with a Unicode escape
sequence in source; the runtime string value is identical, the
source stays plain text and round-trips through review tooling
cleanly.
* fix(tools-pr): raise default --limit to 1000 to cover the live open queue
mrcfps flagged that `tools-pr list` (and `classify --all`, `assignment`)
defaults to `--limit 100`, which silently drops every PR past the first
100 in the open queue. The repo currently sits at 104 open PRs, so the
out-of-the-box run was already omitting four PRs.
Raise the default to 1000 in `list.ts`, `classify.ts`, and `assignment.ts`,
and remove the now-pointless 200 ceiling — `gh pr list --limit N` paginates
internally, so a high cap is cheap. Users can still pass `--limit <small>`
for a truncated preview. CLI help text on the three subcommands updated to
match.
* fix(web): pass designTemplates to ProjectView render helper
#955 made `designTemplates` a required Prop on ProjectView, but the test
helper added in #1244 (`renderProjectView` in
`ProjectView.api-empty-response.test.tsx`) was never updated. The two
PRs landed on main without conflicting, leaving `apps/web` typecheck red
for every PR that rebases past
|
||
|
|
b5eb8c1647
|
feat: generic skills + split skills/design-templates + finalize-design API (#955)
* feat: general-purpose skills with @-mention composition and user import
Lift skills from "one mode-bound skill per project" to a generic capability
the user can compose per turn:
- Daemon: scan multiple skill roots (user-skills under runtime data, then
the bundled `skills/`); user-imported skills can shadow built-ins by id.
- New `POST /api/skills/import` and `DELETE /api/skills/:id` endpoints,
with CONFLICT/BAD_REQUEST/NOT_FOUND error codes and built-in delete
protection.
- ChatRequest gains `skillIds: string[]`; the chat run concatenates each
picked skill's body (and merges craftRequires) into the system prompt
for that turn only — the project's persistent `skillId` is untouched.
- Web composer: `@` popover now lists skills alongside project files;
picks render as removable chips above the textarea and ride along with
the request as `skillIds`.
- Settings → Library: import form (name/description/triggers/body),
per-card delete for user skills, "user" origin badge.
* chore(web): drop welcome pet teaser + add ds→prompt-template mapping util
- SettingsDialog: remove the inline pet adoption teaser from the welcome
panel so the first-run modal stays focused on configuration.
- New `inferPromptTemplateCategoriesForDs(ds)` helper that maps a design
system's authored metadata to prompt-template gallery categories.
Imported by the design-system gallery wiring on a sibling branch; no
callers in this branch yet.
* feat: split skills/design-templates and add finalize-design API
Phase 0 of the skills/design-templates refactor (specs/current/
skills-and-design-templates.md):
- Move ~104 rendering catalogue entries from skills/ to design-templates/
and keep skills/ for the small set of functional skills that *do work*
on user input (utilities, briefs, packagers).
- Add design-templates/AGENTS.md and skills/AGENTS.md describing the
contract, and a brand-agnostic craft/ surface for opt-in craft rules.
- Daemon: add DESIGN_TEMPLATES_DIR / USER_DESIGN_TEMPLATES_DIR roots and
an /api/design-templates surface mirroring /api/skills. Asset/example
routes still span both registries so existing srcdoc URLs keep
resolving across the rename.
- Web: split LibrarySection into SkillsSection + DesignSystemsSection,
rename the EntryView "Examples" tab to "Templates", and update locales
+ the New-project picker accordingly.
Adds the finalize-design endpoint:
- New apps/daemon/src/finalize-design.ts and packages/contracts/src/api/
finalize.ts — one-shot synthesis of a project's transcript + active
design system + current artifact into <projectDir>/DESIGN.md via the
Anthropic Messages API. Per-project .finalize.lock mirrors the
transcript-export hygiene from PR #493; provider credentials are not
persisted by the daemon.
Other supporting changes:
- README + AGENTS.md updates to document the new directory split and
craft/ surface, plus i18n strings across 13 locales.
- Test refactors and new coverage (finalize-design, runs, sidecar
server, plus refreshed daemon integration tests).
- .gitignore: scope the *.exe ignore to /OpenDesign.exe so legitimate
vendor binaries are no longer hidden.
* fix(merge): move clinical-case-report to design-templates/
Origin/main added the clinical-case-report skill under skills/ before
the skills/design-templates split landed. Its od.mode is prototype, so
per specs/current/skills-and-design-templates.md it is a design template
and belongs alongside the other rendering catalogue entries — not under
the slimmed-down functional skills/ root. Moving it keeps the EntryView
Templates tab consistent with origin/main's intent.
* feat(skills): curated design/creative catalogue + collapsible Settings rows
Seed ~100 curated design/creative skill stubs under skills/ sourced from
awesome-claude-skills (ComposioHQ) and awesome-agent-skills (VoltAgent).
Each stub carries an od.category tag so the new filter pill row in
Settings -> Skills can group them. The seed script
(scripts/seed-curated-design-skills.ts, pnpm seed:curated-design-skills)
is idempotent: it only creates folders that don't already exist, so
hand-edited stubs are never overwritten.
- Daemon: parse and surface od.category on SkillInfo with a strict slug
normaliser; mirror the field on SkillSummary in @open-design/contracts.
Category is purely a UI hint — system-prompt composition is unchanged.
- Web: rewrite SkillsSection from a left-list / right-detail grid into a
vertical stack of collapsible rows mirroring the External MCP panel
(header always visible with name + mode/source/category pills + per-row
enable toggle; SKILL.md preview, file tree and inline edit form expand
on demand). Add a Category filter row above the list. Reorder Settings
nav so Skills + External MCP sit above the Composio/MCP cluster. Update
composer placeholder/hint across 17 locales to advertise '@ files or
skills · / for commands'.
- Docs: extend skills/AGENTS.md with the curated catalogue rules
(idempotency, category vocabulary, no upstream vendoring).
Co-authored-by: Cursor <cursoragent@cursor.com>
* test(skills): teach localized-content + system-prompt tests about the skills/design-templates split
mrcfps blocking review on PR #955: the skills/design-templates split
(
|
||
|
|
976edaf38e
|
test: harden e2e smoke and release reports (#1140)
* test: harden e2e inspect specs * test: wire e2e release reports * chore: bump packaged beta base to 0.6.1 * test: run release smoke vitest directly * test: add suite-owned tools-dev lifecycle * ci: harden stable release packaging * fix(release,e2e): gate stable signing on verify and harden suite cleanup - restore `needs: [metadata, verify]` on the stable release `build_mac`, `build_mac_intel`, `build_win`, and `build_linux` jobs so Apple signing/notarization and Windows release builds cannot run before pnpm guard, typecheck, and layout checks complete on the metadata commit. - in `runToolsDevSuite`, drop the `started` flag and always attempt `stopToolsDevWeb` in `finally`; record stop errors in diagnostics, and when the test body succeeded, escalate the stop failure to the suite result and rethrow — so orphan daemon/web processes from an interrupted `startToolsDevWeb` or a broken shutdown can no longer pass silently. Addresses PR #1140 review feedback from lefarcen and mrcfps. |
||
|
|
bf30b308e3
|
feat(deploy): docker-compose + Helm chart entry slice (Phase 5)
Plan J4 / spec §15.4 / §15.5 / §16 Phase 5.
Three landings:
- deploy/Dockerfile now COPYs plugins/_official/ into the image so
the bundled atom plugins from §3.I3 register on container boot —
without this, registerBundledPlugins() silently no-ops inside the
container and the §23 self-bootstrap promise breaks for hosted
deployments.
- tools/pack/docker-compose.yml ships the canonical hosted-mode
manifest spec §15.4 calls out: two-volume layout (od-data +
od-config) per §15.2, OD_BIND_HOST=0.0.0.0 + OD_API_TOKEN +
OD_NAMESPACE + snapshot retention knobs as env, /api/daemon/status
as the healthcheck endpoint (Phase 1.5). Drop-in usable with
`docker compose -f tools/pack/docker-compose.yml up -d`.
- tools/pack/helm/open-design/{Chart.yaml,values.yaml,README.md}
pins the Helm chart parameter surface for the per-cloud overrides
spec §15.5 enumerates (AWS / GCP / Azure / Aliyun / Tencent /
Huawei / self-hosted). Templates land in the Phase 5 follow-up
PR; the values schema is locked here so the per-cloud override
files (values-<cloud>.yaml) review in isolation.
scripts/guard.ts allowlist gains
`packages/agui-adapter/esbuild.config.mjs` so the new package
passes the residual-JS guard.
Daemon tests stay at 1486/1486 (deploy artifacts only).
Co-authored-by: Tom Huang <1043269994@qq.com>
|
||
|
|
4c7cd5d9f2 |
feat(plugins): introduce plugin system with installation and management capabilities
- Added support for a new plugin system, allowing users to install, uninstall, and manage plugins through the daemon. - Implemented API endpoints for listing installed plugins, retrieving plugin details, and applying plugins with input validation. - Introduced a plugin doctor feature to validate plugin manifests and check for issues before application. - Established a plugin persistence layer with SQLite migrations for managing installed plugins and their metadata. - Enhanced the CLI with commands for plugin operations, improving user interaction with the plugin ecosystem. |
||
|
|
2df8b775ec
|
feat(skills): add 32 zhangzara HTML deck templates (#704)
* feat(skills): add 32 zhangzara HTML deck templates Vendored from upstream MIT-licensed zarazhangrui/beautiful-html-templates — one Open Design skill per template (name prefix `html-ppt-zhangzara-`) so each template surfaces as its own entry in the Examples panel and renders its own preview. Each skill ships: - SKILL.md (frontmatter + workflow), description, triggers, and od.upstream pointing at the source folder - example.html (the self-contained deck; daemon's preview route looks for <skillDir>/example.html) - template.json (upstream metadata snapshot, with `slug` re-prefixed to `zhangzara-<base>` and a `source` URL) - assets/deck-stage.js / assets/styles.css for the 8 templates that ship a runtime; HTML refs rewritten so the daemon's iframe URL rewriter resolves them through /api/skills/<id>/assets/ scripts/guard.ts allowlist updated with the `html-ppt-zhangzara-` prefix so the vendored upstream JS runtimes pass the residual-JS check. * fix(skills, i18n): address PR #704 review feedback - Add the 32 new html-ppt-zhangzara-* skill ids to the de/ru/fr SKILL_IDS_WITH_EN_FALLBACK arrays so the localized-content coverage e2e test passes. The vendored upstream templates are English-only; falling back to the upstream English description is the right semantic for this batch. - Also add the pre-existing social-media-dashboard skill and totality-festival design system to the same fallback arrays (introduced in #678 without i18n coverage). Tagged with TODOs so localized copy can land in a follow-up. - Ship the upstream MIT LICENSE file in each skills/html-ppt-zhangzara-*/ folder so the copyright/permission notice travels with the vendored copy, as MIT requires for redistributing substantial portions. Update each SKILL.md's Source section to reference the bundled LICENSE. - For the 8 runtime-backed templates (creative-mode, editorial-tri-tone, neo-grid-bold, peoples-platform, pin-and-paper, pink-script, soft-editorial, stencil-tablet), expand the workflow's clone step to instruct the agent to copy the assets/ folder alongside example.html — the skill HTML references assets/deck-stage.js (and assets/styles.css for pin-and-paper) as project-local paths, so cloning the HTML alone produces an artifact whose runtime 404s. Verified locally: - pnpm guard passes. - pnpm --filter @open-design/web typecheck passes. - pnpm --filter @open-design/web test passes (309/309). - pnpm --filter @open-design/e2e test passes (6/6 active, including localized-content coverage for de/ru/fr). * fix(i18n): drop duplicate totality-festival fallback after merge with main Main already added 'totality-festival' to the design-system EN-fallback lists; the TODO entry from this branch became a duplicate after merge. * fix(skills, guard): address PR #704 follow-up review - Pin Chart.js CDN to 4.4.7 in coral and cartesian example.html so vendored decks no longer track the latest jsDelivr major. - Narrow scripts/guard.ts zhangzara allowlist to a regex that only permits skills/html-ppt-zhangzara-*/assets/deck-stage.js, restoring the TypeScript-first guard for any other JS under those skill dirs. - Reconcile slide_count and 'Slides in demo' with actual <section class="slide"> counts: broadside 20 -> 16, monochrome 18 -> 16, neo-grid-bold 13 -> 12. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(daemon): keep resolveDataDir return path stable, canonicalize at compare site The realpathSync wrapper inside resolveDataDir was rewriting every /var/... result to /private/var/... on macOS, which broke 11 hermetic assertions in tests/resolve-data-dir.test.ts (absolute paths, relative paths, and \$HOME / \${HOME} / ~ expansions whose mkdtempSync roots live under /var/folders/...). It also changed the public OD_DATA_DIR resolution contract for any downstream caller that compared against the expanded user-supplied path. Restore resolveDataDir to return the expanded resolved path unchanged, and introduce RUNTIME_DATA_DIR_CANONICAL — a one-shot realpath of RUNTIME_DATA_DIR — used only at the narrow folder-import comparison site that needs to match against a user-supplied realpath() result. The import-path symlink protection from #624 still works (a /var-rooted data dir now compares against its /private/var canonical form), while resolveDataDir keeps its stable, user-shaped contract. Verified locally: pnpm --filter @open-design/daemon test (1083/1083), including all 12 resolve-data-dir.test.ts cases. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
56bf6ee1b6
|
feat: agent-callable research command and /search (#615)
* feat: pre-generation research (Tavily) for grounded generation
Adds an optional pre-generation research step so the agent can produce
slides / prototypes / decks grounded in real sources instead of guessing.
User flow:
1. Settings -> Tavily Search -> paste API key (or set TAVILY_API_KEY).
2. Click the new Research button in the chat composer.
3. On send, the daemon runs a Tavily search, prepends the findings
as a <research_context> block ahead of the system prompt, and
spawns the agent. Research progress shows up as status pills in
the chat stream; the agent cites sources inline as [1]/[2]/...
Phase 1 surface:
- Single provider (Tavily), single depth ('shallow'), no LLM
synthesis pass (Tavily's `answer` is the summary).
- Composer toggle only; no popover / depth picker yet.
- Reuses the existing `status` SSE agent payload + StatusPill UI
so no new event variants or renderer code are needed.
Layers touched:
- contracts: ResearchOptions / Source / Findings DTOs;
ChatRequest.research; export from index.
- daemon: apps/daemon/src/research/{index,tavily}.ts orchestrator
+ provider; tavily added to MEDIA_PROVIDERS and ENV_KEYS; hook
in startChatRun before prompt assembly.
- web: ChatComposer toggle + ChatSendMeta; threaded through
ChatPane / ProjectView / streamViaDaemon into ChatRequest.
Side fix (required to land the feature, but useful on its own):
contracts internal relative imports lacked the `.js` suffix that
NodeNext module resolution requires. This was already breaking
`pnpm --filter @open-design/daemon typecheck` on main; without the
fix, none of the new research types were visible to the daemon.
All internal contracts imports now carry `.js`.
Spec: specs/current/research-feature.md (phases 2-4 outlined for
follow-up: composer popover, multi-provider, deep recursion, example
skills with research_recommends).
Verified:
- pnpm --filter @open-design/contracts typecheck/test
- pnpm --filter @open-design/daemon typecheck (the chokidar
project-watchers test is a pre-existing flake, unrelated)
- pnpm --filter @open-design/web typecheck
- node scripts/verify-media-models.mjs
* fix(daemon): clamp Tavily max_results to 20
Tavily's /search endpoint requires `max_results` in [0, 20]; sending a
larger value (e.g. when `research.depth: "deep"` resolves to 30) returns
400 and `runResearch` silently falls back to no-research. Clamp at the
provider boundary so Phase 2 depth tiers above 20 still produce results
instead of failing the request.
Generated-By: looper 0.6.1 (runner=fixer, agent=claude-code)
* Remove stale research merge leftovers
* Add agent-callable research search
* Fix Indonesian locale typecheck
* Fix research command invocation edge cases
* Harden slash search prompt expansion
* Honor research source caps in command contract
* Require search reports in design files
* Add research data provider settings
* Wire web research provider fallback order
* Update research provider fallback wording
* Revert "Update research provider fallback wording"
This reverts commit
|
||
|
|
cb92c93ae0
|
Migrate beta release publishing to R2 (#805)
* Prebundle standalone web packaged runtime * Harden mac standalone prebundle policy * Prebundle mac daemon packaged runtime * Prune mac Electron locales * Maximize mac release artifact compression * Publish beta mac artifacts to R2 * Use remote R2 uploads for beta releases * Fail fast on beta R2 access issues * Use S3-compatible uploads for beta R2 releases * Decouple beta versioning from GitHub releases * Remove legacy beta metadata source * Address release beta review notes |
||
|
|
6efac8887e
|
Improve Windows beta packaging and installer flow (#768)
* Optimize Windows packaged web output * Fix packaged contracts runtime build * Optimize Windows packaged size pruning * Prune Windows root Next payload * Remove Windows bundled Node runtime * Prune Windows standalone duplicate Next * Add tools-pack cache foundation * Cache Windows packaged build layers * Cache Windows workspace builds * Cache Electron-ready Windows app * Split Windows tools-pack module * Cache Windows dir build outputs * Split Windows pack build modules * Document Windows NSIS smoke namespace limits * Move Windows NSIS smoke note to agents guide * Optimize Windows beta packaging * Bump packaged beta base version * Improve Windows installer namespace UX * Improve Windows tools-pack cache keys * Stabilize Windows beta cache version keys * Cache Windows workspace build outputs * Optimize windows release beta cache layers * Cache windows release dependencies * Trim windows release cache before save * Refresh windows tools-pack cache key * Improve windows installer preflight prompts * Fallback NSIS installer strings to English * Fix Windows installer cleanup and preflight * Improve Windows NSIS state logging * Fix system NSIS Persian language alias * Use long-path removal for Windows uninstall * Fix mac tools-pack tests on Windows * Address Windows packaging review feedback * Fix Windows installer cache namespace isolation * Include web output mode in Windows tarball cache key * Use unique Windows release cache save keys |
||
|
|
f1cdb2844a
|
test(e2e): gate beta packaged runtime (#637)
* test(e2e): gate beta mac packaged runtime * test(e2e): separate ui automation layout * test(e2e): move localized content coverage * chore(release): prepare packaged 0.4.1 beta validation * test(e2e): keep ui lane playwright-only * fix(web): keep chat recoverable after conversation load failure * fix(desktop): honor native mac quit |
||
|
|
14a73d948b
|
Fix packaged contracts runtime exports (#577)
* fix packaged contracts runtime exports * fix(packaging): prepare contracts runtime exports on install |
||
|
|
bbdd4e84b5
|
chore: enforce test directory conventions (#496)
* chore: enforce test directory conventions Move package, app, and tool tests out of src and add guard enforcement so source directories stay source-only. * ci: use guard and package-scoped tests Run the new repository guard in CI and keep test execution aligned with package-scoped commands after removing root aliases. * ci: align stable release guard check Use the new repository guard in stable release verification after replacing the residual-JS-only script. * chore: tighten test layout enforcement Enforce sibling tests directories, typecheck moved test suites with dedicated configs, and refresh remaining guidance that pointed at src-based tests. * chore: clarify no-emit test tsconfigs Explicitly disable declaration-only emit in test tsconfigs so review tooling sees they are no-emit typecheck configs. |