561 commits

Author SHA1 Message Date
PerishFire
c3d41c7d45
fix(tools-pr): chunk stats fetch through cursor-paginated GraphQL (#1285)
`fetchOpenPrs` was reading the stats chunk via
`gh pr list --limit 1000 --json mergeStateStatus,...`. With the default
limit raised to 1000 in #1259, this 502s reliably on the live open
queue (107 PRs): GitHub's GraphQL gateway has to recompute
mergeStateStatus for every PR up front, and the resulting query exceeds
the gateway budget once the requested page passes ~60 PRs.

Switch the stats chunk to `fetchPaginatedPrList`, the same
cursor-paginated GraphQL helper that already drives reviews / comments /
commits / assignment-timelines. Page size stays at PR_LIST_PAGE_SIZE
(30), well within the gateway budget, and the heavy stats fetch is now
consistent with the other heavy chunks.
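
As a rough illustration of the cursor-paginated approach described above, the sketch below walks a connection one small page at a time. The `Page` shape, `FetchPage` type, `fetchAllPages`, and the in-memory `fakeFetch` are all hypothetical stand-ins; the real `fetchPaginatedPrList` signature is not shown in this log.

```typescript
// Hypothetical page shape; the real helper's types are not shown in this log.
interface Page<T> {
  nodes: T[];
  endCursor: string | null;
  hasNextPage: boolean;
}

type FetchPage<T> = (first: number, after: string | null) => Promise<Page<T>>;

const PR_LIST_PAGE_SIZE = 30; // page size quoted in the commit message

// Follow the cursor until the server reports no further page.
async function fetchAllPages<T>(
  fetchPage: FetchPage<T>,
  pageSize: number = PR_LIST_PAGE_SIZE,
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | null = null;
  while (true) {
    const page: Page<T> = await fetchPage(pageSize, cursor);
    for (const node of page.nodes) all.push(node);
    if (!page.hasNextPage || page.endCursor === null) break;
    cursor = page.endCursor;
  }
  return all;
}

// In-memory stand-in for the GraphQL endpoint: 107 fake PR numbers.
const fakeQueue: number[] = Array.from({ length: 107 }, (_, i) => i + 1);
const fakeFetch: FetchPage<number> = async (first, after) => {
  const start = after === null ? 0 : Number(after);
  const end = Math.min(start + first, fakeQueue.length);
  return {
    nodes: fakeQueue.slice(start, end),
    endCursor: end < fakeQueue.length ? String(end) : null,
    hasNextPage: end < fakeQueue.length,
  };
};
```

Each request asks for at most one page, so no single query has to cover the whole queue.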

Verified locally: `pnpm tools-pr list` now completes against the live
107-PR queue without a 502.
2026-05-11 20:51:29 +08:00
nettee
be77dc0394
Default English resource i18n fallback (#1270) 2026-05-11 20:29:05 +08:00
Joey-nexu
12ac2e988e
docs: add Maintainer rules (MAINTAINERS.md + CONTRIBUTING entry-point) (#1290)
Adds a public set of rules for the External Maintainer role: who qualifies,
how nominations work, what permissions Maintainers gain, what's expected
of them, and how step-down works.

The Core Team's individual roster is intentionally not enumerated. What's
public is the rules everyone plays by.

- New file: MAINTAINERS.md (English authoritative version) + 5 locale variants
  matching the existing CONTRIBUTING.md i18n surface (de, fr, ja-JP, pt-BR,
  zh-CN). Non-EN/non-zh-CN variants are machine-translated drafts marked at
  the top — native-speaker review is welcome via follow-up PRs.
- CONTRIBUTING.md (and its 5 locale variants): adds a short
  Becoming a Maintainer section that points at MAINTAINERS.md, so the
  rules live in one place and translation drift is bounded.

Decisions not in this PR (intentional):
- No internal Core Team roster.
- No internal observability dashboards.
- No nomination PR / public voting flow (Core-Team-consensus-driven for now;
  to be revisited once External Maintainers exceed 5).

Co-authored-by: Joey Li <lijinwei@open-design.ai>
2026-05-11 20:19:55 +08:00
Caprika
fb079d8115
Add reliable agent-browser skill (#1284)
* Add reliable agent browser skill

* Fix ProjectView delete conversation test props
2026-05-11 20:09:12 +08:00
PerishFire
1eb20e3807
fix(web): keep tweaks selection usable without annotations (#1268) 2026-05-11 20:06:49 +08:00
Sebastian Westberg
8962088c75
feat(daemon): guard against agent-emitted stub artifact regressions (#1171)
* feat(daemon): guard against agent-emitted stub artifact regressions

When an agent emits an <artifact> block whose body is a placeholder
("see other-file.html in this project", a bare filename string, a tiny
fallback page) instead of the full document, the daemon writes the
placeholder to disk verbatim. Users see a 25-500 byte HTML file where
their previous version had tens of kilobytes of real markup.

Add a structural regression guard in writeProjectFile: before writing
an html/deck artifact whose manifest carries metadata.identifier, scan
the project dir for prior siblings matching <identifier>(-\d+)?\.html?
and compare sizes. If the new body is below minRetainedRatio (default
0.2) of the largest prior sibling >= minPriorBytes (default 4096),
flag a regression. Three modes via env:

- OD_ARTIFACT_STUB_GUARD=warn (default) writes the file and attaches
  stubGuardWarning to the response so the frontend can surface it.
- OD_ARTIFACT_STUB_GUARD=reject throws ArtifactRegressionError before
  fs.writeFile; the route returns 422 ARTIFACT_REGRESSION with the
  prior sibling's name and size in error.details.
- OD_ARTIFACT_STUB_GUARD=off skips the guard entirely.
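
The size rule and the three modes above can be sketched roughly as follows. This is a minimal sketch, not the daemon's code: `isStubRegression`, `parseStubGuardMode`, and `guardDecision` are hypothetical names, and falling back to `warn` for unrecognised env values is an assumption.

```typescript
type StubGuardMode = "warn" | "reject" | "off";

interface StubGuardOptions {
  minRetainedRatio: number; // default 0.2 per the commit message
  minPriorBytes: number; // default 4096 per the commit message
}

const DEFAULT_OPTIONS: StubGuardOptions = { minRetainedRatio: 0.2, minPriorBytes: 4096 };

// Assumption: unset or unrecognised values behave like the default "warn".
function parseStubGuardMode(raw: string | undefined): StubGuardMode {
  return raw === "reject" || raw === "off" ? raw : "warn";
}

// True when the new body is much smaller than the largest sufficiently large prior.
function isStubRegression(
  newBytes: number,
  priorSizes: number[],
  opts: StubGuardOptions = DEFAULT_OPTIONS,
): boolean {
  const eligible = priorSizes.filter((size) => size >= opts.minPriorBytes);
  if (eligible.length === 0) return false;
  const largest = Math.max(...eligible);
  return newBytes < largest * opts.minRetainedRatio;
}

// Map mode + regression check to what the writer should do.
function guardDecision(
  mode: StubGuardMode,
  newBytes: number,
  priorSizes: number[],
): "write" | "write-with-warning" | "reject" {
  if (mode === "off" || !isStubRegression(newBytes, priorSizes)) return "write";
  return mode === "reject" ? "reject" : "write-with-warning";
}
```

With the defaults, a 300-byte body replacing a 20 000-byte prior is flagged, while a prior smaller than 4096 bytes never triggers the guard.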

Cross-agent by design: anchored on size delta + identifier match,
no agent-specific stub-phrase regex, so works for any agent backend
behind the agent-adapter abstraction.

The body-then-manifest write order pre-dates this change; the reject
path throws before fs.writeFile so rejections never leave a partial
state behind.

24 unit + 8 HTTP tests cover happy paths, all three modes, deck kind,
.htm extension sibling detection, ratio=1 edge case, and verify
rejected writes leave neither the html nor its manifest sidecar on
disk.

* fix(stub-guard): close same-name, nested-dir, and non-slug bypasses

Code review on PR #1171 (lefarcen, Codex, mrcfps) found three holes
where the stub guard could be silently bypassed. All three are now
closed with HTTP test coverage.

Same-name overwrite (lefarcen P1): the writer's prior-sibling scan
deliberately skipped the file at safeName, but for an in-session
overwrite (persistArtifact reuses the same fileName when
savedArtifactRef.current matches) that file is the prior content,
not the new entry. Drop the exclude-by-name filter; the current
on-disk size at scan time is always the prior because the overwrite
happens after this check.

Subdirectory scoping (Codex/mrcfps P2): writeProjectFile creates
parent directories for nested paths like reports/overview.html, but
the guard only scanned the project root. Pass path.dirname(target)
as scanDir so nested artifacts are evaluated against their real
sibling set.

Non-slug identifier (Codex/lefarcen/mrcfps P2): the web's
persistArtifact slugifies the filename basename but stores the raw
identifier in the manifest, so an identifier like "Landing Page"
yields filename landing-page.html with metadata.identifier="Landing
Page". Build the sibling regex from both the raw identifier and a
slugified variant (mirroring the frontend's slugifier) so either
form matches the same priors.

Also surface warn-mode warnings in the web UI: ProjectView now
checks file.stubGuardWarning after writeProjectTextFile and renders
the warning via setError. Reject-mode 422 surfacing requires
restructuring writeProjectTextFile's return contract and is
deferred.

API change inside the daemon: evaluateArtifactStubGuard /
findPriorArtifactSiblings drop excludeSafeName and rename projectDir
to scanDir. Tests updated.

Tests: 4 new HTTP cases (same-name overwrite preserves prior body,
nested subdir rejects, slug-form match rejects, plus the existing
warn/off/deck/.htm cases) and 1 new unit case (slug-form sibling
match). 44 tests pass.

* fix(stub-guard): empty-slug fallback + reject-mode UI surface

Round 3 review on PR #1171 (lefarcen, mrcfps) found two remaining
holes after 9cc82430 closed the same-name / subdir / non-slug bypasses.

Empty-slug fallback bypass (lefarcen P2): an identifier like "测试"
(all-non-ASCII) strips to empty through the web slugifier, and
persistArtifact's `slice(0,60) || 'artifact'` falls back to the
literal "artifact" basename. The guard searched for raw identifier +
slug only, so a later artifact-2.html stub was never compared against
the prior. Add
EMPTY_SLUG_FALLBACK_NAME = 'artifact' as a sibling-name candidate
when the slug is empty, mirroring the frontend fallback exactly.
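
The candidate file-name forms described so far (raw identifier, slug, and the empty-slug fallback) could be combined into one sibling pattern along these lines. `slugifyIdentifier` is a hypothetical stand-in for the frontend slugifier, whose exact rules are not shown here, and `buildSiblingPattern` is an illustrative name.

```typescript
const EMPTY_SLUG_FALLBACK_NAME = "artifact";

// Hypothetical slugifier: lowercase, collapse non-alphanumerics, cap at 60 chars.
function slugifyIdentifier(identifier: string): string {
  return identifier
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "")
    .slice(0, 60)
    .replace(/-+$/, "");
}

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Matches <name>(-N)?.html or .htm for every candidate basename.
function buildSiblingPattern(identifier: string): RegExp {
  const slug = slugifyIdentifier(identifier);
  const names = [identifier];
  if (slug === "") names.push(EMPTY_SLUG_FALLBACK_NAME);
  else if (slug !== identifier) names.push(slug);
  const alternation = names.map(escapeRegExp).join("|");
  return new RegExp("^(?:" + alternation + ")(?:-\\d+)?\\.html?$");
}
```

A pattern like this only nominates candidate files; it says nothing about whether a candidate really belongs to the same identifier.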

Reject-mode UI silence (mrcfps P2 + lefarcen P2): writeProjectTextFile
collapses any non-OK response (including 422 ARTIFACT_REGRESSION) to
null, and persistArtifact previously had no else branch. Users in
reject mode saw the daemon log fire but the UI was silent. Add an
else branch that surfaces a generic banner pointing at the most
likely cause and mentions checking the daemon logs for structured
details. Also clear savedArtifactRef.current on failure so retries
re-enter the persistence path.

Plumbing the structured 422 details through writeProjectTextFile
itself remains out of scope (cross-cutting client contract change
affecting 5+ call sites). The generic banner is the "at minimum"
path mrcfps suggested.

Tests: 1 new unit case (artifact.html sibling discovery for non-ASCII
identifier) + 1 new HTTP case (empty-slug stub regression rejected
end-to-end). 46 tests pass across stub-guard suites (was 44).

* fix(stub-guard): verify sidecar identity to avoid cross-identifier false positives

Round 4 review on PR #1171 (mrcfps inline + lefarcen review) caught
a false-positive introduced by the round-3 empty-slug fallback. Two
distinct identifiers that both slugify to empty (e.g. "测试" and
"首页") share the artifact*.html basename, so a brand-new save under
the second identifier was being compared against — and falsely
rejected because of — the unrelated first.

The same shape exists symmetrically: a non-empty-slug identifier
literally named "artifact" would falsely match empty-slug fallback
files written under any other identifier.

Fix: filename pattern matching is now a candidate generator, not
the source of truth. For every candidate sibling, read its
.artifact.json sidecar and verify metadata.identifier matches the
input via artifactIdentifiersMatch (raw equality OR shared non-empty
slug). Files without a sidecar are skipped — they weren't written
through the artifact-tag path this guard targets, and treating them
as priors was always a stretch.

Empty-slug equivalence is intentionally NOT honored: 测试 != 首页
even though both slugify to empty. The whole bug was conflating
distinct identifiers via the fallback name; slug-equivalence kicks
in only for non-empty slugs (Landing Page <-> landing-page).

Tests: unit fixtures now write file+sidecar pairs (mirrors prod);
new artifactIdentifiersMatch suite covers the 5 equivalence cases;
new HTTP test "does NOT cross-reject distinct empty-slug identifiers"
asserts the second save returns 200 instead of 422; new unit test
verifies that files without a sidecar are skipped.

42 tests pass across stub-guard suites.

* fix(stub-guard): require canonical-form anchor in identifier match to avoid 60-char truncation collisions

Round 5 review on PR #1171 (mrcfps) caught another false-positive in
artifactIdentifiersMatch: slugifyArtifactIdentifier truncates at 60
chars, so two distinct >60-char identifiers that share their first
60 chars (e.g. "A...A1" and "A...A2", 70 chars each) slugify to the
same string and would falsely bridge. Same shape as the empty-slug
fallback bug from round 4, just at the other end of the input range.

Tighten the rule: slug-equivalence requires at least one input to BE
its own canonical slug form. That keeps the legitimate bridge
("Landing Page" <-> "landing-page" — second input IS the slug) but
rejects truncation collisions ("A...A1" <-> "A...A2" — neither is in
canonical form).

Side effect: two non-canonical forms that slugify to the same value
no longer bridge (e.g. "Landing Page" vs "LANDING-PAGE"). This is
correct: without one canonical anchor we can't safely call them the
same lineage. Updated the slug-equivalence test to assert the new
semantics explicitly with both directions and a negative case.
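
The matching rule as it stands after this round can be sketched as below. The slugifier is a hypothetical approximation of the frontend's (only the 60-character cap is taken from the text), so treat this as an illustration of the canonical-anchor idea rather than the daemon's implementation.

```typescript
// Hypothetical slugifier: lowercase, collapse non-alphanumerics, cap at 60 chars.
function slugifyArtifactIdentifier(identifier: string): string {
  return identifier
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "")
    .slice(0, 60)
    .replace(/-+$/, "");
}

function artifactIdentifiersMatch(a: string, b: string): boolean {
  if (a === b) return true; // raw equality always matches
  const slugA = slugifyArtifactIdentifier(a);
  const slugB = slugifyArtifactIdentifier(b);
  // Empty slugs never bridge, and differing slugs obviously do not.
  if (slugA === "" || slugA !== slugB) return false;
  // Canonical anchor: at least one input must already BE the shared slug.
  return a === slugA || b === slugB;
}
```

Requiring one side to be the slug itself is what separates "Landing Page" and "landing-page" (bridged) from two long identifiers that merely truncate to the same 60 characters (not bridged).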

Tests: 2 new cases (no bridge for >60-char truncation collision; raw
70-char to its 60-char truncated slug still bridges) + 1 negative
test for the non-canonical-pair case. 45 tests pass.

* fix(stub-guard): cover legacy sidecar-less HTML priors

Round 6 review on PR #1171 (mrcfps, non-blocking) caught a real
legacy bypass: round 4's sidecar-required policy skipped any HTML
file without an .artifact.json companion, but readManifestForPath
(projects.ts) treats those same files as legitimate artifacts via
inferLegacyManifest. So a project with an older sidecar-less
dashboard.html (pre-sidecar era, Write-tool-emitted, paste-text,
manual import, etc.) let its first stub rewrite through as a
supposed "first emission".

Fix: when the sidecar is missing, derive a synthetic identifier
from the filename (strip the (-N)?\.html? suffix) and run it
through the same artifactIdentifiersMatch rules. Synthetic
identifiers come from already-slugified filenames, so they bridge
raw inputs only via the canonical-form rule established in round
5 — no truncation collisions, no empty-slug conflation, no
unrelated cross-identifier matches.

Tests: 3 new unit cases (legacy fallback finds the prior; bridges
raw->slug under the same rules; does NOT bridge unrelated slug
forms via inference) + 1 new HTTP test that seeds a sidecar-less
prior via the artifact-manifest-less write path and asserts the
stub rewrite is rejected with 422 ARTIFACT_REGRESSION.

48 tests pass across stub-guard suites (was 45).

* fix(stub-guard): try both interpretations for legacy filename inference

Round 7 review on PR #1171 (mrcfps, non-blocking) caught a real
ambiguity in the round-6 legacy fallback: a filename like
`phase-2.html` is genuinely ambiguous without a sidecar. It could
be the identifier "phase" with a -2 collision suffix, OR the
standalone identifier "phase-2". The round-6 helper only stripped
the suffix, so a sidecar-less `phase-2.html` followed by a stub
emission with metadata.identifier="phase-2" bypassed the guard
("phase-2" doesn't match the inferred "phase").

Fix: when the sidecar is missing, generate both candidate
identifiers (full basename and suffix-stripped basename) and
accept the file as a prior if either matches. Visible false
positives are preferable to silent false negatives — and the
canonical-form anchor in artifactIdentifiersMatch still rules out
truncation collisions and empty-slug conflations regardless of
which candidate matched.
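
A minimal sketch of the two-interpretation inference, under the assumption that the matcher is passed in. `inferLegacyIdentifiers` and `isLegacyPrior` are hypothetical names; in the daemon the matcher would be `artifactIdentifiersMatch`.

```typescript
// For a sidecar-less file, return every identifier the filename could stand for.
function inferLegacyIdentifiers(fileName: string): string[] {
  const base = fileName.replace(/\.html?$/, "");
  const stripped = base.replace(/-\d+$/, "");
  // "phase-2" may be the identifier itself or "phase" with a collision suffix.
  return stripped !== base && stripped !== "" ? [base, stripped] : [base];
}

// Accept the file as a prior if either interpretation matches the identifier.
function isLegacyPrior(
  fileName: string,
  identifier: string,
  matches: (candidate: string, identifier: string) => boolean,
): boolean {
  return inferLegacyIdentifiers(fileName).some((candidate) => matches(candidate, identifier));
}
```

Under this sketch `phase-2.html` yields the candidates `phase-2` and `phase`, while `dashboard.html` yields only `dashboard`.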

Tests: 2 new unit cases (full-basename interpretation finds
"phase-2"; suffix-stripped interpretation also finds "phase") and
1 new HTTP test that seeds a sidecar-less `phase-2.html` and
asserts the stub rewrite is rejected with 422 ARTIFACT_REGRESSION.

51 tests pass across stub-guard suites (was 48).

---------

Co-authored-by: Sebastian Westberg <sebastianwestberg@users.noreply.github.com>
2026-05-11 19:59:37 +08:00
初晨
0f0d214298
fix(web): render static previews for sketch json files (#1060)
* fix(web): render static previews for sketch json files

* fix(web): tolerate malformed sketch text items

* fix(web): harden sketch preview parsing

* fix(web): preserve sketch items on round-trip

* fix(web): clear sketch files destructively

* fix(web): unblock unsupported sketch saves
2026-05-11 19:29:46 +08:00
Dongsen
fd67b680d7
fix(contracts): pin API-mode override above discovery layer (#313) (#1207)
* fix(contracts): pin API-mode override above discovery layer (#313)

The old streamFormat='plain' rule was appended at the BOTTOM of the
composed prompt, but DISCOVERY_AND_PHILOSOPHY is pinned at the TOP with
its own 'these override anything later' header — so its hard rules
('TodoWrite on turn 3', 'brand-spec extraction via Bash + Read +
WebFetch') still won precedence in API mode. With no real tools wired
through to the Anthropic Messages path, the agent narrated pseudo-tool
markup (<todo-list>...</todo-list>, [读取 X], i.e. "[read X]") instead of emitting
structured tool_use events the UI could render.

Move the API-mode override to the absolute top of the prompt so it
beats the discovery layer, name every unavailable tool, and explicitly
forbid the pseudo-tool / fake-protocol markup observed in #313.
<artifact> output and <question-form> discovery are still allowed —
both are markup the UI parses, not tool calls.
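
The ordering fix can be pictured with a small sketch. The prompt strings below are placeholders, and the `ComposeOptions` shape and `"json"` format value are assumptions; only the idea of placing the override first for `streamFormat: 'plain'` comes from the text. An indexOf-based comparison is one way to pin the position in a test.

```typescript
// Placeholder texts; the real prompt layers are not reproduced in this log.
const API_MODE_OVERRIDE = "## API mode override\nNo tools are available in this mode.";
const DISCOVERY_AND_PHILOSOPHY = "## Discovery\nThese rules override anything later.";

interface ComposeOptions {
  streamFormat?: "plain" | "json"; // "json" is an assumed alternative value
  body: string;
}

function composeSystemPrompt(opts: ComposeOptions): string {
  const parts: string[] = [];
  // The override goes first so it precedes the "override anything later" layer.
  if (opts.streamFormat === "plain") parts.push(API_MODE_OVERRIDE);
  parts.push(DISCOVERY_AND_PHILOSOPHY, opts.body);
  return parts.join("\n\n");
}

// Positional check: the override must exist and sit above the discovery header.
function overrideIsPinnedOnTop(prompt: string): boolean {
  const overrideAt = prompt.indexOf(API_MODE_OVERRIDE);
  const discoveryAt = prompt.indexOf(DISCOVERY_AND_PHILOSOPHY);
  return overrideAt >= 0 && discoveryAt >= 0 && overrideAt < discoveryAt;
}
```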

* fix(daemon): mirror API-mode override above discovery layer (#313)

Address Codex + mrcfps review on #1207: the daemon has its own copy of
composeSystemPrompt that is hit by any adapter declaring streamFormat:
'plain' (e.g. DeepSeek) via server.ts:6190. That copy still appended
the obsolete bottom '## API mode rule', which loses the precedence war
against DISCOVERY_AND_PHILOSOPHY's 'these override anything later'
header — so plain-stream daemon agents could still narrate
<todo-list> / [读取 X] pseudo-tool markup.

Mirror the same top-anchored API_MODE_OVERRIDE here (byte-identical to
the contracts copy) so both code paths produce the same behaviour.
Adds 8 daemon-side tests including the indexOf-based positional
assertion that pins the override above the discovery layer header.
2026-05-11 19:29:34 +08:00
Dongsen
12ce5ad38b
fix(web): ignore <artifact> tags inside markdown code spans and fences (#1132)
* test(web): add failing parser cases for <artifact> recitation in markdown code

Cover the three real-world prose contexts where the model legitimately
quotes the artifact tag without intending to emit one:

- inside an inline backtick span
- inside a fenced code block
- spread across streaming chunks crossing the fence boundary

Establishes the RED baseline before parser code-fence awareness lands.

* fix(web): ignore <artifact> tags inside markdown code spans and fences

The streaming artifact parser scanned the buffer with a raw indexOf,
guarded only by 'next char must be whitespace'. That meant any literal
<artifact ...> the model recited while documenting the protocol — even
inside backticks or a ```html fence — flipped the parser into artifact
mode, swallowed the rest of the reply from the chat UI, and (when a
matching </artifact> appeared in the recitation) silently wrote a
spurious file to disk via persistArtifact.

Replace findOpenTag with a linear scan that tracks fenced code blocks
(```) and inline code spans (`), skipping any <artifact prefix found
inside either. If the buffer ends mid-fence, return a partial match
anchored at the fence start so the next streaming chunk can resolve
the boundary without losing fence context.

Closes #1130.

* fix(web): match renderer fence/inline-code rules in artifact parser

Codex review on PR #1132 caught that the previous fix toggled inFence on
any triple-backtick run anywhere in the buffer, including mid-line, while
the chat renderer (apps/web/src/runtime/markdown.tsx) only treats ``` as
a fence when it occupies a whole line matching /^[ ]{0,3}```(\w[\w+-]*)?\s*$/.
That asymmetry would suppress a real <artifact> tag emitted after a prose
sentence like "the opening marker is ```html and the response then writes:".

Rework findOpenTag in three passes that mirror the renderer:

  1. Walk \n-terminated lines; only a line that matches FENCE_LINE_RE
     toggles fence state. Open fences without a close (or with an
     unterminated tail line) return partial so the next chunk can resolve.
  2. Collect inline code spans with /`[^`]+`/g — the same regex used by
     renderInline — so what the parser skips matches what the user sees as
     code. Unmatched trailing backticks after the last \n hold back.
  3. Find the first <artifact …> outside any skip range; preserve the
     existing partial-prefix tail handling.
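
The three passes above might look roughly like this for a complete (non-streaming) buffer. This is a simplified sketch: the partial-match and hold-back handling for streaming chunks is omitted, fenced regions are masked before the inline scan (an implementation choice of this sketch), and all helper names other than `findOpenTag` and `FENCE_LINE_RE` are hypothetical.

```typescript
type Range = [number, number]; // [start, end) offsets into the buffer

const TICKS = "`".repeat(3);
// Same shape as the whole-line fence regex quoted above, built from a string.
const FENCE_LINE_RE = new RegExp("^[ ]{0,3}" + TICKS + "(\\w[\\w+-]*)?\\s*$");

function inAnyRange(ranges: Range[], index: number): boolean {
  return ranges.some(([start, end]) => index >= start && index < end);
}

function computeSkipRanges(buffer: string): Range[] {
  const ranges: Range[] = [];
  // Pass 1: only whole-line fences toggle fence state.
  let offset = 0;
  let fenceStart = -1;
  for (const line of buffer.split("\n")) {
    if (FENCE_LINE_RE.test(line)) {
      if (fenceStart < 0) {
        fenceStart = offset;
      } else {
        ranges.push([fenceStart, offset + line.length]);
        fenceStart = -1;
      }
    }
    offset += line.length + 1;
  }
  if (fenceStart >= 0) ranges.push([fenceStart, buffer.length]); // unclosed fence

  // Pass 2: inline code spans, scanned with fenced regions blanked out.
  let masked = buffer;
  for (const [start, end] of ranges) {
    masked = masked.slice(0, start) + " ".repeat(end - start) + masked.slice(end);
  }
  const inlineCode = /`[^`]+`/g;
  let span: RegExpExecArray | null;
  while ((span = inlineCode.exec(masked)) !== null) {
    ranges.push([span.index, span.index + span[0].length]);
  }
  return ranges;
}

// Pass 3: first "<artifact" followed by whitespace outside every skip range.
function findOpenTag(buffer: string): number {
  const skip = computeSkipRanges(buffer);
  const open = /<artifact\s/g;
  let hit: RegExpExecArray | null;
  while ((hit = open.exec(buffer)) !== null) {
    if (!inAnyRange(skip, hit.index)) return hit.index;
  }
  return -1;
}
```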

Adds a regression test covering the exact case Codex reported.

* test(web): pin parser behavior on double-backtick and in-fence string literal recitation

Two cases raised in PR #1132 review:

- a real artifact tag wrapped in '``<artifact …>``' (double-backtick
  inline code span) should not be treated as a real artifact
- a fenced JS example whose body contains a string literal like
  'const fence = "```";' should not pop fence state early and let a
  later literal <artifact> be parsed as real

Both already pass on 96e88ca because the line-anchored fence regex and
the renderer-aligned inline regex handle them correctly. Pinning the
behavior so future regressions surface as test failures.

* fix(web): make stripArtifact markdown-aware to stop truncating literal recitations

The streaming artifact parser was hardened in 96e88ca to skip <artifact>
recitations inside backticks and fences, but the post-stream stripper at
AssistantMessage.tsx still ran a naive 'content.indexOf("<artifact")' over
the same text events. As reported by lefarcen on PR #1132, that meant
chat replies with literal protocol recitations could still get silently
truncated mid-explanation — even though the parser preserved them in the
text stream and the file panel was no longer polluted with ghost files.

Extract the renderer-aligned classification (FENCE_LINE_RE, INLINE_CODE_RE,
computeSkipRanges, rangeContains) into a single source of truth at
apps/web/src/artifacts/markdown-context.ts so the parser and the stripper
agree on what counts as code. Add apps/web/src/artifacts/strip.ts with a
markdown-aware stripArtifact that:

- ignores any <artifact open inside a fenced block or inline code span
- looks for </artifact> with the same skip-range filter, so a real open
  paired with a literal close inside backticks does not strip a literal
  body that is meant to render
- returns content unchanged when an open exists with no matching real
  close (the previous implementation sliced to end-of-string, which would
  nuke trailing prose on a malformed or still-streaming tag)

Refactor parser.ts to import the shared helpers; behavior preserved (all
seven existing parser tests still pass). New strip.test.ts covers six
cases including the empirically-verified inline-backtick regression.

* fix(web): align artifact stripper/parser fence rules with renderer exactly

Two gaps surfaced in review at a0bf05f:

- markdown-context.ts used a single FENCE_LINE_RE that allowed 0-3 leading
  spaces and reused the same pattern for opening and closing fences. The
  chat renderer (runtime/markdown.tsx:44 and :49) is asymmetric — opens
  with /^```(\w[\w+-]*)?\s*$/, closes with /^```\s*$/, and rejects any
  leading indentation on either side. Indented "   ```html" was being
  treated as a code fence even though the renderer keeps it as a paragraph,
  and a literal "```html" line inside an open fenced example was closing
  the skip range early — both could expose a real or literal <artifact …>
  to the wrong handler.
- stripArtifact discarded computeSkipRanges' unclosedFenceStart, so a
  fenced literal that ends at EOF without a trailing newline (very common
  for chat output) leaked the inner <artifact …> recitation to the
  stripper, reproducing the original #1130 truncation symptom on a
  narrower input shape.

Split FENCE_LINE_RE into FENCE_OPEN_RE / FENCE_CLOSE_RE with no leading
indentation, gate the fence state machine on the right side of the toggle,
and have stripArtifact extend skip ranges to end-of-content when a fence
is left open. Also tightened the parser's tail-line hold-back regex to
match the renderer's no-leading-space rule. Added regression tests for the
EOF-unclosed-fence case, the indented pseudo-fence (renderer treats as
paragraph, stripper must strip the real artifact), and a "```html" line
inside an open fence.
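
A minimal sketch of the asymmetric fence state machine described above, assuming the two regex shapes quoted from the renderer. `fencedRanges` is a hypothetical name, and the regexes are built from strings purely to keep literal backtick runs out of the example.

```typescript
const TICKS = "`".repeat(3);
// Opening fence: optional info string, no leading indentation.
const FENCE_OPEN_RE = new RegExp("^" + TICKS + "(\\w[\\w+-]*)?\\s*$");
// Closing fence: bare backticks only, no leading indentation.
const FENCE_CLOSE_RE = new RegExp("^" + TICKS + "\\s*$");

function fencedRanges(content: string): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  let offset = 0;
  let openAt = -1;
  for (const line of content.split("\n")) {
    if (openAt < 0) {
      // Outside a fence: only an opener can change state.
      if (FENCE_OPEN_RE.test(line)) openAt = offset;
    } else if (FENCE_CLOSE_RE.test(line)) {
      // Inside a fence: only a bare closer ends it.
      ranges.push([openAt, offset + line.length]);
      openAt = -1;
    }
    offset += line.length + 1;
  }
  // A fence left open at EOF covers everything to end-of-content.
  if (openAt >= 0) ranges.push([openAt, content.length]);
  return ranges;
}
```

Testing the opener only while closed and the closer only while open is what keeps an info-string line inside an open fence from ending the skip range early.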

Refs nexu-io/open-design#1130

* refactor(web): align streaming tail-line fence guard with FENCE_OPEN_RE

The streaming parser's tail-line hold-back used a stricter local regex
(/^```\w*$/) than the renderer's FENCE_OPEN_RE (/^```(\w[\w+-]*)?\s*$/),
missing valid opener tails like ```c++, ```ts-, or ``` (trailing space).

In practice these tails are still held back by the unmatched-backtick
parity scan that runs immediately after — three backticks in a tail line
are odd, so firstUnmatched stays set and the parser holds from that
position. So this wasn't a runtime correctness bug, just a regex
divergence that future readers could trip on.

Drop the local regex and reuse FENCE_OPEN_RE so the tail check matches
the same shape the rest of the pipeline already uses. Pinned the
behavior with three new parser tests (`+`/`-` info-string suffix and
trailing-space tails arriving as the first chunk) — they pass at HEAD,
proving the parity scan was already covering these cases.

Refs nexu-io/open-design#1132 (lefarcen polish P2)

* fix(web): scope inline-code skip ranges per block and reject <artifact prefix-shared opens

INLINE_CODE_RE previously ran over the whole buffer, so an unmatched
backtick in one paragraph could pair with a backtick in a later
paragraph and create a phantom inline span that swallowed any real
<artifact …> between them. Mirror runtime/markdown.tsx by splitting the
buffer on fence / blank / heading / list / hr boundaries and running
INLINE_CODE_RE per block region instead.

stripArtifact accepted any unskipped `<artifact` substring as a real
open, while the streaming parser already required a following whitespace
character — so prose like `<artifactual>demo</artifact>` was being
truncated to `prefix  suffix`. Extract the parser's real-open guard into
isRealArtifactOpenAt and reuse it from both sides.
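
The shared guard could be as small as the sketch below. The name `isRealArtifactOpenAt` comes from the text; the body is an assumption based on the "following whitespace character" rule it describes.

```typescript
const OPEN_PREFIX = "<artifact";

// True only when "<artifact" at `index` is followed by a whitespace character.
function isRealArtifactOpenAt(text: string, index: number): boolean {
  if (text.slice(index, index + OPEN_PREFIX.length) !== OPEN_PREFIX) return false;
  const next = text.charAt(index + OPEN_PREFIX.length);
  return next !== "" && /\s/.test(next);
}
```

A bare `<artifact` at the very end of the text is treated as not (yet) real here; a streaming parser would presumably hold it back as a partial match instead.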

While reordering findOpenTag for the shared guard, also fix the related
hold-back ordering issue tracked at #1141: a stray tail-line backtick or
fence-opener prefix used to suppress an artifact already complete
earlier in the buffer. Scan for the earliest complete real open first,
then pick the earliest hold-back position only when no complete tag was
found.

Regressions pinned in parser.test.ts and strip.test.ts for both new
finding shapes.

* fix(web): keep HR-shaped lines inside paragraph regions for inline-code scanning

The previous walker closed inline-scan regions on lines matching the HR
regex, but `parseBlocks()` in runtime/markdown.tsx does not break a
paragraph on HR — its inner accumulation loop only breaks on blank /
fence / heading / ul / ol (runtime/markdown.tsx:95-104). HR is only an
HR block in the outer loop's first-look, never mid-paragraph.

So inputs like `intro \`\n---\n<artifact …>…</artifact>\n---\nclosing \``
are one paragraph in the renderer, whose two stray backticks pair to
cover the literal artifact recitation — but the walker was splitting on
the `---` lines, leaving the recitation outside skip ranges, and the
parser/stripper would treat it as a real tag.

Drop HR from the paragraph-break list (HR-shaped lines carry no
backticks of their own, so keeping them inside the surrounding region
is benign either way) and document the renderer-mirror rationale.

Regressions pinned on both sides.
2026-05-11 19:29:22 +08:00
Sid
156bf5a34e
fix(web): refresh home projects after deleting a conversation (#1202) (#1219)
The home design cards render their `Needs input` badge from the
cached `/api/projects` payload — App.tsx owns the `projects` state
and exposes a `refreshProjects` callback that ProjectView already
fires from every other state-changing branch (run end, live-artifact
events, project rename, etc.). The conversation-delete branch
silently skipped it: deleting a conversation that owned an unanswered
`<question-form>` flips the daemon-side flag, but the home view kept
showing the stale badge until the next manual reload.

Call `onProjectsRefresh()` immediately after a successful
`deleteConversation` API response (and only then — if the request
fails the cached state is still the truth and we must not pretend
otherwise). Adds `onProjectsRefresh` to the useCallback deps for
exhaustive-deps correctness; matches the pattern at the four
existing call sites in this file.
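
Stripped of React, the success-only refresh reads roughly like this. The function name and the boolean-returning `deleteConversation` contract are assumptions made for the sketch; the real hook wiring is not shown in this log.

```typescript
// Hypothetical contract: resolves true when the delete request succeeded.
type DeleteConversation = (conversationId: string) => Promise<boolean>;

async function deleteConversationAndRefresh(
  conversationId: string,
  deleteConversation: DeleteConversation,
  onProjectsRefresh: () => void,
): Promise<boolean> {
  const deleted = await deleteConversation(conversationId);
  // Refresh only on success; after a failure the cached list is still accurate.
  if (deleted) onProjectsRefresh();
  return deleted;
}
```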

New regression coverage in
`apps/web/tests/components/ProjectView.deleteConversation.test.tsx`:
- triggers onProjectsRefresh after deleting a conversation
  (verified RED before this fix, GREEN after)
- does not trigger onProjectsRefresh when the delete request fails
  (defensive complement so a future "always refresh" refactor
  doesn't paper over a real failure with a stale-but-confident UI)
2026-05-11 19:29:09 +08:00
shangxinyu1
10802bb0b0
test: expand nightly UI and desktop regression coverage (#1256)
* e2e(ui): cover examples preview flows

* e2e(ui): cover Codex local CLI fallback UX

* test: expand desktop and connector regression coverage

* e2e(ui): cover workspace restoration flows

* e2e(ui): cover retry recovery workspace flow

* test: cover artifact and connector recovery flows

* e2e(ui): cover Continue in CLI stale provenance flow

* e2e(ui): cover BYOK model fetch caching

* test: expand Orbit and desktop connector coverage

* e2e(ui): cover workspace quick switcher recovery flows

* e2e(ui): cover connector pending authorization recovery

* e2e(ui): cover workspace and conversation restoration routes

* e2e(ui): cover conversation draft and attachment restoration

* e2e(ui): cover conversation history selection recovery

* e2e(ui): cover workspace surface conversation selection

* test: cover artifact presentation and orbit link behavior

* test: cover artifact external link restoration

* e2e(ui): cover root-route deep-link restoration

* e2e(specs): cover Orbit open-artifact desktop click

* e2e(specs): cover desktop artifact open link

* test: fix Orbit settings fixture type drift

* test: split Playwright critical and extended suites

* test: fix ProjectView design template fixtures

* ci: split workspace test stages

* guard: allow split Playwright suite scripts

* test: shrink Playwright critical suite

* test: restore omitted Playwright suites
2026-05-11 19:23:13 +08:00
PerishFire
8c0fb8dc01
feat(tools-pr): add maintainer PR-duty workspace (#1259)
* feat(tools-pr): add maintainer PR-duty workspace

Adds `tools/pr` as the maintainer-only control plane for PR-duty work on
this repo. Thin `gh` wrapper that encodes repo-specific knowledge:
review lanes, forbidden surfaces, lane-specific checklists, validation
command derivation from touched packages.

Subcommands:
- `list` — triage open queue by lane and review-state bucket.
- `view <num>` — agent-friendly review brief for a single PR.
- `classify [num]` — emit script-level tags for one PR or the whole
  open queue; full-queue JSON output lands under `.tmp/tools-pr/classify/`
  with rate-limit telemetry per run.
- `assignment` — assigner-perspective view of PR ownership, idle time,
  and blockers (derived from existing tags; no new judgments).

Tag dictionary (13 tags) covers: bot-only-approval, needs-rebase,
forbidden-surface, unlabeled, duplicate-title, non-ascii-slug,
maintainer-edits-disabled, org-member, unresolved-changes-requested,
stale-approval, and three awaiting-* timing tags. Each rule is
expressible as one factual sentence over `gh` data + repo paths — see
`tools/pr/AGENTS.md` for the full dictionary plus precision rules.

Templates in `tools/pr/templates/*.md` are aesthetic references for
recurring maintainer comments (duplicate-title ask, awaiting-author
nudge, agent-review brief shape). `templates/examples/` holds
frozen-in-time agent-review snapshots for three PR shapes.

Infrastructure:
- `gh()` wraps `execFile` with minimum-touch retry (2 attempts at 1s + 2s
  backoff) on transient 5xx / network errors. Persistent failures still
  surface — retry is anti-jitter, not an exponential-backoff resilience
  layer.
- Heavy chunks (`reviews`, `comments`, `commits`, assignment timelines)
  use cursor-paginated `gh api graphql` via `fetchPaginatedPrList` to
  stay under GitHub's GraphQL server-side timeout. Light chunks stay on
  `gh pr list --json`.
- `fetchOrgMembers` cached per process via `gh api orgs/<owner>/members
  --paginate`.
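
The retry policy in the first bullet can be sketched as below, reading "2 attempts at 1s + 2s" as two retries after the initial call (an assumption). `withRetry`, `isTransient`, and the injectable `sleep` are hypothetical; the real `gh()` wrapper is not shown here.

```typescript
async function withRetry<T>(
  run: () => Promise<T>,
  isTransient: (error: unknown) => boolean,
  delaysMs: number[] = [1000, 2000],
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await run();
    } catch (error) {
      // Give up once the delays are used up, or immediately on non-transient errors.
      if (attempt >= delaysMs.length || !isTransient(error)) throw error;
      await sleep(delaysMs[attempt]);
    }
  }
}
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; persistent failures still surface after the last delay.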

Wiring:
- Root `package.json` adds `pnpm tools-pr` to the allowed root entry
  points.
- `scripts/postinstall.mjs` builds `tools/pr` alongside other workspace
  packages.
- `scripts/guard.ts` allowlists `tools/pr/bin/tools-pr.mjs` and
  `tools/pr/esbuild.config.mjs`, and adds `pr/` to the `tools/` top-level
  layout allowlist.
- Root `AGENTS.md` and `tools/AGENTS.md` document the new command
  surface, root-command-boundary update, and per-tool ownership.

* docs(agents): brief tools-pr in root AGENTS.md, link to tools/pr/AGENTS.md

Adds a `PR-duty tooling` section to the root AGENTS.md summarising what
`pnpm tools-pr` is, listing the four common subcommands (list / view /
classify / assignment), and pointing readers to `tools/pr/AGENTS.md` for
the full tag dictionary, operational playbook, templates, and design
rules. The section keeps root-level guidance to high-level orientation
while details stay local to the tool's own AGENTS.md.

* fix(tools-pr): drop overly broad touches-root-package.json forbidden hit

`deriveForbidden` was flagging any change to root `package.json` as a
forbidden-surface hit, but AGENTS.md §Root command boundary only forbids
specific *lifecycle* aliases (pnpm dev / test / build / daemon / preview
/ start) — tools-control-plane entrypoints like `pnpm tools-pr` are
explicitly allowed. Distinguishing "forbidden alias" from "allowed
entry" requires reading the diff content, which is `pnpm guard`'s job
rather than a path-derived classify tag.

Dogfooded on this branch's own PR (#1259), which added the `pnpm
tools-pr` script and was incorrectly flagged. Removing the hit aligns
the `forbidden-surface` tag with what tools-pr can mechanically detect
from file paths alone (apps/nextjs/, packages/shared/).
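
A minimal sketch of what a purely path-derived check looks like after
this change, assuming the two prefixes named above are the full list.
`FORBIDDEN_PREFIXES` and `deriveForbiddenHits` are hypothetical names,
not the actual `deriveForbidden` implementation.

```typescript
// Path prefixes the message names as mechanically detectable.
const FORBIDDEN_PREFIXES: string[] = ["apps/nextjs/", "packages/shared/"];

// Hypothetical helper: returns the changed paths that sit under a forbidden prefix.
// Root package.json is deliberately absent; judging it requires reading the diff.
function deriveForbiddenHits(changedPaths: string[]): string[] {
  return changedPaths.filter((changed) =>
    FORBIDDEN_PREFIXES.some((prefix) => changed.startsWith(prefix)),
  );
}
```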

* fix(tools-pr): paginate commits fetch, recognise ready-to-merge, escape title-index separator

Three review follow-ups on #1259, all factual fixes:

- `fetchOpenPrCommits` now uses `fetchPaginatedPrList` instead of a
  one-shot `pullRequests(first: $first)` query. GitHub GraphQL caps
  connection page size at 100, so the previous implementation would
  fail at runtime when callers passed `--limit > 100`. The paginated
  path makes the commits fetch consistent with the other heavy chunks
  (reviews, comments, assignment timelines) and removes the artificial
  ceiling entirely. The `limit` parameter is dropped from
  `fetchOpenPrCommits`; the CLI `--limit` continues to bound the
  `gh pr list --json` chunks.
- `deriveStatus` in `assignment.ts` now reads `facts.reviewDecision`
  and `facts.mergeStateStatus`. When the PR is `APPROVED` with merge
  state `CLEAN` or `UNSTABLE` and carries no blockers, status renders
  as `ready to merge` instead of falling through to `in review`. The
  assignment view loses its main triage signal without this: a clean
  human-approved PR rendered identically to a REVIEW_REQUIRED one.
- `tags.ts:tagDuplicateTitle` and `tags.ts:buildContext` both
  constructed the title-index key with a literal NUL byte between
  author and title, which made the file appear as binary in `git diff`
  / review tooling. Replaced the literal byte with a Unicode escape
  sequence in source; the runtime string value is identical, the
  source stays plain text and round-trips through review tooling
  cleanly.
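
The `ready to merge` rule from the second bullet can be sketched as
follows. The `PrFacts` shape and the `blockers` array are assumptions
for illustration; the real `deriveStatus` presumably handles more
statuses than the two shown here.

```typescript
// Hypothetical subset of the PR facts; field names follow the commit message.
interface PrFacts {
  reviewDecision: "APPROVED" | "CHANGES_REQUESTED" | "REVIEW_REQUIRED" | null;
  mergeStateStatus: string; // e.g. "CLEAN", "UNSTABLE", "BLOCKED", "DIRTY"
  blockers: string[]; // assumed representation of "carries no blockers"
}

function deriveStatus(facts: PrFacts): "ready to merge" | "in review" {
  const mergeable =
    facts.mergeStateStatus === "CLEAN" || facts.mergeStateStatus === "UNSTABLE";
  // All three conditions must hold before the PR is surfaced as mergeable.
  if (facts.reviewDecision === "APPROVED" && mergeable && facts.blockers.length === 0) {
    return "ready to merge";
  }
  return "in review";
}
```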

* fix(tools-pr): raise default --limit to 1000 to cover the live open queue

mrcfps flagged that `tools-pr list` (and `classify --all`, `assignment`)
defaults to `--limit 100`, which silently drops every PR past the first
100 in the open queue. The repo currently sits at 104 open PRs, so the
out-of-the-box run was already omitting four PRs.

Raise the default to 1000 in `list.ts`, `classify.ts`, and `assignment.ts`,
and remove the now-pointless 200 ceiling — `gh pr list --limit N` paginates
internally, so a high cap is cheap. Users can still pass `--limit <small>`
for a truncated preview. CLI help text on the three subcommands updated to
match.

* fix(web): pass designTemplates to ProjectView render helper

#955 made `designTemplates` a required prop on ProjectView, but the test
helper added in #1244 (`renderProjectView` in
`ProjectView.api-empty-response.test.tsx`) was never updated. The two
PRs landed on main without conflicting, leaving `apps/web` typecheck red
for every PR that rebases past b5eb8c16.

Pass `designTemplates={[] as SkillSummary[]}` alongside the existing
`skills={[] as SkillSummary[]}` so the helper compiles. The component
already treats the array shape (empty included) as a no-op fallback in
the empty-response paths the test exercises.

* fix(tools-pr): correct author signal + merge inline review comments

Two correctness gaps in the awaiting-* signal pipeline surfaced during
review of the new tools-pr commands:

1. `authorSignalAt` iterated every PR commit unconditionally. On
   `maintainerCanModify=true` PRs a maintainer's follow-up push would
   advance the author timestamp, masking a stalled author response.
   Filter commits to those whose `authorLogin` matches `facts.author`,
   mirroring the same filter already applied to comments.

2. `fetchOpenPrComments` (and `fetchView`) only fetched
   `pullRequest.comments` / `gh pr view --json comments`, which is the
   issue-conversation thread. Inline review-thread replies — where
   authors and reviewers actually exchange most fix-up replies — live in
   `reviewThreads.comments` / REST `pulls/{n}/comments`. Missing them let
   `humanReviewerSignalAt` / `authorSignalAt` and the `view` brief point
   at the wrong side after someone replied inline. Extend the list-mode
   GraphQL to also sweep `reviewThreads(last: 20).comments(first: 20)`,
   and add a parallel REST inline-comments fetch in `fetchView` that
   merges into `GhView.comments`.
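
The author filter from item 1 can be illustrated with a small sketch.
The `Facts`, `Commit`, and `Comment` shapes are hypothetical; only
`authorLogin` and `facts.author` come from the message above.

```typescript
// Hypothetical shapes; timestamps are ISO 8601 strings.
interface Commit { authorLogin: string | null; committedAt: string }
interface Comment { authorLogin: string | null; createdAt: string }
interface Facts { author: string; commits: Commit[]; comments: Comment[] }

// Latest time the PR author acted, ignoring pushes and comments by anyone else.
function authorSignalAt(facts: Facts): string | null {
  const stamps: string[] = [
    ...facts.commits
      .filter((commit) => commit.authorLogin === facts.author)
      .map((commit) => commit.committedAt),
    ...facts.comments
      .filter((comment) => comment.authorLogin === facts.author)
      .map((comment) => comment.createdAt),
  ];
  if (stamps.length === 0) return null;
  return stamps.reduce((a, b) => (Date.parse(a) >= Date.parse(b) ? a : b));
}
```

With this filter a maintainer's follow-up push on a
`maintainerCanModify=true` PR no longer moves the author timestamp.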
2026-05-11 19:17:21 +08:00
Tom Huang
b5eb8c1647
feat: generic skills + split skills/design-templates + finalize-design API (#955)
* feat: general-purpose skills with @-mention composition and user import

Lift skills from "one mode-bound skill per project" to a generic capability
the user can compose per turn:

- Daemon: scan multiple skill roots (user-skills under runtime data, then
  the bundled `skills/`); user-imported skills can shadow built-ins by id.
- New `POST /api/skills/import` and `DELETE /api/skills/:id` endpoints,
  with CONFLICT/BAD_REQUEST/NOT_FOUND error codes and built-in delete
  protection.
- ChatRequest gains `skillIds: string[]`; the chat run concatenates each
  picked skill's body (and merges craftRequires) into the system prompt
  for that turn only — the project's persistent `skillId` is untouched.
- Web composer: `@` popover now lists skills alongside project files;
  picks render as removable chips above the textarea and ride along with
  the request as `skillIds`.
- Settings → Library: import form (name/description/triggers/body),
  per-card delete for user skills, "user" origin badge.

* chore(web): drop welcome pet teaser + add ds→prompt-template mapping util

- SettingsDialog: remove the inline pet adoption teaser from the welcome
  panel so the first-run modal stays focused on configuration.
- New `inferPromptTemplateCategoriesForDs(ds)` helper that maps a design
  system's authored metadata to prompt-template gallery categories.
  Imported by the design-system gallery wiring on a sibling branch; no
  callers in this branch yet.

* feat: split skills/design-templates and add finalize-design API

Phase 0 of the skills/design-templates refactor (specs/current/
skills-and-design-templates.md):

- Move ~104 rendering catalogue entries from skills/ to design-templates/
  and keep skills/ for the small set of functional skills that *do work*
  on user input (utilities, briefs, packagers).
- Add design-templates/AGENTS.md and skills/AGENTS.md describing the
  contract, and a brand-agnostic craft/ surface for opt-in craft rules.
- Daemon: add DESIGN_TEMPLATES_DIR / USER_DESIGN_TEMPLATES_DIR roots and
  an /api/design-templates surface mirroring /api/skills. Asset/example
  routes still span both registries so existing srcdoc URLs keep
  resolving across the rename.
- Web: split LibrarySection into SkillsSection + DesignSystemsSection,
  rename the EntryView "Examples" tab to "Templates", and update locales
  + the New-project picker accordingly.

Adds the finalize-design endpoint:

- New apps/daemon/src/finalize-design.ts and packages/contracts/src/api/
  finalize.ts — one-shot synthesis of a project's transcript + active
  design system + current artifact into <projectDir>/DESIGN.md via the
  Anthropic Messages API. Per-project .finalize.lock mirrors the
  transcript-export hygiene from PR #493; provider credentials are not
  persisted by the daemon.

Other supporting changes:

- README + AGENTS.md updates to document the new directory split and
  craft/ surface, plus i18n strings across 13 locales.
- Test refactors and new coverage (finalize-design, runs, sidecar
  server, plus refreshed daemon integration tests).
- .gitignore: scope the *.exe ignore to /OpenDesign.exe so legitimate
  vendor binaries are no longer hidden.

* fix(merge): move clinical-case-report to design-templates/

Origin/main added the clinical-case-report skill under skills/ before
the skills/design-templates split landed. Its od.mode is prototype, so
per specs/current/skills-and-design-templates.md it is a design template
and belongs alongside the other rendering catalogue entries — not under
the slimmed-down functional skills/ root. Moving it keeps the EntryView
Templates tab consistent with origin/main's intent.

* feat(skills): curated design/creative catalogue + collapsible Settings rows

Seed ~100 curated design/creative skill stubs under skills/ sourced from
awesome-claude-skills (ComposioHQ) and awesome-agent-skills (VoltAgent).
Each stub carries an od.category tag so the new filter pill row in
Settings -> Skills can group them. The seed script
(scripts/seed-curated-design-skills.ts, pnpm seed:curated-design-skills)
is idempotent: it only creates folders that don't already exist, so
hand-edited stubs are never overwritten.

- Daemon: parse and surface od.category on SkillInfo with a strict slug
  normaliser; mirror the field on SkillSummary in @open-design/contracts.
  Category is purely a UI hint — system-prompt composition is unchanged.
- Web: rewrite SkillsSection from a left-list / right-detail grid into a
  vertical stack of collapsible rows mirroring the External MCP panel
  (header always visible with name + mode/source/category pills + per-row
  enable toggle; SKILL.md preview, file tree and inline edit form expand
  on demand). Add a Category filter row above the list. Reorder Settings
  nav so Skills + External MCP sit above the Composio/MCP cluster. Update
  composer placeholder/hint across 17 locales to advertise '@ files or
  skills · / for commands'.
- Docs: extend skills/AGENTS.md with the curated catalogue rules
  (idempotency, category vocabulary, no upstream vendoring).

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(skills): teach localized-content + system-prompt tests about the skills/design-templates split

mrcfps blocking review on PR #955: the skills/design-templates split
(b5993385) moved ~110 SKILL.md entries out of `skills/` and into
`design-templates/`, but two repo-level tests still hard-coded the
single-root layout, so CI gates went red on the merged branch:

- `e2e/tests/localized-content.test.ts` only scanned `<repo>/skills`
  while the locale `skillCopy` map keeps id-keyed entries spanning
  both roots (ExamplesTab/Templates uses one lookup regardless of
  origin). Teach the helper to read both `skills/` and
  `design-templates/`, deduplicating ids so the union matches the
  localized claim.
- `apps/daemon/tests/prompts/system.test.ts` read
  `skills/live-artifact/SKILL.md`, which now lives under
  `design-templates/live-artifact/`. Update the absolute path so
  composeSystemPrompt's coverage of the live-artifact preamble is
  exercised again.

Also enroll the curated design/creative catalogue (PR #955, ~91
stubs sourced from awesome-claude-skills / awesome-agent-skills) in
the DE / FR / RU `_SKILL_IDS_WITH_EN_FALLBACK` lists. The stubs are
English-only by design (frontmatter advertises an upstream URL); the
fallback list is exactly the place to acknowledge "we know this id
exists, English copy is fine here" so the localized-content coverage
gate passes without forcing a translation task per locale.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(skills): always quote frontmatter name so importUserSkill round-trips numeric / boolean ids

mrcfps PR #955 review: `buildSkillMarkdown` emitted `name:
${escapeYamlString(name)}` without quotes, so YAML coerced names
like `123`, `true`, `false`, or `null` into non-string scalars on
re-parse. listSkills() then read `data.name` as a number/boolean
and the import flow's follow-up `findSkillById(skills, result.id)`
missed it, falling into `/api/skills/import`'s "imported skill
could not be re-read" 500 path for those ids.

Switch the emitter to a quoted scalar (`name: "..."`) — the
double-escape already in `escapeYamlString` makes the quoted form
safe — and add a round-trip test covering `123`, `true`, `false`,
`null`, and `0` to lock in the contract.
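
A minimal sketch of the quoted emitter, assuming `escapeYamlString`
escapes backslashes and double quotes (its real body is not shown in
this log). `buildNameLine` is a hypothetical, trimmed-down stand-in for
the `name` line of `buildSkillMarkdown`.

```typescript
// Assumed behaviour: escape backslashes first, then double quotes,
// so the result is safe inside a double-quoted YAML scalar.
function escapeYamlString(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
}

// Always emit a quoted scalar so `123`, `true`, or `null` stay strings on re-parse.
function buildNameLine(name: string): string {
  return `name: "${escapeYamlString(name)}"`;
}
```

An unquoted `name: 123` parses as a number, while `name: "123"` parses
as the string the import flow later looks up.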

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): drop staged-skill chips when the matching @<id> token leaves the draft

mrcfps PR #955 review: `submit()` always forwarded every id in
`stagedSkills`, but that state was only mutated on picker click and
chip removal. Hand-deleting an `@<id>` token from the textarea left
the chip staged, so the request still carried `skillIds: [<id>]` and
the daemon composed a skill the prompt no longer referenced.

Sync the chips with the draft inside `handleChange()` by pruning
`stagedSkills` whenever the new value no longer contains the
`@<id>` token (using the same whitespace boundary as
`removeStagedSkill`'s strip regex). A code comment explains why this
prune does not run for `staged` file attachments — users frequently
add files via the upload button without leaving an `@<path>` token,
so a symmetric prune there would erase legitimate uploads.
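
The prune step can be sketched like this. The exact boundary used by
`removeStagedSkill` is not shown in this log, so the whitespace or
start/end-of-text boundary here is an assumption, and `StagedSkill` and
`pruneStagedSkills` are hypothetical names.

```typescript
interface StagedSkill { id: string }

// Escape regex metacharacters so a skill id is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Keep only the chips whose `@<id>` token still appears in the draft,
// bounded by whitespace or the start/end of the text.
function pruneStagedSkills(draft: string, staged: StagedSkill[]): StagedSkill[] {
  return staged.filter((skill) => {
    const token = new RegExp(`(^|\\s)@${escapeRegExp(skill.id)}(?=\\s|$)`);
    return token.test(draft);
  });
}
```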

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(daemon): stage @-composed skills' side files alongside the active skill

codex PR #955 review: composing a per-turn `@`-picked skill into the
system prompt appended its body (with the `withSkillRootPreamble`
guidance pointing at relative paths under `<cwd>/.od-skills/<folder>/`)
but never staged the actual folder. `startChatRun` only copied
`activeSkillDir`, so when the project's primary skill was different
(or absent) the composed skill's references/, examples/, and scripts/
files lived only at their absolute repo path — agents that honour
the cwd-relative form (or that don't get `--add-dir`, e.g. Codex with
allowlisted gpt-image projects) couldn't reach them.

Thread the composed skills' dirs out of `composeDaemonSystemPrompt`
as `extraSkillDirs` and stage each one through the same
`stageActiveSkill` API used for the primary skill. Dedupe by folder
basename so a project whose primary skill is also `@`-composed isn't
copied twice. Each preamble already advertises its own folder, so the
prompt and the staged tree stay aligned without further changes.
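
The basename dedupe can be sketched as below. `folderName` and
`dirsToStage` are hypothetical helpers written without `node:path` to
keep the example self-contained; the real code goes through
`stageActiveSkill`, whose signature is not shown in this log.

```typescript
// Last path segment, tolerating both "/" and "\" separators and trailing slashes.
function folderName(dir: string): string {
  const parts = dir.split(/[\\/]+/).filter((part) => part.length > 0);
  return parts.pop() ?? "";
}

// Primary skill first, then composed skills; each folder name is staged only once.
function dirsToStage(activeSkillDir: string | null, extraSkillDirs: string[]): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const dir of [activeSkillDir, ...extraSkillDirs]) {
    if (dir === null) continue;
    const name = folderName(dir);
    if (seen.has(name)) continue;
    seen.add(name);
    result.push(dir);
  }
  return result;
}
```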

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): respect the Library disable toggle in the project @-mention picker

codex PR #955 review: only `EntryView` received `enabledSkills`
(filtered against `config.disabledSkills`); active projects still
got `skills={skills}` raw, so a skill the user disabled in Settings
kept appearing in the project's `@`-mention popover and could ride
along to the daemon via `skillIds`. That broke the Library toggle
for any project opened on the post-split branch.

Compute a functional-skills-only enabled subset
(`enabledFunctionalSkills`) and pass it into `<ProjectView>` instead.
Templates stay separate — design-templates are filtered through their
own `enabledDesignTemplates` memo for the Templates gallery — so
ProjectView's chat composer still only sees skills, never templates,
matching the pre-split prop surface.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(e2e): mock /api/design-templates for example-use-prompt flow

The Templates tab in EntryView fetches from /api/design-templates after
the skills/design-templates split (specs/current/skills-and-design-templates.md).
The example-use-prompt Playwright scenario only mocked /api/skills, so the
gallery card never appeared and the test timed out waiting on
example-card-warm-utility-example. Serve the same fixture summary on both
endpoints so the templates gallery renders the card the test clicks.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(tools-pack): create design-templates fixture for resources test

The packaging resources copy now bundles the new design-templates tree
alongside skills (see resources.ts BUNDLED_RESOURCE_TREES). The
copyBundledResourceTrees fixture only created skills, design-systems,
craft, etc., so the recursive copy crashed with ENOENT on
design-templates before it could check the prompt-templates assertion.
Add the missing fixture directory so the test exercises the same set
of resource trees the packaged build does.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(skills): clone built-in side files into the shadow on first edit

mrcfps PR #955 review: editing a built-in skill wrote a USER_SKILLS_DIR
shadow folder that contained only a new SKILL.md. The next listSkills()
pass surfaced the shadow as the active dir, but every side-file resolver
(/api/skills/:id/files, /example, /assets/*, the system-prompt preamble,
and the per-turn cwd staging) reads through skill.dir. With nothing but
SKILL.md in the shadow, the bundled assets/, references/, scripts/, and
examples/ disappeared the moment the user hit save — a built-in like
last30days or live-artifact would break immediately after edit instead
of just having its body overridden.

Teach updateUserSkill() to take a `sourceDir` and clone every entry
except SKILL.md / dotfiles into the shadow on the very first edit. The
shadow stays self-contained, so all the resolvers keep working without
fallback bookkeeping. Subsequent edits detect the existing shadow and
skip the clone, so user tweaks under the side tree survive a re-save.

Wire `sourceDir: skill.dir` from server.ts's PUT /api/skills/:id handler
and add two regression tests:
- 'clones built-in side files into the shadow on the first edit' walks
  the file tree after save and asserts assets/template.html, references/
  notes.md, and scripts/helper.sh all round-trip from the built-in.
- 'preserves user-edited side files on subsequent edits' edits the
  staged assets/template.html, re-saves, and confirms the user content
  is still there.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(e2e): rename home tab from Examples to Templates

The Examples tab was renamed to Templates in EntryView (b5993385's
skills/design-templates split — entry.tabExamples became entry.tabTemplates
and the tab value moved from 'examples' to 'templates'), but
entry-chrome-flows still asserted the old label and testId. Update both.

* fix(skills+web): preserve template body in API mode and dir-based skill delete

Two follow-ups from PR #955 review:

1. ProjectView only received `enabledFunctionalSkills`, but
   `composedSystemPrompt()` still resolved `project.skillId` through that
   prop and `fetchSkill()`. Projects created from the new
   `/api/design-templates` surface keep a template id in `project.skillId`,
   so opening one in API mode dropped the template body from the system
   prompt and the upstream request ran without the project's primary
   template instructions. Now ProjectView takes a separate
   `designTemplates` prop (the unfiltered template list, so a
   later-disabled template still loads for projects already created from
   it) and `composedSystemPrompt()` plus the metadata / `isDeck` lookups
   fall back to that list, with `fetchDesignTemplate()` as the body-fetch
   fallback to `fetchSkill()`. The chat composer's `@`-picker keeps
   receiving only the enabled functional skills.

2. `DELETE /api/skills/:id` used `deleteUserSkill(USER_SKILLS_DIR, skill.id)`
   which re-slugified the frontmatter id and removed
   `<userSkillsDir>/<slug>/`. That matched the import shape but missed the
   install shape — `installFromTarget` writes the folder at
   `sanitizeRepoName(url)` (GitHub) or `path.basename(realpath)` (local
   symlink), neither of which is guaranteed to equal the slugified
   frontmatter `name`. A duplicate `app.delete('/api/skills/:id', ...)`
   handler at the install routes never fired because Express resolved the
   earlier registration first, leaving the install/uninstall path without
   working teardown. The handler now removes `skill.dir` (the absolute
   path listSkills already discovered) under a USER_SKILLS_DIR safety
   check, using `lstat` + `unlinkSync` so symlinked local installs unlink
   cleanly without recursing into the user's source tree. The dead
   duplicate handler is removed; `deleteUserSkill` is dropped from the
   server.ts import set (still exported and unit-tested in skills.ts).
   Regression coverage in `apps/daemon/tests/skills-delete-route.test.ts`
   pins both shapes plus the symlink-preserves-source case.

* test(daemon): point hyperframes system-prompt test at design-templates

The merge with main brought in a hyperframes system-prompt test that
reads `skills/hyperframes/SKILL.md`, but this branch's split moved
`hyperframes` into `design-templates/` (same migration as `live-artifact`
already handled above in this file). CI was failing with ENOENT on the
old path.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 17:48:34 +08:00
PerishFire
f2db5a749c
chore: enforce PR→issue linking discipline (#1263)
PRs that omit Fixes #N break the release-time reverse lookup
(issue → closing PR → merge sha → first containing tag), since the
auto-link only fires on the explicit closing keywords. We've been doing
this by hand on recent fixes; codify it so future PRs don't drift.

- Add .github/pull_request_template.md with a Fixes # placeholder so
  the link surface is in front of the author by default.
- Add a corresponding bullet to the Bug follow-up workflow in the root
  AGENTS.md so the discipline lives next to the methodology that
  produces issue-linked work.
2026-05-11 17:24:24 +08:00
PerishFire
a797e079b1
fix(desktop): exit fullscreen before hiding window on macOS close (#1249)
* fix(desktop): exit fullscreen before hiding window on macOS close (#1215)

When a preview is in Present → Fullscreen (演示 → 全屏) mode, the macOS
close handler called window.hide() directly, leaving the OS fullscreen
Space orphaned as a black screen: the window vanished but the Space
stayed up.

Extract hideWindowExitingFullscreen as the named invariant ("hide,
but first leave fullscreen so the OS Space tears down with the window")
and route the darwin close handler through it. The hide is deferred
until 'leave-full-screen' fires so we don't race the OS Space teardown.

Bootstraps Vitest on apps/desktop with a single test under
tests/main/hide-window-exiting-fullscreen.test.ts that exercises the
helper through a structural mock — the bug shape is pure logic, no real
Electron window required. Spec was red against a hide-only helper and
green after the leave-full-screen sequencing.

* docs(agents): codify bug follow-up workflow

Distill the spec-first / cheapest-layer / scope-discipline /
invariant-shaped-fix / baseline-diff playbook used recently on #135 and
#1215 into a top-level subsection of root AGENTS.md, framed as a default
action shape with explicit room for case-by-case judgment rather than a
hard rule. Includes a single pointer back to the worked example spec.

* docs(agents): require staged human verification for visible bugs

Add the human-verification gate as a sixth bullet in the Bug follow-up
workflow. UI / platform-native / animation symptoms can pass green specs
and still ship the visible regression — proven by #1215, where the
desktop unit test green-lighted the helper logic but only a side-by-side
buggy-vs-fix run on a real macOS Space proved the black-screen actually
went away.

Reinforces the production-API-only seed constraint while we're there:
source-level backdoors prove a fake flow, not the real one, so they
invalidate the verification.

* fix(desktop): defer hide across the fullscreen-enter transition (#1215)

mrcfps observed on PR #1249 that the close handler only catches windows
already in fullscreen — Electron's enter-full-screen event is async on
macOS, so isFullScreen() can still read false during the OS Space
transition triggered by requestFullscreen(). A close in that window
took the plain hide() path and stranded the same black Space the fix
was meant to eliminate.

Track in-flight fullscreen entry from webContents.enter-html-full-screen
(set) and BrowserWindow.{enter,leave}-full-screen (clear), and surface
it through WindowFullscreenSurface.isEnteringFullscreen. The helper now
parks on enter-full-screen until the OS confirms the Space, then runs
the existing exit-then-hide path.
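
The sequencing described above can be sketched against a structural
surface. The member list of `WindowFullscreenSurface` below is an
assumption modelled on Electron's `BrowserWindow` (`isFullScreen`,
`setFullScreen`, `hide`, `once`) plus the `isEnteringFullscreen` flag
named in this message; the actual helper may differ.

```typescript
// Hypothetical structural surface; only the members this sketch needs.
interface WindowFullscreenSurface {
  isFullScreen(): boolean;
  isEnteringFullscreen(): boolean;
  setFullScreen(flag: boolean): void;
  hide(): void;
  once(event: "enter-full-screen" | "leave-full-screen", listener: () => void): void;
}

function hideWindowExitingFullscreen(win: WindowFullscreenSurface): void {
  const exitThenHide = (): void => {
    // Hide only after the OS confirms the fullscreen Space is gone.
    win.once("leave-full-screen", () => win.hide());
    win.setFullScreen(false);
  };
  if (win.isFullScreen()) {
    exitThenHide();
  } else if (win.isEnteringFullscreen()) {
    // Entry is still in flight: wait for it to settle, then exit and hide.
    win.once("enter-full-screen", exitThenHide);
  } else {
    win.hide();
  }
}
```

Because the surface is structural, a test can pass a plain object that
records calls instead of a real Electron window.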

Adds a regression test ("waits out a fullscreen-enter transition before
exiting and hiding") that goes red against the previous helper.
2026-05-11 17:04:42 +08:00
Caprika
f7f2661bda
[codex] Handle empty API responses as no output (#1244)
* Handle empty API responses as no output

* Fix empty API response comment cleanup

* Stabilize API empty response detection
2026-05-11 16:57:02 +08:00
PerishFire
421ddf553c
fix(pack/win): close running app before silent reinstall (#1238) 2026-05-11 16:35:07 +08:00
nettee
e859c31574
fix(web): complete finished tool calls missing results (#1240) 2026-05-11 15:54:11 +08:00
Tom Huang
e254d1280b
feat(memory): auto-memory store with chat-protocol-aware extraction (#999)
* feat(memory): auto-memory store with chat-protocol-aware extraction

Markdown memory store at <dataDir>/memory/ with two extractors —
heuristic regex for explicit "remember:" / "我是 X" ("I am X") markers, and a
small-model LLM pass after each turn — folded into the system prompt
so cross-chat preferences, role, and ongoing-work context survive
restarts.

Settings UI:
- Memory tab lists entries, exposes a hand-edited MEMORY.md index, and
  shows an extraction history with per-attempt phase/skip/failure rows.
- Memory model picker is inline next to the chat model picker (CLI and
  BYOK) so the choice "which fast model mines facts each turn?" sits
  next to the chat-model decision instead of a separate panel. The
  picker reuses the same SUGGESTED_MODELS table and "Custom..." pattern
  the chat picker uses.

LLM extractor supports all four protocols (anthropic / openai / azure /
google); pickProvider takes the chat agent id from the chat handler
and constrains its auto-pick to the chat's protocol family — Claude
Code chats no longer surprise users by silently extracting on whatever
OpenAI key happens to be in media-config. When no matching key is
configured the attempt records as 'skipped: no-provider' instead of
quietly switching vendors.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(memory): keep hint outside <label> and disambiguate Model selectors

The inline Memory model picker wrapped its hint paragraph inside the
<label>, which made the hint's "API key" / "model" wording bleed into
the <select>'s accessible name and broke Playwright's getByLabel('API
key') / getByLabel('Model') strict-mode matching in the existing
settings-api-protocol e2e suite.

- Move the hint <p> out of the <label> in MemoryModelInline so the
  select's accessible name is just "Memory model".
- Switch the chat-Model selectors in settings-api-protocol.test.ts from
  getByLabel('Model') to getByRole('combobox', { name: 'Model', exact:
  true }) so they no longer collide with the new "Memory model" select
  that sits next to the chat Model picker.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(memory): address review changes — BYOK wiring, MEMORY.md index, /v1, label wrapper

Addresses the four blocking review threads on PR #999.

1. MemoryModelInline accessibility (mrcfps)
   The inline picker still wrapped its select + custom input + flash +
   hint inside a single <label>, which made the select's accessible
   name absorb every text descendant — including the "API key" / "model"
   hint copy. The previous fix moved only the hint outside; the
   reviewer asked for a non-label wrapper. Switch to <div className="field">
   and associate just the short title with the controls via
   `aria-labelledby` / `aria-label`. The select's accessible name is
   now exactly "Memory model" so `getByLabel` strict-mode locators
   on the surrounding chat form stop cross-matching the memory copy.

2. Respect the hand-edited MEMORY.md index (mrcfps + codex)
   `composeMemoryBody()` was reading every *.md file in the memory
   dir, ignoring the index. Removing a `- [Name](id.md)` line had no
   effect on future prompts. Parse the index's `INDEX_LINK_RE` bullets
   and filter `listMemoryEntries()` to the linked id set, so the
   editor's "delete this line to disable injection" promise actually
   holds.

3. Versioned OpenAI-compatible base URLs (codex)
   `callOpenAI` and `callAnthropic` hard-coded `/v1` onto
   `provider.baseUrl`, breaking custom endpoints whose saved URL
   already includes `/v1` (`/v1/v1/chat/completions`). Apply the same
   conditional `appendVersionedApiPath` helper the chat proxy and
   connection-test routes already use.

4. Wire memory into BYOK / API-mode chats (mrcfps + codex)
   The previous PR's daemon-only memory hook never fired for BYOK,
   leaving the Memory tab + model picker as a no-op for that mode.
   Add the missing surface and wire it through ProjectView:
   - contracts: extend `composeSystemPrompt` with `memoryBody`,
     mirroring the daemon's local composer; add
     `MemorySystemPromptResponse` and the `attemptedLLM` flag on
     `ExtractMemoryResponse`.
   - daemon: expose `GET /api/memory/system-prompt` (returns the
     composed body) and turn `POST /api/memory/extract` into a
     two-phase endpoint — heuristic-only when only userMessage is
     supplied (pre-turn), LLM-only when assistantMessage is also
     supplied (post-turn), so the extraction-history doesn't double
     up.
   - web: ProjectView's BYOK branch now fetches the memory body
     before composing the system prompt, runs the heuristic
     extractor before the run (so "remember:" markers in this turn
     reach this turn's prompt), accumulates assistant text during
     streaming, and queues the LLM extractor on `onDone` — fire-and-
     forget so it never blocks the chat round-trip.
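
Items 2 and 3 above can be illustrated with a short sketch. The regex
body of `INDEX_LINK_RE`, the `MemoryEntry` shape, `filterByIndex`, and
the signature of `appendVersionedApiPath` are all assumptions; only the
names `INDEX_LINK_RE` and `appendVersionedApiPath` and the
`- [Name](id.md)` bullet form come from the message.

```typescript
// Assumed pattern for index bullets of the form "- [Name](id.md)".
const INDEX_LINK_RE = /^\s*-\s*\[[^\]]*\]\(([^)\s]+)\.md\)/;

interface MemoryEntry { id: string; body: string }

// Keep only entries whose id is still linked from the hand-edited index.
function filterByIndex(indexText: string, entries: MemoryEntry[]): MemoryEntry[] {
  const linked = new Set<string>();
  for (const line of indexText.split(/\r?\n/)) {
    const match = INDEX_LINK_RE.exec(line);
    if (match && match[1] !== undefined) linked.add(match[1]);
  }
  return entries.filter((entry) => linked.has(entry.id));
}

// Append "/v1" only when the base URL does not already end in a version segment.
function appendVersionedApiPath(baseUrl: string, path: string): string {
  const trimmed = baseUrl.replace(/\/+$/, "");
  const hasVersion = /\/v\d+$/.test(trimmed);
  return `${trimmed}${hasVersion ? "" : "/v1"}${path}`;
}
```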

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(memory): re-sync BYOK memory override when chat config drifts

The inline memory-model picker captured `apiProtocol` / `chatApiKey` /
`chatBaseUrl` / `chatApiVersion` into the saved override only at the
moment the user clicked a model. If they later swapped the BYOK
protocol tab, rotated the API key, or edited the base URL in the same
settings flow, the daemon's background extractor kept calling the
*old* vendor / credential — directly contradicting the picker's
"borrows the surrounding chat picker's protocol, key, base URL, and
api-version automatically" promise.

Add a debounced effect that compares the persisted (masked) shape
against the live chat props and re-PATCHes /api/memory/config when
they drift. The masked config exposes `apiKeyTail` (last 4 chars), so
key rotation is detectable without ever round-tripping the secret
back to the browser. The 300 ms debounce coalesces the keystroke-
granularity prop updates the parent settings dialog streams during
its autosave loop, so a user editing the base URL doesn't trigger one
PATCH per character. Background re-syncs are silent — the "Saved!"
flash only fires for explicit user clicks, so the picker doesn't feel
like it's fighting them as they edit unrelated chat fields.
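
The drift comparison can be sketched as a pure function. The
`MaskedOverride` and `LiveChatConfig` shapes are hypothetical; only
`apiKeyTail` (last 4 characters) and the four chat props come from the
message. In the component this check would sit behind the 300 ms
debounce before the re-PATCH.

```typescript
// Hypothetical shape of the masked config the daemon returns.
interface MaskedOverride {
  provider: string;
  baseUrl: string;
  apiVersion: string;
  apiKeyTail: string; // last 4 characters of the stored key
}

// Hypothetical shape of the live chat props passed to the picker.
interface LiveChatConfig {
  apiProtocol: string;
  chatApiKey: string;
  chatBaseUrl: string;
  chatApiVersion: string;
}

// True when the saved override no longer matches the live chat settings.
// Key rotation is detected from the tail alone, so the secret never round-trips.
function hasDrifted(saved: MaskedOverride, live: LiveChatConfig): boolean {
  return (
    saved.provider !== live.apiProtocol ||
    saved.baseUrl !== live.chatBaseUrl ||
    saved.apiVersion !== live.chatApiVersion ||
    saved.apiKeyTail !== live.chatApiKey.slice(-4)
  );
}
```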

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(memory): thread BYOK chat config through /api/memory/extract default path

Leaving the BYOK memory picker on "Same as chat" still broke the
default LLM extraction path: `MemoryModelInline` clears the override
for that option, both `/api/memory/extract` calls in `ProjectView`
only sent the messages, and the daemon never persists BYOK creds, so
`extractWithLLM(..., { chatAgentId: null })` always reached
`pickProvider()` with no chat context and fell through to env /
media-config — the wrong vendor for a BYOK chat that works for
inference.

Thread the live BYOK chat config through the extract endpoint as a
per-call snapshot:

- contracts: extend `ExtractMemoryRequest` with an optional
  `chatProvider` (provider/apiKey/baseUrl/apiVersion/model) and add
  `'chat-byok'` to the credentialSource enum.
- daemon: parse + validate `chatProvider` on `/api/memory/extract`
  (provider must be one of the five known shapes) and forward to
  `extractWithLLM` as a new option. `pickProvider()` gets a new
  path 2 that uses the snapshot directly with the per-protocol
  fast-model default — so a memory pass on `gpt-4o` / `claude-sonnet-4-5`
  silently turns into a cheap `gpt-4o-mini` / `claude-haiku-4-5` call
  instead of paying chat-tier rates for sediment work. Override and
  CLI-agent-constrained paths still win when they apply.
- web: `ProjectView` snapshots `apiProtocol` / `apiKey` / `baseUrl` /
  `apiVersion` from the live `AppConfig` on each BYOK extract call
  (both pre-turn heuristic-only and post-turn LLM phases). The
  picker's existing drift-resync effect already covers explicit
  overrides; this snapshot covers the implicit "Same as chat"
  default that the override flow can't reach.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(memory): treat empty apiKey on PATCH as a real clear

MemoryModelInline silently re-PATCHes /api/memory/config whenever the
surrounding BYOK chat creds drift. The previous reuse branch lumped
`apiKey === ''` together with `apiKey === undefined`, so clearing the
chat API key from the picker quietly preserved the old daemon-side
secret and kept calling the provider on a stale credential.

Distinguish four states for the apiKey field:
- absent       -> preserve stored secret (form re-save without re-typing)
- ''           -> clear stored secret (user removed it from the picker)
- 'sk-...'     -> replace
- new provider -> ignore stored secret entirely

Add tests/memory-config-route.test.ts covering all four cases.
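
A minimal sketch of the four-state rule above, assuming hypothetical `StoredConfig` / `PatchBody` shapes and a hypothetical `resolveApiKey` helper; the real route handler and its types may look different.

```typescript
// Hypothetical shapes for illustration only; not the daemon's actual types.
interface StoredConfig {
  provider: string;
  apiKey?: string;
}

interface PatchBody {
  provider: string;
  apiKey?: string; // absent, '' and 'sk-...' are three distinct signals
}

// Returns the secret that should be stored after the PATCH is applied.
function resolveApiKey(
  stored: StoredConfig | undefined,
  patch: PatchBody,
): string | undefined {
  // A secret saved for another vendor must never be reused.
  const providerChanged =
    stored === undefined || stored.provider !== patch.provider;

  if (patch.apiKey === undefined) {
    // absent: keep the stored secret unless the provider changed
    return providerChanged ? undefined : stored?.apiKey;
  }
  if (patch.apiKey === "") {
    // empty string: the user cleared the key, so drop the stored secret
    return undefined;
  }
  // non-empty string: replace
  return patch.apiKey;
}
```

Keeping `undefined` and `''` as separate branches is the point of the fix: only the former means "re-save without re-typing".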

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 15:45:42 +08:00
Tom Huang
e11e86d468
feat(hyperframes): land HTML-in-Canvas across web + skills (#866)
* feat(hyperframes): land HTML-in-Canvas across web + skills

Ships HTML-in-Canvas as a first-class HyperFrames video path:
- 7 new video prompt templates (liquid glass, iPhone+MacBook, portal,
  shatter, magnetic, liquid background, text-cursor reveal).
- skills/hyperframes/references/html-in-canvas.md, surfaced via
  SKILL.md description+triggers and the system-prompt pre-flight
  references list.
- ChatPane starter prompts now branch by project kind and video model,
  so the hyperframes-html surface shows HTML-in-canvas-shaped prompts
  instead of the generic prototype trio.
- NewProjectPanel propagates a picked template's model+aspect onto
  the project, and defaults videoModel to hyperframes-html when the
  hyperframes skill resolves for the video tab.

Polish bundled in the same branch:
- DesignFilesPanel empty state becomes a centered pill with a "New
  sketch" CTA; designFiles.empty copy simplified across 19 locales.
- Topbar project title + meta render on one baseline row separated
  by a middot.
- scripts/seed-test-projects.ts hardens daemon URL discovery against
  pnpm engine warnings on stdout.

* fix(new-project): preserve explicit video model choice across tab revisits

Latch a videoModelTouched guard once the user picks a model via the
dropdown or via a template that declares one, so the hyperframes-html
auto-default no longer silently overwrites the override when the Video
tab is re-entered.

Generated-By: looper 0.6.1 (runner=fixer, agent=claude-code)

* fix(i18n): register hyperframes html-in-canvas templates, category, and tags

Adds the seven new prompt-template ids, the "VFX / HTML-in-Canvas"
category, and the new tag set to the de/ru/fr i18n bundles so the
e2e localized-content coverage test passes.

Generated-By: looper 0.6.1 (runner=fixer, agent=claude-code)

* fix(daemon): inject html-in-canvas preflight for hyperframes runs

The contracts-side derivePreflight() learned about
references/html-in-canvas.md when this PR landed, but the daemon
copy at apps/daemon/src/prompts/system.ts kept the older five-ref
allowlist. server.ts:4138 wires composeSystemPrompt from the
daemon copy into live chat runs, so the main HyperFrames flow this
PR is meant to improve still wasn't auto-injecting the preflight
directive in production.

Mirror the html-in-canvas case into the daemon composer and lock it
behind a daemon-side test so the two copies cannot drift again on
this reference. The broader live-artifact preflight gap (artifact-
schema / connector-policy / refresh-contract) is pre-existing drift
and is intentionally out of scope here.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): restyle designs empty state as centered card on grid backdrop

Swap the horizontal pill for a stacked card and add a faint grid backdrop
so the empty designs surface reads as an intentional canvas rather than a
gap. Title now wraps instead of truncating; container is taller.

* fix(new-project): pin skillId to hyperframes when videoModel is hyperframes-html

When the Video tab resolves its skill it used to fall back to `list[0]?.id`
if no skill declared `default_for: video`. That list is built from an
unsorted `readdir()` in apps/daemon/src/skills.ts, so a freshly mounted
project could land on `video-shortform` even when the user had explicitly
chosen the HyperFrames-HTML model (or one of the new
`hyperframes-html-in-canvas-*` templates). The agent then ran without the
hyperframes SKILL body or its `references/html-in-canvas.md` preflight —
the exact regression PR #866 was meant to land.

`skillIdForTab` now pins to `hyperframes` whenever the current video model
is `hyperframes-html`, regardless of discovery order. Added a unit test
that mounts both `video-shortform` and `hyperframes` (with hyperframes
last, simulating the bad readdir order) and asserts the create payload
routes through `hyperframes`.
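
The pinning rule can be sketched roughly as follows. `SkillSummary`, `defaultFor`, and `skillIdForVideoTab` are illustrative names, and the existence check on the `hyperframes` skill is an assumption rather than something the commit states.

```typescript
// Hypothetical skill summary; the real skill metadata has more fields.
interface SkillSummary {
  id: string;
  defaultFor?: string; // stands in for `default_for` in the skill metadata
}

function skillIdForVideoTab(
  skills: SkillSummary[],
  videoModel: string | undefined,
): string | undefined {
  // Pin to hyperframes for the HyperFrames-HTML model, whatever order
  // readdir() happened to return the skills in.
  // Assumption: only pin when the skill is actually mounted.
  if (
    videoModel === "hyperframes-html" &&
    skills.some((skill) => skill.id === "hyperframes")
  ) {
    return "hyperframes";
  }
  // Otherwise prefer a skill that declares itself the video default,
  // and only then fall back to discovery order.
  const declared = skills.find((skill) => skill.defaultFor === "video");
  return declared?.id ?? skills[0]?.id;
}
```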

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 15:45:12 +08:00
PerishFire
31e57fd773
fix(daemon): persist runStatus/endedAt on chat run termination (#1230)
* fix(daemon): persist runStatus/endedAt on chat run termination (#135)

POST /api/runs created the run but never reconciled the messages row
on terminal status. If the web failed to persist the cancel (refresh,
dropped PUT), the row stayed at run_status='running' / ended_at=NULL,
and on reload the elapsed timer kept climbing because the renderer
fell back to now - startedAt.

Mirror routine/orbit reconciliation: attach a wait-completion handler
that updates run_status and ended_at, guarded by COALESCE and a
run_status IN ('queued','running') filter so concurrent web persists
are not clobbered.
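
An in-memory model of that guard, as a sketch only: the real change is a SQL update, and `MessageRow` / `reconcileOnRunEnd` here are hypothetical names used to show how the two conditions interact.

```typescript
type RunStatus = "queued" | "running" | "succeeded" | "failed" | "canceled";

// Hypothetical in-memory stand-in for the messages row.
interface MessageRow {
  runStatus: RunStatus;
  endedAt: number | null;
}

function reconcileOnRunEnd(
  row: MessageRow,
  terminal: RunStatus,
  now: number,
): MessageRow {
  // Mirrors the `run_status IN ('queued','running')` filter: a row the web
  // client already persisted as terminal is left untouched.
  if (row.runStatus !== "queued" && row.runStatus !== "running") {
    return row;
  }
  // Mirrors the COALESCE guard: keep an existing ended_at, else stamp `now`.
  return { runStatus: terminal, endedAt: row.endedAt ?? now };
}
```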

Adds cancelRun helper and two regression specs under e2e/tests/dialog/.

* fix(daemon): annotate reconcile callback params for chat-routes

The chat run reconciliation block landed in chat-routes.ts after the
recent server-route split (#1043), where stricter type checking surfaces
implicit `any` parameters. Annotate the wait/then callback as
`{ status: string }` and the catch callback as `unknown`.

* refactor(daemon): extract reconcileAssistantMessageOnRunEnd helper

The inline if/wait/then/catch block in POST /api/runs read as a bolt-on
patch. Lift it to a named file-scope helper so the route handler stays
intent-level (start the run, arrange follow-up reconciliation) and the
guard for missing assistantMessageId is an internal detail.

The helper's docblock describes the invariant ("messages row reflects
the run's terminal state even without web persist"); commit history
keeps the issue context.

* test(e2e): wait for any terminal status in stop-reconcile spec

The earlier .catch fallback chained two waitForRunStatus calls (canceled
then succeeded). waitForRunStatus throws on the first non-expected
terminal, so a canceled run that resolves to failed (e.g. agent exits
non-zero on SIGTERM) would still abort the test before reaching the
messages-row assertion.

Add waitForRunTerminal to e2e/lib/vitest/runs.ts: polls until any
terminal status without throwing on mismatch, since this spec's claim
is about the resulting messages row, not which terminal the run took.
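
A polling helper of this kind might look like the sketch below. The signature, the `getStatus` callback, and the set of terminal statuses are assumptions for illustration, not the helper that lives in e2e/lib/vitest/runs.ts.

```typescript
// Assumed terminal statuses, taken from the ones named in these commits.
const TERMINAL_STATUSES = new Set(["succeeded", "failed", "canceled"]);

async function waitForAnyTerminal(
  getStatus: () => Promise<string>, // hypothetical status fetcher
  options: { intervalMs?: number; timeoutMs?: number } = {},
): Promise<string> {
  const { intervalMs = 100, timeoutMs = 10_000 } = options;
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await getStatus();
    // Any terminal status resolves; the caller decides what it means.
    if (TERMINAL_STATUSES.has(status)) return status;
    if (Date.now() >= deadline) {
      throw new Error(`run still ${status} after ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The only failure mode is the timeout, which keeps the spec's assertion focused on the messages row rather than on which terminal status the run reached.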

Addresses Codex inline review on PR #1230.
2026-05-11 15:37:52 +08:00
nettee
ab922327f4
refactor(daemon): split agent runtime definitions (#1063) 2026-05-11 15:01:55 +08:00
nettee
b1d440d2bd
refactor(daemon): split route registration (#1043)
* spec

* refactor(daemon): split server route registrars

* refactor(daemon): group route registrar dependencies

* refactor(daemon): move remaining domain routes out of server

* update doc

* revert spec

* fix daemon route context contract

Generated-By: looper 0.5.6 (runner=fixer, agent=opencode)

* fix media task persistence

Generated-By: looper 0.5.6 (runner=fixer, agent=opencode)

* fix: restore daemon route registrations

* fix: restore static resource mutation origin checks
2026-05-11 15:00:23 +08:00
PerishFire
976edaf38e
test: harden e2e smoke and release reports (#1140)
* test: harden e2e inspect specs

* test: wire e2e release reports

* chore: bump packaged beta base to 0.6.1

* test: run release smoke vitest directly

* test: add suite-owned tools-dev lifecycle

* ci: harden stable release packaging

* fix(release,e2e): gate stable signing on verify and harden suite cleanup

- restore `needs: [metadata, verify]` on the stable release `build_mac`,
  `build_mac_intel`, `build_win`, and `build_linux` jobs so Apple
  signing/notarization and Windows release builds cannot run before
  pnpm guard, typecheck, and layout checks complete on the metadata commit.
- in `runToolsDevSuite`, drop the `started` flag and always attempt
  `stopToolsDevWeb` in `finally`; record stop errors in diagnostics, and
  when the test body succeeded, escalate the stop failure to the suite
  result and rethrow — so orphan daemon/web processes from an interrupted
  `startToolsDevWeb` or a broken shutdown can no longer pass silently.
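
The cleanup policy in the second bullet can be sketched like this, assuming a hypothetical `runSuiteWithCleanup` wrapper with injected `start` / `body` / `stop` callbacks; the actual `runToolsDevSuite` signature is not shown in the commit.

```typescript
async function runSuiteWithCleanup(
  start: () => Promise<void>,
  body: () => Promise<void>,
  stop: () => Promise<void>,
  diagnostics: string[],
): Promise<void> {
  let bodySucceeded = false;
  try {
    await start();
    await body();
    bodySucceeded = true;
  } finally {
    try {
      // Always attempted, even when start() threw part-way through.
      await stop();
    } catch (stopError) {
      diagnostics.push(`stop failed: ${String(stopError)}`);
      // Escalate only when there is no earlier failure to mask.
      if (bodySucceeded) throw stopError;
    }
  }
}
```

When the body already failed, the stop error is recorded but the original error is the one that propagates.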

Addresses PR #1140 review feedback from lefarcen and mrcfps.
2026-05-11 13:11:16 +08:00
Sid
1dc0224599
fix(desktop): enforce minimum window size on main client (#1189) (#1203)
The main BrowserWindow was created with only `width: 1280, height: 900`
and no `minWidth` / `minHeight`, so Electron honored arbitrary user
drags. Below roughly 900×600 the project page's left/right split (chat
composer + designs panel + preview pane) overlaps and the top
navigation clips, which is the broken first impression reported in
#1189.

Pin `minWidth: 900, minHeight: 600` on the main window — preserves the
usable layout floor while still fitting common 13" small-screen
laptops. The ephemeral print sub-window (`show: false`, closed on
print completion) is unchanged: it isn't user-resizable so a min-size
floor has no observable effect there.
2026-05-11 12:33:47 +08:00
shangxinyu1
b19aa6c907
Improve Codex CLI path fallback UX (#1205)
* Improve Codex CLI path fallback UX (#1193)

* Handle ENOENT Codex shim fallback
2026-05-11 12:00:47 +08:00
Botshelo Brandon Tidimalo
979733d39b
feat(web): add Cmd+, shortcut to open settings with platform shortcut badge (#1173)
Register a capture-phase Cmd+, (mac) / Ctrl+, (win/linux) listener in App.tsx that opens Settings, and show a shortcut badge on the Settings menu item in both AvatarMenu and EntryView. Extract the duplicated isMac platform check into a shared isMacPlatform() utility in utils/platform.ts, replacing inline copies in FileWorkspace and ProjectView as well.
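
A rough sketch of the key match and badge text, assuming hypothetical helper names (`isSettingsShortcut`, `settingsShortcutLabel`) and a structural `KeyLike` type so it runs without a DOM. In the app the predicate would presumably be called from a `keydown` listener registered with `{ capture: true }`.

```typescript
// Structural subset of KeyboardEvent, so the sketch does not need a DOM.
interface KeyLike {
  key: string;
  metaKey: boolean;
  ctrlKey: boolean;
  altKey: boolean;
  shiftKey: boolean;
}

function isSettingsShortcut(event: KeyLike, isMac: boolean): boolean {
  if (event.key !== "," || event.altKey || event.shiftKey) return false;
  // Cmd on macOS, Ctrl elsewhere; reject the other platform's modifier.
  return isMac
    ? event.metaKey && !event.ctrlKey
    : event.ctrlKey && !event.metaKey;
}

// Assumed badge text; the real labels may be formatted differently.
function settingsShortcutLabel(isMac: boolean): string {
  return isMac ? "⌘," : "Ctrl+,";
}
```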
2026-05-11 11:43:57 +08:00
Nicholas-Xiong
2838a28585
fix: set writable OD_DATA_DIR default for nix run (#1159)
Fixes #1157

When running via 'nix run github:nexu-io/open-design', the daemon
attempted to create runtime state under the Nix store package path:

  /nix/store/.../lib/open-design/.od/projects

The Nix store is read-only at runtime, causing startup to fail with
ENOENT when mkdir() tried to create the projects directory.

This commit updates the nix run wrapper to export OD_DATA_DIR with
a writable default ($HOME/.od) when the variable is unset. Users
can still override it by setting OD_DATA_DIR before running.

The Home Manager and NixOS modules already set OD_DATA_DIR, so they
are unaffected by this change.
2026-05-11 10:52:53 +08:00
github-actions[bot]
d3b1804523
docs(readme): refresh contributors wall (#1188)
Co-authored-by: mrcfps <23410977+mrcfps@users.noreply.github.com>
2026-05-11 10:50:30 +08:00
github-actions[bot]
12708fd379
Update docs/assets/github-metrics.svg - [Skip GitHub Action] (#1183)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-05-11 10:50:16 +08:00
shangxinyu1
d45bf3fb9a
test: expand entry and settings automation coverage (#954)
* test: harden new project panel metadata coverage

* test: expand entry e2e coverage

* test: drop e2e docs from the guarded package

* test: cover examples gallery interactions

* test: cover examples preview modal actions

* test: cover examples preview escape fullscreen

* test: cover examples template prompt filtering

* test: cover updated settings and entry tabs

* test: fix entry/settings coverage type drift

* test: fix example preview fetch assertion

* test: fix new project panel skill fixture
2026-05-11 10:49:42 +08:00
Nagendhra Madishetti
32fa0c23bb
feat(daemon): Critique Theater Phase 6.2 (artifact extraction + endpoint) (#1085)
The orchestrator was leaving artifactPath = null on every shipped run because
the SHIP <ARTIFACT> body never made it past the parser. Reviewers caught this
on PR #1006: a rerun-style endpoint built on top of that null could not return
a usable prior-art reference, and tests that synthesized artifactPath via
insertCritiqueRun were hiding the gap rather than covering the feature.

This PR closes that gap. The parser now hands the orchestrator a
ShipArtifactPayload (round, mime, body) through a side-channel callback, and
the orchestrator writes the bytes to <artifactsDir>/<projectId>/<runId>/
artifact.<ext> via a new artifact-writer module. The row's artifactPath is
the absolute on-disk path. The web layer never sees that path: it fetches
the bytes through GET /api/projects/:projectId/critique/:runId/artifact,
which the new artifact-handler module serves with a mime-derived
Content-Type, X-Content-Type-Options: nosniff, a CSP header for HTML and
SVG, and the same cross-project leak guard pattern the interrupt handler
uses.

The body and mime intentionally never travel on the SSE wire. The SHIP
PanelEvent (which doubles as the SSE payload shape) keeps its lightweight
artifactRef, and the orchestrator strips body/mime before bus.emit, so a
multi-megabyte artifact does not broadcast to every subscriber. The new
orchestrator test asserts this explicitly.

Defense in depth in the writer + handler:

  - mime allowlist with text/html, text/css, text/markdown, text/plain,
    application/json, image/svg+xml; everything else falls through to
    application/octet-stream + .bin so unknown payloads can't be
    misinterpreted as a known type;
  - UTF-8 byte-length cap, configurable via cfg.parserMaxBlockBytes, so
    multi-byte payloads can't sneak past a JS .length check;
  - atomic write through a sibling tmp file + rename so a daemon crash
    mid-write can't leave a half-written artifact under the canonical
    name;
  - path-traversal guard on the GET endpoint that resolves the row's
    artifactPath against the artifacts root and refuses anything that
    escapes it, refuses non-regular files (symlinks, dirs), and refuses
    files larger than the response cap.
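
Three of the guards above (mime allowlist, UTF-8 byte cap, path-traversal check) can be sketched as pure functions. The file extensions, function names, and containment logic here are assumptions for illustration; the artifact-writer and artifact-handler modules may implement them differently.

```typescript
import * as path from "node:path";

// Allowlisted mimes from the commit message; the extensions are assumed.
const MIME_EXTENSIONS = new Map<string, string>([
  ["text/html", ".html"],
  ["text/css", ".css"],
  ["text/markdown", ".md"],
  ["text/plain", ".txt"],
  ["application/json", ".json"],
  ["image/svg+xml", ".svg"],
]);

// Unknown mimes fall through to a generic binary type and extension.
function normalizeMime(mime: string): { mime: string; ext: string } {
  const key = mime.trim().toLowerCase();
  const ext = MIME_EXTENSIONS.get(key);
  return ext !== undefined
    ? { mime: key, ext }
    : { mime: "application/octet-stream", ext: ".bin" };
}

// Counts encoded bytes, not UTF-16 code units, so multi-byte text is
// measured the way it will land on disk.
function utf8ByteLength(body: string): number {
  return new TextEncoder().encode(body).length;
}

// True only when `candidate` resolves to a path strictly below `root`.
function isInsideRoot(root: string, candidate: string): boolean {
  const relative = path.relative(path.resolve(root), path.resolve(candidate));
  if (relative === "" || path.isAbsolute(relative)) return false;
  return relative !== ".." && !relative.startsWith(".." + path.sep);
}
```

The symlink and regular-file checks mentioned above need filesystem calls and are left out of this sketch.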

Folded in two non-blocking notes lefarcen left on PR #1016 (the contracts
move) since persistence.ts was already in scope here:

  - P2: introduced CritiquePersistedStatus = CritiqueRunStatus | 'running'
    in the contracts package. CritiqueRunRow.status and CritiqueRunInsert.
    status now use it, and the inline `as CritiqueRunStatus | 'running'`
    widen in interrupt-handler.ts is gone. Public DTOs continue to use the
    terminal-only CritiqueRunStatus so a future endpoint can't leak a
    'running' row through the wire.
  - P3: added AssertExhaustiveValues + a compile-time assertion that
    CRITIQUE_RUN_STATUSES covers every CritiqueRunStatus variant.
    Adding a value to ShipStatus or CritiqueRunStatus without updating
    the array now fails the build with a tuple naming the missing
    variants instead of silently dropping out of UI filters.
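
A compile-time exhaustiveness check along these lines could be written as below. The status variants are placeholders, and this `AssertExhaustiveValues` is a sketch of the idea rather than the type in the contracts package.

```typescript
// Placeholder variants; the real CritiqueRunStatus members may differ.
type CritiqueRunStatus = "shipped" | "failed" | "interrupted";

const CRITIQUE_RUN_STATUSES = ["shipped", "failed", "interrupted"] as const;

// Resolves to `true` when every union member appears in the tuple,
// otherwise to a tuple type that names the missing variants.
type AssertExhaustiveValues<Union, Values extends readonly Union[]> =
  [Exclude<Union, Values[number]>] extends [never]
    ? true
    : ["missing variants", Exclude<Union, Values[number]>];

// Assigning `true` fails to compile as soon as a variant is missing,
// and the error message includes the tuple with the missing names.
const statusesAreExhaustive: AssertExhaustiveValues<
  CritiqueRunStatus,
  typeof CRITIQUE_RUN_STATUSES
> = true;
```

Wrapping the `Exclude` result in a one-element tuple stops the conditional type from distributing over the union, so the check yields a single verdict.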

Coverage: 174 critique tests across 14 files pass locally, including the
new critique-artifact-writer (13 cases) and critique-artifact-endpoint
(11 cases) suites, the inverted critique-lifecycle artifact-persistence
test, and the orchestrator happy-path that asserts the SSE ship payload
does NOT carry body or mime.

Validated: pnpm guard, pnpm --filter @open-design/contracts build,
pnpm --filter @open-design/daemon build (full tsc), pnpm --filter
@open-design/web typecheck, pnpm --filter @open-design/daemon exec
vitest run tests/critique (all green).

This is step (b) of the four-step plan that PR #1006's closing comment
laid out. Step (a) was the contracts move in PR #1016. Steps (c)
(persist original_message_id / agent_id / model_id) and (d) (real
rerun endpoint on top of (a)+(b)+(c)) follow.

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-10 23:59:04 +08:00
Matt Van Horn
976a5900f8
fix: clear stale upload failure banner when previewing files (#797)
* fix: clear stale upload failure banner when previewing existing files

Closes #786

- Clear uploadError in openFile() so navigating to a file dismisses the banner
- Scope banner visibility to the Design Files tab so stale errors do not bleed
  into preview surfaces
- Add test pinning that no banner is rendered when there is no upload error

* fix(workspace): move upload banner into DesignFilesPanel + interactive test

Per @mrcfps + @lefarcen review on PR #797:

- Move the upload-error banner from FileWorkspace into DesignFilesPanel
  body. Hide it whenever the in-panel preview is active (the missed
  flow that mrcfps and lefarcen flagged: single-click preview kept
  activeTab on DESIGN_FILES_TAB, so the old guard left the banner
  mounted above the preview).
- Keep a fallback banner in FileWorkspace that fires only when
  activeTab is not Design Files. This preserves the partial-upload
  visibility flagged by chatgpt-codex-connector: a partial upload
  opens the last successful file (flipping activeTab to a viewer)
  and the failure note still surfaces.
- Wrap uploadProjectFiles in try/catch so thrown errors surface a
  banner instead of disappearing.
- Replace the brittle viewer-empty assertion with two interactive
  vitest cases: (1) mock-fail upload, banner visible, preview file,
  banner hidden, close preview, banner back, dismiss, banner gone;
  (2) partial-upload uploaded+failed, banner appears on the viewer
  surface with the existing 'Uploaded N file(s), but M failed' text.
- Add df-upload-banner class and stable test ids upload-error-banner
  and upload-error-dismiss so future tests don't rely on the
  generic viewer-empty class.

Closes #786 staleness; addresses follow-up review.

---------

Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-authored-by: mrcfps <mrc@powerformer.com>
2026-05-10 23:56:24 +08:00
Yuhao Chen
35e7b622b7
fix(web): allow pod-to-chat comment text to wrap instead of truncating (#793) (#1156) 2026-05-10 23:27:06 +08:00
Zihuailin
06e677cb72
Fix pending prompt clearing for templates (#1148) 2026-05-10 21:52:49 +08:00
code-Y
84f768d4a2
feat: add WeChat design system, login-flow skill, and fix API mode tool_calls bug (#1083)
* feat: add WeChat design system, login-flow skill, and fix API mode tool_calls bug

- Add WeChat design system (design-systems/wechat/) with full brand spec
  including color palette, typography, and component rules for chat UI
- Add login-flow skill (skills/login-flow/) for mobile authentication flows
  with P0 checklist, example HTML, and i18n registration across 3 locales
- Fix DeepSeek V4 bug: API/BYOK mode (streamFormat=plain) models now receive
  a directive to emit only <artifact> HTML blocks and suppress tool_calls,
  since plain adapters proxy to external providers that cannot execute tools

* fix: restore full server.ts and WeChat DESIGN.md from ad46d8cd commit

Restore files that were corrupted in PR #1083 head branch.
The WeChat DESIGN.md was reduced to a single line (filename only)
and server.ts was reduced to ~1 line. Both are restored to their
original ad46d8cd state with full content.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: restore full server.ts and WeChat DESIGN.md from ad46d8cd

Restore files corrupted in PR #1083:
- apps/daemon/src/server.ts: restored 7106-line file
- design-systems/wechat/DESIGN.md: restored 301-line WeChat design spec
- skills/login-flow/SKILL.md: restored from local working state
- skills/login-flow/example.html: restored 351-line example HTML

* fix: only suppress tool_calls when streamFormat='plain' explicitly, remove nonexistent assets/template.html

1. streamFormat check now requires explicit 'plain' value instead of defaulting
   to 'plain' when undefined. This prevents normal tool-using chat runs from
   incorrectly inheriting the API/BYOK tool_calls suppression rule.

2. login-flow SKILL.md: removed reference to assets/template.html since that file
   does not exist in the skill bundle and derivePreflight() would inject a hard
   instruction to read it before any other tool, causing pre-flight to fail.

* fix: thread streamFormat to composeSystemPrompt in server.ts call

Previously the composeSystemPrompt call at line ~4940 omitted streamFormat,
causing the composer to default to 'plain' and suppress tool_calls even
for tool-using chat runs. Now streamFormat is passed through from the
adapter definition so the API mode rule only fires when streamFormat='plain'
is explicitly set.

* fix: WeChat category metadata, font-family, and login-flow example interactivity

WeChat DESIGN.md:
- Add Category: Social & Messaging metadata so it appears correctly in picker
- Fix font-family declaration: remove invalid -webkit-font-family prefix,
  use standard font-family so downstream CSS generation works correctly

skills/login-flow/example.html:
- Add password toggle click handler so show/hide actually works
- Change Apple icon fill from hardcoded #fff to currentColor so it is
  visible on light backgrounds

* fix: mirror streamFormat suppression in contracts composer and add WeChat i18n

1. packages/contracts/src/prompts/system.ts: Add a streamFormat parameter to
   the ComposeInput interface, mirroring the same suppression
   rule from daemon prompts/system.ts. When streamFormat='plain' is passed,
   a directive is appended telling models not to emit tool_calls and to only
   output <artifact> HTML blocks.

2. apps/web/src/i18n/content.{ts,fr,ru}.ts: Add WeChat design system entries:
   - Add 'wechat' to DE/FR/RU_DESIGN_SYSTEM_IDS_WITH_EN_FALLBACK arrays
   - Add 'wechat' summary to DE/FR/RU_DESIGN_SYSTEM_SUMMARIES
   - Add 'Social & Messaging' category to DE/FR/RU_DESIGN_SYSTEM_CATEGORIES
     (matching the Category: Social & Messaging metadata in WeChat DESIGN.md)

* fix: thread streamFormat='plain' into web composeSystemPrompt for api mode

* test: focus localized content coverage on missing resources

---------

Co-authored-by: Open Design Contributor <z@open-design.dev>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: mrcfps <mrc@powerformer.com>
2026-05-10 20:38:33 +08:00
Dongsen
bfedbeca0f
fix(prompts): add "When NOT to emit" guardrail to artifact handoff (#1143) (#1145)
* fix(prompts): add "When NOT to emit <artifact>" clauses (#1143)

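Field report in #1143: when a turn only uses the Edit tool to modify an
existing HTML file and writes no new canonical HTML, the AI still closes
out by following the system prompt's "non-negotiable output rule" to the
letter and stuffs a one-sentence Chinese summary into an
`<artifact type="text/html">` block. The downstream persistence path then
writes that prose to disk as if it were valid HTML, polluting the project
file panel (see the screenshot in the #50 comments).

The root cause is that the prompt has no exemption clause. Changes in this
commit:

- Reword "Artifact handoff (non-negotiable output rule)" conditionally as
  "Artifact handoff" + "When you ship a fresh deliverable…"
- Workflow step 5 ("Finish") gains a branch: in-place edits do not emit an
  artifact
- Add a "When NOT to emit `<artifact>`" subsection that spells out three
  exemption conditions:
  - in-place edits only: the turn produced no new canonical HTML → just
    say which file changed and what changed
  - the body must be a complete `<!doctype html>` document → summaries,
    paths, bash output, and explanations go in a normal reply, not inside
    the tag
  - when in doubt, don't emit → re-emitting an unchanged artifact adds
    nothing, and an empty-shell artifact misleads the user and pollutes
    the panel

Tests: apps/daemon/tests/prompts/system.test.ts adds a describe block
"artifact handoff no-emit clauses (#1143)" with 4 cases asserting that the
composed prompt contains the required phrases.

The persistence-layer backstop for #50 (a pre-write HTML gate) is tracked
separately in #1144 and complements this PR: this PR keeps the AI from
emitting empty-shell artifacts, and #1144 adds a second barrier before the
write, so the project panel stays clean even if the prompt fails to hold.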

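* fix(prompts): make the dominant discovery layer honor the artifact no-emit exception too (option C)

review (lefarcen P1, mrcfps blocker): the "When NOT to emit <artifact>"
exception added to the base layer was overridden by the unconditional emit
directive in the higher-priority DISCOVERY_AND_PHILOSOPHY layer, so the
main #1143 path could still produce empty-shell artifacts.

Patched per option C from the review:
- discovery.ts: add an "Artifact emission is conditional" dominant-layer
  invariant paragraph before RULE 3 (conditional emit is declared once in
  the dominant layer; the base layer keeps the detailed rules)
- discovery.ts:17 arc comment / :143 plan template step 9 / :262 default arc
  recap all become conditional (emit only when this turn wrote new
  canonical HTML)
- deck-framework.ts:327 deck workflow step 7 becomes conditional as well

Tests add 2 assertions:
- Negative: the composed prompt no longer contains an unqualified
  "Emit single <artifact>" / "emit a single <artifact>." line
- Positive: the discovery layer contains wording equivalent to "only when
  this turn wrote a new canonical HTML" and "only edited an existing HTML
  file"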

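* test(prompts): add a deck-mode negative assertion covering deck-framework.ts:327

review (lefarcen P2): the previous round's negative assertion called
composeSystemPrompt({}), which never triggers concatenation of
DECK_FRAMEWORK_DIRECTIVE (it is appended only when skillMode === 'deck' or
metadata.kind === 'deck'). If deck-framework.ts:327 later regresses to
"Emit single <artifact>", the no-argument negative assertion would still
pass falsely.

Add an explicit deck-mode assertion:
- Negative: the deck-mode prompt does not contain an unqualified
  "7. Emit single <artifact>" line
- Positive: it contains the updated "Emit single <artifact> if a new
  canonical deck HTML"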
2026-05-10 20:28:22 +08:00
Dongsen
7c1db80893
fix(web): intercept prose-as-HTML artifacts before writing to disk (#50) (#1144)
* fix(web): reject prose-as-HTML artifacts at write time (#50)

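The AI occasionally emits an `<artifact type="text/html">` block even on
turns that only perform in-place edits (no new canonical HTML), because it
follows the system prompt's non-negotiable closing rule, yet the block
holds nothing but a one-sentence Chinese summary. `persistArtifact`
previously did no content validation, so such prose was written to
`.od/projects/<id>/<id>.html` as valid HTML with a `kind: html` manifest,
polluting the project file panel (see the screenshot in the #50 comments).

Add a pure function `validateHtmlArtifact`: it requires non-empty content,
length ≥ 64, and a `<!doctype html>` or `<html>` tag (case-insensitive,
BOM-tolerant). `persistArtifact` calls the gate in the `ext === '.html'`
branch; on failure it reports the error through `setError` and writes no
file.

Scope is limited to the `<artifact>`-tag persistence path. Draft HTML that
users save manually in FileViewer/FileWorkspace goes through a different
entry point and is unaffected.

The prompt-layer root cause (the missing exemption clause) has been split
out into #1143 and is tracked separately; this PR is the persistence-layer
backstop.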

* fix(web): anchor HTML structural check at content start (#1144 review)

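mrcfps pointed out a false negative in the original implementation during
the #1144 review: HTML_OPENING_TAG_RE / DOCTYPE_RE used .test() to search
the whole string, so an AI description that quotes a tag name inline
("Updated the <html lang> attribute...") slips through (it is longer than
64 characters and contains `<html `) and still lands as a ghost HTML file.

Fix: merge the two regexes into STARTS_WITH_DOCUMENT_RE with a ^ anchor,
requiring the first non-whitespace token of the trimmed content to be
`<!doctype html>` or `<html`. Tag names that appear mid-string no longer
count.

Also, following lefarcen's non-blocking documentation suggestion, rewrite
the docblock to be more precise:
- "structural sniff" replaces "validation", making clear this is not an
  HTML validator
- list three scope boundaries: not-a-linter / .jsx-tsx-skipped / the manual
  user save path is unaffected
- the 64-character threshold rejects a 49-character minimal empty doc (e.g.
  `<!doctype html><html><body></body></html>`); state explicitly that this
  is an intentional trade-off: AI output is expected to be a non-trivial
  deliverable

Add 3 test cases covering the false negative mrcfps described:
- long prose quoting `<html lang>` inline should be rejected
- long prose quoting `<!doctype html>` inline should be rejected
- a fragment whose first token is a non-document tag such as `<p>` should
  be rejected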
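
A minimal sketch of an anchored structural sniff of this kind. The regex body and the `looksLikeHtmlDocument` name are assumptions; only the constant name `STARTS_WITH_DOCUMENT_RE`, the 64-character floor, and the anchoring rule come from the commit message.

```typescript
const MIN_HTML_LENGTH = 64;

// Anchored at the start: a tag name quoted mid-sentence no longer matches.
// The exact pattern is an assumption, not the one shipped in the PR.
const STARTS_WITH_DOCUMENT_RE = /^(?:<!doctype\s+html\b|<html[\s>])/i;

function looksLikeHtmlDocument(content: string): boolean {
  // Drop a leading BOM explicitly, then surrounding whitespace.
  const trimmed = content.replace(/^\uFEFF/, "").trim();
  if (trimmed.length < MIN_HTML_LENGTH) return false;
  return STARTS_WITH_DOCUMENT_RE.test(trimmed);
}
```

This is a sniff, not a validator: it only checks how the content starts and how long it is.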
2026-05-10 20:22:48 +08:00
lefarcen
93a08689e4
fix(web): truncate entry footer pet label (#1150) 2026-05-10 19:45:39 +08:00
Sid
e948405c22
fix(web): surface connector auth errors and stop silent popup close (#725) (#1128)
* fix(web): surface connector auth errors and stop silent popup close (#725)

Two layered bugs caused the "Twitter Connect button does nothing" symptom:

1. ConnectorsBrowser dropped result.error from connectConnector. On
   Electron desktop the popup is never opened (electronAPI.openExternal
   path), so the existing renderConnectorAuthError(null, ...) was a
   no-op and the user got zero feedback.

2. registry.ts silently called authWindow?.close() whenever the connect
   response did not carry { kind: 'redirect_required', redirectUrl },
   leaving web users with a popup that vanishes without explanation.

Patch:
- Add a per-connector connectorAuthorizationError state and render it
  as an inline banner on both ConnectorCard and ConnectorDetailDrawer
  (mirrors the existing cancel-failed pattern; reuses the existing
  .connector-authorization-error styling).
- Replace authWindow?.close() with a renderConnectorAuthInfo helper
  that branches on auth.kind ('connected' | 'pending' | unknown) and
  writes an explanatory message to the popup before the user closes it.
- Tests: 1 registry test for the pending/info popup branch, 2
  ConnectorsBrowser tests for surfacing and clearing the inline banner.

* fix(web): clear connector auth error on background status refresh

Addresses review feedback from @mrcfps and the Codex bot on PR #1128:
the inline error banner stayed visible even after background status
refresh marked the connector as `connected` (e.g. user completes auth
out-of-band through the Composio dashboard, then focus/poll/message
refresh observes the connection).

- Add clearConnectorAuthorizationErrorsForConnected helper next to the
  existing pending-state helpers; same shape, returns the same object
  reference when nothing changes so React skips a re-render.
- Wire it into reloadConnectorStatuses so every status refresh path
  (pending poll, focus, OAuth callback message) drops stale errors for
  any connector now reported as connected.
- Add 2 unit tests for the helper next to the existing pending-state
  helper tests in EntryView.test.ts.
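
The "same reference when nothing changes" behavior can be sketched as follows, with hypothetical `ErrorsById` / `StatusById` shapes and a shortened helper name; the real state shape in EntryView may differ.

```typescript
// Hypothetical state shapes keyed by connector id.
type ErrorsById = Record<string, string>;
type StatusById = Record<string, { status: string }>;

function clearErrorsForConnected(
  errors: ErrorsById,
  statuses: StatusById,
): ErrorsById {
  let next: ErrorsById | undefined;
  for (const id of Object.keys(errors)) {
    if (statuses[id]?.status === "connected") {
      // Copy lazily so the untouched case allocates nothing.
      if (next === undefined) next = { ...errors };
      delete next[id];
    }
  }
  // Returning the original object lets a React state setter bail out.
  return next ?? errors;
}
```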
2026-05-10 19:38:18 +08:00
Priyanshu Kayarkar
eabf3a6e86
feat: add collapsible MCP JSON field-mapping helper (#1136)
* feat(web): add collapsible MCP JSON helper component

* feat(web): add collapsible MCP JSON field-mapping helper

* test(web): add McpJsonHelper component tests for toggle behavior

* fix(web): scope helper id per row and show helper

* test(web): rewrite McpJsonHelper tests to use row-scoped ids

* feat(mcp): use stable _localId for McpRow keys and aria-controls

- Add _localId to DraftRow and genLocalId()
- Use _localId as React key and helper id to avoid duplicate DOM ids
- Move helper outside transport branches so helper is visible for all transports
- Fix malformed template.homepage anchor

* fix(web): restore _localId-scoped helperId and helper visibility for all transports

* test(web): replace integration test with _localId-scoped helper tests

* test(web): exercise McpJsonHelper via production McpClientSection in jsdom

* fix(web): resolve typecheck errors

* test(web): expand rows before querying helper toggles to fix timeout
2026-05-10 19:37:46 +08:00
Jie Zhu
1f625cff77
fix(i18n): translate comments panel UI to Chinese (#1139)
The comments panel in the project page left sidebar was missing Chinese
translations for all UI strings. Users with Chinese language settings would
see English text in the comments section, which created an inconsistent
experience.

This commit adds complete translations for:
- Comment section titles (attached/saved comments)
- Action buttons (add/remove/add all)
- Empty state messages
- Comment placeholder text
- Attachment-related labels

Both simplified Chinese (zh-CN) and traditional Chinese (zh-TW) locales
are updated to provide full Chinese language support for the comments
feature.
2026-05-10 19:37:22 +08:00
Jie Zhu
602cf704e2
fix(web): center close button in MCP picker dialog (#1137) 2026-05-10 15:32:58 +08:00
郭一通
13005f4fea
fix(desktop): allow about:blank popup for PDF export fallback (#1081)
The renderer's PDF export fallback uses window.open('', '_blank')
to open a blank window that is then navigated to a Blob URL.
Electron's setWindowOpenHandler only allowed blob: and od: protocols,
so about:blank was denied and the user saw a "Popup blocked" alert.

Fix: add about:blank to the allowed child window URL whitelist.
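
As a sketch, the allowlist decision could be isolated in a pure function like the one below. `decideChildWindow` is a hypothetical name; the `{ action: 'allow' | 'deny' }` return shape matches what Electron's `setWindowOpenHandler` callback expects.

```typescript
type WindowOpenDecision = { action: "allow" } | { action: "deny" };

// Hypothetical helper; the desktop app may inline this logic instead.
function decideChildWindow(url: string): WindowOpenDecision {
  // window.open('', '_blank') reaches the handler as about:blank.
  const allowed =
    url === "about:blank" || url.startsWith("blob:") || url.startsWith("od:");
  return allowed ? { action: "allow" } : { action: "deny" };
}
```

It would be wired up roughly as `webContents.setWindowOpenHandler(({ url }) => decideChildWindow(url))`.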

Co-authored-by: Ken <hitken@users.noreply.github.com>
2026-05-10 12:21:15 +08:00
Arya Kaushal
9079c51ba3
feat(daemon): HTTP 206 range request support for video/audio files Fixes #784 (#1105)
* feat(daemon): HTTP 206 range request support for video/audio files (#784)

Stream video and audio via fs.createReadStream with Accept-Ranges: bytes
and 206 Partial Content responses so browsers can play and seek media
inline. Non-media files keep the existing buffer path unchanged.

Add parseByteRange (RFC 7233-compliant) and resolveProjectFilePath to
projects.ts, and 23 unit tests covering all range edge cases.
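
For orientation, a single-range parser in the spirit of RFC 7233 might look like this. It is a sketch under assumptions (one range only, hypothetical name and return shape), not the `parseByteRange` added in projects.ts.

```typescript
type ByteRange = { start: number; end: number }; // inclusive bounds

// null             -> ignore the header and serve the full body (200)
// "unsatisfiable"  -> respond 416
// ByteRange        -> respond 206 with that slice
function parseSingleByteRange(
  header: string | undefined,
  size: number,
): ByteRange | "unsatisfiable" | null {
  if (!header) return null;
  const match = /^bytes=(\d*)-(\d*)$/.exec(header.trim());
  if (!match) return null; // malformed or multi-range: not handled here
  const rawStart = match[1] ?? "";
  const rawEnd = match[2] ?? "";
  if (rawStart === "" && rawEnd === "") return null;

  if (rawStart === "") {
    // Suffix form "bytes=-N": the last N bytes of the file.
    const suffix = Number(rawEnd);
    if (suffix === 0 || size === 0) return "unsatisfiable";
    return { start: Math.max(0, size - suffix), end: size - 1 };
  }

  const start = Number(rawStart);
  if (start >= size) return "unsatisfiable";
  // Open-ended "bytes=N-" runs to the end; an explicit end is clamped.
  const end = rawEnd === "" ? size - 1 : Math.min(Number(rawEnd), size - 1);
  if (end < start) return null; // invalid spec: ignore the header
  return { start, end };
}
```

A 206 response built from the result would carry `Content-Range: bytes ${start}-${end}/${size}` and a `Content-Length` of `end - start + 1`.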

* fix(daemon): move range streaming to /raw/* route used by media viewers

The inline VideoViewer and AudioViewer components fetch
/api/projects/:id/raw/* (via projectRawUrl), not /files/*.
Apply the HTTP 206 / Accept-Ranges streaming path to the raw route
while preserving its Origin: null CORS behaviour for sandboxed iframes.

Add 7 route-level HTTP tests against a real startServer() instance
covering 200 full, 206 partial, suffix range, open-ended range, 416
unsatisfiable, non-media passthrough, and 404 cases.
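
For readers unfamiliar with the range cases listed above, here is a minimal sketch of single-range parsing. It is an assumption-laden illustration, not the `parseByteRange` added to projects.ts: the name `parseByteRangeSketch`, its return shape, and the choice to ignore multi-range headers are all hypothetical.

```typescript
type ByteRange = { start: number; end: number }; // inclusive offsets
// null means "ignore the header and send a normal 200 response".
type RangeResult = ByteRange | "unsatisfiable" | null;

function parseByteRangeSketch(header: string | undefined, size: number): RangeResult {
  if (!header) return null;
  // Only a single "bytes=start-end" spec is handled; anything else is ignored.
  const match = /^bytes=(\d*)-(\d*)$/.exec(header.trim());
  if (!match) return null;
  const rawStart = match[1] ?? "";
  const rawEnd = match[2] ?? "";
  if (rawStart === "" && rawEnd === "") return null;

  if (rawStart === "") {
    // Suffix range: the last N bytes of the file.
    const suffixLength = Number(rawEnd);
    if (suffixLength === 0 || size === 0) return "unsatisfiable"; // would map to 416
    return { start: Math.max(size - suffixLength, 0), end: size - 1 };
  }

  const start = Number(rawStart);
  if (start >= size) return "unsatisfiable"; // would map to 416
  // Open-ended range ("500-") or an end past EOF is clamped to the last byte.
  const end = rawEnd === "" ? size - 1 : Math.min(Number(rawEnd), size - 1);
  if (end < start) return null; // invalid spec: ignore the header
  return { start, end };
}
```

A caller would typically answer a `ByteRange` with 206 and a `Content-Range: bytes start-end/size` header, and `"unsatisfiable"` with 416.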

---------

Co-authored-by: mrcfps <mrc@powerformer.com>
2026-05-10 12:16:52 +08:00
Nicholas-Xiong
31f89f74fd
fix: remove Trump pet from bundled community pets (#1103)
Fixes #1042

Problem:
The Trump pet was included in the bundled community pets catalog, so it
appeared in the built-in pet adoption picker. This raised concerns about
shipping politically charged content in the default pet selection.

Solution:
- Removed the trump pet directory from assets/community-pets/
- Removed 'trump' from the BUNDLED_PETS list in bake-community-pets.ts

The pet is still available on Codex Pet Share for users who want to
download it manually, but it no longer ships as a built-in option.

Impact:
- Trump pet no longer appears in the default pet adoption picker
- Users can still access it via "Download community pets" if desired
- Keeps the built-in pet selection neutral and welcoming

Related:
- PR #850 (previous attempt that was closed without merging)
2026-05-10 12:11:00 +08:00
Yuhao Chen
6f2584e315
fix(web): prevent chat messages from overflowing into workspace area (#662) (#1104)
Add overflow-x: hidden to .chat-log so any horizontally overflowing content
(thinking blocks, status pills, form cards) is clipped inside the chat pane
instead of spilling into the workspace area.

Add min-width: 0 and max-width: 100% to .msg so flex items in the chat log
column cannot expand beyond the panel width when their intrinsic content is
wider than the container.

Add min-width: 0 to .assistant-flow to prevent the intermediate flex
container from propagating intrinsic content width up to the message boundary.

This complements the existing overflow: hidden on .pane and .split-chat-slot
from #740 by also constraining the intermediate flex items that can propagate
width from deeply nested content up to the scroll container boundary.
2026-05-10 11:58:49 +08:00
soulme
cbb3c0e33a
Improve design files grouping (#1082)
Add a modified-date grouping mode to make busy design workspaces easier to scan as generated files accumulate. The new view keeps existing batch actions and pagination available, adds localized labels, and covers date boundaries with component tests.
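
As a rough illustration of grouping by modified date, the sketch below buckets files by calendar day. Everything in it is assumed for the example: the `DesignFile` shape, the `groupByModifiedDay` name, and the use of UTC days as bucket keys (the real view may well use local time or relative labels).

```typescript
// Hypothetical file shape; modifiedAt is epoch milliseconds.
interface DesignFile {
  name: string;
  modifiedAt: number;
}

function groupByModifiedDay(files: DesignFile[]): Map<string, DesignFile[]> {
  const groups = new Map<string, DesignFile[]>();
  // Sort a copy newest-first so both groups and their members come out in that order.
  const newestFirst = [...files].sort((a, b) => b.modifiedAt - a.modifiedAt);
  for (const file of newestFirst) {
    // "YYYY-MM-DD" in UTC; an assumption chosen to keep the example deterministic.
    const day = new Date(file.modifiedAt).toISOString().slice(0, 10);
    const bucket = groups.get(day);
    if (bucket) bucket.push(file);
    else groups.set(day, [file]);
  }
  return groups;
}
```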
2026-05-10 11:55:34 +08:00
bojie.hbj
bb578b3dca
fix: Support OpenCode Write tool display as card (#1126)
The Write tool from OpenCode AI wasn't being displayed correctly as a card. This fix addresses two issues:

1. Tool name normalization: Added support for lowercase 'write' in addition to 'Write'
2. Field naming normalization: Added support for camelCase 'filePath' in addition to snake_case 'file_path'

Changes made:
- Added `normalizeToolInput()` function in daemon.ts for root-level field normalization
- Updated ToolCard.tsx to recognize both tool name variants and field naming conventions
- Updated AssistantMessage.tsx for tool name recognition
- Updated ProjectView.tsx for file path parsing in auto-open feature

This ensures consistent behavior across different AI providers regardless of their tool naming conventions.
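
The two normalizations listed above can be sketched as follows. This is an illustrative guess, not the `normalizeToolInput()` from daemon.ts: the `FIELD_ALIASES` map, the `isWriteTool` helper, and the rule that an explicit snake_case field wins over its camelCase alias are assumptions.

```typescript
type ToolInput = Record<string, unknown>;

// camelCase alias -> canonical snake_case field (only the pair named in the commit).
const FIELD_ALIASES = new Map<string, string>([["filePath", "file_path"]]);

// Rewrites root-level aliased keys; nested objects are left untouched.
function normalizeToolInputSketch(input: ToolInput): ToolInput {
  const normalized: ToolInput = {};
  for (const [key, value] of Object.entries(input)) {
    const canonical = FIELD_ALIASES.get(key) ?? key;
    // Assumed tie-break: if both spellings are present, keep the canonical one.
    if (canonical !== key && canonical in input) continue;
    normalized[canonical] = value;
  }
  return normalized;
}

// Accepts exactly the two tool name variants mentioned in the commit message.
function isWriteTool(toolName: string): boolean {
  return toolName === "Write" || toolName === "write";
}
```

Normalizing once at the daemon boundary would let UI components read a single field name, although the commit also updates the components to accept both spellings.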
2026-05-10 11:49:00 +08:00
Nicholas-Xiong
29e5732f44
fix: prevent design system filter popover from shifting position on reopen (#960)
* fix: prevent design system filter popover from shifting position on reopen

Fixes #921

The design system filter popover was repositioning incorrectly when reopened
after filtering, sometimes appearing too high and becoming partially hidden
at the top of the viewport.

Root cause:
- The popover uses position: absolute with top: calc(100% + 6px)
- When filtering reduces the number of items, the popover height shrinks
- On reopen, the reduced height can cause the popover to appear higher
  than expected, especially if the trigger button is near the top

Solution:
- Added min-height: 120px to .ds-picker-list
- This ensures the popover maintains a consistent minimum height
- Prevents position shifts when content is filtered
- The popover stays anchored correctly to its trigger

The 120px minimum provides enough space for ~3-4 items while keeping
the popover stable across filter state changes.

* fix: scope min-height to design-system picker only

The .ds-picker-list class is shared by multiple pickers:
- NewProjectPanel prompt-template picker
- SettingsDialog MCP client picker
- NewProjectPanel design-system picker

Adding a global min-height: 120px would affect all pickers,
causing unnecessary blank space when they have few items.

This adds a dedicated .ds-picker-list-design-systems modifier
class to scope the min-height fix to only the design-system
picker, which is the one affected by the position-shift bug.
2026-05-10 11:47:40 +08:00