open-design/apps/web/tests/components/pluginFolderActions.test.ts
lefarcen c617e30e27
fix(plugins): make Publish repo actually create the author's repo (#2332) (#2363)
* fix(plugins): make Publish repo actually create the author's repo (#2332)

QA repro from 0.8.0 preview: clicking "Publish repo" on a generated
plugin's `DesignFilesPanel` card ran the agent down a path that
produced an Open Design registry-submission URL but never created the
author's GitHub repo. After the action finished, `gh repo view
nuomi/cat` still returned 404 and `git ls-remote
https://github.com/nuomi/cat.git HEAD` failed with "Repository not
found".

Root cause is the action prompt at
`apps/web/src/components/design-files/pluginFolderActions.ts:11-12`:

  publish: 'Use the supported `od plugin publish` or
            repository-publish flow after confirming the manifest.'

That sentence let the agent pick the legacy registry-link CLI (`od
plugin publish --to open-design`), which mirrors the path the
"Open Design PR" button takes and emits an issue URL instead of
creating a public repo. The button label said "Publish repo" but the
behavior collapsed onto the registry-submission flow.

This PR rewrites the `publish` prompt in the same shape PR #2182
used for `contribute` — a numbered gh + git sequence that drives the
real action end-to-end:

  1. Pre-flight `gh --version` / `gh auth status`. Invalid / expired
     tokens are treated the same as not-logged-in (the bug-report
     agent kept going past an "invalid token" warning).
  2. Read manifest, capture `name`, `version`, `description`,
     `plugin.repo`. Fall back to `https://github.com/<gh-login>/<name>`
     when `plugin.repo` is missing and write it back into the
     manifest.
  3. `gh repo view <owner>/<name>` to decide create-vs-update.
  4a. Repo does not exist → `git init` + commit + tag +
      `gh repo create <owner>/<name> --public --source . --push`.
  4b. Repo exists → reuse the remote, `git add -A` + `git commit -m
      "Update: <name> v<version>"` (skip if working tree is clean),
      `git tag v<version>` (skip if already published), `git push`.
  5. Verify with `gh repo view <owner>/<name> --json url,nameWithOwner`.
  6. Hand off the resolved `https://github.com/<owner>/<name>` URL to
     chat. End the turn.
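
The create-vs-update branch in steps 3, 4a, and 4b can be sketched as a small pure function. This is illustrative only: `planPublishCommands` and its input type are hypothetical and do not exist in the codebase, the initial commit message in the new-repo branch is an assumption (the steps above do not specify one), and the "skip if clean" / "skip if already published" conditions are left out for brevity. In the real flow the agent runs these commands by following the prompt, not by calling code like this.

```typescript
// Hypothetical helper, not part of pluginFolderActions.ts.
type PublishPlanInput = {
  owner: string;
  name: string;
  version: string;
  // Outcome of `gh repo view <owner>/<name>` (step 3).
  repoExists: boolean;
};

function planPublishCommands(input: PublishPlanInput): string[] {
  const { owner, name, version, repoExists } = input;
  const slug = `${owner}/${name}`;
  const tag = `v${version}`;

  if (!repoExists) {
    // Step 4a: new repo. The commit message here is an assumption.
    return [
      'git init',
      'git add -A',
      `git commit -m "Initial commit: ${name} ${tag}"`,
      `git tag ${tag}`,
      `gh repo create ${slug} --public --source . --push`,
    ];
  }

  // Step 4b: existing repo. Reuse the remote; never force-push.
  return [
    'git add -A',
    `git commit -m "Update: ${name} ${tag}"`,
    `git tag ${tag}`,
    'git push',
    'git push --tags',
  ];
}
```

Keeping the plan as data makes the two branches easy to compare side by side: only the new-repo branch ever calls `git init` or `gh repo create`.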

Hard constraints encoded in the prompt:

* Do NOT call `od plugin publish --to open-design` (or any `--to
  <catalog>` variant). That is the registry-submission flow.
* Do NOT call `AskUserQuestion` — fire-and-forget, same as the
  `contribute` flow's stall fix.
* Do NOT auto-install gh/git. Detect-and-instruct only.
* Do NOT force-push or overwrite a published tag.
* Do NOT retry a failed step. Report and stop.

Refactor: pulled `publish` out of the shared `ACTION_TITLES`/
`ACTION_NOTES` template into its own `buildPublishPrompt(folderPath)`
function (mirrors `buildContributePrompt` from PR #2182). `install`
keeps the simple shared template — that action stays inferrable from
the manifest and doesn't need the same blast radius.
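
A rough sketch of what that split could look like. Only the names `ACTION_TITLES`, `ACTION_NOTES`, `buildPublishPrompt`, and `buildContributePrompt` come from this commit message; the dispatcher name `buildPromptSketch`, the table contents, and the placeholder prompt bodies are all assumptions made for illustration, and the real prompts are much longer.

```typescript
type PluginFolderAction = 'install' | 'publish' | 'contribute';

// Shared template data, now used by `install` only (contents are placeholders).
const ACTION_TITLES: Record<'install', string> = {
  install: 'Install this plugin',
};
const ACTION_NOTES: Record<'install', string> = {
  install: 'Use `od plugin install --source` after confirming the manifest.',
};

// Dedicated builders hold the long numbered gh + git sequences.
function buildPublishPrompt(folderPath: string): string {
  return `Plugin folder: \`${folderPath}\`\n(numbered publish steps go here)`;
}

function buildContributePrompt(folderPath: string): string {
  return `Plugin folder: \`${folderPath}\`\n(numbered contribute steps go here)`;
}

// Hypothetical dispatcher: complex actions get their own builder,
// simple ones stay on the shared template.
function buildPromptSketch(folderPath: string, action: PluginFolderAction): string {
  switch (action) {
    case 'publish':
      return buildPublishPrompt(folderPath);
    case 'contribute':
      return buildContributePrompt(folderPath);
    case 'install':
      return `${ACTION_TITLES.install}\nPlugin folder: \`${folderPath}\`\n${ACTION_NOTES.install}`;
  }
}
```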

Tests:

* `apps/web/tests/components/pluginFolderActions.test.ts`: extends
  the existing contract suite with eight `publish` assertions (a net
  gain of seven, since one replaces the old generic assertion):
  targets `plugin.repo` rather than the registry catalog, drives the
  full gh + git command list, handles both the new-repo and
  existing-repo branches, bans the registry-submission CLI
  explicitly, hard-bans AskUserQuestion / auto-install / force-push /
  retry, treats "invalid token" as STOP, guards `${folderPath}`
  interpolation, and ends by handing the repo URL back to chat.

Validation:

* `pnpm --filter @open-design/web exec vitest run tests/components/pluginFolderActions.test.ts`
  → 16/16 passed (was 9/9 before this PR; +7 new publish-flow
  assertions, the old generic "mentions od plugin publish" assertion
  replaced with the precise contract above)

(Local `pnpm --filter @open-design/web typecheck` fails on
`tests/runtime/exports.test.ts` because `packages/host/dist/testing`
isn't built in this checkout — pre-existing breakage from
`2c128e0e refactor desktop host bridge` on main, unrelated to this
prompt change. CI runs a fresh install and was green on the four
previous prompt-only PRs that touched the same module.)

Closes #2332.

* fix(plugins): don't assume standalone jq when reading the manifest

QA repro from the Open Design PR button (transcript shared with the
PR #2363 thread): the agent reached step 2 of the contribute prompt,
ran `jq '{name,title,description,version}' generated-plugin/open-design.json`,
got `zsh:1: command not found: jq`, and stopped per the prompt's
"stop on first hard failure" rule. No fork, no branch, no PR.

jq is not part of the OD agent runtime baseline — default macOS and
Windows shells don't ship it. The agent reached for it first because
"jq" is the default JSON tool in claude/codex's shell training
distribution, not because the prompt asked for it. The prompt just
said "Load and capture", which the agent interpreted as "shell out to
the most common JSON parser".

Updates both step-2 instructions (contribute + publish prompts) to:

  - List portable manifest-read alternatives in priority order:
    the built-in Read tool (always available); `cat` + manual JSON
    parsing; `node -e 'JSON.parse(...)'` as the shell-only fallback.
  - Add an explicit "Do not assume the standalone `jq` binary is
    installed" guard with the macOS / Windows shell rationale.
  - Disambiguate the standalone `jq` CLI from `gh ... --jq`. The gh
    flag uses an embedded library and is fine — without the
    disambiguation the agent reads the ban literally and stops using
    `gh api user --jq .login` at step 3.
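
The `node -e 'JSON.parse(...)'` fallback amounts to plain JSON parsing, which needs no external binary. A minimal sketch of that idea, combined with the publish flow's `plugin.repo` fallback: `summarizeManifest` and `ManifestSummary` are hypothetical names, and the sketch assumes `plugin.repo` is a nested `plugin: { repo }` key in `open-design.json`, which this commit message does not spell out.

```typescript
// Hypothetical shape: the four fields the publish prompt asks the agent to capture.
interface ManifestSummary {
  name: string;
  version: string;
  description: string;
  repo: string;
}

// Parses manifest text with the built-in JSON parser (no jq required).
function summarizeManifest(manifestJson: string, ghLogin: string): ManifestSummary {
  const manifest = JSON.parse(manifestJson);
  const name = String(manifest.name);

  // Assumption: `plugin.repo` lives under a nested `plugin` object.
  const declaredRepo: unknown = manifest.plugin?.repo;

  // Fall back to https://github.com/<gh-login>/<name> when it is missing.
  const repo =
    typeof declaredRepo === 'string' && declaredRepo.length > 0
      ? declaredRepo
      : `https://github.com/${ghLogin}/${name}`;

  return {
    name,
    version: String(manifest.version),
    description: String(manifest.description ?? ''),
    repo,
  };
}
```

The function takes the manifest text as a string so that how the file is read (Read tool, `cat`, or `node -e`) stays a separate concern.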

Tests:

* `apps/web/tests/components/pluginFolderActions.test.ts` — two new
  contract assertions:
  - contribute prompt: warns against assuming standalone jq is
    installed; lists `cat` and `node -e` as alternatives. Closes the
    regression on the surface where QA hit it.
  - shared block: both contribute and publish prompts disambiguate
    standalone jq from `gh ... --jq`. One assertion guards both
    flows so a future prose edit can't drop the carve-out on one
    side.
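
The "one assertion guards both flows" idea can be sketched without a test runner. The regular expression below is the one used in the test file; the helper name `findMissingJqCarveOut` and the sample prompt strings in the usage note are hypothetical.

```typescript
// Same pattern as the shared contract assertion in the test file.
const JQ_CARVE_OUT =
  /--jq` flag bundled with gh|gh ships its own embedded library|gh \.\.\. --jq` is fine/i;

// Hypothetical helper: returns the flows whose prompt lost the carve-out.
function findMissingJqCarveOut(prompts: Record<string, string>): string[] {
  return Object.keys(prompts).filter((flow) => !JQ_CARVE_OUT.test(prompts[flow]));
}
```

For example, calling it with a `contribute` prompt containing "`gh ... --jq` is fine" and a `publish` prompt that only says "Do not use jq" would return `['publish']`, which is the kind of one-sided drop the shared assertion is meant to catch.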

Validation:

* `pnpm --filter @open-design/web exec vitest run tests/components/pluginFolderActions.test.ts`
  → 18/18 passed (was 16/16 on the previous PR #2363 commit; +2 new
  jq-guidance assertions)

Continues PR #2363. Same source-of-bug shape as the registry-submission
fallback issue this PR was opened to fix: agent picks a tool the
prompt didn't actually ask for because the prompt was loose.
2026-05-20 15:38:29 +08:00


// Contract test for the prompts the plugin-folder card buttons send to the
// agent. `install` uses the simple shared template; `contribute` drives the
// `gh repo fork → branch → commit → gh pr create --web` flow against
// `nexu-io/open-design`; `publish` drives `gh repo create / push` against the
// author's own `plugin.repo` URL. The tests below lock the *shape* of each
// prompt (keywords + folder interpolation) without coupling to exact wording,
// so prose tweaks don't break the suite but accidental removal of a critical
// step would.
import { describe, expect, it } from 'vitest';

import { buildPluginFolderAgentActionPrompt } from '../../src/components/design-files/pluginFolderActions';

const FOLDER = 'generated-plugin';

describe('buildPluginFolderAgentActionPrompt', () => {
  describe('install', () => {
    it('mentions the folder path and the supported install CLI', () => {
      const prompt = buildPluginFolderAgentActionPrompt(FOLDER, 'install');
      expect(prompt).toContain(`Plugin folder: \`${FOLDER}\``);
      expect(prompt).toContain('od plugin install --source');
    });
  });

  describe('publish (repo-publish flow)', () => {
    const prompt = buildPluginFolderAgentActionPrompt(FOLDER, 'publish');

    it('targets the author\'s plugin.repo, not the registry catalog', () => {
      expect(prompt).toContain(`Plugin folder: \`${FOLDER}\``);
      expect(prompt).toContain('plugin.repo');
      expect(prompt).toMatch(/<owner>\/<name>/);
      expect(prompt).toMatch(/repository-publish flow|repo URL|published code/i);
    });

    it('drives the full publish flow via gh + git', () => {
      // The agent must drive raw gh/git commands so an actual public repo
      // exists at the end of the turn. Regression guard for issue #2332,
      // where the previous prompt let the agent fall back to `od plugin
      // publish --to open-design` and never created the target repo.
      expect(prompt).toContain('gh --version');
      expect(prompt).toContain('gh auth status');
      expect(prompt).toContain('gh repo view <owner>/<name>');
      expect(prompt).toContain('gh repo create <owner>/<name> --public --source . --push');
      expect(prompt).toContain('git push --tags');
    });

    it('handles both new-repo and existing-repo paths', () => {
      // 404 → create + push; 200 → push to existing remote. Both branches
      // must exist or the agent will silently skip one case.
      expect(prompt).toMatch(/Could not resolve to a Repository|repo does not exist/i);
      expect(prompt).toMatch(/repo exists/i);
      expect(prompt).toMatch(/Create \+ push|Push to existing repo/i);
    });

    it('bans the registry-submission CLI explicitly', () => {
      // The legacy CLI is what shipped the bug — without an explicit ban
      // the agent had been routing back to it. The mention must be in a
      // negative imperative ("Do NOT call …"), not a recommendation.
      expect(prompt).toMatch(
        /Do NOT (call|route through) `?od plugin publish --to open-design`?/i,
      );
      expect(prompt).toMatch(
        /registry[- ]submission|registry-submission flow|Open Design PR/i,
      );
    });

    it('hard-bans AskUserQuestion + auto-install + force-push + retry', () => {
      expect(prompt).toContain('AskUserQuestion');
      expect(prompt).toMatch(/fire-and-forget|do not call the `AskUserQuestion`/i);
      expect(prompt).toMatch(/do not try to install/i);
      expect(prompt).toMatch(/do not force-push|--force/i);
      expect(prompt).toMatch(/do not retry/i);
    });

    it('treats invalid/expired tokens the same as not-logged-in', () => {
      // Issue #2332 showed the agent attempting the publish even after `gh
      // auth status` reported "token for shangxinyu1 is invalid". The
      // prompt now treats that case as a hard stop instead of a soft warn.
      expect(prompt).toMatch(/invalid\/expired token|invalid token/i);
      expect(prompt).toMatch(/STOP/);
    });

    it('interpolates the actual folder path into manifest and cd steps', () => {
      // Sanity check that template-string interpolation didn't regress into
      // literal `${folderPath}` substrings.
      expect(prompt).toContain(`${FOLDER}/open-design.json`);
      expect(prompt).toContain(`cd ${FOLDER}`);
      expect(prompt).not.toContain('${folderPath}');
    });

    it('ends by handing the repo URL back to chat', () => {
      expect(prompt).toMatch(/Paste the resolved `?https:\/\/github\.com\/<owner>\/<name>`? URL into chat/i);
    });
  });

  describe('contribute (PR-based flow)', () => {
    const prompt = buildPluginFolderAgentActionPrompt(FOLDER, 'contribute');

    it('targets the nexu-io/open-design community catalog', () => {
      expect(prompt).toContain('nexu-io/open-design');
      expect(prompt).toContain('plugins/community/<name>/');
    });

    it('drives the full PR flow via gh, not via the issue-URL CLI', () => {
      // The agent must drive raw gh commands rather than fall back to the
      // legacy `od plugin publish --to open-design` issue-URL launcher.
      expect(prompt).toContain('gh repo fork nexu-io/open-design');
      expect(prompt).toContain('gh repo clone');
      expect(prompt).toContain('git checkout -b plugin/');
      expect(prompt).toContain('gh pr create');
      // The legacy CLI is named in the prompt only as part of an explicit
      // ban ("Do NOT call the legacy `od plugin publish --to open-design`")
      // — verify the ban is in place, not the bare command.
      expect(prompt).toMatch(/do not call the legacy `od plugin publish --to open-design`/i);
    });

    it('uses --web so the author confirms the PR in browser', () => {
      // The "author keeps the final review click" invariant — preserved from
      // 45f52d71's "We never POST anywhere" principle.
      expect(prompt).toContain('--web');
      expect(prompt).toMatch(/do not auto-submit/i);
    });

    it('hard-bans AskUserQuestion to avoid 600s mid-turn stalls', () => {
      // Regression guard for the stall we observed during e2e: agent paused
      // mid-turn on an AskUserQuestion tool waiting for a host answer the
      // user never sent (they clicked the plugin-folder card instead).
      expect(prompt).toContain('AskUserQuestion');
      expect(prompt).toMatch(/do not call the `AskUserQuestion` tool|fire-and-forget/i);
    });

    it('forbids the agent from installing tools or retrying failures', () => {
      expect(prompt).toMatch(/do not try to install/i);
      expect(prompt).toMatch(/do not retry/i);
    });

    it('interpolates the actual folder path into manifest and copy steps', () => {
      // Sanity check that template-string interpolation didn't regress into
      // literal `${folderPath}` substrings (we already shipped that bug once).
      expect(prompt).toContain(`${FOLDER}/open-design.json`);
      expect(prompt).not.toContain('${folderPath}');
    });

    it('ends by handing the PR URL back to chat', () => {
      expect(prompt).toMatch(/PR URL|pull\/new|paste it into chat/);
    });

    it('warns the agent against assuming standalone jq is installed', () => {
      // QA hit this: agent ran `jq '{name,title,...}' generated-plugin/open-design.json`
      // at step 2 and stopped with `zsh:1: command not found: jq` before
      // even reaching the fork step. The prompt now lists portable
      // alternatives (Read / cat / node -e) and bans the assumption.
      expect(prompt).toMatch(/Do not assume the standalone `jq` binary is installed/);
      expect(prompt).toMatch(/cat .*open-design\.json/);
      expect(prompt).toMatch(/node -e/);
    });
  });

  describe('jq guidance shared between contribute and publish', () => {
    it('disambiguates standalone jq from gh\'s built-in --jq flag', () => {
      // gh ships its own jq library, so `gh ... --jq` is fine — that's
      // what RULE step "Resolve author identity" uses. The ban must
      // single out the brew-installed standalone binary, otherwise the
      // agent will read the ban literally and stop using gh's flag too.
      const contributePrompt = buildPluginFolderAgentActionPrompt(FOLDER, 'contribute');
      const publishPrompt = buildPluginFolderAgentActionPrompt(FOLDER, 'publish');
      for (const prompt of [contributePrompt, publishPrompt]) {
        expect(prompt).toMatch(/--jq` flag bundled with gh|gh ships its own embedded library|gh \.\.\. --jq` is fine/i);
      }
    });
  });
});