open-design

mirror of https://github.com/nexu-io/open-design.git synced 2026-05-31 19:04:39 +07:00

History

lefarcen 551f967d2c ci(agent-pr-explore): rewrite prompt — non-lazy disposition + mandatory probe list (#3156 ) * ci(agent-pr-explore): rewrite prompt — non-lazy disposition + mandatory probe list Background: on PR #2355 (large AMR runtime add) the agent stopped at smoke level because the positive path was gated on a missing `vela` binary. The old prompt explicitly instructed "if setup prerequisites block, return inconclusive immediately" + "do not spend more than two attempts on test data" + "do not run arbitrary host shell commands", which made the agent give up rather than: - use the PR's own `tests/fixtures/fake-vela.mjs` - set the `VELA_BIN` env the runtime reads - probe `/api/integrations/vela/` directly via fetch This rewrite shifts disposition + adds 4 structured unblock steps: - Mindset: each /explore is a precious, expensive run; be thorough, not lazy. - STEP 0* Read PR body for `## Test Plan` section — declared cases = MUST-COVER. - STEP 1 Extract diff-driven probe list (new routes / components / env vars / fixtures / CLI flags). Anything skipped requires explicit written reason. - STEP 2 Before giving up, try (a) PR-provided fixtures, (b) build minimal stub inside container, (c) probe APIs directly via page.evaluate fetch, (d) search repo / related PRs / docs for unknown terms. - STEP 3 4-7 cases for substantive PRs (was hard cap 3). - STEP 4 Login / multi-tab / OAuth — use Playwright multi-page handling; read creds from env, never echo. - SECURITY strengthened: env vars matching common secret patterns are confidential; never echo / log / write to file / page.evaluate / report. - Report new required §Mitigations Attempted for Inconclusive verdicts — must list what was tried + why each didn't unblock. Kept unchanged: 3-min keepalive constraint, untrusted-data rule, no-host-shell rule, report markdown structure (✅/⚠️/❌/⚪ + case emoji). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci(agent-pr-explore): truncation-aware + soft-request protocol + capability fix Addresses review feedback from @mrcfps: (1) STEP 1 — diff probe list was instructed to enumerate "every new route/component/env/fixture/flag" without acknowledging that the harness already truncates the diff upstream (file_patch_max_chars + context_max_bytes). On exactly the large PRs this prompt targets, the agent only sees a slice. Fix: prompt now explicitly tells the agent the context MAY be truncated and to (a) note the truncation in §🧭 Scope and (b) emit §📎 Needs to ask the maintainer to attach the missing source files into the private workspace for the next run. (2) STEP 2 — old text told the agent to "create stubs inside the sandbox container", "rewire env / PATH", and "run gh / grep searches". The harness does not expose docker exec or arbitrary shell to the agent (capabilities are fs:write on host + Playwright on host driving the dockerized app via HTTP). The instructions promised things the agent literally cannot do. Fix: STEP 2 now spells out the actual capabilities (host fs:write, Playwright page.evaluate / page.request, host-side $WORKSPACE_DIR if maintainer pre- attached one). The "unblock by stub" path is rewritten honestly: build a host-side stub if useful, but acknowledge container env is fixed at docker run and signal what's needed via §🔑 Needs / §📎 Needs for the next iteration. The "search repo for unknown term" step (which required gh/grep) is dropped in favor of using $WORKSPACE_DIR materials. (3) Soft-request protocol (new): The agent is READ-ONLY for secrets and workspace — it cannot self-attach. But it can SIGNAL what was needed via two new optional report sections: - §🔑 Needs — secret request ("VELA_RUNTIME_KEY: needed to verify ...") - §📎 Needs — workspace file request ("amr-auth-spec.md: clarifies ...") The dashboard (synclo platform; see nexu-io/synclo#79 RFC §6.8) will parse these structurally and surface as one-click attach hints to the maintainer on the run detail screen. Pure passive signal; no auto-action; zero prompt- injection risk (no code path takes the values). Hard rules: - No pasting of existing secret values (security) - Each item MUST tie back to a specific blocked case in §🧪 Cases — not speculative - Workspace privacy: agent may reference workspace files BY PURPOSE ("verified positive-1 from test-plan.md") but NEVER paste their content into the report This commit is non-trivial (~65 lines net) but the changes are tightly scoped: honesty about capabilities + a new signaling channel that replaces the impossible direct-action promises. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci(agent-pr-explore): remove gh-search + container-exec language from prompt MINDSET bullet: replace 'gh search issues/prs/code' with in-scope materials (PR body, diff context, workspace files) plus page.request/page.evaluate probes, matching actual harness capabilities. SECURITY bullet: replace the contradictory 'You may run commands INSIDE the sandbox container' with a clear statement that the agent has no shell or container exec access, only the Playwright browser. Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code) * ci(agent-pr-explore): fix Needs section misuse and unawaited fetch body Move fixture env-var wiring request from §🔑 Needs to §📎 Needs and remove the concrete host path from the example; §🔑 Needs is secret-name-only and must not carry filesystem paths. Await r.text() in the page.evaluate fetch example so the body field resolves to a string instead of an unresolved Promise. Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code) * ci(agent-pr-explore): broaden 📎 Needs to cover env/config wiring alongside file attachments Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code) * ci(agent-pr-explore): fix step-2b Needs routing and split AMR_USER/AMR_PASS example Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code) * ci(agent-pr-explore): fix mindset bullet — env var cannot be set mid-run Replace "set the env var" in the MINDSET mitigations list with "identify the env var and request the needed startup wiring in §📎 Needs" to match the actual capability boundary: container env is fixed at docker run time and cannot be changed by the agent mid-run. Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code) --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>		2026-05-28 09:51:19 +00:00
..
release	feat(pack): add Windows portable zip target alongside NSIS installer (#2937 )	2026-05-26 06:14:44 +00:00
agent-pr-explore-local.sh	Add sandboxed agent PR exploration (#2604 )	2026-05-26 07:52:42 +00:00
agent-pr-explore-sandbox.sh	ci(agent-pr-explore): rewrite prompt — non-lazy disposition + mandatory probe list (#3156 )	2026-05-28 09:51:19 +00:00
provision-agent-pr-explore-runner.sh	ci: add idempotent provision script for the agent-pr-explore runner (#3122 )	2026-05-27 14:51:59 +00:00