open-design/.github/scripts
lefarcen 551f967d2c
ci(agent-pr-explore): rewrite prompt — non-lazy disposition + mandatory probe list (#3156)
* ci(agent-pr-explore): rewrite prompt — non-lazy disposition + mandatory probe list

Background: on PR #2355 (large AMR runtime add) the agent stopped at smoke level
because the positive path was gated on a missing `vela` binary. The old prompt
explicitly instructed "if setup prerequisites block, return inconclusive
immediately" + "do not spend more than two attempts on test data" + "do not run
arbitrary host shell commands", which made the agent give up rather than:
- use the PR's own `tests/fixtures/fake-vela.mjs`
- set the `VELA_BIN` env the runtime reads
- probe `/api/integrations/vela/*` directly via fetch

This rewrite shifts disposition + adds 4 structured unblock steps:

- **Mindset**: each /explore is a precious, expensive run; be thorough, not lazy.
- **STEP 0** Read PR body for `## Test Plan` section — declared cases = MUST-COVER.
- **STEP 1** Extract diff-driven probe list (new routes / components / env vars /
  fixtures / CLI flags). Anything skipped requires explicit written reason.
- **STEP 2** Before giving up, try (a) PR-provided fixtures, (b) build minimal stub
  inside container, (c) probe APIs directly via page.evaluate fetch,
  (d) search repo / related PRs / docs for unknown terms.
- **STEP 3** 4-7 cases for substantive PRs (was hard cap 3).
- **STEP 4** Login / multi-tab / OAuth — use Playwright multi-page handling;
  read creds from env, never echo.
- **SECURITY** strengthened: env vars matching common secret patterns are
  confidential; never echo / log / write to file / page.evaluate / report.
- **Report** new required §Mitigations Attempted for Inconclusive verdicts —
  must list what was tried + why each didn't unblock.
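
The SECURITY bullet above can be sketched as a name-based check. This is a minimal TypeScript sketch, not the harness's code: the pattern list and the names `isSecretName` and `redactEnv` are hypothetical, and this commit does not show which patterns the prompt actually treats as secret.

```typescript
// Hypothetical name patterns; the real prompt's list is not shown in this commit.
const SECRET_NAME_PATTERNS: RegExp[] = [/KEY/i, /TOKEN/i, /SECRET/i, /PASS/i];

// True when an env var NAME looks like it holds a secret.
function isSecretName(name: string): boolean {
  return SECRET_NAME_PATTERNS.some((pattern) => pattern.test(name));
}

// Returns a copy of `env` that is safe to print: values of secret-looking
// names are masked, all other values pass through unchanged.
function redactEnv(env: Record<string, string>): Record<string, string> {
  const safe: Record<string, string> = {};
  for (const name of Object.keys(env)) {
    const value = env[name];
    if (value === undefined) continue;
    safe[name] = isSecretName(name) ? "[redacted]" : value;
  }
  return safe;
}
```

Matching on names rather than values means a secret is masked even when its value looks harmless. Under these assumed patterns, `VELA_BIN` passes through while `VELA_RUNTIME_KEY` and `AMR_PASS` are masked.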

Kept unchanged: 3-min keepalive constraint, untrusted-data rule, no-host-shell
rule, report markdown structure (status emoji such as ⚠️ + case emoji).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(agent-pr-explore): truncation-aware + soft-request protocol + capability fix

Addresses review feedback from @mrcfps:

(1) STEP 1 — diff probe list was instructed to enumerate "every new
    route/component/env/fixture/flag" without acknowledging that the harness
    already truncates the diff upstream (file_patch_max_chars + context_max_bytes).
    On exactly the large PRs this prompt targets, the agent only sees a slice.

    Fix: prompt now explicitly tells the agent the context MAY be truncated and
    to (a) note the truncation in §🧭 Scope and (b) emit §📎 Needs to ask the
    maintainer to attach the missing source files into the private workspace
    for the next run.

(2) STEP 2 — old text told the agent to "create stubs inside the sandbox
    container", "rewire env / PATH", and "run gh / grep searches". The harness
    does not expose docker exec or arbitrary shell to the agent (capabilities
    are fs:write on host + Playwright on host driving the dockerized app via
    HTTP). The instructions promised things the agent literally cannot do.

    Fix: STEP 2 now spells out the actual capabilities (host fs:write, Playwright
    page.evaluate / page.request, host-side $WORKSPACE_DIR if the maintainer
    pre-attached one). The "unblock by stub" path is rewritten honestly: build a
    host-side stub if useful, but acknowledge that the container env is fixed at
    docker run time and signal what's needed via §🔑 Needs / §📎 Needs for the
    next iteration. The "search repo for unknown term" step (which required
    gh/grep) is dropped in favor of using $WORKSPACE_DIR materials.

(3) Soft-request protocol (new):

    The agent is READ-ONLY for secrets and workspace — it cannot self-attach.
    But it can SIGNAL what was needed via two new optional report sections:

    - §🔑 Needs — secret request ("VELA_RUNTIME_KEY: needed to verify ...")
    - §📎 Needs — workspace file request ("amr-auth-spec.md: clarifies ...")

    The dashboard (synclo platform; see nexu-io/synclo#79 RFC §6.8) will parse
    these structurally and surface them as one-click attach hints to the
    maintainer on the run detail screen. Purely passive signal; no auto-action;
    zero prompt-injection risk (no code path takes the values).

    Hard rules:
    - No pasting of existing secret values (security)
    - Each item MUST tie back to a specific blocked case in §🧪 Cases — not
      speculative
    - Workspace privacy: agent may reference workspace files BY PURPOSE
      ("verified positive-1 from test-plan.md") but NEVER paste their content
      into the report
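
The structural parsing described in (3) might look roughly like the sketch below. It is an illustration under stated assumptions, not the synclo dashboard's parser: it assumes the two sections are markdown headings containing `🔑 Needs` and `📎 Needs`, and that each item is a `- name: reason` bullet. The names `NeedItem` and `parseNeeds` are hypothetical.

```typescript
// One parsed request from a report. `kind` records which section it came from.
interface NeedItem {
  kind: "secret" | "workspace";
  name: string;
  reason: string;
}

// Collects "- name: reason" bullets found under the two Needs headings.
// Values are only read and returned; nothing here acts on them.
function parseNeeds(report: string): NeedItem[] {
  const items: NeedItem[] = [];
  let kind: NeedItem["kind"] | null = null;
  for (const line of report.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.startsWith("#")) {
      // Any heading starts a new section; only the two Needs headings are tracked.
      if (trimmed.includes("🔑 Needs")) kind = "secret";
      else if (trimmed.includes("📎 Needs")) kind = "workspace";
      else kind = null;
      continue;
    }
    if (kind === null || !trimmed.startsWith("- ")) continue;
    const body = trimmed.slice(2);
    const sep = body.indexOf(":");
    if (sep <= 0) continue; // skip bullets that lack a "name: reason" shape
    items.push({
      kind,
      name: body.slice(0, sep).trim(),
      reason: body.slice(sep + 1).trim(),
    });
  }
  return items;
}
```

Because the function only returns names and reasons as data, it stays consistent with the passive-signal rule: a consumer would display the items to the maintainer rather than execute or resolve them.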

This commit is non-trivial (~65 lines net) but the changes are tightly scoped:
honesty about capabilities + a new signaling channel that replaces the
impossible direct-action promises.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(agent-pr-explore): remove gh-search + container-exec language from prompt

MINDSET bullet: replace 'gh search issues/prs/code' with in-scope
materials (PR body, diff context, workspace files) plus
page.request/page.evaluate probes, matching actual harness capabilities.

SECURITY bullet: replace the contradictory 'You may run commands INSIDE
the sandbox container' with a clear statement that the agent has no shell
or container exec access, only the Playwright browser.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* ci(agent-pr-explore): fix Needs section misuse and unawaited fetch body

Move fixture env-var wiring request from §🔑 Needs to §📎 Needs and
remove the concrete host path from the example; §🔑 Needs is
secret-name-only and must not carry filesystem paths. Await r.text() in
the page.evaluate fetch example so the body field resolves to a string
instead of an unresolved Promise.
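
The unawaited-body bug can be reproduced without a browser. The sketch below is an assumption-laden stand-in, not the prompt's actual example: `TextResponse` and `Fetcher` are hypothetical types that mimic only the parts of a fetch response used here, so the two shapes can be compared directly.

```typescript
// Minimal stand-in for the parts of a fetch Response used here (an assumption,
// so the sketch runs without a browser or Playwright).
interface TextResponse {
  status: number;
  text(): Promise<string>;
}

type Fetcher = (url: string) => Promise<TextResponse>;

// Buggy shape: r.text() is not awaited, so `body` is a pending Promise.
// JSON.stringify renders a Promise as {}, and the response text is lost.
async function probeUnawaited(fetcher: Fetcher, url: string) {
  const r = await fetcher(url);
  return { status: r.status, body: r.text() };
}

// Fixed shape: awaiting r.text() makes `body` a plain string.
async function probeAwaited(fetcher: Fetcher, url: string) {
  const r = await fetcher(url);
  return { status: r.status, body: await r.text() };
}
```

In a real run the same pattern would sit inside the function passed to `page.evaluate`, with the browser's own `fetch` in place of the `fetcher` argument.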

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* ci(agent-pr-explore): broaden 📎 Needs to cover env/config wiring alongside file attachments

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* ci(agent-pr-explore): fix step-2b Needs routing and split AMR_USER/AMR_PASS example

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* ci(agent-pr-explore): fix mindset bullet — env var cannot be set mid-run

Replace "set the env var" in the MINDSET mitigations list with
"identify the env var and request the needed startup wiring in §📎 Needs"
to match the actual capability boundary: container env is fixed at
docker run time and cannot be changed by the agent mid-run.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 09:51:19 +00:00
release feat(pack): add Windows portable zip target alongside NSIS installer (#2937) 2026-05-26 06:14:44 +00:00
agent-pr-explore-local.sh Add sandboxed agent PR exploration (#2604) 2026-05-26 07:52:42 +00:00
agent-pr-explore-sandbox.sh ci(agent-pr-explore): rewrite prompt — non-lazy disposition + mandatory probe list (#3156) 2026-05-28 09:51:19 +00:00
provision-agent-pr-explore-runner.sh ci: add idempotent provision script for the agent-pr-explore runner (#3122) 2026-05-27 14:51:59 +00:00