* ci: trigger PR exploration via maintainer "/explore" comment (no approval)
Add a low-friction way to run the sandbox exploration: a maintainer
comments "/explore" on a PR.
- on: issue_comment (kept workflow_dispatch). The job `if` allows the
comment path only when the comment is on a PR and the commenter has
write access (author_association OWNER/MEMBER/COLLABORATOR), so users
without write access cannot trigger it; untrusted PR code still runs
only inside the Docker sandbox.
- Drop the agent-pr-explore environment approval gate: both triggers are
already write-gated and there is no auto-trigger, so the extra manual
approval is redundant. R2 creds are repo-level secrets (no env-scoped
secrets), so they stay available without the environment.
- Feedback: 👀 reaction on the command + a placeholder comment carrying
the report marker (so the run yields one evolving comment), 🚀 on
success, and 👎 + a failure note (with the run link) on failure.
The workflow does not auto-run on every PR, so unrelated PRs stay
clean.
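
The comment-path gate described above can be sketched in shell. This is
an illustration of the logic only, not the workflow's actual `if`
expression; the function name `should_explore` and the exact-match test
on the comment body are assumptions.

```shell
# Sketch of the comment-path gate (hypothetical helper, not the real `if`).
# $1: "true" when the comment is on a PR
# $2: the commenter's author_association
# $3: the comment body (exact match on "/explore" is an assumption)
should_explore() {
  # Plain issue comments never trigger the exploration.
  [ "$1" = "true" ] || return 1
  # Only associations that imply write access are accepted.
  case "$2" in
    OWNER|MEMBER|COLLABORATOR) ;;
    *) return 1 ;;
  esac
  [ "$3" = "/explore" ]
}
```

For example, `should_explore true CONTRIBUTOR /explore` returns
non-zero, so a first-time contributor's comment would be ignored.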
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: don't clobber a produced report with the /explore failure note
Review: the failure-feedback step ran after the always() report step, so
in the failure-with-report case (the sandbox wrote a report, then exited
non-zero) it overwrote the just-posted report with the generic "failed
before producing a report" note, losing the useful output.
Guard it: if the report file exists, leave the posted report in place and
skip the failure note/reaction. Only post the short failure note when no
report was produced.
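
A minimal sketch of that guard, assuming the step receives the report
path as an argument; `failure_feedback_action` and the two action names
it prints are hypothetical, chosen only to make the decision visible.

```shell
# Hypothetical helper: decide what the failure-feedback step should do.
# $1: path to the report file the sandbox may have written
failure_feedback_action() {
  if [ -f "$1" ]; then
    # A report exists and was already posted; leave it in place.
    echo "keep-report"
  else
    # Nothing was produced; the short failure note is appropriate.
    echo "post-failure-note"
  fi
}
```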
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: clean agent report (write-to-file) + slim artifacts/uploads
Four related cleanups to the agent PR exploration output:
1. Clean report. The PR comment / report.md was assembled by dumping the
entire verbose expect.log (ACP init logs, "Git failed" warnings, the
~24KB echoed prompt, ANSI codes, progress checklist) under the trace
header -- ~28KB of noise. Instead, instruct the agent to write its
final Markdown report to a file via its file-write tool, and have the
runner read that file directly. Verified: Codex writes a clean report
to the given absolute path. The runner falls back to an inconclusive
note if the agent did not finish.
2. Drop duplicate trace/video. The script copied
playwright-smoke-trace.zip -> playwright-trace.zip (a ~28MB legacy
duplicate), did the same for the webm, and uploaded both copies to R2.
Keep only the canonical smoke-named artifacts.
3. Slim the GitHub artifact. The trace zips and videos are already on R2;
exclude *.zip / *.webm from the uploaded artifact so it drops from
~56MB to <1MB (report + logs only).
4. Persist report on the runner. Copy the report / agent-report /
expect.log / trace URL to a stable host dir
($HOME/.cache/agent-pr-explore/reports/pr-<n>) so dry runs
(skip_comment) can be inspected without downloading the artifact.
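
The persistence step in item 4 could look roughly like the sketch
below. The destination layout comes from the description above; the
function name `persist_report` and the four file names are placeholders
and may differ from what the script actually uses.

```shell
# Sketch: copy whichever report files exist into a stable per-PR directory.
# $1: PR number
# $2: directory holding the run's output
# The file names in the loop are placeholders, not the script's real names.
persist_report() {
  dest="$HOME/.cache/agent-pr-explore/reports/pr-$1"
  mkdir -p "$dest" || return 1
  for name in report.md agent-report.md expect.log trace-url.txt; do
    # Skip files this run did not produce instead of failing.
    if [ -f "$2/$name" ]; then
      cp "$2/$name" "$dest/" || return 1
    fi
  done
  # Print the destination so callers can log where the report landed.
  echo "$dest"
}
```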
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: address review — keep advisory reports + recursive artifact excludes
Review findings on the report/artifact cleanup:
1. Regression fix: the non-app-surface and deterministic-verifier branches
write their pre-baked advisory report (Inconclusive / Pass / Fail) and
never run the agent, so they don't produce agent-report.md. After
switching write_agent_report_artifact to read only agent-report.md they
fell through to the "agent did not write a final report" fallback,
dropping the real advisory (and mis-reporting on .github-only PRs like
this one). Fix: those branches now write their advisory directly to
$agent_report_file — single source of truth for the report body.
2. Recursive artifact excludes: the source Playwright recording lives at
artifacts/playwright-video/<uuid>.webm; the non-recursive !*.webm /
!*.zip patterns didn't match files in subdirectories. Use **/*.zip and
**/*.webm so the exclusion actually takes effect.
3. Drop the now-dangling summary.legacyTrace field (the legacy trace copy
is no longer produced), matching the legacyVideo removal.
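
The single-source-of-truth behaviour from item 1 can be sketched as
follows. `write_report_body` is a hypothetical stand-in, not the real
write_agent_report_artifact; treating an empty file as missing and the
fallback wording are assumptions.

```shell
# Sketch: the agent-report file is the only source for the report body.
# $1: path to the agent report file (written by the agent or by an
#     advisory branch)
# $2: path of the final report to write
write_report_body() {
  if [ -s "$1" ]; then
    # Agent output and pre-baked advisories take the same path.
    cat "$1" > "$2"
  else
    # Illustrative fallback text for runs that produced no report.
    printf '%s\n' "Inconclusive: the agent did not write a final report." > "$2"
  fi
}
```

Because the advisory branches write to the same file, they no longer
reach the fallback branch.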
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a `skip_comment` workflow_dispatch input (default false). When set,
the "Comment exploration report" step is skipped, so a validation/dry
run can exercise the full pipeline and produce the report artifact
without posting a public comment on the target PR (useful when testing
against an external contributor's PR). The report is still uploaded as
an artifact for review.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "Checkout trusted base scripts" step did a full actions/checkout of
this large repo on the self-hosted runner. On a recent run it stalled in
the initial `git fetch --depth=1 origin <sha>` for many minutes before
the agent script ever started, and the run had to be cancelled.
The trusted host side only needs the self-contained
`.github/scripts/agent-pr-explore-sandbox.sh`; PR code is checked out
inside Docker and PR context is gathered via the API. Replace the full
checkout with a single-file fetch via `gh api` (raw), pinned to the same
trusted base/dispatch commit, which avoids the git-protocol fetch of the
whole repo entirely.
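
A rough sketch of such a single-file fetch, using the repository
contents endpoint with the raw media type. The helper names and the
argument order are assumptions; the workflow's actual invocation may
differ.

```shell
# Sketch: fetch one file at a pinned commit through the contents API.
# $1: owner/repo, $2: path inside the repo, $3: commit SHA
contents_endpoint() {
  printf 'repos/%s/contents/%s?ref=%s\n' "$1" "$2" "$3"
}

# $4: where to write the fetched file. Requires an authenticated `gh`.
fetch_trusted_script() {
  # The raw media type returns the file body instead of JSON metadata.
  gh api -H "Accept: application/vnd.github.raw+json" \
    "$(contents_endpoint "$1" "$2" "$3")" > "$4"
}
```

Pinning `ref` to the trusted base or dispatch commit keeps the fetched
script identical to what a checkout of that commit would have provided.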
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>