Commit graph

7 commits

Author SHA1 Message Date
lefarcen
ce9fa687ca
ci: trigger PR exploration via maintainer /explore comment (no approval) (#3139)
* ci: trigger PR exploration via maintainer "/explore" comment (no approval)

Add a low-friction way to run the sandbox exploration: a maintainer
comments "/explore" on a PR.

- on: issue_comment (kept workflow_dispatch). The job `if` allows the
  comment path only when it is on a PR and the commenter has write access
  (author_association OWNER/MEMBER/COLLABORATOR), so users without write
  access cannot trigger it; untrusted PR code still runs only inside the
  Docker sandbox.
- Drop the agent-pr-explore environment approval gate: both triggers are
  already write-gated and there is no auto-trigger, so the extra manual
  approval is redundant. R2 creds are repo-level secrets (no env-scoped
  secrets), so they stay available without the environment.
- Feedback: 👀 reaction on the command + a placeholder comment carrying
  the report marker (so the run yields one evolving comment), 🚀 on
  success, and 👎 + a failure note (with the run link) on failure.
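
The gating rule in the first bullet can be sketched as a small predicate. This is only an illustration, not the workflow's actual `if` expression: the function name `should_run` and the `startswith("/explore")` match are assumptions, while `issue.pull_request` and `comment.author_association` are fields of the GitHub `issue_comment` event payload.

```python
# Hypothetical sketch of the trigger gate; not the real workflow expression.
ALLOWED_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}


def should_run(event_name: str, payload: dict) -> bool:
    """Return True when the exploration job may start for this event."""
    if event_name == "workflow_dispatch":
        # Manual dispatch is already limited to users with write access.
        return True
    if event_name != "issue_comment":
        return False
    issue = payload.get("issue", {})
    comment = payload.get("comment", {})
    # Issue comments fire for plain issues too; PRs carry a pull_request key.
    is_pr = "pull_request" in issue
    # Assumption: the command is matched as a prefix of the comment body.
    is_command = comment.get("body", "").strip().startswith("/explore")
    is_trusted = comment.get("author_association") in ALLOWED_ASSOCIATIONS
    return is_pr and is_command and is_trusted
```

A comment from a `CONTRIBUTOR` or `NONE` association, or a comment on a plain issue, returns False under this sketch.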

Does not auto-run on every PR, so unrelated PRs stay clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: don't clobber a produced report with the /explore failure note

Review: the failure-feedback step ran after the always() report step, so
on the failure-with-report case (sandbox wrote a report then exited
non-zero) it overwrote the just-posted report with the generic "failed
before producing a report" note — losing the useful output.

Guard it: if the report file exists, leave the posted report in place and
skip the failure note/reaction. Only post the short failure note when no
report was produced.
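
A minimal sketch of the guard, assuming the failure step can see the report path. The helper name `failure_note` and the note wording are illustrative, not the workflow's own:

```python
from pathlib import Path
from typing import Optional


def failure_note(report_file: Path, run_url: str) -> Optional[str]:
    """Return the note to post on failure, or None to keep the posted report."""
    if report_file.is_file():
        # The sandbox wrote a report before exiting non-zero: leave it alone.
        return None
    # No report exists, so the short failure note is the only feedback.
    return f"Exploration failed before producing a report. Run: {run_url}"
```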

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 16:48:58 +00:00
lefarcen
bf61a39cb5
ci: clean agent report (write-to-file) + slim artifacts/uploads (#3116)
* ci: clean agent report (write-to-file) + slim artifacts/uploads

Four related cleanups to the agent PR exploration output:

1. Clean report. The PR comment / report.md was assembled by dumping the
   entire verbose expect.log (ACP init logs, "Git failed" warnings, the
   ~24KB echoed prompt, ANSI codes, progress checklist) under the trace
   header -- ~28KB of noise. Instead, instruct the agent to write its
   final Markdown report to a file via its file-write tool, and have the
   runner read that file directly. Verified: Codex writes a clean report
   to the given absolute path. Falls back to an inconclusive note if the
   agent did not finish.

2. Drop duplicate trace/video. The script copied
   playwright-smoke-trace.zip -> playwright-trace.zip (a ~28MB legacy
   duplicate) and the webm likewise, and uploaded both to R2. Keep only
   the canonical smoke-named artifacts.

3. Slim the GitHub artifact. The trace zips and videos are already on R2;
   exclude *.zip / *.webm from the uploaded artifact so it drops from
   ~56MB to <1MB (report + logs only).

4. Persist report on the runner. Copy the report / agent-report /
   expect.log / trace URL to a stable host dir
   ($HOME/.cache/agent-pr-explore/reports/pr-<n>) so dry runs
   (skip_comment) can be inspected without downloading the artifact.
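
The read-with-fallback behaviour from item 1 might look roughly like this. The function name, the fallback wording, and the choice to treat an empty file as "not finished" are assumptions made for the sketch:

```python
from pathlib import Path

# Hypothetical fallback text; the real note may be worded differently.
FALLBACK_NOTE = "Inconclusive: the agent did not write a final report."


def read_agent_report(report_file: Path) -> str:
    """Return the agent-written Markdown report, or a fallback note."""
    try:
        text = report_file.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        # The agent never reached its file-write step.
        return FALLBACK_NOTE
    # Assumption: an empty file is treated like a missing one.
    return text or FALLBACK_NOTE
```

Reading the file directly means none of the `expect.log` noise (init logs, echoed prompt, ANSI codes) can leak into the PR comment.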

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: address review — keep advisory reports + recursive artifact excludes

Review findings on the report/artifact cleanup:

1. Regression fix: the non-app-surface and deterministic-verifier branches
   write their pre-baked advisory report (Inconclusive / Pass / Fail) and
   never run the agent, so they don't produce agent-report.md. After
   switching write_agent_report_artifact to read only agent-report.md they
   fell through to the "agent did not write a final report" fallback,
   dropping the real advisory (and mis-reporting on .github-only PRs like
   this one). Fix: those branches now write their advisory directly to
   $agent_report_file — single source of truth for the report body.

2. Recursive artifact excludes: the source Playwright recording lives at
   artifacts/playwright-video/<uuid>.webm; non-recursive !*.webm / !*.zip
   didn't match the subdirectory. Use **/*.zip and **/*.webm so the slim
   actually holds.

3. Drop the now-dangling summary.legacyTrace field (the legacy trace copy
   is no longer produced), matching the legacyVideo removal.
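
The difference between the two patterns in item 2 can be illustrated with Python's `pathlib`, used here only as an analogy: the artifact upload step has its own glob implementation, and the file name `example.webm` is a stand-in for the real `<uuid>.webm`.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def relative_matches(root: Path, pattern: str) -> List[str]:
    """List paths under root that match pattern, relative to root."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))


def demo() -> Tuple[List[str], List[str]]:
    """Compare a non-recursive and a recursive pattern on a nested video."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "artifacts"
        video_dir = root / "playwright-video"
        video_dir.mkdir(parents=True)
        (video_dir / "example.webm").touch()  # stand-in for <uuid>.webm
        shallow = relative_matches(root, "*.webm")     # top level only
        deep = relative_matches(root, "**/*.webm")     # any depth
        return shallow, deep
```

Here `demo()` returns an empty list for `*.webm` and the nested recording for `**/*.webm`, which is why the non-recursive exclude left the video in the artifact.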

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 12:30:05 +00:00
lefarcen
1ac3da130f
ci: add skip_comment dry-run input to agent PR exploration (#3080)
Add a `skip_comment` workflow_dispatch input (default false). When set,
the "Comment exploration report" step is skipped, so a validation/dry
run can exercise the full pipeline and produce the report artifact
without posting a public comment on the target PR (useful when testing
against an external contributor's PR). The report is still uploaded as
an artifact for review.
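
As a rough sketch of the decision, assuming the input reaches a script as the string `"true"` or `"false"` (the helper name `should_post_comment` is hypothetical):

```python
def should_post_comment(skip_comment: str = "false") -> bool:
    """Return False for a dry run; the artifact upload is unaffected."""
    # Assumption: the dispatch input arrives as text, e.g. via an env var.
    return skip_comment.strip().lower() != "true"
```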

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 05:51:48 +00:00
lefarcen
80639d4da4
ci: make agent PR exploration trusted checkout lightweight (#3071)
The "Checkout trusted base scripts" step did a full actions/checkout of
this large repo on the self-hosted runner. On a recent run it stalled in
the initial `git fetch --depth=1 origin <sha>` for many minutes before
the agent script ever started, and the run had to be cancelled.

The trusted host side only needs the self-contained
`.github/scripts/agent-pr-explore-sandbox.sh`; PR code is checked out
inside Docker and PR context is gathered via the API. Replace the full
checkout with a single-file fetch via `gh api` (raw), pinned to the same
trusted base/dispatch commit, which avoids the git-protocol fetch of the
whole repo entirely.
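
A sketch of how such a single-file fetch could be assembled, assuming the GitHub contents endpoint with the raw media type. The helper name and the repository and commit values in the usage note are placeholders, not the workflow's real ones:

```python
from typing import List

SCRIPT_PATH = ".github/scripts/agent-pr-explore-sandbox.sh"


def gh_raw_file_command(repo: str, path: str, ref: str) -> List[str]:
    """Build a `gh api` argv that fetches one file's raw bytes at a pinned ref."""
    return [
        "gh", "api",
        # Ask the contents API for the raw file instead of a JSON envelope.
        "-H", "Accept: application/vnd.github.raw+json",
        # Keeping ref in the query string leaves the request a GET.
        f"repos/{repo}/contents/{path}?ref={ref}",
    ]
```

Passing the result of `gh_raw_file_command("OWNER/REPO", SCRIPT_PATH, "<trusted-sha>")` to `subprocess.run` with stdout redirected to a file would download only the sandbox script, with no git fetch involved.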

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 04:18:19 +00:00
lefarcen
7b8bf0d9fb
ci: map agent trace upload to existing R2 secrets (#3013)
* ci: map agent trace upload to existing R2 secrets

* ci: make agent report comments macos-compatible

* ci: ensure Playwright browsers for agent traces
2026-05-27 03:01:36 +00:00
lefarcen
a0ea9bdaf3
ci: make agent PR exploration manual only (#2993)
* Make agent PR explore manual dispatch only

* chore: retrigger PR checks

* chore: retrigger CI after Actions recovery
2026-05-26 12:59:58 +00:00
lefarcen
b5bf28060b
Add sandboxed agent PR exploration (#2604)
2026-05-26 07:52:42 +00:00