mirror of https://github.com/nexu-io/open-design.git synced 2026-06-01 03:14:35 +07:00

feat(tools-pr): add maintainer PR-duty workspace (#1259 )

* feat(tools-pr): add maintainer PR-duty workspace

Adds `tools/pr` as the maintainer-only control plane for PR-duty work on
this repo. Thin `gh` wrapper that encodes repo-specific knowledge:
review lanes, forbidden surfaces, lane-specific checklists, validation
command derivation from touched packages.

Subcommands:
- `list` — triage open queue by lane and review-state bucket.
- `view <num>` — agent-friendly review brief for a single PR.
- `classify [num]` — emit script-level tags for one PR or the whole
  open queue; full-queue JSON output lands under `.tmp/tools-pr/classify/`
  with rate-limit telemetry per run.
- `assignment` — assigner-perspective view of PR ownership, idle time,
  and blockers (derived from existing tags; no new judgments).

Tag dictionary (13 tags) covers: bot-only-approval, needs-rebase,
forbidden-surface, unlabeled, duplicate-title, non-ascii-slug,
maintainer-edits-disabled, org-member, unresolved-changes-requested,
stale-approval, and three awaiting-* timing tags. Each rule is
expressible as one factual sentence over `gh` data + repo paths — see
`tools/pr/AGENTS.md` for the full dictionary plus precision rules.

Templates in `tools/pr/templates/*.md` are aesthetic references for
recurring maintainer comments (duplicate-title ask, awaiting-author
nudge, agent-review brief shape). `templates/examples/` holds
frozen-in-time agent-review snapshots for three PR shapes.

Infrastructure:
- `gh()` wraps `execFile` with minimum-touch retry (2 attempts at 1s + 2s
  backoff) on transient 5xx / network errors. Persistent failures still
  surface — retry is anti-jitter, not an exponential-backoff resilience
  layer.
- Heavy chunks (`reviews`, `comments`, `commits`, assignment timelines)
  use cursor-paginated `gh api graphql` via `fetchPaginatedPrList` to
  stay under GitHub's GraphQL server-side timeout. Light chunks stay on
  `gh pr list --json`.
- `fetchOrgMembers` cached per process via `gh api orgs/<owner>/members
  --paginate`.

Wiring:
- Root `package.json` adds `pnpm tools-pr` to the allowed root entry
  points.
- `scripts/postinstall.mjs` builds `tools/pr` alongside other workspace
  packages.
- `scripts/guard.ts` allowlists `tools/pr/bin/tools-pr.mjs` and
  `tools/pr/esbuild.config.mjs`, and adds `pr/` to the `tools/` top-level
  layout allowlist.
- Root `AGENTS.md` and `tools/AGENTS.md` document the new command
  surface, root-command-boundary update, and per-tool ownership.

* docs(agents): brief tools-pr in root AGENTS.md, link to tools/pr/AGENTS.md

Adds a `PR-duty tooling` section to the root AGENTS.md summarising what
`pnpm tools-pr` is, listing the four common subcommands (list / view /
classify / assignment), and pointing readers to `tools/pr/AGENTS.md` for
the full tag dictionary, operational playbook, templates, and design
rules. The section keeps root-level guidance to high-level orientation
while details stay local to the tool's own AGENTS.md.

* fix(tools-pr): drop overly broad touches-root-package.json forbidden hit

`deriveForbidden` was flagging any change to root `package.json` as a
forbidden-surface hit, but AGENTS.md §Root command boundary only forbids
specific *lifecycle* aliases (pnpm dev / test / build / daemon / preview
/ start) — tools-control-plane entrypoints like `pnpm tools-pr` are
explicitly allowed. Distinguishing "forbidden alias" from "allowed
entry" requires reading the diff content, which is `pnpm guard`'s job
rather than a path-derived classify tag.

Dogfooded on this branch's own PR (#1259), which added the `pnpm
tools-pr` script and was incorrectly flagged. Removing the hit aligns
the `forbidden-surface` tag with what tools-pr can mechanically detect
from file paths alone (apps/nextjs/, packages/shared/).

* fix(tools-pr): paginate commits fetch, recognise ready-to-merge, escape title-index separator

Three review follow-ups on #1259, all factual fixes:

- `fetchOpenPrCommits` now uses `fetchPaginatedPrList` instead of a
  one-shot `pullRequests(first: $first)` query. GitHub GraphQL caps
  connection page size at 100, so the previous implementation would
  fail at runtime when callers passed `--limit > 100`. The paginated
  path makes the commits fetch consistent with the other heavy chunks
  (reviews, comments, assignment timelines) and removes the artificial
  ceiling entirely. The `limit` parameter is dropped from
  `fetchOpenPrCommits`; the CLI `--limit` continues to bound the
  `gh pr list --json` chunks.
- `deriveStatus` in `assignment.ts` now reads `facts.reviewDecision`
  and `facts.mergeStateStatus`. When the PR is `APPROVED` with merge
  state `CLEAN` or `UNSTABLE` and carries no blockers, status renders
  as `ready to merge` instead of falling through to `in review`. The
  assignment view loses its main triage signal without this — a clean
  human-approved PR rendered identical to a REVIEW_REQUIRED one.
- `tags.ts:tagDuplicateTitle` and `tags.ts:buildContext` both
  constructed the title-index key with a literal NUL byte between
  author and title, which made the file appear as binary in `git diff`
  / review tooling. Replaced the literal byte with a Unicode escape
  sequence in source; the runtime string value is identical, the
  source stays plain text and round-trips through review tooling
  cleanly.

* fix(tools-pr): raise default --limit to 1000 to cover the live open queue

mrcfps flagged that `tools-pr list` (and `classify --all`, `assignment`)
defaults to `--limit 100`, which silently drops every PR past the first
100 in the open queue. The repo currently sits at 104 open PRs, so the
out-of-the-box run was already omitting four PRs.

Raise the default to 1000 in `list.ts`, `classify.ts`, and `assignment.ts`,
and remove the now-pointless 200 ceiling — `gh pr list --limit N` paginates
internally, so a high cap is cheap. Users can still pass `--limit <small>`
for a truncated preview. CLI help text on the three subcommands updated to
match.

* fix(web): pass designTemplates to ProjectView render helper

#955 made `designTemplates` a required Prop on ProjectView, but the test
helper added in #1244 (`renderProjectView` in
`ProjectView.api-empty-response.test.tsx`) was never updated. The two
PRs landed on main without conflicting, leaving `apps/web` typecheck red
for every PR that rebases past b5eb8c16.

Pass `designTemplates={[] as SkillSummary[]}` alongside the existing
`skills={[] as SkillSummary[]}` so the helper compiles. The component
already treats the array shape (empty included) as a no-op fallback in
the empty-response paths the test exercises.

* fix(tools-pr): correct author signal + merge inline review comments

Two correctness gaps in the awaiting-* signal pipeline surfaced during
review of the new tools-pr commands:

1. `authorSignalAt` iterated every PR commit unconditionally. On
   `maintainerCanModify=true` PRs a maintainer's follow-up push would
   advance the author timestamp, masking a stalled author response.
   Filter commits to those whose `authorLogin` matches `facts.author`,
   mirroring the same filter already applied to comments.

2. `fetchOpenPrComments` (and `fetchView`) only fetched
   `pullRequest.comments` / `gh pr view --json comments`, which is the
   issue-conversation thread. Inline review-thread replies — where
   authors and reviewers actually exchange most fix-up replies — live in
   `reviewThreads.comments` / REST `pulls/{n}/comments`. Missing them let
   `humanReviewerSignalAt` / `authorSignalAt` and the `view` brief point
   at the wrong side after someone replied inline. Extend the list-mode
   GraphQL to also sweep `reviewThreads(last: 20).comments(first: 20)`,
   and add a parallel REST inline-comments fetch in `fetchView` that
   merges into `GhView.comments`.

2026-05-11 19:17:21 +08:00

16 KiB

Raw Blame History

tools/pr

Follow the root AGENTS.md and tools/AGENTS.md first. This tool owns the maintainer PR-duty command surface for nexu-io/open-design.

Owns

Repository-specific triage and review preparation on top of gh.
Lane derivation from touched paths (default / contract / skill / design-system / craft / docs / multi), per docs/code-review-guidelines.md §4.
Forbidden-surface detection on diff paths (e.g. apps/nextjs/, packages/shared/), per docs/code-review-guidelines.md §2.
Per-lane rule citations matching the hard lines in docs/code-review-guidelines.md §4.x and CONTRIBUTING.zh-CN.md, with source noted inline.
Validation command derivation from touched packages, citing AGENTS.md §Validation strategy.
Factual brief assembly: de-noised top files, bot-stripped reviews/comments, CI rollup, body preview.
Script-level tag emission via classify, with each tag carrying a sharp, mechanical rule and a source token (see §Tag dictionary).

Does not own

gh configuration, GitHub credentials, or authentication.
PR side effects (approve / request changes / merge / close). Side effects stay in gh invocations the maintainer runs explicitly.
Branch checkout, local rebase, or push operations.
Sidecar protocol, runtime topology, packaged release, or app business logic.

Rules

Output is strictly factual. Every line of human or JSON output must be either (a) data observed from gh / the diff / repository paths, or (b) a direct citation of a rule from this repo's own docs (AGENTS.md, docs/code-review-guidelines.md §X, CONTRIBUTING.zh-CN.md, etc.) with the source noted inline. Tools-pr does not emit risk verdicts (LOW/MEDIUM/HIGH), merge recommendations, or directive language (should, must, do not, recommended, encouraged, suggested). Judgment belongs to the reviewer who consumes the brief, not to the tool.
Stay a thin wrapper. Each subcommand corresponds to a real PR-duty action; do not introduce abstractions that have no caller.
Keep dependencies minimal: cac for subcommand parsing, no GitHub SDKs. Use gh via node:child_process.
gh pr list with 12+ JSON fields returns HTTP 502 across this repo's open queue. Chunk --json selections into multiple smaller calls and join by PR number (src/gh.ts:fetchOpenPrs).
Heavy chunks (reviews, comments) use cursor-paginated gh api graphql via fetchPaginatedPrList with PR_LIST_PAGE_SIZE = 30. Light chunks (meta, stats, files) stay on gh pr list --json — they're already cheap. The split keeps per-page node count low and lets gh retry pages individually when the upstream gateway flakes.
Transient gateway errors (HTTP 5xx, EAI_AGAIN, etc.) trigger a minimum-touch retry inside gh() (two attempts at 1s + 2s backoff). Anything else (4xx, auth failure, schema rejection, JSON parse) surfaces immediately — retry must not mask real problems.
Lane and forbidden-surface rules track docs/code-review-guidelines.md. When that document changes, update src/lane.ts in the same PR.
Per-lane rules must point at the hard lines that already exist in the review/contribution docs, with the source cited inline — do not invent new requirements here.
Output formats are stable contracts: human report is for terminal eyes, --json is for downstream agents and future subcommands. Adding/removing JSON fields counts as a breaking change for the JSON consumer surface.

Tag dictionary (v1)

Each tag has a single mechanical rule. Adding new tags requires the rule to be expressible as one factual sentence, derivable purely from gh data + file paths, and unlikely to false-positive in legitimate use. Patterns that fail this test (e.g. missing-test-changes, contract-no-consumer-update, bulk-author) are intentionally not implemented — see feedback_tools_pr_precise_boundaries in maintainer memory for the exclusion list.

Tag	Rule	Data source
`bot-only-approval`	`reviewDecision === "APPROVED"` and every review with `state === "APPROVED"` matches `isBotAuthored()`	gh.reviewDecision + latestReviews
`needs-rebase`	`mergeStateStatus ∈ {DIRTY, BEHIND}`	gh.mergeStateStatus
`forbidden-surface`	A touched path matches the regex set in `lane.ts:deriveForbidden` (AGENTS.md §Forbidden surfaces)	files + lane.deriveForbidden
`unlabeled`	The PR is missing at least one of the `size/`, `risk/`, `type/` label prefixes	gh.labels
`duplicate-title`	Another open PR by the same author has a byte-for-byte identical `title`	cross-PR titleIndexByAuthor
`non-ascii-slug`	A design-system root touched by the PR has a slug that fails `/^[a-z0-9-]+$/`	files + lane.DESIGN_DIR
`maintainer-edits-disabled`	`maintainerCanModify === false`	gh.maintainerCanModify
`org-member`	PR author's GitHub login appears in `gh api orgs/<repo-owner>/members`	gh REST orgs members list
`unresolved-changes-requested`	Any reviewer's latest review has `state === "CHANGES_REQUESTED"`	gh.latestReviews[].state
`stale-approval`	Any `APPROVED` review's `commit.oid` differs from current `headRefOid`	gh.latestReviews[].commit.oid + gh.headRefOid
`awaiting-author-response-24h`	Latest human-reviewer signal time is newer than the latest author signal time and is ≥ 24h ago	latestReviews + comments + commits
`awaiting-reviewer-response-24h`	Latest author signal time is newer than the latest human-reviewer signal time, ≥ 24h ago, and at least one human-reviewer signal exists	latestReviews + comments + commits
`awaiting-first-review-24h`	No human review or non-author non-bot comment exists, and `createdAt` is ≥ 24h ago	latestReviews + comments + createdAt

Signal-time definitions (used by the three awaiting-* tags):

author signal = max(commits[].committedDate) ∪ max(comments[?author.login==prAuthor].createdAt)
human-reviewer signal = max(latestReviews[?author!=prAuthor && !isBotAuthored].submittedAt) ∪ max(comments[?author!=prAuthor && !isBotAuthored].createdAt)

The three awaiting-* tags are mutually exclusive by construction. Each of them also sets tag.awaitingHours — the integer hour count between the awaiting-window start (latest reviewer signal / latest author signal / createdAt respectively) and the classify-run moment. Downstream consumers use it to sort PRs within an awaiting bucket by actual stuck duration, or floor-divide by 24 for days.

Rate-limit telemetry

classify --all records a rate object in the report and in the summary line so detector additions that quietly inflate API cost are visible immediately:

rate.before / rate.after: GraphQL rateLimit { remaining, limit, resetAt } snapshots taken before and after the bulk fetch.
rate.cost: before.remaining − after.remaining when both snapshots fall in the same reset window; null when the hourly window rolls over between snapshots.

Each snapshot itself costs 1 point; the two extra snapshot calls per --all run are negligible against the 5000-point hourly budget.

Templates

tools/pr/templates/*.md holds aesthetic references for the recurring comment kinds surfaced by classify tags. Each file describes the beats the comment should hit and shows one exemplar phrasing in the maintainer's voice.

Templates are not fill-in forms. Do not sed-substitute placeholders and post the rendered text — repeated identical comments break the human-to-human tone we want to keep with contributors. Instead, for each post: read the template to absorb the tone, weave the PR-specific facts (author, awaiting duration, branch names, diff size) into a fresh comment that hits the same beats with locally adapted wording.

Author-addressed comments adapt to the author's language. For nudge / duplicate-ask / close-with-reason comments — i.e. anything @-mentioning a specific contributor in a private-feeling exchange — detect the author's preferred language before writing the comment:

gh pr view <num> --json body,comments,author --jq '
  .author.login as $a |
  ([.body, (.comments[] | select(.author.login == $a) | .body)] | join("\n"))
'

If the resulting text contains CJK characters (grep -P "[\\p{Han}]"), write the comment in Chinese; otherwise English. Broadcasting comments (PR descriptions, commit messages, review summaries visible to all reviewers) stay in English regardless. See maintainer memory feedback_public_artifacts_english for the full scope rule.

The frontmatter of each template lists the beats and the placeholder slots. Templates are intentionally text files maintained alongside the tool source, not generated by tools-pr — the tool itself stays side-effect-free.

Current templates:

Template	Triggered by	Posted on
`duplicate-title-ask.md`	`duplicate-title` tag (same author, byte-for-byte identical title)	The older / more-iterated PR of the pair
`awaiting-author-nudge.md`	`awaiting-author-response-24h` tag, when `tag.awaitingHours ≥ 96` (≥ 4 days)	The PR; addressed to the author
`agent-review.md`	Bucket-3 high-value / high-risk technical PR review prep	Not posted; internal analysis artifact at `.tmp/tools-pr/reviews/<num>.md`

Operational playbook

Per classify tag bucket, the maintainer workflow. Each row is the minimum action — escalation (close, force-merge, etc.) is the maintainer's call.

Direct merge (APPROVED + CLEAN, surgical)

Sanity-check the merge state:

gh pr view <num> --json state,reviewDecision,mergeStateStatus,statusCheckRollup \
  --jq '{state, reviewDecision, mergeStateStatus,
         checks: [.statusCheckRollup[] | {conclusion, name: (.workflowName // .name)}]}'

Expect state=OPEN, reviewDecision=APPROVED, mergeStateStatus=CLEAN, every check SUCCESS.

If tools-pr classify <num> includes bot-only-approval, verify the change is surgical (size/XS, single file, < ~30 lines, no boundary or contract surface) before proceeding. Judgment lives in maintainer memory (feedback_bot_only_approval), not in the tool.
Squash-merge per repo convention: gh pr merge <num> --squash.
Confirm: gh pr view <num> --json state,mergedAt,mergeCommit --jq '{state, mergedAt, sha: .mergeCommit.oid[0:10]}'.

`duplicate-title`

Inspect both PRs to pick the older / more-iterated one (the author may want to preserve its history). Useful comparison:
```
gh pr view <NUM> --json number,headRefName,commits,additions,deletions,createdAt,updatedAt
```
Read templates/duplicate-title-ask.md for tone and beat structure, then write a fresh comment that hits the same beats with the actual PR facts woven in (author login, both branch names, both commit counts, both diff sizes). Post to the older PR:
```
# Write the composed comment to a scratch file, then:
gh pr comment <older-num> -F /tmp/dup-ask-<older-num>.md
```
Wait for author response. If no response after 7d, close the older PR with gh pr close <older-num> --comment "Superseded by #<newer-num>.".

`awaiting-author-response-24h` (long tail, ≥ 4 days)

Filter the classify report for the PRs that crossed the 96h threshold, exclude org-member PRs, and rank by awaitingHours:

jq '[.byTag["awaiting-author-response-24h"][] as $n
     | .byNumber[($n|tostring)] as $tags
     | select($tags | map(.name) | contains(["org-member"]) | not)
     | $tags[]
     | select(.name=="awaiting-author-response-24h" and .awaitingHours >= 96)
     | {n: $n, h: .awaitingHours}]
    | sort_by(-.h)' .tmp/tools-pr/classify/<latest>.json

For each remaining PR, read templates/awaiting-author-nudge.md for tone, then compose a fresh comment that hits the same beats with the actual author login and human-formatted awaiting duration. Vary the wording slightly between PRs nudged in the same session (a contributor seeing identical pings across their notifications breaks the friendly-human feel).
Post: gh pr comment <num> -F /tmp/nudge-<num>.md.
Re-check the classify report in a follow-up run; the awaiting tag should clear once the author responds. If no response by 14d, escalate (a more direct stale-warning or close-after-warning).

`org-member`

org-member is informational and pairs with the other operational tags rather than triggering its own GitHub action. When a PR carries org-member alongside awaiting-* / duplicate-title / maintainer-edits-disabled / etc., the run-of-the-mill GitHub-comment workflow does not apply — those communications are routed through the team's internal IM instead. See maintainer memory feedback_org_members_im_channel for the channel split (operational nudges → IM; substantive review feedback and decisions → GitHub).

Every operational playbook step that posts a public comment must filter org-member out first; the awaiting-author-response-24h flow above shows the canonical filter.

`tools-pr assignment` — assigner-perspective ownership view

Read-only aggregation that pivots the open PR queue by assignee. For each currently-assigned PR per assignee, surfaces:

assigned-since: now − assignedAt, where assignedAt is the latest ASSIGNED_EVENT for that assignee that has not been superseded by an UNASSIGNED_EVENT (fetched via gh api graphql timeline — only path that exposes it).
assigned-by: the actor on that event. Marked (self-assigned) when actor == assignee.
idle-for: now − max(assignedAt, last assignee activity), where activity = commit / comment / review by that assignee.
state badges: reviewDecision, mergeStateStatus, draft (only when non-trivial).
status / blockers: composed from existing classify tags (needs-rebase, unresolved-changes-requested, stale-approval, awaiting-*, bot-only-approval). No new judgments.

Flow:

gh issue/pr edit <num> --add-assignee <login> is still the assignment action (tools-pr is read-only on this surface).
pnpm tools-pr assignment shows the resulting state grouped by assignee, sorted by idle-hours desc within each bucket.
Use --user me (or --user <login>) for one bucket, --unassigned to expand the un-owned tail.
--json for cron / digest consumption.

Timeline fetch uses the cursor-paginated graphql path (fetchPaginatedPrList) — same retry + page-size primitives as reviews/comments.

Agent review (bucket-3 high-value / high-risk PRs)

For PRs that warrant a deep technical pre-review (contract lane PRs, large refactors, security-sensitive fixes, scope-mixed PRs flagged via classify, etc.), an agent produces an analysis brief under .tmp/tools-pr/reviews/<num>.md. The brief is an internal artifact for the maintainer's consumption — not a GitHub comment. If maintainer decides to post review feedback, a separate downstream step adapts findings to public-facing review text (typically rephrased to address the author directly, with the channel respecting org-member routing).