Commit graph

20 commits

Author SHA1 Message Date
lefarcen
551f967d2c
ci(agent-pr-explore): rewrite prompt — non-lazy disposition + mandatory probe list (#3156)
* ci(agent-pr-explore): rewrite prompt — non-lazy disposition + mandatory probe list

Background: on PR #2355 (large AMR runtime add) the agent stopped at smoke level
because the positive path was gated on a missing `vela` binary. The old prompt
explicitly instructed "if setup prerequisites block, return inconclusive
immediately" + "do not spend more than two attempts on test data" + "do not run
arbitrary host shell commands", which made the agent give up rather than:
- use the PR's own `tests/fixtures/fake-vela.mjs`
- set the `VELA_BIN` env the runtime reads
- probe `/api/integrations/vela/*` directly via fetch

This rewrite shifts disposition + adds 4 structured unblock steps:

- **Mindset**: each /explore is a precious, expensive run; be thorough, not lazy.
- **STEP 0** Read PR body for `## Test Plan` section — declared cases = MUST-COVER.
- **STEP 1** Extract diff-driven probe list (new routes / components / env vars /
  fixtures / CLI flags). Anything skipped requires explicit written reason.
- **STEP 2** Before giving up, try (a) PR-provided fixtures, (b) build minimal stub
  inside container, (c) probe APIs directly via page.evaluate fetch,
  (d) search repo / related PRs / docs for unknown terms.
- **STEP 3** 4-7 cases for substantive PRs (was hard cap 3).
- **STEP 4** Login / multi-tab / OAuth — use Playwright multi-page handling;
  read creds from env, never echo.
- **SECURITY** strengthened: env vars matching common secret patterns are
  confidential; never echo / log / write to file / page.evaluate / report.
- **Report** new required §Mitigations Attempted for Inconclusive verdicts —
  must list what was tried + why each didn't unblock.

Kept unchanged: 3-min keepalive constraint, untrusted-data rule, no-host-shell
rule, report markdown structure (verdict emoji such as ⚠️ + case emoji).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(agent-pr-explore): truncation-aware + soft-request protocol + capability fix

Addresses review feedback from @mrcfps:

(1) STEP 1 — diff probe list was instructed to enumerate "every new
    route/component/env/fixture/flag" without acknowledging that the harness
    already truncates the diff upstream (file_patch_max_chars + context_max_bytes).
    On exactly the large PRs this prompt targets, the agent only sees a slice.

    Fix: prompt now explicitly tells the agent the context MAY be truncated and
    to (a) note the truncation in §🧭 Scope and (b) emit §📎 Needs to ask the
    maintainer to attach the missing source files into the private workspace
    for the next run.

(2) STEP 2 — old text told the agent to "create stubs inside the sandbox
    container", "rewire env / PATH", and "run gh / grep searches". The harness
    does not expose docker exec or arbitrary shell to the agent (capabilities
    are fs:write on host + Playwright on host driving the dockerized app via
    HTTP). The instructions promised things the agent literally cannot do.

    Fix: STEP 2 now spells out the actual capabilities (host fs:write, Playwright
    page.evaluate / page.request, host-side $WORKSPACE_DIR if maintainer pre-
    attached one). The "unblock by stub" path is rewritten honestly: build a
    host-side stub if useful, but acknowledge container env is fixed at
    docker run and signal what's needed via §🔑 Needs / §📎 Needs for the next
    iteration. The "search repo for unknown term" step (which required gh/grep)
    is dropped in favor of using $WORKSPACE_DIR materials.

(3) Soft-request protocol (new):

    The agent is READ-ONLY for secrets and workspace — it cannot self-attach.
    But it can SIGNAL what was needed via two new optional report sections:

    - §🔑 Needs — secret request ("VELA_RUNTIME_KEY: needed to verify ...")
    - §📎 Needs — workspace file request ("amr-auth-spec.md: clarifies ...")

    The dashboard (synclo platform; see nexu-io/synclo#79 RFC §6.8) will parse
    these structurally and surface as one-click attach hints to the maintainer
    on the run detail screen. Pure passive signal; no auto-action; zero prompt-
    injection risk (no code path takes the values).

    Hard rules:
    - No pasting of existing secret values (security)
    - Each item MUST tie back to a specific blocked case in §🧪 Cases — not
      speculative
    - Workspace privacy: agent may reference workspace files BY PURPOSE
      ("verified positive-1 from test-plan.md") but NEVER paste their content
      into the report
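
A minimal sketch of how a dashboard might parse the two request sections as passive data. The heading layout (`## 🔑 Needs`, `## 📎 Needs`), the `- name: reason` bullet shape, and the function name `parse_needs` are assumptions for illustration; the commit does not specify the parser.

```python
import re

# Assumed mapping from the section emoji to a request kind.
SECTION_KINDS = {"🔑": "secret", "📎": "workspace"}

def parse_needs(report):
    """Collect "- name: reason" bullets found under the two Needs headings."""
    requests = []
    kind = None
    for line in report.splitlines():
        heading = re.match(r"^#+\s*(\S+)\s+Needs\b", line)
        if heading:
            kind = SECTION_KINDS.get(heading.group(1))
            continue
        if line.startswith("#"):
            kind = None  # any other heading closes the current section
            continue
        item = re.match(r"^\s*-\s*([^:]+):\s*(.+)$", line)
        if kind and item:
            requests.append({
                "kind": kind,
                "name": item.group(1).strip(),
                "reason": item.group(2).strip(),
            })
    return requests
```

The parsed values are only ever returned as strings for display, which mirrors the "no code path takes the values" property described above.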

This commit is non-trivial (~65 lines net) but the changes are tightly scoped:
honesty about capabilities + a new signaling channel that replaces the
impossible direct-action promises.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(agent-pr-explore): remove gh-search + container-exec language from prompt

MINDSET bullet: replace 'gh search issues/prs/code' with in-scope
materials (PR body, diff context, workspace files) plus
page.request/page.evaluate probes, matching actual harness capabilities.

SECURITY bullet: replace the contradictory 'You may run commands INSIDE
the sandbox container' with a clear statement that the agent has no shell
or container exec access, only the Playwright browser.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* ci(agent-pr-explore): fix Needs section misuse and unawaited fetch body

Move fixture env-var wiring request from §🔑 Needs to §📎 Needs and
remove the concrete host path from the example; §🔑 Needs is
secret-name-only and must not carry filesystem paths. Await r.text() in
the page.evaluate fetch example so the body field resolves to a string
instead of an unresolved Promise.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* ci(agent-pr-explore): broaden 📎 Needs to cover env/config wiring alongside file attachments

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* ci(agent-pr-explore): fix step-2b Needs routing and split AMR_USER/AMR_PASS example

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* ci(agent-pr-explore): fix mindset bullet — env var cannot be set mid-run

Replace "set the env var" in the MINDSET mitigations list with
"identify the env var and request the needed startup wiring in §📎 Needs"
to match the actual capability boundary: container env is fixed at
docker run time and cannot be changed by the agent mid-run.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 09:51:19 +00:00
lefarcen
ae7a417208
ci: add idempotent provision script for the agent-pr-explore runner (#3122)
* ci: add idempotent provision script for the agent-pr-explore runner

The self-hosted runner's setup was hand-assembled and easy to lose on a
rebuild — most dangerously the codex-acp pin: expect-cli bundles
codex-acp 0.10, which is incompatible with ChatGPT-account auth (every
model rejected); we run 0.15, but any expect-cli reinstall silently
reverts it and breaks the agent.

Add a self-contained, idempotent provision script that brings the
runner's config layer back to a working state and is safe to re-run:
codex model pin (gpt-5.4), the codex-acp 0.15 pin (npm pack + extract +
chmod), deploy-key generation, base-repo git mirror seed/refresh,
pnpm-store/reports dirs, the weekly image-refresh helper + cron, and the
readiness self-check helper. The header documents the manual/secret
steps it intentionally does not automate (base toolchain + colima, the
interactive `codex login`, registering the deploy key on the repo, and
registering the Actions runner service).

Verified idempotent against the live runner (all checks pass, no config
disturbed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: provision — update codex model key in place, don't truncate config.toml

Review: step 2 overwrote the whole ~/.codex/config.toml with just the
model line whenever the exact pin wasn't already present, dropping any
other Codex settings on a re-run — destructive, contradicting the
idempotent goal. Now: replace an existing `model =` line in place (sed),
append only when the key is absent, and leave the rest of config.toml
untouched. Verified preservation locally.
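
The in-place update can be sketched as follows. The actual script uses sed; this Python version only illustrates the replace-or-append logic, and `pin_model` is a hypothetical name.

```python
import re

def pin_model(config_text, model):
    """Replace an existing top-level `model =` line, or append one if absent."""
    line = f'model = "{model}"'
    pattern = re.compile(r"^model\s*=.*$", re.MULTILINE)
    if pattern.search(config_text):
        # Only the matching line changes; every other setting is preserved.
        return pattern.sub(lambda _match: line, config_text, count=1)
    if config_text and not config_text.endswith("\n"):
        config_text += "\n"
    return config_text + line + "\n"
```

Running it twice yields the same text as running it once, which is the idempotence the review asked for. One caveat of this sketch: in TOML, a key appended after a `[table]` header belongs to that table, so the append path assumes a config without trailing tables.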

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: provision — create ~/.ssh before ssh-keygen on fresh host

Review: on the fresh-rebuild path this script targets, ~/.ssh usually
does not exist, so `ssh-keygen -f ~/.ssh/od_agent_deploy` fails with
"No such file or directory" and the deploy key (and downstream mirror
bootstrap) never gets created. mkdir -p the key's parent dir (chmod 700)
before keygen, and only print the pubkey when it actually exists.
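
A rough Python equivalent of the directory guard, assuming a POSIX host. The helper names are hypothetical, and the `ssh-keygen` call itself is left out.

```python
import os
from pathlib import Path

def ensure_key_dir(key_path):
    """Create the key's parent directory (like mkdir -p) and restrict it to the owner."""
    parent = Path(key_path).parent
    parent.mkdir(parents=True, exist_ok=True)
    os.chmod(parent, 0o700)  # mkdir's mode is filtered by umask, so chmod explicitly
    return parent

def pubkey_if_present(key_path):
    """Return the public key text only when the .pub file actually exists."""
    key_path = Path(key_path)
    pub = key_path.with_name(key_path.name + ".pub")
    return pub.read_text() if pub.exists() else None
```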

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 14:51:59 +00:00
lefarcen
54f225d6b3
ci: retry PR-context gh calls so a transient API blip doesn't abort the run (#3128)
* ci: retry PR-context gh calls so a transient API blip doesn't abort the run

The early PR-context gathering calls `gh pr diff`, `gh pr view`, and
`gh api .../files`. gh hits api.github.com under the hood, and a single
transient timeout/5xx there aborts the whole run before any exploration
(seen on #3083: "could not find pull request diff: Get \"https://api.
github.com/...\": net/http timeout"). These were the only network calls
in the run without a retry (source fetch + npm already retry).

Add a small gh_retry helper (4 attempts, linear backoff) and wrap the
three read-only context calls. gh writes nothing to stdout on a failed
API call, so retrying is safe even for the calls piped into the context
file; the retry warning goes to stderr.
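
A minimal sketch of the retry shape described here (4 attempts, linear backoff, warning on stderr). The real helper is a shell function; the 2-second base delay and the name `run_with_retry` are assumptions.

```python
import subprocess
import sys
import time

def run_with_retry(argv, attempts=4, base_delay=2.0,
                   run=subprocess.run, sleep=time.sleep):
    """Run a read-only command, retrying on non-zero exit with linear backoff."""
    for attempt in range(1, attempts + 1):
        result = run(argv, capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout
        if attempt < attempts:
            # The warning goes to stderr so stdout stays clean for the caller.
            print(f"attempt {attempt}/{attempts} failed; retrying", file=sys.stderr)
            sleep(base_delay * attempt)  # linear backoff: 2s, 4s, 6s
    raise RuntimeError(f"command failed after {attempts} attempts: {argv}")
```

For example, `run_with_retry(["gh", "pr", "view", "123"])` would retry a transient failure up to three more times before giving up.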

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: address review — buffer gh retries to files (no paginated duplication)

Review (Siri-Ray): wrapping `gh api --paginate` retries inline while the
context block is redirected to the file means a mid-pagination failure
leaves partial pages in the context, and the retry appends them again —
duplicating the patches section and burning the context budget the agent
reads.

Replace the in-pipe gh_retry with gh_retry_file: each call buffers to its
own file per attempt (`>` truncates on open, so a failed/partial attempt
is discarded before the next), and the context block just cats the
finished files. Fetch PR body + patches to files up front, then assemble.
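
The buffering idea can be illustrated like this: every attempt reopens the output file in truncating mode, so pages written by a failed attempt never survive into the next one. `fetch_to_file` and the `fetch_pages` callable are hypothetical stand-ins for the shell helper and the paginated `gh api` call.

```python
def fetch_to_file(fetch_pages, out_path, attempts=4):
    """Write all pages to out_path, discarding partial output between attempts."""
    last_error = None
    for _ in range(attempts):
        try:
            # Mode "w" truncates on open, like the shell's `>` redirection.
            with open(out_path, "w", encoding="utf-8") as out:
                for page in fetch_pages():
                    out.write(page)
            return
        except ConnectionError as err:
            last_error = err
    raise RuntimeError("all attempts failed") from last_error
```

Because the context block only reads the file after a successful return, a mid-pagination failure cannot duplicate earlier pages.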

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 14:42:08 +00:00
lefarcen
a6a56099ca
ci: show per-case pass/fail status emoji in agent report (#3118)
Reviewers asked for at-a-glance outcomes. Instruct the agent to begin each
"Cases Tested" bullet with a status emoji (pass / fail / ⚠️ warning /
inconclusive) and a bold case name, so the report shows which checks
passed or failed without reading each line.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 13:34:29 +00:00
lefarcen
bf61a39cb5
ci: clean agent report (write-to-file) + slim artifacts/uploads (#3116)
* ci: clean agent report (write-to-file) + slim artifacts/uploads

Four related cleanups to the agent PR exploration output:

1. Clean report. The PR comment / report.md was assembled by dumping the
   entire verbose expect.log (ACP init logs, "Git failed" warnings, the
   ~24KB echoed prompt, ANSI codes, progress checklist) under the trace
   header -- ~28KB of noise. Instead, instruct the agent to write its
   final Markdown report to a file via its file-write tool, and have the
   runner read that file directly. Verified: Codex writes a clean report
   to the given absolute path. Falls back to an inconclusive note if the
   agent did not finish.

2. Drop duplicate trace/video. The script copied
   playwright-smoke-trace.zip -> playwright-trace.zip (a ~28MB legacy
   duplicate) and the webm likewise, and uploaded both to R2. Keep only
   the canonical smoke-named artifacts.

3. Slim the GitHub artifact. The trace zips and videos are already on R2;
   exclude *.zip / *.webm from the uploaded artifact so it drops from
   ~56MB to <1MB (report + logs only).

4. Persist report on the runner. Copy the report / agent-report /
   expect.log / trace URL to a stable host dir
   ($HOME/.cache/agent-pr-explore/reports/pr-<n>) so dry runs
   (skip_comment) can be inspected without downloading the artifact.
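
As a sketch of item 1, the runner-side read with its fallback might look like the following. The fallback wording and the name `load_agent_report` are assumptions; the actual runner is a shell script.

```python
from pathlib import Path

# Hypothetical fallback text used when the agent never wrote its report.
FALLBACK = "Inconclusive: the agent did not write a final report."

def load_agent_report(report_path):
    """Return the agent-written report, or an inconclusive note if it is missing or empty."""
    path = Path(report_path)
    if path.is_file():
        text = path.read_text(encoding="utf-8").strip()
        if text:
            return text
    return FALLBACK
```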

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: address review — keep advisory reports + recursive artifact excludes

Review findings on the report/artifact cleanup:

1. Regression fix: the non-app-surface and deterministic-verifier branches
   write their pre-baked advisory report (Inconclusive / Pass / Fail) and
   never run the agent, so they don't produce agent-report.md. After
   switching write_agent_report_artifact to read only agent-report.md they
   fell through to the "agent did not write a final report" fallback,
   dropping the real advisory (and mis-reporting on .github-only PRs like
   this one). Fix: those branches now write their advisory directly to
   $agent_report_file — single source of truth for the report body.

2. Recursive artifact excludes: the source Playwright recording lives at
   artifacts/playwright-video/<uuid>.webm; non-recursive !*.webm / !*.zip
   didn't match the subdirectory. Use **/*.zip and **/*.webm so the slim
   actually holds.

3. Drop the now-dangling summary.legacyTrace field (the legacy trace copy
   is no longer produced), matching the legacyVideo removal.
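
The difference in item 2 can be reproduced with Python's `pathlib` globbing, used here only as an analogy for the artifact-upload patterns. The file names are made up for the demonstration.

```python
import tempfile
from pathlib import Path

def matches(root, pattern):
    """List paths under root matching pattern, relative to root."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "playwright-video").mkdir()
        (root / "playwright-video" / "abc.webm").write_bytes(b"")
        (root / "trace.zip").write_bytes(b"")
        return {
            "flat_webm": matches(root, "*.webm"),          # misses the subdirectory
            "recursive_webm": matches(root, "**/*.webm"),  # finds it
            "recursive_zip": matches(root, "**/*.zip"),    # still matches top level
        }
```

`demo()` returns an empty list for the flat pattern and `playwright-video/abc.webm` for the recursive one, which is why the non-recursive excludes let the recording through.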

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 12:30:05 +00:00
lefarcen
995601de9c
ci: make agent exploration finalize promptly (avoid inactivity abort) (#3100)
Real Codex runs (#3060) explored correctly — verifying 3-4 UI cases with
DOM evidence — but Codex over-planned (6 steps), executed the high-value
ones, then went silent chasing a remaining planned step and tripped
expect-cli's ~180s no-output watchdog, aborting the turn before it emitted
a final report. The run then fell back to an advisory artifact, so the
real findings never reached report.md.

Tighten the prompt so Codex finishes and submits before going idle:
- cap at 3 cases (was 6) and target 2-3, quality over breadth;
- add a CRITICAL instruction stating the runner aborts with no report
  after ~3 min of no output, so Codex must stop after 2-3 cases and emit
  the complete report in one final turn rather than leaving planned steps
  pending or retrying silently.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 17:15:06 +08:00
lefarcen
fed464509b
ci: drive agent PR exploration with the Codex ACP backend (#3086)
expect-cli defaults to the Claude Code ACP provider, which is not
installed on the self-hosted runner, so the exploration step errored
(AcpProviderNotInstalledError) and fell back to a reachability-only smoke
trace instead of real UI exploration.

Pass `-a codex` to expect-cli so it drives the Codex agent (installed on
the runner, authenticated via CODEX_HOME). Configurable via
OD_EXPECT_AGENT (set to empty to use expect-cli's default). When the
agent is unavailable the existing smoke-trace fallback still applies, so
this is safe even before Codex is authenticated.
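
A small sketch of the flag selection, assuming the script distinguishes an unset `OD_EXPECT_AGENT` (use codex) from an empty one (omit the flag). `agent_args` is a hypothetical helper, and the rest of the expect-cli command line is not shown.

```python
import os

def agent_args(env=None):
    """Build the agent-selection arguments for expect-cli."""
    env = os.environ if env is None else env
    agent = env.get("OD_EXPECT_AGENT", "codex")  # unset: default to codex
    # Empty string: pass nothing, so expect-cli falls back to its own default.
    return ["-a", agent] if agent else []
```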

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 15:05:03 +08:00
lefarcen
114be63a4e
ci: route agent sandbox installs through the China npm mirror (#3084)
After fixing source acquisition (#3078), the #3060 validation run reached
the container and got through most of `pnpm install`, then failed building
the better-sqlite3 native module: prebuild-install could not reach github
releases and the node-gyp fallback could not fetch node headers from
nodejs.org (ECONNRESET). The electron postinstall hits the same blocked
hosts, and package tarballs from npmjs were throttled to ~20 KB/s.

The runner's network to npmjs / nodejs.org / github releases is throttled
or reset by GFW; the China npm mirror (npmmirror.com) is fast and complete
(verified from the runner: registry ~2.4 MB/s, node headers ~3.6 MB/s,
better-sqlite3 prebuilt present). Point the in-container install at it via
registry + disturl (node-gyp headers) + electron / electron-builder /
better-sqlite3 binary mirrors + Playwright download host.

Package integrity is still verified against the lockfile, so the mirror
only changes transport. Once a native module builds, pnpm's side-effects
cache in the persistent store keeps it warm for later runs.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 14:30:59 +08:00
lefarcen
12141648e4
ci: fetch agent sandbox PR source on the host over SSH via a local mirror (#3078)
The sandbox checked out PR code with `git fetch https://github.com/...`
*inside* the container. The self-hosted runner's bandwidth to github.com
is throttled across every transport (HTTPS/SSH/codeload/API, all
~30-90 KB/s) and the HTTPS handshake is frequently RST'd, so a
from-scratch fetch of this ~200MB repo is impractical and unreliable per
run (run 26491460889 failed here with repeated GnuTLS resets).

Move source acquisition to the trusted host and make it incremental:

- Keep a persistent bare mirror of the base repo
  ($HOME/.cache/agent-pr-explore/open-design.git, overridable via
  OD_SANDBOX_REPO_MIRROR). Each run fetches only the PR's delta via
  `refs/pull/<n>/head` over SSH -- the one transport GFW doesn't reset --
  using a read-only deploy key (OD_SANDBOX_GIT_SSH_KEY).
- Take the head from the BASE repo's pull ref so fork PRs work without
  depending on the head fork, and verify it equals the resolved HEAD_SHA.
- Check the PR head into a per-run worktree and mount it read-only into
  the container; the container copies it into a writable workdir and no
  longer needs (or has) any github access.

The deploy key stays on the trusted host and is never exposed to the
untrusted PR code. The mirror must be seeded once on the runner (the
error message prints the exact clone command if it is missing).
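
A minimal sketch of the fetch-and-verify step, assuming the mirror's remote is named `origin`. The function names are hypothetical, and the deploy-key wiring (for example via `GIT_SSH_COMMAND`) is omitted.

```python
import subprocess

def pull_ref(pr_number):
    """Pull ref that the base repo exposes for a PR head."""
    return f"refs/pull/{pr_number}/head"

def fetch_pr_head(mirror_dir, pr_number, expected_sha, run=subprocess.run):
    """Fetch only the PR's pull ref into the mirror and check it against HEAD_SHA."""
    ref = pull_ref(pr_number)
    run(["git", "-C", mirror_dir, "fetch", "origin", f"+{ref}:{ref}"], check=True)
    out = run(["git", "-C", mirror_dir, "rev-parse", ref],
              check=True, capture_output=True, text=True)
    actual = out.stdout.strip()
    if actual != expected_sha:
        # Refuse to run code that is not the head the workflow resolved.
        raise ValueError(f"{ref} is {actual}, expected {expected_sha}")
    return actual
```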

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 05:36:13 +00:00
lefarcen
2ed93e9c5d
ci: reuse cached docker image and persist pnpm store for agent sandbox (#3074)
* ci: skip docker pull when agent sandbox image is already cached

The agent PR exploration script ran an unconditional `docker pull
"$image"` before `docker run`. Under `set -e`, a transient registry
timeout (the self-hosted runner's network to docker.io is unreliable)
aborts the whole run even when the base image (node:24-bookworm) is
already cached locally — which is what happened on run 26490782540.

Skip the pull entirely when the image is already present, and only pull
when it is missing. This avoids both the failure and the wasted pull
timeout on every run, and keeps a run's base image stable. Refreshing
the cached image is a separate, explicit operation on the runner.
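
The pull-only-when-missing check could look like this in Python. The script itself is shell; `docker image inspect` exits non-zero when the image is absent locally, and the helper names are hypothetical.

```python
import subprocess

def image_is_cached(image, run=subprocess.run):
    """True when the image already exists in the local Docker cache."""
    return run(["docker", "image", "inspect", image],
               capture_output=True).returncode == 0

def ensure_image(image, run=subprocess.run):
    """Pull the image only if it is missing; return True when a pull happened."""
    if image_is_cached(image, run):
        return False
    run(["docker", "pull", image], check=True)
    return True
```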

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: persist agent sandbox pnpm store across runs

The pnpm store was placed under $RUNNER_TEMP, which the Actions runner
wipes per job, so every agent exploration re-downloaded all dependencies
from the npm registry — slow, and as fragile as the runner's docker.io
access (the same network class that already broke the docker pull).

Move the store to a persistent host path ($HOME/.cache/agent-pr-explore/
pnpm-store, overridable via OD_SANDBOX_PNPM_STORE) so a warm,
content-addressed store is reused across runs. `rm -rf "$root"` no longer
touches it since it lives outside the per-run root.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 12:49:26 +08:00
lefarcen
7b8bf0d9fb
ci: map agent trace upload to existing R2 secrets (#3013)
* ci: map agent trace upload to existing R2 secrets

* ci: make agent report comments macos-compatible

* ci: ensure Playwright browsers for agent traces
2026-05-27 03:01:36 +00:00
lefarcen
b5bf28060b
Add sandboxed agent PR exploration (#2604)
2026-05-26 07:52:42 +00:00
PerishFire
bfcafc81fd
feat(pack): add Windows portable zip target alongside NSIS installer (#2937)
Adds a new `--to zip` (and `--to all`) tools-pack Windows build target that
produces a portable `.zip` from the cached `win-unpacked` tree using the
bundled 7z. The zip lays files at the archive root so users can extract it
anywhere and launch `Open Design.exe` without going through the NSIS
installer, addressing the no-install download request.
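
To illustrate "files at the archive root": entry names are made relative to the unpacked tree, so the archive has no wrapping directory. The real target uses the bundled 7z; this sketch swaps in Python's `zipfile`, and `zip_tree_at_root` is a hypothetical name.

```python
import zipfile
from pathlib import Path

def zip_tree_at_root(src_dir, zip_path):
    """Zip src_dir so its contents sit at the archive root, not under src_dir's name."""
    src_dir = Path(src_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                # arcname is relative to src_dir, so no "win-unpacked/" prefix appears.
                archive.write(path, path.relative_to(src_dir).as_posix())
```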

Release plumbing is updated to publish the portable zip and its sha256
beside the existing installer on R2 for beta, preview, and stable channels
(default on, gated by `WINDOWS_INCLUDE_ZIP`/`WIN_INCLUDE_ZIP`). The
electron-updater `latest.yml` feed continues to point only at the
installer; the zip is a manual-download convenience and is intentionally
excluded from the in-app updater.

Closes #1121

Generated-By: looper 0.0.0-dev (runner=worker, agent=claude-code)

Co-authored-by: libertecode <libertecode@proton.me>
2026-05-26 06:14:44 +00:00
PerishFire
526c7f7c26
Fix packaged auto-update release validation (#2565)
* fix: tighten packaged updater flow

* test: prune noisy extended ui coverage

* fix: hide unpublished release artifacts

* test: validate release updater channels

* fix: align prerelease release namespaces
2026-05-21 18:15:53 +08:00
PerishFire
bb13eee765
chore: optimize CI and beta release runtime (#2231)
* chore(ci): add runtime trace summaries

* chore(ci): tighten measured workspace steps

* chore(release): tighten beta setup steps

* chore(release): slim beta windows smoke

* chore(ci): shard daemon tests

* chore(ci): harden runtime trace lookup

* chore(release): avoid mac pnpm cache in beta

* chore(ci): split critical playwright checks

* chore(release): publish beta platforms from builders

* test(e2e): update beta release workflow expectation

* chore(ci): stop gating PRs on nix check

* fix(release): keep beta latest complete
2026-05-19 18:06:28 +08:00
PerishCode
43b1b94c8e
Add preview release channel
2026-05-14 19:15:16 +08:00
PerishFire
976edaf38e
test: harden e2e smoke and release reports (#1140)
* test: harden e2e inspect specs

* test: wire e2e release reports

* chore: bump packaged beta base to 0.6.1

* test: run release smoke vitest directly

* test: add suite-owned tools-dev lifecycle

* ci: harden stable release packaging

* fix(release,e2e): gate stable signing on verify and harden suite cleanup

- restore `needs: [metadata, verify]` on the stable release `build_mac`,
  `build_mac_intel`, `build_win`, and `build_linux` jobs so Apple
  signing/notarization and Windows release builds cannot run before
  pnpm guard, typecheck, and layout checks complete on the metadata commit.
- in `runToolsDevSuite`, drop the `started` flag and always attempt
  `stopToolsDevWeb` in `finally`; record stop errors in diagnostics, and
  when the test body succeeded, escalate the stop failure to the suite
  result and rethrow — so orphan daemon/web processes from an interrupted
  `startToolsDevWeb` or a broken shutdown can no longer pass silently.
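
The cleanup rule in the second bullet can be sketched as below. The project's suite is TypeScript; this Python version only shows the control flow, with `run_suite`, `body`, and `stop` as hypothetical names.

```python
def run_suite(body, stop, diagnostics):
    """Run body, always attempt stop, and escalate a stop failure only if body passed."""
    body_failed = False
    try:
        body()
    except Exception:
        body_failed = True
        raise
    finally:
        try:
            stop()  # always attempted, even if start-up was interrupted
        except Exception as stop_error:
            diagnostics.append(f"stop failed: {stop_error}")
            if not body_failed:
                raise  # body passed, so the stop failure becomes the suite failure
```

When the body already failed, the stop error is recorded but the original failure keeps propagating, so the first cause is not masked.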

Addresses PR #1140 review feedback from lefarcen and mrcfps.
2026-05-11 13:11:16 +08:00
PerishFire
cc343f8828
ci: optimize beta release packaging cache (#1095)
* ci: optimize beta release packaging cache

* fix: version windows builder cache

* fix: forward linux app version in container
2026-05-10 10:11:05 +08:00
Gavin Zeng
7518cfc107
feat: add macOS Intel (x64) build support to release workflows (#759)
* feat: add macOS Intel (x64) build support to release workflows

Add build_mac_intel job to both release-beta.yml and release-stable.yml
using macos-13 runners (last Intel-based GitHub Actions runner).

Key changes:
- release-beta.yml: add enable_mac_intel input (default false), build
  job, and wire into publish/verify/summary
- release-stable.yml: add always-on build_mac_intel job, wire into
  publish (downloads + copies to GitHub Release), verify, and summary
- publish.sh: add ENABLE_MAC_INTEL uploads, outputs, and metadata entry
- verify.sh: add mac-intel URL verification when enabled
- summary.sh: add macOS x64 (Intel) row to platform/report tables
- mac-intel.sh: new asset script for unsigned DMG+ZIP production

Intel builds are unsigned (like Windows). No auto-update feed.
Artifact naming: open-design-<ver>.unsigned-mac-x64.{dmg,zip}

Closes #746

* fix: resolve beta macIntel asset name mismatch (P1)

Add MAC_INTEL_ASSET_SUFFIX to publish.sh (mirroring existing
WIN_ASSET_SUFFIX / LINUX_ASSET_SUFFIX pattern) so that the beta
publish job can correctly locate unsigned Intel artifacts.

- publish.sh: add mac_intel_asset_suffix variable with fallback
- release-beta.yml: pass MAC_INTEL_ASSET_SUFFIX: .unsigned to publish

---------

Co-authored-by: ZengGanghui <zghui0@gmail.com>
2026-05-09 19:50:50 +08:00
PerishFire
dcfab797c2
[codex] Add stable nightly promotion gate (#962)
* Upload beta e2e spec reports to R2

* Expose beta report URLs in summary

* Complete Indonesian deploy locale keys

* chore: factor release workflow scripts

* chore: bump packaged beta base version

* test: wait for mac packaged runtime health

* fix: capture mac packaged startup logs

* chore: improve mac release build observability

* fix: ad-hoc sign unsigned mac builds

* chore: diagnose mac packaged startup

* fix: relax unsigned mac launch signing

* chore: improve mac launch diagnostics

* chore: simplify beta mac release artifacts

* fix: align packaged mac smoke launch config

* fix: externalize mac daemon wasm dependency

* chore: require signed stable mac releases

* fix: use stable app version for nightly package builds

* chore: clean release artifacts after publish

* chore: publish beta reports as zip

* ci: disable beta mac tools-pack cache

* fix: skip mac framework binary symlinks when signing

* fix: sign mac framework version bundles

* ci: disable beta mac pnpm cache

* chore: align stable release reports

* ci: require matching nightly before stable release

* ci: avoid mac pnpm cache for packaged smoke
2026-05-08 21:48:54 +08:00