open-design/.github/workflows/blog-indexing-on-deploy.yml
lefarcen 7312c64580
ci(landing): split landing deploy into staging gate + manual production (#2994)
* ci(landing): split landing deploy into staging gate + manual production

A merge to `main` previously published the landing page straight to
production (open-design.ai) via `landing-page-deploy`. There was no
buffer to review the rendered site, so a bad merge was live instantly.

Split deploys across two Cloudflare Pages projects so production is only
ever reached by an explicit human action:

- `landing-page-staging` (push to main) -> staging project
  `open-design-landing-staging` -> staging.open-design.ai.
- `landing-page-production` (manual workflow_dispatch only) -> production
  project `open-design-landing` -> open-design.ai. Only this workflow
  names the production project; gate it with required reviewers on the
  `production` GitHub environment.
- `landing-page-ci` now also deploys a per-PR preview into the staging
  project (`--branch=pr-<n>`) for same-repo branches and comments the URL.
  Fork PRs (no secrets / read-only token) skip the deploy and keep just
  the build validation. Path filters already scope this to landing edits.
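To illustrate the routing above, here is a TypeScript sketch that maps a triggering event to a Cloudflare Pages project. The project names and the `pr-<n>` branch convention come from this message. The `DeployEvent` shape and the `resolveDeployTarget` function are hypothetical and do not exist in the workflows.

```typescript
// Illustrative model of the three landing deploy paths. Project names come
// from the commit message; the event shape and function are hypothetical.
type DeployEvent =
  | { kind: "push"; branch: string }
  | { kind: "workflow_dispatch"; workflow: string }
  | { kind: "pull_request"; number: number; fromFork: boolean };

interface DeployTarget {
  project: string; // Cloudflare Pages project name
  branch?: string; // value for `--branch=...` on PR previews
}

function resolveDeployTarget(event: DeployEvent): DeployTarget | null {
  switch (event.kind) {
    case "push":
      // landing-page-staging: a push to main only ever reaches staging.
      return event.branch === "main"
        ? { project: "open-design-landing-staging" }
        : null;
    case "workflow_dispatch":
      // landing-page-production is the only path that names the prod project.
      return event.workflow === "landing-page-production"
        ? { project: "open-design-landing" }
        : null;
    case "pull_request":
      // Fork PRs lack secrets, so they get build validation but no deploy.
      return event.fromFork
        ? null
        : { project: "open-design-landing-staging", branch: `pr-${event.number}` };
    default:
      return null;
  }
}
```

The split works because only one dispatch-only path resolves to `open-design-landing`. No push or PR event can reach the production project.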

Decouple search-engine indexing from staging:

- `blog-indexing-on-deploy` now triggers on `landing-page-production`
  (not every main push), so the test environment is never submitted to
  Google/IndexNow.
- It diffs from a new `blog-indexed-prod` tag (the last indexed prod
  commit) instead of `HEAD^`, and force-advances the tag after a
  successful run, so a manual promotion bundling several merged posts
  indexes all of them rather than only the last commit.
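The indexing job picks its diff base in a fixed order: an explicit `base_sha` input first, then the `blog-indexed-prod` tag, then `HEAD^` when no tag exists yet. That order can be sketched as follows. `resolveDiffBase` and its parameter names are hypothetical.

```typescript
// Hypothetical mirror of the base selection in "Detect changed blog URLs".
// Empty strings are treated as "not provided", like `[ -z "$BASE" ]`.
type BaseSource = "input" | "tag" | "bootstrap";

function resolveDiffBase(opts: {
  baseShaInput?: string;   // workflow_dispatch `base_sha`, if provided
  indexedProdTag?: string; // commit the `blog-indexed-prod` tag points at
  headParent: string;      // result of `git rev-parse HEAD^`
}): { base: string; source: BaseSource } {
  if (opts.baseShaInput) return { base: opts.baseShaInput, source: "input" };
  if (opts.indexedProdTag) return { base: opts.indexedProdTag, source: "tag" };
  // First run ever: no tag yet, so fall back to the parent commit.
  return { base: opts.headParent, source: "bootstrap" };
}
```

Suppose the tag sits at commit A and a promotion ships B, C and D, each adding a post. Diffing `A..D` picks up all three posts, while `HEAD^..D` would only see D's.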

Staging and PR-preview builds drop `PUBLIC_GA_MEASUREMENT_ID` so test
traffic does not pollute the production GA property.

* ci(landing): keep staging + PR previews out of the search index

staging.open-design.ai mirrors production and is exposed via cert
transparency logs, so search engines can discover it. Indexing the
mirror competes with open-design.ai for the same content.

Emit `<meta name="robots" content="noindex, nofollow">` whenever
OD_LANDING_NOINDEX=1, and set that flag on the staging and PR-preview
builds (production leaves it unset and stays indexable). noindex is
used rather than a robots.txt Disallow so crawlers can still fetch the
page and read both the tag and the canonical, which already points at
the production origin.
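The sketch below shows the head tags this produces. It assumes a helper that receives the page path and the noindex flag as plain arguments. `seoHeadTags` is hypothetical and is not the actual SeoHead component. The canonical always targets the production origin, and the robots meta is added only when the flag is set.

```typescript
// Hypothetical rendering of the canonical + robots tags. The production
// origin comes from the commit message; everything else is illustrative.
const PRODUCTION_ORIGIN = "https://open-design.ai";

function seoHeadTags(pathname: string, noindex: boolean): string[] {
  const tags: string[] = [
    // Canonical points at production even on staging and PR previews.
    `<link rel="canonical" href="${PRODUCTION_ORIGIN}${pathname}">`,
  ];
  if (noindex) {
    // Crawlers must be allowed to fetch the page to see this tag, which is
    // why staging is not blocked with a robots.txt Disallow instead.
    tags.push(`<meta name="robots" content="noindex, nofollow">`);
  }
  return tags;
}
```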

* fix(landing): make staging noindex actually take effect

The previous commit read `process.env.OD_LANDING_NOINDEX` directly in
`seo-head.astro`, but `.astro` frontmatter is transformed by Vite and
does not see process.env, so the meta never rendered. Two fixes:

- Inject the flag as the compile-time constant `__OD_LANDING_NOINDEX__`
  via `vite.define` in astro.config.ts (config runs in Node and can read
  process.env); SeoHead consumes that constant.
- The homepage (`index.astro`) and `og.astro` build their own <head> and
  never use SeoHead, so a per-component meta can miss pages. Add an
  `astro:build:done` integration that appends a catch-all
  `/*  X-Robots-Tag: noindex, nofollow` to the Cloudflare Pages `_headers`
  on staging/preview builds, covering every response (homepage, assets,
  any custom-head page) at the HTTP layer. Production builds leave
  `_headers` untouched.
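The two build-time pieces can be sketched as pure helpers. This assumes they are later wired into `vite.define` and an `astro:build:done` hook in astro.config.ts. `noindexDefine` and `withNoindexHeaders` are hypothetical names. The `_headers` syntax follows Cloudflare Pages' format: a path pattern followed by indented `Name: value` lines.

```typescript
// vite.define replaces the identifier with this source text at build time,
// so the value must be a JSON-encoded literal ("true" or "false").
function noindexDefine(flag: string | undefined): Record<string, string> {
  return { __OD_LANDING_NOINDEX__: JSON.stringify(flag === "1") };
}

// Catch-all rule covering every response served by the Pages project.
const NOINDEX_BLOCK = "/*\n  X-Robots-Tag: noindex, nofollow\n";

// Append the catch-all rule to the contents of a `_headers` file.
// Production builds (noindex === false) get the contents back unchanged.
function withNoindexHeaders(existing: string, noindex: boolean): string {
  if (!noindex) return existing;
  if (existing.includes(NOINDEX_BLOCK)) return existing; // already appended
  const sep = existing === "" || existing.endsWith("\n") ? "" : "\n";
  return existing + sep + NOINDEX_BLOCK;
}
```

In an Astro integration, the `astro:build:done` hook receives the output directory as a `dir` URL. A real integration would read `_headers` from there (treating a missing file as empty), pass it through a helper like this, and write it back.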

Verified: build with OD_LANDING_NOINDEX=1 emits the _headers block and
the SeoHead <meta>; build without the flag emits neither; astro check
clean.

* fix(landing): address review — pin prod checkout to main, defer index pointer

Two blockers from review:

- landing-page-production: workflow_dispatch can be launched from any ref
  via the Actions "Use workflow from" dropdown, so an operator could ship
  an arbitrary branch to open-design.ai. Pin the checkout to `ref: main`
  so the deployed artifact always equals reviewed main.

- blog-indexing-on-deploy: the `blog-indexed-prod` pointer was advanced
  right after sitemap submission, before Inspect / Search Analytics /
  Render status / Open status PR. A failure in any of those still moved
  the pointer, so the next production run skipped those posts. Move the
  advance to the very end, gated on `success()`, so a failure leaves the
  tag in place and the range is re-processed next run (submissions are
  idempotent).
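The pointer gate reduces to a small predicate. This sketch assumes the step outcomes are available as a list, which simplifies how GitHub actually evaluates `success()`. As the workflow comment notes, skipped steps do not count as failures. `shouldAdvancePointer` is hypothetical.

```typescript
// Hypothetical model of `if: success() && github.event_name == 'workflow_run'`.
type StepOutcome = "success" | "failure" | "cancelled" | "skipped";

function shouldAdvancePointer(outcomes: StepOutcome[], eventName: string): boolean {
  // Skipped steps (no changed URLs, GSC/bot not configured) are harmless.
  const allOk = outcomes.every((o) => o === "success" || o === "skipped");
  // Only the real production trigger may move the baseline.
  return allOk && eventName === "workflow_run";
}
```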

* fix(landing): gate production promotion to the main ref only

Follow-up to the production-path review note: pinning checkout to main
fixed the deployed content, but the workflow was still dispatchable from
any ref, which records a non-main production run and would dodge
blog-indexing's `workflow_run` `branches: [main]` filter. Gate the whole
job on `github.ref == 'refs/heads/main'` so a dispatch from any other
branch/tag is skipped outright.
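The two production guards can be summarized together as below. `planProductionRun` is a hypothetical helper for illustration only. In the workflow, these guards are a job-level `if:` and a `ref: main` on the checkout step.

```typescript
// Hypothetical summary of the landing-page-production guards.
interface ProductionPlan {
  run: boolean;        // job-level `if: github.ref == 'refs/heads/main'`
  checkoutRef: "main"; // checkout is pinned regardless of dispatch ref
}

function planProductionRun(dispatchRef: string): ProductionPlan {
  return {
    // A dispatch from any other branch or tag is skipped outright.
    run: dispatchRef === "refs/heads/main",
    // When it does run, the deployed artifact is always built from main.
    checkoutRef: "main",
  };
}
```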
2026-05-26 14:05:04 +00:00

301 lines
13 KiB
YAML

name: blog-indexing-on-deploy
# Runs after every successful `landing-page-production` promotion. Staging
# deploys (`landing-page-staging`) intentionally do NOT trigger indexing, so
# the test environment is never submitted to search engines. The job is
# idempotent and follows the blog-indexing-automation skill:
#
# 1. Detect blog URLs added/modified since the last indexed production commit
# 2. Verify each URL is operationally ready (200, no noindex, canonical, in sitemap)
# 3. Submit the changed URLs to IndexNow
# 4. Submit the sitemap-index to Google Search Console (one call per deploy)
# 5. Capture a baseline URL Inspection and Search Analytics snapshot per new URL
#    (monitoring, not submission)
# 6. Open a PR that updates docs/blog-indexing-status.{md,json}
# 7. Advance the `blog-indexed-prod` tag once every step above has succeeded
#
# Google indexing is NOT requested via API: Google's Indexing API does not
# support normal blog content, and the skill explicitly forbids UI
# automation against Search Console. Operationally, sitemap submission
# plus healthy internal linking is what makes URLs discoverable to Google.
on:
  workflow_run:
    workflows: ['landing-page-production']
    types: [completed]
    branches: [main]
  workflow_dispatch:
    inputs:
      head_sha:
        description: 'Commit SHA to check out and index. Defaults to the HEAD of the dispatched ref.'
        required: false
      base_sha:
        description: 'Optional base SHA for the diff. Defaults to the blog-indexed-prod tag, or HEAD^ if the tag does not exist yet.'
        required: false
permissions:
  # `contents: write` lets the job advance the `blog-indexed-prod` tag after a
  # successful run so the next promotion diffs from exactly where this one
  # stopped. The status PR uses a separate GitHub App token (below).
  contents: write
concurrency:
  group: blog-indexing-on-deploy
  cancel-in-progress: false
jobs:
  index:
    name: Index newly deployed blog posts
    if: >-
      github.repository == 'nexu-io/open-design'
      && (
        github.event_name == 'workflow_dispatch'
        || github.event.workflow_run.conclusion == 'success'
      )
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - name: Checkout
        uses: actions/checkout@v6.0.2
        with:
          # Need full history to diff against the last indexed production
          # commit (or HEAD^ on bootstrap) and to compute first-commit dates
          # for posts.
          fetch-depth: 0
          ref: >-
            ${{
              github.event.inputs.head_sha
              || github.event.workflow_run.head_sha
              || github.sha
            }}
      - name: Setup pnpm
        uses: pnpm/action-setup@v5
        with:
          version: 10.33.2
      - name: Setup Node.js
        uses: actions/setup-node@v6
        with:
          node-version: 24
          cache: pnpm
      - name: Install dependencies
        run: pnpm install --frozen-lockfile
      - name: Check indexing configuration
        id: config
        env:
          GSC_OAUTH_CLIENT_ID: ${{ secrets.GSC_OAUTH_CLIENT_ID }}
          GSC_OAUTH_CLIENT_SECRET: ${{ secrets.GSC_OAUTH_CLIENT_SECRET }}
          GSC_OAUTH_REFRESH_TOKEN: ${{ secrets.GSC_OAUTH_REFRESH_TOKEN }}
          GSC_SERVICE_ACCOUNT_KEY: ${{ secrets.GSC_SERVICE_ACCOUNT_KEY }}
          BOT_APP_ID: ${{ secrets.BOT_APP_ID }}
          BOT_APP_PRIVATE_KEY: ${{ secrets.BOT_APP_PRIVATE_KEY }}
        run: |
          gsc=false
          bot=false
          if { [ -n "$GSC_OAUTH_CLIENT_ID" ] && [ -n "$GSC_OAUTH_CLIENT_SECRET" ] && [ -n "$GSC_OAUTH_REFRESH_TOKEN" ]; } || [ -n "$GSC_SERVICE_ACCOUNT_KEY" ]; then
            gsc=true
          fi
          if [ -n "$BOT_APP_ID" ] && [ -n "$BOT_APP_PRIVATE_KEY" ]; then
            bot=true
          fi
          echo "gsc=$gsc" >> "$GITHUB_OUTPUT"
          echo "bot=$bot" >> "$GITHUB_OUTPUT"
          if [ "$gsc" != "true" ]; then
            echo "::warning title=Blog indexing not fully configured::GSC auth is missing; sitemap submission, URL Inspection, Search Analytics, and status rendering will be skipped."
          fi
          if [ "$bot" != "true" ]; then
            echo "::warning title=Blog indexing status PR disabled::Open Design bot secrets are missing; status PR creation will be skipped."
          fi
          {
            echo "### Blog indexing configuration"
            echo "- GSC auth configured: \`$gsc\`"
            echo "- Open Design bot configured: \`$bot\`"
            if [ "$gsc" != "true" ]; then
              echo ""
              echo "GSC-dependent steps (sitemap submission, URL Inspection, Search Analytics, status rendering) will be skipped."
            fi
            if [ "$bot" != "true" ]; then
              echo ""
              echo "Status PR creation will be skipped."
            fi
          } >> "$GITHUB_STEP_SUMMARY"
      - name: Restore pending indexing status state
        run: |
          if ! git fetch origin automation/blog-indexing-status:refs/remotes/origin/automation/blog-indexing-status; then
            {
              echo "### Blog indexing status restore"
              echo "No pending \`automation/blog-indexing-status\` branch was found. Starting from the committed status files."
            } >> "$GITHUB_STEP_SUMMARY"
            exit 0
          fi
          if git checkout refs/remotes/origin/automation/blog-indexing-status -- \
            docs/blog-indexing-status.md \
            docs/blog-indexing-status.json; then
            {
              echo "### Blog indexing status restore"
              echo "Restored pending status files from \`automation/blog-indexing-status\`."
            } >> "$GITHUB_STEP_SUMMARY"
          else
            {
              echo "### Blog indexing status restore"
              echo "Failed to restore pending status files from \`automation/blog-indexing-status\`."
            } >> "$GITHUB_STEP_SUMMARY"
            echo "::error title=Blog indexing status restore failed::Fetched automation/blog-indexing-status but could not check out the status files."
            exit 1
          fi
      - name: Detect changed blog URLs
        id: detect
        env:
          # Pass the dispatch input through env instead of interpolating it
          # into the script body, so the value cannot inject shell syntax.
          BASE_SHA_INPUT: ${{ github.event.inputs.base_sha }}
        run: |
          mkdir -p .blog-indexing
          BASE="${BASE_SHA_INPUT:-}"
          if [ -z "$BASE" ]; then
            # Diff from the last production deploy we already indexed, tracked
            # by the `blog-indexed-prod` tag. Production is a manual promotion
            # that may bundle several merged posts, so a HEAD^ diff would miss
            # all but the last commit. The tag is advanced at the end of a
            # successful run (see "Advance indexed-production pointer").
            git fetch --no-tags origin "+refs/tags/blog-indexed-prod:refs/tags/blog-indexed-prod" 2>/dev/null || true
            BASE="$(git rev-parse --verify --quiet 'refs/tags/blog-indexed-prod^{commit}' || true)"
            if [ -n "$BASE" ]; then
              echo "Diffing from last indexed production commit (blog-indexed-prod): $BASE"
            else
              BASE="$(git rev-parse HEAD^)"
              echo "No blog-indexed-prod tag yet; bootstrapping from HEAD^: $BASE"
            fi
          fi
          echo "base=$BASE" >> "$GITHUB_OUTPUT"
          pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/detect-changed-urls.ts \
            --base "$BASE" \
            --head HEAD \
            --out ../../.blog-indexing/changed-urls.json
          echo '--- changed-urls.json ---'
          cat .blog-indexing/changed-urls.json
          count=$(node -e "const j=require('./.blog-indexing/changed-urls.json');console.log((j.addedUrls.length+j.modifiedUrls.length))")
          echo "count=$count" >> "$GITHUB_OUTPUT"
      - name: Verify readiness
        if: steps.detect.outputs.count != '0'
        run: |
          pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/verify-readiness.ts \
            --urls ../../.blog-indexing/changed-urls.json \
            --out ../../.blog-indexing/readiness.json \
            --timeout-ms 240000
      - name: Submit URLs to IndexNow
        if: steps.detect.outputs.count != '0'
        run: |
          pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/submit-indexnow.ts \
            --urls ../../.blog-indexing/changed-urls.json \
            --out ../../.blog-indexing/indexnow.json
      - name: Submit sitemap to Search Console
        if: steps.detect.outputs.count != '0' && steps.config.outputs.gsc == 'true'
        env:
          GSC_OAUTH_CLIENT_ID: ${{ secrets.GSC_OAUTH_CLIENT_ID }}
          GSC_OAUTH_CLIENT_SECRET: ${{ secrets.GSC_OAUTH_CLIENT_SECRET }}
          GSC_OAUTH_REFRESH_TOKEN: ${{ secrets.GSC_OAUTH_REFRESH_TOKEN }}
          GSC_SERVICE_ACCOUNT_KEY: ${{ secrets.GSC_SERVICE_ACCOUNT_KEY }}
        run: |
          pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/submit-sitemap.ts
      - name: Inspect new URLs (baseline)
        if: steps.detect.outputs.count != '0' && steps.config.outputs.gsc == 'true'
        env:
          GSC_OAUTH_CLIENT_ID: ${{ secrets.GSC_OAUTH_CLIENT_ID }}
          GSC_OAUTH_CLIENT_SECRET: ${{ secrets.GSC_OAUTH_CLIENT_SECRET }}
          GSC_OAUTH_REFRESH_TOKEN: ${{ secrets.GSC_OAUTH_REFRESH_TOKEN }}
          GSC_SERVICE_ACCOUNT_KEY: ${{ secrets.GSC_SERVICE_ACCOUNT_KEY }}
        run: |
          pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/inspect-urls.ts \
            --urls ../../.blog-indexing/changed-urls.json \
            --out ../../.blog-indexing/inspections.json
          echo '--- inspections.json ---'
          cat .blog-indexing/inspections.json
      - name: Query Search Console traffic (baseline)
        if: steps.detect.outputs.count != '0' && steps.config.outputs.gsc == 'true'
        env:
          GSC_OAUTH_CLIENT_ID: ${{ secrets.GSC_OAUTH_CLIENT_ID }}
          GSC_OAUTH_CLIENT_SECRET: ${{ secrets.GSC_OAUTH_CLIENT_SECRET }}
          GSC_OAUTH_REFRESH_TOKEN: ${{ secrets.GSC_OAUTH_REFRESH_TOKEN }}
          GSC_SERVICE_ACCOUNT_KEY: ${{ secrets.GSC_SERVICE_ACCOUNT_KEY }}
        run: |
          pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/query-search-analytics.ts \
            --urls ../../.blog-indexing/changed-urls.json \
            --out ../../.blog-indexing/analytics.json
      - name: Render status report
        if: steps.detect.outputs.count != '0' && steps.config.outputs.gsc == 'true'
        run: |
          pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/render-status.ts \
            --inspections ../../.blog-indexing/inspections.json \
            --analytics ../../.blog-indexing/analytics.json
      - name: Upload artifacts
        if: always() && steps.detect.outputs.count != '0'
        uses: actions/upload-artifact@v4
        with:
          name: blog-indexing-${{ github.run_id }}
          path: |
            .blog-indexing/changed-urls.json
            .blog-indexing/readiness.json
            .blog-indexing/indexnow.json
            .blog-indexing/inspections.json
            .blog-indexing/analytics.json
          if-no-files-found: ignore
          retention-days: 30
      - name: Generate Open Design bot token
        if: steps.detect.outputs.count != '0' && steps.config.outputs.gsc == 'true' && steps.config.outputs.bot == 'true'
        id: open-design-bot-token
        uses: actions/create-github-app-token@v2
        with:
          app-id: ${{ secrets.BOT_APP_ID }}
          private-key: ${{ secrets.BOT_APP_PRIVATE_KEY }}
          owner: nexu-io
          repositories: open-design
          permission-contents: write
          permission-pull-requests: write
      - name: Open status PR
        if: steps.detect.outputs.count != '0' && steps.config.outputs.gsc == 'true' && steps.config.outputs.bot == 'true'
        uses: peter-evans/create-pull-request@v8
        with:
          token: ${{ steps.open-design-bot-token.outputs.token }}
          add-paths: |
            docs/blog-indexing-status.md
            docs/blog-indexing-status.json
          base: main
          branch: automation/blog-indexing-status
          delete-branch: true
          commit-message: 'docs(blog): refresh indexing status after deploy'
          author: 'open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>'
          committer: 'open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>'
          title: 'docs(blog): refresh indexing status after deploy'
          body: |
            Refreshes `docs/blog-indexing-status.md` with the URL Inspection
            verdicts captured immediately after the latest landing-page deploy.
            Generated by `.github/workflows/blog-indexing-on-deploy.yml`. The
            sidecar `docs/blog-indexing-status.json` is the canonical state;
            the markdown file is rendered from it.
      # Advance the pointer LAST, only after every post-deploy step above
      # (detect, verify, submit, inspect, analytics, render, status PR)
      # succeeded. `success()` is false if any earlier step failed, so a
      # failure leaves the tag where it was and the next production run
      # re-processes the same range rather than silently skipping posts.
      # Skipped steps (count == 0, or GSC/bot not configured) do not count as
      # failures, so the pointer still advances over an empty/partial range.
      # Restricted to the real production-deploy trigger; an ad-hoc manual
      # dispatch must not move the production baseline.
      - name: Advance indexed-production pointer
        if: success() && github.event_name == 'workflow_run'
        run: |
          HEAD_SHA="$(git rev-parse HEAD)"
          git tag -f blog-indexed-prod "$HEAD_SHA"
          git push -f origin "refs/tags/blog-indexed-prod"
          echo "Advanced blog-indexed-prod -> $HEAD_SHA"
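The credential check and the `if:` conditions on the steps above can be summarized in TypeScript. `indexingConfig` and `plannedSteps` are hypothetical helpers. The step labels are shorthand for the scripts each step runs, and artifact upload and the pointer advance are left out.

```typescript
// Hypothetical mirror of "Check indexing configuration" and the step gates.
type Env = Record<string, string | undefined>;

function indexingConfig(env: Env): { gsc: boolean; bot: boolean } {
  const set = (k: string) => (env[k] ?? "") !== ""; // like `[ -n "$VAR" ]`
  const oauth =
    set("GSC_OAUTH_CLIENT_ID") &&
    set("GSC_OAUTH_CLIENT_SECRET") &&
    set("GSC_OAUTH_REFRESH_TOKEN");
  return {
    // Either a complete OAuth triple or a service-account key enables GSC.
    gsc: oauth || set("GSC_SERVICE_ACCOUNT_KEY"),
    bot: set("BOT_APP_ID") && set("BOT_APP_PRIVATE_KEY"),
  };
}

function plannedSteps(changedCount: number, cfg: { gsc: boolean; bot: boolean }): string[] {
  if (changedCount === 0) return []; // nothing changed: every gated step skips
  const steps = ["verify-readiness", "submit-indexnow"]; // need no GSC auth
  if (cfg.gsc) {
    steps.push("submit-sitemap", "inspect-urls", "query-search-analytics", "render-status");
    // The status PR needs both GSC output and the bot token.
    if (cfg.bot) steps.push("open-status-pr");
  }
  return steps;
}
```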