Blog indexing automation
The Open Design landing page automates the parts of search-engine indexing that Google officially supports for normal blog content. It does NOT pretend to "submit" or "request indexing" for blog posts via unsupported APIs or browser automation.
This file is the operating manual. The skill that defines the rules
lives at ~/.codex/skills/blog-indexing-automation/SKILL.md; this
doc is its concrete implementation in nexu-io/open-design.
What is automated
| Trigger | Job | Outcome |
|---|---|---|
| `landing-page-ci` | `lint-blog-seo.ts` + `check-blog-url-changes.ts` | Changed posts are checked for frontmatter, internal/external links, rendered canonical/JSON-LD/OG metadata, and slug delete/rename redirects before they can merge. |
| `landing-page-production` promotion finishes successfully | `blog-indexing-on-deploy.yml` | New blog URLs are detected, verified ready, submitted to IndexNow, the sitemap-index is re-submitted to GSC, baseline URL Inspection is captured, and baseline Search Analytics is queried. Staging deploys (`landing-page-staging`) never trigger this. |
| Daily cron: `0 2 * * *` | `blog-indexing-monitor.yml` | Every blog post in the T+1 / T+3 / T+7 / T+14 window is re-inspected; GSC Search Analytics is refreshed; stall and low-traffic issues are opened/refreshed when needed. |
| Daily cron: `0 2 * * *` (10:00 Asia/Shanghai) | `blog-3day-report.yml` | T-3 cohort + 30-day rolling cohort traffic digest written to `docs/blog-traffic-digest.md` via the `automation/blog-traffic-digest` PR, with an optional Feishu group push. Read-only against GSC. |
| Manual `workflow_dispatch` | `blog-indexing-monitor.yml` | Maintainers can dry-run or explicitly publish a token-gated dev.to/Hashnode cross-post with canonical URL pointing back to Open Design. |
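The T+1 / T+3 / T+7 / T+14 window in the monitor row can be pictured as a day-count check against each post's publish date. The sketch below is illustrative only: the `PostDate` shape, the function names, and the choice to match checkpoints exactly (rather than a range) are assumptions, not the behavior of `scheduled-window.ts`.

```typescript
// Hypothetical input shape; the real scripts may track more fields.
interface PostDate {
  url: string;
  publishedAt: string; // ISO date or timestamp, e.g. "2025-01-14"
}

const WINDOW_DAYS: readonly number[] = [1, 3, 7, 14];
const DAY_MS = 24 * 60 * 60 * 1000;

// Whole UTC days between the publish date and "today".
function ageInDays(publishedAt: string, today: Date): number {
  const start = Date.parse(publishedAt.slice(0, 10) + "T00:00:00Z");
  const end = Date.UTC(today.getUTCFullYear(), today.getUTCMonth(), today.getUTCDate());
  return Math.round((end - start) / DAY_MS);
}

// Return the posts that land exactly on a T+N checkpoint today (assumed rule).
function postsDueToday(posts: PostDate[], today: Date): { url: string; bucket: string }[] {
  const due: { url: string; bucket: string }[] = [];
  for (const post of posts) {
    const age = ageInDays(post.publishedAt, today);
    if (WINDOW_DAYS.includes(age)) {
      due.push({ url: post.url, bucket: `T+${age}` });
    }
  }
  return due;
}
```

Because the cron fires at 02:00 UTC, computing the age in whole UTC days keeps a post in the same bucket no matter when during the day the job actually starts.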
The monitor and 3-day digest workflows commit their durable outputs
back via the open-design-bot GitHub App. The monitor opens or
refreshes the automation/blog-indexing-status PR; the traffic digest
opens or refreshes the automation/blog-traffic-digest PR. The
human-readable indexing view is docs/blog-indexing-status.md; the
canonical indexing state is the sidecar
docs/blog-indexing-status.json. The human-readable traffic view is
docs/blog-traffic-digest.md.
Before each run renders a new report, it restores the latest files from
the pending automation/blog-indexing-status branch when that branch
exists. That keeps inspection history continuous even if the previous
status PR has not been merged yet. If that branch exists but the status
files cannot be restored, the workflow fails and records the restore
failure in the job summary instead of silently starting from stale state.
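The restore rule above amounts to a three-way decision. Here is a minimal sketch of it, assuming a hypothetical `readFile` callback that returns `null` when a file cannot be read from the pending branch; the names and the "fresh-start" case are illustrative and not taken from the workflow itself.

```typescript
type RestoreOutcome =
  | { kind: "fresh-start" }                // no pending branch: keep the checked-out files (assumed)
  | { kind: "restored"; files: string[] }  // history continues from the pending PR
  | { kind: "failed"; reason: string };    // branch exists but the files are unreadable

const STATUS_FILES = [
  "docs/blog-indexing-status.md",
  "docs/blog-indexing-status.json",
];

// readFile is a hypothetical accessor for files on the pending branch.
function planRestore(
  branchExists: boolean,
  readFile: (path: string) => string | null,
): RestoreOutcome {
  if (!branchExists) return { kind: "fresh-start" };
  const missing = STATUS_FILES.filter((path) => readFile(path) === null);
  if (missing.length > 0) {
    // Fail loudly instead of rendering a report from stale state.
    return { kind: "failed", reason: `could not restore: ${missing.join(", ")}` };
  }
  return { kind: "restored", files: STATUS_FILES };
}
```

A caller would treat `failed` as a hard error and write the `reason` to the job summary, which matches the intent of never silently starting from stale state.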
What is deliberately NOT automated
Per the blog-indexing-automation skill:
- We do not call Google's Indexing API. It officially supports only Job Postings and Livestreams; using it for blog posts risks policy flags and provides no real benefit.
- We do not automate clicks against the Search Console UI to "Request Indexing." The skill labels that as a brittle last resort.
- We do not ping the legacy `https://www.google.com/ping?sitemap=` endpoint. Google deprecated it in 2023.
- We do not attempt to inspect every URL on the site every day. We only inspect changed URLs after deploy and posts in the T+1/T+3/T+7/T+14 window.
- We do not auto-publish cross-posts. The cross-post scaffold is dry-run by default and requires both platform tokens and `publish_crosspost=true`.
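The cross-post gate in the last bullet can be sketched as a small pure function. This is an illustration under assumptions: the secret names match the optional platform secrets listed in this document, but the function names and the exact gating logic inside `crosspost.ts` are hypothetical.

```typescript
type Platform = "devto" | "hashnode";

interface CrosspostEnv {
  DEVTO_API_KEY?: string;
  HASHNODE_TOKEN?: string;
  HASHNODE_PUBLICATION_ID?: string;
}

// A platform is usable only when all of its credentials are present.
function hasCredentials(platform: Platform, env: CrosspostEnv): boolean {
  if (platform === "devto") return Boolean(env.DEVTO_API_KEY);
  return Boolean(env.HASHNODE_TOKEN && env.HASHNODE_PUBLICATION_ID);
}

// Publishing needs the explicit flag AND credentials; everything else is a dry run.
function crosspostMode(
  platform: Platform,
  publishCrosspost: boolean,
  env: CrosspostEnv,
): "publish" | "dry-run" {
  return publishCrosspost && hasCredentials(platform, env) ? "publish" : "dry-run";
}
```

Defaulting to `dry-run` whenever either condition is missing means a misconfigured dispatch can never publish by accident.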
When automation cannot solve an indexing problem (e.g. Google has the URL but refuses to index it), the monitor opens a GitHub issue describing the likely failure mode so a human can fix the underlying content / SEO issue.
Architecture
Because production is a manual promotion that can bundle several merged
posts, detect-changed-urls diffs from the blog-indexed-prod git tag (the
last commit this workflow successfully indexed) rather than HEAD^. The tag
is force-advanced to the deployed commit at the end of a successful run, so
the next promotion picks up exactly the posts merged in between. On the very
first run the tag does not exist yet and the workflow bootstraps from HEAD^
— create the tag manually (git tag blog-indexed-prod <sha>; git push origin blog-indexed-prod) if an initial multi-commit backfill is needed.
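As a rough illustration of the tag-based diff described above, the sketch below picks the diff base and filters `git diff --name-status` output down to added or modified blog files. The function names and the `blogDir` path are hypothetical; the real `detect-changed-urls.ts` may differ in both structure and path layout.

```typescript
const INDEX_TAG = "blog-indexed-prod";

// Use the tag when it exists; on the very first run fall back to the parent commit.
function pickDiffBase(existingTags: string[]): string {
  return existingTags.includes(INDEX_TAG) ? INDEX_TAG : "HEAD^";
}

// Parse `git diff --name-status <base> <head>` output (tab-separated) and keep
// files under blogDir that were added (A) or modified (M).
// blogDir is a placeholder; pass whatever directory holds the posts.
function changedBlogFiles(nameStatus: string, blogDir: string): string[] {
  return nameStatus
    .split("\n")
    .map((line) => line.trim().split("\t"))
    .filter(([status, path]) => (status === "A" || status === "M") && Boolean(path) && path.startsWith(blogDir))
    .map(([, path]) => path);
}
```

Diffing from the tag rather than `HEAD^` is what lets one manual promotion that bundles several merged posts report all of them.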
    landing-page-production ──success──▶ blog-indexing-on-deploy
        │
        detect-changed-urls (base = blog-indexed-prod tag)
        │
        verify-readiness (200 / canonical / sitemap)
        │
        submit-indexnow
        │
        submit-sitemap (one PUT)
        │
        inspect-urls (baseline)
        │
        query-search-analytics
        │
        render-status ──▶ docs/blog-indexing-status.md
        │
        bot PR

    cron 02:00 UTC ──▶ blog-indexing-monitor
        │
        scheduled-window (T+1/T+3/T+7/T+14 today)
        │
        inspect-urls
        │
        query-search-analytics
        │
        render-status ──▶ docs/blog-indexing-status.md
        │
        escalate-stalls ──▶ open / refresh / close stall issue
        │
        escalate-low-traffic ──▶ open / refresh / close traffic issue
        │
        bot PR

    cron 02:00 UTC ──▶ blog-3day-report
        │
        report-3day (T-3 cohort + 30-day rolling cohort)
        │
        querySearchAnalytics (windowDays=3)
        │
        inspectUrl (T-3 cohort only)
        │
        upsert ──▶ docs/blog-traffic-digest.md
        │
        post-feishu-digest (optional webhook)
        │
        bot PR (automation/blog-traffic-digest)
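The verify-readiness step in the first pipeline (200 / canonical / sitemap) can be approximated by a pure check over an already-fetched page. This is a simplified sketch: the `ReadinessInput` shape, the regex-based tag parsing, and the exact-match canonical rule are assumptions, and the polling for propagation is left out.

```typescript
interface ReadinessInput {
  url: string;        // the blog URL being checked
  status: number;     // HTTP status returned for that URL
  html: string;       // response body
  sitemapXml: string; // body of the live sitemap
}

// Read one attribute value out of a single HTML tag string.
function attr(tag: string, name: string): string | null {
  const match = tag.match(new RegExp(`\\s${name}\\s*=\\s*["']([^"']*)["']`, "i"));
  return match ? match[1] : null;
}

// Return a list of problems; an empty list means the URL looks ready.
function readinessProblems(input: ReadinessInput): string[] {
  const problems: string[] = [];
  if (input.status !== 200) problems.push(`status ${input.status}`);

  const tags = input.html.match(/<(?:link|meta)\b[^>]*>/gi) ?? [];
  const canonicalTag = tags.find((t) => attr(t, "rel")?.toLowerCase() === "canonical");
  const canonical = canonicalTag ? attr(canonicalTag, "href") : null;
  if (canonical !== input.url) problems.push(`canonical is ${canonical ?? "missing"}`);

  const robotsTag = tags.find((t) => attr(t, "name")?.toLowerCase() === "robots");
  if (robotsTag && /noindex/i.test(attr(robotsTag, "content") ?? "")) {
    problems.push("noindex present");
  }

  if (!input.sitemapXml.includes(`<loc>${input.url}</loc>`)) problems.push("not in sitemap");
  return problems;
}
```

Returning a list of problems rather than a boolean makes it straightforward to log why a URL was held back before submission.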
All scripts live in apps/landing-page/scripts/blog-indexing/ and run
under tsx directly — no compile step. Most scripts depend only on
Node 24 built-ins (crypto, fetch, child_process). RSS uses
@astrojs/rss.
One-time setup
Done once per environment by a maintainer. Repeating this is harmless but unnecessary.
1. Configure Google Search Console auth
Preferred path: OAuth user refresh token. This avoids the Google Search
Console UI bug where newly-created service account emails sometimes
fail with email not found.
- Go to https://console.cloud.google.com/projectcreate and create a project named `open-design-blog-indexing` (or reuse an existing project the team owns).
- Enable the Search Console API under https://console.cloud.google.com/apis/library/searchconsole.googleapis.com.
- Create an OAuth client under https://console.cloud.google.com/apis/credentials:
  - Application type: Desktop app
  - Name: `open-design-gsc-local`
- In the OAuth consent screen, keep the app in Testing and add every Google account that may grant access under Audience → Test users.
- Run the local helper:

      GSC_OAUTH_CLIENT_ID='<client-id>' \
      GSC_OAUTH_CLIENT_SECRET='<client-secret>' \
      pnpm --filter @open-design/landing-page exec tsx \
        scripts/blog-indexing/authorize-gsc-oauth.ts \
        --out /tmp/open-design-gsc-refresh-token.txt

- Open the printed Google URL and authorize with an account that is an Owner of the `open-design.ai` Search Console property.
Fallback path: service account. Create gsc-indexing-bot, download a
JSON key, then try adding the client_email as an Owner in Search
Console. If Search Console shows email not found, use OAuth instead.
2. Add auth secrets to GitHub
- Open https://github.com/nexu-io/open-design/settings/secrets/actions.
- Preferred OAuth secrets:
  - `GSC_OAUTH_CLIENT_ID`
  - `GSC_OAUTH_CLIENT_SECRET`
  - `GSC_OAUTH_REFRESH_TOKEN`
- Optional service-account fallback:
  - `GSC_SERVICE_ACCOUNT_KEY`
- Confirm the existing `BOT_APP_ID` and `BOT_APP_PRIVATE_KEY` secrets already exist. They are reused from the `refresh-contributors-wall` automation. The bot needs `contents:write`, `pull-requests:write`, and `issues:write` for `nexu-io/open-design` (already configured).
If these secrets are not present yet, the workflows do not fail the main deploy path. They record the missing configuration in the job summary, emit a GitHub Actions warning, and skip the GSC / bot-write steps until the secrets are added.
3. Optional platform secrets
These are not required for indexing.
- `DEVTO_API_KEY`: only needed if a maintainer wants `blog-indexing-monitor.yml` to publish a dev.to cross-post.
- `HASHNODE_TOKEN` and `HASHNODE_PUBLICATION_ID`: only needed for Hashnode cross-posts.
- `FEISHU_BLOG_DIGEST_WEBHOOK`: optional Feishu custom bot webhook for the daily `blog-3day-report.yml` digest push. Missing this secret does not fail the workflow; the digest still lands in `docs/blog-traffic-digest.md` and as an Actions artifact.
- `CLOUDFLARE_ZONE_ID`: optional future optimization if we choose to purge cache directly. Current automation polls the live sitemap until the new URLs appear, so this secret is not required.
IndexNow does not need a secret. The public verification key is committed
at apps/landing-page/public/96b0928121e24fd7b4ef85ae0f8bf1d8.txt.
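For orientation, an IndexNow submission is a JSON POST carrying the host, the key, the key file location, and the URL list. The sketch below assumes the key value equals the committed file name (the usual IndexNow convention) and that the file is served from the site root; the function names are illustrative and `submit-indexnow.ts` may be organized differently.

```typescript
interface IndexNowPayload {
  host: string;
  key: string;
  keyLocation: string;
  urlList: string[];
}

// Assumed to match the file name committed under public/.
const INDEXNOW_KEY = "96b0928121e24fd7b4ef85ae0f8bf1d8";

// Build the request body, dropping duplicates and URLs on other hosts.
function buildIndexNowPayload(urls: string[], host = "open-design.ai"): IndexNowPayload {
  const sameHost = urls.filter((u) => new URL(u).hostname === host);
  return {
    host,
    key: INDEXNOW_KEY,
    keyLocation: `https://${host}/${INDEXNOW_KEY}.txt`,
    urlList: Array.from(new Set(sameHost)),
  };
}

// POST the payload to the shared IndexNow endpoint and return the HTTP status.
async function submitIndexNow(
  payload: IndexNowPayload,
  endpoint = "https://api.indexnow.org/indexnow",
): Promise<number> {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json; charset=utf-8" },
    body: JSON.stringify(payload),
  });
  return res.status;
}
```

Filtering by host before submitting keeps staging or preview URLs out of the request even if they slip into the input list.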
4. Smoke test
Trigger blog-indexing-on-deploy.yml manually with the SHA of any
recent commit that added a blog post:
gh workflow run blog-indexing-on-deploy.yml \
-R nexu-io/open-design \
-f head_sha=<sha>
A successful run produces:
- a green check on the workflow
- the `automation/blog-indexing-status` PR refreshed with new rows in `docs/blog-indexing-status.md`
- the artifact `blog-indexing-<run-id>` containing the raw JSON outputs
- an `indexnow.json` artifact with the IndexNow submission result
If the run fails on the Submit sitemap step with a 403, the account behind the credentials (the authorizing Google account, or the fallback service account) is not yet an Owner on the GSC property (Step 1).
Operating
The expected steady state:
- PR opens → `landing-page-ci` runs SEO lint and URL-change guards. A post cannot merge if it deletes/renames a live slug without an explicit redirect, or if the rendered HTML loses canonical/JSON-LD/OG metadata.
- Renames are handled as both a redirect requirement for the old slug and a newly deployed URL for the destination slug, so the new page is included in the post-deploy readiness and baseline inspection flow.
- New post ships → `landing-page-production` promotion runs → `blog-indexing-on-deploy` runs → IndexNow is called, GSC sitemap is submitted, and the bot PR opens with the baseline verdict plus any available 7d/28d traffic metrics.
- Daily monitor runs → at T+1 the post usually moves to `Crawled - currently not indexed`. By T+3–T+7 a healthy post is `Submitted and indexed`. The status table reflects this.
- If T+7 passes and the post is still not indexed, the monitor opens a `Blog indexing — URLs stalled in Search Console` issue listing the affected URLs, re-submits them to IndexNow, and records a history comment on every refresh. Triage manually using the URL Inspection live test if the issue stays open.
- If T+14 passes, a post is indexed, and GSC still reports zero impressions, the monitor opens `Blog traffic — indexed posts with zero impressions`. Treat that as a distribution/query-fit issue, not an indexing issue.
- Daily traffic digest runs at 10:00 Asia/Shanghai. It writes `docs/blog-traffic-digest.md`, uploads the Markdown plus compact JSON summary as an artifact, optionally sends a compact Feishu card, and opens or refreshes `automation/blog-traffic-digest`. The Feishu card is delivery-only; the Markdown file and bot PR remain the source of truth.
The status PR is intentionally not auto-merged. A maintainer reviews each refresh so the daily diff is part of the team's awareness of search-side health.
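The two escalation rules in the steady state above reduce to simple predicates, plus a decision about what to do with the tracking issue. The sketch below is an illustration under assumptions: the `PostState` shape, the inclusive thresholds (`>= 7`, `>= 14`), and the function names are hypothetical and may not match `escalate-stalls.ts` or `escalate-low-traffic.ts`.

```typescript
interface PostState {
  url: string;
  ageDays: number;     // days since the post's publish date
  indexed: boolean;    // latest URL Inspection verdict
  impressions: number; // impressions reported by GSC for the chosen window
}

// Stall: the T+7 checkpoint has passed and the post is still not indexed.
function isStalled(post: PostState): boolean {
  return post.ageDays >= 7 && !post.indexed;
}

// Low traffic: indexed, past T+14, and still zero impressions.
function isLowTraffic(post: PostState): boolean {
  return post.ageDays >= 14 && post.indexed && post.impressions === 0;
}

// Decide what to do with the tracking issue given the affected URLs.
function issueAction(affectedCount: number, issueOpen: boolean): "open" | "refresh" | "close" | "none" {
  if (affectedCount > 0) return issueOpen ? "refresh" : "open";
  return issueOpen ? "close" : "none";
}
```

Keeping the two predicates separate mirrors the guidance above: a stalled URL is an indexing problem, while an indexed post with zero impressions is a distribution or query-fit problem.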
Files
- `apps/landing-page/scripts/blog-indexing/lib.ts`: GSC auth, URL Inspection helper, Search Analytics helper, sitemap helper, retry wrapper, type defs.
- `apps/landing-page/scripts/blog-indexing/detect-changed-urls.ts`: diff the deployed commit against the `blog-indexed-prod` tag (or `HEAD^` on the first run) for added / modified blog files.
- `apps/landing-page/scripts/blog-indexing/verify-readiness.ts`: HTTP, canonical, noindex, and sitemap presence checks; polls until Cloudflare propagation completes.
- `apps/landing-page/scripts/blog-indexing/lint-blog-seo.ts`: source/rendered SEO lint for changed posts in CI.
- `apps/landing-page/scripts/blog-indexing/check-blog-url-changes.ts`: prevents slug deletes/renames without redirects.
- `apps/landing-page/scripts/blog-indexing/submit-indexnow.ts`: submits changed/stalled blog URLs to IndexNow-compatible engines.
- `apps/landing-page/scripts/blog-indexing/submit-sitemap.ts`: PUT the sitemap to Search Console (one call per deploy).
- `apps/landing-page/scripts/blog-indexing/inspect-urls.ts`: call URL Inspection API per URL; emit `InspectionRecord[]`.
- `apps/landing-page/scripts/blog-indexing/query-search-analytics.ts`: query URL-level 7d/28d impressions, clicks, CTR, and position.
- `apps/landing-page/scripts/blog-indexing/render-status.ts`: rewrite `docs/blog-indexing-status.md` from the JSON sidecar.
- `apps/landing-page/scripts/blog-indexing/scheduled-window.ts`: emit URLs in today's T+1 / T+3 / T+7 / T+14 buckets.
- `apps/landing-page/scripts/blog-indexing/escalate-stalls.ts`: decide whether the stall issue needs to open / refresh / close.
- `apps/landing-page/scripts/blog-indexing/escalate-low-traffic.ts`: decide whether indexed-but-zero-impression posts need a traffic issue.
- `apps/landing-page/scripts/blog-indexing/crosspost.ts`: dry-run/token-gated dev.to or Hashnode cross-post scaffold.
- `apps/landing-page/scripts/blog-indexing/report-3day.ts`: daily T-3 cohort + 30-day rolling cohort digest written to `docs/blog-traffic-digest.md`.
- `apps/landing-page/scripts/blog-indexing/post-feishu-digest.ts`: send the compact 3-day digest summary to the optional Feishu custom bot webhook.
- `apps/landing-page/app/pages/rss.xml.ts`
- `apps/landing-page/public/llms.txt`
- `apps/landing-page/public/_redirects`
- `.github/workflows/blog-indexing-on-deploy.yml`
- `.github/workflows/blog-indexing-monitor.yml`
- `.github/workflows/blog-3day-report.yml`
- `docs/blog-indexing-status.md`: human view (auto-generated)
- `docs/blog-indexing-status.json`: canonical state (auto-generated)
- `docs/blog-traffic-digest.md`: daily traffic digest (auto-generated)
The JSON state records firstInspectedAt as the first time automation
successfully captured an inspection for a URL. It is not Google's
first-discovery time; escalation scripts prefer the post frontmatter date
for age windows and only use this inspection timestamp as a fallback.
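The date preference described above can be sketched as a small helper that picks the anchor for age calculations. The function name and the handling of unparseable dates are assumptions for illustration; only the ordering (frontmatter date first, `firstInspectedAt` as fallback) comes from this document.

```typescript
// Prefer the post's frontmatter date; fall back to firstInspectedAt.
// Returns null when neither value parses as a date.
function ageAnchor(
  frontmatterDate: string | undefined,
  firstInspectedAt: string | undefined,
): Date | null {
  for (const candidate of [frontmatterDate, firstInspectedAt]) {
    if (!candidate) continue;
    const ms = Date.parse(candidate);
    if (!Number.isNaN(ms)) return new Date(ms);
  }
  return null;
}
```

Because `firstInspectedAt` only records when automation first captured an inspection, using it as the primary anchor would make late-onboarded posts look younger than they are.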