feat(landing): automate blog indexing monitoring (#1825)

* feat(landing): add blog indexing automation

Automate supported blog discovery checks through sitemap submission, URL Inspection monitoring, IndexNow notifications, and guarded SEO CI checks.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(landing): support oauth for blog indexing

Use OAuth refresh-token auth as the preferred Search Console path while keeping service-account auth as a fallback, so the indexing workflows can run despite GSC service-account invite issues.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(landing): tighten blog indexing observability

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: ashley li <ashleyli@ashleydeMacBook-Air-2.local>
Co-authored-by: Cursor <cursoragent@cursor.com>
ashleyashli 2026-05-15 18:32:30 +08:00 committed by GitHub
parent 25aeb0bf49
commit 772ef97476
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
25 changed files with 2998 additions and 0 deletions

@ -0,0 +1,306 @@
name: blog-indexing-monitor
# Daily check: for every blog post that is 1, 3, 7, or 14 days old
# (counting from the first commit that introduced the file on main),
# call URL Inspection and refresh `docs/blog-indexing-status.md`.
#
# This is the monitoring half of the blog-indexing-automation skill.
# We do NOT call any "request indexing" API for normal blog posts:
# Google's Indexing API only accepts pages with JobPosting or
# BroadcastEvent structured data. When a URL stays in
# `Discovered - currently not indexed` past T+7, we open or refresh a
# tracking issue so a human can fix the underlying SEO/content problem.
on:
schedule:
- cron: '0 2 * * *'
workflow_dispatch:
inputs:
crosspost_url:
description: 'Optional canonical blog URL to dry-run or publish as a cross-post.'
required: false
crosspost_platform:
description: 'Cross-post platform: devto or hashnode.'
required: false
default: devto
publish_crosspost:
description: 'Set true to publish. Default false performs dry-run only.'
required: false
default: 'false'
permissions:
contents: read
issues: write
concurrency:
group: blog-indexing-monitor
cancel-in-progress: false
jobs:
monitor:
name: Monitor recent blog URLs
if: github.repository == 'nexu-io/open-design'
runs-on: ubuntu-latest
timeout-minutes: 15
steps:
- name: Checkout
uses: actions/checkout@v6.0.2
with:
fetch-depth: 0
- name: Setup pnpm
uses: pnpm/action-setup@v5
with:
version: 10.33.2
- name: Setup Node.js
uses: actions/setup-node@v6
with:
node-version: 24
cache: pnpm
- name: Install dependencies (landing-page only)
run: pnpm install --frozen-lockfile --filter @open-design/landing-page
- name: Check indexing configuration
id: config
env:
GSC_OAUTH_CLIENT_ID: ${{ secrets.GSC_OAUTH_CLIENT_ID }}
GSC_OAUTH_CLIENT_SECRET: ${{ secrets.GSC_OAUTH_CLIENT_SECRET }}
GSC_OAUTH_REFRESH_TOKEN: ${{ secrets.GSC_OAUTH_REFRESH_TOKEN }}
GSC_SERVICE_ACCOUNT_KEY: ${{ secrets.GSC_SERVICE_ACCOUNT_KEY }}
BOT_APP_ID: ${{ secrets.BOT_APP_ID }}
BOT_APP_PRIVATE_KEY: ${{ secrets.BOT_APP_PRIVATE_KEY }}
run: |
gsc=false
bot=false
if { [ -n "$GSC_OAUTH_CLIENT_ID" ] && [ -n "$GSC_OAUTH_CLIENT_SECRET" ] && [ -n "$GSC_OAUTH_REFRESH_TOKEN" ]; } || [ -n "$GSC_SERVICE_ACCOUNT_KEY" ]; then
gsc=true
fi
if [ -n "$BOT_APP_ID" ] && [ -n "$BOT_APP_PRIVATE_KEY" ]; then
bot=true
fi
echo "gsc=$gsc" >> "$GITHUB_OUTPUT"
echo "bot=$bot" >> "$GITHUB_OUTPUT"
if [ "$gsc" != "true" ]; then
echo "::warning title=Blog indexing monitor not fully configured::GSC auth is missing; URL Inspection, Search Analytics, status refresh, and escalation checks will be skipped."
fi
if [ "$bot" != "true" ]; then
echo "::warning title=Blog indexing monitor writes disabled::Open Design bot secrets are missing; issue and status PR writes will be skipped."
fi
{
echo "### Blog indexing monitor configuration"
echo "- GSC auth configured: \`$gsc\`"
echo "- Open Design bot configured: \`$bot\`"
if [ "$gsc" != "true" ]; then
echo ""
echo "URL Inspection, Search Analytics, and status refresh steps will be skipped."
fi
if [ "$bot" != "true" ]; then
echo ""
echo "Issue and status PR writes will be skipped."
fi
} >> "$GITHUB_STEP_SUMMARY"
- name: Restore pending indexing status state
run: |
if ! git fetch origin automation/blog-indexing-status:refs/remotes/origin/automation/blog-indexing-status; then
{
echo "### Blog indexing status restore"
echo "No pending \`automation/blog-indexing-status\` branch was found. Starting from the committed status files."
} >> "$GITHUB_STEP_SUMMARY"
exit 0
fi
if git checkout refs/remotes/origin/automation/blog-indexing-status -- \
docs/blog-indexing-status.md \
docs/blog-indexing-status.json; then
{
echo "### Blog indexing status restore"
echo "Restored pending status files from \`automation/blog-indexing-status\`."
} >> "$GITHUB_STEP_SUMMARY"
else
{
echo "### Blog indexing status restore"
echo "Failed to restore pending status files from \`automation/blog-indexing-status\`."
} >> "$GITHUB_STEP_SUMMARY"
echo "::error title=Blog indexing status restore failed::Fetched automation/blog-indexing-status but could not checkout the status files."
exit 1
fi
- name: Compute today's inspection window
id: window
run: |
mkdir -p .blog-indexing
pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/scheduled-window.ts \
--out ../../.blog-indexing/window.json
echo '--- window.json ---'
cat .blog-indexing/window.json
count=$(node -e "console.log(require('./.blog-indexing/window.json').urls.length)")
echo "count=$count" >> "$GITHUB_OUTPUT"
- name: Inspect URLs in window
if: steps.window.outputs.count != '0' && steps.config.outputs.gsc == 'true'
env:
GSC_OAUTH_CLIENT_ID: ${{ secrets.GSC_OAUTH_CLIENT_ID }}
GSC_OAUTH_CLIENT_SECRET: ${{ secrets.GSC_OAUTH_CLIENT_SECRET }}
GSC_OAUTH_REFRESH_TOKEN: ${{ secrets.GSC_OAUTH_REFRESH_TOKEN }}
GSC_SERVICE_ACCOUNT_KEY: ${{ secrets.GSC_SERVICE_ACCOUNT_KEY }}
run: |
node -e "const j=require('./.blog-indexing/window.json');require('fs').writeFileSync('./.blog-indexing/window-urls.json',JSON.stringify({urls:j.urls},null,2))"
pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/inspect-urls.ts \
--urls ../../.blog-indexing/window-urls.json \
--out ../../.blog-indexing/inspections.json
- name: Query Search Console traffic
if: steps.window.outputs.count != '0' && steps.config.outputs.gsc == 'true'
env:
GSC_OAUTH_CLIENT_ID: ${{ secrets.GSC_OAUTH_CLIENT_ID }}
GSC_OAUTH_CLIENT_SECRET: ${{ secrets.GSC_OAUTH_CLIENT_SECRET }}
GSC_OAUTH_REFRESH_TOKEN: ${{ secrets.GSC_OAUTH_REFRESH_TOKEN }}
GSC_SERVICE_ACCOUNT_KEY: ${{ secrets.GSC_SERVICE_ACCOUNT_KEY }}
run: |
pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/query-search-analytics.ts \
--urls ../../.blog-indexing/window-urls.json \
--out ../../.blog-indexing/analytics.json
- name: Render status report
if: steps.window.outputs.count != '0' && steps.config.outputs.gsc == 'true'
run: |
pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/render-status.ts \
--inspections ../../.blog-indexing/inspections.json \
--analytics ../../.blog-indexing/analytics.json
- name: Compute stalls
if: steps.window.outputs.count != '0' && steps.config.outputs.gsc == 'true'
id: stalls
run: |
pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/escalate-stalls.ts \
--out ../../.blog-indexing/stalls.json
should=$(node -e "console.log(require('./.blog-indexing/stalls.json').shouldEscalate)")
echo "should=$should" >> "$GITHUB_OUTPUT"
- name: Re-submit stalled URLs to IndexNow
if: steps.stalls.outputs.should == 'true' && steps.config.outputs.gsc == 'true'
run: |
node -e "const j=require('./.blog-indexing/stalls.json');require('fs').writeFileSync('./.blog-indexing/stalled-urls.json',JSON.stringify({urls:j.stalled.map(s=>s.url)},null,2))"
pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/submit-indexnow.ts \
--urls ../../.blog-indexing/stalled-urls.json \
--out ../../.blog-indexing/stalled-indexnow.json
- name: Compute low-traffic posts
if: steps.window.outputs.count != '0' && steps.config.outputs.gsc == 'true'
id: lowtraffic
run: |
pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/escalate-low-traffic.ts \
--out ../../.blog-indexing/low-traffic.json
should=$(node -e "console.log(require('./.blog-indexing/low-traffic.json').shouldEscalate)")
echo "should=$should" >> "$GITHUB_OUTPUT"
- name: Generate Open Design bot token
if: steps.window.outputs.count != '0' && steps.config.outputs.gsc == 'true' && steps.config.outputs.bot == 'true'
id: open-design-bot-token
uses: actions/create-github-app-token@v2
with:
app-id: ${{ secrets.BOT_APP_ID }}
private-key: ${{ secrets.BOT_APP_PRIVATE_KEY }}
owner: nexu-io
repositories: open-design
permission-contents: write
permission-pull-requests: write
permission-issues: write
- name: Open / refresh stall issue
if: steps.stalls.outputs.should == 'true' && steps.config.outputs.bot == 'true'
env:
GH_TOKEN: ${{ steps.open-design-bot-token.outputs.token }}
run: |
title=$(node -e "console.log(require('./.blog-indexing/stalls.json').issueTitle)")
body_file=.blog-indexing/issue-body.md
node -e "require('fs').writeFileSync('${body_file}',require('./.blog-indexing/stalls.json').issueBody)"
existing=$(gh issue list --repo nexu-io/open-design --state open --search "in:title \"${title}\"" --json number --jq '.[0].number // empty')
if [ -n "$existing" ]; then
echo "Updating issue #$existing"
gh issue edit "$existing" --repo nexu-io/open-design --title "$title" --body-file "$body_file"
gh issue comment "$existing" --repo nexu-io/open-design --body-file "$body_file"
else
echo "Opening new issue"
gh issue create --repo nexu-io/open-design --title "$title" --body-file "$body_file"
fi
- name: Open / refresh low-traffic issue
if: steps.lowtraffic.outputs.should == 'true' && steps.config.outputs.bot == 'true'
env:
GH_TOKEN: ${{ steps.open-design-bot-token.outputs.token }}
run: |
title=$(node -e "console.log(require('./.blog-indexing/low-traffic.json').issueTitle)")
body_file=.blog-indexing/low-traffic-issue-body.md
node -e "require('fs').writeFileSync('${body_file}',require('./.blog-indexing/low-traffic.json').issueBody)"
existing=$(gh issue list --repo nexu-io/open-design --state open --search "in:title \"${title}\"" --json number --jq '.[0].number // empty')
if [ -n "$existing" ]; then
gh issue edit "$existing" --repo nexu-io/open-design --title "$title" --body-file "$body_file"
gh issue comment "$existing" --repo nexu-io/open-design --body-file "$body_file"
else
gh issue create --repo nexu-io/open-design --title "$title" --body-file "$body_file"
fi
- name: Close stall issue if all clear
if: steps.stalls.outputs.should == 'false' && steps.window.outputs.count != '0' && steps.config.outputs.bot == 'true'
env:
GH_TOKEN: ${{ steps.open-design-bot-token.outputs.token }}
run: |
existing=$(gh issue list --repo nexu-io/open-design --state open --search 'in:title "Blog indexing — URLs stalled in Search Console"' --json number --jq '.[0].number // empty')
if [ -n "$existing" ]; then
gh issue comment "$existing" --repo nexu-io/open-design --body 'All previously stalled URLs have reached `indexed` status. Closing automatically. — `blog-indexing-monitor`'
gh issue close "$existing" --repo nexu-io/open-design
fi
- name: Close low-traffic issue if all clear
if: steps.lowtraffic.outputs.should == 'false' && steps.window.outputs.count != '0' && steps.config.outputs.bot == 'true'
env:
GH_TOKEN: ${{ steps.open-design-bot-token.outputs.token }}
run: |
existing=$(gh issue list --repo nexu-io/open-design --state open --search 'in:title "Blog traffic — indexed posts with zero impressions"' --json number --jq '.[0].number // empty')
if [ -n "$existing" ]; then
gh issue comment "$existing" --repo nexu-io/open-design --body 'All previously low-traffic tracked URLs now have impressions or no longer match the escalation window. Closing automatically. — `blog-indexing-monitor`'
gh issue close "$existing" --repo nexu-io/open-design
fi
- name: Open status PR
if: steps.window.outputs.count != '0' && steps.config.outputs.gsc == 'true' && steps.config.outputs.bot == 'true'
uses: peter-evans/create-pull-request@v8
with:
token: ${{ steps.open-design-bot-token.outputs.token }}
add-paths: |
docs/blog-indexing-status.md
docs/blog-indexing-status.json
branch: automation/blog-indexing-status
delete-branch: true
commit-message: 'docs(blog): refresh daily indexing status'
author: 'open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>'
committer: 'open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>'
title: 'docs(blog): refresh daily indexing status'
body: |
Daily refresh of `docs/blog-indexing-status.md` with URL
Inspection verdicts for posts in the T+1 / T+3 / T+7 / T+14
window today.
Generated by `.github/workflows/blog-indexing-monitor.yml`.
See [docs/blog-indexing-automation.md](../docs/blog-indexing-automation.md)
for the architecture.
- name: Optional cross-post scaffold
if: github.event_name == 'workflow_dispatch' && github.event.inputs.crosspost_url != ''
env:
DEVTO_API_KEY: ${{ secrets.DEVTO_API_KEY }}
HASHNODE_TOKEN: ${{ secrets.HASHNODE_TOKEN }}
HASHNODE_PUBLICATION_ID: ${{ secrets.HASHNODE_PUBLICATION_ID }}
# Pass dispatch inputs through env so they are never interpolated
# into the script text (avoids shell injection via crafted inputs).
CROSSPOST_URL: ${{ github.event.inputs.crosspost_url }}
CROSSPOST_PLATFORM: ${{ github.event.inputs.crosspost_platform }}
PUBLISH_CROSSPOST: ${{ github.event.inputs.publish_crosspost }}
run: |
publish_flag=""
if [ "$PUBLISH_CROSSPOST" = "true" ]; then
publish_flag="--publish"
fi
pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/crosspost.ts \
--url "$CROSSPOST_URL" \
--platform "$CROSSPOST_PLATFORM" \
$publish_flag
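
The T+1 / T+3 / T+7 / T+14 selection described in the header comment of this workflow can be sketched as below. This is a minimal sketch, not the shipped `scheduled-window.ts` (which is not shown in this diff): the `Post` shape, the `inspectionWindow` name, and the choice to count whole UTC calendar days are all assumptions made for illustration.

```typescript
// Hypothetical sketch of the daily inspection window; names and the
// UTC-day counting rule are assumptions, not the shipped implementation.
const CHECKPOINT_DAYS: readonly number[] = [1, 3, 7, 14];
const MS_PER_DAY = 24 * 60 * 60 * 1000;

interface Post {
  url: string;
  /** ISO 8601 timestamp of the first commit that introduced the post. */
  firstCommitDate: string;
}

// Number of the UTC calendar day containing `d`, ignoring time of day.
function utcDayNumber(d: Date): number {
  return Math.floor(
    Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()) / MS_PER_DAY,
  );
}

// Age in whole UTC calendar days, so a post committed late yesterday is 1 day old.
function ageInDays(firstCommit: Date, today: Date): number {
  return utcDayNumber(today) - utcDayNumber(firstCommit);
}

// Keep only posts whose age lands exactly on a checkpoint day.
function inspectionWindow(
  posts: Post[],
  today: Date,
): { url: string; ageDays: number }[] {
  return posts
    .map((p) => ({
      url: p.url,
      ageDays: ageInDays(new Date(p.firstCommitDate), today),
    }))
    .filter((p) => CHECKPOINT_DAYS.includes(p.ageDays));
}
```

Counting calendar days rather than elapsed 24-hour periods keeps the result stable no matter when during the day the `0 2 * * *` cron actually fires.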

@ -0,0 +1,265 @@
name: blog-indexing-on-deploy
# Runs after every successful `landing-page-deploy` on main. The job is
# idempotent and follows the blog-indexing-automation skill:
#
# 1. Detect blog URLs added/modified in the deploy
# 2. Verify each URL is operationally ready (200, no noindex, canonical, in sitemap)
# 3. Submit the sitemap-index to Google Search Console (one call per deploy)
# 4. Capture a baseline URL Inspection per new URL (monitoring, not submission)
# 5. Open a PR that updates docs/blog-indexing-status.{md,json}
#
# Indexing is NOT requested via API: Google's Indexing API only accepts
# pages with JobPosting or BroadcastEvent structured data, not normal
# blog content. The skill also explicitly forbids UI automation against
# Search Console. Operationally, sitemap submission plus healthy
# internal linking is what makes URLs discoverable.
on:
workflow_run:
workflows: ['landing-page-deploy']
types: [completed]
branches: [main]
workflow_dispatch:
inputs:
head_sha:
description: 'Commit SHA to diff against its parent. Defaults to HEAD on main.'
required: false
base_sha:
description: 'Optional base SHA for multi-commit deploy diffs.'
required: false
permissions:
contents: read
concurrency:
group: blog-indexing-on-deploy
cancel-in-progress: false
jobs:
index:
name: Index newly deployed blog posts
if: >-
github.repository == 'nexu-io/open-design'
&& (
github.event_name == 'workflow_dispatch'
|| github.event.workflow_run.conclusion == 'success'
)
runs-on: ubuntu-latest
timeout-minutes: 15
steps:
- name: Checkout
uses: actions/checkout@v6.0.2
with:
# Need history to diff against the parent commit and to
# compute first-commit dates for posts.
fetch-depth: 0
ref: >-
${{
github.event.inputs.head_sha
|| github.event.workflow_run.head_sha
|| github.sha
}}
- name: Setup pnpm
uses: pnpm/action-setup@v5
with:
version: 10.33.2
- name: Setup Node.js
uses: actions/setup-node@v6
with:
node-version: 24
cache: pnpm
- name: Install dependencies (landing-page only)
run: pnpm install --frozen-lockfile --filter @open-design/landing-page
- name: Check indexing configuration
id: config
env:
GSC_OAUTH_CLIENT_ID: ${{ secrets.GSC_OAUTH_CLIENT_ID }}
GSC_OAUTH_CLIENT_SECRET: ${{ secrets.GSC_OAUTH_CLIENT_SECRET }}
GSC_OAUTH_REFRESH_TOKEN: ${{ secrets.GSC_OAUTH_REFRESH_TOKEN }}
GSC_SERVICE_ACCOUNT_KEY: ${{ secrets.GSC_SERVICE_ACCOUNT_KEY }}
BOT_APP_ID: ${{ secrets.BOT_APP_ID }}
BOT_APP_PRIVATE_KEY: ${{ secrets.BOT_APP_PRIVATE_KEY }}
run: |
gsc=false
bot=false
if { [ -n "$GSC_OAUTH_CLIENT_ID" ] && [ -n "$GSC_OAUTH_CLIENT_SECRET" ] && [ -n "$GSC_OAUTH_REFRESH_TOKEN" ]; } || [ -n "$GSC_SERVICE_ACCOUNT_KEY" ]; then
gsc=true
fi
if [ -n "$BOT_APP_ID" ] && [ -n "$BOT_APP_PRIVATE_KEY" ]; then
bot=true
fi
echo "gsc=$gsc" >> "$GITHUB_OUTPUT"
echo "bot=$bot" >> "$GITHUB_OUTPUT"
if [ "$gsc" != "true" ]; then
echo "::warning title=Blog indexing not fully configured::GSC auth is missing; sitemap submission, URL Inspection, Search Analytics, and status rendering will be skipped."
fi
if [ "$bot" != "true" ]; then
echo "::warning title=Blog indexing status PR disabled::Open Design bot secrets are missing; status PR creation will be skipped."
fi
{
echo "### Blog indexing configuration"
echo "- GSC auth configured: \`$gsc\`"
echo "- Open Design bot configured: \`$bot\`"
if [ "$gsc" != "true" ]; then
echo ""
echo "GSC-dependent sitemap submission and URL Inspection steps will be skipped."
fi
if [ "$bot" != "true" ]; then
echo ""
echo "Status PR creation will be skipped."
fi
} >> "$GITHUB_STEP_SUMMARY"
- name: Restore pending indexing status state
run: |
if ! git fetch origin automation/blog-indexing-status:refs/remotes/origin/automation/blog-indexing-status; then
{
echo "### Blog indexing status restore"
echo "No pending \`automation/blog-indexing-status\` branch was found. Starting from the committed status files."
} >> "$GITHUB_STEP_SUMMARY"
exit 0
fi
if git checkout refs/remotes/origin/automation/blog-indexing-status -- \
docs/blog-indexing-status.md \
docs/blog-indexing-status.json; then
{
echo "### Blog indexing status restore"
echo "Restored pending status files from \`automation/blog-indexing-status\`."
} >> "$GITHUB_STEP_SUMMARY"
else
{
echo "### Blog indexing status restore"
echo "Failed to restore pending status files from \`automation/blog-indexing-status\`."
} >> "$GITHUB_STEP_SUMMARY"
echo "::error title=Blog indexing status restore failed::Fetched automation/blog-indexing-status but could not checkout the status files."
exit 1
fi
- name: Detect changed blog URLs
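id: detect
env:
# Pass the dispatch input through env so it is never interpolated
# into the script text (avoids shell injection via crafted inputs).
BASE_SHA_INPUT: ${{ github.event.inputs.base_sha || '' }}
run: |
mkdir -p .blog-indexing
BASE="$BASE_SHA_INPUT"
if [ -z "$BASE" ]; then
BASE="$(git rev-parse HEAD^)"
fi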
pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/detect-changed-urls.ts \
--base "$BASE" \
--head HEAD \
--out ../../.blog-indexing/changed-urls.json
echo '--- changed-urls.json ---'
cat .blog-indexing/changed-urls.json
count=$(node -e "const j=require('./.blog-indexing/changed-urls.json');console.log((j.addedUrls.length+j.modifiedUrls.length))")
echo "count=$count" >> "$GITHUB_OUTPUT"
- name: Verify readiness
if: steps.detect.outputs.count != '0'
run: |
pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/verify-readiness.ts \
--urls ../../.blog-indexing/changed-urls.json \
--out ../../.blog-indexing/readiness.json \
--timeout-ms 240000
- name: Submit URLs to IndexNow
if: steps.detect.outputs.count != '0'
run: |
pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/submit-indexnow.ts \
--urls ../../.blog-indexing/changed-urls.json \
--out ../../.blog-indexing/indexnow.json
- name: Submit sitemap to Search Console
if: steps.detect.outputs.count != '0' && steps.config.outputs.gsc == 'true'
env:
GSC_OAUTH_CLIENT_ID: ${{ secrets.GSC_OAUTH_CLIENT_ID }}
GSC_OAUTH_CLIENT_SECRET: ${{ secrets.GSC_OAUTH_CLIENT_SECRET }}
GSC_OAUTH_REFRESH_TOKEN: ${{ secrets.GSC_OAUTH_REFRESH_TOKEN }}
GSC_SERVICE_ACCOUNT_KEY: ${{ secrets.GSC_SERVICE_ACCOUNT_KEY }}
run: |
pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/submit-sitemap.ts
- name: Inspect new URLs (baseline)
if: steps.detect.outputs.count != '0' && steps.config.outputs.gsc == 'true'
env:
GSC_OAUTH_CLIENT_ID: ${{ secrets.GSC_OAUTH_CLIENT_ID }}
GSC_OAUTH_CLIENT_SECRET: ${{ secrets.GSC_OAUTH_CLIENT_SECRET }}
GSC_OAUTH_REFRESH_TOKEN: ${{ secrets.GSC_OAUTH_REFRESH_TOKEN }}
GSC_SERVICE_ACCOUNT_KEY: ${{ secrets.GSC_SERVICE_ACCOUNT_KEY }}
run: |
pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/inspect-urls.ts \
--urls ../../.blog-indexing/changed-urls.json \
--out ../../.blog-indexing/inspections.json
echo '--- inspections.json ---'
cat .blog-indexing/inspections.json
- name: Query Search Console traffic (baseline)
if: steps.detect.outputs.count != '0' && steps.config.outputs.gsc == 'true'
env:
GSC_OAUTH_CLIENT_ID: ${{ secrets.GSC_OAUTH_CLIENT_ID }}
GSC_OAUTH_CLIENT_SECRET: ${{ secrets.GSC_OAUTH_CLIENT_SECRET }}
GSC_OAUTH_REFRESH_TOKEN: ${{ secrets.GSC_OAUTH_REFRESH_TOKEN }}
GSC_SERVICE_ACCOUNT_KEY: ${{ secrets.GSC_SERVICE_ACCOUNT_KEY }}
run: |
pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/query-search-analytics.ts \
--urls ../../.blog-indexing/changed-urls.json \
--out ../../.blog-indexing/analytics.json
- name: Render status report
if: steps.detect.outputs.count != '0' && steps.config.outputs.gsc == 'true'
run: |
pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/render-status.ts \
--inspections ../../.blog-indexing/inspections.json \
--analytics ../../.blog-indexing/analytics.json
- name: Upload artifacts
if: always() && steps.detect.outputs.count != '0'
uses: actions/upload-artifact@v4
with:
name: blog-indexing-${{ github.run_id }}
path: |
.blog-indexing/changed-urls.json
.blog-indexing/readiness.json
.blog-indexing/indexnow.json
.blog-indexing/inspections.json
.blog-indexing/analytics.json
if-no-files-found: ignore
retention-days: 30
- name: Generate Open Design bot token
if: steps.detect.outputs.count != '0' && steps.config.outputs.gsc == 'true' && steps.config.outputs.bot == 'true'
id: open-design-bot-token
uses: actions/create-github-app-token@v2
with:
app-id: ${{ secrets.BOT_APP_ID }}
private-key: ${{ secrets.BOT_APP_PRIVATE_KEY }}
owner: nexu-io
repositories: open-design
permission-contents: write
permission-pull-requests: write
- name: Open status PR
if: steps.detect.outputs.count != '0' && steps.config.outputs.gsc == 'true' && steps.config.outputs.bot == 'true'
uses: peter-evans/create-pull-request@v8
with:
token: ${{ steps.open-design-bot-token.outputs.token }}
add-paths: |
docs/blog-indexing-status.md
docs/blog-indexing-status.json
branch: automation/blog-indexing-status
delete-branch: true
commit-message: 'docs(blog): refresh indexing status after deploy'
author: 'open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>'
committer: 'open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>'
title: 'docs(blog): refresh indexing status after deploy'
body: |
Refreshes `docs/blog-indexing-status.md` with the URL Inspection
verdicts captured immediately after the latest landing-page deploy.
Generated by `.github/workflows/blog-indexing-on-deploy.yml`. The
sidecar `docs/blog-indexing-status.json` is the canonical state;
the markdown file is rendered from it.
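
Both workflows gate their Search Console steps on the same rule: a complete OAuth triple, or a service-account key as fallback. A minimal sketch of how a script might apply the same preference order is shown below. The `GscAuth` type and `pickGscAuth` function are hypothetical names introduced for illustration; the shared `lib.ts` that presumably does this is not part of the visible diff.

```typescript
// Hypothetical sketch of the auth preference order described in the commit
// message: OAuth refresh-token auth first, service account as fallback.
type GscAuth =
  | { kind: 'oauth'; clientId: string; clientSecret: string; refreshToken: string }
  | { kind: 'service-account'; keyJson: string }
  | { kind: 'none' };

function pickGscAuth(env: Record<string, string | undefined>): GscAuth {
  const clientId = env.GSC_OAUTH_CLIENT_ID;
  const clientSecret = env.GSC_OAUTH_CLIENT_SECRET;
  const refreshToken = env.GSC_OAUTH_REFRESH_TOKEN;

  // All three OAuth values must be present; unset GitHub secrets arrive as
  // empty strings, which are falsy here just like `[ -n "$VAR" ]` in the workflow.
  if (clientId && clientSecret && refreshToken) {
    return { kind: 'oauth', clientId, clientSecret, refreshToken };
  }

  const keyJson = env.GSC_SERVICE_ACCOUNT_KEY;
  if (keyJson) return { kind: 'service-account', keyJson };

  return { kind: 'none' };
}
```

A partially configured OAuth triple falls through to the service-account key rather than failing, which mirrors the shell condition in the `Check indexing configuration` step.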

@ -6,6 +6,8 @@ on:
# Workflow files
- .github/workflows/landing-page-ci.yml
- .github/workflows/landing-page-deploy.yml
- .github/workflows/blog-indexing-on-deploy.yml
- .github/workflows/blog-indexing-monitor.yml
# Landing page sources
- apps/landing-page/**
# Content sources globbed by Astro content collections — without
@ -25,6 +27,8 @@ on:
paths:
- .github/workflows/landing-page-ci.yml
- .github/workflows/landing-page-deploy.yml
- .github/workflows/blog-indexing-on-deploy.yml
- .github/workflows/blog-indexing-monitor.yml
- apps/landing-page/**
- skills/**
- design-systems/**
@ -51,6 +55,8 @@ jobs:
steps:
- name: Checkout
uses: actions/checkout@v6.0.2
with:
fetch-depth: 0
- name: Setup pnpm
uses: pnpm/action-setup@v5
@ -99,6 +105,27 @@ jobs:
- name: Build landing page
run: pnpm --filter @open-design/landing-page build
- name: Lint changed blog SEO
run: |
BASE="${{ github.event.pull_request.base.sha || github.event.before || 'HEAD^' }}"
if [ "$BASE" = "0000000000000000000000000000000000000000" ]; then
BASE="HEAD^"
fi
pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/lint-blog-seo.ts \
--base "$BASE" \
--head HEAD \
--rendered-out apps/landing-page/out
- name: Guard blog URL changes
run: |
BASE="${{ github.event.pull_request.base.sha || github.event.before || 'HEAD^' }}"
if [ "$BASE" = "0000000000000000000000000000000000000000" ]; then
BASE="HEAD^"
fi
pnpm --filter @open-design/landing-page exec tsx scripts/blog-indexing/check-blog-url-changes.ts \
--base "$BASE" \
--head HEAD
- name: Verify zero external JavaScript
run: |
node <<'NODE'
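
The base-ref fallback used by the two new CI steps in this hunk (`Lint changed blog SEO` and `Guard blog URL changes`) can be expressed as a small pure function. This is an illustrative sketch only; `resolveBase` is a hypothetical name, and the workflow itself implements the rule in shell.

```typescript
// Hypothetical restatement of the shell fallback: prefer the PR base SHA,
// then the push event's `before` SHA, then the parent commit.
const ZERO_SHA = '0'.repeat(40);

function resolveBase(prBaseSha?: string, pushBefore?: string): string {
  const candidate = prBaseSha || pushBefore || 'HEAD^';
  // GitHub reports an all-zero `before` SHA when a push creates a new ref,
  // so there is nothing to diff against except the parent commit.
  return candidate === ZERO_SHA ? 'HEAD^' : candidate;
}
```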

@ -0,0 +1 @@
96b0928121e24fd7b4ef85ae0f8bf1d8
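
This one-line file appears to be the IndexNow key file that the `submit-indexnow.ts` steps rely on, although the diff does not show its filename, so treat that as an assumption. Under the IndexNow protocol, a batch submission is a JSON `POST` to an endpoint such as `https://api.indexnow.org/indexnow` carrying `host`, `key`, `keyLocation`, and `urlList`. The sketch below builds that payload; `buildIndexNowPayload` is a hypothetical helper, and hosting the key at `/<key>.txt` on the site root is assumed.

```typescript
// Hypothetical payload builder for an IndexNow batch submission.
// Assumes the key file is served from the site root as `<key>.txt`.
interface IndexNowPayload {
  host: string;
  key: string;
  keyLocation: string;
  urlList: string[];
}

function buildIndexNowPayload(
  site: string,
  key: string,
  urls: string[],
): IndexNowPayload {
  const origin = new URL(site);
  // De-duplicate and drop URLs on other hosts; a submission only covers
  // URLs that belong to the host named in the payload.
  const urlList = [...new Set(urls)].filter(
    (u) => new URL(u).host === origin.host,
  );
  return {
    host: origin.host,
    key,
    keyLocation: `${origin.origin}/${key}.txt`,
    urlList,
  };
}
```

The payload would then be sent with `Content-Type: application/json; charset=utf-8`; the sketch stops short of the network call so it stays side-effect free.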

@ -0,0 +1,9 @@
# Cloudflare Pages redirects.
#
# Blog slug changes must add explicit redirects here so Google and
# inbound links do not see a 404. The CI guard accepts either:
#
# /blog/old-slug/ /blog/new-slug/ 301
# /blog/old-slug/ /blog/ 301
#
# Keep this file committed even when empty; it documents the contract.

@ -0,0 +1,32 @@
# Open Design
Open Design is the open-source skill layer that turns local coding
agents into design engines. It is useful for readers researching
agent-native design workflows, local-first design tooling, BYOK design
passes, design systems as Markdown, and open-source alternatives to
hosted AI design tools.
## Canonical Entry Points
- Home: https://open-design.ai/
- Blog: https://open-design.ai/blog/
- Skills: https://open-design.ai/skills/
- Systems: https://open-design.ai/systems/
- Craft: https://open-design.ai/craft/
- Templates: https://open-design.ai/templates/
- RSS: https://open-design.ai/rss.xml
- Sitemap: https://open-design.ai/sitemap-index.xml
## Key Blog Posts
- https://open-design.ai/blog/open-source-alternative-to-claude-design/
- https://open-design.ai/blog/why-we-built-open-design-as-a-skill-layer/
- https://open-design.ai/blog/31-skills-72-systems-how-the-library-works/
- https://open-design.ai/blog/byok-design-workflow-claude-codex-qwen/
- https://open-design.ai/blog/byok-reality-check-5-things-that-break/
## Citation Guidance
Prefer canonical Open Design URLs above. Do not cite preview deploys,
GitHub source pages, or screenshot-only `/og/` routes when a canonical
page exists.

@ -12,3 +12,7 @@ Allow: /
Disallow: /og/
Sitemap: https://open-design.ai/sitemap-index.xml
# Machine-readable discovery surfaces for feed readers and LLM crawlers.
# RSS: https://open-design.ai/rss.xml
# LLMs: https://open-design.ai/llms.txt

@ -0,0 +1,105 @@
/*
* authorize-gsc-oauth: one-time local helper to create a Google OAuth
* refresh token for Search Console automation.
*
* Usage:
* GSC_OAUTH_CLIENT_ID=... GSC_OAUTH_CLIENT_SECRET=... \
* tsx scripts/blog-indexing/authorize-gsc-oauth.ts --out /tmp/gsc-refresh-token.txt
*
* The script starts a loopback server, prints an authorization URL,
* exchanges the callback code, and writes ONLY the refresh token to
* `--out`. Do not commit the output file.
*/
import http from 'node:http';
import { writeFileSync } from 'node:fs';
import { fetchWithRetry } from './lib.ts';
const SCOPE = 'https://www.googleapis.com/auth/webmasters';
const REDIRECT_URI = 'http://127.0.0.1:17666/oauth2callback';
interface Args {
out: string;
}
function parseArgs(argv: string[]): Args {
const args: Partial<Args> = {};
for (let i = 0; i < argv.length; i++) {
if (argv[i] === '--out') args.out = argv[++i];
}
if (!args.out) throw new Error('--out is required');
return args as Args;
}
function waitForCode(): Promise<string> {
return new Promise((resolve, reject) => {
const server = http.createServer((req, res) => {
try {
const url = new URL(req.url ?? '/', REDIRECT_URI);
if (url.pathname !== '/oauth2callback') {
res.writeHead(404);
res.end('Not found');
return;
}
const error = url.searchParams.get('error');
if (error) throw new Error(error);
const code = url.searchParams.get('code');
if (!code) throw new Error('No authorization code in callback.');
res.writeHead(200, { 'content-type': 'text/html; charset=utf-8' });
res.end('<h1>Open Design GSC authorization complete</h1><p>You can close this tab and return to Cursor.</p>');
server.close();
resolve(code);
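} catch (err) {
// Answer the browser before shutting down so the tab does not hang.
if (!res.headersSent) {
res.writeHead(400, { 'content-type': 'text/plain; charset=utf-8' });
res.end('Authorization failed. Check the terminal for details.');
}
server.close();
reject(err);
}
});
// Surface listen failures (for example EADDRINUSE) as a rejection
// instead of an unhandled 'error' event.
server.on('error', reject);
server.listen(17666, '127.0.0.1');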
});
}
async function main() {
const args = parseArgs(process.argv.slice(2));
const clientId = process.env.GSC_OAUTH_CLIENT_ID;
const clientSecret = process.env.GSC_OAUTH_CLIENT_SECRET;
if (!clientId || !clientSecret) {
throw new Error('GSC_OAUTH_CLIENT_ID and GSC_OAUTH_CLIENT_SECRET are required.');
}
const authUrl = new URL('https://accounts.google.com/o/oauth2/v2/auth');
authUrl.searchParams.set('client_id', clientId);
authUrl.searchParams.set('redirect_uri', REDIRECT_URI);
authUrl.searchParams.set('response_type', 'code');
authUrl.searchParams.set('scope', SCOPE);
authUrl.searchParams.set('access_type', 'offline');
authUrl.searchParams.set('prompt', 'consent');
console.log('Open this URL in your browser and approve access:');
console.log(authUrl.toString());
const code = await waitForCode();
const res = await fetchWithRetry('https://oauth2.googleapis.com/token', {
method: 'POST',
headers: { 'content-type': 'application/x-www-form-urlencoded' },
body: new URLSearchParams({
client_id: clientId,
client_secret: clientSecret,
code,
grant_type: 'authorization_code',
redirect_uri: REDIRECT_URI,
}),
});
if (!res.ok) {
throw new Error(`OAuth code exchange failed: ${res.status} ${await res.text()}`);
}
const body = (await res.json()) as { refresh_token?: string };
if (!body.refresh_token) {
throw new Error('Google did not return a refresh_token. Re-run with prompt=consent and ensure the app is in Testing with your email as a test user.');
}
// The refresh token is a long-lived secret: create the file owner-readable only.
writeFileSync(args.out, body.refresh_token, { mode: 0o600 });
console.log(`Refresh token written to ${args.out}`);
}
main().catch((err) => {
console.error(err);
process.exit(1);
});
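
Once the refresh token is stored as `GSC_OAUTH_REFRESH_TOKEN`, the workflows need to trade it for a short-lived access token on each run. The sketch below shows the standard OAuth 2.0 `refresh_token` grant against the same Google token endpoint the helper above uses. `buildRefreshRequest` and `fetchAccessToken` are hypothetical names; the repository's real exchange presumably lives in `lib.ts`, which is not shown here.

```typescript
// Hypothetical sketch of the refresh-token exchange used on each CI run.
const TOKEN_ENDPOINT = 'https://oauth2.googleapis.com/token';

interface OAuthSecrets {
  clientId: string;
  clientSecret: string;
  refreshToken: string;
}

interface TokenRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

// Pure request builder, kept separate so it can be checked without network access.
function buildRefreshRequest(secrets: OAuthSecrets): TokenRequest {
  const body = new URLSearchParams({
    client_id: secrets.clientId,
    client_secret: secrets.clientSecret,
    refresh_token: secrets.refreshToken,
    grant_type: 'refresh_token',
  }).toString();
  return {
    url: TOKEN_ENDPOINT,
    init: {
      method: 'POST',
      headers: { 'content-type': 'application/x-www-form-urlencoded' },
      body,
    },
  };
}

// Performs the exchange and returns the short-lived access token.
async function fetchAccessToken(secrets: OAuthSecrets): Promise<string> {
  const { url, init } = buildRefreshRequest(secrets);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Token refresh failed: ${res.status} ${await res.text()}`);
  }
  const json = (await res.json()) as { access_token?: string };
  if (!json.access_token) throw new Error('Token response had no access_token.');
  return json.access_token;
}
```

The returned value would be sent as an `Authorization: Bearer <token>` header on Search Console API calls.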

@ -0,0 +1,99 @@
/*
* check-blog-url-changes protects live blog URLs from accidental 404s.
*
* Usage:
* tsx check-blog-url-changes.ts --base <sha> [--head <sha>]
*
* If a blog markdown file is deleted or renamed, the old slug must have
* an explicit redirect in apps/landing-page/public/_redirects.
*/
import { existsSync, readFileSync } from 'node:fs';
import path from 'node:path';
import {
REPO_ROOT,
assertSafeGitRef,
fileToSlug,
git,
isPostFile,
} from './lib.ts';
interface Args {
base?: string;
head: string;
}
function parseArgs(argv: string[]): Args {
const args: Args = { head: 'HEAD' };
for (let i = 0; i < argv.length; i++) {
if (argv[i] === '--base') args.base = argv[++i];
else if (argv[i] === '--head') args.head = argv[++i];
}
return args;
}
function redirectsText(): string {
const file = path.join(REPO_ROOT, 'apps/landing-page/public/_redirects');
return existsSync(file) ? readFileSync(file, 'utf8') : '';
}
function hasRedirect(redirects: string, oldSlug: string, newSlug?: string): boolean {
const oldPath = `/blog/${oldSlug}/`;
const lines = redirects
.split('\n')
.map((line) => line.trim())
.filter((line) => line && !line.startsWith('#'));
return lines.some((line) => {
const [from, to, status] = line.split(/\s+/);
if (from !== oldPath) return false;
if (status !== '301' && status !== '302') return false;
if (newSlug) return to === `/blog/${newSlug}/`;
// A redirect back to the removed path itself would loop, so it does not count.
return to.startsWith('/blog/') && to !== oldPath;
});
}
function main() {
const args = parseArgs(process.argv.slice(2));
const head = assertSafeGitRef(args.head, 'head');
const base = assertSafeGitRef(args.base ?? `${head}^`, 'base');
const raw = git(
`diff --name-status ${base} ${head} -- apps/landing-page/app/content/blog/`,
);
const redirects = redirectsText();
const failures: string[] = [];
for (const line of raw.split('\n')) {
if (!line) continue;
const parts = line.split('\t');
const status = parts[0];
if (status.startsWith('R')) {
const [, oldFile, newFile] = parts;
if (!oldFile || !newFile || !isPostFile(oldFile)) continue;
const oldSlug = fileToSlug(oldFile);
const newSlug = isPostFile(newFile) ? fileToSlug(newFile) : undefined;
if (!hasRedirect(redirects, oldSlug, newSlug)) {
failures.push(
newSlug
? `renamed ${oldSlug} -> ${newSlug} but _redirects has no "/blog/${oldSlug}/ /blog/${newSlug}/ 301" entry`
: `renamed ${oldSlug} out of public blog routes but _redirects has no "/blog/${oldSlug}/ /blog/<target>/ 301" entry`,
);
}
} else if (status === 'D') {
const [, oldFile] = parts;
if (!oldFile || !isPostFile(oldFile)) continue;
const oldSlug = fileToSlug(oldFile);
if (!hasRedirect(redirects, oldSlug)) {
failures.push(
`deleted ${oldSlug} but _redirects has no "/blog/${oldSlug}/ /blog/<target>/ 301" entry`,
);
}
}
}
if (failures.length === 0) {
console.log('Blog URL change guard passed.');
return;
}
for (const failure of failures) console.error(`ERROR: ${failure}`);
process.exit(1);
}
main();
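
The guard above reduces to a line-by-line match against `_redirects`. A minimal sketch of that matching, assuming the whitespace-separated `from to status` line format the script already parses. `hasBlogRedirect` and `sampleRedirects` are hypothetical names used only for illustration, not part of the script.

```typescript
// Hypothetical helper (illustration only): true when the redirects text has
// an explicit 301/302 rule from /blog/<oldSlug>/ to a /blog/ destination.
function hasBlogRedirect(redirects: string, oldSlug: string, newSlug?: string): boolean {
  const from = `/blog/${oldSlug}/`;
  for (const rawLine of redirects.split('\n')) {
    const line = rawLine.trim();
    if (!line || line.startsWith('#')) continue; // skip blanks and comments
    const [source, target, status] = line.split(/\s+/);
    if (source !== from || target === undefined) continue;
    if (status !== '301' && status !== '302') continue; // status must be explicit
    const matches =
      newSlug !== undefined ? target === `/blog/${newSlug}/` : target.startsWith('/blog/');
    if (matches) return true;
  }
  return false;
}

// Assumed fixture: the last rule omits the status code, so it does not count.
const sampleRedirects = [
  '# renamed posts',
  '/blog/old-post/ /blog/new-post/ 301',
  '/blog/retired/ /blog/ 302',
  '/blog/no-status/ /blog/somewhere/',
].join('\n');

console.log(hasBlogRedirect(sampleRedirects, 'old-post', 'new-post'));
```

A rule without an explicit status is treated as missing, so a rename or delete only passes once the redirect is written out in full.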


@ -0,0 +1,126 @@
/*
* crosspost: token-gated scaffold for publishing canonical copies to
* high-discovery platforms when a post stalls or needs distribution.
*
* Dry-run by default. It only sends network writes when BOTH are true:
* - --publish is passed
* - platform token exists (DEVTO_API_KEY / HASHNODE_TOKEN)
*
* Usage:
* tsx crosspost.ts --url https://open-design.ai/blog/foo/ --platform devto
* tsx crosspost.ts --url https://open-design.ai/blog/foo/ --platform devto --publish
*/
import { readFileSync } from 'node:fs';
import path from 'node:path';
import { BLOG_DIR, SITE, fetchWithRetry, slugFromUrl } from './lib.ts';
interface Args {
url: string;
platform: 'devto' | 'hashnode';
publish: boolean;
}
function parseArgs(argv: string[]): Args {
const args: Partial<Args> = { platform: 'devto', publish: false };
for (let i = 0; i < argv.length; i++) {
if (argv[i] === '--url') args.url = argv[++i];
else if (argv[i] === '--platform') args.platform = argv[++i] as Args['platform'];
else if (argv[i] === '--publish') args.publish = true;
}
if (!args.url) throw new Error('--url is required');
if (!['devto', 'hashnode'].includes(args.platform!)) {
throw new Error('--platform must be devto or hashnode');
}
return args as Args;
}
function parsePost(url: string): { title: string; summary: string; body: string; tags: string[] } {
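const slug = slugFromUrl(url);
// Reject nested or parent-directory slugs so the URL cannot read outside BLOG_DIR.
if (!slug || slug.includes('/') || slug.includes('..')) {
throw new Error(`Invalid blog slug in URL: ${url}`);
}
const raw = readFileSync(path.join(BLOG_DIR, `${slug}.md`), 'utf8');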
const match = raw.match(/^---\n([\s\S]*?)\n---\n([\s\S]*)$/);
if (!match) throw new Error(`No frontmatter for ${slug}`);
const data: Record<string, string> = {};
for (const line of match[1].split('\n')) {
const m = line.match(/^([A-Za-z0-9_-]+):\s*(.*)$/);
if (m) data[m[1]] = m[2].trim().replace(/^["']|["']$/g, '');
}
return {
title: data.title ?? slug,
summary: data.summary ?? '',
body: match[2].replace(/\]\(\//g, `](${SITE}/`),
tags: ['ai', 'design', 'opensource', 'agents'],
};
}
async function publishDevTo(url: string, post: ReturnType<typeof parsePost>, publish: boolean) {
const token = process.env.DEVTO_API_KEY;
const payload = {
article: {
title: post.title,
body_markdown: `${post.body}\n\n---\n\nOriginally published at ${url}`,
published: publish,
canonical_url: url,
description: post.summary,
tags: post.tags,
},
};
if (!publish || !token) return { dryRun: true, platform: 'devto', payload };
const res = await fetchWithRetry('https://dev.to/api/articles', {
method: 'POST',
headers: {
'api-key': token,
'content-type': 'application/json',
},
body: JSON.stringify(payload),
});
return { dryRun: false, platform: 'devto', status: res.status, body: await res.text() };
}
async function publishHashnode(url: string, post: ReturnType<typeof parsePost>, publish: boolean) {
const token = process.env.HASHNODE_TOKEN;
const publicationId = process.env.HASHNODE_PUBLICATION_ID;
const mutation = `
mutation PublishPost($input: PublishPostInput!) {
publishPost(input: $input) { post { id url } }
}
`;
const variables = {
input: {
publicationId,
title: post.title,
contentMarkdown: `${post.body}\n\n---\n\nOriginally published at ${url}`,
tags: post.tags.map((name) => ({ name, slug: name })),
originalArticleURL: url,
},
};
if (!publish || !token || !publicationId) {
return { dryRun: true, platform: 'hashnode', mutation, variables };
}
const res = await fetchWithRetry('https://gql.hashnode.com/', {
method: 'POST',
headers: {
authorization: token,
'content-type': 'application/json',
},
body: JSON.stringify({ query: mutation, variables }),
});
return { dryRun: false, platform: 'hashnode', status: res.status, body: await res.text() };
}
async function main() {
const args = parseArgs(process.argv.slice(2));
if (!args.url.startsWith(`${SITE}/blog/`)) {
throw new Error(`Refusing to cross-post off-site URL: ${args.url}`);
}
const post = parsePost(args.url);
const result =
args.platform === 'devto'
? await publishDevTo(args.url, post, args.publish)
: await publishHashnode(args.url, post, args.publish);
process.stdout.write(JSON.stringify(result, null, 2) + '\n');
}
main().catch((err) => {
console.error(err);
process.exit(1);
});
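
Before a payload is built, the post goes through two text transforms: a line-based frontmatter split and a rewrite of root-relative markdown links into absolute ones. A minimal sketch of both follows, assuming flat `key: value` frontmatter. `splitFrontmatter`, `absolutizeLinks`, `SITE_ORIGIN`, and `samplePost` are hypothetical names for illustration.

```typescript
const SITE_ORIGIN = 'https://open-design.ai'; // assumed to mirror SITE in lib.ts

interface ParsedPost {
  data: Record<string, string>;
  body: string;
}

// Splits a markdown file into flat frontmatter fields and the body.
function splitFrontmatter(raw: string): ParsedPost {
  const match = raw.match(/^---\n([\s\S]*?)\n---\n([\s\S]*)$/);
  if (!match) return { data: {}, body: raw };
  const data: Record<string, string> = {};
  for (const line of (match[1] ?? '').split('\n')) {
    const m = line.match(/^([A-Za-z0-9_-]+):\s*(.*)$/);
    if (m && m[1] !== undefined && m[2] !== undefined) {
      // Trim and strip one leading and one trailing quote character.
      data[m[1]] = m[2].trim().replace(/^["']|["']$/g, '');
    }
  }
  return { data, body: match[2] ?? '' };
}

// Rewrites root-relative markdown links such as `](/docs/)` to absolute URLs.
function absolutizeLinks(body: string): string {
  return body.replace(/\]\(\//g, `](${SITE_ORIGIN}/`);
}

const samplePost =
  '---\ntitle: "Hello"\nsummary: A post\n---\nSee [docs](/docs/) and [ext](https://example.com/).\n';
const parsedSample = splitFrontmatter(samplePost);
console.log(parsedSample.data, absolutizeLinks(parsedSample.body));
```

Links that already carry a scheme are left untouched, which keeps the cross-posted copy pointing back at the canonical site without double-prefixing external URLs.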


@ -0,0 +1,82 @@
/*
* detect-changed-urls: emit canonical URLs for blog posts ADDED or
* MODIFIED in `${BASE}..${HEAD}`. Renamed posts are reported as added
* under their new path.
*
* Usage: tsx detect-changed-urls.ts --base <sha> [--head <sha>] [--out file.json]
* Default base: `${HEAD}^`. Default head: HEAD. Default out: stdout.
*
* Output JSON shape:
*
* {
* "head": "<sha>",
* "base": "<sha>",
* "addedUrls": ["https://open-design.ai/blog/foo/"],
* "modifiedUrls": ["https://open-design.ai/blog/bar/"]
* }
*
* Underscore-prefixed files (e.g. `_topics.md`) are excluded; they
* never become routes.
*/
import { writeFileSync } from 'node:fs';
import {
assertSafeGitRef,
blogSlugToUrl,
fileToSlug,
git,
isPostFile,
} from './lib.ts';
interface Args {
base?: string;
head: string;
out?: string;
}
function parseArgs(argv: string[]): Args {
const args: Args = { head: 'HEAD' };
for (let i = 0; i < argv.length; i++) {
const a = argv[i];
if (a === '--base') args.base = argv[++i];
else if (a === '--head') args.head = argv[++i];
else if (a === '--out') args.out = argv[++i];
}
return args;
}
function main() {
const { base: baseArg, head, out } = parseArgs(process.argv.slice(2));
const safeHead = assertSafeGitRef(head, 'head');
const base = assertSafeGitRef(baseArg ?? `${safeHead}^`, 'base');
// git diff --name-status emits lines like:
// A\tapps/landing-page/app/content/blog/foo.md
// M\tapps/landing-page/app/content/blog/bar.md
// D\tapps/landing-page/app/content/blog/old.md
// R100\tapps/landing-page/app/content/blog/old.md\tapps/landing-page/app/content/blog/new.md
const raw = git(
`diff --name-status ${base} ${safeHead} -- apps/landing-page/app/content/blog/`,
);
const added: string[] = [];
const modified: string[] = [];
for (const line of raw.split('\n')) {
if (!line) continue;
const [status, file, newFile] = line.split('\t');
const targetFile = status?.startsWith('R') ? newFile : file;
if (!status || !targetFile || !isPostFile(targetFile)) continue;
const url = blogSlugToUrl(fileToSlug(targetFile));
if (status === 'A' || status.startsWith('R')) added.push(url);
else if (status === 'M') modified.push(url);
}
const result = {
head: git(`rev-parse ${safeHead}`),
base: git(`rev-parse ${base}`),
addedUrls: added,
modifiedUrls: modified,
};
const json = JSON.stringify(result, null, 2);
if (out) writeFileSync(out, json + '\n');
else process.stdout.write(json + '\n');
}
main();
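
The classification step can be exercised without a git checkout by feeding it captured `git diff --name-status` text. A minimal sketch under that assumption follows. `classifyNameStatus`, `BLOG_PREFIX`, and `sampleDiff` are hypothetical names, and the URL template is assumed to match `blogSlugToUrl`.

```typescript
const BLOG_PREFIX = 'apps/landing-page/app/content/blog/';

// Turns `git diff --name-status` output into added/modified canonical URLs.
function classifyNameStatus(nameStatus: string): { added: string[]; modified: string[] } {
  const added: string[] = [];
  const modified: string[] = [];
  for (const line of nameStatus.split('\n')) {
    if (!line) continue;
    const [status, file, renamedTo] = line.split('\t');
    if (!status) continue;
    // Rename lines carry two paths; the second is the surviving file.
    const target = status.startsWith('R') ? renamedTo : file;
    if (!target || !target.startsWith(BLOG_PREFIX) || !target.endsWith('.md')) continue;
    const base = target.slice(target.lastIndexOf('/') + 1);
    if (base.startsWith('_')) continue; // underscore files never become routes
    const url = `https://open-design.ai/blog/${base.replace(/\.md$/, '')}/`;
    if (status === 'A' || status.startsWith('R')) added.push(url);
    else if (status === 'M') modified.push(url);
  }
  return { added, modified };
}

const sampleDiff = [
  `A\t${BLOG_PREFIX}foo.md`,
  `M\t${BLOG_PREFIX}bar.md`,
  `D\t${BLOG_PREFIX}old.md`,
  `R100\t${BLOG_PREFIX}before.md\t${BLOG_PREFIX}after.md`,
  `A\t${BLOG_PREFIX}_topics.md`,
].join('\n');

console.log(classifyNameStatus(sampleDiff));
```

Deleted files fall through both branches on purpose: they produce no URL to submit, and the redirect guard is the script that handles them.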


@ -0,0 +1,122 @@
/*
* escalate-low-traffic: decides whether a separate traffic issue should
* be opened for URLs that are indexed but still earning no impressions
* after the T+14 window.
*
* Usage:
* tsx escalate-low-traffic.ts [--state file.json] [--out file.json] [--min-age-days 14]
*/
import { existsSync, readFileSync, writeFileSync } from 'node:fs';
import path from 'node:path';
import { type BlogIndexingState, BLOG_DIR, REPO_ROOT, slugFromUrl } from './lib.ts';
interface Args {
state: string;
out?: string;
minAgeDays: number;
}
function parseArgs(argv: string[]): Args {
const args: Partial<Args> = {};
for (let i = 0; i < argv.length; i++) {
if (argv[i] === '--state') args.state = argv[++i];
else if (argv[i] === '--out') args.out = argv[++i];
else if (argv[i] === '--min-age-days') args.minAgeDays = Number(argv[++i]);
}
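args.state ??= path.join(REPO_ROOT, 'docs/blog-indexing-status.json');
args.minAgeDays ??= 14;
// A non-numeric --min-age-days parses to NaN, which would disable the age filter.
if (!Number.isFinite(args.minAgeDays)) {
throw new Error('--min-age-days must be a number');
}
return args as Args;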
}
function ageDays(firstInspectedAt?: string): number {
if (!firstInspectedAt) return 0;
const since = new Date(firstInspectedAt).getTime();
return Math.floor((Date.now() - since) / 86_400_000);
}
function frontmatterDate(slug: string): string | null {
const file = path.join(BLOG_DIR, `${slug}.md`);
if (!existsSync(file)) return null;
const raw = readFileSync(file, 'utf8');
return raw.match(/^---\n[\s\S]*?\ndate:\s*([0-9]{4}-[0-9]{2}-[0-9]{2})\b/m)?.[1] ?? null;
}
function ageDaysFromDate(date: string): number {
return Math.floor((Date.now() - new Date(`${date}T00:00:00Z`).getTime()) / 86_400_000);
}
function main() {
const { state: stateFile, out, minAgeDays } = parseArgs(process.argv.slice(2));
const state: BlogIndexingState = existsSync(stateFile)
? JSON.parse(readFileSync(stateFile, 'utf8'))
: { latest: {}, history: [] };
const lowTraffic: Array<{
url: string;
slug: string;
ageDays: number;
impressions7d: number;
impressions28d: number;
clicks28d: number;
position28d: number;
}> = [];
for (const [url, record] of Object.entries(state.latest ?? {})) {
if ('error' in record.result || !record.result.isIndexed) continue;
const slug = slugFromUrl(url);
const publishedAt = frontmatterDate(slug);
const firstInspectedAt =
state.firstInspectedAt?.[url] ?? state.firstSeenAt?.[url];
const age = publishedAt
? ageDaysFromDate(publishedAt)
: ageDays(firstInspectedAt);
if (age < minAgeDays) continue;
const perf7 = state.performance?.[url]?.['7'];
const perf28 = state.performance?.[url]?.['28'];
const impressions7d = perf7?.impressions ?? 0;
const impressions28d = perf28?.impressions ?? 0;
if (impressions7d > 0 || impressions28d > 0) continue;
lowTraffic.push({
url,
slug,
ageDays: age,
impressions7d,
impressions28d,
clicks28d: perf28?.clicks ?? 0,
position28d: perf28?.position ?? 0,
});
}
const issueTitle = 'Blog traffic — indexed posts with zero impressions';
const issueBody = lowTraffic.length
? [
'The daily blog monitor found posts that are indexed but still have zero Google Search impressions after the T+14 window.',
'',
'| URL | Age (days) | 7d impressions | 28d impressions | 28d clicks | 28d avg position |',
'|---|---:|---:|---:|---:|---:|',
...lowTraffic.map(
(p) =>
`| ${p.url} | ${p.ageDays} | ${p.impressions7d} | ${p.impressions28d} | ${p.clicks28d} | ${p.position28d || '—'} |`,
),
'',
'This is not an indexing bug. Treat it as a distribution / query-fit issue:',
'',
'1. Re-check title and summary against the target query.',
'2. Add internal links from `/blog/` and the closest related posts.',
'3. Consider cross-posting with canonical URL back to Open Design.',
'4. Re-score the topic in `_topics.md` if it still has zero impressions after 28 days.',
'',
'This issue is generated by `.github/workflows/blog-indexing-monitor.yml`.',
].join('\n')
: '';
const result = {
shouldEscalate: lowTraffic.length > 0,
lowTraffic,
issueTitle,
issueBody,
};
const json = JSON.stringify(result, null, 2);
if (out) writeFileSync(out, json + '\n');
else process.stdout.write(json + '\n');
}
main();
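
The selection rule above combines three conditions: the post is indexed, it is at least the minimum age, and both impression windows are zero. A minimal sketch with an injected clock so the age arithmetic is reproducible follows. `TrafficCandidate`, `ageInDays`, `selectLowTraffic`, and the sample data are hypothetical.

```typescript
const DAY_MS = 86_400_000;

interface TrafficCandidate {
  url: string;
  publishedDate: string; // YYYY-MM-DD, as in post frontmatter
  isIndexed: boolean;
  impressions7d: number;
  impressions28d: number;
}

// Whole days elapsed since midnight UTC of the publish date.
function ageInDays(publishedDate: string, now: Date): number {
  const publishedMs = new Date(`${publishedDate}T00:00:00Z`).getTime();
  return Math.floor((now.getTime() - publishedMs) / DAY_MS);
}

// Keeps only indexed posts that are old enough and have zero impressions.
function selectLowTraffic(candidates: TrafficCandidate[], now: Date, minAgeDays = 14): string[] {
  return candidates
    .filter((c) => c.isIndexed)
    .filter((c) => ageInDays(c.publishedDate, now) >= minAgeDays)
    .filter((c) => c.impressions7d === 0 && c.impressions28d === 0)
    .map((c) => c.url);
}

const fixedNow = new Date('2026-05-15T12:00:00Z'); // assumed clock for the example
const trafficSample: TrafficCandidate[] = [
  { url: 'https://open-design.ai/blog/a/', publishedDate: '2026-05-01', isIndexed: true, impressions7d: 0, impressions28d: 0 },
  { url: 'https://open-design.ai/blog/b/', publishedDate: '2026-05-02', isIndexed: true, impressions7d: 0, impressions28d: 0 },
  { url: 'https://open-design.ai/blog/c/', publishedDate: '2026-04-01', isIndexed: true, impressions7d: 0, impressions28d: 3 },
  { url: 'https://open-design.ai/blog/d/', publishedDate: '2026-04-01', isIndexed: false, impressions7d: 0, impressions28d: 0 },
];

console.log(selectLowTraffic(trafficSample, fixedNow));
```

Passing `now` as a parameter is a choice made for this sketch only; the script itself reads `Date.now()` directly.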


@ -0,0 +1,158 @@
/*
* escalate-stalls: reads the indexing state file and decides whether
* a stall issue needs to be opened/updated.
*
* Usage: tsx escalate-stalls.ts [--state file.json] [--out file.json] [--min-age-days 7]
*
* A URL counts as "stalled" when ALL of the following hold:
* - we have at least one inspection on file
* - the latest verdict is NOT `indexed`
* - the latest coverage state contains `Discovered - currently not indexed`
* OR `Crawled - currently not indexed`
* - the post is at least `--min-age-days` old (default 7)
*
* Output JSON:
*
* {
* "shouldEscalate": boolean,
* "stalled": [{ url, slug, coverageState, ageDays, lastInspected }],
* "issueTitle": string,
* "issueBody": string
* }
*
* The calling workflow uses `gh issue list` to find an existing open
* issue with the same title; if found it updates the body, otherwise
* it opens a new one.
*/
import { existsSync, readFileSync, writeFileSync } from 'node:fs';
import path from 'node:path';
import {
type BlogIndexingState,
BLOG_DIR,
REPO_ROOT,
slugFromUrl,
} from './lib.ts';
interface Args {
state: string;
out?: string;
minAgeDays: number;
}
function parseArgs(argv: string[]): Args {
const args: Partial<Args> = {};
for (let i = 0; i < argv.length; i++) {
if (argv[i] === '--state') args.state = argv[++i];
else if (argv[i] === '--out') args.out = argv[++i];
else if (argv[i] === '--min-age-days') args.minAgeDays = Number(argv[++i]);
}
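args.state ??= path.join(REPO_ROOT, 'docs/blog-indexing-status.json');
args.minAgeDays ??= 7;
// A non-numeric --min-age-days parses to NaN, which would disable the age filter.
if (!Number.isFinite(args.minAgeDays)) {
throw new Error('--min-age-days must be a number');
}
return args as Args;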
}
function frontmatterDate(slug: string): string | null {
const file = path.join(BLOG_DIR, `${slug}.md`);
if (!existsSync(file)) return null;
const raw = readFileSync(file, 'utf8');
return raw.match(/^---\n[\s\S]*?\ndate:\s*([0-9]{4}-[0-9]{2}-[0-9]{2})\b/m)?.[1] ?? null;
}
function ageDaysFromDate(addedAt: string): number {
const a = Date.now();
const b = new Date(addedAt + 'T00:00:00Z').getTime();
return Math.floor((a - b) / 86_400_000);
}
function ageDaysFromIso(iso: string): number {
return Math.floor((Date.now() - new Date(iso).getTime()) / 86_400_000);
}
function postAgeDays(slug: string, record: { url: string; inspectedAt: string }, state: BlogIndexingState): number {
const publishedAt = frontmatterDate(slug);
if (publishedAt) return ageDaysFromDate(publishedAt);
const firstInspectedAt =
state.firstInspectedAt?.[record.url] ?? state.firstSeenAt?.[record.url];
if (firstInspectedAt) return ageDaysFromIso(firstInspectedAt);
return ageDaysFromDate(record.inspectedAt.slice(0, 10));
}
function isStallCoverage(coverageState: string): boolean {
return /Discovered - currently not indexed|Crawled - currently not indexed/i.test(
coverageState,
);
}
function main() {
const { state: stateFile, out, minAgeDays } = parseArgs(process.argv.slice(2));
const state: BlogIndexingState = existsSync(stateFile)
? JSON.parse(readFileSync(stateFile, 'utf8'))
: { latest: {}, history: [] };
const stalled: Array<{
url: string;
slug: string;
coverageState: string;
ageDays: number;
lastInspected: string;
}> = [];
for (const record of Object.values(state.latest)) {
if ('error' in record.result) continue;
const r = record.result;
if (r.isIndexed) continue;
if (!isStallCoverage(r.coverageState)) continue;
const slug = slugFromUrl(record.url);
const age = postAgeDays(slug, record, state);
if (age < minAgeDays) continue;
stalled.push({
url: record.url,
slug,
coverageState: r.coverageState,
ageDays: age,
lastInspected: record.inspectedAt.slice(0, 10),
});
}
const issueTitle = 'Blog indexing — URLs stalled in Search Console';
const issueBody = stalled.length
? [
'The post-deploy + scheduled indexing monitor has detected blog URLs that Google has discovered but is not indexing past the T+7 window.',
'',
'| URL | Coverage state | Age (days) | Last inspected |',
'|---|---|---|---|',
...stalled.map(
(s) => `| ${s.url} | ${s.coverageState} | ${s.ageDays} | ${s.lastInspected} |`,
),
'',
'Likely causes (per blog-indexing-automation skill, Step 5):',
'',
'- thin or duplicate content (Google decided not to index)',
'- canonical or hreflang signal Google disagrees with',
'- low internal linking from indexed pages',
'- crawl-budget pressure (resolves on its own for healthy sites)',
'',
'Resolution path:',
'',
'1. Open each URL in [Search Console URL Inspection](https://search.google.com/search-console/inspect?resource_id=sc-domain%3Aopen-design.ai)',
'2. Confirm the rendered HTML matches what we ship (live test).',
'3. If the page looks fine, improve the underlying SEO/content signals: title/query fit, internal links, canonical clarity, and content depth.',
'4. Redeploy the fix, then let the scheduled monitor re-inspect the URL.',
'',
'This issue is auto-updated by `.github/workflows/blog-indexing-monitor.yml`. It will close itself once all listed URLs reach `indexed` status.',
].join('\n')
: '';
const result = {
shouldEscalate: stalled.length > 0,
stalled,
issueTitle,
issueBody,
};
const json = JSON.stringify(result, null, 2);
if (out) writeFileSync(out, json + '\n');
else process.stdout.write(json + '\n');
}
main();
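
The stall definition in the header comment can be expressed as a single predicate. A minimal sketch follows, assuming the same two coverage-state strings the script tests for. `InspectionOutcome` and `isStalled` are hypothetical names for illustration.

```typescript
type InspectionOutcome =
  | { isIndexed: boolean; coverageState: string }
  | { error: string };

const STALL_PATTERN =
  /Discovered - currently not indexed|Crawled - currently not indexed/i;

// True only when every stall condition from the header comment holds.
function isStalled(result: InspectionOutcome, ageDays: number, minAgeDays = 7): boolean {
  if ('error' in result) return false; // failed inspections are not evidence of a stall
  if (result.isIndexed) return false;
  if (!STALL_PATTERN.test(result.coverageState)) return false;
  return ageDays >= minAgeDays;
}

console.log(
  isStalled({ isIndexed: false, coverageState: 'Discovered - currently not indexed' }, 9),
);
```

Other non-indexed coverage states are deliberately not treated as stalls here, since only the two "currently not indexed" states are named in the script.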


@ -0,0 +1,73 @@
/*
* inspect-urls: calls GSC URL Inspection API for each URL and writes
* an InspectionRecord[] JSON.
*
* Usage: tsx inspect-urls.ts --urls <file-or-csv> [--out file.json]
*
* `--urls` accepts either:
* - a JSON file with shape { addedUrls: string[], modifiedUrls?: string[] }
* OR { urls: string[] } OR string[]
* - a comma-separated list of URLs (e.g. when called with @{job.outputs})
*
* Failures on individual URLs are recorded inline as `{ error: string }`,
* so one bad URL doesn't take the rest of the batch down.
*/
import { readFileSync, writeFileSync } from 'node:fs';
import { type InspectionRecord, fileExists, inspectUrl } from './lib.ts';
interface Args {
urls: string;
out?: string;
}
function parseArgs(argv: string[]): Args {
const args: Partial<Args> = {};
for (let i = 0; i < argv.length; i++) {
if (argv[i] === '--urls') args.urls = argv[++i];
else if (argv[i] === '--out') args.out = argv[++i];
}
if (!args.urls) throw new Error('--urls is required');
return args as Args;
}
function loadUrls(input: string): string[] {
if (fileExists(input)) {
const raw = JSON.parse(readFileSync(input, 'utf8'));
if (Array.isArray(raw)) return raw as string[];
if (Array.isArray(raw.urls)) return raw.urls as string[];
return [...(raw.addedUrls ?? []), ...(raw.modifiedUrls ?? [])] as string[];
}
return input
.split(',')
.map((s) => s.trim())
.filter(Boolean);
}
async function main() {
const { urls: urlsArg, out } = parseArgs(process.argv.slice(2));
const urls = [...new Set(loadUrls(urlsArg))];
const records: InspectionRecord[] = [];
for (const url of urls) {
const inspectedAt = new Date().toISOString();
try {
const result = await inspectUrl(url);
records.push({ url, inspectedAt, result });
} catch (err) {
records.push({
url,
inspectedAt,
result: { error: (err as Error).message },
});
}
}
const json = JSON.stringify(records, null, 2);
if (out) writeFileSync(out, json + '\n');
else process.stdout.write(json + '\n');
}
main().catch((err) => {
console.error(err);
process.exit(1);
});
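
The `--urls` input shapes listed in the header can be normalized by one pure function once the file has been read and parsed. A minimal sketch follows, assuming the caller has already decided whether the argument is a file or a comma-separated list. `urlsFromJson`, `urlsFromCsv`, and `dedupeUrls` are hypothetical names for illustration.

```typescript
// Accepts string[], { urls }, or { addedUrls, modifiedUrls } and returns URLs.
function urlsFromJson(raw: unknown): string[] {
  if (Array.isArray(raw)) {
    return raw.filter((v): v is string => typeof v === 'string');
  }
  if (raw !== null && typeof raw === 'object') {
    const obj = raw as { urls?: unknown; addedUrls?: unknown; modifiedUrls?: unknown };
    if (Array.isArray(obj.urls)) return urlsFromJson(obj.urls);
    return [...urlsFromJson(obj.addedUrls ?? []), ...urlsFromJson(obj.modifiedUrls ?? [])];
  }
  return [];
}

// Splits a comma-separated list, dropping blanks and surrounding whitespace.
function urlsFromCsv(input: string): string[] {
  return input
    .split(',')
    .map((s) => s.trim())
    .filter(Boolean);
}

// Removes duplicates while preserving first-seen order.
function dedupeUrls(urls: string[]): string[] {
  return Array.from(new Set(urls));
}

console.log(
  dedupeUrls(
    urlsFromJson({
      addedUrls: ['https://open-design.ai/blog/foo/'],
      modifiedUrls: ['https://open-design.ai/blog/foo/', 'https://open-design.ai/blog/bar/'],
    }),
  ),
);
```

Unlike the script, this sketch drops non-string entries instead of casting them, which is a stricter choice made only for the example.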


@ -0,0 +1,464 @@
/*
* Blog indexing shared helpers.
*
* One-stop module for the post-deploy / cron indexing scripts. Keeps
* the surface tiny so each task script (detect-changed-urls,
* verify-readiness, submit-sitemap, inspect-urls, render-status,
* scheduled-window) stays focused.
*
* Authoritative reference: ~/.codex/skills/blog-indexing-automation/SKILL.md.
*
* - Treat URL Inspection as a monitoring API, not a submission API.
* - Treat Google Indexing API as out of scope for normal blog posts.
* - One sitemap submission per deploy, not one per URL.
*
* Auth supports two modes:
* 1. OAuth user refresh token:
* `GSC_OAUTH_CLIENT_ID`, `GSC_OAUTH_CLIENT_SECRET`,
* `GSC_OAUTH_REFRESH_TOKEN`
* 2. Service account JSON:
* `GSC_SERVICE_ACCOUNT_KEY`
*
* Prefer OAuth while Search Console has intermittent "service account
* email not found" bugs in the Users and permissions UI.
*/
import { execSync } from 'node:child_process';
import { createSign } from 'node:crypto';
import { existsSync, readFileSync } from 'node:fs';
import path from 'node:path';
import { fileURLToPath } from 'node:url';
export const SITE = 'https://open-design.ai';
export const GSC_SITE_URL = 'sc-domain:open-design.ai';
export const SITEMAP_URL = `${SITE}/sitemap-index.xml`;
export const SITEMAP_CHILD_URL = `${SITE}/sitemap-0.xml`;
export const INDEXNOW_KEY = '96b0928121e24fd7b4ef85ae0f8bf1d8';
export const INDEXNOW_KEY_LOCATION = `${SITE}/${INDEXNOW_KEY}.txt`;
const HERE = path.dirname(fileURLToPath(import.meta.url));
export const REPO_ROOT = path.resolve(HERE, '../../../..');
export const BLOG_DIR = path.join(
REPO_ROOT,
'apps/landing-page/app/content/blog',
);
export interface ServiceAccountKey {
client_email: string;
private_key: string;
token_uri?: string;
}
interface TokenResponse {
access_token: string;
expires_in: number;
}
export interface InspectionVerdict {
/** Pass-through verdict from URL Inspection API. */
verdict: 'PASS' | 'PARTIAL' | 'FAIL' | 'NEUTRAL' | 'VERDICT_UNSPECIFIED';
coverageState: string;
pageFetchState?: string;
indexingState?: string;
lastCrawlTime?: string;
googleCanonical?: string;
userCanonical?: string;
robotsTxtState?: string;
/** True when Google has indexed the URL. */
isIndexed: boolean;
}
export interface InspectionRecord {
url: string;
inspectedAt: string;
result: InspectionVerdict | { error: string };
}
export interface ReadinessResult {
url: string;
ok: boolean;
failures: string[];
status?: number;
canonical?: string;
}
export interface SearchAnalyticsRecord {
url: string;
queriedAt: string;
windowDays: 7 | 28;
startDate: string;
endDate: string;
clicks: number;
impressions: number;
ctr: number;
position: number;
}
export interface BlogIndexingState {
/** url -> latest URL Inspection record */
latest: Record<string, InspectionRecord>;
/** newest first, capped by renderer */
history: InspectionRecord[];
/** url -> window -> latest Search Analytics record */
performance?: Record<string, Partial<Record<'7' | '28', SearchAnalyticsRecord>>>;
/** url -> ISO timestamp first inspected by the indexing workflows */
firstInspectedAt?: Record<string, string>;
/** @deprecated Migrated to firstInspectedAt. Kept for pending status branch reads. */
firstSeenAt?: Record<string, string>;
}
export function sleep(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms));
}
/**
* Fetch wrapper with conservative retry/backoff for the flaky parts of
* this automation (Google APIs, IndexNow, live sitemap polling).
*/
export async function fetchWithRetry(
url: string,
init: RequestInit = {},
options: { attempts?: number; baseDelayMs?: number } = {},
): Promise<Response> {
const attempts = options.attempts ?? 3;
const baseDelayMs = options.baseDelayMs ?? 1_000;
let lastError: unknown;
for (let attempt = 1; attempt <= attempts; attempt++) {
try {
const res = await fetch(url, init);
if (res.ok || ![408, 429, 500, 502, 503, 504].includes(res.status)) {
return res;
}
lastError = new Error(`${res.status} ${await res.text()}`);
} catch (err) {
lastError = err;
}
if (attempt < attempts) {
await sleep(baseDelayMs * 2 ** (attempt - 1));
}
}
throw lastError instanceof Error ? lastError : new Error(String(lastError));
}
/* ----------------------------- auth ----------------------------- */
let cachedToken: { token: string; expiresAt: number } | null = null;
/**
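* Returns a Google OAuth2 access token for Search Console calls.
* Prefers the OAuth refresh-token credentials (`GSC_OAUTH_*`) and falls
* back to the service account in `GSC_SERVICE_ACCOUNT_KEY`. The token is
* cached in-process until 10 minutes before it expires (~50 minutes for
* a typical one-hour token).
*
* On the service-account path, the JWT is signed locally (RS256) and
* exchanged at Google's OAuth2 token endpoint. We avoid the full
* `googleapis` package to keep the landing-page workspace dep-free for
* what is purely a CI surface.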
*/
export async function getAccessToken(): Promise<string> {
if (cachedToken && Date.now() < cachedToken.expiresAt) {
return cachedToken.token;
}
const oauthToken = await getOAuthAccessToken();
if (oauthToken) return oauthToken;
return getServiceAccountAccessToken();
}
async function getOAuthAccessToken(): Promise<string | null> {
const clientId = process.env.GSC_OAUTH_CLIENT_ID;
const clientSecret = process.env.GSC_OAUTH_CLIENT_SECRET;
const refreshToken = process.env.GSC_OAUTH_REFRESH_TOKEN;
if (!clientId || !clientSecret || !refreshToken) return null;
const res = await fetchWithRetry('https://oauth2.googleapis.com/token', {
method: 'POST',
headers: { 'content-type': 'application/x-www-form-urlencoded' },
body: new URLSearchParams({
client_id: clientId,
client_secret: clientSecret,
refresh_token: refreshToken,
grant_type: 'refresh_token',
}),
});
if (!res.ok) {
throw new Error(`OAuth token refresh failed: ${res.status} ${await res.text()}`);
}
const body = (await res.json()) as TokenResponse;
cachedToken = {
token: body.access_token,
expiresAt: Date.now() + (body.expires_in - 600) * 1000,
};
return cachedToken.token;
}
async function getServiceAccountAccessToken(): Promise<string> {
const raw = process.env.GSC_SERVICE_ACCOUNT_KEY;
if (!raw) {
throw new Error(
'No GSC auth configured. Set either GSC_OAUTH_CLIENT_ID/GSC_OAUTH_CLIENT_SECRET/GSC_OAUTH_REFRESH_TOKEN or GSC_SERVICE_ACCOUNT_KEY.',
);
}
const key = JSON.parse(raw) as ServiceAccountKey;
const now = Math.floor(Date.now() / 1000);
const claim = {
iss: key.client_email,
scope: 'https://www.googleapis.com/auth/webmasters',
aud: key.token_uri ?? 'https://oauth2.googleapis.com/token',
iat: now,
exp: now + 3600,
};
const header = { alg: 'RS256', typ: 'JWT' };
const b64 = (s: string) =>
Buffer.from(s)
.toString('base64')
.replace(/=/g, '')
.replace(/\+/g, '-')
.replace(/\//g, '_');
const signingInput = `${b64(JSON.stringify(header))}.${b64(JSON.stringify(claim))}`;
const signer = createSign('RSA-SHA256');
signer.update(signingInput);
signer.end();
const signature = signer
.sign(key.private_key)
.toString('base64')
.replace(/=/g, '')
.replace(/\+/g, '-')
.replace(/\//g, '_');
const jwt = `${signingInput}.${signature}`;
const res = await fetchWithRetry(claim.aud, {
method: 'POST',
headers: { 'content-type': 'application/x-www-form-urlencoded' },
body: new URLSearchParams({
grant_type: 'urn:ietf:params:oauth:grant-type:jwt-bearer',
assertion: jwt,
}),
});
if (!res.ok) {
throw new Error(
`Token exchange failed: ${res.status} ${await res.text()}`,
);
}
const body = (await res.json()) as TokenResponse;
cachedToken = {
token: body.access_token,
// Refresh 10 minutes before expiry.
expiresAt: Date.now() + (body.expires_in - 600) * 1000,
};
return cachedToken.token;
}
/* ---------------------- GSC REST helpers ---------------------- */
/**
* Submits (or re-submits) a sitemap to Google Search Console.
* Idempotent: calling repeatedly is safe.
*/
export async function submitSitemap(feedpath = SITEMAP_URL): Promise<void> {
const token = await getAccessToken();
const url = `https://www.googleapis.com/webmasters/v3/sites/${encodeURIComponent(GSC_SITE_URL)}/sitemaps/${encodeURIComponent(feedpath)}`;
const res = await fetchWithRetry(url, {
method: 'PUT',
headers: { authorization: `Bearer ${token}` },
});
if (!res.ok && res.status !== 204) {
throw new Error(`Sitemap submit failed (${res.status}): ${await res.text()}`);
}
}
/**
* Calls URL Inspection API for one URL.
* Treat the response as MONITORING data, not a submission for indexing.
*/
export async function inspectUrl(url: string): Promise<InspectionVerdict> {
const token = await getAccessToken();
const res = await fetchWithRetry(
'https://searchconsole.googleapis.com/v1/urlInspection/index:inspect',
{
method: 'POST',
headers: {
authorization: `Bearer ${token}`,
'content-type': 'application/json',
},
body: JSON.stringify({
inspectionUrl: url,
siteUrl: GSC_SITE_URL,
languageCode: 'en-US',
}),
},
);
if (!res.ok) {
throw new Error(`URL Inspection failed (${res.status}): ${await res.text()}`);
}
const body = (await res.json()) as {
inspectionResult?: {
indexStatusResult?: {
verdict?: InspectionVerdict['verdict'];
coverageState?: string;
pageFetchState?: string;
indexingState?: string;
lastCrawlTime?: string;
googleCanonical?: string;
userCanonical?: string;
robotsTxtState?: string;
};
};
};
const isr = body.inspectionResult?.indexStatusResult ?? {};
const verdict = isr.verdict ?? 'VERDICT_UNSPECIFIED';
return {
verdict,
coverageState: isr.coverageState ?? 'UNKNOWN',
pageFetchState: isr.pageFetchState,
indexingState: isr.indexingState,
lastCrawlTime: isr.lastCrawlTime,
googleCanonical: isr.googleCanonical,
userCanonical: isr.userCanonical,
robotsTxtState: isr.robotsTxtState,
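isIndexed:
// The verdict check is load-bearing: the case-insensitive pattern alone
// would also match "... currently not indexed" coverage states.
verdict === 'PASS' &&
/Submitted and indexed|Indexed/i.test(isr.coverageState ?? ''),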
};
}
/**
* Pulls Search Console Performance data for one canonical URL. This is
* the traffic half of the workflow: indexed pages can still earn zero
* impressions, and that is a different problem than discovery.
*/
export async function querySearchAnalytics(
url: string,
windowDays: 7 | 28,
): Promise<SearchAnalyticsRecord> {
const token = await getAccessToken();
const end = new Date();
// GSC Search Analytics data lags by ~2 days. Querying through
// yesterday often returns partial data; use a stable 2-day offset.
end.setUTCDate(end.getUTCDate() - 2);
const start = new Date(end);
start.setUTCDate(start.getUTCDate() - windowDays + 1);
const fmt = (d: Date) => d.toISOString().slice(0, 10);
const endpoint = `https://www.googleapis.com/webmasters/v3/sites/${encodeURIComponent(GSC_SITE_URL)}/searchAnalytics/query`;
const res = await fetchWithRetry(endpoint, {
method: 'POST',
headers: {
authorization: `Bearer ${token}`,
'content-type': 'application/json',
},
body: JSON.stringify({
startDate: fmt(start),
endDate: fmt(end),
dimensions: ['page'],
rowLimit: 1,
dimensionFilterGroups: [
{
filters: [
{
dimension: 'page',
operator: 'equals',
expression: url,
},
],
},
],
}),
});
if (!res.ok) {
throw new Error(`Search Analytics failed (${res.status}): ${await res.text()}`);
}
const body = (await res.json()) as {
rows?: Array<{
clicks?: number;
impressions?: number;
ctr?: number;
position?: number;
}>;
};
const row = body.rows?.[0] ?? {};
return {
url,
queriedAt: new Date().toISOString(),
windowDays,
startDate: fmt(start),
endDate: fmt(end),
clicks: row.clicks ?? 0,
impressions: row.impressions ?? 0,
ctr: row.ctr ?? 0,
position: row.position ?? 0,
};
}
/* --------------------------- URLs ---------------------------- */
/**
* Maps a blog markdown filename (sans extension) to its canonical URL.
* Mirrors `apps/landing-page/app/pages/blog/[slug].astro`.
*/
export function blogSlugToUrl(slug: string): string {
return `${SITE}/blog/${slug}/`;
}
/** Returns true for a blog post markdown file the loader will surface. */
export function isPostFile(file: string): boolean {
const base = path.basename(file);
return (
file.startsWith('apps/landing-page/app/content/blog/') &&
base.endsWith('.md') &&
!base.startsWith('_')
);
}
/** Strips the blog prefix and `.md` to derive the post slug. */
export function fileToSlug(file: string): string {
return path.basename(file).replace(/\.md$/, '');
}
/* -------------------------- IO utils ------------------------- */
export function readJsonFile<T>(file: string): T {
return JSON.parse(readFileSync(file, 'utf8')) as T;
}
export function fileExists(file: string): boolean {
return existsSync(file);
}
export function loadUrlInput(input: string): string[] {
if (fileExists(input)) {
const raw = JSON.parse(readFileSync(input, 'utf8'));
if (Array.isArray(raw)) return raw as string[];
if (Array.isArray(raw.urls)) return raw.urls as string[];
return [...(raw.addedUrls ?? []), ...(raw.modifiedUrls ?? [])] as string[];
}
return input
.split(',')
.map((s) => s.trim())
.filter(Boolean);
}
export function slugFromUrl(url: string): string {
return url.replace(/^https?:\/\/[^/]+\/blog\//, '').replace(/\/$/, '');
}
/**
* Workflow-dispatch inputs can reach git diff/log commands. Keep them to
* plain ref-ish values so shell metacharacters never become part of a command.
*/
export function assertSafeGitRef(value: string, label: string): string {
if (!/^[A-Za-z0-9_./:-]+$/.test(value)) {
throw new Error(`Unsafe git ref for ${label}: ${value}`);
}
return value;
}
/**
* Runs `git <cmd>` from the repo root and returns trimmed stdout.
* All blog-indexing scripts must use this rather than execing `git`
* directly: the scripts are invoked from arbitrary cwds (locally, in
* CI, from `pnpm --filter` which sets cwd to the package dir) and
* relative paths in `git diff` / `git log` would otherwise resolve
* against the wrong directory.
export function git(cmd: string): string {
return execSync(`git ${cmd}`, { encoding: 'utf8', cwd: REPO_ROOT }).trim();
}
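
The Search Analytics date window in `querySearchAnalytics` is easy to get off by one, so the arithmetic is worth isolating. A minimal sketch follows, assuming the same two-day reporting lag and an inclusive window. `analyticsWindow` and its `lagDays` parameter are hypothetical and exist only to make the clock injectable.

```typescript
// Computes the inclusive [startDate, endDate] window, ending `lagDays`
// before `now`, in UTC calendar days.
function analyticsWindow(
  now: Date,
  windowDays: number,
  lagDays = 2,
): { startDate: string; endDate: string } {
  const end = new Date(now.getTime());
  end.setUTCDate(end.getUTCDate() - lagDays);
  const start = new Date(end.getTime());
  // Subtract windowDays - 1 so the window contains exactly windowDays dates.
  start.setUTCDate(start.getUTCDate() - windowDays + 1);
  const fmt = (d: Date): string => d.toISOString().slice(0, 10);
  return { startDate: fmt(start), endDate: fmt(end) };
}

const windowNow = new Date('2026-05-15T10:00:00Z'); // assumed clock for the example
console.log(analyticsWindow(windowNow, 7), analyticsWindow(windowNow, 28));
```

`setUTCDate` rolls over month boundaries on its own, which is why the 28-day window can start in the previous month without extra handling.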


@ -0,0 +1,188 @@
/*
* lint-blog-seo: source + rendered SEO checks for Open Design blog posts.
*
* Usage:
* tsx lint-blog-seo.ts [--base <sha> --head <sha>] [--files file1,file2]
* [--rendered-out ../../apps/landing-page/out]
*
* Rules are intentionally split into:
* - errors: technical/indexability blockers that fail CI
* - warnings: editorial quality signals that should be reviewed but
* should not block shipping by themselves
*/
import { existsSync, readFileSync, readdirSync } from 'node:fs';
import path from 'node:path';
import {
BLOG_DIR,
REPO_ROOT,
assertSafeGitRef,
fileToSlug,
git,
isPostFile,
} from './lib.ts';
interface Args {
base?: string;
head: string;
files?: string[];
renderedOut?: string;
}
interface Finding {
file: string;
level: 'error' | 'warning';
message: string;
}
function parseArgs(argv: string[]): Args {
const args: Args = { head: 'HEAD' };
for (let i = 0; i < argv.length; i++) {
const a = argv[i];
if (a === '--base') args.base = argv[++i];
else if (a === '--head') args.head = argv[++i];
else if (a === '--files') {
args.files = argv[++i]
.split(',')
.map((s) => s.trim())
.filter(Boolean);
} else if (a === '--rendered-out') args.renderedOut = argv[++i];
}
return args;
}
function changedFiles(base: string, head: string): string[] {
const safeBase = assertSafeGitRef(base, 'base');
const safeHead = assertSafeGitRef(head, 'head');
const raw = git(`diff --name-only --diff-filter=AM ${safeBase} ${safeHead} -- apps/landing-page/app/content/blog/`);
return raw.split('\n').filter(Boolean).filter(isPostFile);
}
function allPostFiles(): string[] {
return readdirSync(BLOG_DIR)
.filter((f) => f.endsWith('.md') && !f.startsWith('_'))
.map((f) => `apps/landing-page/app/content/blog/${f}`);
}
function parseFrontmatter(raw: string): { data: Record<string, string>; body: string } {
const match = raw.match(/^---\n([\s\S]*?)\n---\n([\s\S]*)$/);
if (!match) return { data: {}, body: raw };
const data: Record<string, string> = {};
for (const line of match[1].split('\n')) {
const m = line.match(/^([A-Za-z0-9_-]+):\s*(.*)$/);
if (!m) continue;
data[m[1]] = m[2].trim().replace(/^["']|["']$/g, '');
}
return { data, body: match[2] };
}
function markdownLinks(body: string): Array<{ text: string; href: string }> {
return Array.from(body.matchAll(/\[([^\]]+)\]\(([^)]+)\)/g)).map((m) => ({
text: m[1],
href: m[2],
}));
}
function checkRendered(slug: string, renderedOut: string, file: string): Finding[] {
const htmlPath = path.join(REPO_ROOT, renderedOut, 'blog', slug, 'index.html');
if (!existsSync(htmlPath)) return [];
const html = readFileSync(htmlPath, 'utf8');
const findings: Finding[] = [];
if (!/<link\b[^>]*\brel=["']canonical["'][^>]*\bhref=["'][^"']+["']/i.test(html)) {
findings.push({ file, level: 'error', message: 'rendered page has no canonical link' });
}
if (!/<script\b[^>]*type=["']application\/ld\+json["'][^>]*>/i.test(html)) {
findings.push({ file, level: 'error', message: 'rendered page has no Article JSON-LD' });
}
if (!/<meta\b[^>]*property=["']og:image["'][^>]*content=["'][^"']+["']/i.test(html)) {
findings.push({ file, level: 'error', message: 'rendered page has no og:image' });
}
if (/<meta\b[^>]*name=["']robots["'][^>]*content=["'][^"']*\bnoindex\b/i.test(html)) {
findings.push({ file, level: 'error', message: 'rendered page is noindex' });
}
return findings;
}
function lintFile(file: string, renderedOut?: string): Finding[] {
const abs = path.join(REPO_ROOT, file);
const raw = readFileSync(abs, 'utf8');
const { data, body } = parseFrontmatter(raw);
const findings: Finding[] = [];
const title = data.title ?? '';
const summary = data.summary ?? '';
const required = ['title', 'date', 'category', 'readingTime', 'summary'];
for (const key of required) {
if (!data[key]) findings.push({ file, level: 'error', message: `missing frontmatter: ${key}` });
}
if (title && (title.length < 40 || title.length > 65)) {
findings.push({
file,
level: 'warning',
message: `title length ${title.length}; target is 40-65 characters`,
});
}
if (summary && (summary.length < 120 || summary.length > 200)) {
findings.push({
file,
level: 'warning',
message: `summary length ${summary.length}; target is 120-200 characters`,
});
}
const links = markdownLinks(body);
const internal = links.filter((l) => l.href.startsWith('/blog/') || l.href.startsWith('/skills/') || l.href.startsWith('/systems/') || l.href.startsWith('/craft/'));
const external = links.filter((l) => /^https?:\/\//.test(l.href));
if (internal.length < 2) {
findings.push({
file,
level: 'warning',
message: `only ${internal.length} internal links; target is at least 2`,
});
}
if (external.length < 1) {
findings.push({ file, level: 'warning', message: 'no external authoritative links found' });
}
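// Strip fenced code blocks first so a shell comment such as `# install`
// inside a fence is not reported as an H1.
if (/^#\s+/m.test(body.replace(/^`{3}[\s\S]*?^`{3}/gm, ''))) {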
findings.push({
file,
level: 'error',
message: 'markdown body contains an H1; route already renders frontmatter title as H1',
});
}
if (/\b(the future of design|AI is replacing designers|unlock your creativity)\b/i.test(body)) {
findings.push({
file,
level: 'warning',
message: 'body contains a generic/banned blog-factory phrase',
});
}
if (renderedOut) {
findings.push(...checkRendered(fileToSlug(file), renderedOut, file));
}
return findings;
}
function main() {
const args = parseArgs(process.argv.slice(2));
const files = args.files
? args.files.filter(isPostFile)
: args.base
? changedFiles(args.base, args.head)
: allPostFiles();
if (files.length === 0) {
console.log('No changed blog posts to lint.');
return;
}
const findings = files.flatMap((file) => lintFile(file, args.renderedOut));
for (const finding of findings) {
console.log(`${finding.level.toUpperCase()}: ${finding.file}: ${finding.message}`);
}
const errors = findings.filter((f) => f.level === 'error');
const warnings = findings.filter((f) => f.level === 'warning');
console.log(
`Blog SEO lint checked ${files.length} post(s): ${errors.length} error(s), ${warnings.length} warning(s).`,
);
if (errors.length > 0) process.exitCode = 1;
}
main();


@ -0,0 +1,49 @@
/*
* query-search-analytics fetches GSC Search Analytics metrics for
* canonical blog URLs over 7d and 28d windows.
*
* Usage:
* tsx query-search-analytics.ts --urls <file-or-csv> [--out file.json]
*/
import { writeFileSync } from 'node:fs';
import {
type SearchAnalyticsRecord,
SITE,
loadUrlInput,
querySearchAnalytics,
} from './lib.ts';
interface Args {
urls: string;
out?: string;
}
function parseArgs(argv: string[]): Args {
const args: Partial<Args> = {};
for (let i = 0; i < argv.length; i++) {
if (argv[i] === '--urls') args.urls = argv[++i];
else if (argv[i] === '--out') args.out = argv[++i];
}
if (!args.urls) throw new Error('--urls is required');
return args as Args;
}
async function main() {
const args = parseArgs(process.argv.slice(2));
const urls = [...new Set(loadUrlInput(args.urls))].filter((url) =>
url.startsWith(`${SITE}/blog/`),
);
const records: SearchAnalyticsRecord[] = [];
for (const url of urls) {
records.push(await querySearchAnalytics(url, 7));
records.push(await querySearchAnalytics(url, 28));
}
const json = JSON.stringify(records, null, 2);
if (args.out) writeFileSync(args.out, json + '\n');
else process.stdout.write(json + '\n');
}
main().catch((err) => {
console.error(err);
process.exit(1);
});


@ -0,0 +1,193 @@
/*
* render-status: merges new InspectionRecord[] into the human-facing
* status report `docs/blog-indexing-status.md`.
*
* Usage: tsx render-status.ts [--inspections <file.json>] [--analytics <file.json>]
* [--state file.json] [--out file.md]
*
* Strategy: the markdown file is regenerated wholesale from a JSON
* sidecar (`docs/blog-indexing-status.json`) so diffs stay legible and
* we don't risk parsing markdown to update it. Per-URL latest record
* wins; older records are kept under "Recent inspections" capped at 50
* entries.
*
* If `--inspections` is omitted or the file does not exist, the script
* still rewrites the markdown from the existing sidecar (useful for
* backfill / formatting changes).
*/
import { existsSync, readFileSync, writeFileSync } from 'node:fs';
import path from 'node:path';
import {
type BlogIndexingState,
type InspectionRecord,
type SearchAnalyticsRecord,
REPO_ROOT,
readJsonFile,
} from './lib.ts';
interface Args {
inspections?: string;
analytics?: string;
out: string;
state: string;
}
function parseArgs(argv: string[]): Args {
const args: Partial<Args> = {};
for (let i = 0; i < argv.length; i++) {
if (argv[i] === '--inspections') args.inspections = argv[++i];
else if (argv[i] === '--analytics') args.analytics = argv[++i];
else if (argv[i] === '--out') args.out = argv[++i];
else if (argv[i] === '--state') args.state = argv[++i];
}
args.out ??= path.join(REPO_ROOT, 'docs/blog-indexing-status.md');
args.state ??= path.join(REPO_ROOT, 'docs/blog-indexing-status.json');
return args as Args;
}
function loadState(file: string): BlogIndexingState {
if (!existsSync(file)) return { latest: {}, history: [] };
const raw = JSON.parse(readFileSync(file, 'utf8'));
return {
latest: raw.latest ?? {},
history: raw.history ?? [],
performance: raw.performance ?? {},
firstInspectedAt: raw.firstInspectedAt ?? raw.firstSeenAt ?? {},
};
}
function fmtVerdictBadge(record: InspectionRecord): string {
if ('error' in record.result) return 'error';
const r = record.result;
if (r.isIndexed) return 'indexed';
if (r.verdict === 'PASS') return 'pass';
if (r.verdict === 'PARTIAL') return 'partial';
if (r.verdict === 'FAIL') return 'fail';
return r.verdict.toLowerCase();
}
function fmtCoverage(record: InspectionRecord): string {
if ('error' in record.result) return record.result.error;
return record.result.coverageState;
}
function fmtCanonical(record: InspectionRecord): string {
if ('error' in record.result) return '—';
const { userCanonical, googleCanonical } = record.result;
if (!userCanonical && !googleCanonical) return '—';
if (userCanonical && googleCanonical && userCanonical !== googleCanonical) {
return `mismatch: user=${userCanonical} google=${googleCanonical}`;
}
return googleCanonical ?? userCanonical ?? '—';
}
function fmtDate(iso?: string): string {
if (!iso) return '—';
return iso.slice(0, 10);
}
function fmtNum(n?: number): string {
if (n == null) return '—';
return new Intl.NumberFormat('en-US', { maximumFractionDigits: 0 }).format(n);
}
function fmtPercent(n?: number): string {
if (n == null) return '—';
return `${(n * 100).toFixed(1)}%`;
}
function fmtPosition(n?: number): string {
if (!n) return '—';
return n.toFixed(1);
}
function renderMarkdown(state: BlogIndexingState): string {
const latestRows = Object.values(state.latest)
.sort((a, b) => (a.url < b.url ? -1 : 1))
.map((r) => {
const slug = r.url.replace(/^https?:\/\/[^/]+\/blog\//, '').replace(/\/$/, '');
const lastCrawl =
'error' in r.result ? '—' : fmtDate(r.result.lastCrawlTime);
const perf7 = state.performance?.[r.url]?.['7'];
const perf28 = state.performance?.[r.url]?.['28'];
return `| ${slug} | ${fmtVerdictBadge(r)} | ${fmtCoverage(r)} | ${lastCrawl} | ${fmtDate(r.inspectedAt)} | ${fmtNum(perf7?.impressions)} | ${fmtNum(perf7?.clicks)} | ${fmtPercent(perf7?.ctr)} | ${fmtPosition(perf28?.position)} |`;
})
.join('\n');
const historyRows = state.history
.slice(0, 50)
.map((r) => {
const slug = r.url.replace(/^https?:\/\/[^/]+\/blog\//, '').replace(/\/$/, '');
return `| ${fmtDate(r.inspectedAt)} | ${slug} | ${fmtVerdictBadge(r)} | ${fmtCoverage(r)} | ${fmtCanonical(r)} |`;
})
.join('\n');
return `# Blog indexing status
_Generated by \`.github/workflows/blog-indexing-monitor.yml\` and
\`.github/workflows/blog-indexing-on-deploy.yml\`. Do not hand-edit; the
sidecar JSON at \`docs/blog-indexing-status.json\` is the source of
truth and the next workflow run will overwrite this file._
Verdict legend (from Google URL Inspection API):
- **indexed**: \`PASS\` and coverage state contains "Submitted and indexed"
- **pass**: Google has the URL but has not indexed it yet
- **partial**: Google sees the URL with one or more soft warnings
- **fail**: page failed inspection (likely fetch / canonical / robots)
- **error**: automation could not call the API for this URL
Traffic columns come from the GSC Search Analytics API. They lag by
roughly two days and represent canonical URL-level performance, not
site-wide totals.
If a URL stays in \`Discovered - currently not indexed\` or
\`Crawled - currently not indexed\` past the T+7 window, the monitor
workflow will open a tracking issue. See
[docs/blog-indexing-automation.md](./blog-indexing-automation.md) for
the full architecture.
## Latest verdict per URL
| Slug | Verdict | Coverage state | Last Google crawl | Last inspected | 7d impressions | 7d clicks | 7d CTR | 28d avg position |
|---|---|---|---|---|---|---|---|---|
${latestRows || '| _no inspections recorded yet_ | — | — | — | — | — | — | — | — |'}
## Recent inspections
| Inspected | Slug | Verdict | Coverage state | Canonical |
|---|---|---|---|---|
${historyRows || '| _no inspections recorded yet_ | — | — | — | — |'}
`;
}
function main() {
const args = parseArgs(process.argv.slice(2));
const state = loadState(args.state);
if (args.inspections && existsSync(args.inspections)) {
const fresh = readJsonFile<InspectionRecord[]>(args.inspections);
for (const record of fresh) {
state.latest[record.url] = record;
state.history.unshift(record);
state.firstInspectedAt ??= {};
state.firstInspectedAt[record.url] ??= record.inspectedAt;
}
state.history = state.history.slice(0, 50);
}
if (args.analytics && existsSync(args.analytics)) {
const fresh = readJsonFile<SearchAnalyticsRecord[]>(args.analytics);
state.performance ??= {};
for (const record of fresh) {
state.performance[record.url] ??= {};
state.performance[record.url][String(record.windowDays) as '7' | '28'] = record;
}
}
writeFileSync(args.state, JSON.stringify(state, null, 2) + '\n');
writeFileSync(args.out, renderMarkdown(state));
console.log(
`Wrote ${args.out} (${Object.keys(state.latest).length} URLs tracked, ${state.history.length} history entries).`,
);
}
main();


@ -0,0 +1,81 @@
/*
* scheduled-window emits the URLs that fall into the T+1, T+3, T+7,
* or T+14 inspection window today.
*
* Usage: tsx scheduled-window.ts [--out file.json] [--max-age-days 14]
*
* For each blog post .md file, the post's publish date is the
* frontmatter `date`. A URL is emitted when `today - publishDate` is
* exactly 1, 3, 7, or 14 days.
*
* Output JSON shape:
*
* {
* "today": "2026-05-15",
* "windows": { "1": [...], "3": [...], "7": [...], "14": [...] },
* "urls": ["https://open-design.ai/blog/foo/", ...] // dedupe of all 4 buckets
* }
*
* If no URLs match, exits 0 with `urls: []` so the calling workflow
* can branch on `length`.
*/
import { readFileSync, readdirSync, writeFileSync } from 'node:fs';
import path from 'node:path';
import { BLOG_DIR, REPO_ROOT, blogSlugToUrl, fileToSlug } from './lib.ts';
const WINDOWS = [1, 3, 7, 14] as const;
type Window = (typeof WINDOWS)[number];
interface Args {
out?: string;
maxAgeDays: number;
}
function parseArgs(argv: string[]): Args {
const args: Partial<Args> = {};
for (let i = 0; i < argv.length; i++) {
if (argv[i] === '--out') args.out = argv[++i];
else if (argv[i] === '--max-age-days') args.maxAgeDays = Number(argv[++i]);
}
return { maxAgeDays: 14, ...args } as Args;
}
function frontmatterDate(file: string): string | null {
const raw = readFileSync(path.join(REPO_ROOT, file), 'utf8');
const match = raw.match(/^---\n[\s\S]*?\ndate:\s*([0-9]{4}-[0-9]{2}-[0-9]{2})\b/m);
return match?.[1] ?? null;
}
function diffDays(today: string, addedAt: string): number {
const a = new Date(today + 'T00:00:00Z').getTime();
const b = new Date(addedAt + 'T00:00:00Z').getTime();
return Math.round((a - b) / 86_400_000);
}
function main() {
const { out, maxAgeDays } = parseArgs(process.argv.slice(2));
const today = new Date().toISOString().slice(0, 10);
const files = readdirSync(BLOG_DIR)
.filter((f) => f.endsWith('.md') && !f.startsWith('_'))
.map((f) => path.join('apps/landing-page/app/content/blog', f));
const buckets: Record<Window, string[]> = { 1: [], 3: [], 7: [], 14: [] };
for (const file of files) {
const publishedAt = frontmatterDate(file);
if (!publishedAt) continue;
const age = diffDays(today, publishedAt);
if (age < 0 || age > maxAgeDays) continue;
if (!WINDOWS.includes(age as Window)) continue;
const url = blogSlugToUrl(fileToSlug(file));
buckets[age as Window].push(url);
}
const urls = [...new Set(Object.values(buckets).flat())];
const result = { today, windows: buckets, urls };
const json = JSON.stringify(result, null, 2);
if (out) writeFileSync(out, json + '\n');
else process.stdout.write(json + '\n');
}
main();


@ -0,0 +1,97 @@
/*
* submit-indexnow submits canonical blog URLs to IndexNow-compatible
* engines (Bing, Yandex, and partners).
*
* Usage:
* tsx submit-indexnow.ts --urls <file-or-csv> [--out file.json]
*
* Input accepts the same shapes as inspect-urls:
* { addedUrls: string[], modifiedUrls?: string[] } OR { urls: string[] }
* OR string[] OR comma-separated URLs.
*/
import { writeFileSync } from 'node:fs';
import {
INDEXNOW_KEY,
INDEXNOW_KEY_LOCATION,
SITE,
fetchWithRetry,
loadUrlInput,
} from './lib.ts';
interface Args {
urls: string;
out?: string;
}
interface IndexNowResult {
submittedAt: string;
endpoint: string;
urls: string[];
status: number;
ok: boolean;
body: string;
}
function parseArgs(argv: string[]): Args {
const args: Partial<Args> = {};
for (let i = 0; i < argv.length; i++) {
if (argv[i] === '--urls') args.urls = argv[++i];
else if (argv[i] === '--out') args.out = argv[++i];
}
if (!args.urls) throw new Error('--urls is required');
return args as Args;
}
async function submitChunk(urls: string[]): Promise<IndexNowResult> {
const endpoint = 'https://api.indexnow.org/indexnow';
const res = await fetchWithRetry(endpoint, {
method: 'POST',
headers: { 'content-type': 'application/json; charset=utf-8' },
body: JSON.stringify({
host: 'open-design.ai',
key: INDEXNOW_KEY,
keyLocation: INDEXNOW_KEY_LOCATION,
urlList: urls,
}),
});
return {
submittedAt: new Date().toISOString(),
endpoint,
urls,
status: res.status,
ok: res.ok,
body: await res.text(),
};
}
async function main() {
const args = parseArgs(process.argv.slice(2));
const urls = [...new Set(loadUrlInput(args.urls))].filter((url) =>
url.startsWith(`${SITE}/blog/`),
);
if (urls.length === 0) {
const empty: IndexNowResult[] = [];
if (args.out) writeFileSync(args.out, JSON.stringify(empty, null, 2) + '\n');
else process.stdout.write('[]\n');
return;
}
const results: IndexNowResult[] = [];
for (let i = 0; i < urls.length; i += 10_000) {
results.push(await submitChunk(urls.slice(i, i + 10_000)));
}
const json = JSON.stringify(results, null, 2);
if (args.out) writeFileSync(args.out, json + '\n');
else process.stdout.write(json + '\n');
const failed = results.filter((r) => !r.ok);
if (failed.length > 0) {
console.error(`IndexNow rejected ${failed.length}/${results.length} batch(es).`);
process.exitCode = 1;
}
}
main().catch((err) => {
console.error(err);
process.exit(1);
});


@ -0,0 +1,32 @@
/*
* submit-sitemap: one PUT to the GSC Sitemaps API per deploy.
*
* Usage: tsx submit-sitemap.ts [--feed <url>]
*
* Default feed: https://open-design.ai/sitemap-index.xml.
*
* Rationale (blog-indexing-automation skill, Step 3): for standard blog
* content prefer sitemap submission over per-URL forced indexing. One
* call per deploy is enough, because Google revisits sitemaps on its own
* schedule once the property knows about them.
*/
import { SITEMAP_URL, submitSitemap } from './lib.ts';
function parseArgs(argv: string[]): { feed: string } {
let feed = SITEMAP_URL;
for (let i = 0; i < argv.length; i++) {
if (argv[i] === '--feed') feed = argv[++i];
}
return { feed };
}
async function main() {
const { feed } = parseArgs(process.argv.slice(2));
await submitSitemap(feed);
console.log(`Submitted sitemap: ${feed}`);
}
main().catch((err) => {
console.error(err);
process.exit(1);
});


@ -0,0 +1,174 @@
/*
* verify-readiness checks that each URL in the input is *operationally*
* ready for indexing. Per blog-indexing-automation skill Step 2:
*
* - URL returns 200
* - page is not noindex
* - canonical points to the intended URL
* - page is present in sitemap output
*
* Polls each URL with a short backoff so the script can be invoked
* immediately after `landing-page-deploy` completes (Cloudflare Pages
* propagation usually < 60 s but not guaranteed).
*
* Usage: tsx verify-readiness.ts --urls <file.json> [--out file.json] [--timeout-ms 180000]
*
* Input JSON: { addedUrls: string[], modifiedUrls?: string[] }.
* Output JSON: array of ReadinessResult. Exits non-zero if any URL is not ready.
*/
import { writeFileSync } from 'node:fs';
import {
type ReadinessResult,
SITE,
SITEMAP_CHILD_URL,
fetchWithRetry,
readJsonFile,
sleep,
} from './lib.ts';
interface Args {
urls: string;
out?: string;
timeoutMs: number;
}
function parseArgs(argv: string[]): Args {
const args: Partial<Args> = {};
for (let i = 0; i < argv.length; i++) {
const a = argv[i];
if (a === '--urls') args.urls = argv[++i];
else if (a === '--out') args.out = argv[++i];
else if (a === '--timeout-ms') args.timeoutMs = Number(argv[++i]);
}
if (!args.urls) throw new Error('--urls is required');
return { timeoutMs: 180_000, ...args } as Args;
}
async function fetchOnce(url: string): Promise<{ status: number; body: string }> {
const res = await fetchWithRetry(url, {
redirect: 'manual',
headers: { 'user-agent': 'OpenDesignBlogIndexingBot/1.0' },
});
return { status: res.status, body: await res.text() };
}
async function fetchSitemapUrls(): Promise<Set<string>> {
const res = await fetchWithRetry(SITEMAP_CHILD_URL);
if (!res.ok) {
throw new Error(`Sitemap fetch failed (${res.status}) at ${SITEMAP_CHILD_URL}`);
}
const xml = await res.text();
// Sitemap entries are emitted as <url><loc>https://...</loc></url>.
// A regex is enough for this flat structure and avoids pulling an XML
// parser dependency into CI.
return new Set(
Array.from(xml.matchAll(/<loc>([^<]+)<\/loc>/g)).map((m) => m[1].trim()),
);
}
async function waitForSitemapUrls(urls: string[], timeoutMs: number): Promise<Set<string>> {
const deadline = Date.now() + timeoutMs;
let sitemap = new Set<string>();
let delay = 5_000;
while (Date.now() < deadline) {
sitemap = await fetchSitemapUrls();
if (urls.every((url) => sitemap.has(url))) return sitemap;
await sleep(delay);
delay = Math.min(delay * 2, 30_000);
}
return sitemap;
}
function findCanonical(html: string): string | undefined {
const m = html.match(
/<link\b[^>]*\brel=["']canonical["'][^>]*\bhref=["']([^"']+)["']/i,
);
return m?.[1];
}
function hasNoindex(html: string): boolean {
return /<meta\b[^>]*\bname=["']robots["'][^>]*\bcontent=["'][^"']*\bnoindex\b[^"']*["']/i.test(
html,
);
}
async function checkUrl(
url: string,
sitemap: Set<string>,
timeoutMs: number,
): Promise<ReadinessResult> {
const failures: string[] = [];
let status: number | undefined;
let canonical: string | undefined;
// Poll up to `timeoutMs` waiting for a 200. Fetch errors are remembered
// but only reported if the URL never becomes reachable, so an early
// network blip does not fail a page that later returns 200.
const deadline = Date.now() + timeoutMs;
let body = '';
let delay = 5_000;
let lastError: string | undefined;
while (Date.now() < deadline) {
try {
const r = await fetchOnce(url);
status = r.status;
body = r.body;
if (r.status === 200) break;
} catch (err) {
lastError = (err as Error).message;
}
await sleep(delay);
delay = Math.min(delay * 2, 30_000);
}
if (status !== 200) {
if (lastError) failures.push(`fetch error: ${lastError}`);
failures.push(`expected 200, got ${status ?? 'no response'}`);
return { url, ok: false, failures, status };
}
if (hasNoindex(body)) failures.push('page is noindex');
canonical = findCanonical(body);
if (!canonical) failures.push('no canonical link');
else if (canonical !== url) {
failures.push(`canonical "${canonical}" != expected "${url}"`);
}
if (!sitemap.has(url)) failures.push(`url not in ${SITEMAP_CHILD_URL}`);
return { url, ok: failures.length === 0, failures, status, canonical };
}
async function main() {
const args = parseArgs(process.argv.slice(2));
const input = readJsonFile<{ addedUrls: string[]; modifiedUrls?: string[] }>(
args.urls,
);
const urls = [...new Set([...(input.addedUrls ?? []), ...(input.modifiedUrls ?? [])])];
if (urls.length === 0) {
const empty: ReadinessResult[] = [];
if (args.out) writeFileSync(args.out, JSON.stringify(empty, null, 2) + '\n');
else process.stdout.write('[]\n');
return;
}
for (const url of urls) {
if (!url.startsWith(SITE + '/')) {
throw new Error(`Refusing to verify off-site URL: ${url}`);
}
}
const sitemap = await waitForSitemapUrls(urls, args.timeoutMs);
const results: ReadinessResult[] = [];
for (const url of urls) {
results.push(await checkUrl(url, sitemap, args.timeoutMs));
}
const json = JSON.stringify(results, null, 2);
if (args.out) writeFileSync(args.out, json + '\n');
else process.stdout.write(json + '\n');
const bad = results.filter((r) => !r.ok);
if (bad.length > 0) {
console.error(`Readiness failed for ${bad.length}/${results.length} URL(s).`);
process.exitCode = 1;
}
}
main().catch((err) => {
console.error(err);
process.exit(1);
});


@ -0,0 +1,269 @@
# Blog indexing automation

The Open Design landing page automates the parts of search-engine
indexing that Google officially supports for normal blog content. It
does NOT pretend to "submit" or "request indexing" for blog posts via
unsupported APIs or browser automation.

This file is the operating manual. The skill that defines the rules
lives at `~/.codex/skills/blog-indexing-automation/SKILL.md`; this
doc is its concrete implementation in `nexu-io/open-design`.

## What is automated

| Trigger | Job | Outcome |
|---|---|---|
| `landing-page-ci` | `lint-blog-seo.ts` + `check-blog-url-changes.ts` | Changed posts are checked for frontmatter, internal/external links, rendered canonical/JSON-LD/OG metadata, and slug delete/rename redirects before they can merge. |
| `landing-page-deploy` finishes successfully on `main` | `blog-indexing-on-deploy.yml` | New blog URLs are detected, verified ready, submitted to IndexNow, the sitemap-index is re-submitted to GSC, baseline URL Inspection is captured, and baseline Search Analytics is queried. |
| Daily `cron: 0 2 * * *` | `blog-indexing-monitor.yml` | Every blog post in the T+1 / T+3 / T+7 / T+14 window is re-inspected; GSC Search Analytics is refreshed; stall and low-traffic issues are opened/refreshed when needed. |
| Manual `workflow_dispatch` | `blog-indexing-monitor.yml` | Maintainers can dry-run or explicitly publish a token-gated dev.to/Hashnode cross-post with canonical URL pointing back to Open Design. |

Both workflows commit their results back via the `open-design-bot`
GitHub App, opening or refreshing the `automation/blog-indexing-status`
PR. The human-readable view is `docs/blog-indexing-status.md`; the
canonical state is the sidecar `docs/blog-indexing-status.json`.

Before each run renders a new report, it restores the latest files from
the pending `automation/blog-indexing-status` branch when that branch
exists. That keeps inspection history continuous even if the previous
status PR has not been merged yet. If that branch exists but the status
files cannot be restored, the workflow fails and records the restore
failure in the job summary instead of silently starting from stale state.
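
A minimal sketch of the merge that keeps the sidecar JSON authoritative, assuming trimmed-down record shapes: the latest record per URL wins and history is capped at 50 entries, newest first. `InspectionLike`, `StateLike`, and `mergeInspections` are illustrative names, not the types exported by `lib.ts`.

```typescript
// Hypothetical, trimmed-down shapes; the real types live in lib.ts.
interface InspectionLike {
  url: string;
  inspectedAt: string; // ISO timestamp
}

interface StateLike {
  latest: Record<string, InspectionLike>;
  history: InspectionLike[];
}

const HISTORY_CAP = 50;

// Merge fresh inspections into the restored state without mutating it.
function mergeInspections(prev: StateLike, fresh: InspectionLike[]): StateLike {
  const latest: Record<string, InspectionLike> = { ...prev.latest };
  const history: InspectionLike[] = [...prev.history];
  for (const record of fresh) {
    latest[record.url] = record; // the newest record wins per URL
    history.unshift(record); // newest first
  }
  return { latest, history: history.slice(0, HISTORY_CAP) };
}

const restored: StateLike = { latest: {}, history: [] };
const next = mergeInspections(restored, [
  { url: 'https://open-design.ai/blog/foo/', inspectedAt: '2026-05-15T02:00:00Z' },
  { url: 'https://open-design.ai/blog/foo/', inspectedAt: '2026-05-16T02:00:00Z' },
]);
console.log(Object.keys(next.latest).length, next.history.length); // 1 2
```

Restoring the state from the pending branch first and merging second is what keeps history continuous across unmerged status PRs.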
## What is deliberately NOT automated

Per the `blog-indexing-automation` skill:

- We do not call Google's Indexing API. It officially supports only
Job Postings and Livestreams; using it for blog posts risks policy
flags and provides no real benefit.
- We do not automate clicks against the Search Console UI to "Request
Indexing." The skill labels that as a brittle last resort.
- We do not ping the legacy `https://www.google.com/ping?sitemap=`
endpoint. Google deprecated it in 2023.
- We do not attempt to inspect every URL on the site every day. We
only inspect changed URLs after deploy and posts in the
T+1/T+3/T+7/T+14 window.
- We do not auto-publish cross-posts. The cross-post scaffold is dry-run
by default and requires both platform tokens and `publish_crosspost=true`.

When automation cannot solve an indexing problem (e.g. Google has the
URL but refuses to index it), the monitor opens a GitHub issue
describing the likely failure mode so a human can fix the underlying
content / SEO issue.
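
The cross-post gate in the list above could look roughly like this. It is a sketch under stated assumptions: workflow inputs arrive as strings, only the selected platform's credentials are checked, and a missing credential falls back to a dry run rather than failing. `crosspostMode` is a hypothetical helper; the secret names are the ones listed under the optional platform secrets.

```typescript
type Platform = 'devto' | 'hashnode';
type CrosspostMode = 'publish' | 'dry-run';

// Decide whether a manual run may publish. Anything short of an explicit
// "true" plus usable credentials stays a dry run.
function crosspostMode(
  platform: Platform,
  publishInput: string, // value of the `publish_crosspost` dispatch input
  env: Record<string, string | undefined>,
): CrosspostMode {
  const hasCredentials =
    platform === 'devto'
      ? Boolean(env.DEVTO_API_KEY)
      : Boolean(env.HASHNODE_TOKEN && env.HASHNODE_PUBLICATION_ID);
  return publishInput === 'true' && hasCredentials ? 'publish' : 'dry-run';
}

console.log(crosspostMode('devto', 'true', { DEVTO_API_KEY: 'example-key' })); // publish
console.log(crosspostMode('devto', 'false', { DEVTO_API_KEY: 'example-key' })); // dry-run
```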
## Architecture

```
landing-page-deploy ──success──▶ blog-indexing-on-deploy
    detect-changed-urls
    verify-readiness (200 / canonical / sitemap)
    submit-indexnow
    submit-sitemap (one PUT)
    inspect-urls (baseline)
    query-search-analytics
    render-status ──▶ docs/blog-indexing-status.md
    bot PR

cron 02:00 UTC ──▶ blog-indexing-monitor
    scheduled-window (T+1/T+3/T+7/T+14 today)
    inspect-urls
    query-search-analytics
    render-status ──▶ docs/blog-indexing-status.md
    escalate-stalls ──▶ open / refresh / close stall issue
    escalate-low-traffic ──▶ open / refresh / close traffic issue
    bot PR
```

All scripts live in `apps/landing-page/scripts/blog-indexing/` and run
under `tsx` directly, with no compile step. Most scripts depend only on
Node 24 built-ins (`crypto`, `fetch`, `child_process`). RSS uses
`@astrojs/rss`.
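
The `scheduled-window` step in the diagram reduces to a date calculation. The sketch below assumes each post carries a `YYYY-MM-DD` publish date and that both dates are interpreted as UTC midnight; `ageInDays` and `windowFor` are hypothetical names used only for illustration.

```typescript
const WINDOWS: number[] = [1, 3, 7, 14];
const DAY_MS = 86_400_000;

// Whole days between two YYYY-MM-DD dates, both taken at UTC midnight.
function ageInDays(today: string, publishedAt: string): number {
  const a = Date.parse(`${today}T00:00:00Z`);
  const b = Date.parse(`${publishedAt}T00:00:00Z`);
  return Math.round((a - b) / DAY_MS);
}

// Returns the matching T+N window, or null when the post is not due today.
function windowFor(today: string, publishedAt: string): number | null {
  const age = ageInDays(today, publishedAt);
  return WINDOWS.includes(age) ? age : null;
}

console.log(windowFor('2026-05-15', '2026-05-14')); // 1
console.log(windowFor('2026-05-15', '2026-05-13')); // null
```

Because only exact ages match, a post is inspected at most four times by the daily monitor, which keeps URL Inspection usage small.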
## One-time setup

Done once per environment by a maintainer. Repeating this is harmless
but unnecessary.

### 1. Configure Google Search Console auth

Preferred path: OAuth user refresh token. This avoids the Google Search
Console UI bug where newly-created service account emails sometimes
fail with `email not found`.

1. Go to <https://console.cloud.google.com/projectcreate> and create a
   project named `open-design-blog-indexing` (or reuse an existing
   project the team owns).
2. Enable the **Search Console API** under
   <https://console.cloud.google.com/apis/library/searchconsole.googleapis.com>.
3. Create an OAuth client under
   <https://console.cloud.google.com/apis/credentials>:
   - Application type: **Desktop app**
   - Name: `open-design-gsc-local`
4. In the OAuth consent screen, keep the app in Testing and add every
   Google account that may grant access under **Audience → Test users**.
5. Run the local helper:

   ```bash
   GSC_OAUTH_CLIENT_ID='<client-id>' \
   GSC_OAUTH_CLIENT_SECRET='<client-secret>' \
     pnpm --filter @open-design/landing-page exec tsx \
     scripts/blog-indexing/authorize-gsc-oauth.ts \
     --out /tmp/open-design-gsc-refresh-token.txt
   ```

6. Open the printed Google URL and authorize with an account that is an
   Owner of the `open-design.ai` Search Console property.

Fallback path: service account. Create `gsc-indexing-bot`, download a
JSON key, then try adding the `client_email` as an Owner in Search
Console. If Search Console shows `email not found`, use OAuth instead.
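
For orientation, a refresh token like the one produced above is typically exchanged for a short-lived access token at Google's OAuth 2.0 token endpoint. The sketch below only shows the request shape; `buildRefreshRequest` and `fetchAccessToken` are hypothetical names, and the helper in `lib.ts` may be structured differently.

```typescript
const TOKEN_ENDPOINT = 'https://oauth2.googleapis.com/token';

interface OAuthSecrets {
  clientId: string;
  clientSecret: string;
  refreshToken: string;
}

interface RefreshRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the form-encoded refresh request without sending it.
function buildRefreshRequest(secrets: OAuthSecrets): RefreshRequest {
  const form = new URLSearchParams({
    client_id: secrets.clientId,
    client_secret: secrets.clientSecret,
    refresh_token: secrets.refreshToken,
    grant_type: 'refresh_token',
  });
  return {
    url: TOKEN_ENDPOINT,
    init: {
      method: 'POST',
      headers: { 'content-type': 'application/x-www-form-urlencoded' },
      body: form.toString(),
    },
  };
}

// Send the request and return the access token, failing loudly otherwise.
async function fetchAccessToken(secrets: OAuthSecrets): Promise<string> {
  const { url, init } = buildRefreshRequest(secrets);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Token refresh failed (${res.status})`);
  const json = (await res.json()) as { access_token?: string };
  if (!json.access_token) throw new Error('Token response had no access_token');
  return json.access_token;
}
```

Keeping request construction separate from the network call makes the auth path testable without real credentials.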
### 2. Add auth secrets to GitHub

1. Open <https://github.com/nexu-io/open-design/settings/secrets/actions>.
2. Preferred OAuth secrets:
   - `GSC_OAUTH_CLIENT_ID`
   - `GSC_OAUTH_CLIENT_SECRET`
   - `GSC_OAUTH_REFRESH_TOKEN`
3. Optional service-account fallback:
   - `GSC_SERVICE_ACCOUNT_KEY`
4. Confirm the `BOT_APP_ID` and `BOT_APP_PRIVATE_KEY` secrets already
   exist; they are reused from the `refresh-contributors-wall`
   automation. The bot needs `contents:write`, `pull-requests:write`,
   and `issues:write` for `nexu-io/open-design` (already configured).

If these secrets are not present yet, the workflows do not fail the
main deploy path. They record the missing configuration in the job
summary, emit a GitHub Actions warning, and skip the GSC / bot-write
steps until the secrets are added.
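
A sketch of how the preferred and fallback auth paths could be selected from the secrets above, assuming they are exposed to the scripts as environment variables of the same name. `pickAuthMode` is a hypothetical helper, and `'skip'` stands for the "warn and skip the GSC steps" behavior described above.

```typescript
type AuthMode = 'oauth' | 'service-account' | 'skip';

// OAuth wins when all three OAuth values are present; the service-account
// key is only a fallback; otherwise the GSC steps are skipped.
function pickAuthMode(env: Record<string, string | undefined>): AuthMode {
  const hasOAuth = Boolean(
    env.GSC_OAUTH_CLIENT_ID && env.GSC_OAUTH_CLIENT_SECRET && env.GSC_OAUTH_REFRESH_TOKEN,
  );
  if (hasOAuth) return 'oauth';
  if (env.GSC_SERVICE_ACCOUNT_KEY) return 'service-account';
  return 'skip';
}

console.log(pickAuthMode({ GSC_SERVICE_ACCOUNT_KEY: '{}' })); // service-account
console.log(pickAuthMode({})); // skip
```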
### 3. Optional platform secrets

These are not required for indexing.

- `DEVTO_API_KEY`: only needed if a maintainer wants
`blog-indexing-monitor.yml` to publish a dev.to cross-post.
- `HASHNODE_TOKEN` and `HASHNODE_PUBLICATION_ID`: only needed for
Hashnode cross-posts.
- `CLOUDFLARE_ZONE_ID`: optional future optimization if we choose to
purge cache directly. Current automation polls the live sitemap until
the new URLs appear, so this secret is not required.

IndexNow does not need a secret. The public verification key is committed
at `apps/landing-page/public/96b0928121e24fd7b4ef85ae0f8bf1d8.txt`.
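
To illustrate why no secret is needed, here is a sketch of the IndexNow request bodies. It assumes the key equals the committed file name without `.txt` and that files under `public/` are served from the site root, so the key location is `<site>/<key>.txt`; both are assumptions, as is the helper name `buildIndexNowPayloads`. The 10,000-URL chunk size matches the batching in `submit-indexnow.ts`.

```typescript
interface IndexNowPayload {
  host: string;
  key: string;
  keyLocation: string;
  urlList: string[];
}

// Deduplicate, keep only on-site blog URLs, and split into batches.
function buildIndexNowPayloads(
  site: string,
  key: string,
  urls: string[],
  chunkSize = 10_000,
): IndexNowPayload[] {
  const host = new URL(site).host;
  const unique = Array.from(new Set(urls)).filter((u) => u.startsWith(`${site}/blog/`));
  const payloads: IndexNowPayload[] = [];
  for (let i = 0; i < unique.length; i += chunkSize) {
    payloads.push({
      host,
      key,
      keyLocation: `${site}/${key}.txt`, // assumed public location of the key file
      urlList: unique.slice(i, i + chunkSize),
    });
  }
  return payloads;
}

const payloads = buildIndexNowPayloads(
  'https://open-design.ai',
  '96b0928121e24fd7b4ef85ae0f8bf1d8',
  ['https://open-design.ai/blog/foo/', 'https://open-design.ai/blog/foo/', 'https://example.com/blog/bar/'],
);
console.log(payloads.length, payloads[0].urlList.length); // 1 1
```

The receiving engine verifies ownership by fetching the key file from the host, which is why the key can live in the repository.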
### 4. Smoke test

Trigger `blog-indexing-on-deploy.yml` manually with the SHA of any
recent commit that added a blog post:

```bash
gh workflow run blog-indexing-on-deploy.yml \
  -R nexu-io/open-design \
  -f head_sha=<sha>
```

A successful run produces:

- a green check on the workflow
- the `automation/blog-indexing-status` PR refreshed with new rows in
`docs/blog-indexing-status.md`
- the artifact `blog-indexing-<run-id>` containing the raw JSON
outputs
- an `indexnow.json` artifact with the IndexNow submission result

If the run fails on the **Submit sitemap** step with a 403, the
authorizing Google account (or the fallback service account) is not yet
an Owner on the GSC property (see step 1).
## Operating
The expected steady state:
- PR opens → `landing-page-ci` runs SEO lint and URL-change guards. A
post cannot merge if it deletes/renames a live slug without an
explicit redirect, or if the rendered HTML loses canonical/JSON-LD/OG
metadata.
- Renames are handled as both a redirect requirement for the old slug
and a newly deployed URL for the destination slug, so the new page is
included in the post-deploy readiness and baseline inspection flow.
- New post ships → `landing-page-deploy` runs → `blog-indexing-on-deploy`
runs → IndexNow is called, GSC sitemap is submitted, and the bot PR
opens with the baseline verdict plus any available 7d/28d traffic
metrics.
- Daily monitor runs → at T+1 the post usually moves to
`Crawled - currently not indexed`. By T+3 to T+7 a healthy post is
`Submitted and indexed`. The status table reflects this.
- If T+7 passes and the post is still not indexed, the monitor opens
a `Blog indexing — URLs stalled in Search Console` issue listing the
affected URLs, re-submits them to IndexNow, and records a history
comment on every refresh. Triage manually using the URL Inspection
live test if the issue stays open.
- If a post is indexed but GSC still reports zero impressions after
T+14, the monitor opens `Blog traffic — indexed posts with zero
impressions`. Treat that as a distribution/query-fit issue, not an
indexing issue.
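The checkpoint and escalation rules in the list above can be summarised in a small decision function. This is a sketch under stated assumptions, not the logic in `escalate-stalls.ts` or `escalate-low-traffic.ts`: the names `PostSnapshot`, `isCheckpointDay`, and `triage` are hypothetical, and "T+7 passes" / "T+14 passes" are read here as `ageDays >= 7` and `ageDays >= 14`.

```typescript
type Action = "none" | "open-stall-issue" | "open-low-traffic-issue";

// Hypothetical per-post view combining URL Inspection and Search Analytics data.
interface PostSnapshot {
  ageDays: number;        // whole days since the post was published
  coverageState: string;  // e.g. "Submitted and indexed"
  impressions: number;    // impressions reported by GSC for the URL
}

// The monitor inspects posts on these ages only.
const CHECKPOINT_DAYS = [1, 3, 7, 14];

function isCheckpointDay(ageDays: number): boolean {
  return CHECKPOINT_DAYS.indexOf(ageDays) !== -1;
}

function triage(post: PostSnapshot): Action {
  const indexed = post.coverageState.indexOf("Submitted and indexed") !== -1;
  if (!indexed) {
    // Not indexed: escalate only once the T+7 checkpoint has been reached.
    return post.ageDays >= 7 ? "open-stall-issue" : "none";
  }
  // Indexed but invisible in search results: a traffic problem, not an indexing one.
  if (post.ageDays >= 14 && post.impressions === 0) {
    return "open-low-traffic-issue";
  }
  return "none";
}

const samples: PostSnapshot[] = [
  { ageDays: 3, coverageState: "Crawled - currently not indexed", impressions: 0 },
  { ageDays: 7, coverageState: "Discovered - currently not indexed", impressions: 0 },
  { ageDays: 14, coverageState: "Submitted and indexed", impressions: 0 },
];
samples.forEach((post) => console.log(post.ageDays, isCheckpointDay(post.ageDays), triage(post)));
```

Keeping the two escalations in one function makes it explicit that a URL is never reported as both stalled and low-traffic on the same day.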
The status PR is intentionally **not** auto-merged. A maintainer
reviews each refresh so the daily diff is part of the team's
awareness of search-side health.
## Files
- `apps/landing-page/scripts/blog-indexing/lib.ts`: GSC auth, URL
Inspection helper, Search Analytics helper, sitemap helper, retry
wrapper, type defs.
- `apps/landing-page/scripts/blog-indexing/detect-changed-urls.ts`:
diff a deploy commit against its parent for added / modified blog
files.
- `apps/landing-page/scripts/blog-indexing/verify-readiness.ts`:
HTTP, canonical, noindex, and sitemap presence checks; polls until
Cloudflare propagation completes.
- `apps/landing-page/scripts/blog-indexing/lint-blog-seo.ts`:
source/rendered SEO lint for changed posts in CI.
- `apps/landing-page/scripts/blog-indexing/check-blog-url-changes.ts`:
prevents slug deletes/renames without redirects.
- `apps/landing-page/scripts/blog-indexing/submit-indexnow.ts`:
submits changed/stalled blog URLs to IndexNow-compatible engines.
- `apps/landing-page/scripts/blog-indexing/submit-sitemap.ts`: PUT
the sitemap to Search Console (one call per deploy).
- `apps/landing-page/scripts/blog-indexing/inspect-urls.ts`: call
URL Inspection API per URL; emit `InspectionRecord[]`.
- `apps/landing-page/scripts/blog-indexing/query-search-analytics.ts`:
query URL-level 7d/28d impressions, clicks, CTR, and position.
- `apps/landing-page/scripts/blog-indexing/render-status.ts`:
rewrite `docs/blog-indexing-status.md` from the JSON sidecar.
- `apps/landing-page/scripts/blog-indexing/scheduled-window.ts`:
emit URLs in today's T+1 / T+3 / T+7 / T+14 buckets.
- `apps/landing-page/scripts/blog-indexing/escalate-stalls.ts`:
decide whether the stall issue needs to open / refresh / close.
- `apps/landing-page/scripts/blog-indexing/escalate-low-traffic.ts`:
decide whether indexed-but-zero-impression posts need a traffic issue.
- `apps/landing-page/scripts/blog-indexing/crosspost.ts`:
dry-run/token-gated dev.to or Hashnode cross-post scaffold.
- `apps/landing-page/app/pages/rss.xml.ts`
- `apps/landing-page/public/llms.txt`
- `apps/landing-page/public/_redirects`
- `.github/workflows/blog-indexing-on-deploy.yml`
- `.github/workflows/blog-indexing-monitor.yml`
- `docs/blog-indexing-status.md`: human view (auto-generated)
- `docs/blog-indexing-status.json`: canonical state (auto-generated)
The JSON state records `firstInspectedAt` as the first time automation
successfully captured an inspection for a URL. It is not Google's
first-discovery time; escalation scripts prefer the post frontmatter date
for age windows and only use this inspection timestamp as a fallback.
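The fallback order described above might be expressed as in the sketch below. It is not taken from the escalation scripts: `ageInDays` and `parseDate` are hypothetical names, and the assumption that `firstInspectedAt` is a map from URL to ISO timestamp is inferred from the empty object in the initial state file.

```typescript
// Assumed shape of the relevant slice of docs/blog-indexing-status.json.
interface IndexingState {
  firstInspectedAt: Record<string, string | undefined>;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns null for missing or unparseable dates instead of an invalid Date.
function parseDate(value: string | undefined): Date | null {
  if (!value) return null;
  const parsed = new Date(value);
  return isNaN(parsed.getTime()) ? null : parsed;
}

// Prefer the frontmatter date; fall back to the first successful inspection.
function ageInDays(
  url: string,
  frontmatterDate: string | undefined,
  state: IndexingState,
  now: Date,
): number | null {
  const reference = parseDate(frontmatterDate) ?? parseDate(state.firstInspectedAt[url]);
  if (reference === null) return null; // no usable date: caller should skip the URL
  return Math.floor((now.getTime() - reference.getTime()) / DAY_MS);
}

// Placeholder URL and dates, for illustration only.
const state: IndexingState = {
  firstInspectedAt: { "https://example.com/blog/first-post/": "2026-05-03T02:00:00Z" },
};
const now = new Date("2026-05-08T02:00:00Z");
console.log(ageInDays("https://example.com/blog/first-post/", "2026-05-01", state, now)); // 7
console.log(ageInDays("https://example.com/blog/first-post/", undefined, state, now));    // 5
```

Because the inspection timestamp can only be later than publication, falling back to it makes a post look younger than it is, so escalation is delayed rather than triggered early.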

@ -0,0 +1,6 @@
{
  "latest": {},
  "history": [],
  "performance": {},
  "firstInspectedAt": {}
}

@ -0,0 +1,36 @@
# Blog indexing status
_Generated by `.github/workflows/blog-indexing-monitor.yml` and
`.github/workflows/blog-indexing-on-deploy.yml`. Do not hand-edit; the
sidecar JSON at `docs/blog-indexing-status.json` is the source of
truth and the next workflow run will overwrite this file._
Verdict legend (from Google URL Inspection API):
- **indexed**: `PASS` and coverage state contains "Submitted and indexed"
- **pass**: Google has the URL but has not indexed it yet
- **partial**: Google sees the URL with one or more soft warnings
- **fail**: page failed inspection (likely fetch / canonical / robots)
- **error**: automation could not call the API for this URL
Traffic columns come from the GSC Search Analytics API. They lag by
roughly two days and represent canonical URL-level performance, not
site-wide totals.
If a URL stays in `Discovered - currently not indexed` or
`Crawled - currently not indexed` past the T+7 window, the monitor
workflow will open a tracking issue. See
[docs/blog-indexing-automation.md](./blog-indexing-automation.md) for
the full architecture.
## Latest verdict per URL
| Slug | Verdict | Coverage state | Last Google crawl | Last inspected | 7d impressions | 7d clicks | 7d CTR | 28d avg position |
|---|---|---|---|---|---|---|---|---|
| _no inspections recorded yet — the first deploy after this file lands will populate the table_ | — | — | — | — | — | — | — | — |
## Recent inspections
| Inspected | Slug | Verdict | Coverage state | Canonical |
|---|---|---|---|---|
| _no inspections recorded yet_ | — | — | — | — |