open-design/skills/pptx-html-fidelity-audit/references
laihenyi 3d7da43ac8
feat(pptx-fidelity): broaden script coverage beyond CJK (#308)
* feat(pptx-fidelity): broaden script coverage beyond CJK

The audit skill landed in #307 with CJK-biased depth — italic
guidance, font-fallback notes, and line-height tuning all assumed
zh-CN / zh-TW / ja / ko as the non-Latin axis. The repo also ships
fa (RTL), ru (Cyrillic), and de / es / pt-BR / hu (Latin Extended,
with Vietnamese-like diacritics), and there are active discussions
about Devanagari / Thai work.
This PR closes the obvious gaps without expanding scope to a full
RTL / bidi discipline (which is Tier 2 and deserves its own PR).

font-discipline.md changes:

- Layer 1 gains a "Coverage gap for Latin-slot scripts" callout:
  Cyrillic / Greek / Vietnamese all go through the Latin slot, but
  Playfair Display / Source Serif cover them poorly. PowerPoint
  silently falls back to Calibri mid-paragraph, which readers see as
  a styling bug. Includes an `fc-query` snippet for checking coverage.
- Layer 5 retitles from "CJK + Latin italic" to "Italic + script
  interaction" and broadens the rule: italic applies only to scripts
  with an italic tradition (Latin / Cyrillic / Greek). The
  implementation function now checks 11 Unicode ranges (covering CJK,
  Hebrew, Arabic, Devanagari, Bengali, Thai, Khmer) instead of just
  CJK.
- New "Beyond CJK — other scripts" section: 7-row reference table
  mapping script family → XML slot (latin / ea / cs) → italic OK?
  → most common defect → recommended faces. RTL handling is
  signposted as needing manual review since `verify_layout.py`
  doesn't check `<a:rtl>` today.
- New "Line height per script" section: per-script `Cursor` gap
  recommendations. Devanagari / Thai / Khmer need 0.16-0.18" headroom
  vs. the 0.12" Latin default because of stacked diacritics, matras,
  and tone marks. Latin with Vietnamese Extended (ếẫỗ) needs 0.14".
- Audit checklist is updated to reflect the broader rule and gains a
  "Beyond CJK" line for RTL and other non-Latin, non-CJK content.

SKILL.md: anti-pattern entry rewritten from "Italicizing CJK display
type" to "Italicizing scripts that have no italic tradition", listing
all 6 affected script families.

layout-discipline.md: Cursor section gains a per-script `gap` tuning
callout pointing at the new font-discipline.md table.

Sanity-tested the Unicode range table against representative strings
in 7 scripts; all classifications correct.

Out of scope for this PR (Tier 2 follow-ups):

- Full RTL discipline reference (rtl-discipline.md): bidi rules,
  margin mirroring, paragraph-direction propagation.
- verify_layout.py --rtl mode: assert <a:rtl val="1"/> on RTL slides.
- verify_layout.py glyph-coverage check: would need fonttools and
  per-run script detection.
- Vertical text (tategaki) and furigana for traditional Japanese
  layouts.
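
As a rough illustration of the glyph-coverage follow-up listed above,
the core comparison can be sketched in a few lines of Python. This is
not part of verify_layout.py; `missing_codepoints` is a hypothetical
helper, and the "covered" set is assumed to come from the font's cmap
(for example the keys of fontTools' `TTFont(path).getBestCmap()`).

```python
from typing import Iterable, List, Set


def missing_codepoints(text: str, covered: Iterable[int]) -> List[str]:
    """Return the distinct non-space characters in `text` whose code
    points are absent from `covered`, in first-seen order."""
    covered_set: Set[int] = set(covered)
    seen: Set[str] = set()
    missing: List[str] = []
    for ch in text:
        if ch.isspace() or ch in seen:
            continue
        seen.add(ch)
        if ord(ch) not in covered_set:
            missing.append(ch)
    return missing


# Assumption for the demo: a face that covers printable Basic Latin only.
# A real check would read the code points from the font file instead.
basic_latin_only = range(0x20, 0x7F)

print(missing_codepoints("Hello", basic_latin_only))       # → []
print(missing_codepoints("Ti\u1ebfng", basic_latin_only))  # the Vietnamese letter is reported
```

Any non-empty result for a run means PowerPoint would substitute a
fallback face for those characters.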

* fix(pptx-fidelity): add Lao to NO_ITALIC_RANGES

The prose in font-discipline.md and the "Beyond CJK" table both list
Thai / Lao / Khmer as scripts without an italic tradition, but the
NO_ITALIC_RANGES tuple in the implementation snippet only covered
Thai and Khmer. Implementers copying the function would have rendered
Lao runs with synthesized italic — exactly the slant defect this skill
is meant to prevent.

Adds Lao (U+0E80-U+0EFF) between the existing Thai and Khmer rows.
Sanity-checked against ສະບາຍດີ.
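
For readers without the file open, the shape of the check is roughly
as follows. This is a minimal sketch, not the tuple from
font-discipline.md: the rows below are the standard Unicode blocks for
the script families named in this PR, and `has_no_italic_script` is a
hypothetical name.

```python
# Illustrative subset only; the real NO_ITALIC_RANGES may differ.
NO_ITALIC_RANGES = (
    (0x0590, 0x05FF),  # Hebrew
    (0x0600, 0x06FF),  # Arabic
    (0x0900, 0x097F),  # Devanagari
    (0x0980, 0x09FF),  # Bengali
    (0x0E00, 0x0E7F),  # Thai
    (0x0E80, 0x0EFF),  # Lao (the row this fix adds)
    (0x1780, 0x17FF),  # Khmer
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
)


def has_no_italic_script(text: str) -> bool:
    """True if any character falls in a range with no italic tradition."""
    return any(
        lo <= ord(ch) <= hi
        for ch in text
        for lo, hi in NO_ITALIC_RANGES
    )


print(has_no_italic_script("ສະບາຍດີ"))  # → True
print(has_no_italic_script("Hello"))    # → False
```

A run for which the function returns True should be emitted upright,
whatever the source HTML requested.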

Caught by Codex on #308.

* fix(pptx-fidelity): address review feedback from #308

P2 fixes (lefarcen):
- Layer 1 coverage check now includes a Windows snippet (PowerShell
  + System.Drawing) and a cross-platform fontforge fallback.
  Previously Windows users were stranded even though the prose
  called the skill cross-platform.
- "Beyond CJK" Arabic / Hebrew / Persian row gains a "RTL discipline
  scope" callout that quantifies the remaining gap as ~15-20% of the
  font + layout surface area (Unicode UAX #9 bidi, chrome / footer
  mirroring, kashida + line-fill, right-anchored alignment) so
  readers can judge whether to wait for the Tier 2 follow-up or
  scope it themselves.
- NO_ITALIC_RANGES expands from 12 to 19 ranges. The earlier table
  asserted "Devanagari / Bengali" as no-italic but stopped there;
  the principle ("no italic tradition") covers all major Indic
  scripts. Added Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada,
  and Malayalam, with an inline comment explaining the principle so
  future maintainers can add Sinhala / Tibetan when needed.

P3 polish:
- "Line height per script" gains a sources note: numbers measured
  against Noto Sans / Noto Serif / IBM Plex line-height for 14pt
  body with full diacritic stacks (Devanagari conjuncts, Thai 4-mark
  sequences, stacked Vietnamese ỗ).
- Audit checklist's "Beyond CJK" entry now includes the unzip + grep
  command for verifying <a:rtl val="1"/> propagation, matching the
  verification style used in Layer 4.
- `add_run_with_italic_safety` docstring documents what
  `cs_face=None` means: it is safe for Latin-only decks, and the cs
  face is omitted by `set_run_fonts` when no complex-script
  characters appear.
- layout-discipline.md gap callout now explains how to detect the
  highest-demand script in a mixed deck — scan against the Layer 5
  Unicode ranges (extended with Vietnamese U+1E00-1EFF), pick the
  max-gap per slide, take the deck-wide max for a uniform setting.
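
The detection steps in the last item can be sketched as below. The
gap values come from the "Line height per script" numbers in this PR;
taking the upper bound (0.18") for Devanagari / Thai / Khmer is an
assumption, as are the names `SCRIPT_GAPS`, `slide_gap`, and
`deck_gap`.

```python
from typing import Iterable

LATIN_DEFAULT_GAP = 0.12  # inches

# (first code point, last code point, gap in inches)
SCRIPT_GAPS = (
    (0x1E00, 0x1EFF, 0.14),  # Latin Extended Additional (Vietnamese)
    (0x0900, 0x097F, 0.18),  # Devanagari
    (0x0E00, 0x0E7F, 0.18),  # Thai
    (0x1780, 0x17FF, 0.18),  # Khmer
)


def slide_gap(text: str) -> float:
    """Largest gap demanded by any character on one slide."""
    gap = LATIN_DEFAULT_GAP
    for ch in text:
        cp = ord(ch)
        for lo, hi, script_gap in SCRIPT_GAPS:
            if lo <= cp <= hi:
                gap = max(gap, script_gap)
    return gap


def deck_gap(slide_texts: Iterable[str]) -> float:
    """Deck-wide maximum, for a single uniform gap setting."""
    return max((slide_gap(t) for t in slide_texts), default=LATIN_DEFAULT_GAP)


print(slide_gap("Hello"))                    # → 0.12
print(deck_gap(["Hello", "Ti\u1ebfng"]))     # → 0.14
```

Using the deck-wide maximum trades a little vertical space on
Latin-only slides for consistent spacing across the deck.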

Sanity-tested NO_ITALIC_RANGES against representative strings in 15
scripts; all classifications correct.
2026-05-03 00:53:28 +08:00
audit-table-template.md feat(skills): add pptx-html-fidelity-audit + wire into export prompt (#307) 2026-05-02 23:32:56 +08:00
font-discipline.md feat(pptx-fidelity): broaden script coverage beyond CJK (#308) 2026-05-03 00:53:28 +08:00
layout-discipline.md feat(pptx-fidelity): broaden script coverage beyond CJK (#308) 2026-05-03 00:53:28 +08:00