S4 deep-dive (S4 = pen-ai-skills migration's 4th sub-project) revealed
the entire 'visual reference' pipeline is dead code:
- executeVisualRefOrchestration has ZERO call sites. ai-chat-handlers
→ generateDesign() → executeOrchestration() directly. No
visual-ref dispatch.
- mode: 'visual-ref' (described in older comments + memory note as
'always-triggered since 2026-03-07') does not exist anywhere in src.
- design-validation.ts's referenceScreenshot path was wired to
getCurrentVisualReference() which was wired to a module-level
variable that nobody ever called setCurrentVisualReference on. The
hasReference branch was always false.
Delete the 4 dead files (~734 LOC) + simplify design-validation:
- visual-ref-orchestrator.ts (243 LOC, dead entry)
- design-system-generator.ts (174 LOC, only called by dead entry)
- design-code-generator.ts (192 LOC, only called by dead entry)
- html-renderer.ts (125 LOC, only called by dead entry, browser-only)
- design-validation.ts: drop the import + 5 clearVisualReference()
calls + the visualRef/hasReference branch in the round loop; pass
undefined for referenceScreenshot. The validateDesignScreenshot
signature keeps referenceScreenshot? for a future visual-ref source.
Rust-side note: the equivalent reference-comparison vision call IS
already implemented in crates/op-orchestrator/src/validation.rs
(reference_instruction text + timeout-doubling + plumb-through), it
just always passes None today — same pattern as ScreenshotProvider:
host wires real source when ready. No Rust changes needed for this
cleanup.
DesignSystem / VisualReference interfaces in ai-types.ts kept as
type-only declarations in case a future visual-ref provider lands.
User-reported 2026-05-11 "Popular Restaurants" — inspecting the live
canvas via batch_get showed the LLM built each row as
`role:card` (outer) carrying stroke + cornerRadius:16 + 2-shadow
elevation, then nested an inner `Card Info` frame ALSO with
`role:card`, cornerRadius:12, and the SAME 2-shadow stack for the
right-hand text column. The doubled decoration rendered as a
visible "border" / box-in-box that the user called out as the
N-tools being "死板" — element-builders deterministically emit
their own card decoration without knowing they're being nested.
New post-pass `stripNestedCardDecoration` walks the page tree and,
for each non-protected frame:
- if the frame has stroke AND any frame ancestor has stroke → strip stroke
- if the frame has cornerRadius > 0 AND any frame ancestor
has cornerRadius > 0 → strip cornerRadius
- if the frame has shadow AND any frame ancestor has shadow → strip effects
Each decoration type is checked independently so e.g. a card inside
a shadow-only ancestor still keeps its cornerRadius. Fills are NOT
touched — stripRedundantSectionFills already handles fill heuristics
and a child fill may be an intentional surface change (dark accent
strip inside a white card).
KEEP_DECORATION_ROLES exempts elements that legitimately carry their
own affordance even when nested in a card: button, chip, search-bar,
input, badge, avatar, switch, etc. Those keep their click-target
visual whether or not the parent is decorated.
Wired in apps/web design-canvas-ops.ts at both finalize sites,
running AFTER stripRedundantSectionFills so the fill pass gets first
crack and this pass cleans up the leftover stroke/cornerRadius/
shadow stack.
Tests: 8 cases — basic strip, partial strip (only matched types),
top-level decoration preserved, protected-role exemption, fills
untouched, deep nesting, asymmetric cornerRadius arrays, no-op
return value.
Codex stop-hook caught: even after the orchestrator-level reuse fix
(a720aac1), `orchestrator-sub-agent.ts` still calls
`isMobileFullScreen(plan)` independently in 2 places (L374 in
executeSubAgent + L734 in buildSubAgentUserPrompt). Both run AFTER
the orchestrator stripped the status-bar subtask, so they see a
smaller subtask count than the orchestrator's pre-strip classify.
A 2-subtask [status-bar, content] plan would flip from "mobile" →
"not-mobile" across the strip, and sub-agent prompt builders would
then disagree with the orchestrator about chrome handling — sub-
agent emits its own status bar / wraps in a phone mockup.
Architectural fix: classify ONCE per plan and memoize the result on
a WeakMap keyed by the plan object. Subsequent calls (whether from
orchestrator, executeSubAgent, or buildSubAgentUserPrompt) return
the cached pre-mutation answer. WeakMap avoids polluting the public
OrchestratorPlan type and lets the cache vacate naturally when the
plan goes out of scope.
This subsumes the orchestrator.ts L838 local-reuse fix from a720aac1
— that path is now safe via memo too — but the explicit reuse is
retained as defense-in-depth + readability (clear that the same
classification value is used at two adjacent call sites).
Tests: 2 new cases — mutation-survives-classify + per-plan
isolation. Existing 7 cases continue to pass.
Codex stop-hook on the 2026-05-10 mobile fallback fix caught a
strip-and-reclassify ordering bug. orchestrator.ts mutates
`plan.subtasks` in-place at L744 to remove the status-bar subtask
on mobile, then 96 lines later re-runs `isMobileFullScreen(plan)`
to gate status-bar injection. The new narrow + multi-subtask
fallback (`subtasks.length >= 2`) flips on the second call when
a plan that originally had [status-bar, content] (2 items, height=0
or non-numeric) drops to 1 item after the strip. Result: status bar
correctly classified as needed, then the strip removes it, then the
re-classify says "actually it's a Type 0 component" → injection
skipped. Round-trip the user back to the original missing-status-
bar bug.
Fix: reuse the `isMobileScreen` constant computed at L742 (BEFORE
the strip). The classification is stable for a given plan — there's
no reason to re-evaluate after our own mutation. Comment pins the
invariant for the next refactor.
User-reported 2026-05-10: DeepSeek "Bistro" mobile food app shipped
without the iOS status-bar chrome that the orchestrator is supposed
to inject for every mobile screen.
Forensic chain: status-bar injection at orchestrator.ts:916/977 is
gated by `isMobileFullScreen(plan)`, which required
`plan.rootFrame.height >= 480`. The LLM plan came back with width=375
but a non-numeric height ("fit_content" or similar). The plan parser's
`asNonNegativeNumber` rejected the string and fell back to the
landing-page preset's `rootHeight: 0`. So the runtime check saw
height=0 → returned false → no status bar.
Fix: when width is mobile-shaped (≤480) and declared height isn't
the canonical tall-page number, fall back to the subtask count. A
plan with 2+ subtasks is structurally a multi-section mobile page;
a Type 0 component (single card / badge / modal) is always 1 subtask.
The new branch keeps Type 0 components correctly classified as
non-mobile-screen (no chrome injection, no mobile-app skill) while
catching real mobile pages whose height got lost in plan coercion.
Tests: 7 cases covering the canonical mobile, desktop, Type 0, and
the new narrow + height-0 + multi-subtask path.
Why: Codex stop-time review 2026-05-10 — clipCardImageCorners was
slotted between unwrapFakePhoneMockups and resolveTreeRoles in the
post-streaming pipeline. Cards that don't carry an explicit
cornerRadius from the sub-agent get one filled in by role-resolver
(e.g. role='card' → default cornerRadius=12). Running clip before role
defaults silently skipped every default-radius card — match policy
required scalar cornerRadius > 0 and saw cornerRadius=undefined at
that point. Net effect: the bug stayed for the most common case where
the model wrote role='card' without a numeric radius.
What: move the clipCardImageCorners call to the freshRoot block right
after resolveTreeRoles + resolveTreePostPass. By that point the
role-resolver has populated defaults, so a card without an explicit
radius now correctly hits the predicate. Reuse the existing
\`freshRoot\` reference (re-fetched after updateNode mutations earlier
in the pipeline) so we don't double-fetch from the store.
No new tests — the existing 9 unit tests already pin the predicate;
the bug was placement-only. 1448 / 1448 pen-core tests, 1110 / 1110
AI service tests still pass.
Why: 2026-05-10 user report — "你看不到任何真正的生产级设计工具的希望"
called out three persistent visible issues with the food-app design:
1. Search bar shows a "weird inner rounded border" (image 7) — the
wrapper around address+search section sets fill + stroke +
cornerRadius together. strip-redundant-section-fills only cleared
the fill; the leftover stroke + cornerRadius kept drawing the
visible inner pill, which has been there for a long time.
2. Card image has all 4 corners rounded (image 8) — Taco Fiesta /
Bella Italia cards show the food image with bottom corners
rounded too, leaving them visually "ragged" against the title
text below that's flush with the card surface.
(The third — Categories padding — is a layout/intent question; left
for later since it can't be auto-fixed safely without knowing whether
the design wants edge-to-edge content.)
What — two coordinated fixes:
Fix#1: strip-redundant-section-fills now removes stroke and
cornerRadius alongside the fill. A misroll wrapper sets all three
together to "look like a card"; the fill gets stripped on detection
but the leftover chrome kept drawing a phantom card outline. The
three travel together so they should be cleared together. New test
pins the search-bar wrapper case.
Fix#2: new clipCardImageCorners pass in pen-core. When a card-shape
parent (scalar cornerRadius > 0, 2+ children, first child is an
image or canonical image-placeholder, image has its own scalar
cornerRadius) is detected, set parent.clipContent = true and remove
the image's scalar cornerRadius. The card's own corner clip then
cleanly handles the image — top corners round with the card, bottom
corners flush against the title below. 9 unit tests pin the
conservative match policy (silent on standalone image, title-first
card, array-form cornerRadius, cornerRadius:0, no image
cornerRadius, nested cards, existing clipContent).
Wired into applyPostStreamingTreeHeuristics right after
unwrapFakePhoneMockups. 1448 / 1448 pen-core tests pass (was 1438; +10);
1110 / 1110 AI service tests still pass.
Why: Codex stop-time review 2026-05-10 caught a content-erasure bug.
The b9b4126b heuristic accepted "0 or 1 children" with no type check,
so a hero frame named "Hero" with a single CTA button child matched.
Then the queue's update path (line 388-391 of processQueue) clears
\`children: []\` when the photo lands — destroying the legitimate CTA
button under it. Same risk for banners with sale text, covers with
nested frames, etc.
What: tighten the children clause in isImageAreaFrameByHeuristic. A
heuristic frame is now accepted only when:
- children is undefined / not an array, OR
- children.length === 0, OR
- children.length === 1 AND children[0].type === 'icon_font'
(the typical "broken image" placeholder hint).
Anything else — single non-icon child OR multiple children — means the
frame holds real content and the children:[] erase step would damage
the design. Such frames are now rejected by the heuristic and stay
unchanged on canvas.
3 new it() cases pin the rejection: hero+button, banner+text, and
cover+nested-frame all return false. Existing single-icon-child case
still passes. 38 / 38 tests in image-search-pipeline.test.ts (was 37;
+1; the rejection case adds 1 new it). 1110 / 1110 AI tests pass.
Why: Codex stop-time review #N (2026-05-10) caught the real bug — my
b9b4126b / 810c2f7a chain detected heuristic image-area frames at
collectImageSearchTargets and enqueued them at enqueueImageForSearch,
but the queue processor's still-needs-fill re-check (line 350-358 of
processQueue) called isUnfilledImagePlaceholderFrame which strictly
requires role='image-placeholder'. Heuristic frames have no role, so
the re-check returned false and the queue silently dropped them
before issuing the fetch — net effect was zero photos for the food-app
card scenario the heuristic was supposed to fix.
What: extract a new pure-function predicate
\`isFramePlaceholderStillUnfilled(node)\` that accepts a node iff it is
EITHER a canonical unfilled placeholder (role-based) OR a heuristic
match (name-based). Queue processor calls this single helper instead
of the strict canonical-only check. The helper is also exported and
test-covered separately so a future regression on this code path
fails loudly.
6 new it() cases pin: positive on canonical + heuristic, negative on
already-filled (canonical AND heuristic), null/undefined, and
unrelated frame names. 31 → 37 tests in image-search-pipeline.test.ts;
1109 / 1109 AI service tests pass overall (was 1103; +6).
Why: 810c2f7a added the parent-walk query mining for heuristic image-
area frames but landed without unit coverage of the two new helpers.
Without it, a future edit to GENERIC_PLACEHOLDER_NAMES or the layout-
word filter could silently regress query quality (e.g. start emitting
"Wrapper" or "Section" as queries) and the food-app card photos would
go back to looking generic.
What: export the two helpers and add 5 it() cases covering:
- explicit imageSearchQuery wins over name
- generic literal "Image" + parent "Bella Italia" → returns "Bella Italia"
- skip layout words ("Card Wrapper") in the parent walk; accept the
next semantic ancestor ("Margherita Pizza")
- 4-hop layout-only chain returns null (maxHops bound)
- fall back to non-generic-but-image-themed name when parent walk
yields nothing ("My Custom Photo")
31 / 31 tests in image-search-pipeline.test.ts pass (was 26; +5).
1103 / 1103 AI service tests pass (was 1098; +5).
Why: b9b4126b's collectImageSearchTargets returns heuristic-matched
frames (named "Image" / "Photo" / "Cover" without the canonical
role) but two downstream code paths quietly dropped them:
1. enqueueImageForSearch only accepted type==='image' or
isUnfilledImagePlaceholderFrame, so the heuristic frames got past
collect but never reached the queue.
2. extractQueryForNode would have returned the literal name "Image" or
"Photo" — useless to the photo search API. The user's "Bella Italia"
restaurant card never gets a relevant photo because the placeholder
frame's name says nothing about the restaurant.
What:
- enqueueImageForSearch grows a third branch: isImageAreaFrameByHeuristic
→ kind: 'placeholder-frame'. Same kind so the rest of the pipeline
treats it identically to a canonical placeholder.
- extractQueryForNode learns to skip "generic" placeholder names
(Image / Photo / Cover / Hero / Thumbnail / Banner / Poster + a few
variants) and walk up to the nearest semantic parent frame name
("Bella Italia" / "Margherita Pizza" / "Sushi House" — whatever the
enclosing card was named). Bounded to 3 hops. Filters layout words
(Card / Wrapper / Container / Section / Frame / Root / Page / Stack /
Row / Column / Content) so we don't end up searching for "Card".
- A new helper findParentSemanticName builds a parent map from the
live document on demand. Cheap for typical designs (< few hundred
nodes); avoids threading parent through every collect / enqueue call
site.
Net effect: a model-emitted plain "Image" frame inside a "Bella Italia"
card now searches for "Bella Italia" instead of literally "Image". The
existing isImageAreaFrameByHeuristic test coverage protects the entry
condition; 1098 / 1098 AI service tests still pass.
Why: 2026-05-09 user report — the food-app card design landed with
empty colored rectangles where restaurant photos should be (Bella
Italia / Green Bowl / Margherita Pizza). Root cause: the model emitted
plain frames named "Image" / "Photo" / "Cover" with a solid fill as
the card-top image area, instead of using add_image_placeholder_v1
(which sets role: 'image-placeholder'). The auto-search pipeline
strict-checks role and missed all of them, so scanAndFillImages found
nothing to search and the cards stayed solid-colored.
What: new isImageAreaFrameByHeuristic(node) supplements the strict
role check. Conservative match — only fires when ALL of:
- frame node WITHOUT role='image-placeholder' (strict path handles
that)
- name matches /\\b(image|photo|cover|hero|thumbnail|thumb|picture|
banner|poster)\\b/i
- width >= 80 AND height >= 60 (filters tiny color swatches)
- exactly one solid (non-image) fill (gradient = decorative, image
= already filled, both skip)
- 0 or 1 children (an icon child is OK, multi-child = real layout)
collectImageSearchTargets walks both the strict and heuristic paths
and produces the same kind: 'placeholder-frame' target either way, so
the existing query / aspect / search code-path runs unchanged.
10 new tests cover positive matches (Image / Photo / Cover / Hero /
Thumbnail / Banner / Poster), negative for canonical placeholder
(double-counting prevented), unrelated names (Card / Wrapper),
already-filled image fills, gradients, content-rich frames, single-
icon-child acceptance, undersize frames, and non-numeric dimensions.
26 / 26 tests in image-search-pipeline.test.ts pass (was 16; +10).
1098 / 1098 AI service tests pass (unchanged).
Why: 3abdfcf9 added integration tests for 4 of the 6 new aesthetic
detectors (rotation / text-cornerRadius / text-stroke / mixed-sibling-
cornerRadius). text-effect (7aef1b14) and mixed-sibling-padding
(ad025c95) landed without integration coverage of the
detect → applyFixes → store mutation chain.
What: 2 more it() cases mirroring the existing pattern:
- text-effect: text node with shadow effects → effects cleared
- mixed-sibling-padding: 3 cards with padding 16/16/20 → outlier
rewritten to scalar 16 (collapsed from the 4-tuple modal because
all sides are equal — locks in the scalar-collapse code path)
13 / 13 tests in design-pre-validation.test.ts pass (was 11; +2).
1088 / 1088 AI tests overall (was 1086; +2).
Why: even after the icon-catalog skill rewrite (19ca1c66) said "ALWAYS
USE icon_font, NEVER path NODES", end-to-end testing with MiniMax-M2.7
shows the model still emits ~half its icons as \`path\` nodes (Bell
Icon / Cart Icon etc.) because the per-subtask CRITICAL LAYOUT
CONSTRAINTS prompt — which the model treats as the prompt of record
— never restated the icon convention. The skill prompt is upstream
context the model can drift away from; the per-subtask block is the
last thing the model reads before generating, so it carries weight.
What: append a one-line ICONS rule directly to CRITICAL LAYOUT
CONSTRAINTS in orchestrator-sub-agent.ts. Restates the icon_font shape
inline (\`{"type":"icon_font","iconFontName":"<lucide-name>",…}\`) and
calls out the failure mode by name (resolver guess + placeholder
circle) so the model sees both the right pattern and the consequence
of the wrong one.
The path-with-iconic-name fallback path keeps working — this is purely
preventive guidance, the resolver tokenize fix (ef6f7ed3) still
catches the leftover cases. 1086 / 1086 AI tests still pass.
Why: the 4 new aesthetic detectors (rotation / text-cornerRadius /
text-stroke / mixed-sibling-cornerRadius) have unit tests against the
pure detect functions in pen-ai-skills, but the chain through the live
Zustand store + runPreValidationFixes had no end-to-end coverage.
Without it, a future refactor that subtly breaks the apply step (e.g.
suggestedValue:undefined not clearing the prop) would slip through.
What: 4 new integration tests in design-pre-validation.test.ts using
the existing makeDoc / loadDocument fixtures. Each builds a doc with
exactly one known aesthetic issue, runs runPreValidationFixes, and
asserts the store mutation took effect:
- rotation:12 on a frame → reset to 0
- cornerRadius:8 on a text node → cleared (undefined)
- stroke on a text node → cleared (undefined)
- mixed cornerRadius across 3 sibling cards (8 / 8 / 12) → outlier
rewritten to modal 8 (note: handled by the older
sibling-inconsistency detector; mixed-sibling-corner-radius is
the dedupe-loser here since sibling-inconsistency runs first
with the same {nodeId, property} key — both produce the right fix)
11 / 11 tests in design-pre-validation.test.ts pass (was 7; +4).
1086 / 1086 AI tests overall (was 1082; +4).
Why: C9 (b1ffa1c5) removed `image` from the resolver noise list to make
"Image Icon" resolve to lucide:image. The doc comment claimed "Image
Placeholder Path" still resolves via prefix fallback, but the math is
wrong — `image` is 5 chars, `imageplaceholder` is 16 chars, 5/16 = 31%
which is below the 50% FALLBACK_MIN_RATIO. So "Image Placeholder Path"
genuinely no longer hits the resolver fallback path. That's actually
correct (no icon marker word, resolver returns early), and the original
food-app circles came from the model emitting circle path-data
directly, not from the resolver fallback. But the photo / camera
aliases were also implicit dependencies that deserve explicit coverage
to lock in the behavior.
What: 2 new tests verify photo / camera resolve correctly through the
already-existing alias chain (`photo: _IMAGE` in BUILTIN_ICONS, lucide
camera native). 1082 / 1082 AI tests pass (was 1080; +2).
Why: when the pre-validation pass auto-fixes issues, the chat panel
just says "Pre-checks: fixed 5 issues" — generic and uninformative.
The user can't tell whether 5 invisible-container fixes happened
(structural, mostly safe), 5 unexpected-rotation fixes (aesthetic,
worth reviewing), or 5 mixed-sibling-padding fixes (consistency, worth
reviewing). With 10 detector categories now (4 original + 6 aesthetic
added in 53435bf7 / 7aef1b14 / cd1e4325 / ad025c95), the per-category
visibility starts to matter.
What:
- runPreValidationFixesDetailed() returns { total, byCategory } where
byCategory is a per-category count of APPLIED fixes (excludes the
info-severity skips and the protected-status-bar skip).
- runPreValidationFixes() kept as a thin wrapper returning .total so
no caller needs to change.
- design-validation.ts now uses the detailed result and formats the
breakdown as e.g. "fixed 5 (3 text-effect, 2 unexpected-rotation)"
in the chat panel — sorted by count descending so the dominant
category surfaces first. Both the no-vision-validation path and the
size-gated skip path show the breakdown when it exists.
Falls back to the legacy "fixed N issues" format when byCategory is
empty (defensive — should never happen if total > 0). 1080 / 1080 AI
tests still pass — the new return shape is additive and the wrapper
preserves the integer contract.
Why: dd8eb0eb's unwrap pass (Type 0 single-component section root
hoist) was integration-tested via the live Playwright run but had no
unit coverage. The integration test won't catch regressions when
someone tightens the heuristics — and the load-bearing "do nothing"
guards (multi-section, mobile screen, desktop, 0/N children, non-frame
child) are exactly where a careless edit would silently flatten a
multi-page design.
What: split the helper into two — a pure predicate
shouldUnwrapSingleComponentSectionRoot(plan, root) returning bool, and
the existing unwrapSingleComponentSectionRoot(rootNodes, plan) which
calls the predicate then mutates the store. Predicate is exported.
10 new tests cover:
- 3 positive: wrapper id ends -root / wrapper id ends -section /
wrapper name copies parent name
- 7 negative load-bearing guards: multi-section plan, mobile screen
(height >= 480), desktop (width > 480), root with 0 children, root
with multi children, wrapper with no children, wrapper with
unrelated id+name, wrapper is non-frame (text / icon)
1080 / 1080 AI tests pass (was 1070; +10).
Why: every time the vision validation loop returned skipped:true the
chat panel logged the same hardcoded "(timeout or provider error)"
string regardless of the actual cause — provider mismatch, HTTP error,
upstream config issue. Now that the server (validate.ts) returns
explicit skip reasons (e.g. "Vision validation is not supported for
builtin providers"), the UI should surface them so the user can fix
the right thing instead of guessing it's a timeout.
What: ValidationResult gains an optional `skippedReason` field.
validateDesignScreenshot fills it from response.json's `error` (or
the HTTP status text on a non-OK response) and propagates it through
the loop. The chat-panel status line now reads
"[error] Analysis skipped (<reason>)" with the server-provided
message clipped to 120 chars; falls back to the legacy string when no
reason is present.
1070 / 1070 AI tests still pass; no test depended on the literal
"timeout or provider error" string.
Why: builtin providers (MiniMax / DeepSeek / Bailian / Ark) currently
fall through to the generic "Missing or unsupported provider" error in
/api/ai/validate. The post-generation loop catches that as a hard
provider error and logs "[error] Analysis skipped (timeout or provider
error)" — which reads like a config bug to the user even though the
real reason is "this provider's models are text-only, vision validation
isn't useful here even if we did proxy it".
What: branch on body.provider === 'builtin' before the generic error
and return { skipped: true, error: '<explanatory message>' }. The
client design-validation.ts already short-circuits on `data.skipped`
so the loop now logs the clearer message instead. No behavior change
for the four supported providers; no new wire fields.
Why: end-to-end test of "Design a bottom nav with Home / Search /
Orders / Cart / Profile" surfaced a stray coloured pill highlight
wrapping the Search tab. Root cause: the model labels the cell
\`role: 'search-bar'\` (intending "this tab whose icon is search"),
and the role-resolver dutifully stamps the input-shaped 44px-tall,
22-corner, filled-surface look onto the nav cell. Inside a 56px tall
tab row that pill swallows the icon + label, looks broken on canvas,
and competes for click area with the nav-item active state.
What: search-bar role now early-outs with `{}` (no overrides) when
ctx.parentRole is one of `bottom-tab-bar` / `tab-bar` / `tab-row` —
mirroring the same check the `button` role already uses to skip its
text-button defaults inside tab containers. Nav-cell layout / fill
remains the responsibility of nav-item / nav-item-active.
1070 / 1070 AI tests still pass; the input-shape default still applies
in every other context (forms, headers, hero search, etc.).
Why: for Type 0 component plans (Notification Card / Profile Card / …)
the orchestrator pre-inserts a page rootFrame named after the component,
then the sub-agent emits its own section-root frame as the only child.
Result is a visible "Notification Card → Notification Card" double wrap
in the layers panel and a wasted layout depth that does nothing visual.
The double wrap was confirmed in the 2026-05-09 end-to-end test of the
notification-card prompt: depth-0 = orchestrator rootFrame (role=card),
depth-1 = sub-agent wrapper (also role=card), actual children at depth-2.
What: new unwrapSingleComponentSectionRoot pass added as Phase 4c right
after the mobile-status-bar dedup (mutually exclusive: that runs only on
mobile, this runs only on component-shaped plans). Conservative match —
only fires when:
- plan.subtasks.length === 1, AND
- plan.rootFrame is narrow (≤480) and auto-height (<480 or 0), AND
- the orchestrator rootFrame has exactly 1 frame child, AND
- that child's id has the sub-agent section-root suffix
(`-root` / `-section`) OR the child copied the parent's name.
When the conditions hold, hoist the wrapper's children up via
store.moveNode (preserving order) and remove the wrapper. Multi-section
pages, dashboards, and mobile screens are untouched — early-out on the
plan.subtasks.length / width / height checks.
1070 / 1070 AI tests still pass; unit-testing this against the live
Zustand store is awkward, the integration verification will land via
the next end-to-end notification-card run.
Why: my prior C3 resolver fix added `image` to ICON_NOISE_WORDS so
"Image Placeholder Path" (a non-icon container name) wouldn't collapse
to a circle. That was overcorrecting — `image` is also the canonical
Lucide icon key for the picture/photo glyph, and the model frequently
emits "Image Icon" meaning exactly that. With image stripped, "Image
Icon" tokenised to [] and the resolver returned without writing the
matched lucide:image path.
What: remove `image` from ICON_NOISE_WORDS, with an inline note that
the multi-word "Image Placeholder Path" pattern still resolves through
the prefix fallback (`image` covers >= 50% of `imageplaceholder` so
findPrefixFallback picks it up). Add a regression test for "Image Icon"
→ /image/.
1070 / 1070 AI tests pass (was 1069; +1).
Why: end-to-end test of "design a notification card with dismiss x
button" surfaced that MiniMax-M2.7 emits a path node named "Dismiss
Icon". Tokenisation gives "dismiss" but Lucide doesn't have a `dismiss`
key — the resolver fell through prefix/substring fallbacks and wrote
the placeholder lucide:circle, leaving the card with a hollow ring
where the X should be.
What: 5 new aliases added in lock-step to icon-dictionary.ts (client
commonAliases) + icon.ts (server NAME_ALIASES per existing comment):
- dismiss → x (close button intent)
- closebutton → x (compacted from "Close Button Icon")
- cancel → x (cancel-action close icon)
- remove → x (remove-action close icon)
- expand → maximize-2
- collapse → minimize-2
NOT aliased: `cross`. Lucide already ships a `cross` icon (the
Christian-cross shape) and overriding it would lose that geometry.
"Cross" disambiguation is left to the model — if it really means a
close button, telling it to write "Dismiss Icon" / "Close Icon" via
the icon-catalog skill is enough.
Tests: 3 new it.each cases (Dismiss / Cancel / Remove Icon → /x/).
1069 / 1069 AI tests pass (was 1066; +3).
Why: Codex stop-time review #6 — C6 added workspace / console / 工作台 /
工作区 to the component DISQUALIFIER, but the dashboard detector regex
still only matched dashboard|admin|管理|后台|控制台. So "design a
workspace with side panel" skipped component (correct) AND skipped
dashboard (regex miss) and fell through to landing-page (1200×0,
4-section), which is the wrong shape for a workspace UI — the user
wants a 3-section desktop-screen with header/main/actions.
What: dashboard detector regex extended in lockstep with the
disqualifier — dashboard|admin|workspace|console|管理|后台|控制台|
工作台|工作区. Comment makes the "keep in sync" invariant explicit.
Tests: 4 new positive cases (Latin workspace + console, zh-Hans 工作台
+ 工作区 with 卡片) assert the plan returns 1200×800 with the 3-
section ['Header','Main Content','Actions'] layout, not the 4-section
landing-page default.
1066 / 1066 AI tests pass (was 1062; +4).
Why: Codex stop-time review #5 — broadening the component trigger list
from 17 to 25 nouns introduced false positives:
"admin dashboard with metric tiles" → matched `tile` → Type 0 (400×0)
when the user clearly wants a desktop dashboard. Same for "design an
admin panel" / "workspace with charts" / Chinese 后台管理 + 卡片.
What: COMPONENT_DISQUALIFIER_RE gains three new keyword buckets in
addition to the existing screen / page / home / onboarding / flow:
- mobile-screen markers — mobile, phone, ios, android, 手机, 移动端
- workspace markers — dashboard, admin, workspace, console, 管理,
后台, 控制台
+ zh-Hans 屏幕 (screen) was already added in C5.
These ensure component classification is reserved for "X card / X chip /
…" prompts that have no surrounding screen/dashboard/mobile context.
The dashboard / mobile prompts then continue down to their own explicit
detector branches and produce the right preset.
Tests: 7 new negative cases covering admin dashboards with tiles,
charts, panels, Chinese 后台 with 卡片, and mobile/phone prompts that
also mention card/badge. 1062 / 1062 AI tests pass (was 1055; +7).
Why: Codex stop-time review #4 — the previous regex covered ~17 nouns
but design-type.md documents 25 (button / label / row / item / selector
/ panel / chart were missing) and the CJK 卡片 alias was also listed.
JS `\b` is ASCII-only and never fires between two CJK chars, so
`\b卡片\b` matched nothing in "design a 卡片".
What: split into COMPONENT_TRIGGER_LATIN_RE (full noun list with `\b`
boundaries) + COMPONENT_TRIGGER_CJK_RE (kana-free subset of the most
common Chinese aliases — 卡片 / 徽章 / 标签 / 按钮 / 开关 / 对话框 /
提示 / 气泡 / 图表). Either match is enough to classify Type 0.
Disqualifier regex also gains 屏幕 (screen in zh-Hans).
Tests: 23 it.each cases pin one Latin trigger each plus the CJK 卡片;
6 negative cases prove the disqualifier still wins for "X screen / page
/ app / onboarding / flow" prompts. 1055 / 1055 AI tests pass (was
1027; +28 new).
Why: Codex stop-time review #3 flagged "Type 0 component handling is
incomplete". The earlier C1 fix (orchestrator-plan-classify helper +
isMobileFullScreen heuristic) covered the orchestrator path, but four
more places still bucketed narrow widths (≤480 / ≤500) as mobile and
mishandled component-shaped plans.
What:
- agent-tool-executor.ts: replace `width<=500 ? 375 : 1200` bucket on
setGenerationCanvasWidth with the inserted node's actual width — a
400-wide profile card now estimates text against 400, not 375.
- design-type-presets.ts: add 'component' to DesignType union with
width=400, height=0, and a single-section default. detectDesignType
matches "X card / X badge / X chip / ..." prompts BEFORE the mobile
/ dashboard check, so the parse-failure fallback returns a 400px
component instead of a 1200px landing-page for "design a profile
card". Disqualified when prompt also names a screen / page.
- orchestrator-prompt-optimizer.ts: 3 spots — platform selection now
uses preset.type==='mobile-screen' (component groups with webapp,
not mobile, since it has no status bar / bottom nav); compact
prompt rules and subtask hint get a component branch ("Use width=400
height=0, exactly 1 subtask, no chrome"); fallback height map gives
components a single 200px region instead of 800.
- orchestrator-planning.ts: buildFallbackHeights treats narrow +
auto-height plans as component-shape and emits 200px sections,
preventing the prior "812 / 1 = 812-tall card" output.
2 new tests pin: (a) "design a clean profile card" → 400×0 single
"Component" subtask with 200px region; (b) "design a card screen page"
must NOT shortcut to component (screen/page disqualifier holds).
Why: MiniMax-M2.7 keeps emitting path nodes named "Search Icon Path" /
"Time Icon Path" / "Heart Icon Stroke" (3 words ending in noise word).
The legacy resolver normalised to "searchiconpath" (15 chars), prefix
fallback found "search" (6/15 = 40% < 50% threshold) → rejected →
fallback to lucide:circle → user-visible "circle bug" across categories,
filter chips, and search bar leading icons. Skill update alone (telling
models to use icon_font) doesn't fix the trained-pattern leftover —
Codex flagged this as a still-unfixed failure mode.
What: extractIconKeyword() tokenises on camelCase / space / dash /
underscore boundaries and drops { icon, logo, symbol, glyph, path,
shape, stroke, fill, svg, graphic, image }. Surviving tokens are
concatenated for direct dictionary lookup. Pure-noise names ("Icon
Path", "Symbol") return early without writing the misleading circle
placeholder. time / deliverytime / rider aliases added (kept in sync
across icon-dictionary.ts and server icon.ts NAME_ALIASES per existing
comment). 11 new tests cover multi-word resolution and pure-noise
no-op; 21 prior regression cases (descriptive geometry untouched,
single-word camelCase / kebab / snake all resolve, "Brand Logo"
placeholder behaviour preserved) still green.
Why: "Design a profile card" through MiniMax-M2.7 produced a 375×803 mobile
screen with auto-injected status bar, because the planner skill listed
"profiles" as a Type 2 single-task screen and the orchestrator's
isMobileScreen heuristic ran on width≤480 alone.
What: design-type.md + decomposition.md add Type 0 (single component:
card / badge / chip / modal) with width=400 height=0 1 subtask no chrome.
isMobileFullScreen helper extracted to orchestrator-plan-classify.ts and
required by both orchestrator.ts and orchestrator-sub-agent.ts so the
two paths can't drift on what "mobile" means (Codex review caught this
when only orchestrator.ts had the new check).
Verified with same MiniMax + same prompt: 400×320 component, 8 nodes,
firstChildRole=card, no status-bar.
ab-corpus rerun (gpt-5.5, ab-v3, 52 prompts × 2 arms): obvious-T M3
59.6% -> 91.5% (+31.9pp); composite-T 0% -> 40% (+40pp). Lift on top
of d5d1a8cd (Rank 1 schema coerce), 9e90cffe (Rank 2 prompt fail watch),
e34d9238 (Rank 3 vision toggle).
Builder fallback minima:
- chart-pie/line/bars-v1: values [1] -> [30,25,20,15,10] / [10,15,12,20,18]
so chart-pie-slice (>=4) and chart-line-dot (>=7) corpus minima are met
- toolbar-v1: fallback items include a divider_after entry so toolbar-divider
role emits even when the model passes only icons
- avatar-group-v1: entry-coerce items with 5 placeholder initials so the
builder always emits avatar-group-{item,initial,overflow,overflow-count}
- combobox/data-table-row/share-row-v1: fallback arrays grown to 3 items
matching the corpus shape minimums
Optional-content discipline (codex stop-time round 2):
- user-card-v1: name field fuzzy-coerce (required field, real fix for the
"Element tool insert failed: I(null,...)" handler bug); the optional role
text stays conditional, never invented. content empty-string was tried
but rejected (empty text nodes still consume flex gap).
- image-placeholder-v1: label stays conditional for the same reason.
Prompt:
- elements.md fail-watch table extended with 5 components (chart legend,
skeleton, inline-action, share-row, combobox) so models routing to
batch_design at least know the role names.
Multi-page vision validation (codex stop-time round 1):
- design-validation.ts: countNodesInActivePage + buildNodeTreeDump now
read getActivePageChildren(activePageId) instead of DEFAULT_FRAME_ID,
so the size-gate and the LLM's tree dump both reflect the page the user
is actually editing rather than the default page. Was a latent bug
surfaced when VALIDATION_ENABLED flipped to true in e34d9238.
Tests: 4223/4223 pass; format:check + tsc clean. 12 files changed.
Toggle VALIDATION_ENABLED from false to true so the post-generation
vision LLM validation loop runs. The loop itself was fully built in
design-validation.ts long ago — only ai-runtime-config:109 was holding
it at runtime.
Add VALIDATION_NODE_COUNT_THRESHOLD=30 size-gate so atomic single-tool
outputs (one badge, one chart) skip the +30-90s vision round-trip.
Composite multi-section briefs (full-page mockups, dashboards) easily
clear the threshold and benefit from the screenshot -> vision LLM ->
safe-fix -> re-screenshot rounds.
Pre-validation heuristics (ms-cheap tree walks) still run regardless
of size.
Plumbing was already done before this commit:
- design-screenshot.ts captureRegion() shipped in Phase 1.5
- design-validation.ts MAX_VALIDATION_ROUNDS=3 loop fully implemented
- validate.ts has 4 vision provider paths (Anthropic Agent SDK,
Codex CLI, OpenCode SDK, Gemini CLI)
This commit only flips the flag and adds the size-gate heuristic.
Tests: 4223/4223 pass; format:check + tsc clean.
Predicted KPI lift: M3 composite +5-10pp (speculative). Vision
catches what schema-coerce + role-hint can't — mis-positioned sibling
sections, missing component spacing, color-contrast issues. Gating by
node count keeps user-perceived latency contained to designs that
actually need it.
Out-of-scope for this commit (followups if needed):
- ab-corpus glm/minimax/deepseek client image_url part injection
(lets the eval harness exercise vision for KPI verification)
- builtin Zig agent-native runtime image part support (only matters
if the embedded provider becomes the default)
Codex flagged: \`normalizeTreeLayout\` strips \`x\` / \`y\` from
non-overlay children of any vertical / horizontal layout container
as a stale-coordinate cleanup. The new
\`convertStackedOverlayToAbsolute\` post-pass was wired in AFTER
normalize, so when a sub-agent emitted an intentional content
offset on a layered hero — e.g.
hero { layout: 'vertical', height: 200, children: [
image { full bg },
overlay { full bg gradient },
content { x: 16, y: 80 } ← inset above the gradient
]}
normalize would delete the \`x: 16, y: 80\` first, then convert
would flip layout to 'none' on a hero whose children have no
positions to honor. The content frame ends up at (0,0) overlapping
the bg image instead of where the model placed it.
Move convert to run BEFORE normalize. After convert, the
container's layout is 'none' so normalize sees an absolute-
positioning container and leaves the children's x/y untouched.
The function is a no-op when no layered pattern matches, so
running it earlier doesn't add cost on the common path.
New test asserts: convert + normalize (in that order) preserves
content's x=16, y=80 through the chain. Verified by reversing the
order in the test — assertion correctly fails with
"expected undefined to be 16", proving the regression coverage
actually exercises the bug condition.
M2.7 food-app run shipped a hero whose content piled into the next
section. Live doc inspection showed:
hero-image-container { width: 'fill_container', height: 200,
layout: 'vertical' }
├─ hero-image { width: 'fill_container', height: 200 }
├─ hero-overlay { width: 'fill_container', height: 200 } // gradient
└─ hero-content { width: 'fill_container', height: 'fit_content' }
├─ "Hungry?" title
└─ search-bar (48 tall)
The model intended the image + overlay to LAYER on top of each
other as bg+gradient with content floating on top. With
\`layout: 'vertical'\` the layout engine instead stacked them
sequentially: 200 + 200 + ~80 = 480, far past the 200 declared
height. No clipContent on the container, so the overflow rendered
into the NEXT sibling section — the user's screenshot showed
"Hungry?" search and category icons piled over the "Near You"
restaurant cards.
\`convertStackedOverlayToAbsolute\` post-pass detects the pattern
conservatively:
- frame, layout='vertical' (or undefined → infers vertical)
- numeric fixed height H
- >= 2 children of types image / rectangle / frame whose height
is exactly H or 'fill_container'
The repair: switch \`layout\` to 'none' so the layout engine
respects each child's own x/y (defaulting to 0/0 = layered) — the
image lands at (0,0), the overlay layers on top, and the content
frame floats on top. Children with explicit positions stay
respected.
Wired into \`design-canvas-ops.ts::applyPostStreamingTreeHeuristics\`
right after \`expandOverflowingFixedHeightCards\` so both layered
and overflowing-fixed-height fixes run together.
6 tests cover: hero pattern conversion, fill_container variant,
plain content stacks left alone (only one bg-like child), no
fixed height left alone, horizontal-layout side-by-side rows
left alone, nested heroes detected.
MiniMax M2.7 food-app run failed because the model emitted its full
subtask design wrapped in a single JSON array literal:
[
{ "id": "filterChips-root", "_parent": null, "type": "frame", … },
{ "id": "chip-1", "_parent": "filterChips-root", … },
…
]
The previous \`looksLikeJsonl\` gate only checked
\`startsWith('{')\` so this fell through to the DSL parser, which
tried to read \`[\` / \`{\` / \`}\` each on its own line as DSL
operations. Every line was rejected, the subtask returned empty,
the orchestrator retried with minimal skills, that timed out too,
and the user got a single-frame placeholder with one section
instead of the full screen.
Extended the gate to accept \`[\` as the leading character. The
shape signature stays the same (\`_parent\` or PenNode \`type\`
key inside the first 800 chars) — the bracket check just
disambiguates from real DSL. \`parseJsonlToTree\` already handles
both shapes via brace-counting (it scans for \`{...}\` blocks and
ignores surrounding \`[\`, \`]\`, and \`,\`), so the apply path
needed no changes.
Exported \`looksLikeJsonl\` for direct unit testing. 6 new tests
cover: pure JSONL match, JSON-array match (the M2.7 case),
array with leading whitespace, DSL-style assignment lines reject,
empty/non-bracketed reject, and bracketed-but-no-PenNode-keys
reject (so we don't reroute legit non-design array operations).
Verified by temporarily reverting the gate to just \`{\`: the two
new array tests correctly fail.
Image #44 banner shipped with the "Order now" button cut in half:
\`featured-promo-card { role: 'card', height: 165, clipContent: true }\`
held a vertical content stack (badge + title + body + button) whose
natural height was ~220px on the model's wrapped column width. The
card role default sets \`clipContent: true\` to keep image children
inside rounded corners, so the overflow got rendered then clipped at
y=165, making the bottom row of content disappear.
New \`expandOverflowingFixedHeightCards\` post-pass:
- Walks the tree.
- For each frame whose \`role\` is in CARD_ROLES (card, stat-card,
pricing-card, feature-card, image-card, testimonial, event-card,
product-card) AND \`height\` is a positive number AND
\`fitContentHeight(node) > height\`, switches \`height\` to
\`'fit_content'\`.
- Returns true if any card was patched.
Why fit_content, not removing clipContent: clipContent is what makes
nested image children respect the card's rounded corners. Removing
it would un-clip the button (good) but un-clip the image edges (bad
— image bleeds past the card's corner radius). Just letting the
card grow keeps both invariants right.
Also wired in: \`design-canvas-ops.ts\` calls the new pass right
after \`injectMissingNavSurfaceFill(pageRoot)\` in the streaming /
dispatcher post-pass chain. The card-overflow fix runs ONCE per
post-pass invocation on the page root, so all card-family children
on the page get checked together.
Side fix: button role default for tab-style buttons (parent role is
bottom-tab-bar / tab-bar / tab-row AND layout='vertical') now
returns \`padding: [6, 4], gap: 4\` instead of falling through to
the text-button \`[12, 24]\` default. Only affects the case where
the model omits padding on the tab cell — sub-agents that emit
explicit padding still win (per applyDefaults' missing-only rule).
5 new tests cover: banner-style overflow gets fit_content, fitting
content stays at fixed, non-card roles never get touched, already
auto-sizing cards stay alone, and overflow detection walks into
nested sections.
Two visible regressions in Image #44:
1. Bottom nav reverted to no-background even though earlier runs
worked. GPT-5.5 wrapped its bottom nav in a single-child section:
root > frame{role:'section',id:'bottom-tabs-root'}
> frame{role:'bottom-tab-bar'} > [tabs]
The inject pass only walked DIRECT children of root and bailed on
the section wrapper. Now we hop one level when the wrapper is a
single-child section AND its sole child is a nav-role frame, so
the nested nav gets the surface fill + position-aware shadow.
Multi-child sections still bail (those are real content sections,
not wrappers).
2. Banner "Order now" CTA shipped with white text + dark icon. My
prior contrast fix used a luminance-delta threshold of 0.4, but
#0F172A icon vs #F97316 (orange accent) actually has delta 0.48
— the threshold said "good contrast, leave it alone" while the
user sees an obvious mismatch with the white text label.
Wrong axis: the user's complaint is about CONSISTENCY (icon
should read as the same token as text), not raw contrast.
Refactored fixButtonForegroundContrast:
PASS 1 — find a "reference" foreground from sibling text fill
(after refs resolve). The model's own text color is the
authoritative signal for what the button's foreground should
look like, regardless of what bg/fg luminance suggests.
PASS 2 — for each icon_font sibling, override when its
resolved hex differs from the reference fg. Icon-only
buttons (no text sibling) fall back to a luminance-based
check at threshold 0.5 — catches dark-on-dark / light-on-
light pairs that motivated the original rule, without the
false-negative on saturated mid-luminance bgs (orange).
3 new tests: wrapper-section nav reach, multi-child wrapper bail,
and the regression test for the original "dark-on-dark icon-only
button" still passing under the new luminance-fallback path.
141 tests in the affected suites all green.
Side effect: applyNavSurfaceFill now bails entirely (returns false)
when the nav already has a fill — earlier version still added a
shadow even when fill was preserved, which violated the
"preserves sub-agent intent" semantics the existing tests rely on.
Image #42 logs showed every image fetch URL came out wrapped twice:
http://localhost:3000/api/local-asset?path=%2Fapi%2Fai%2Fimage-proxy%3Furl%3D...
The image-search pipeline correctly returned
\`/api/ai/image-proxy?url=...\` thumbUrls (so browser fetches go
through the dev server, which can reach openverse via the system
proxy). But \`isLocalAssetPath\` only excluded \`data:\`/\`https?:\`/
\`blob:\` from local-asset bridging — anything else, including
absolute paths starting with \`/api/\`, was treated as a file-system
asset and re-wrapped through \`/api/local-asset?path=\`. That bridge
handler then 404s because the encoded path \`/api/ai/image-proxy?...\`
isn't a real file. Net effect: every search-found image stayed at
the placeholder visual even though the search succeeded.
Add a same-origin route carve-out:
/^\/(?:api|_)\//
Paths under those prefixes are runtime endpoints (Nitro \`/api/*\`,
Vite \`/_/*\`), not file system assets, so they pass through the
resolver as-is. \`/assets/hero.png\` and similar absolute file-style
paths still go through the local-asset bridge.
New regression test covers /api/ai/image-proxy, /api/local-asset,
and /_/* paths returning false from isLocalAssetPath, and verifies
ordinary /assets/... paths still return true.
Codex flagged: the previous version cleared the AbortController
timeout in a finally{} block right after \`await fetch()\`, but
fetch() resolves as soon as the response headers arrive — the
body read happened later in the \`reader.read()\` loop with no
timeout protection. An upstream that drip-feeds bytes (or stops
mid-stream) would leave the dev server hanging on
reader.read() forever.
Single AbortController + timeout now covers the entire request
lifecycle (DNS + TLS + headers + body). The clearTimeout moves to
the outer finally{} so it fires regardless of return path
(success, 4xx, 5xx, abort) but never AHEAD of the body read.
Side benefit: AbortError thrown by the controller's timeout (or
by the size-cap controller.abort()) now lands in the catch clause
with a distinguishable error.name === 'AbortError'. Translate it
to a 504 Gateway Timeout when the timeout was the cause, so the
caller can distinguish a slow-upstream from a generic
fetch failure (502).
Codex flagged: the previous version did
\`Buffer.from(await upstream.arrayBuffer())\` which buffers the
entire upstream body into memory with no upper bound. Wikimedia
Commons originals can be 100 MB+, and a malicious request could
point at any arbitrarily-large file on an allow-listed host (an
upstream big enough to OOM the dev server is reachable behind
plenty of legitimate-looking URLs).
Hard 16 MiB cap on every proxied response:
- Read upstream's declared Content-Length first; reject (413) if
it already advertises more than the cap, before reading a single
byte.
- Stream the body via \`getReader()\`, accumulate in chunks, and
bail (cancel reader, abort fetch, return 413) the moment total
bytes cross the cap. Subsequent chunks are never buffered.
- Move the timeout from \`AbortSignal.timeout(15000)\` to a manual
AbortController so the same controller can also abort on
size-limit hit.
16 MiB sits well above any reasonable thumbnail and even high-res
4K JPEGs (~5–8 MiB), but well below the territory that risks
heap pressure from a single fetch.
Image #40 logs showed three "Failed to load image" errors for
api.openverse.org/...thumb/ URLs even though the search-pipeline
successfully fetched URLs from openverse via the dev server (the
HTTPS_PROXY fix from 3a6f8480 routes server-side fetches through the
local proxy). The browser-side image loader doesn't go through the
same dispatcher: \`new Image(); img.src = url\` does a direct
browser fetch that ignores HTTP_PROXY env vars, so on a machine that
requires the proxy to reach openverse.org the canvas paints the
placeholder visual even though the search-pipeline already found a
valid image URL.
New endpoint \`/api/ai/image-proxy?url=<encoded-url>\`:
- Proxies image bytes through the dev server.
- Reuses \`configureProxyDispatcher\` so the upstream fetch routes
through HTTPS_PROXY (same path as image-search).
- Allow-lists known image hosts (openverse, wikimedia, flickr's
static CDN) to prevent the dev server being used as an open
proxy. Unknown hosts get 403.
- Forwards Content-Type and Cache-Control from upstream.
- Sets Access-Control-Allow-Origin: * so canvas readback works.
\`mapOpenverseResult\` and \`mapWikimediaPages\` now return thumbUrl
wrapped via \`viaImageProxy(externalUrl)\`. The browser fetches
\`/api/ai/image-proxy?url=...\` (same-origin, no proxy needed),
the server fetches the upstream (with proxy), bytes flow back,
the canvas paints the photo. Test expectations updated to assert
the proxy wrapper.
Net effect: with HTTPS_PROXY set, image search end-to-end (search
results found AND images actually load in canvas) works on
proxy-required dev machines. With no proxy env var set
(production / CI), the cascade is still well-behaved — proxy
dispatcher is a no-op, server fetch is direct, no proxy wrapping
is necessary but it doesn't hurt either (the endpoint just adds
a hop).
Codex flagged: the previous fix (b3180534) read
\`doc.themes[SEMANTIC_PALETTE_THEME_AXIS]\` as the dark-mode
discriminator, but \`seedDocVariablesFromStyleGuide\` — the only
production writer of theme-related doc state — writes ONLY
\`doc.variables\`, never \`doc.themes\`. So on a real
orchestrator-emitted dark-mode design the axis is empty, the test
\`modeAxis[0] === 'Dark'\` is false, and the cascade still served
the LIGHT palette. The "production" dark-mode signal was wired to
nothing.
Fix reads the active page root's fill via \`detectThemeFromNode\`, the
same heuristic \`resolveTreeRoles\` uses at its entry point to set
\`ctx.theme\` for role defaults. The page root's fill is what the
model / user actually painted as the page background, regardless
of whether any themes axis was ever populated, so it's the
production-truthful signal.
New \`detectActivePageMode()\` helper reads the doc store + canvas
store's activePageId, finds the first frame on that page, and
runs \`detectThemeFromNode\`. Defaults to 'light' when no page root
exists or it has no fill — same conservative bias as before.
Test updated to seed a dark page-root fill on the live doc store
(replaces the prior \`themes['Mode']\` axis seed which never
matched production state). Also explicitly nulls the themes axis
to confirm the fill-based detection is the active code path.
Codex flagged: the previous cascade (965e143e) gated step-2 mode
selection on a `themeHint` parameter that no caller actually
supplied, so every dark-mode generation with an unseeded
\$color-accent button got served the LIGHT palette hex
(#2563EB blue) instead of the DARK one (#60A5FA light blue).
With identical luminance assumptions in both branches the
contrast pass picked the wrong fg for genuinely dark-mode docs.
`resolveColorMaybeRef` now drops the unused themeHint param and
reads `doc.themes[SEMANTIC_PALETTE_THEME_AXIS]` ('Mode' axis)
directly. First value of that axis === 'Dark' → step-2 uses the
dark palette; otherwise light. The function is self-contained,
no plumbing required at call sites, and the fallback honors
whichever mode the doc currently advertises.
New regression test seeds doc.themes={ Mode: ['Dark', 'Light'] }
with no doc.variables, asserts that an unseeded \$color-accent
button gets the dark-palette accent (#60A5FA, lum ≈ 0.6) and
therefore the dark fg color (#0F172A) on the text child. If
step-2 had stayed locked on Light, the test would expect #FFFFFF
and fail.
Codex flagged the previous fix (1c08ac3f, "skip on unresolved ref")
as too conservative: a sub-agent's button with `\$color-accent` bg
and a text child WITHOUT a fill ended up with no fill at all when
doc.variables hadn't been seeded yet — text falls back to black,
which is invisible on a dark accent button.
Better fix: extend `resolveColorMaybeRef` with a step-2 fallback
into the built-in semantic palette (`getSemanticPaletteHex`). Every
core token (`color-accent`, `color-bg`, `color-text-primary`, etc.)
has a known hex for both `Light` and `Dark` modes, so the cascade
now is:
1. Doc-seeded variables (user's palette).
2. Built-in semantic palette for the requested mode.
3. Original ref string (unresolvable) — caller bails.
In the food-app + GPT-5.5 path, step 1 already worked because
`seedDocVariablesFromStyleGuide` runs before the sub-agents.
The fallback is for paths that bypass seeding (test fixtures,
external MCP callers, mid-flight states), not the common case.
Skip-on-NaN behavior stays — it now only triggers for genuinely
unknown tokens (`\$color-foobar` or similar), where any guess is
worse than leaving the model's existing fill alone.
Tests:
- New: `\$color-accent` ref resolves to #2563EB via semantic palette
even with no doc.variables, contrast pass picks white fg correctly
for the unfilled text child.
- New: `\$color-mystery-token` (not in palette) — step-2 misses,
contrast pass skips, existing text fill survives.
- Replaces the prior "skips on unresolved" test which over-asserted
the conservative path.
Codex flagged: the previous version (ddf6580f) treated a NaN
luminance — what `hexLuminance` returns when the bg color is still
a `\$color-accent` ref because the doc's variables haven't been
seeded — as a dark bg and painted white text. If the user's palette
later resolves \$color-accent to a LIGHT hex (e.g. cream
#FFE4B5), the white text becomes invisible on the light bg. Same
risk in the inverted direction with the original code, which
defaulted to dark text on unknown bg.
Either guess can ship a visually broken button. Skip the contrast
pass entirely when we can't resolve the bg ref to a hex — text /
icon retain whatever fill they already carry, which is at least
visible (the model's default text color, usually dark). A later
post-pass invocation, after variables get seeded, re-runs and
applies contrast cleanly with a real luminance value.
`needsContrastOverride` already returns false when either luminance
is non-finite, so the icon-override branch was already safe; only
the text branch and the (now-removed) NaN→white default needed the
fix.
New regression test in role-resolver.test.ts covers the unseeded-ref
case: explicit dark text fill on an unresolvable accent button
survives untouched.
3 new cases covering the regression fixed in ddf6580f:
- dark icon on dark button → overridden to white (low contrast triggers)
- intentional brand-red icon on white button → preserved (delta > 0.4)
- unfilled icon_font on dark button → still gets contrast fill (no
regression on the original "fill if missing" branch)
User reported icons inside accent-color buttons rendering dark while
text on the same button rendered white — visible on the "Burger" tab,
the "Order now" CTA, and the round filter icon-button in the food-app
generation. Two compounding causes in `fixButtonForegroundContrast`:
1. The luminance check ran on the raw `fill[0].color` even when that
was a `\$color-accent` variable ref. parseInt('\$c…', 16) returns
NaN, NaN<0.5 evaluates to false, the dark-fg branch wins, and the
contrast pass was painting dark-on-orange. Resolve the ref via
the doc store's variables + active theme before luminance.
(NaN luminance now falls through to the white branch — better
than the previous silent dark default when resolution misses.)
2. The icon_font child path skipped any node that "already has a
visible fill". Models reflexively stamp `icon_font.fill` to a
dark text color (the prompt lists `fill` as a property), so on
an accent button the contrast pass left the dark icon next to
the now-white text. Split the icon_font branch off the text
branch: when the icon already has a fill, RESOLVE it and check
contrast against the (resolved) bg; if the luminance delta is
below the WCAG-graphical threshold (0.4), override with the
contrast fg.
Why not unconditionally override icon_font fill: an intentional
brand-color icon on a near-white button (red notification dot, blue
brand mark) has a contrast delta well above 0.4 and survives. Only
the dark-on-dark / light-on-light pairs that motivated the bug get
rewritten.
`text` and `path` branches keep their original behavior — text is
where models intentionally express accent colors, and path stroke
icons get filled by the existing stroke-fallback path.
Two of five food-app placeholder images shipped unfilled because
Openverse returned `[]` for the model's 3-keyword queries:
- "burger combo fries" → 0 results
- "sakura sushi platter" → 0 results
The same queries truncated to the first two words have plenty:
- "burger fries" → 240 results
- "sushi platter" → 240 results
Openverse uses strict AND-search across all keywords, so a 3-word
query that includes any low-frequency or non-matching token
zero-results even when the photos exist. The skill prompt already
nudges models toward "2-3 English keywords" but they often pick three
when the brief mentions a third descriptor (e.g. "Tasty BURGER COMBO
fries" → "burger combo fries").
Endpoint now cascades:
1. Openverse with full query.
2. If `[]` and query has > 2 words: re-query with first 2 words.
3. If still nothing usable: fall through to Wikimedia (existing path)
with the same 2-word retry safety net.
Returning the original empty result was wrong: the placeholder stays
unfilled even though a satisfactory photo for "burger fries" was one
keyword-trim away. The trade-off is losing a small amount of relevance
on the dropped 3rd keyword — but that's better than no photo at all,
and the model still drives the first two keywords which carry the
core subject.
Visible regression in the food-app screenshot: the top-left avatar
(44×44 orange circle, role='button', single child text 'A') rendered
with the 'A' visibly off-center. Live doc inspection showed
`padding: [12, 24]` on the avatar — 24px horizontal padding × 2 = 48
exceeded the 44px frame width, the layout engine clamped to the
negative content area, and the centered text ended up shifted off
the visual center of the orange circle.
Root cause: the default 'button' role rule unconditionally returns
`padding: [12, 24]`, which fits a typical text-button (~44 tall × wide
enough for label + horizontal pad). When the model emits role='button'
on what's actually an avatar / icon-action shape (small square, single
short child), the same default collides with the small fixed width.
Fix: in the default branch (no special parent-role context), inspect
the node's explicit width/height. When BOTH are numeric AND ≤ 60 AND
the default 24px horizontal padding wouldn't fit (`width < 24*2`), skip
the text-button defaults entirely — return only the layout / centering
shape (no padding, no cornerRadius, no height). The 60px ceiling is
the standard icon-button / avatar size band; the 24*2 fit-check keeps
the full text-button defaults active for everything wider than ~48px.
Why no cornerRadius default in this branch: avatar-style frames
typically supply `cornerRadius: width/2` themselves to get a circle.
The 8px text-button default would silently override it.
Note: applyDefaults only fills missing properties, so this only
affects nodes the model didn't explicitly stamp padding on. Models
that DO emit padding stay untouched (correct — the model is the
source of truth when it commits to a value).