Commit graph

9 commits

Author SHA1 Message Date
Dan Porat
3395d2c855
feat(daemon): implement fal.ai renderer for image + video generation (#1606)
* feat(daemon): implement fal.ai renderer for image + video generation

Adds renderFalImage and renderFalVideo backed by the fal queue API
(queue.fal.run). Any fal-ai/* model path can be used directly without
a catalog entry, enabling the full fal model library without code
changes. Catalogued shortcuts are mapped via FAL_ENDPOINTS to their
fal-ai/* paths; OD_FAL_MAX_POLL_MS controls the poll ceiling.

Expands the fal model catalog with flux-pro-ultra, flux-dev-fal,
flux-schnell-fal, ideogram-v3-fal, recraft-v3-fal (images) and
veo-3-fal, veo-2-fal, wan-2.1-t2v, wan-2.1-i2v, seedance-1-pro-fal,
kling-2.1-t2v-fal (video). Marks fal provider as integrated: true in
both daemon and web model registries.

* fix(daemon): address fal renderer review comments

- Correct Wan 2.1 endpoints: wan-video/v2.1/* → fal-ai/wan-t2v / fal-ai/wan-i2v
- Correct Kling 2.1 t2v endpoint: .../pro/... → .../master/text-to-video
- Add FAL_IMAGE_USES_ASPECT_RATIO: flux-pro-ultra sends aspect_ratio not image_size
- Add FAL_VIDEO_NO_DURATION: Wan models reject the duration field
- Add FAL_VIDEO_STRING_DURATION: Veo expects duration as "5s" not 5
- Fix falQueueBase() to use anchored regex replace, avoiding mangled
  custom base URLs
- Do not wrap payload under input — raw fal queue HTTP API expects flat
  body; the input wrapper is an SDK abstraction only (confirmed by 422
  validation error from fal showing prompt missing at body.prompt)

* fix(daemon): correct fal queue protocol comment (flat body, no SDK input wrapper)

* fix(daemon): clamp Veo duration to valid fal buckets (4s/6s/8s)

* fix(daemon): report effective fal Veo duration in providerNote (with snap warning)

* fix(daemon): reduce image generation latency from 4m37s to ~73s

Five layered fixes targeting the overhead that padded a ~10s fal API call
into a 4m37s user-facing wait:

1. Skip DISCOVERY_AND_PHILOSOPHY for media surfaces (image/video/audio).
   The ~3000-token HTML-artifact discovery layer is irrelevant for media
   generation and forced the agent to parse and override all its rules
   before dispatching. Removes it from the system prompt entirely for
   these surfaces; MEDIA_GENERATION_CONTRACT is the sole authority.

2. Broaden the wait-loop contract to cover ALL slow models, not just
   "Volcengine i2v / hyperframes-html". Any model whose generation
   exceeds 25s — including fal flux-pro-ultra, Veo, Sora — returns exit 2
   from od media generate. The contract now makes this universal and
   provides a python3-based bash pattern (jq is not guaranteed to be
   installed on all agent runtimes).

3. Increase od media wait polling budget from 25s to 120s. od media
   generate keeps its 25s budget for fast feedback; od media wait is
   purpose-built to sit and poll, so it can safely use the full 2-minute
   bash-tool window. Reduces re-entries for a 3-minute generation from
   ~7 to ~2.

4. First fal poll is now immediate instead of always sleeping 3s before
   the first status check. Saves 3s for all fal jobs.

5. Project metadata no longer emits "(unknown — ask)" for imageModel and
   aspectRatio when unset. Emits the actual defaults (gpt-image-2,
   aspect-ratio scene heuristic) so the agent can dispatch without
   extended reasoning about model selection. Also adds dispatch-immediately
   defaults and a brief-reply rule (2–3 sentences max after generation).

Measured end-to-end on the exact problem prompt before/after:
  Before: 4m37s (discovery form + 7x LLM re-entries + jq failure)
  After:  ~73s   (single bash loop, no question turn, image delivered)

* feat(daemon): inject media dispatch hint for non-media project surfaces

Agents running inside prototype, deck, and other non-image/video/audio
projects previously had no knowledge of `od media generate`, so when
asked to create an image with fal they would try to call provider REST
APIs directly and ask the user for API keys — even though the daemon
already holds credentials in .od/media-config.json.

Add MEDIA_DISPATCH_HINT to composeSystemPrompt for all non-media
surfaces. The hint tells the agent to always route media generation
through the daemon dispatcher, and explicitly forbids prompting for
API keys.

Verified end-to-end: a prototype project generates a 952 KB image via
flux-pro-ultra in ~52s with no key errors.

* fix(daemon): prevent agent from converting bash env vars to PowerShell syntax

MEDIA_DISPATCH_HINT now explicitly labels the shell as POSIX bash and
shows the correct $VAR form side-by-side with a warning NOT to use
PowerShell $env:VAR. Without this, claude-sonnet running on a Windows
host converts the example to PowerShell syntax (`& $env:OD_NODE_BIN`)
which then fails at the bash executor with 'syntax error near unexpected
token &'.

* fix(daemon): add generate→wait loop to MEDIA_DISPATCH_HINT for slow models

MEDIA_DISPATCH_HINT previously showed only a bare  call.
flux-pro-ultra and other slow models always exit 2 after ~25s — without
the wait loop the agent would treat exit 2 as a failure and report an
error to the user.

Replace the single-command example with the canonical generate→wait loop
(matching media-contract.ts), add an explicit note that exit 2 means
'keep polling', and reinforce the POSIX bash / no-PowerShell rule
directly inside the code block.

* fix(daemon): allow fal-ai/* passthrough in media-agent contract

The media-agent prompt instructed the agent to warn and substitute the
default model for any ID not in the catalogue. This blocked the custom
fal-ai/* passthrough path the daemon already supports, so users could
not reach uncatalogued fal models from the normal chat flow.

Carve out the fal-ai/* exception so the agent passes those IDs through
directly instead of warning or substituting.

* fix(daemon): align MEDIA_DISPATCH_HINT with exit-0 generate contract

media generate now always exits 0 (handoff included). The non-media
agent hint still checked ec==2 to decide whether to keep polling, so
slow fal models (flux-pro-ultra, veo-3-fal) would stop after printing
the handoff JSON instead of entering the wait loop.

- generate error check: drop the ec!=2 exception (exits 0 always)
- while loop: drive on taskId presence, not ec==2; stop on ec==0/5
- footer: remove --surface inference claim; CLI requires it explicitly

* fix(guard): add test-fal-webui.ts to e2e scripts allowlist

CI failed: guard flagged e2e/scripts/test-fal-webui.ts as an
unapproved package-owned entrypoint. Add it to allowedE2eScripts.

* fix(daemon): update prompt test expectations to match exit-0 handoff wording

The two stale assertions checked for the old generate-exits-2 copy which
no longer exists in the contract. Update them to match the current
always-exits-0 wording.

* fix(daemon): move skipDiscoveryBrief override before discovery block

* chore(e2e): remove ad-hoc fal webui test script

The script was a one-time developer helper used to manually validate fal
image generation through the live UI. It relied on a real fal API key and
hardcoded local port, so it cannot participate in the e2e package's
fixture/reporting/CI conventions. Removing it per reviewer feedback.

- Delete e2e/scripts/test-fal-webui.ts
- Remove its guard.ts allowlist entry
- Gitignore the file and its screenshots to prevent accidental re-addition

* chore: remove accidental local scratch files from branch

Remove bash.exe.stackdump (MSYS crash dump) and fix_loop.py (one-off
local rewrite helper) — neither is a repo-owned source artifact.

* fix(prompts): document fal-ai/* passthrough in non-media dispatch hint

Prototype/deck agents now know arbitrary fal-ai/* model ids are valid
--model values and should be forwarded as-is, mirroring the exception
already present in media-contract.ts. Adds a prompt regression test.

* fix(daemon): use renderMediaGenerationContract(mediaExecution) for media surfaces

---------

Co-authored-by: mrcfps <mrc@powerformer.com>
2026-05-31 04:44:44 +00:00
Denis Redozubov
c847ace554
Add run-scoped media execution policy (#3106)
* feat(contracts): add run media execution policy

* feat(daemon): enforce run media execution policy

* test(daemon): cover media execution policy gates
2026-05-28 09:19:40 +00:00
Eli
18b947c25f
[codex] Land design system GitHub intake handoff (#2187)
* Add Claude-style design system workflow

* Merge design system workflow into main

* Restore design system workflow UI styles

* Fix design system setup scrolling

* Fix design system setup connector button

* Preserve connector auth link after popup block

* Simplify connected GitHub setup state

* Open generated design system workspace project

* Summarize design system auto prompt in chat

* Add bounded GitHub connector design intake

* Prefer path-scoped GitHub intake tools

* Restore branch GitHub design context intake

* Restore design system review workspace

* Restore design system manager tab

* Let design system workflow routes own details

* Open editable design systems as projects

* Restore design system workspace coverage

* Fix bounded GitHub connector intake

* Hide design system review while generating

* Suppress design system generation questions

* Constrain GitHub design intake to bounded command

* Tolerate oversized GitHub metadata during intake

* Rebuild daemon CLI when sources change

* Fallback when GitHub connector snapshots are rate limited

* Allow GitHub intake without Composio

* Use native GitHub auth for design intake

* Remove design system review group heading

* Improve design system extraction evidence

* Align design system scaffold with Claude output

* Add evidence inventory for design system intake

* Add local design system evidence intake

* Add design system package audit gate

* Allow auditing Claude Design reference packages

* Audit design system package content quality

* Migrate legacy design system artifacts

* Clean migrated design system artifacts

* Require modular design system UI kits

* Reject thin design system UI kits

* Prioritize core design evidence intake

* Require role-based design system UI kits

* Clean stale design system manifest references

* Require representative preserved design assets

* Warn on generic design system visuals

* Enforce design system quality warnings

* Audit connected design system UI kits

* Require mounted design system UI kits

* Require composed design system app shells

* Require runnable JSX design system kits

* Require browser globals for design system components

* Infer design system names from source URLs

* Require source examples in design system packages

* Bind preserved fonts in design system tokens

* Require skill frontmatter in design system packages

* Preserve build icons in design system packages

* Require real assets in brand previews

* Require substantive source examples

* Require product overview in design system README

* Require reusable UI kit README

* Require reusable design system skill docs

* Seed Claude-style UI kit entry contract

* Preserve runtime build assets in design packages

* Audit design system packages after generation

* Audit design system first-run output

* Audit source-backed preview cards

* Align design system UI kit scaffolds

* Materialize design evidence package artifacts

* Show project chat during design system setup

* Hand off design system setup to project chat

* Auto-repair design system audit failures

* Harden design system evidence preservation

* Tighten design system package guidance

* Add targeted design system repair guidance

* Bound design system audit auto repair

* Use connector statuses in design system setup

* Audit design system preview manifests

* Require README preview manifests for design systems

* Fix design system GitHub intake handoff

* Fix daemon prompt CI assertions
2026-05-19 14:30:17 +08:00
Yuhao Chen
0e61313347
fix(prompts): stabilize discovery brand answers (#1861) 2026-05-16 15:50:52 +08:00
Quang Do
3d0e708720
fix(daemon): treat media generate handoff as success (#1715) 2026-05-15 14:11:40 +08:00
Marc Chan
055e55abd8
Add batch design system testing (#1515)
* feat: add batch design system testing

* fix: use daemon default agent for batch tests

* fix: honor batch project prompt flags

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix: persist batch run output

* fix: honor dry-run before daemon resolution

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix: persist batch assistant run ids

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix: cancel timed-out batch runs

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)
2026-05-14 14:19:32 +08:00
kami
4f76e836ae
feat(audio): add ElevenLabs audio support (#1384)
* docs: add ElevenLabs audio support design

* docs: add ElevenLabs audio implementation plan

* feat(daemon): add ElevenLabs speech renderer

* feat(daemon): add ElevenLabs sound effects renderer

* fix(daemon): preserve ElevenLabs sfx durations

* feat(web): expose ElevenLabs media providers

* feat(daemon): document ElevenLabs audio contract

* feat(audio): add ElevenLabs voice selection

* chore: ignore superpowers scratch docs

* fix(daemon): cache ElevenLabs voice options

* fix(audio): expand ElevenLabs voice and SFX selection

* fix(audio): align ElevenLabs SFX controls

* fix(audio): tighten ElevenLabs SFX prompt budget

* fix(audio): preflight ElevenLabs SFX prompt length

* fix(audio): surface ElevenLabs lookup failures

* fix(audio): sanitize ElevenLabs prompt errors
2026-05-13 15:53:41 +08:00
Deheng Huang
09b78c2f9b
feat(daemon): let Codex image projects use built-in imagegen (#622)
* feat(daemon): let Codex image projects avoid API-key setup

Codex has a built-in image generation path available inside the agent runtime, while the generic media dispatcher still routes gpt-image models through the daemon OpenAI provider. Pass the active agent id into prompt composition so Codex-only gpt-image projects can use built-in imagegen first without changing non-Codex media behavior.

Constraint: Existing media contract remains the default path for non-Codex agents and explicit provider fallback
Rejected: Add a nested daemon Codex media provider | heavier auth, streaming, timeout, cancellation, and output parsing surface for this parity fix
Confidence: high
Scope-risk: narrow
Directive: Keep this override after the media contract so it can intentionally supersede dispatcher-only wording for Codex gpt-image projects
Tested: pnpm --dir apps/daemon exec vitest run -c vitest.config.ts tests/system-prompt-template.test.ts
Tested: pnpm --filter @open-design/daemon typecheck
Tested: pnpm guard
Tested: pnpm typecheck
Not-tested: Live Codex image generation inside the Open Design UI

* fix(daemon): harden Codex imagegen prompt routing

PR review found the Codex override could be superseded by the web-supplied media contract, trusted unvalidated image model metadata, and assumed generated image paths outside the workspace were readable.

This keeps the override daemon-owned, appends it last in the live prompt, validates against registered gpt-image model IDs, allowlists only Codex's generated_images folder, and tightens copy-failure instructions.

Constraint: The web contracts composer still emits the generic media contract without agent identity.

Rejected: Mirror Codex-specific prompt logic into contracts/web | duplicates daemon model registry and still leaves final ordering fragile.

Confidence: high

Scope-risk: narrow

Directive: Keep Codex imagegen override appended after client systemPrompt so it remains the final media instruction for Codex gpt-image projects.

Tested: pnpm --dir apps/daemon exec vitest run -c vitest.config.ts tests/system-prompt-template.test.ts tests/agents.test.ts tests/chat-route.test.ts

Tested: pnpm --filter @open-design/daemon typecheck

Tested: pnpm guard

Tested: pnpm typecheck

Not-tested: Live Codex image generation inside the Open Design UI

* fix(daemon): keep Codex add-dir writable scope narrow

PR review found Codex --add-dir grants writable workspace access, so passing skill, design-system, and linked reference directories through the same chat allowlist broke their documented read-only boundary.

This routes chat extra directories by active agent: Codex receives only the validated generated_images output directory needed for built-in imagegen, while non-Codex adapters keep the existing resource and linked-directory read access behavior.

Constraint: Codex CLI treats --add-dir as writable sandbox expansion.

Constraint: The daemon still stages active skill files into the project cwd as Codex's read-safe path.

Rejected: Keep one shared extraAllowedDirs list for all agents | grants Codex write access to read-only resources.

Confidence: high

Scope-risk: narrow

Directive: Do not add read-only resource/reference directories to Codex --add-dir unless Codex gains a read-only allowlist flag.

Tested: git diff --check -- apps/daemon/src/server.ts apps/daemon/tests/chat-route.test.ts

Tested: pnpm --filter @open-design/daemon exec vitest run tests/chat-route.test.ts

Tested: pnpm --filter @open-design/daemon typecheck

Tested: pnpm guard

Tested: pnpm typecheck

Not-tested: Live Codex image generation inside the Open Design UI

* fix(daemon): validate Codex imagegen add-dir grants

PR review found the generated_images grant still trusted symlinked paths and rendered the Codex override before proving the sandbox grant would be present.

This validates the generated_images directory before prompt assembly, rejects final-component symlinks and protected-root canonical escapes, passes Codex the canonical grant path, and only appends the Codex imagegen override when that same path is in extraAllowedDirs.

Constraint: Codex --add-dir grants writable workspace access, so path aliases into read-only resource roots must be rejected.

Rejected: Keep returning the nominal CODEX_HOME path after validation | leaves Codex operating through a symlink alias instead of the audited grant target.

Confidence: high

Scope-risk: narrow

Directive: Keep Codex imagegen prompt rendering downstream of generated_images validation and grant resolution.

Tested: git diff --check -- apps/daemon/src/server.ts apps/daemon/tests/chat-route.test.ts

Tested: pnpm --filter @open-design/daemon exec vitest run -c vitest.config.ts tests/chat-route.test.ts

Tested: pnpm --filter @open-design/daemon exec vitest run -c vitest.config.ts tests/agents.test.ts tests/chat-route.test.ts

Tested: pnpm --filter @open-design/daemon typecheck

Tested: pnpm guard

Tested: pnpm typecheck

Not-tested: Live Codex image generation inside the Open Design UI
2026-05-06 18:28:16 +08:00
Tom Huang
513bd4edea
feat(web): pick prompt templates (not design systems) for image/video projects (#192)
* feat(web): pick prompt templates (not design systems) for image/video projects

The New Project panel now shows the curated prompt-template gallery in the
image and video tabs instead of the Design System picker. Design systems
only make sense for prototypes, slide decks, templates, and the freeform
"other" canvas — they don't map onto image/video generation.

The picked template's body is editable in-line ("optimize" affordance) and
the (possibly tuned) prompt is snapshotted into ProjectMetadata as
`promptTemplate`. The system-prompt composer surfaces it on every turn as
a stylistic + structural reference for the agent — same shape we already
use for the saved-template flow.

Also rename the right-side gallery tabs from "Image prompts" / "Video
prompts" to "Image templates" / "Video templates" across all locales.

Made-with: Cursor

* fix(prompt-templates): address review — escape fences, error UI, empty hint, tests

Apply the actionable feedback from the code review:

- Security: escape triple-backticks in `metadata.promptTemplate.prompt`
  before interpolating it into the system prompt's fenced block. A user
  who pasted ``` into the editable template body could otherwise close
  the fence and inject free-form instructions for the agent. Apply the
  same fix in both the contracts composer and the daemon mirror.
- UX: surface fetchPromptTemplate failures via an inline error banner
  outside the popover, with a one-click retry button bound to the last
  failed pick. Previously the error toast lived inside the popover and
  vanished as soon as the popover closed.
- UX: show a subtle hint below the prompt textarea when the body is
  empty, so users who clear it on purpose understand the agent will
  have no template reference.
- Defense in depth: gate the `**referenceTemplate**:` metadata bullet
  on a non-empty prompt body in both composers, so a stale title can
  never appear in the system prompt without the body it claims to
  reference.
- Tests: add tests/system-prompt-template.test.ts covering the happy
  path (image + video), the backtick escape, the truncation cap, the
  empty-prompt skip, the non-media kinds, and the missing-source path.
- i18n: add `promptTemplates.retry` and `newproj.promptTemplateBodyEmpty`
  across all 9 locales; backfill the prompt-template picker keys for
  ja, es-ES, de which landed on main after this branch.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-01 23:31:31 +08:00