mirror of https://github.com/nexu-io/open-design.git synced 2026-06-01 03:14:35 +07:00

feat: agent-callable research command and /search (#615 )

* feat: pre-generation research (Tavily) for grounded generation

Adds an optional pre-generation research step so the agent can produce
slides / prototypes / decks grounded in real sources instead of guessing.

User flow:
  1. Settings -> Tavily Search -> paste API key (or set TAVILY_API_KEY).
  2. Click the new Research button in the chat composer.
  3. On send, the daemon runs a Tavily search, prepends the findings
     as a <research_context> block ahead of the system prompt, and
     spawns the agent. Research progress shows up as status pills in
     the chat stream; the agent cites sources inline as [1]/[2]/...

Phase 1 surface:
  - Single provider (Tavily), single depth ('shallow'), no LLM
    synthesis pass (Tavily's `answer` is the summary).
  - Composer toggle only; no popover / depth picker yet.
  - Reuses the existing `status` SSE agent payload + StatusPill UI
    so no new event variants or renderer code are needed.

Layers touched:
  - contracts: ResearchOptions / Source / Findings DTOs;
    ChatRequest.research; export from index.
  - daemon: apps/daemon/src/research/{index,tavily}.ts orchestrator
    + provider; tavily added to MEDIA_PROVIDERS and ENV_KEYS; hook
    in startChatRun before prompt assembly.
  - web: ChatComposer toggle + ChatSendMeta; threaded through
    ChatPane / ProjectView / streamViaDaemon into ChatRequest.

Side fix (required to land the feature, but useful on its own):
  contracts internal relative imports lacked the `.js` suffix that
  NodeNext module resolution requires. This was already breaking
  `pnpm --filter @open-design/daemon typecheck` on main; without the
  fix, none of the new research types were visible to the daemon.
  All internal contracts imports now carry `.js`.

Spec: specs/current/research-feature.md (phases 2-4 outlined for
follow-up: composer popover, multi-provider, deep recursion, example
skills with research_recommends).

Verified:
  - pnpm --filter @open-design/contracts typecheck/test
  - pnpm --filter @open-design/daemon typecheck (the chokidar
    project-watchers test is a pre-existing flake, unrelated)
  - pnpm --filter @open-design/web typecheck
  - node scripts/verify-media-models.mjs

* fix(daemon): clamp Tavily max_results to 20

Tavily's /search endpoint requires `max_results` in [0, 20]; sending a
larger value (e.g. when `research.depth: "deep"` resolves to 30) returns
400 and `runResearch` silently falls back to no-research. Clamp at the
provider boundary so Phase 2 depth tiers above 20 still produce results
instead of failing the request.

Generated-By: looper 0.6.1 (runner=fixer, agent=claude-code)

* Remove stale research merge leftovers

* Add agent-callable research search

* Fix Indonesian locale typecheck

* Fix research command invocation edge cases

* Harden slash search prompt expansion

* Honor research source caps in command contract

* Require search reports in design files

* Add research data provider settings

* Wire web research provider fallback order

* Update research provider fallback wording

* Revert "Update research provider fallback wording"

This reverts commit 86fb6001e3.

* Revert "Wire web research provider fallback order"

This reverts commit 4c9e16036b.

* Revert "Add research data provider settings"

This reverts commit 23630d1746.

* Add Dexter and Last30Days research skills

* Add DCF and Last30Days OD skills

* Add Last30Days and Dexter skills

* Resolve research review threads

---------

Co-authored-by: a1chzt <chizblank@gmail.com>

2026-05-08 10:33:44 +08:00

17 KiB

Raw Permalink Blame History

Laws of UX craft rules

Universal cognitive, perceptual, and behavioral heuristics that decide what a UI composes — how many pricing tiers fit on a screen, where a primary action anchors in scanning order, when a progress indicator earns its place, why a settings list needs grouping. The active DESIGN.md decides brand visual language; the existing craft files decide rendering rules (color, typography, motion, states, ARIA, RTL, forms); this file decides composition rules grounded in named research.

Distilled from primary sources: Hick (1952) + Hyman (1953), Miller (1956) for chunking / 7±2 channel capacity, Cowan (2001) for the modern ~4 working-memory bound, Fitts (1954), Wertheimer (1923) for proximity / similarity / Prägnanz, Palmer (1992) for Common Region, Palmer & Rock (1994) for Uniform Connectedness, Kahneman / Fredrickson / Schreiber / Redelmeier (1993) for Peak-End, Zeigarnik (1927), Csíkszentmihályi (1975), Hull (1932), von Restorff (1933), Broadbent (1958), Sweller (1988), Postel (RFC 760, 1980), Carroll & Rosson (1987), Tversky & Kahneman (1974) for Anchoring, Kurosu & Kashimura (1995), Iyengar & Lepper (2000), Toffler (1970), Pareto (c.1906) / Juran (Quality Control Handbook, 1951), Ebbinghaus (1885), Ockham (14th c.), Tesler at Apple (1980s), Nielsen (2000), Norman POET (1988), Parkinson (1955).

Prior art and scope

Existing public catalogs of UX heuristics (Yablonski's lawsofux.com, NN/g's 10 usability heuristics, Material 3 motion + interaction guidance, Apple HIG, Baymard Institute checkout research) inventory the laws but rarely tie each one to a concrete code-gen directive. This file does the translation: every entry ends with one actionable move for an HTML / Tailwind / React-emitting agent. Sibling craft files (accessibility-baseline.md, state-coverage.md, typography.md, anti-ai-slop.md, color.md, animation-discipline.md, form-validation.md) own the auto-checked rules; this file names the underlying law and surfaces the folklore. Out of scope: Weber-Fechner psychophysics and Signal Detection Theory — both apply to UI but the prompt-emission directives are too narrow to earn a slot here. Add later if a skill needs them.

The rules below are guidance, not auto-checked. Reviewers and the agent apply them; the linter does not. Where a law has a sibling rule already auto-checked elsewhere — touch-target floor in accessibility-baseline.md, the 300 ms / 2 s / 10 s / 30 s / 60 s loading thresholds in state-coverage.md, ALL CAPS letter-spacing in typography.md, the indigo / gradient / emoji-icon list in anti-ai-slop.md — the entry below cross-references rather than duplicates.

Perception and visual grouping

Five Gestalt laws plus three attention-and-recognition laws govern how the eye groups elements before the brain reads them.

Law of Proximity (Wertheimer, 1923). Objects near each other read as a group. Cheapest grouping signal — cheaper than borders or shared color. Apply variable vertical rhythm: 8–12 px within a group, 32–48 px between groups. Uniform spacing reads as nothing being grouped.
Law of Similarity (Wertheimer, 1923). Visually similar elements read as a group. Equivalent affordances must share treatment — every list row identical class set, every secondary button identical, every destructive action identical. Visible deviation is reserved for the one item meant to draw attention (the recommended pricing tier, the selected nav item).
Law of Common Region (Palmer, 1992). A shared bounded area binds enclosed elements. Use enclosure when proximity is not enough — and reserve it. Concrete numbers: padding ≥16 px inside the region, distinct surface (border + tinted background, or card chrome at ≥1 px hairline). A page where every section is bordered destroys the signal.
Law of Prägnanz / Good Figure (Wertheimer, 1923). The eye resolves complex layouts into the simplest underlying form. Designs that align with a clear underlying grid (12-column, F-pattern, 4-quadrant) feel inevitable; ornate breaks that add nothing semantic feel arbitrary.
Law of Uniform Connectedness (Palmer & Rock, 1994). The strongest grouping signal in the Gestalt hierarchy: connected lines, shared toolbars, or bracketing containers tie items together more strongly than proximity or similarity. Use for wizard steps, comparison sets, and explicit navigation flows.
Selective Attention (Broadbent, Perception and Communication, 1958). Cognitive bandwidth is finite. Users filter aggressively and ignore anything that looks irrelevant to their goal — banner blindness comes from this. Reserve the strongest visual contrast for the single goal-relevant action; let supporting content recede in weight.
Von Restorff Effect (von Restorff, 1933). The item that differs from a uniform field is the one most likely to be remembered. Make the recommended pricing tier, the active nav item, the warning state visually distinct. Pair contrast with a non-color signal (icon, text label, position) — accessibility-baseline.md rules out color-alone signaling.
Aesthetic-Usability Effect (Kurosu & Kashimura, Hitachi Design Center, 1995). Visual polish biases perceived usability. Refined typography, generous whitespace, and a calm palette earn the benefit of the doubt for minor friction. Never substitutes for measurable usability or for state-coverage.md's required-states rule.

Decision-making

Six laws govern how fast and how well users decide when an interface offers a choice.

Hick's Law (Hick, 1952; Hyman, 1953 replication). Decision time grows roughly log(n+1) with the number of equivalent options. Cap any single decision-screen to 3–5 visible primary options; collapse the rest behind a "More" / progressive disclosure pattern; visually distinguish the recommended choice. Aggressive truncation that hides the path forward is the opposite failure mode — surface the full option set, just don't render every option at the same visual weight.
Choice Overload (Iyengar & Lepper, Journal of Personality and Social Psychology, 2000; framing dates to Toffler, Future Shock, 1970). Too many roughly-equivalent options stall or abandon the decision. Pricing pages: 3–4 tiers, exactly one marked recommended. Product grids: 6–9 hero cards above the fold. Settings panels: ≤5 named groups. Never emit a flat wall of equivalents.
Anchoring (Tversky & Kahneman, Science 185:1124–1131, 1974). The first number a user sees re-weights every subsequent number. Place the recommended pricing tier where it anchors the comparison; render yearly-billing savings as concrete dollar deltas, not just percentage badges; pre-select the safer default in radio groups. Visual weight matches intended decision weight.
Pareto Principle / 80-20 (Pareto, c.1906; Juran, Quality Control Handbook, 1951 — popularized the management-application framing). A small share of features drives most of the value. Identify the 2–3 actions that drive the dominant journey for the target persona; emphasize those visually; demote the long tail to overflow menus, footer surfaces, or settings.
Tesler's Law / Conservation of Complexity (Tesler, Apple, 1980s). Every product has an irreducible amount of complexity. The design choice is where it lives — engineering team, interface, user — not whether to eliminate it. When complexity reaches the user, surface contextual guidance (tooltips, smart defaults, inline empty-state coaching, progressive disclosure) at the exact step where it surfaces. Hiding it is not the same as removing it.
Occam's Razor (Ockham, 14th c.). Among options that explain the data equally well, prefer the one with the fewest assumptions. Specify a minimal element inventory; forbid decorative chrome that doesn't serve a stated user task. The law constrains assumptions, not feature count — a "minimum viable" framing misreads it.

Memory and learning

Five laws cover how working memory handles information density and what the user retains afterward.

Miller's Law and Chunking (Miller, The Magical Number Seven, Plus or Minus Two, Psychological Review, 1956 — channel capacity / 7±2 and chunking; Cowan, Behavioral and Brain Sciences 24:1, 2001 for the modern ~4-item working-memory bound). Working memory holds about four items reliably and up to seven for short-term recall. Each slot can hold a larger familiar unit, constrained by the user's domain knowledge — chunking does not let you pack arbitrary content into a single slot. Often misread as a rule about menu length; Miller's paper is about chunks. Group related fields with clear section headings, dividers, or card containers. A settings page with sections "Account / Notifications / Privacy / Billing / Danger zone" beats one flat list of 30 toggles.
Working Memory (Baddeley & Hitch, 1974; lineage to Atkinson & Shiffrin, 1968). Items decay in seconds without rehearsal. Recognition beats recall: persisting prior context across screens, marking visited elements, and surfacing comparison views beats forcing the user to memorize. On dashboards specifically: sticky filter chips, last-N selections persisted, breadcrumbs that include applied filters.
Serial Position Effect (Ebbinghaus, Über das Gedächtnis, 1885). Recall favors the extremes — primacy at the start, recency at the end — while middle items fade. Anchor the most important nav items at the leftmost and rightmost positions of a horizontal menu; cluster utilities in the middle.
Peak-End Rule (Kahneman, Fredrickson, Schreiber, Redelmeier, Psychological Science, 1993). Memory of an experience is dominated by the emotional peak and the ending, not the average. Stage a high-effort celebratory success state; let intermediate steps stay calm. Mediocre middles matter less than a strong close. The peak belongs at the end of a flow, not as arbitrary mid-flow motion — animation-discipline.md rejects motion that performs (rather than confirms) a state change.
Zeigarnik Effect (Zeigarnik, Über das Behalten erledigter und unerledigter Handlungen, 1927). Uncompleted tasks create cognitive tension that pulls the user back. Visible progress ("3 of 5 steps", greyed-out next sections) converts that tension into completion pressure. Reserve for genuinely beneficial flows like onboarding; applying the same lever to streaks, daily-quest counters, or notification- reduction nags is a dark pattern.

Interaction and motor

Five laws cover how fast and how accurately users can act on the UI.

Fitts's Law (Fitts, Journal of Experimental Psychology, 1954). Time to acquire a target depends on its distance and size — bigger and closer is faster. Spacing between adjacent hit zones matters as much as size. Pair with accessibility-baseline.md's 24 × 24 CSS px AA touch-target floor; on mobile, place high-frequency controls in the natural thumb arc.
Doherty Threshold (Doherty & Thadani, IBM Systems Journal, 1982). Sub-second feedback keeps users in flow; latency above ~1 s breaks attention. The implementable directive lives in state-coverage.md's loading-threshold table (no indicator under 300 ms; skeleton 300 ms – 2 s; labelled spinner 2 – 10 s; determinate bar with cancel 10 – 60 s; stop and offer error/retry past 60 s). This entry exists to name the underlying law and flag its folklore (the 400 ms number doesn't appear in the 1982 paper — see animation-discipline.md for the 400 ms folklore trace).
Flow (Csíkszentmihályi, Beyond Boredom and Anxiety, 1975). Flow sits in the balance between challenge and skill — too hard breeds frustration, too easy breeds boredom. Continuous feedback and a clear sense of control keep the user inside the state. System friction and latency are the fastest ways to break it.
Goal-Gradient Effect (Hull, Psychological Review, 1932; Kivetz, Urminsky, Zheng, 2006 for the punch-card replication). Motivation to finish rises as the goal gets closer. Multi-step flows render with a prominent progress indicator that reflects real endowed progress — show completed prerequisites when they truly exist (saved profile, imported team, prior survey answer). When no real prerequisite exists, render the current step honestly as 1 of N with the empty/current-step state clearly marked. Hull's hypothesis is descriptive; treating it as license for fabricated progress, streak dark patterns, or loyalty-program quota inflation is a misread.
Postel's Law / Robustness Principle (Postel, RFC 760, 1980). "Be liberal in what you accept, conservative in what you send." Take input in whatever shape users naturally give it (phone numbers with or without dashes, dates in mixed formats, percentages with or without %); normalize internally to a canonical form; emit one consistent format on output. The error-timing and ARIA-wiring half lives in form-validation.md; the input-tolerance half is the directive above. RFC 9413 (Thomson, IAB, 2023; built on the earlier draft-iab-protocol-maintenance) retracts the maxim for protocol design citing security surface; the UX-input application stands.

Behavior and expectation

Five laws cover what users predict and how that prediction interacts with the rendered surface.

Jakob's Law (Nielsen, Nielsen Norman Group, 2000). Users spend most of their time on other sites and expect yours to work the same. Reuse category convention — nav placement, cart icon, settings gear, primary CTA in the upper right of a SaaS landing — so the user spends zero cycles relearning interaction grammar. Novelty must earn its keep against the convention's ROI; "innovate everywhere" is the opposite failure mode.
Mental Model (Craik, The Nature of Explanation, 1943; Norman, POET / The Psychology of Everyday Things, 1988). Every user arrives with a prior built from competitor products and the physical world. When the prediction holds, the product feels intuitive; when it breaks, friction shows up as confusion, not curiosity. When the brief names a reference product, anchor explicitly — capture the reference in the prompt and the agent inherits a transferable interaction grammar.
Paradox of the Active User (Carroll & Rosson, in Interfacing Thought: Cognitive Aspects of Human-Computer Interaction, MIT Press, 1987, pp. 80–111). Users skip the manual and start using the software immediately, even when reading it would speed them up. Bake guidance into the surface itself — empty-state coaching, inline tooltips, contextual hints — at the action point.
Parkinson's Law (Parkinson, The Economist, 1955). Work expands to fill the time allotted to it. Loose interfaces let users dawdle; cut friction and pre-fill what you can — autofill, smart defaults, saved state — so a checkout finishes faster than the user expected. Beating anticipated duration becomes the felt win.
Cognitive Load (Sweller, Cognitive Science, 1988). Total mental effort splits into intrinsic (the task's inherent difficulty) and extraneous (poor layout, jargon, inconsistent patterns, visual noise). Designers can't reduce intrinsic load; they own extraneous fully. The visual-restraint directives (single accent in color.md, three-weight typography rhythm in typography.md, P0 anti-default list in anti-ai-slop.md) already constrain extraneous load; this entry exists to name the cognitive cost the sibling rules reduce.

Common mistakes (lint these)

Folklore corrections that don't survive a primary-source check, in the same vein as the busts already documented in animation-discipline.md and accessibility-baseline.md. The first three are attribution corrections (year / venue / institution) the body entries already applied — restated here so a reviewer reading just this section sees them. The remainder are folklore not addressed in the body.

"Anchoring effect = Tversky & Kahneman 1972." Wrong year. The Science paper introducing the anchoring framing is 1974; the 1972 paper is "Subjective probability: A judgment of representativeness" (different work).
"Tesler's Law was developed at Xerox PARC." Tesler left PARC for Apple in 1980; the Conservation-of-Complexity formulation traces to his Apple years.
"Paradox of the Active User was a CACM article." It's a chapter in Carroll's Interfacing Thought (MIT Press, 1987), not CACM.
"Selective Attention = solved by red dots and badges." Repeated attention-grabbers train banner blindness. Reserve the strongest contrast for one goal-relevant action per surface.
"Fitts's Law alone is enough for touch targets." Fitts gives the speed-accuracy tradeoff. WCAG 2.2 SC 2.5.8 sets the AA floor at 24 × 24 CSS px (accessibility-baseline.md); iOS HIG suggests 44 × 44 pt; Material 3 suggests 48 × 48 dp. Fitts plus the platform floor — never just Fitts.

17 KiB Raw Permalink Blame History Unescape Escape