Commit graph

1 commit

Author: lefarcen
SHA1: c2b3d737f2
Message:
fix: make max_tokens configurable (closes #29) (#78)
* fix(web,daemon): make max_tokens configurable (closes #29)

BYOK users on custom Anthropic-compatible providers (e.g. Xiaomi MiMo)
hit the hardcoded 8192 cap and saw artifacts truncated mid-stream.

- AppConfig.maxTokens with Settings input (EN/CN + 8 other locales)
- ProxyStreamRequest.maxTokens contract field
- anthropic, anthropic-compatible, and openai-compatible providers all
  forward cfg.maxTokens
- /api/proxy/anthropic/stream and /api/proxy/stream payloads honor it,
  defaulting to 8192 when unset so prior clients are unaffected

Original sketch by @mashu in #78 (50a9d14); rebased onto the apps/web
layout and extended to the proxy paths used when baseUrl is set, which
is the route the reporter of #29 actually goes through.
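
The defaulting rule for the proxy payloads can be sketched roughly as
follows. This is only an illustration: the request shape beyond the
`maxTokens` field and the `resolveMaxTokens` helper name are
assumptions, not the daemon's actual code.

```typescript
// Hypothetical request shape; only the maxTokens field name is taken
// from the commit message above.
interface ProxyStreamRequest {
  model: string;
  maxTokens?: number;
}

// The previously hardcoded cap, kept as the default for older clients.
const DEFAULT_MAX_TOKENS = 8192;

// Hypothetical helper: honor a positive numeric maxTokens, otherwise
// fall back to the old default so prior clients see no change.
function resolveMaxTokens(req: ProxyStreamRequest): number {
  const value = req.maxTokens;
  return typeof value === "number" && value > 0 ? value : DEFAULT_MAX_TOKENS;
}
```

A client that never sends `maxTokens` keeps getting 8192, while a BYOK
client can raise the limit per request.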

* feat(web): per-model max_tokens defaults

Adds a hand-maintained MODEL_MAX_TOKENS table (Claude 4.5 line → 64k,
mimo-v2.5-pro → 32k) and an effectiveMaxTokens helper layered over the
override field added in 6a3ae5f, so the reporter of #29, and others on
supported models, don't have to discover Settings to avoid mid-stream
truncation.

- apps/web/src/state/maxTokens.ts: lookup + helpers
- providers/{anthropic,anthropic-compatible,openai-compatible}.ts:
  forward effectiveMaxTokens(cfg) instead of cfg.maxTokens ?? 8192
- SettingsDialog: input becomes an optional override (blank = default,
  shown as placeholder)
- 10 locale hint strings updated to the new semantics
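
A minimal sketch of how the override could layer over the per-model
table. The exact numbers behind "64k" and "32k", the model id
`claude-sonnet-4-5`, the `AppConfig` fields other than `maxTokens`, and
the 8192 fallback are assumptions for illustration.

```typescript
// Assumed config shape; only maxTokens is named in the commit message.
interface AppConfig {
  model: string;
  maxTokens?: number; // blank Settings field => undefined
}

// Assumed fallback, matching the old hardcoded cap.
const FALLBACK_MAX_TOKENS = 8192;

// Illustrative entries; the real table and its exact values may differ.
const MODEL_MAX_TOKENS = new Map<string, number>([
  ["claude-sonnet-4-5", 64000],
  ["mimo-v2.5-pro", 32768],
]);

// Per-model default, used when the user has not set an override.
function modelMaxTokensDefault(model: string): number {
  return MODEL_MAX_TOKENS.get(model) ?? FALLBACK_MAX_TOKENS;
}

// The user override wins when present; otherwise use the model default.
function effectiveMaxTokens(cfg: AppConfig): number {
  return cfg.maxTokens ?? modelMaxTokensDefault(cfg.model);
}
```

Under this sketch the Settings placeholder would simply display
`modelMaxTokensDefault(cfg.model)` whenever the field is blank.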

* feat(web): vendor LiteLLM model metadata for max_tokens defaults

Replaces the 4-entry hand-rolled MODEL_MAX_TOKENS map from 544e67e with
a vendored slice of BerriAI/litellm's model_prices_and_context_window
JSON (1970 chat models, ~97KB raw / ~25KB gzip). Future model launches
land in maxTokens.ts via `pnpm sync-litellm-models` instead of manual
edits.

- scripts/sync-litellm-models.ts: fetches the upstream JSON, filters to
  chat-mode entries, projects each entry to its max_output_tokens (or
  max_tokens fallback), and writes a sorted, license-attributed JSON
- apps/web/src/state/litellm-models.json: generated artifact, committed
- apps/web/src/state/maxTokens.ts: lookup is now
  OVERRIDES → LITELLM_MODELS → FALLBACK_MAX_TOKENS. The OVERRIDES table
  shrinks to just `mimo-v2.5-pro` (LiteLLM only ships MiMo via
  OpenRouter/Novita aliases, not the canonical id Xiaomi's API uses).
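
The three-tier lookup order might look like the sketch below. The
in-memory `LITELLM_MODELS` map stands in for the vendored JSON, and the
concrete values and the 8192 fallback are assumptions.

```typescript
// Assumed fallback for models found in neither table.
const FALLBACK_MAX_TOKENS = 8192;

// Hand-maintained exceptions for ids the LiteLLM data does not carry.
// The value is an assumed reading of "32k".
const OVERRIDES = new Map<string, number>([["mimo-v2.5-pro", 32768]]);

// Stand-in for the vendored litellm-models.json (illustrative entry).
const LITELLM_MODELS = new Map<string, number>([
  ["claude-sonnet-4-5", 64000],
]);

// OVERRIDES first, then the vendored LiteLLM data, then the fallback.
function modelMaxTokensDefault(model: string): number {
  return (
    OVERRIDES.get(model) ??
    LITELLM_MODELS.get(model) ??
    FALLBACK_MAX_TOKENS
  );
}
```

Checking OVERRIDES first means a hand-maintained entry keeps winning
even if a later sync adds the same id with a different value.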

LiteLLM is MIT-licensed (BerriAI/litellm/blob/main/LICENSE); attribution
is preserved in both the script header and the generated JSON's
_license field.
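
The filter-and-project step of the sync script could be sketched like
this. It works on an already-parsed object rather than fetching
anything, and `projectChatModels` and `UpstreamEntry` are hypothetical
names; only the `mode`, `max_output_tokens`, and `max_tokens` fields
come from the description above.

```typescript
// Subset of an upstream entry that this sketch looks at.
interface UpstreamEntry {
  mode?: string;
  max_output_tokens?: unknown;
  max_tokens?: unknown;
}

// Keep chat-mode entries only and reduce each one to a single number:
// max_output_tokens when it is numeric, else max_tokens. Entries with
// neither are skipped. Keys are visited in sorted order so the output
// is stable across runs.
function projectChatModels(
  raw: Record<string, UpstreamEntry>,
): Record<string, number> {
  const out: Record<string, number> = {};
  for (const id of Object.keys(raw).sort()) {
    const entry = raw[id];
    if (!entry || entry.mode !== "chat") continue;
    const limit =
      typeof entry.max_output_tokens === "number"
        ? entry.max_output_tokens
        : entry.max_tokens;
    if (typeof limit === "number") out[id] = limit;
  }
  return out;
}
```

A stable key order keeps the committed JSON diff small when a sync only
adds or changes a few models.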

* test(web,docs): cover maxTokens lookup + document sync workflow

- apps/web/src/state/maxTokens.test.ts: six vitest cases pinning the
  three-tier lookup (override → LiteLLM → fallback) and the
  effectiveMaxTokens user-override path. Guards against a future sync
  silently dropping the Anthropic 4.5 entries we rely on.
- CONTRIBUTING.md / CONTRIBUTING.zh-CN.md: new "Updating model
  max_tokens metadata" section pointing future maintainers at
  scripts/sync-litellm-models.ts and explaining when OVERRIDES is
  appropriate (it's the rare exception, not the default).

* fix(web): mark Max tokens label as optional in 10 locales

The Settings field is optional (blank means "use the per-model default")
but the label gave no visual cue, and because every other API-mode
field (key/model/baseUrl) is required, users could reasonably assume
this one was too. Append "(optional)", using each locale's natural
parenthetical convention (Chinese full-width brackets, Japanese 任意,
Russian опционально, etc.), so the field reads as discretionary at a
glance.

* fix(web): validate maxTokens override against advertised UI bounds

Addresses Siri-Ray's review on commit 0d98185. The Settings input
declares min={1024}/max={200000}/step={1024}, but until now
effectiveMaxTokens trusted any defined cfg.maxTokens, so a stale or
hand-edited localStorage value (negative, zero, fractional, billions)
would pass straight to the Anthropic SDK on the direct path while the
daemon proxy quietly clamped it back to 8192 on the proxied path. The
same config therefore behaved differently depending on the route.

- maxTokens.ts: add MIN_MAX_TOKENS / MAX_MAX_TOKENS exports and
  isValidOverride helper. effectiveMaxTokens only honors the override
  when it is a finite integer in [1024, 200000]; otherwise falls back
  to modelMaxTokensDefault.
- SettingsDialog.tsx: input bounds now reference the same constants so
  the UI promise can't drift from the runtime check.
- maxTokens.test.ts: six new cases pinning the rejection of negative,
  zero, sub-MIN, super-MAX, non-integer (fractional / NaN / Infinity)
  overrides plus the inclusive MIN/MAX boundaries.
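
The validation rule can be sketched as below. The bounds and helper
names follow the bullets above; the `AppConfig` shape and the stubbed
`modelMaxTokensDefault` (which just returns an assumed 8192) are
placeholders for illustration.

```typescript
// Bounds shared by the Settings input and the runtime check.
const MIN_MAX_TOKENS = 1024;
const MAX_MAX_TOKENS = 200000;

// Assumed config shape; only maxTokens is named in the commit message.
interface AppConfig {
  model: string;
  maxTokens?: number;
}

// Placeholder for the real per-model lookup (assumed fallback value).
function modelMaxTokensDefault(_model: string): number {
  return 8192;
}

// True only for finite integers within the inclusive [MIN, MAX] range.
// Number.isInteger already rejects NaN, Infinity, and fractions.
function isValidOverride(value: unknown): value is number {
  return (
    typeof value === "number" &&
    Number.isInteger(value) &&
    value >= MIN_MAX_TOKENS &&
    value <= MAX_MAX_TOKENS
  );
}

// Honor the override only when valid; otherwise use the model default.
function effectiveMaxTokens(cfg: AppConfig): number {
  return isValidOverride(cfg.maxTokens)
    ? cfg.maxTokens
    : modelMaxTokensDefault(cfg.model);
}
```

With this check a corrupted localStorage value such as `-5` or `1e12`
resolves to the model default on both the direct and proxied routes.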

The daemon proxy's existing `> 0` fallback stays as defense-in-depth.
Date: 2026-05-02 13:52:54 +08:00