* feat: add Ollama Cloud to KNOWN_PROVIDERS as OpenAI-compatible BYOK provider
* feat: add ollama.com to isOpenAICompatible base URL detection
* feat: add Ollama Cloud models to SUGGESTED_MODELS_BY_PROTOCOL fallback list
* fix: use full Ollama Cloud model list from /api/tags, drop -cloud suffix
* feat: add Ollama Cloud as native protocol with NDJSON streaming and connection test support
* fix: remove ollama.com from OpenAI compatibility check
* feat: add token overrides for Ollama Cloud models to prevent truncation
* fix: extend inferApiProtocol and legacy migration to recognize ollama.com base URLs
* fix: normalize Ollama Cloud base URL by stripping /api suffix during migration and in daemon
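A minimal sketch of two behaviours named in the commits above: stripping a trailing /api from an Ollama Cloud base URL, and splitting an NDJSON stream into objects. The helper names `normalizeOllamaBaseUrl` and `createNdjsonParser` are hypothetical and the exact rules are assumptions; the real migration and daemon code may differ.

```typescript
// Hypothetical helper (name and rules are assumptions, not the project's code).
// Removes trailing slashes, then a single trailing "/api" path segment.
export function normalizeOllamaBaseUrl(raw: string): string {
  const trimmed = raw.trim().replace(/\/+$/, "");
  return trimmed.replace(/\/api$/, "");
}

// Hypothetical incremental NDJSON parser: one JSON object per line.
// Chunks may end mid-line, so the unfinished tail is buffered until the
// next chunk (or flush) completes it.
export function createNdjsonParser(onObject: (obj: unknown) => void) {
  let buffer = "";
  return {
    push(chunk: string): void {
      buffer += chunk;
      let nl = buffer.indexOf("\n");
      while (nl >= 0) {
        const line = buffer.slice(0, nl).trim();
        buffer = buffer.slice(nl + 1);
        if (line) onObject(JSON.parse(line));
        nl = buffer.indexOf("\n");
      }
    },
    // Call once the stream ends to emit a final line that has no newline.
    flush(): void {
      const line = buffer.trim();
      buffer = "";
      if (line) onObject(JSON.parse(line));
    },
  };
}
```

Buffering the partial tail matters because a network chunk boundary can fall inside a JSON object; parsing per chunk instead of per line would throw on the fragment.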
---------
Co-authored-by: herediaron <aronheredi346@gmail.com>
* fix(web,daemon): make max_tokens configurable (closes #29)
BYOK users on custom Anthropic-compatible providers (e.g. Xiaomi MiMo)
hit the hardcoded 8192 cap and saw artifacts truncated mid-stream.
- AppConfig.maxTokens with Settings input (EN/CN + 8 other locales)
- ProxyStreamRequest.maxTokens contract field
- anthropic, anthropic-compatible, and openai-compatible providers all
forward cfg.maxTokens
- /api/proxy/anthropic/stream and /api/proxy/stream payloads honor it,
defaulting to 8192 when unset so prior clients are unaffected
Original sketch by @mashu in #78 (50a9d14); rebased onto the apps/web
layout and extended to the proxy paths that are used when baseUrl is
set, which is the route the reporter in #29 was actually hitting.
* feat(web): per-model max_tokens defaults
Adds a hand-maintained MODEL_MAX_TOKENS table (Claude 4.5 line → 64k,
mimo-v2.5-pro → 32k) and an effectiveMaxTokens helper layered over the
override field added in 6a3ae5f, so that the reporter in #29, and anyone
else on a supported model, does not have to find the Settings field to
avoid mid-stream truncation.
- apps/web/src/state/maxTokens.ts: lookup + helpers
- providers/{anthropic,anthropic-compatible,openai-compatible}.ts:
forward effectiveMaxTokens(cfg) instead of cfg.maxTokens ?? 8192
- SettingsDialog: input becomes an optional override (blank = default,
shown as placeholder)
- 10 locale hint strings updated to the new semantics
* feat(web): vendor LiteLLM model metadata for max_tokens defaults
Replaces the 4-entry hand-rolled MODEL_MAX_TOKENS map from 544e67e with
a vendored slice of BerriAI/litellm's model_prices_and_context_window
JSON (1970 chat models, ~97KB raw / ~25KB gzip). Future model launches
land in maxTokens.ts via `pnpm sync-litellm-models` instead of manual
edits.
- scripts/sync-litellm-models.ts: fetches the upstream JSON, filters to
chat-mode entries, projects each entry to its max_output_tokens (or
max_tokens fallback), and writes a sorted, license-attributed JSON
- apps/web/src/state/litellm-models.json: generated artifact, committed
- apps/web/src/state/maxTokens.ts: lookup is now
OVERRIDES → LITELLM_MODELS → FALLBACK_MAX_TOKENS. The OVERRIDES table
shrinks to just `mimo-v2.5-pro` (LiteLLM only ships MiMo via
OpenRouter/Novita aliases, not the canonical id Xiaomi's API uses).
LiteLLM is MIT-licensed (BerriAI/litellm/blob/main/LICENSE); attribution
is preserved in both the script header and the generated JSON's
_license field.
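The projection and the three-tier lookup described above can be sketched roughly as follows. This is an illustration under stated assumptions: `projectChatModels` and the `UpstreamEntry` type are hypothetical names, the tables are passed in as parameters rather than imported, and the 8192 fallback is assumed from the previous hardcoded cap.

```typescript
// Shape of the upstream fields this sketch reads (assumed subset).
type UpstreamEntry = {
  mode?: string;
  max_output_tokens?: unknown;
  max_tokens?: unknown;
};

// Hypothetical projection step: keep chat-mode entries and reduce each one
// to a single output-token limit, visiting keys in sorted order.
export function projectChatModels(
  upstream: Record<string, UpstreamEntry | undefined>,
): Record<string, number> {
  const out: Record<string, number> = {};
  for (const id of Object.keys(upstream).sort()) {
    const entry = upstream[id];
    if (!entry || entry.mode !== "chat") continue;
    // Prefer max_output_tokens; fall back to max_tokens when it is absent.
    const limit =
      typeof entry.max_output_tokens === "number"
        ? entry.max_output_tokens
        : entry.max_tokens;
    if (typeof limit === "number" && Number.isFinite(limit) && limit > 0) {
      out[id] = limit;
    }
  }
  return out;
}

// Assumed value, taken from the earlier hardcoded cap.
export const FALLBACK_MAX_TOKENS = 8192;

type Table = Readonly<Record<string, number | undefined>>;

// Lookup order: OVERRIDES, then LITELLM_MODELS, then FALLBACK_MAX_TOKENS.
export function modelMaxTokensDefault(
  model: string,
  overrides: Table,
  litellm: Table,
): number {
  return overrides[model] ?? litellm[model] ?? FALLBACK_MAX_TOKENS;
}
```

Keeping OVERRIDES ahead of the vendored data means a resync cannot silently change a value that was pinned on purpose.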
* test(web,docs): cover maxTokens lookup + document sync workflow
- apps/web/src/state/maxTokens.test.ts: six vitest cases pinning the
three-tier lookup (override → LiteLLM → fallback) and the
effectiveMaxTokens user-override path. Guards against a future sync
silently dropping the Anthropic 4.5 entries we rely on.
- CONTRIBUTING.md / CONTRIBUTING.zh-CN.md: new "Updating model
max_tokens metadata" section pointing future maintainers at
scripts/sync-litellm-models.ts and explaining when OVERRIDES is
appropriate (it's the rare exception, not the default).
* fix(web): mark Max tokens label as optional in 10 locales
The Settings field is optional (blank means "use the per-model default")
but the label gave no visual cue, breaking the implicit pattern that
every other API-mode field (key/model/baseUrl) is required. Append
"(optional)" — using the locale's natural parenthetical convention
(Chinese full-width brackets, Japanese 任意, Russian опционально, etc.)
— so the field reads as discretionary at a glance.
* fix(web): validate maxTokens override against advertised UI bounds
Addresses Siri-Ray's review on commit 0d98185. The Settings input
declares min={1024}/max={200000}/step={1024}, but until now
effectiveMaxTokens trusted any defined cfg.maxTokens, so a stale or
hand-edited localStorage value (negative, zero, fractional, billions)
would pass straight to the Anthropic SDK on the direct path while the
daemon proxy quietly clamped it back to 8192 on the proxied path —
same config, divergent behavior depending on route.
- maxTokens.ts: add MIN_MAX_TOKENS / MAX_MAX_TOKENS exports and
isValidOverride helper. effectiveMaxTokens only honors the override
when it is a finite integer in [1024, 200000]; otherwise falls back
to modelMaxTokensDefault.
- SettingsDialog.tsx: input bounds now reference the same constants so
the UI promise can't drift from the runtime check.
- maxTokens.test.ts: six new cases pinning the rejection of negative,
zero, sub-MIN, super-MAX, non-integer (fractional / NaN / Infinity)
overrides plus the inclusive MIN/MAX boundaries.
The daemon proxy's existing `> 0` fallback stays as defense-in-depth.
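A minimal sketch of the validation described in this commit. The exported names come from the commit text, but the signatures are assumptions; in particular, the per-model default is passed in as a callback here so the example stays self-contained.

```typescript
// Bounds advertised by the Settings input (min=1024, max=200000).
export const MIN_MAX_TOKENS = 1024;
export const MAX_MAX_TOKENS = 200000;

// True only for finite integers inside the inclusive [MIN, MAX] range.
// Number.isInteger already rejects NaN, Infinity, and fractional values.
export function isValidOverride(value: unknown): value is number {
  return (
    typeof value === "number" &&
    Number.isInteger(value) &&
    value >= MIN_MAX_TOKENS &&
    value <= MAX_MAX_TOKENS
  );
}

// Assumed signature: honors the user override only when it is valid,
// otherwise defers to the per-model default.
export function effectiveMaxTokens(
  cfg: { model: string; maxTokens?: number },
  modelMaxTokensDefault: (model: string) => number,
): number {
  const override = cfg.maxTokens;
  return isValidOverride(override) ? override : modelMaxTokensDefault(cfg.model);
}
```

Because the same check runs before either route is chosen, a stale localStorage value resolves to the same number on the direct path and the proxied path.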