mirror of
https://github.com/nexu-io/open-design.git
synced 2026-05-31 19:04:39 +07:00
* fix(chat): surface OpenCode provider failures from its log on a silent stall

  OpenCode's headless `run --format json` mode swallows provider failures: a 429 usage-limit is marked retryable and retried silently with nothing on stdout/stderr, so the chat run only dies via the inactivity watchdog and the daemon shows a bare "request timed out" with no reason. The real error (statusCode + "Monthly usage limit reached…") is recorded only in OpenCode's own session log.

  On a failed OpenCode close where stdout/stderr carry no signal, read the newest OpenCode session log, extract the latest `service=llm` provider error (scoped to that one line so the embedded request body can't contaminate the classification), and emit a structured, retryable SSE error (RATE_LIMITED / AGENT_AUTH_REQUIRED / UPSTREAM_UNAVAILABLE) carrying the provider's message.

  Refs #982.

* fix(chat): emit recovered OpenCode failure from the watchdog path, bound to the run

  Addresses review on #3316.

  Blocking: the recovery previously ran only in the child-close handler, but in the inactivity-watchdog stall path (the exact case this targets) failForInactivity sends its error and finish()es the run (which clears run.clients) before the child closes. So the structured error reached zero live SSE clients and only surfaced on reload. Recover and send the OpenCode failure inside failForInactivity, before finish(), on the same pre-teardown send path the generic stall message already uses. Keep the close-handler branch for the case where OpenCode exits non-zero on its own (clients still attached).

  Non-blocking: bind the log lookup to the current run via an mtime gate (since=run.createdAt) so a stale or concurrent session's error can't be misattributed; log files last written before the run started are skipped.

* docs(opencode-log): note the concurrent-run limitation of the mtime gate

* fix(chat): skip close-handler failure emit when the watchdog already finished the run

  Non-blocking review follow-up on #3316: on the silent-stall path both failForInactivity and the child-close handler fired for the same run, so the recovered RATE_LIMITED error was sent twice and the events-log stream was reopened after finish() had closed it. Guard the close-handler failure emit with !design.runs.isTerminal(run.status): the watchdog already sent the error and finalized the run; finalization below still runs (finish() no-ops once terminal).
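
The recovery described in these commits can be sketched roughly as follows. This is a minimal illustration, not the daemon's actual code: `LogFile`, `classifyStatus`, and `recoverFailure` are hypothetical names, the status-to-code mapping is an assumption, and the `statusCode=…` / `message="…"` line layout is invented for the example, since the commit message only says that the relevant line carries `service=llm`, a status code, and the provider's message.

```typescript
// Hypothetical types and helpers; the real daemon's names and log format may differ.
type RecoveredCode = "RATE_LIMITED" | "AGENT_AUTH_REQUIRED" | "UPSTREAM_UNAVAILABLE";

interface RecoveredFailure {
  code: RecoveredCode;
  message: string;
  retryable: true;
}

// In-memory stand-in for a session log file: last-modified time plus contents.
interface LogFile {
  mtimeMs: number;
  text: string;
}

// Assumed mapping from HTTP status to the structured error codes named above.
function classifyStatus(status: number): RecoveredCode {
  if (status === 429) return "RATE_LIMITED";
  if (status === 401 || status === 403) return "AGENT_AUTH_REQUIRED";
  return "UPSTREAM_UNAVAILABLE";
}

function recoverFailure(files: LogFile[], since: number): RecoveredFailure | null {
  // mtime gate: ignore log files last written before the run started.
  const fresh = files
    .filter((f) => f.mtimeMs >= since)
    .sort((a, b) => b.mtimeMs - a.mtimeMs);
  const newest = fresh[0];
  if (!newest) return null;

  // Walk backwards so the latest matching line wins.
  const lines = newest.text.split("\n");
  for (let i = lines.length - 1; i >= 0; i--) {
    const line = lines[i] ?? "";
    if (!line.includes("service=llm")) continue;

    // Both matches are scoped to this single line, so text elsewhere in
    // the log cannot influence the classification.
    const status = /statusCode[=:]\s*(\d{3})/.exec(line);
    if (!status) continue;
    const msg = /message="([^"]*)"/.exec(line);

    return {
      code: classifyStatus(Number(status[1])),
      message: msg?.[1] ?? "provider error",
      retryable: true,
    };
  }
  return null;
}
```

A caller would pass the run's creation time as `since` and emit the returned failure only while the run is not yet terminal, which mirrors the guard added in the last commit. As the docs commit notes, an mtime gate alone cannot tell two concurrent runs apart.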

| Name |
|---|
| daemon |
| desktop |
| landing-page |
| packaged |
| telemetry-worker |
| web |
| AGENTS.md |