ai: contain Kiro ACP stream failures #217

Open
anders-heimer wants to merge 5 commits into sashiko-dev:main from anders-heimer:kiro-acp-error-guards

anders-heimer (Contributor) commented May 24, 2026

Summary

Kiro ACP session/prompt is a long-lived stream. When that stream fails,
Sashiko currently retries the same expensive turn without enough
provider-specific containment.

This PR adds Kiro-specific handling for those failures:

  • classify Kiro ACP stream, rate-limit, provider, and permanent errors
  • include bounded, redacted ACP diagnostics
  • block retries after side-effect-looking ACP updates
  • preserve terminal budget failures across review-bin stdio
  • bound Kiro retry loops with per-turn budgets, wall-clock guards,
    idle watchdogs, and a process-local circuit breaker
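A process-local circuit breaker of the kind listed above could look roughly like the sketch below. This is not the code from this PR: the class name, the thresholds, and the injectable `clock` parameter are all assumptions made for illustration.

```python
import time


class CircuitBreaker:
    """Hypothetical process-local breaker: opens after N consecutive
    failures and allows traffic again once a cooldown has elapsed."""

    def __init__(self, failure_threshold=3, cooldown_s=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._consecutive_failures = 0
        self._opened_at = None  # None means the breaker is closed

    def allow(self):
        """Return True if a new turn may be attempted."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.cooldown_s:
            # Cooldown elapsed: close the breaker and let a request through.
            self._opened_at = None
            self._consecutive_failures = 0
            return True
        return False

    def record_success(self):
        self._consecutive_failures = 0
        self._opened_at = None

    def record_failure(self):
        self._consecutive_failures += 1
        if self._consecutive_failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

Because the state lives in one object inside one process, it resets on restart and is not shared between workers, which is what "process-local" implies here.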

Patch Layout

  1. ai: improve kiro acp diagnostics

    Capture malformed ACP stdout separately from stderr and surface
    redacted JSON-RPC error.data.

  2. ai: classify transient kiro acp errors

    Classify known Kiro stream failures, throttling, provider availability
    failures, and permanent auth/configuration errors.

  3. ai: block kiro retries after side effects

    Treat retryable-looking failures as fatal once ACP updates suggest a
    possible tool, command, or file mutation.

  4. ai: preserve terminal budget errors over stdio

    Carry fail-fast budget errors through the review-bin AI stdio protocol
    so subprocess boundaries do not turn them into ordinary retryable
    remote errors.

  5. ai: contain kiro acp stream failures

    Add Kiro-only retry budgets, same-error streak caps, turn wall-clock
    limits, ACP idle telemetry, prompt cancellation, request-id extraction,
    and a process-local circuit breaker.
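The retry bounds described in patch 5 can be sketched as a small per-turn budget object. The names, default limits, and the order of the checks below are assumptions for illustration, not the values used in this PR.

```python
import time


class TurnBudget:
    """Hypothetical per-turn guard combining a retry budget, a same-error
    streak cap, and a wall-clock limit."""

    def __init__(self, max_retries=3, max_same_error_streak=2,
                 max_wall_clock_s=600.0, clock=time.monotonic):
        self.max_retries = max_retries
        self.max_same_error_streak = max_same_error_streak
        self.max_wall_clock_s = max_wall_clock_s
        self._clock = clock
        self._started = clock()
        self._retries = 0
        self._last_error = None
        self._streak = 0

    def should_retry(self, error_kind):
        """Return (allowed, reason) for a failure of the given kind."""
        if self._clock() - self._started >= self.max_wall_clock_s:
            return False, "wall-clock limit exceeded"
        # Count how many times in a row the same kind of error occurred.
        if error_kind == self._last_error:
            self._streak += 1
        else:
            self._last_error, self._streak = error_kind, 1
        if self._streak > self.max_same_error_streak:
            return False, "same-error streak cap reached"
        if self._retries >= self.max_retries:
            return False, "retry budget exhausted"
        self._retries += 1
        return True, "ok"
```

A caller would create one `TurnBudget` per prompt turn and consult `should_retry` before each replay, so that a repeating failure stops early even when the overall retry count has not been reached.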

Notes

The only shared runtime plumbing is the review-bin terminal error marker.
It is generic, but included here because Kiro containment needs terminal
provider failures to survive stdio serialization.
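To illustrate why the marker matters, here is a minimal sketch of an error crossing a process boundary as a JSON line. The wire format, the `terminal` field, and both exception names are hypothetical; the actual review-bin stdio protocol is not shown in this description.

```python
import json


class RemoteError(Exception):
    """Ordinary remote failure; a caller may choose to retry."""


class TerminalBudgetError(Exception):
    """Fail-fast failure; a caller must not retry."""


def encode_error(exc):
    """Serialize an error as one JSON line, keeping its terminal status."""
    return json.dumps({
        "type": "error",
        "message": str(exc),
        # Hypothetical marker field: without it, the receiver could not
        # tell a budget failure from an ordinary remote error.
        "terminal": isinstance(exc, TerminalBudgetError),
    })


def decode_error(line):
    """Rebuild the matching exception type on the other side of stdio."""
    msg = json.loads(line)
    if msg.get("terminal"):
        return TerminalBudgetError(msg["message"])
    return RemoteError(msg["message"])
```

If the marker were dropped during serialization, every decoded error would come back as `RemoteError` and a retry layer would treat a budget failure as retryable.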

This PR does not include follow-up handling for empty successful Kiro
responses, Kiro response parser quirks, or broader CLI diagnostic logging.

anders-heimer force-pushed the kiro-acp-error-guards branch 2 times, most recently from 1a27100 to 190d02c (May 29, 2026 07:25)
anders-heimer changed the title from "ai: contain Kiro ACP stream failures and empty responses" to "ai: contain Kiro ACP stream failures" (May 29, 2026)
Capture malformed ACP stdout separately from stderr so errors can
report both diagnostic streams without leaking secrets.

Include redacted JSON-RPC error data in Kiro ACP failures to expose
nested provider error details.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Anders Heimer <anders.heimer@est.tech>
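The bounded, redacted diagnostics described in this commit could be approximated as follows. The key names, the bearer-token pattern, and the 512-character cap are assumptions for illustration; the PR's actual redaction rules are not shown here.

```python
import json
import re

# Hypothetical set of keys whose values are always hidden.
SECRET_KEYS = {"authorization", "token", "api_key", "password", "secret"}
BEARER_RE = re.compile(r"(?i)bearer\s+[A-Za-z0-9._~+/=-]+")
MAX_DIAG_CHARS = 512  # assumed bound on diagnostic size


def redact(value):
    """Recursively hide secret-looking keys and bearer tokens."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if k.lower() in SECRET_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, str):
        return BEARER_RE.sub("Bearer [REDACTED]", value)
    return value


def format_error_data(error):
    """Render the JSON-RPC error.data field as bounded, redacted text."""
    text = json.dumps(redact(error.get("data")), sort_keys=True)
    if len(text) > MAX_DIAG_CHARS:
        text = text[:MAX_DIAG_CHARS] + "...[truncated]"
    return text
```

Redacting before truncating means a secret cannot survive by straddling the cut-off point.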
Kiro ACP wraps response-stream and provider failures in JSON-RPC
-32603 errors with details in error.data.

Classify known transient, rate-limit, provider, and permanent markers
so the retry layer can back off appropriately. Treat kiro-cli timeouts
as transient typed errors.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Anders Heimer <anders.heimer@est.tech>
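A marker-based classifier along these lines might look like the sketch below. The error code -32603 is the standard JSON-RPC "Internal error" code named in the commit message; every marker string and the check order are hypothetical stand-ins, since the real marker list is not given here.

```python
from enum import Enum


class ErrorClass(Enum):
    PERMANENT = "permanent"
    RATE_LIMIT = "rate_limit"
    STREAM = "stream"
    PROVIDER = "provider"
    UNKNOWN = "unknown"


JSONRPC_INTERNAL_ERROR = -32603

# Hypothetical markers, checked in order; the first match wins.
MARKERS = [
    (ErrorClass.PERMANENT, ("unauthorized", "invalid configuration")),
    (ErrorClass.RATE_LIMIT, ("throttl", "rate limit", "too many requests")),
    (ErrorClass.STREAM, ("stream", "connection reset")),
    (ErrorClass.PROVIDER, ("service unavailable", "overloaded")),
]


def classify(error):
    """Classify a JSON-RPC error object by markers in message and data."""
    if error.get("code") != JSONRPC_INTERNAL_ERROR:
        return ErrorClass.UNKNOWN
    haystack = (str(error.get("message", "")) + " "
                + str(error.get("data", ""))).lower()
    for error_class, markers in MARKERS:
        if any(marker in haystack for marker in markers):
            return error_class
    return ErrorClass.UNKNOWN
```

Substring matching on free-form text is fragile, so a typed result such as `ErrorClass` keeps that fragility in one place and lets the retry layer branch on a stable value.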
Track whether ACP session updates indicate possible tool, command, or
file mutations before an error is returned.

Retryable-looking failures after such updates are kept fatal so the
caller does not replay potentially non-idempotent work.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Anders Heimer <anders.heimer@est.tech>
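The side-effect guard can be pictured as a flag that is set while updates stream in and consulted when an error arrives. The update shape and the `sessionUpdate` values used below are assumptions about what the ACP updates look like, and the class is hypothetical.

```python
# Assumed update kinds that suggest a tool, command, or file mutation.
SIDE_EFFECT_UPDATES = {"tool_call", "tool_call_update"}


class SideEffectGuard:
    """Hypothetical tracker: remembers whether any session update looked
    like it could have caused a side effect during the current turn."""

    def __init__(self):
        self.side_effect_seen = False

    def observe(self, update):
        """Inspect one session update as it arrives on the stream."""
        if update.get("sessionUpdate") in SIDE_EFFECT_UPDATES:
            self.side_effect_seen = True

    def may_retry(self, retryable):
        """A retryable-looking error stays fatal once a side effect was seen."""
        return retryable and not self.side_effect_seen
```

The flag is deliberately sticky: once a possible mutation has been observed, the turn cannot become retryable again, which errs on the side of not replaying non-idempotent work.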
Carry budget-exceeded terminal markers through the review-bin AI stdio
protocol so true budget failures keep fail-fast behavior after crossing
the process boundary.

This is shared protocol plumbing used by the Kiro ACP containment path.

Signed-off-by: Anders Heimer <anders.heimer@est.tech>
Bound Kiro ACP stream failures with Kiro-only per-turn retry budgets,
a process-local circuit breaker, and ACP-line idle telemetry.

Send session/cancel on prompt failures and keep retries blocked after
side-effect-looking ACP updates.

Rate-limit markers intentionally take precedence over generic stream
failure markers so throttling receives the slower quota backoff instead
of the short stream retry delay.

Signed-off-by: Anders Heimer <anders.heimer@est.tech>
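The precedence rule in this commit can be sketched as a delay selector that checks for rate-limit markers first. The delay constants, the exponential growth, and the function name are assumptions for illustration only.

```python
STREAM_RETRY_DELAY_S = 2.0     # assumed short delay for stream failures
QUOTA_BACKOFF_BASE_S = 30.0    # assumed base for the slower quota backoff
QUOTA_BACKOFF_MAX_S = 300.0    # assumed cap on the quota backoff


def retry_delay(matched, attempt):
    """Pick a delay from the set of marker classes found in one error.

    `matched` may contain both "rate_limit" and "stream" when a throttled
    request also surfaces as a stream failure; rate limiting wins.
    Returns None when no retryable class matched.
    """
    if "rate_limit" in matched:
        return min(QUOTA_BACKOFF_BASE_S * (2 ** attempt), QUOTA_BACKOFF_MAX_S)
    if "stream" in matched:
        return STREAM_RETRY_DELAY_S
    return None
```

If the checks were reversed, an error carrying both markers would be retried after the short stream delay and would likely hit the same quota again.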
anders-heimer force-pushed the kiro-acp-error-guards branch from 190d02c to 3a6bd12 (May 30, 2026 06:59)