Tool-result overflow handling now retries recoverable sessions

Takhoffman · Apr 6, 2026 · #61651: Align tool-result truncation more closely with PI

Sessions with oversized tool results can now recover instead of failing. The system estimates prompt size before sending to the model, applies the cheapest available fix (truncation, compaction, or both), and retries.

When a tool produces output that exceeds the model's context window, OpenClaw must decide how to recover. The old approach stored raw tool results and repeatedly rewrote them during live context processing—tool results could be truncated, compacted, and rewritten multiple times as the model requested more tools. This created unpredictable behavior and made overflow errors harder to recover from.

This PR restructures tool-result overflow handling around a simple principle: truncate once, store once, and decide the recovery strategy upfront. Tool results are now bounded at a central ingress point before they reach session history. A precheck before each provider call estimates whether the assembled prompt will fit within the context budget. If it doesn't, the system classifies the bottleneck—tool results too large, older history too large, or both—and routes to the cheapest path that resolves it. Truncation targets recent tool-result tails first, keeping real content visible rather than replacing outputs with placeholder markers. Only when precheck-based recovery doesn't help does the system fall back to provider-reported overflow handling.

The practical effect is that sessions previously marked as unrecoverable can now retry with a smaller context. Context overflow no longer ends the session automatically; an overflow error is returned only when every recovery path has failed. The change also unifies truncation notices to a consistent PI-style format so agents can parse them reliably.

This work is part of a larger effort to align OpenClaw's runtime behavior with PI's model, while keeping OpenClaw's central safety backstops and session repair fallbacks intact.


Summary

align OpenClaw’s tool-result handling more closely with PI’s model: truncate tool results once, preserve stored history, and stop rewriting older tool results in live context
switch live and persisted truncation wording to PI-style [... N more characters truncated]
restore a reserve-based precheck before provider submission and make it cause-aware: when the prompt estimate is too large, choose the cheapest local recovery path first
keep a readable persisted-session repair fallback after compaction failure, including aggregate medium tool-result tails, without collapsing tool results to placeholder-only markers
keep a central ingress backstop for tool results because OpenClaw still needs uniform protection across tools and integration surfaces
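
As a rough illustration of the one-time truncation and the PI-style notice described in this summary, here is a minimal sketch. The function name, the `maxChars` parameter, and the newline before the notice are assumptions made for this example, not OpenClaw's actual API; only the `[... N more characters truncated]` wording comes from the description.

```typescript
// Hypothetical helper: bound a tool result once, before it is stored in history.
// Only the notice wording is taken from the PR description; everything else
// (name, parameter, newline placement) is assumed for illustration.
function truncateToolResultOnce(text: string, maxChars: number): string {
  if (text.length <= maxChars) {
    return text; // already within the cap, store unchanged
  }
  const kept = text.slice(0, maxChars);
  const dropped = text.length - kept.length;
  // Real content stays visible; the notice states how much was cut.
  return `${kept}\n[... ${dropped} more characters truncated]`;
}

// Usage: a 50-character result capped at 20 characters of content.
const storedResult = truncateToolResultOnce("x".repeat(50), 20);
console.log(storedResult);
```

In this sketch the cap applies to the kept content only, so the stored string is slightly longer than `maxChars` once the notice is appended.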

Original behavior: OpenClaw before this PR

Tool executes
  |
  v
Raw tool result is stored in session/history
  |
  v
Next model call starts
  |
  v
transformContext guard runs
  |
  +--> If one tool result is too large:
  |       truncate that result for live context
  |
  +--> If total context is still too large:
  |       rewrite older tool results in-place
  |       trim them again under aggregate pressure
  |       and sometimes replace them with compaction markers
  |
  +--> If that still is not enough:
          trigger overflow / session compaction
          compact older history
          retry prompt

PI behavior (for reference)

Tool executes
  |
  v
Tool applies its own output limit
  |
  v
Bounded tool result is stored in session/history
  |
  v
Next model call starts
  |
  v
Build prompt from session history
  |
  +--> If prompt estimate fits within contextWindow - reserveTokens:
  |       send prompt unchanged
  |
  +--> If prompt estimate exceeds contextWindow - reserveTokens:
          compact older history into a summary
          keep recent messages and tool results together
          rebuild prompt from compacted history
          retry prompt
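
The reserve-based check in the flow above can be sketched as follows. This is a simplified illustration, not PI's implementation: the four-characters-per-token estimate and all names other than `contextWindow` and `reserveTokens` are assumptions.

```typescript
// Assumption: roughly 4 characters per token. Real estimators differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

interface PrecheckInput {
  messages: string[];    // assembled prompt parts (hypothetical shape)
  contextWindow: number; // model context window, in tokens
  reserveTokens: number; // tokens held back from the window
}

// True when the estimated prompt fits within contextWindow - reserveTokens.
function promptFits(input: PrecheckInput): boolean {
  const estimate = input.messages.reduce(
    (sum, message) => sum + estimateTokens(message),
    0,
  );
  return estimate <= input.contextWindow - input.reserveTokens;
}

// Usage: a 400-character prompt is estimated at 100 tokens.
const fits = promptFits({
  messages: ["a".repeat(400)],
  contextWindow: 200,
  reserveTokens: 50,
});
console.log(fits ? "send prompt unchanged" : "compact older history and retry");
```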

New behavior: OpenClaw after this PR

Tool executes
  |
  v
Central guard applies one-time truncation if needed
  (so every tool result has one uniform ingress cap before it reaches history)
  |
  v
Bounded tool result is stored in session/history
  |
  v
Next model call starts
  |
  v
Build prompt from session history
  |
  v
Reserve-based precheck estimates assembled prompt
against contextTokenBudget - reserveTokens
  |
  +--> If estimate fits:
  |       send prompt unchanged
  |
  +--> If estimate exceeds budget:
          classify likely pressure source
          |
          +--> compact_only:
          |       trigger existing session compaction early
          |       compact older history
          |       rebuild prompt from compacted session
          |       retry
          |
          +--> truncate_tool_results_only:
          |       readably truncate persisted recent tool-result tails
          |       skip provider call for this attempt
          |       retry with updated session history
          |
          +--> compact_then_truncate:
                  trigger existing session compaction early
                  compact older history
                  then readably truncate persisted recent tool-result tails
                  rebuild prompt from updated session history
                  retry
  |
  v
If a later provider attempt still reports overflow:
existing overflow handling remains the final fallback path
  |
  +--> try explicit overflow compaction
  +--> if still needed, try readable persisted-session tool-result truncation
  +--> otherwise return context overflow error
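
The cause-aware routing in the flow above might look roughly like the sketch below. The three route names come from the diagram; the `fits` value, the `Estimate` shape, and the decision heuristic are assumptions made for illustration, since the description does not spell out how the pressure source is classified.

```typescript
type Route =
  | "fits" // added for this sketch: no recovery needed
  | "compact_only"
  | "truncate_tool_results_only"
  | "compact_then_truncate";

// Hypothetical inputs to the precheck, all in tokens.
interface Estimate {
  promptTokens: number;        // estimated size of the assembled prompt
  reducibleToolTokens: number; // what truncating the recent tool-result tail could save
  budget: number;              // contextTokenBudget - reserveTokens
}

// Assumed heuristic: prefer truncation alone when it can cover the overflow,
// compaction alone when there is nothing to truncate, and both otherwise.
function chooseRoute(e: Estimate): Route {
  const overflow = e.promptTokens - e.budget;
  if (overflow <= 0) return "fits";
  if (e.reducibleToolTokens >= overflow) return "truncate_tool_results_only";
  if (e.reducibleToolTokens === 0) return "compact_only";
  return "compact_then_truncate";
}

// Usage: 200 tokens over budget, with 300 tokens recoverable from tool results.
console.log(chooseRoute({ promptTokens: 1200, reducibleToolTokens: 300, budget: 1000 }));
```

Treating local truncation as the cheaper path is itself an assumption here; the description only says the cheapest local recovery path is chosen first.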

Aggregate tool-result fallback

Aggregate overflow means no single tool result is above the per-result limit, but the recent tool-result tail is still too large as a group.

Example:

tool result A is 18k chars
tool result B is 17k chars
tool result C is 16k chars
each one is individually valid, so the single-result guard leaves them alone
if that recent tail still explains the overflow, the new precheck can truncate it early without hitting the provider
if the estimate is mixed, OpenClaw compacts older history first and then truncates the recent tool-result tail before retry
if a later provider attempt still overflows, the existing overflow path remains as the final safety net
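
To make the example concrete, the sketch below distinguishes a single oversized result from aggregate overflow. Both limits are assumed values chosen only so that the 18k, 17k, and 16k results pass the per-result check; the description does not state OpenClaw's real limits.

```typescript
// Assumed limits, in characters, picked to match the example sizes above.
const PER_RESULT_LIMIT = 20_000;
const AGGREGATE_TAIL_LIMIT = 40_000;

type TailStatus = "ok" | "single_oversized" | "aggregate_overflow";

// Classify the recent tool-result tail from the character length of each result.
function classifyTail(sizes: number[]): TailStatus {
  if (sizes.some((size) => size > PER_RESULT_LIMIT)) {
    return "single_oversized";
  }
  const total = sizes.reduce((sum, size) => sum + size, 0);
  return total > AGGREGATE_TAIL_LIMIT ? "aggregate_overflow" : "ok";
}

// Results A, B, and C: each is under 20k, but together they total 51k.
console.log(classifyTail([18_000, 17_000, 16_000])); // prints "aggregate_overflow"
```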

Important behavior:

this is readable truncation, not placeholder replacement
each rewritten tool result keeps real content plus the truncation suffix
mixed tails are handled in one persisted-session rewrite pass: if one tool result is individually oversized and the remaining medium tail is still over budget, OpenClaw truncates the oversized entry first, then applies aggregate trimming to the remaining medium tail
the old live [compacted: ...] style behavior is not reintroduced
normal one-time truncation still keeps a readable floor, but persisted-session recovery truncation can now shrink all the way down to suffix-only when needed
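
One way to picture a single rewrite pass that handles both an oversized entry and aggregate pressure is sketched below. The even split of the aggregate budget, the function names, and the limits are assumptions for illustration; the description does not say how OpenClaw distributes the trimming. The sketch applies the even share to every entry in the tail, which is a simplification.

```typescript
const truncationSuffix = (dropped: number): string =>
  `[... ${dropped} more characters truncated]`;

// Keep the first `keep` characters and append the notice.
// keep = 0 yields a suffix-only result, matching the recovery floor described above.
function readableTruncate(text: string, keep: number): string {
  if (keep >= text.length) return text;
  const kept = Math.max(0, keep);
  return text.slice(0, kept) + truncationSuffix(text.length - kept);
}

// Hypothetical single pass over the recent tool-result tail.
// Each entry is cut at most once, from its original text, so it carries one notice.
function rewriteTail(
  results: string[],
  perResultLimit: number,
  aggregateLimit: number,
): string[] {
  // Size of the tail after capping individually oversized entries.
  const cappedTotal = results.reduce(
    (sum, r) => sum + Math.min(r.length, perResultLimit),
    0,
  );
  // Assumed policy: if still over the aggregate limit, share it evenly.
  const share =
    cappedTotal > aggregateLimit
      ? Math.floor(aggregateLimit / results.length)
      : Infinity;
  return results.map((r) =>
    readableTruncate(r, Math.min(r.length, perResultLimit, share)),
  );
}

// Usage: one oversized entry plus two medium entries, with tiny limits for readability.
console.log(rewriteTail(["a".repeat(10), "bbbb", "cccc"], 6, 9));
```

The budgets here count kept content only and ignore the length of the notice itself.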

How this differs from PI

PI mostly relies on tool-local output limits, while OpenClaw still keeps a central ingress backstop for tool results
the precheck is now PI-like in shape, but it still plugs into OpenClaw’s existing context engine and overflow handling rather than replacing that stack wholesale
OpenClaw also keeps a readable persisted-session repair fallback after compaction failure; PI does not appear to have an equivalent session rewrite path
this PR removes the OpenClaw-only live historical rewriting path while keeping recovery paths readable and bounded

Why not switch fully to PI in this PR

replacing OpenClaw’s full overflow/compaction behavior with PI’s exact implementation would still be a much larger behavioral change than this PR’s truncation and precheck cleanup
that would touch more runtime boundaries at once and make it harder to isolate regressions if context handling changes in production
this PR takes the lower-risk step: remove the custom live rewrite layer we do not want, restore a PI-like reserve-based precheck, add cause-aware local routing, keep the central ingress safety backstop, and preserve a readable session repair fallback where OpenClaw still needs one

Verification

pnpm exec vitest run src/agents/pi-embedded-runner/run/preemptive-compaction.test.ts src/agents/pi-embedded-runner/tool-result-truncation.test.ts src/agents/pi-embedded-runner/run.overflow-compaction.loop.test.ts --config vitest.config.ts --reporter verbose
pnpm exec oxlint src/agents/pi-embedded-runner/run/preemptive-compaction.ts src/agents/pi-embedded-runner/run/preemptive-compaction.test.ts src/agents/pi-embedded-runner/tool-result-truncation.ts src/agents/pi-embedded-runner/tool-result-truncation.test.ts src/agents/pi-embedded-runner/run/attempt.ts src/agents/pi-embedded-runner/run/types.ts src/agents/pi-embedded-runner/run.ts src/agents/pi-embedded-runner/run.overflow-compaction.loop.test.ts

Notes

the normal commit hook previously hit unrelated pre-existing repo type errors in src/auto-reply/reply/dispatch-from-config.reply-dispatch.test.ts and src/cli/update-cli/update-command.ts; this PR’s code changes were validated with the focused commands above