Batch payload processor retries object store uploads

matt-aitken

Apr 7, 2026 · #3331 · fix(batch): retry R2 upload on transient failure in BatchPayloadProcessor

Large batch processes are now insulated from transient network failures, as the system automatically retries object store uploads instead of aborting the stream.

Previously, a single "fetch failed" error from the object store would immediately abort an entire stream of batch items.

The batch payload processor now retries object store uploads automatically, making up to three attempts with exponential backoff (500 ms to 2 s between attempts). If the server exhausts its attempts, the error response lets the client SDK trigger its own fallback recovery sequence. Transient connectivity drops now self-heal server-side without interrupting active workloads.
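The retry behaviour described above can be sketched without any dependencies. This is only an illustration, not the code from the change itself, which wraps the upload in the p-retry package. The names `retryWithBackoff`, `backoffDelay`, `RetryOptions`, and `uploadRetryOptions` are hypothetical, and the growth factor of 2 is an assumption; the original description states only the attempt count and the 500 ms to 2 s range.

```typescript
// Dependency-free sketch of retry with exponential backoff.
// All names are hypothetical; the actual change uses the p-retry package.

type RetryOptions = {
  attempts: number; // total attempts, including the first one
  minTimeoutMs: number; // delay before the second attempt
  maxTimeoutMs: number; // upper bound on any single delay
  factor: number; // multiplier applied to the delay after each failure
};

// Values taken from the description (3 attempts, 500 ms to 2 s).
// The factor of 2 is an assumption made for this sketch.
const uploadRetryOptions: RetryOptions = {
  attempts: 3,
  minTimeoutMs: 500,
  maxTimeoutMs: 2000,
  factor: 2,
};

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Delay to wait after the given failed attempt (1-based), capped at maxTimeoutMs.
function backoffDelay(failedAttempt: number, opts: RetryOptions): number {
  const raw = opts.minTimeoutMs * Math.pow(opts.factor, failedAttempt - 1);
  return Math.min(raw, opts.maxTimeoutMs);
}

// Runs fn until it succeeds or the attempt budget is spent.
async function retryWithBackoff<T>(
  fn: (attempt: number) => Promise<T>,
  opts: RetryOptions
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.attempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (error) {
      lastError = error;
      // Only wait if another attempt is still available.
      if (attempt < opts.attempts) {
        await sleep(backoffDelay(attempt, opts));
      }
    }
  }
  // Budget exhausted: rethrow the last error so the caller
  // (here, ultimately the client SDK) can run its own recovery.
  throw lastError;
}
```

With these assumed values, a failing upload would wait 500 ms after the first failure and 1000 ms after the second before the third and final attempt. Rethrowing the last error after the final attempt mirrors the behaviour in the entry: the failure still reaches the client, but only once server-side retries have been exhausted.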

Original GitHub description

A single "fetch failed" from the object store was aborting the entire batch stream with no retry. Added p-retry (3 attempts, 500ms-2s backoff) around uploadPacketToObjectStore so transient network errors self-heal server-side instead of propagating to the SDK.