BlueBubbles catchup now skips wedged messages after 10 failed retries
A new retry ceiling stops catchup replay from getting stuck forever on a message that keeps failing, and the same change fixes three silent bugs that caused duplicate replies on gateway restarts.
The catchup replay mechanism for BlueBubbles iMessage handling had a wedge problem: if any message payload failed processing repeatedly, the cursor would stall at that message's timestamp and never advance. Gateway restarts hit the same failure, forever.
The fix adds a configurable retry ceiling. A message whose failure count reaches the ceiling (default 10, clamped to 1-1000) is marked "given up" and skipped on sight. The cursor advances past it, and catchup resumes normal processing. Operators see a WARN log when a message crosses the threshold.
While testing this fix, three additional latent bugs surfaced. A re-entrant file lock in the persistent deduplication layer was letting concurrent callers read stale data and silently overwrite each other's writes—producing duplicate replies after restart. A file naming change between beta versions left the dedupe store empty on upgrade, replaying every recently-handled message. And balloon events (URL previews, stickers) were bypassing the debouncer during catchup replay since they have different GUIDs than their parent text messages, again generating duplicate replies.
All four issues are resolved. Live testing confirmed clean behavior across stop/restart cycles with zero replayed messages.
Summary
What started as a retry cap for #66870 uncovered and fixed three latent bugs in the catchup/dedupe plumbing from #66857 and #66230 that would have caused duplicate replies on every gateway restart for any user with catchup enabled.
1. Per-message retry cap (#66870)
- Adds `catchup.maxFailureRetries` (default 10, clamped `[1, 1000]`) so a persistently-failing message no longer wedges the catchup cursor forever.
- Persists per-GUID failure counts in the cursor file. `count >= max` marks the GUID as "given up": catchup skips it on sight without another `processMessage` attempt, and the cursor advances past it.
- Correctly handles the mixed case (an earlier still-retrying GUID plus a later given-up GUID): the cursor holds below the still-retrying message while the given-up one is skipped.
- Emits a distinct WARN on give-up transitions for operator visibility.
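The retry-cap behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `CatchupCursor` and `CatchupMessage` shapes, the `runCatchup` and `clampMaxFailureRetries` names, and the WARN wording are all assumptions; only the default of 10, the `[1, 1000]` clamp, the `count >= max` rule, and `processMessage` come from the description.

```typescript
// Hypothetical shapes for illustration; the real cursor file format is not shown here.
interface CatchupCursor {
  lastSeenMs: number;
  failureCounts: Record<string, number>; // per-GUID failure counts, persisted with the cursor
}

interface CatchupMessage {
  guid: string;
  timestampMs: number;
}

const DEFAULT_MAX_FAILURE_RETRIES = 10;

// Clamp the configured ceiling to [1, 1000]; fall back to the default for non-numeric input.
function clampMaxFailureRetries(value: unknown): number {
  if (typeof value !== "number" || !Number.isFinite(value)) {
    return DEFAULT_MAX_FAILURE_RETRIES;
  }
  return Math.min(1000, Math.max(1, Math.floor(value)));
}

function runCatchup(
  cursor: CatchupCursor,
  messages: CatchupMessage[],
  processMessage: (msg: CatchupMessage) => boolean, // returns true on success
  maxFailureRetries: number,
  warn: (line: string) => void = console.warn,
): CatchupCursor {
  const max = clampMaxFailureRetries(maxFailureRetries);
  const failureCounts: Record<string, number> = { ...cursor.failureCounts };
  let earliestRetryingMs: number | undefined;
  let latestMs = cursor.lastSeenMs;

  const ordered = [...messages].sort((a, b) => a.timestampMs - b.timestampMs);
  for (const msg of ordered) {
    latestMs = Math.max(latestMs, msg.timestampMs);
    const previous = failureCounts[msg.guid] ?? 0;
    if (previous >= max) {
      continue; // given up: skipped on sight, no processMessage attempt
    }
    if (processMessage(msg)) {
      delete failureCounts[msg.guid]; // success clears the counter
      continue;
    }
    const count = previous + 1;
    failureCounts[msg.guid] = count;
    if (count >= max) {
      warn(`catchup: giving up on ${msg.guid} after ${count} failed attempts`);
    } else if (earliestRetryingMs === undefined) {
      earliestRetryingMs = msg.timestampMs; // this message still needs a retry
    }
  }

  // Hold the cursor just below the earliest still-retrying message; otherwise advance fully.
  const lastSeenMs =
    earliestRetryingMs === undefined
      ? latestMs
      : Math.max(cursor.lastSeenMs, earliestRetryingMs - 1);
  return { lastSeenMs, failureCounts };
}
```

In the mixed case, a later given-up message is fetched again on the next run because the cursor is held back, but its persisted count means it is skipped without another attempt.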
2. Lost-update race in persistent dedupe (found during live testing)
- Root cause: the re-entrant file lock in `file-lock.ts` gave concurrent callers for the same file immediate access instead of serializing them. Two `checkAndRecordInner` calls (inbound user message + outbound agent reply) would both read the same stale file, then the last writer silently overwrote the first writer's additions. The in-memory cache masked this within a process lifetime, but after restart the lost GUID caused catchup to replay already-handled messages, producing duplicate replies.
- Fix: added an in-process write queue per file path in `persistent-dedupe.ts` so read-modify-write cycles targeting the same dedupe file are serialized. The file lock continues to guard cross-process contention.
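One common way to serialize read-modify-write cycles per file path inside a single process is a promise chain keyed by path. The sketch below illustrates that general technique and the lost-update race it prevents; it is not the code in `persistent-dedupe.ts`. The names `enqueueForPath`, `recordGuid`, and `recordGuidUnsafe` are hypothetical, and an in-memory map stands in for the dedupe file.

```typescript
// Hypothetical helper: one promise chain per file path serializes tasks within this process.
const writeQueues = new Map<string, Promise<unknown>>();

function enqueueForPath<T>(filePath: string, task: () => Promise<T>): Promise<T> {
  // Stored tails never reject, so a plain then() always runs the next task.
  const previous = writeQueues.get(filePath) ?? Promise.resolve();
  const run = previous.then(() => task());
  // Swallow errors on the stored tail so one failed task does not block later ones.
  const tail = run.catch(() => undefined);
  writeQueues.set(filePath, tail);
  // Drop the map entry once the queue for this path has drained.
  void tail.then(() => {
    if (writeQueues.get(filePath) === tail) {
      writeQueues.delete(filePath);
    }
  });
  return run;
}

// Stand-in for the dedupe file on disk (assumption: contents are a list of GUIDs).
const fakeFiles = new Map<string, string[]>();
const tick = (): Promise<void> => new Promise((resolve) => setTimeout(resolve, 1));

// Unserialized read-modify-write: two concurrent calls read the same stale snapshot.
async function recordGuidUnsafe(filePath: string, guid: string): Promise<void> {
  const snapshot = [...(fakeFiles.get(filePath) ?? [])]; // read
  await tick(); // stand-in for disk latency
  fakeFiles.set(filePath, [...snapshot, guid]); // write back, clobbering concurrent writes
}

// Same cycle, but routed through the per-path queue.
function recordGuid(filePath: string, guid: string): Promise<void> {
  return enqueueForPath(filePath, () => recordGuidUnsafe(filePath, guid));
}
```

Calling `recordGuidUnsafe` twice concurrently for the same path leaves only the last writer's GUID, while two concurrent `recordGuid` calls keep both. A queue like this only covers callers in one process, which is why a file lock is still needed for cross-process contention.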
3. Dedupe file naming migration gap (found during live testing)
- The dedupe file naming changed from `${safe}.json` to `${safe}__${hash}.json` between beta iterations. Upgrading started with an empty dedupe file and replayed the entire catchup window, producing duplicate replies for every recently-handled message.
- Fix: one-time migration in `inbound-dedupe.ts` that renames the legacy file on first access. Also added a `warmupBlueBubblesInboundDedupe` call in catchup before the fetch so the migration and memory warmup run eagerly, not only when `processMessage` happens to be called.
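A rename-on-first-access migration of this kind might look like the sketch below. It is an illustration under assumptions, not the code in `inbound-dedupe.ts`: `migrateLegacyDedupeFile` is a hypothetical name, and `safe` and `hash` are taken as inputs because this description does not say how they are derived.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: move `${safe}.json` to `${safe}__${hash}.json` if only the legacy file exists.
function migrateLegacyDedupeFile(
  dir: string,
  safe: string,
  hash: string,
): "migrated" | "skipped" {
  const legacyPath = path.join(dir, `${safe}.json`);
  const currentPath = path.join(dir, `${safe}__${hash}.json`);

  // Never clobber a file already written under the new name.
  if (fs.existsSync(currentPath) || !fs.existsSync(legacyPath)) {
    return "skipped";
  }
  try {
    fs.renameSync(legacyPath, currentPath);
    return "migrated";
  } catch (err) {
    // The legacy file may have vanished between the check and the rename.
    if ((err as { code?: string }).code === "ENOENT") {
      return "skipped";
    }
    throw err;
  }
}
```

Running this before the first read means the existing GUIDs are available when catchup consults the dedupe store, rather than starting from an empty file.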
4. Balloon events bypassing debouncer (found during live testing)
- The live webhook path coalesces text + URL-preview balloon events via the debouncer. Catchup processes each query result individually. A URL balloon has a different GUID from its parent text message and no `balloonBundleId` in the query API response, so catchup replayed it as a standalone message, producing a duplicate reply.
- Fix: catchup now skips messages with `associatedMessageGuid` set (tapbacks, reactions, balloons). Threaded replies use `threadOriginatorGuid` instead and are unaffected.
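The filter described above amounts to a predicate on each query result. A minimal sketch, assuming a simplified `QueriedMessage` shape: only `associatedMessageGuid` and `threadOriginatorGuid` are field names from this description, while the interface and the function names `isAssociatedEvent` and `selectReplayable` are hypothetical.

```typescript
// Hypothetical subset of the fields on a message returned by the query API.
interface QueriedMessage {
  guid: string;
  text?: string | null;
  associatedMessageGuid?: string | null; // set on tapbacks, reactions, balloons
  threadOriginatorGuid?: string | null; // set on threaded replies
}

// True when the message is attached to another message rather than standing alone.
function isAssociatedEvent(msg: QueriedMessage): boolean {
  return (
    typeof msg.associatedMessageGuid === "string" &&
    msg.associatedMessageGuid.trim() !== ""
  );
}

// Keep only messages that catchup should replay; threaded replies pass through untouched.
function selectReplayable(messages: QueriedMessage[]): QueriedMessage[] {
  return messages.filter((msg) => !isAssociatedEvent(msg));
}
```

Filtering on `associatedMessageGuid` alone keeps the check independent of `balloonBundleId`, which the description notes is absent from the query API response.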
Fixes #66870.
Live testing
Dogfooded on a live BlueBubbles install with real iMessage traffic across multiple stop/restart cycles:
- `openclaw doctor`: clean after upgrade from 2026.4.14 to beta+retry-cap
- Live iMessage send → single reply, no duplicate
- Stop gateway → restart → catchup runs → `replayed=0` (dedupe correctly recognizes the live-handled message)
- Verified dedupe file contains both inbound and outbound GUIDs after the write queue fix (previously only the outbound survived the race)
- Verified legacy `default.json` renamed to `default__37a8eec1ce19.json` on first startup after migration fix
- Verified `replayed=0 fetched=0` on a clean bounce with no intervening messages (cursor fully caught up, no stale leftovers)
- Verified balloon filter: `associatedMessageGuid` messages are tapbacks/reactions only (checked 200 messages), threaded replies use `threadOriginatorGuid` and are not filtered
Automated tests
- `pnpm test extensions/bluebubbles/`: 425 passed
- `pnpm tsgo`: green
- `pnpm check`: 0 warnings, 0 errors
- `pnpm config:docs:check` / `pnpm plugin-sdk:api:check`: baselines match
- 14 new tests for retry cap (counter increment, give-up transition, skip-on-sight, stickiness, mixed earlier/later failures, counter clear on success, legacy cursor compat, stale entry pruning, config clamping, sanitization)
- Existing 22 catchup tests + 5 dedupe persistence tests pass unchanged
🤖 Generated with Claude Code