Heartbeat session keys stabilized
Isolated heartbeat sessions no longer accumulate unbounded :heartbeat suffixes on each tick—session keys stay stable and stale orphans are cleaned up automatically.
When heartbeat sessions ran with isolated mode enabled, each wake-triggered tick appended another :heartbeat suffix to the session key instead of reusing a stable key. This produced keys like agent:main:main:heartbeat:heartbeat:heartbeat that grew without bound, left orphaned entries in the session store, and made debugging difficult.
A feedback loop between heartbeat ticks and wake-triggered re-entry caused the accumulation. When an exec event fires during a heartbeat tick, it calls requestHeartbeatNow() with the current session key—already suffixed with :heartbeat. The next tick would append another :heartbeat, and the cycle repeated.
The fix introduces a new resolveIsolatedHeartbeatSessionKey() function that strips trailing :heartbeat suffixes before re-applying exactly one. The function uses a three-branch approach: a stored heartbeatIsolatedBaseSessionKey marker takes priority, a suffix-strip applies when the configured base does not itself end with :heartbeat, and a fallback passes through without modification.
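The three branches can be sketched as follows. This is a minimal illustration, not the code from heartbeat-runner.ts: the input shape, the parameter names (sessionKey, configuredBaseKey, storedBaseKey), and the return value are assumptions made for the example. Only the function name, the marker name, and the branch order come from the description above.

```typescript
const HEARTBEAT_SUFFIX = ":heartbeat";
// Matches one or more trailing ":heartbeat" segments.
const TRAILING_HEARTBEAT_SUFFIXES = /(:heartbeat)+$/;

// Hypothetical input shape; the real function's signature may differ.
interface IsolatedKeyInput {
  sessionKey: string; // key handed to the runner (may already be suffixed)
  configuredBaseKey: string; // base session key from config
  storedBaseKey?: string; // heartbeatIsolatedBaseSessionKey marker, if present
}

function resolveIsolatedHeartbeatSessionKey(input: IsolatedKeyInput): {
  baseKey: string;
  isolatedKey: string;
} {
  let baseKey: string;
  if (input.storedBaseKey !== undefined) {
    // Branch 1: a stored marker records the true base and takes priority.
    baseKey = input.storedBaseKey;
  } else if (!input.configuredBaseKey.endsWith(HEARTBEAT_SUFFIX)) {
    // Branch 2: the configured base has no ":heartbeat" of its own, so any
    // trailing suffixes on the incoming key are accumulation and are stripped.
    baseKey = input.sessionKey.replace(TRAILING_HEARTBEAT_SUFFIXES, "");
  } else {
    // Branch 3: the configured base legitimately ends with ":heartbeat";
    // pass the key through unmodified.
    baseKey = input.sessionKey;
  }
  // Re-apply exactly one suffix.
  return { baseKey, isolatedKey: baseKey + HEARTBEAT_SUFFIX };
}
```

Under these assumptions, an accumulated key such as base:heartbeat:heartbeat resolves to base:heartbeat through branch 2, while a configured base of alerts:heartbeat resolves to alerts:heartbeat:heartbeat through branch 3 and stays there on re-entry once the marker is stored (branch 1).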
Stale orphaned sessions are now cleaned up automatically when keys converge. When an accumulated key like base:heartbeat:heartbeat is repaired to base:heartbeat, the old entry is removed from the session store and its transcript is archived rather than left dangling.
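The cleanup step can be pictured with a small in-memory model. Everything here is hypothetical: the SessionRecord shape, the Map-based store, the archive array, and the function name retireStaleHeartbeatSession are illustration only and do not reflect the project's actual session store API.

```typescript
// Hypothetical session record: just a transcript of messages.
interface SessionRecord {
  transcript: string[];
}

// Hypothetical archive entry that remembers which key the transcript came from.
interface ArchivedTranscript {
  fromKey: string;
  transcript: string[];
}

// Removes the stale (accumulated) entry once its key has been repaired to the
// canonical one, archiving its transcript instead of leaving it dangling.
// Returns true when an entry was actually retired.
function retireStaleHeartbeatSession(
  store: Map<string, SessionRecord>,
  archive: ArchivedTranscript[],
  staleKey: string,
  canonicalKey: string,
): boolean {
  if (staleKey === canonicalKey) {
    return false; // key was already stable, nothing to clean up
  }
  const stale = store.get(staleKey);
  if (stale === undefined) {
    return false; // no orphan stored under the old key
  }
  archive.push({ fromKey: staleKey, transcript: stale.transcript });
  store.delete(staleKey);
  return true;
}
```

The early return when the two keys are equal keeps the sketch safe to call on every tick, which matches the idea that cleanup only happens when a key actually converges.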
In the heartbeat runner, existing sessions with accumulated suffixes will converge to the stable key on their next tick. Legacy sessions created before the marker was introduced also converge correctly.
Summary
- Problem: When isolatedSession: true is set in heartbeat config, each wake-triggered heartbeat tick appends another :heartbeat suffix to the session key, producing keys like agent:main:main:heartbeat:heartbeat:heartbeat:... that grow without bound.
- Why it matters: Session keys grow indefinitely, the session store accumulates orphaned entries, and debugging/session management becomes difficult. Eventually hits string length limits.
- What changed: Strip all trailing :heartbeat suffixes before re-appending in resolveIsolatedHeartbeatSessionKey() (regex fix + three-branch resolution logic). Added 9 regression tests covering all key-stability scenarios.
- What did NOT change (scope boundary): No changes to resolveHeartbeatSession(), wake dispatch, or session key parsing. The fix is scoped to the single append site.
- Note: 535 additions, of which 383 are regression tests. Production change is ~56 lines in heartbeat-runner.ts.
Change Type (select all)
- Bug fix
- Feature
- Refactor required for the fix
- Docs
- Security hardening
- Chore/infra
Scope (select all touched areas)
- Gateway / orchestration
- Skills / tool execution
- Auth / tokens
- Memory / storage
- Integrations
- API / contracts
- UI / DX
- CI/CD / infra
Linked Issue/PR
- Closes #59493
- This PR fixes a bug or regression
Root Cause / Regression History (if applicable)
- Root cause: Feedback loop between heartbeat runner and wake requests. During an isolated heartbeat run, exec/subagent completions call requestHeartbeatNow() with the current session key (already suffixed with :heartbeat). The wake handler passes this key back into runHeartbeatOnce(), which appends :heartbeat again at heartbeat-runner.ts:590 without stripping the existing suffix.
- Missing detection / guardrail: No idempotency guard at the :heartbeat append site. resolveHeartbeatSession() accepts forcedSessionKey without normalizing away existing heartbeat suffixes.
- Prior context: The isolated session feature was added to reduce token cost by giving each heartbeat tick a fresh transcript. The wake-request path (requestHeartbeatNow → wakeHandler → run() → runOnce()) passes sessionKey through without sanitization.
- Why this regressed now: The interaction between isolatedSession: true and wake-triggered re-entry (exec events during heartbeat) was not covered by the original isolated session tests.
Regression Test Plan (if applicable)
- Coverage level that should have caught this:
  - Unit test
  - Seam / integration test
  - End-to-end test
  - Existing coverage already sufficient
- Target test or file: src/infra/heartbeat-runner.isolated-key-stability.test.ts
- Scenarios the tests lock in (9 total):
  1. Wake path with already-suffixed key → key stays <base>:heartbeat (no double suffix)
  2. Clean base key → appends :heartbeat exactly once
  3. Deeply accumulated key (e.g. base:heartbeat:heartbeat:heartbeat) → converges to <base>:heartbeat in one call
  4. Configured base key that legitimately ends with :heartbeat (e.g. alerts:heartbeat) → gets alerts:heartbeat:heartbeat, not stripped
  5. Wake re-entry for case 4 → key stays stable at alerts:heartbeat:heartbeat
  6. Forced real :heartbeat session (not from isolatedSession config) → gets :heartbeat appended
  7. Wake re-entry for case 6 → stable
  8. Task-based heartbeat skip (no tasks due) → no isolated session created
  9. Legacy isolated key without stored marker (created before heartbeatIsolatedBaseSessionKey was introduced) → converges to canonical <base>:heartbeat without producing <base>:heartbeat:heartbeat
- Why this is the smallest reliable guardrail: Tests exercise runHeartbeatOnce() directly with pre-suffixed session keys, simulating the exact wake-request feedback loop without needing a full gateway or real exec events.
User-visible / Behavior Changes
- Isolated heartbeat sessions now use a stable key <base>:heartbeat across all ticks instead of accumulating suffixes.
- Existing sessions with accumulated suffixes will converge to the stable key on next heartbeat tick.
- Legacy sessions created before the heartbeatIsolatedBaseSessionKey marker was introduced also converge correctly.
Diagram (if applicable)
Before (feedback loop):
tick 1: base → base:heartbeat (agent runs, exec event fires)
wake: requestHeartbeatNow(key="base:heartbeat")
tick 2: base:heartbeat → base:heartbeat:heartbeat
wake: requestHeartbeatNow(key="base:heartbeat:heartbeat")
tick 3: base:heartbeat:heartbeat → base:heartbeat:heartbeat:heartbeat
... (unbounded growth)
After (stable):
tick 1: base → base:heartbeat (agent runs, exec event fires)
wake: requestHeartbeatNow(key="base:heartbeat")
tick 2: strip trailing :heartbeat(s) → "base", append → base:heartbeat
wake: requestHeartbeatNow(key="base:heartbeat")
tick 3: strip trailing :heartbeat(s) → "base", append → base:heartbeat
... (stable)
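
The two traces above can be reproduced with a toy simulation. This is a sketch under simplifying assumptions: the helper names (appendBlindly, stripThenAppend, simulateWakeLoop) are invented for the example, every tick is assumed to trigger a wake that passes the resolved key straight back, and the stored-marker branch is left out.

```typescript
const SUFFIX = ":heartbeat";

// Pre-fix behavior: append without checking for an existing suffix.
function appendBlindly(key: string): string {
  return key + SUFFIX;
}

// Post-fix behavior (simplified): strip all trailing suffixes, then append one.
function stripThenAppend(key: string): string {
  return key.replace(/(:heartbeat)+$/, "") + SUFFIX;
}

// Each tick resolves a key; the wake request then feeds that key back in
// as the input to the next tick.
function simulateWakeLoop(
  startKey: string,
  ticks: number,
  resolve: (key: string) => string,
): string[] {
  const seen: string[] = [];
  let key = startKey;
  for (let i = 0; i < ticks; i++) {
    key = resolve(key);
    seen.push(key);
  }
  return seen;
}

console.log(simulateWakeLoop("base", 3, appendBlindly));
// [ 'base:heartbeat', 'base:heartbeat:heartbeat', 'base:heartbeat:heartbeat:heartbeat' ]
console.log(simulateWakeLoop("base", 3, stripThenAppend));
// [ 'base:heartbeat', 'base:heartbeat', 'base:heartbeat' ]
```

Because stripThenAppend is idempotent, feeding its output back in any number of times yields the same key, which is the property the regression tests lock in.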
Security Impact (required)
- New permissions/capabilities? No
- Secrets/tokens handling changed? No
- New/changed network calls? No
- Command/tool execution surface changed? No
- Data access scope changed? No
Repro + Verification
Environment
- OS: Linux 6.8.0-106-generic (x64), Ubuntu (as reported in issue)
- Runtime/container: Node 22+
- Model/provider: Any (issue is model-agnostic)
- Integration/channel: Any channel with heartbeat
- Relevant config: agents.defaults.heartbeat.isolatedSession: true
Steps
- Configure agent heartbeat with isolatedSession: true and every: "5m"
- Restart gateway
- Wait for 2+ heartbeat ticks (especially with exec tool activity)
- Run sessions_list and observe session keys
Expected
- All heartbeat sessions use stable key: agent:<agentId>:main:heartbeat
Actual
- Session keys accumulate :heartbeat suffixes on each wake-triggered tick
Evidence
- Failing test/log before + passing after
Test output (9 tests, all passing):
✓ does not accumulate :heartbeat suffix when wake passes an already-suffixed key
✓ appends :heartbeat exactly once from a clean base key
✓ stays stable even with multiply-accumulated suffixes
✓ keeps isolated keys distinct when the configured base key already ends with :heartbeat
✓ stays stable for wake re-entry when the configured base key already ends with :heartbeat
✓ keeps a forced real :heartbeat session distinct from the heartbeat-isolated sibling
✓ stays stable when a forced real :heartbeat session re-enters through its isolated sibling
✓ does not create an isolated session when task-based heartbeat skips for no-tasks-due
✓ converges a legacy isolated key that lacks the stored marker (single :heartbeat suffix)
Full heartbeat test suite: all passing.
Human Verification (required)
- Verified scenarios: Ran all 9 new regression tests + full heartbeat suite, pnpm check (lint/typecheck/format all green)
- Edge cases checked:
  - Deeply accumulated keys (base:heartbeat:heartbeat:heartbeat) correctly converge in one call
  - Clean keys still get :heartbeat appended exactly once
  - Configured base keys legitimately ending with :heartbeat (e.g. alerts:heartbeat) are not incorrectly stripped; they get alerts:heartbeat:heartbeat as expected and stay stable
  - Legacy isolated sessions lacking the stored heartbeatIsolatedBaseSessionKey marker converge correctly without double-suffix regression
  - Task-based skip (no tasks due) leaves no isolated session in the store
  - Non-isolated heartbeats are unaffected
- What I did not verify: Live gateway with real exec events triggering wake requests (no access to a running gateway with isolatedSession: true)
Review Conversations
- I replied to or resolved every bot review conversation I addressed in this PR.
- I left unresolved only the conversations that still need reviewer or maintainer judgment.
Compatibility / Migration
- Backward compatible? Yes
- Config/env changes? No
- Migration needed? No; existing accumulated keys auto-converge on next tick, and legacy sessions without stored marker also converge
Risks and Mitigations
- Risk: Regex (:heartbeat)+$ could theoretically match a user-chosen session key that legitimately ends with :heartbeat.
  - Mitigation: The strip only runs inside the useIsolatedSession branch, which already knows the key will get :heartbeat appended. The resolution logic has three branches: (1) stored heartbeatIsolatedBaseSessionKey marker takes priority; (2) suffix-strip applies only when the configured base does NOT itself end with :heartbeat; (3) fallback passes through. This means alerts:heartbeat as a configured session name correctly produces alerts:heartbeat:heartbeat (branch 3 fallback) rather than being erroneously stripped, as confirmed by tests 4 and 5.