Server-side memory leak eliminated in SSE helper
Long-running webapp processes are protected from memory bloat. A persistent server-side leak tied to aborted server-sent event connections was eliminated by removing composite abort signals.
Under sustained traffic, aborted Server-Sent Event (SSE) connections were quietly piling up in memory. A bug in Node 20 caused composite abort signals to pin their entire request and response graphs indefinitely, swelling the heap every time a client closed a tab or a connection timed out.
To fix this, the composite signals are replaced with manual timeout management and a single-signal abort chain in the webapp's SSE utilities. Additional cleanup in the server entry point ensures that successful HTML renders no longer pin the React tree in memory for 30 seconds per request. Long-running webapp processes now remain stable under heavy streaming churn: retained memory no longer climbs linearly with request count, and heap diffs show zero application-code leaks.
Summary
Fixes a server-side memory leak in the webapp's SSE helper. Every aborted SSE connection (client tab close, navigation, timeout) was pinning its full request/response graph indefinitely on Node 20, so any long-running webapp process accumulated retained memory proportional to streaming-request churn.
Root cause
`apps/webapp/app/utils/sse.ts` combined three abort signals via `AbortSignal.any([requestAbortSignal, timeoutSignal, internalController.signal])`. The composite signal tracks its source signals in an internal `Set<WeakRef>` registered against a `FinalizationRegistry`; under sustained traffic those entries accumulate faster than they're cleaned up, pinning every source signal (and its listeners, and anything those listeners close over) until the parent signal itself is GC'd or aborts.
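Schematically, the leaky composition looked like this (a simplified sketch with illustrative names, not the exact `sse.ts` source):

```typescript
// Simplified sketch of the leaky pattern: every request composes a fresh
// composite signal. On Node 20, the composite tracks each source signal in an
// internal WeakRef set that is pruned too slowly under sustained traffic,
// pinning the sources and everything their abort listeners close over.
function buildCompositeSignal(
  requestAbortSignal: AbortSignal,
  timeoutMs: number
): { composite: AbortSignal; internalController: AbortController } {
  const internalController = new AbortController();
  const composite = AbortSignal.any([
    requestAbortSignal,
    AbortSignal.timeout(timeoutMs),
    internalController.signal,
  ]);
  return { composite, internalController };
}
```

Each composition is individually correct — the composite aborts when any source does — which is why the retention only shows up as heap growth under churn, not as a functional bug.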
This is a long-standing Node issue with multiple open reports:
- nodejs/node#54614 — original report, still open. A follow-up from ChainSafe describes the exact same shape in a Lodestar production workload (req + timeout signals composed per request, accumulating in a long-running worker) and the same mitigation: drop `AbortSignal.any`, compose manually.
- nodejs/node#55351 — mechanism confirmed by Node member @jasnell: "the set of dependent signals known to the AbortSignal are kept in an internal Set using WeakRefs. The AbortSignals are being properly gc'd but the Set is never cleaned out of the WeakRefs making those leak." Partially fixed by PR #55354, shipped in Node 22.12.0 — but that only covers the tight-loop case, not long-lived parent signals.
- nodejs/node#57584 — circular-dependency variant, still open.
- nodejs/node#62363 — regression in Node 24/25 from an unrelated V8 change ("Don't pretenure WeakCells"). Different root cause, same symptom.
A separate issue in apps/webapp/app/entry.server.tsx — setTimeout(abort, ABORT_DELAY) with no clearTimeout on success paths — kept the React render tree + remixContext alive for 30s per successful HTML request. Same pattern fixed upstream in React Router templates (react-router#14200), never backported to Remix v2.
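The timer fix follows the shape below (a generic sketch with hypothetical names; the real code clears the timer inside React's `onShellReady`/`onAllReady`/`onShellError` callbacks, which `renderToPipeableStream` invokes asynchronously):

```typescript
// Sketch: the abort timer must be cleared on success paths too. Before the
// fix, nothing cleared it on success, so the abort closure (and the render
// tree + remixContext it captures) stayed reachable for ABORT_DELAY ms after
// every successful render.
const ABORT_DELAY = 30_000;

function renderWithAbortTimer(
  startRender: (callbacks: { onAllReady: () => void }) => { abort: () => void }
): void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const { abort } = startRender({
    onAllReady() {
      clearTimeout(timer); // previously missing on the success path
    },
  });
  timer = setTimeout(abort, ABORT_DELAY); // safety net for hung renders only
}
```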
What changed
- `apps/webapp/app/utils/sse.ts` — single-signal abort chain. `AbortSignal.any` removed; `AbortSignal.timeout` replaced by a plain `setTimeout` cleared when the controller aborts; named sentinel constants used as stackless abort reasons; request-abort handler explicitly removed on cleanup.
- `apps/webapp/app/entry.server.tsx` — clears the `setTimeout(abort, ABORT_DELAY)` timer in `onShellReady`/`onAllReady`/`onShellError`.
- `apps/webapp/app/v3/tracer.server.ts` + `env.server.ts` — gates OpenTelemetry `HttpInstrumentation` and `ExpressInstrumentation` behind `DISABLE_HTTP_INSTRUMENTATION=true` as an escape hatch for future OTel-listener retention patterns. Defaults to enabled.
- `apps/webapp/app/presenters/v3/RunStreamPresenter.server.ts` — uses the shared `ABORT_REASON_SEND_ERROR` sentinel.
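The replacement pattern looks roughly like this (illustrative names; a sketch of the single-signal chain, not the exact `sse.ts` code):

```typescript
// Sketch of the single-signal abort chain: one AbortController drives the
// stream; the timeout is a plain setTimeout, and the request-abort listener
// is detached explicitly so no reference chain outlives the request.
const ABORT_REASON_TIMEOUT = "timeout"; // string sentinels: no Error, no stack capture
const ABORT_REASON_REQUEST_ABORTED = "request_aborted";

function createStreamAbort(
  requestSignal: AbortSignal,
  timeoutMs: number
): AbortController {
  const controller = new AbortController();

  const timer = setTimeout(
    () => controller.abort(ABORT_REASON_TIMEOUT),
    timeoutMs
  );

  const onRequestAbort = () => controller.abort(ABORT_REASON_REQUEST_ABORTED);
  requestSignal.addEventListener("abort", onRequestAbort, { once: true });

  // Whichever path aborts first, tear everything down: clear the timer and
  // remove the request listener so neither side retains the other.
  controller.signal.addEventListener(
    "abort",
    () => {
      clearTimeout(timer);
      requestSignal.removeEventListener("abort", onRequestAbort);
    },
    { once: true }
  );

  return controller;
}
```

Behaviour is the same as the composite version — timeout, client disconnect, and internal aborts all abort the one signal — but every edge in the object graph is created and removed by hand, so nothing depends on `AbortSignal.any`'s internal bookkeeping.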
Verification
Full-app reproduction (memlab)
Isolated local harness, 500 abrupt SSE disconnects against a dev-presence route, GC between passes, heap snapshot diff with memlab:
| Run | Heap delta after 500 conns + GC | memlab retained leaks |
|---|---|---|
| Before | +16.0 MB (linear with request count) | 158 clusters; 250 ServerResponse, 1000 AbortController, 250 SpanImpl retained |
| After | +3.3 MB (noise) | 0 app-code leaks |
Standalone mechanism isolation
To confirm which axis of the change is load-bearing, a separate standalone Node script (/tmp/abort-leak-test.mjs) ran 2000 requests × 200 KB payload per variant:
| Variant | Heap delta after GC |
|---|---|
| baseline (no signal machinery) | 0 MB |
| V1: `AbortSignal.any` + string abort reason | +9.1 MB |
| V2: `AbortSignal.any` only (no reason) | +10.8 MB |
| V3: string reason only (no `AbortSignal.any`) | 0 MB |
| V4: neither (the fix) | 0 MB |
| V5: `AbortSignal.any` with no listener on the composite | +10.2 MB |
This isolates `AbortSignal.any` as the sole leak mechanism. The abort-reason type (`.abort()` vs `.abort("string")`) is irrelevant for retention: V3 is clean, and V5 leaks even without a listener on the composite.
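The harness shape, condensed (a hypothetical reconstruction of the script's structure; the real `/tmp/abort-leak-test.mjs` reports machine-dependent numbers):

```typescript
// Condensed sketch of one variant pass: per "request", allocate a payload
// standing in for the req/res graph, wire it to the abort machinery under
// test, simulate an abrupt disconnect, then drop all references. Run with
// --expose-gc and diff heapUsed; absolute numbers vary by machine.
function runVariant(useAny: boolean, requests: number, payloadKb: number): number {
  (globalThis as any).gc?.();
  const before = process.memoryUsage().heapUsed;
  for (let i = 0; i < requests; i++) {
    const payload = Buffer.alloc(payloadKb * 1024); // stand-in for req/res graph
    const internal = new AbortController();
    const signal = useAny
      ? AbortSignal.any([internal.signal, AbortSignal.timeout(60_000)])
      : internal.signal; // the fixed, single-signal variant
    signal.addEventListener("abort", () => payload.length, { once: true });
    internal.abort(); // abrupt client disconnect
  }
  (globalThis as any).gc?.();
  return (process.memoryUsage().heapUsed - before) / (1024 * 1024); // MB
}
```

Holding everything else constant and toggling only the composition axis is what lets the table above attribute the retention to `AbortSignal.any` rather than to payload size, listener count, or abort reason.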
Risk
- `sse.ts` is used by the dev-presence routes. Behaviour is equivalent — timeouts and client disconnects still abort the stream.
- `signal.reason` is now a named string sentinel (`"timeout"`, `"request_aborted"`, etc.) instead of the previous string arg or default `AbortError`. No in-tree reader of `signal.reason` exists.
- The `entry.server.tsx` change is a standard cleanup of an abort timer and matches upstream React Router guidance.
- The `tracer.server.ts` change is env-gated and defaults to current behaviour.
- Three other webapp `AbortSignal.timeout()` callsites (alert delivery, remote-build status) are fire-and-forget, passed directly to `fetch` — not composed with anything long-lived, no retention risk, untouched.
Test plan
- Existing SSE integration tests pass
- Dev-presence SSE behaves normally across tab open/close cycles
- No heap growth under sustained aborted-connection traffic (heap snapshot diff)
Follow-up
The same AbortSignal.any([userSignal, internalSignal]) pattern exists in several SDK/core callsites that ship to customers (packages/core/src/v3/realtimeStreams/manager.ts, packages/trigger-sdk/src/v3/{ai,chat,chat-client,sessions}.ts, packages/core/src/v3/workers/warmStartClient.ts). Whether those leak in practice depends on the user passing a long-lived signal. Tracked separately.