Agent runtime isolate pool eliminates OOM cascades

The agent runtime now manages a pool of V8 isolates instead of reusing a single one — preventing memory failures from cascading across all concurrent requests, while reducing package size by two-thirds.
The agent runtime previously ran on a single V8 isolate shared across all requests. When memory pressure hit that isolate, the entire runtime would fail, and the recovery logic compounded the problem by re-bundling the library from scratch on each restart.
A new isolate pool replaces the single shared isolate. The pool maintains multiple warm slots (default: two), each with its own compiled library bundle. When a slot's heap exceeds 80% of its memory limit, it's automatically retired and replaced in the background. If an isolate does crash, the pool recovers without touching the library bundle — which is now kept separate from isolate-bound state.
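The 80% recycling rule can be sketched as a small pure function. This is an illustrative sketch, not the code from the PR: the `HeapStats` and `RecycleOptions` shapes and the `shouldRecycle` name are hypothetical, and it assumes the memory limit is expressed in megabytes while `used_heap_size` is reported in bytes.

```typescript
// Hypothetical shape: only the V8 heap statistic named in the PR is modelled (bytes).
interface HeapStats {
  used_heap_size: number;
}

// Hypothetical options object; field names follow the PR description.
interface RecycleOptions {
  memoryLimitMb: number; // assumption: the limit is given in megabytes
  highWaterMarkRatio?: number; // defaults to 0.8, matching the 80% figure
}

// Returns true when the used heap is above the high-water mark.
function shouldRecycle(stats: HeapStats, opts: RecycleOptions): boolean {
  const ratio = opts.highWaterMarkRatio ?? 0.8;
  const limitBytes = opts.memoryLimitMb * 1024 * 1024;
  return stats.used_heap_size > limitBytes * ratio;
}

// Usage: with a 128 MB limit the threshold sits at 102.4 MB.
const MB = 1024 * 1024;
const healthy = shouldRecycle({ used_heap_size: 64 * MB }, { memoryLimitMb: 128 });
const tooFull = shouldRecycle({ used_heap_size: 110 * MB }, { memoryLimitMb: 128 });
```

Here `healthy` is `false` and `tooFull` is `true`. Keeping the check pure makes the threshold easy to test without creating a real isolate.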
The pool also implements back-pressure. Rather than spawning unlimited isolates when request volume spikes, callers queue up (up to ten deep) waiting for a free slot. Beyond that threshold, new requests fail fast with a PoolExhaustedError rather than exhausting system resources.
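The queueing behaviour described above can be illustrated with a generic bounded pool. This is a minimal sketch under stated assumptions: `BoundedPool` is a hypothetical stand-in for the real pool, the local `PoolExhaustedError` class only mimics the error named in the PR, and returning `undefined` from `tryAcquireSync()` is a guess at the non-blocking contract.

```typescript
// Local stand-in for the error named in the PR; the real class may carry more detail.
class PoolExhaustedError extends Error {
  constructor(depth: number) {
    super(`Pool exhausted: ${depth} callers already waiting`);
    this.name = "PoolExhaustedError";
  }
}

// Hypothetical generic pool: fixed set of slots plus a bounded FIFO wait queue.
class BoundedPool<T> {
  private readonly idle: T[];
  private readonly waiters: Array<(slot: T) => void> = [];
  private readonly maxQueueDepth: number;

  constructor(slots: T[], maxQueueDepth = 10) {
    this.idle = [...slots];
    this.maxQueueDepth = maxQueueDepth;
  }

  // Resolves at once if a slot is idle; otherwise queues, or rejects when the queue is full.
  acquire(): Promise<T> {
    const slot = this.idle.pop();
    if (slot !== undefined) return Promise.resolve(slot);
    if (this.waiters.length >= this.maxQueueDepth) {
      return Promise.reject(new PoolExhaustedError(this.waiters.length));
    }
    return new Promise<T>((resolve) => this.waiters.push(resolve));
  }

  // Non-blocking variant: never queues (assumed to return undefined when empty).
  tryAcquireSync(): T | undefined {
    return this.idle.pop();
  }

  // Hands the slot to the oldest waiter if there is one, else marks it idle again.
  release(slot: T): void {
    const next = this.waiters.shift();
    if (next) next(slot);
    else this.idle.push(slot);
  }
}
```

Handing a released slot directly to the oldest waiter keeps the queue fair and avoids a window in which a newly arriving caller could jump ahead of queued ones. The sketch leaves out health checks and slot replacement, which the real pool performs on release.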
The changes live in the @n8n/agents runtime package, part of a broader initiative to stabilize the agent execution environment.
View Original GitHub Description
Summary
Replaces the single shared V8 isolate in AgentSecureRuntime with a
managed pool (AgentIsolatePool) that eliminates OOM cascades, adds
concurrency back-pressure, and fixes the library bundle clearing bug.
What changed
agent-isolate-pool.ts (new)
- `AgentIsolateSlot`: owns one `ivm.Isolate` + pre-compiled `bundleScript` (V8 bytecode). Exposes `createContext()` for per-request context creation and `dispose()`, which correctly calls `bundleScript.release()` before disposing the isolate (fixing the missed cleanup from the old code).
- `AgentIsolatePool`: manages N slots (default: 2) with:
  - `acquire()`: async, queues callers when the pool is empty (up to `maxQueueDepth = 10`; rejects with `PoolExhaustedError` beyond that)
  - `release(slot)`: gives healthy slots directly to the next waiter or returns them to the pool; discards and replenishes unhealthy / high-heap slots
  - `tryAcquireSync()`: non-blocking path for `executeToMessageSync`
  - Proactive recycling when `used_heap_size > 80%` of `memoryLimit` (configurable via `highWaterMarkRatio`)
  - Background replenishment with exponential-backoff retry (up to 3 attempts, 500 ms base delay)
  - Optional `Logger` injection for OOM, exhaustion, and replenishment events
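The replenishment retry (up to 3 attempts, 500 ms base delay) might look roughly like the sketch below. The function name `replenishWithBackoff`, the injectable `sleep` parameter, and the doubling factor are assumptions for illustration; the PR only states that the backoff is exponential.

```typescript
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls `create` up to `maxAttempts` times, doubling the wait after each failure.
// Hypothetical helper: the real pool's retry loop may be structured differently.
async function replenishWithBackoff<T>(
  create: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
  sleep: Sleep = defaultSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await create();
    } catch (error) {
      lastError = error;
      // Skip the wait after the final attempt; there is nothing left to retry.
      if (attempt < maxAttempts - 1) {
        await sleep(baseDelayMs * 2 ** attempt); // 500 ms, then 1000 ms (assumed factor of 2)
      }
    }
  }
  throw lastError;
}
```

Passing `sleep` as a parameter lets tests record the delays instead of actually waiting.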
agent-secure-runtime.ts (refactored)
- Replaced `this.isolate` / `this.bundleScript` / `this.isolateInitPromise` with `AgentIsolatePool`; pool is lazy-initialized on first call via `poolInitPromise ??=`
- `withIsolate(fn)`: acquire → run → release wrapper shared by all five async public methods; retries once with a fresh slot on OOM
- `libraryBundle` is never cleared on OOM: it is a plain esbuild string independent of isolate state; previously it was incorrectly grouped with isolate-bound state and forced a full re-bundle on recovery
- `executeToMessageSync` uses `tryAcquireSync()` and throws `PoolExhaustedError` when no slot is available synchronously
- Added `McpClient` support
- Marked more dependencies as external and reduced package size from ~4 MB to 1.34 MB
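The lazy initialization and the acquire → run → release wrapper can be sketched together. Everything here is illustrative: `SlotPool`, `LazyRuntime`, and the `isOutOfMemory` predicate are hypothetical names, and how the real runtime recognises an OOM error is not stated in the PR.

```typescript
// Hypothetical minimal pool interface; the real AgentIsolatePool API may differ.
interface SlotPool<S> {
  acquire(): Promise<S>;
  release(slot: S): void;
}

class LazyRuntime<S> {
  private poolInitPromise?: Promise<SlotPool<S>>;
  private readonly createPool: () => Promise<SlotPool<S>>;
  // Assumption: OOM detection is supplied by the caller as a predicate.
  private readonly isOutOfMemory: (error: unknown) => boolean;

  constructor(
    createPool: () => Promise<SlotPool<S>>,
    isOutOfMemory: (error: unknown) => boolean,
  ) {
    this.createPool = createPool;
    this.isOutOfMemory = isOutOfMemory;
  }

  // `??=` assigns only on the first call, so concurrent callers share one initialization.
  private getPool(): Promise<SlotPool<S>> {
    return (this.poolInitPromise ??= this.createPool());
  }

  // acquire → run → release; on an OOM error, retry once with a newly acquired slot.
  async withIsolate<R>(fn: (slot: S) => Promise<R>): Promise<R> {
    const pool = await this.getPool();
    for (let attempt = 0; ; attempt++) {
      const slot = await pool.acquire();
      try {
        return await fn(slot);
      } catch (error) {
        // Rethrow non-OOM errors, and OOM errors on the second attempt.
        if (attempt >= 1 || !this.isOutOfMemory(error)) throw error;
      } finally {
        // Always hand the slot back; in this sketch the pool decides whether to discard it.
        pool.release(slot);
      }
    }
  }
}
```

Releasing in `finally` means a slot is returned on success, on failure, and before the retry, so a failing request cannot leak pool capacity.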
Related Linear tickets, Github issues, and Community forum posts
Review / Merge checklist
- PR title and summary are descriptive. (conventions)
- Docs updated or follow-up ticket created.
- Tests included.
- PR labeled with `Backport to Beta`, `Backport to Stable`, or `Backport to v1` (if the PR is an urgent fix that needs to be backported)