Wait node made fully crash-safe with database persistence

Short waits in n8n workflows can now survive crashes and restarts — previously, waits under 65 seconds ran entirely in memory and were lost on failure. The tradeoff is up to 5 seconds of jitter on resume timing.
The Wait node in n8n workflows now persists all time-based pauses to the database, making them recoverable after crashes or restarts. Previously, waits shorter than 65 seconds ran entirely in memory via setTimeout and were invisible to crash recovery, multi-instance failover, and the internal WaitTracker. If the server went down mid-wait, those executions simply vanished.
This change removes the dual-execution-path behavior. All waits — regardless of duration — now immediately call putToWait and are tracked by the WaitTracker polling system. The poll interval was reduced from 60 seconds to 5 seconds, and the lookahead window is now anchored to the database server's clock rather than the local instance's Date.now(). This eliminates clock skew issues between n8n instances in multi-main setups.
The trade-off is precision: in-memory waits could resume within milliseconds of their target time, while DB-persisted waits now resume within ±5 seconds of the target. For most workflow automation scenarios, this jitter is an acceptable exchange for full crash durability. A clock skew warning is logged if the local instance drifts more than 2 seconds from the database server.
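The DB-clock anchoring and skew warning described above can be sketched roughly as follows. This is an illustrative TypeScript sketch, not n8n's actual implementation; the names (`estimateDbNow`, `fetchDbTimeMs`, the cache shape) and the stubbed DB round-trip are assumptions.

```typescript
// Sketch: estimate the DB server's "now" from a cached DB timestamp plus
// elapsed local time, warning when local and DB clocks drift apart.
interface CachedServerTime {
  dbTimeMs: number;    // DB server clock at fetch time
  fetchedAtMs: number; // local clock at fetch time
}

const SERVER_TIME_TTL_MS = 60_000; // re-fetch the DB clock at most once a minute
const SKEW_WARN_MS = 2_000;        // warn beyond 2s of drift

let cache: CachedServerTime | null = null;

// Stand-in for a real DB round-trip (e.g. SELECT CURRENT_TIMESTAMP(3));
// here it assumes zero skew so the sketch is self-contained.
async function fetchDbTimeMs(): Promise<number> {
  return Date.now();
}

// Interpolate elapsed local time onto the cached DB timestamp.
async function estimateDbNow(): Promise<number> {
  const localNow = Date.now();
  if (!cache || localNow - cache.fetchedAtMs > SERVER_TIME_TTL_MS) {
    const dbTimeMs = await fetchDbTimeMs();
    cache = { dbTimeMs, fetchedAtMs: Date.now() };
    const skewMs = Math.abs(cache.dbTimeMs - cache.fetchedAtMs);
    if (skewMs > SKEW_WARN_MS) {
      console.warn(`DB clock skew of ${skewMs}ms exceeds ${SKEW_WARN_MS}ms`);
    }
  }
  return cache.dbTimeMs + (localNow - cache.fetchedAtMs);
}

// Milliseconds until a wait's resume time, measured against the DB clock.
async function msUntil(waitTillDbMs: number): Promise<number> {
  return Math.max(waitTillDbMs - (await estimateDbNow()), 0);
}
```

Measuring resume delays against the estimated DB clock means every instance in a multi-main setup agrees on when a wait is due, even if their local clocks disagree.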
This work sits within a broader initiative to make n8n's execution layer fully durable and cluster-aware.
View Original GitHub Description
Summary
Removes the dual-execution-path behaviour from the Wait node. Previously, waits shorter than 65 seconds ran entirely in-memory via setTimeout and were never persisted to the database. This made them invisible to crash recovery, multi-main failover, and the WaitTracker.
What changed:
- Wait node (`Wait.node.ts`): removed the `< 65s` in-memory branch. All time-based waits now call `putToWait` immediately, regardless of duration.
- ExecutionRepository (`execution.repository.ts`): `getWaitingExecutions()` now uses a DB-server-clock-anchored 15-second lookahead window (`NOW() + INTERVAL '15 seconds'` / `datetime('now', '+15 seconds')`) via `createQueryBuilder`. Added `getServerTime()` to fetch the DB server's current timestamp (PostgreSQL: `CURRENT_TIMESTAMP(3)`, SQLite: `STRFTIME`).
- WaitTracker (`wait-tracker.ts`): poll interval reduced from 60s → 5s. `triggerTime` is now computed relative to the DB server clock (via a 60s-TTL cache with elapsed-time interpolation) rather than `Date.now()`, eliminating inter-instance clock skew from resume-time computation. Logs a warning when skew exceeds 2s.
- PrometheusMetricsService (`prometheus-metrics.service.ts`): added an `n8n_db_clock_skew_ms` gauge, scraped live on each Prometheus pull.
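To make the dialect-specific lookahead window concrete, here is a small sketch of how the two SQL fragments quoted above might be selected per database type. The function name, column name, and dialect keys are illustrative assumptions, not n8n's actual schema or API.

```typescript
// Sketch: pick the DB-server-clock-anchored lookahead predicate per dialect.
type DbType = 'postgresdb' | 'sqlite';

function waitTillLookaheadSql(dbType: DbType): string {
  // Both forms anchor the 15s window to the DB server's clock,
  // not the local instance's Date.now().
  return dbType === 'postgresdb'
    ? `"waitTill" <= NOW() + INTERVAL '15 seconds'`
    : `"waitTill" <= datetime('now', '+15 seconds')`;
}
```

Anchoring the window in SQL keeps the comparison entirely on the DB server, so two n8n instances querying the same database always see the same set of due executions.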
Why: The 65s threshold existed because the old 60s poll interval made DB-persisted short waits resume late. Reducing the poll to 5s and adding a 15s lookahead window eliminates the need for the in-memory path entirely. The trade-off is up to ~5s of jitter on short waits in exchange for full crash/restart durability.
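The poll-plus-lookahead scheme above can be sketched as a single poll step: each poll fetches everything due within the next 15 seconds and arms one local timer per execution. The in-memory store, `pollOnce`, and the id bookkeeping are stand-ins for illustration, not n8n's actual WaitTracker code.

```typescript
// Sketch: 5s poll with a 15s lookahead; one timer per due execution.
const POLL_INTERVAL_MS = 5_000; // poll cadence (was 60s before this change)
const LOOKAHEAD_MS = 15_000;    // window fetched ahead of "now"

interface WaitingExecution { id: string; waitTillMs: number }

const waitingStore: WaitingExecution[] = []; // stand-in for the executions table
const scheduledIds = new Set<string>();      // executions that already have a timer

// Equivalent of: WHERE waitTill <= NOW() + INTERVAL '15 seconds'
function getWaitingExecutions(dbNowMs: number): WaitingExecution[] {
  return waitingStore.filter((e) => e.waitTillMs <= dbNowMs + LOOKAHEAD_MS);
}

function pollOnce(dbNowMs: number, resume: (id: string) => void): void {
  for (const exec of getWaitingExecutions(dbNowMs)) {
    if (scheduledIds.has(exec.id)) continue; // already armed by an earlier poll
    scheduledIds.add(exec.id);
    setTimeout(() => resume(exec.id), Math.max(exec.waitTillMs - dbNowMs, 0));
  }
}
```

Because the 15s lookahead exceeds the 5s poll interval, any wait due at time T is seen by some poll at or before T and gets a timer aimed at T itself; the residual jitter comes from poll granularity and timer scheduling, bounded by roughly the poll interval.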
Blast radius: Narrow — only affects time-based Wait node resume and WaitTracker scheduling. No schema changes, no API changes. Safe to revert with a single commit revert; in-flight waiting executions survive revert (they resume ~60s late via the old poll cycle).
How to test:
- Create a workflow: Manual Trigger → Wait (15s) → Set node
- Execute — verify the execution enters `waiting` status in the DB immediately (not after 15s)
- Verify it resumes at ~15s (±5s acceptable)
- Kill n8n mid-wait, restart — verify it resumes after restart
- Scrape `/metrics` and confirm the `n8n_db_clock_skew_ms` gauge is present
Related Linear tickets, Github issues, and Community forum posts
<!-- Link to Linear ticket: https://linear.app/n8n/issue/[TICKET-ID] -->

Review / Merge checklist
- PR title and summary are descriptive. (conventions)
- Docs updated or follow-up ticket created.
- Tests included.
- PR labeled with `release/backport` (if the PR is an urgent fix that needs to be backported)