Merged
Size: M
Change Breakdown: Bug Fix 85%, Maintenance 15%
#61565 Gateway: bound websocket shutdown close

Websocket close timeout prevents gateway hangs

Gateway restarts should now complete cleanly instead of hanging until the 25-second watchdog timeout. Websocket shutdown now has bounded timeouts, with force-termination of stalled clients after a grace period.

When the gateway server restarted on production systems, shutdown would stall waiting for websocket connections to close cleanly. The shutdown handler waited on wss.close(), which resolved only if the server tracked its clients and those clients cooperated with close callbacks. With no tracked clients, or with uncooperative clients, the gateway stayed stuck until the watchdog killed the process 25 seconds later.
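The failure mode can be illustrated with a minimal sketch. This is not the gateway's code: ClosableServer, stuckServer, closeUnbounded, and isStillPendingAfter are hypothetical names, and the server is a stand-in whose close callback never fires.

```typescript
// Hypothetical stand-in for a websocket server; not the gateway's real type.
interface ClosableServer {
  close(cb?: () => void): void;
}

// A server whose close callback is never invoked, as in the hang described above.
const stuckServer: ClosableServer = {
  close() {
    /* callback intentionally withheld */
  },
};

// Unbounded wait: the promise stays pending for as long as the callback is withheld.
function closeUnbounded(wss: ClosableServer): Promise<void> {
  return new Promise<void>((resolve) => wss.close(() => resolve()));
}

// Reports whether `p` is still unsettled after `ms` milliseconds.
async function isStillPendingAfter(p: Promise<void>, ms: number): Promise<boolean> {
  let settled = false;
  void p.then(() => {
    settled = true;
  });
  await new Promise<void>((resolve) => setTimeout(resolve, ms));
  return !settled;
}
```

A shutdown path that awaits closeUnbounded(stuckServer) has no exit of its own, so only an external timer such as the watchdog can end it.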

Websocket shutdown now has two bounded windows. The close operation gets a one-second grace period to complete normally. If it exceeds that window, all tracked clients are force-terminated. A final 250-millisecond window handles any lingering close callbacks before the shutdown proceeds regardless.
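The two windows could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: ServerLike, ClientLike, closeWithBounds, settlesWithin, the constant names, and the outcome labels are all hypothetical. Only the 1-second and 250-millisecond values come from the description.

```typescript
// Hypothetical minimal shapes; the real gateway uses its own websocket server types.
interface ClientLike {
  terminate(): void;
}

interface ServerLike {
  clients: Set<ClientLike>;
  close(cb?: (err?: Error) => void): void;
}

// Durations from the description above; the constant names are assumptions.
const CLOSE_GRACE_MS = 1_000;
const FORCE_WINDOW_MS = 250;

type CloseOutcome = "closed" | "forced" | "abandoned";

// Resolves to true if `done` settles within `ms`, false otherwise.
function settlesWithin(done: Promise<void>, ms: number): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    const timer = setTimeout(() => resolve(false), ms);
    void done.then(() => {
      clearTimeout(timer);
      resolve(true);
    });
  });
}

async function closeWithBounds(
  wss: ServerLike,
  graceMs: number = CLOSE_GRACE_MS,
  forceMs: number = FORCE_WINDOW_MS,
): Promise<CloseOutcome> {
  // Close errors are ignored in this sketch: shutdown continues either way.
  const closed = new Promise<void>((resolve) => {
    try {
      wss.close(() => resolve());
    } catch {
      resolve(); // a throwing close leaves nothing to wait for
    }
  });

  // Window 1: give the normal close a grace period.
  if (await settlesWithin(closed, graceMs)) return "closed";

  // Grace period exceeded: force-terminate every tracked client.
  for (const client of wss.clients) {
    try {
      client.terminate();
    } catch {
      /* keep terminating the remaining clients */
    }
  }

  // Window 2: wait briefly for lingering close callbacks, then proceed regardless.
  return (await settlesWithin(closed, forceMs)) ? "forced" : "abandoned";
}
```

With the default values, the worst case is about 1.25 seconds before shutdown moves on, even when no clients are tracked and the close callback never fires.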

This fix lives in the gateway server shutdown path, which means every gateway restart and deployment will now complete within a predictable timeframe instead of relying on the watchdog as a safety net.

Original GitHub Description

Summary

  • bound gateway websocket shutdown so restart/stop can continue even if wss.close() never resolves
  • terminate tracked websocket clients after a short grace window and continue after a final force window
  • add regressions for both lingering-client and zero-tracked-client hangs

Why

On jpclawhq, gateway restarts were hanging in shutdown until the 25s watchdog killed the process. Root cause was createGatewayCloseHandler waiting forever on wss.close() unless the server both tracked clients and cooperated with close callbacks.

Testing

  • bunx vitest run src/gateway/server-close.test.ts --pool forks --maxWorkers 1
  • pnpm build in the clean worktree hit an unrelated existing @anthropic-ai/sdk unresolved-import failure from src/agents/anthropic-transport-stream.ts
  • verified the same logic live on jpclawhq: restart now completes cleanly instead of timing out in websocket shutdown

AI

  • AI-assisted
  • fully tested on the touched shutdown path; broader maintainer gates still pending
© 2026 · via Gitpulse