Persistent chat sessions are being unlocked for AI agents

Developers should soon be able to build long-running chat agents and approval loops that persist state and stream data indefinitely, long after individual workflow runs complete.
Previously, stream data and workflow execution states were scoped entirely to a single run. This made multi-turn chat agents or human-in-the-loop approvals difficult to stitch together across different workflow executions.
Sessions group multiple workflow runs under a single durable Session with a persistent, bidirectional data stream. A client interface subscribes to a session once and keeps receiving real-time output as new background tasks connect and execute, which decouples an agent's lifetime from the boundaries of any single task execution.
This foundation is being laid in the @trigger.dev/core API and server backend.
{
  "type": "chat.agent",
  "externalId": "user_123"
}
⚠️ Not released yet. This PR is the server-side foundation only. The SDK changes that customers will actually use (`chat.agent` migration, `chat.createStartSessionAction`, `useTriggerChatTransport` updates) live on a separate branch and ship together in an upcoming `@trigger.dev/sdk` prerelease. Until that prerelease is published, this surface is reachable only via direct HTTP.
What this gives Trigger.dev users
A new first-class primitive, Session, for durable, task-bound, bidirectional I/O that outlives any single run. Sessions are the run manager for chat.agent going forward, and they unblock anything else that needs "one identifier, many runs over time" with a stable channel pair the client can write to and subscribe to.
Use cases unblocked
- Chat agents that persist across many runs. One session per chat (keyed on your own `chatId` via `externalId`), turns 1..N attach to the same Session, the UI subscribes once and keeps receiving output as new runs take over.
- Approval loops and long-running tasks with user feedback. The task waits on `.in`, the client writes to `.in`, the server enforces no-writes-after-close.
- Workflow progress streams that live past the run. Subscribe to `.out` after the task finishes to replay history.
- Resume-next-day flows. A session is a durable row, not a transient stream. Send a message a day later and the server triggers a fresh run on the same session.
How it works (Session-as-run-manager)
A Session row is task-bound (taskIdentifier + triggerConfig are required) and owns its current run via currentRunId + currentRunVersion for optimistic claim. Three trigger paths:
- Session create: `POST /api/v1/sessions` creates the row and triggers the first run synchronously.
- Append-time probe: `POST /realtime/v1/sessions/:session/in/append` checks if the current run is alive; if it has terminated (idle exit, crash, etc.), the server triggers a new run before processing the append.
- End-and-continue handoff: `POST /api/v1/sessions/:session/end-and-continue`, called by the running agent, triggers a fresh run and atomically swaps `currentRunId`. Used by `chat.requestUpgrade()` for version handoffs.
Every triggered run is recorded in the SessionRun audit table with a reason (initial, continuation, upgrade, manual).
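The three trigger paths can be sketched with a small in-memory model. This is an illustration only, not the server code: the class and method names are hypothetical, and both the mapping of paths to audit reasons (`continuation` for the append-time probe, `upgrade` for end-and-continue) and the version bump on each swap are assumptions.

```typescript
// Illustrative in-memory model of the three trigger paths; not the server implementation.
type RunReason = "initial" | "continuation" | "upgrade" | "manual";

interface SessionRow {
  id: string;
  currentRunId: string;
  currentRunVersion: number;
}

interface SessionRunAudit {
  sessionId: string;
  runId: string;
  reason: RunReason;
}

class SessionRunManager {
  readonly audit: SessionRunAudit[] = [];
  readonly aliveRuns = new Set<string>();
  private runCounter = 0;

  // Hypothetical helper: starts a run and records why it was triggered.
  private triggerRun(sessionId: string, reason: RunReason): string {
    this.runCounter += 1;
    const runId = `run_${this.runCounter}`;
    this.aliveRuns.add(runId);
    this.audit.push({ sessionId, runId, reason });
    return runId;
  }

  // Path 1: creating the session triggers the first run synchronously.
  create(sessionId: string): SessionRow {
    const runId = this.triggerRun(sessionId, "initial");
    return { id: sessionId, currentRunId: runId, currentRunVersion: 1 };
  }

  // Path 2: before an append is processed, probe the current run and
  // trigger a replacement if it has terminated. Returns the run that
  // will receive the record.
  append(session: SessionRow, _record: string): string {
    if (!this.aliveRuns.has(session.currentRunId)) {
      session.currentRunId = this.triggerRun(session.id, "continuation");
      session.currentRunVersion += 1; // assumed: every swap bumps the version
    }
    return session.currentRunId;
  }

  // Path 3: the running agent hands off to a fresh run.
  endAndContinue(session: SessionRow): string {
    this.aliveRuns.delete(session.currentRunId);
    session.currentRunId = this.triggerRun(session.id, "upgrade");
    session.currentRunVersion += 1;
    return session.currentRunId;
  }
}
```

In this model an append against a live run reuses it, while an append after the run has exited produces a second audit row before the record is handled.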
Public API surface
Control plane
- `POST /api/v1/sessions`: create. Idempotent on `(env, externalId)`. Triggers the first run, returns the session and a session-scoped public access token. Returns 409 if the upserted row is already closed.
- `GET /api/v1/sessions/:session`: retrieve by friendlyId (`session_abc...`) or by your own externalId (server disambiguates by prefix).
- `GET /api/v1/sessions`: list with filters (`type`, `tag`, `taskIdentifier`, `externalId`, derived `status` `ACTIVE`/`CLOSED`/`EXPIRED`, created-at range) and cursor pagination. Backed by ClickHouse.
- `PATCH /api/v1/sessions/:session`: update tags / metadata / externalId.
- `POST /api/v1/sessions/:session/close`: terminate. Idempotent, hard-blocks new server-brokered writes.
- `POST /api/v1/sessions/:session/end-and-continue`: agent-only handoff to a fresh run.
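Until the SDK prerelease ships, these routes are reachable only over direct HTTP, so a caller has to assemble the requests by hand. The sketch below builds request descriptors without performing any network I/O. The helper names are hypothetical and the request body shape is an assumption: `type` and `externalId` come from the sample payload, and `taskIdentifier` and `triggerConfig` are the fields described as required on the Session row.

```typescript
// Request descriptors only; nothing here performs network I/O.
interface HttpRequest {
  method: "GET" | "POST" | "PATCH";
  path: string;
  body?: string;
}

// Assumed body shape; the actual schema is not specified in this description.
interface CreateSessionBody {
  type: string;
  externalId: string;
  taskIdentifier: string;
  triggerConfig: Record<string, unknown>;
}

function createSession(body: CreateSessionBody): HttpRequest {
  return { method: "POST", path: "/api/v1/sessions", body: JSON.stringify(body) };
}

// `idOrExternalId` may be a friendlyId or your own externalId.
function retrieveSession(idOrExternalId: string): HttpRequest {
  return { method: "GET", path: `/api/v1/sessions/${encodeURIComponent(idOrExternalId)}` };
}

function closeSession(idOrExternalId: string): HttpRequest {
  return { method: "POST", path: `/api/v1/sessions/${encodeURIComponent(idOrExternalId)}/close` };
}

// Maps the create status codes to a caller-facing result:
// 409 means the upserted row was already closed.
function interpretCreateStatus(status: number): "ok" | "closed" | "error" {
  if (status === 409) return "closed";
  return status >= 200 && status < 300 ? "ok" : "error";
}
```

Because create is idempotent on `(env, externalId)`, a client can call `createSession` again with the same `externalId` and treat only the 409 case as a signal to start a new chat.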
Realtime
- `PUT /realtime/v1/sessions/:session/:io`: initialize a channel. Returns S2 credentials in headers so high-throughput clients can write direct to S2.
- `GET /realtime/v1/sessions/:session/:io`: SSE subscribe. Supports Last-Event-ID resume and an opt-in `X-Peek-Settled: 1` header that fast-closes the stream when the upstream is already settled (`trigger:turn-complete`), eliminating long-poll wait on reconnect-on-reload paths.
- `POST /realtime/v1/sessions/:session/:io/append`: server-side appends.
- `POST /api/v1/runs/:runFriendlyId/session-streams/wait`: runs wait on a session stream as a waitpoint, with a race-check to avoid suspending if data already landed.
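A reconnecting client combines the two subscribe features: `Last-Event-ID` to resume and `X-Peek-Settled: 1` to avoid waiting on a stream that has already settled. The sketch below builds the subscribe request and checks the `X-Session-Settled: true` response header mentioned in the test plan. The helper names and option fields are hypothetical, and authentication headers are omitted.

```typescript
type SessionIo = "in" | "out";

interface SubscribeOptions {
  lastEventId?: string;  // resume point from a previous connection
  peekSettled?: boolean; // opt in to fast-close when already settled
}

interface SseRequest {
  method: "GET";
  path: string;
  headers: Record<string, string>;
}

function buildSubscribeRequest(
  session: string,
  io: SessionIo,
  options: SubscribeOptions = {}
): SseRequest {
  const headers: Record<string, string> = { Accept: "text/event-stream" };
  if (options.lastEventId !== undefined) {
    headers["Last-Event-ID"] = options.lastEventId;
  }
  if (options.peekSettled) {
    headers["X-Peek-Settled"] = "1";
  }
  return {
    method: "GET",
    path: `/realtime/v1/sessions/${encodeURIComponent(session)}/${io}`,
    headers,
  };
}

// HTTP header names are case-insensitive, so compare in lower case.
function isSettled(responseHeaders: Record<string, string>): boolean {
  for (const name of Object.keys(responseHeaders)) {
    if (name.toLowerCase() === "x-session-settled") {
      return responseHeaders[name] === "true";
    }
  }
  return false;
}
```

On a page reload, a UI could send both headers and skip rendering a loading state when `isSettled` returns true for the response.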
Auth scopes
sessions is a new resource type. read:sessions:{id}, write:sessions:{id}, admin:sessions:{id} flow through the existing JWT validator. Session-scoped public access tokens minted by the server replace browser-held trigger-task tokens for chat-style flows — the browser never sees a run identifier or a run-scoped token in steady state.
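The scope strings follow an `action:sessions:id` pattern, which a small helper can build and check. This is a simplified sketch with hypothetical function names: it uses exact string matching only and does not model whatever hierarchy or wildcard rules the existing JWT validator applies.

```typescript
type SessionAction = "read" | "write" | "admin";

// Builds a scope string such as "read:sessions:session_abc".
function sessionScope(action: SessionAction, sessionId: string): string {
  return `${action}:sessions:${sessionId}`;
}

// Exact-match check only; assumed simplification of the real validator.
function hasSessionScope(
  granted: readonly string[],
  action: SessionAction,
  sessionId: string
): boolean {
  return granted.indexOf(sessionScope(action, sessionId)) !== -1;
}
```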
What's coming after this PR
- SDK + chat.agent migration: separate branch, separate PR, ships in the next `@trigger.dev/sdk` prerelease alongside this server deploy. Customers using the prerelease `chat.agent` will follow the upgrade guide.
- Dashboard surfaces: dedicated agent list, agent playground, agent view on the run dashboard. Tracking separately.
Implementation notes
- Postgres `Session` table: scalar scoping columns (`projectId`, `runtimeEnvironmentId`, `environmentType`, `organizationId`) without FKs, matching the January TaskRun FK-removal decision. Point-lookup indexes only; list queries go to ClickHouse. Terminal markers (`closedAt`, `expiresAt`) are write-once.
- ClickHouse `sessions_v1`: ReplacingMergeTree, partitioned by month, ordered by `(org_id, project_id, environment_id, created_at, session_id)`. Tags indexed via `tokenbf_v1` skip index.
- `SessionsReplicationService`: mirrors `RunsReplicationService` exactly, with a leader-locked logical replication consumer, `ConcurrentFlushScheduler`, retry with exponential backoff + jitter, and identical metric shape. Dedicated slot + publication so the two consume independently.
- S2 keys: `sessions/{addressingKey}/{out|in}`. The existing `runs/{runId}/{streamId}` key format for run-scoped streams is untouched.
- Optimistic claim: `ensureRunForSession` triggers a run upfront (cheap to cancel if it loses the race), then attempts an `updateMany` keyed on `currentRunVersion`. Loser cancels its triggered run and reuses the winner's. No DB lock held across the trigger.
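The optimistic claim can be sketched as a compare-and-swap on `currentRunVersion`. The in-memory store below stands in for Postgres, and its `claim` method plays the role of the `updateMany` by returning the number of rows it updated. Everything here is illustrative: the store, its method names, and the snapshot parameter (used to make the race deterministic) are assumptions, not the actual `ensureRunForSession` signature.

```typescript
interface SessionRow {
  id: string;
  currentRunId: string | null;
  currentRunVersion: number;
}

// In-memory stand-in for the database and run engine.
class SessionStore {
  readonly cancelled: string[] = [];
  private rows = new Map<string, SessionRow>();
  private runCounter = 0;

  insert(id: string): void {
    this.rows.set(id, { id, currentRunId: null, currentRunVersion: 0 });
  }

  // Returns a copy so callers hold a snapshot, as they would after a DB read.
  read(id: string): SessionRow {
    const row = this.rows.get(id);
    if (!row) throw new Error(`unknown session ${id}`);
    return { id: row.id, currentRunId: row.currentRunId, currentRunVersion: row.currentRunVersion };
  }

  triggerRun(): string {
    this.runCounter += 1;
    return `run_${this.runCounter}`;
  }

  cancelRun(runId: string): void {
    this.cancelled.push(runId);
  }

  // Conditional update: succeeds only if the version is unchanged.
  claim(id: string, expectedVersion: number, runId: string): number {
    const row = this.rows.get(id);
    if (!row || row.currentRunVersion !== expectedVersion) return 0;
    row.currentRunId = runId;
    row.currentRunVersion = expectedVersion + 1;
    return 1;
  }
}

// Sketch of the claim: trigger first, then try to swap; the loser cancels.
function ensureRunForSession(store: SessionStore, snapshot: SessionRow): string {
  const runId = store.triggerRun(); // triggered before any claim, no lock held
  if (store.claim(snapshot.id, snapshot.currentRunVersion, runId) === 1) {
    return runId;
  }
  store.cancelRun(runId); // lost the race: discard our run
  const winner = store.read(snapshot.id).currentRunId;
  if (winner === null) throw new Error("claim lost but no current run recorded");
  return winner;
}
```

Two callers that read the same snapshot both trigger a run, but only the first conditional update succeeds; the second cancels its run and returns the winner's run id.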
What did NOT change
Run-scoped streams.pipe / streams.input and the existing /realtime/v1/streams/{runId}/... routes are unchanged. Sessions are net-new — not a reshaping of the current streams API.
Deploy notes
- Set `SESSION_REPLICATION_CLICKHOUSE_URL` and `SESSION_REPLICATION_ENABLED=1` to enable the replication consumer.
- The `Session` table needs `REPLICA IDENTITY FULL` set on the prod source DB before the publication is created (same one-time DDL we did for `TaskRun`). Required for delete events to carry full column values.
- Cross-form authorization on the `GET /api/v1/sessions/:session` loader (a JWT minted for either form authorizes both URL forms). Action routes are URL-form-specific, matching how the SDK mints PATs.
Verification
- Webapp typecheck clean (10/10).
- `apps/webapp/test/sessionsReplicationService.test.ts`: round-trip tests for insert/update/delete through Postgres logical replication into ClickHouse via testcontainers.
- Live end-to-end against local dev: create + retrieve (both forms) + update + close, `.out.initialize` + `.out.append` x2 + `.in.send` + `.out.subscribe` over SSE, list with all filter combinations + pagination, `end-and-continue` swap, `X-Peek-Settled` fast-close (verified in browser via reconnect-on-reload and via curl). Replicated row lands in ClickHouse within ~1s.
- Multi-round Devin + CodeRabbit review feedback addressed (read-after-write paths use `prisma` writer, info-leak on auth-routes masked as 403, peek-settled discriminator parsing fix, etc.).
Test plan
- `pnpm run typecheck --filter webapp`
- `pnpm run test --filter webapp ./test/sessionsReplicationService.test.ts --run`
- Start the webapp with `SESSION_REPLICATION_CLICKHOUSE_URL` and `SESSION_REPLICATION_ENABLED=1`. Confirm the slot and publication auto-create on boot.
- `POST /api/v1/sessions` and verify the row replicates to `trigger_dev.sessions_v1` within a couple of seconds.
- `POST /api/v1/sessions/:id/close`, then confirm `POST /realtime/v1/sessions/:id/out/append` returns 400.
- Reuse a closed session's `externalId` on `POST /api/v1/sessions` and confirm 409.
- `GET /realtime/v1/sessions/:id/out` with `X-Peek-Settled: 1` after a turn completes and confirm `X-Session-Settled: true` response header + immediate close.