Persistent chat sessions are being unlocked for AI agents

ericallam

Apr 28, 2026 · #3417 · feat: Sessions - bidirectional durable agent streams

Developers should soon be able to build long-running chat agents and approval loops that persist state and stream data indefinitely, long after individual workflow runs complete.


Previously, stream data and workflow execution states were scoped entirely to a single run. This made multi-turn chat agents or human-in-the-loop approvals difficult to stitch together across different workflow executions.

With this change, multiple workflow runs can be grouped under a single durable Session that has a persistent, bidirectional data stream. A client subscribes to a session once and keeps receiving real-time output as new runs attach and execute, which decouples an agent's lifetime from the boundaries of any single task run.

This foundation is being laid in the @trigger.dev/core API and server backend.

json

{
  "type": "chat.agent",
  "externalId": "user_123"
}


Original GitHub description

⚠️ Not released yet. This PR is the server-side foundation only. The SDK changes that customers will actually use (chat.agent migration, chat.createStartSessionAction, useTriggerChatTransport updates) live on a separate branch and ship together in an upcoming @trigger.dev/sdk prerelease. Until that prerelease is published, this surface is reachable only via direct HTTP.

What this gives Trigger.dev users

A new first-class primitive, Session, for durable, task-bound, bidirectional I/O that outlives any single run. Sessions are the run manager for chat.agent going forward, and they unblock anything else that needs "one identifier, many runs over time" with a stable channel pair the client can write to and subscribe to.

Use cases unblocked

Chat agents that persist across many runs. One session per chat (keyed on your own chatId via externalId): turns 1..N attach to the same Session, and the UI subscribes once and keeps receiving output as new runs take over.
Approval loops and long-running tasks with user feedback. The task waits on .in, the client writes to .in, the server enforces no-writes-after-close.
Workflow progress streams that live past the run. Subscribe to .out after the task finishes to replay history.
Resume-next-day flows. A session is a durable row, not a transient stream. Send a message a day later and the server triggers a fresh run on the same session.
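
The channel semantics behind these use cases (append, replay after the run has finished, and no writes after close) can be illustrated with a small in-memory model. This is only a sketch: SessionChannel, ChannelRecord, and the sequence-number scheme are hypothetical names invented for illustration, and the real server persists records durably rather than in an array.

```typescript
// In-memory model of one session channel (".in" or ".out").
// Hypothetical illustration only; not the server's implementation.
type ChannelRecord = { seq: number; data: string };

class SessionChannel {
  private records: ChannelRecord[] = [];
  private closed = false;

  // Append a record and return its sequence number.
  // Rejected once the channel has been closed.
  append(data: string): number {
    if (this.closed) {
      throw new Error("session closed: writes are rejected");
    }
    const seq = this.records.length + 1;
    this.records.push({ seq, data });
    return seq;
  }

  // Mark the channel closed. Calling it again is a no-op.
  close(): void {
    this.closed = true;
  }

  // Replay every record after the given sequence number (0 = from the start).
  // Reads still work after close, which is what allows history replay.
  readAfter(lastSeq: number): ChannelRecord[] {
    return this.records.filter((r) => r.seq > lastSeq);
  }
}

// Approval loop: the task asks on ".out", the client answers on ".in".
const out = new SessionChannel();
const input = new SessionChannel();
out.append("approve deploy?");
input.append("approved");
out.append("deploying");
out.close();
input.close();
```

A subscriber that reconnects with the last sequence number it saw would call readAfter with that number and receive only the records it missed.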

How it works (Session-as-run-manager)

A Session row is task-bound (taskIdentifier + triggerConfig are required) and owns its current run via currentRunId + currentRunVersion for optimistic claim. Three trigger paths:

Session create — POST /api/v1/sessions creates the row and triggers the first run synchronously.
Append-time probe — POST /realtime/v1/sessions/:session/in/append checks if the current run is alive; if it has terminated (idle exit, crash, etc.), the server triggers a new run before processing the append.
End-and-continue handoff — POST /api/v1/sessions/:session/end-and-continue, called by the running agent, triggers a fresh run and atomically swaps currentRunId. Used by chat.requestUpgrade() for version handoffs.

Every triggered run is recorded in the SessionRun audit table with a reason (initial, continuation, upgrade, manual).
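
The append-time probe can be sketched as a pure decision function. The SessionState shape, the currentRunAlive flag, and the mapping of the append path to the "continuation" reason are assumptions made for illustration; the description lists the four reasons but does not say which trigger path records which one.

```typescript
// Reasons recorded in the SessionRun audit table, as listed in the description.
type SessionRunReason = "initial" | "continuation" | "upgrade" | "manual";

// Hypothetical, simplified view of a Session row for this sketch.
interface SessionState {
  currentRunId: string | null;
  currentRunAlive: boolean; // assumption: liveness has already been probed
  closedAt: Date | null;
}

type AppendDecision =
  | { action: "reject" }
  | { action: "append" }
  | { action: "trigger-then-append"; reason: SessionRunReason };

// Decide what an append to ".in" should do before the record is written.
function decideOnAppend(session: SessionState): AppendDecision {
  // Closed sessions hard-block new server-brokered writes.
  if (session.closedAt !== null) {
    return { action: "reject" };
  }
  // A live current run simply receives the record.
  if (session.currentRunId !== null && session.currentRunAlive) {
    return { action: "append" };
  }
  // The run has terminated (idle exit, crash, ...): trigger a fresh run first.
  // Assumption: this path is audited as "continuation".
  return { action: "trigger-then-append", reason: "continuation" };
}
```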

Public API surface

Control plane

POST /api/v1/sessions — create. Idempotent on (env, externalId). Triggers the first run, returns the session and a session-scoped public access token. Returns 409 if the upserted row is already closed.
GET /api/v1/sessions/:session — retrieve by friendlyId (session_abc...) or by your own externalId (server disambiguates by prefix).
GET /api/v1/sessions — list with filters (type, tag, taskIdentifier, externalId, derived status ACTIVE/CLOSED/EXPIRED, created-at range) and cursor pagination. Backed by ClickHouse.
PATCH /api/v1/sessions/:session — update tags / metadata / externalId.
POST /api/v1/sessions/:session/close — terminate. Idempotent, hard-blocks new server-brokered writes.
POST /api/v1/sessions/:session/end-and-continue — agent-only handoff to a fresh run.
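
Since this surface is reachable only via direct HTTP until the SDK prerelease ships, a client might assemble the create request by hand. The sketch below only builds the request and does not send it. The body fields beyond type and externalId, the Bearer authorization header, and the "session_" prefix check are assumptions based on the description above, not a confirmed request schema.

```typescript
// Assumed request body: "type" and "externalId" come from the example payload;
// taskIdentifier and triggerConfig are described as required on the Session row,
// but their exact place and shape in the request are assumptions.
interface CreateSessionBody {
  type: string;
  externalId?: string;
  taskIdentifier: string;
  triggerConfig: Record<string, unknown>;
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build (but do not send) a POST /api/v1/sessions request.
function buildCreateSessionRequest(
  baseUrl: string,
  secretKey: string,
  body: CreateSessionBody
): HttpRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/v1/sessions`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${secretKey}`, // assumption: bearer-token auth
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  };
}

// Mirror of the prefix disambiguation mentioned for GET /api/v1/sessions/:session.
// Assumption: friendlyIds always start with "session_".
function classifySessionParam(param: string): "friendlyId" | "externalId" {
  return param.startsWith("session_") ? "friendlyId" : "externalId";
}
```

Because create is idempotent on (env, externalId), retrying the same request with the same externalId should return the existing session rather than create a second one, unless that session is already closed, in which case the description says to expect a 409.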

Realtime

PUT /realtime/v1/sessions/:session/:io — initialize a channel. Returns S2 credentials in headers so high-throughput clients can write direct to S2.
GET /realtime/v1/sessions/:session/:io — SSE subscribe. Supports Last-Event-ID resume and an opt-in X-Peek-Settled: 1 header that fast-closes the stream when the upstream is already settled (trigger:turn-complete), eliminating long-poll wait on reconnect-on-reload paths.
POST /realtime/v1/sessions/:session/:io/append — server-side appends.
POST /api/v1/runs/:runFriendlyId/session-streams/wait — runs wait on a session stream as a waitpoint, with a race-check to avoid suspending if data already landed.
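
A reconnecting subscriber combines the two resume mechanisms on the SSE route: Last-Event-ID to skip what it has already seen, and X-Peek-Settled: 1 to avoid waiting on a stream that has already settled. The helpers below are a sketch of the client side only. The function names and the bearer-token header are assumptions; the X-Session-Settled response header is taken from the test plan in this description.

```typescript
interface SubscribeOptions {
  lastEventId?: string; // resume after this event ID
  peekSettled?: boolean; // opt in to fast-close on settled streams
}

// Build request headers for GET /realtime/v1/sessions/:session/:io.
function buildSubscribeHeaders(
  token: string,
  opts: SubscribeOptions = {}
): Record<string, string> {
  const headers: Record<string, string> = {
    Accept: "text/event-stream",
    Authorization: `Bearer ${token}`, // assumption: session-scoped token as bearer
  };
  if (opts.lastEventId !== undefined) {
    headers["Last-Event-ID"] = opts.lastEventId;
  }
  if (opts.peekSettled) {
    headers["X-Peek-Settled"] = "1";
  }
  return headers;
}

// Check whether a response reports the stream as settled.
// Header names are matched case-insensitively.
function isSettled(responseHeaders: Record<string, string>): boolean {
  for (const name of Object.keys(responseHeaders)) {
    if (name.toLowerCase() === "x-session-settled") {
      return responseHeaders[name].trim().toLowerCase() === "true";
    }
  }
  return false;
}
```

On a page reload, a client could send both headers, render whatever events arrive, and skip reopening a long-lived connection when isSettled returns true.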

Auth scopes

sessions is a new resource type. read:sessions:{id}, write:sessions:{id}, admin:sessions:{id} flow through the existing JWT validator. Session-scoped public access tokens minted by the server replace browser-held trigger-task tokens for chat-style flows — the browser never sees a run identifier or a run-scoped token in steady state.
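
The scope strings follow an action:resource:id pattern, which can be sketched as below. This helper does exact-match checks only. Whether admin implies read and write, or whether wildcard scopes exist, is not stated here, so the sketch deliberately does not model either. The function names are hypothetical.

```typescript
type SessionAction = "read" | "write" | "admin";

// Format a scope string such as "read:sessions:session_abc".
function sessionScope(action: SessionAction, sessionId: string): string {
  return `${action}:sessions:${sessionId}`;
}

// Exact-match check against the scopes carried by a token.
// Assumption: no implied or wildcard scopes are considered.
function hasSessionScope(
  granted: readonly string[],
  action: SessionAction,
  sessionId: string
): boolean {
  const wanted = sessionScope(action, sessionId);
  return granted.some((scope) => scope === wanted);
}
```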

What's coming after this PR

SDK + chat.agent migration: separate branch, separate PR, ships in the next @trigger.dev/sdk prerelease alongside this server deploy. Customers using the prerelease chat.agent will follow the upgrade guide.
Dashboard surfaces: dedicated agent list, agent playground, agent view on the run dashboard. Tracking separately.

Implementation notes

Postgres Session table: scalar scoping columns (projectId, runtimeEnvironmentId, environmentType, organizationId) without FKs, matching the January TaskRun FK-removal decision. Point-lookup indexes only — list queries go to ClickHouse. Terminal markers (closedAt, expiresAt) are write-once.
ClickHouse sessions_v1: ReplacingMergeTree, partitioned by month, ordered by (org_id, project_id, environment_id, created_at, session_id). Tags indexed via tokenbf_v1 skip index.
SessionsReplicationService: mirrors RunsReplicationService exactly — leader-locked logical replication consumer, ConcurrentFlushScheduler, retry with exponential backoff + jitter, identical metric shape. Dedicated slot + publication so the two consume independently.
S2 keys: sessions/{addressingKey}/{out|in}. The existing runs/{runId}/{streamId} key format for run-scoped streams is untouched.
Optimistic claim: ensureRunForSession triggers a run upfront (cheap to cancel if it loses the race), then attempts an updateMany keyed on currentRunVersion. Loser cancels its triggered run and reuses the winner's. No DB lock held across the trigger.
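
The optimistic claim can be illustrated with an in-memory compare-and-swap that stands in for the updateMany keyed on currentRunVersion. This is a sketch of the technique, not the actual ensureRunForSession code. InMemorySessionStore, claimOrReuse, and the cancelled list are hypothetical, and run triggering is reduced to passing in a run ID.

```typescript
interface SessionRow {
  currentRunId: string | null;
  currentRunVersion: number;
}

// Stand-in for the Session table with a single row.
class InMemorySessionStore {
  private row: SessionRow;

  constructor(initial: SessionRow) {
    this.row = { currentRunId: initial.currentRunId, currentRunVersion: initial.currentRunVersion };
  }

  // Return a copy so callers hold a snapshot, not a live reference.
  read(): SessionRow {
    return { currentRunId: this.row.currentRunId, currentRunVersion: this.row.currentRunVersion };
  }

  // Compare-and-swap: succeeds only if the version is unchanged since the snapshot.
  // Returns the number of rows updated (0 or 1), like an updateMany count.
  claim(expectedVersion: number, runId: string): number {
    if (this.row.currentRunVersion !== expectedVersion) {
      return 0;
    }
    this.row = { currentRunId: runId, currentRunVersion: expectedVersion + 1 };
    return 1;
  }
}

// Runs cancelled by callers that lost the race.
const cancelled: string[] = [];

// The run was already triggered upfront; now try to claim the session for it.
function claimOrReuse(
  store: InMemorySessionStore,
  snapshot: SessionRow,
  triggeredRunId: string
): string {
  if (store.claim(snapshot.currentRunVersion, triggeredRunId) === 1) {
    return triggeredRunId; // winner: its run becomes the current run
  }
  cancelled.push(triggeredRunId); // loser cancels its own run
  const winner = store.read().currentRunId;
  if (winner === null) {
    throw new Error("claim lost but no current run is recorded");
  }
  return winner; // loser reuses the winner's run
}

// Two callers read the same version, then race to claim.
const store = new InMemorySessionStore({ currentRunId: null, currentRunVersion: 0 });
const snapshotA = store.read();
const snapshotB = store.read();
const resultA = claimOrReuse(store, snapshotA, "run_a");
const resultB = claimOrReuse(store, snapshotB, "run_b");
```

Both callers end up pointing at the same run, and no lock is held while the run is being triggered. The cost is one wasted trigger for the loser.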

What did NOT change

Run-scoped streams.pipe / streams.input and the existing /realtime/v1/streams/{runId}/... routes are unchanged. Sessions are net-new — not a reshaping of the current streams API.

Deploy notes

Set SESSION_REPLICATION_CLICKHOUSE_URL and SESSION_REPLICATION_ENABLED=1 to enable the replication consumer.
The Session table needs REPLICA IDENTITY FULL set on the prod source DB before the publication is created (same one-time DDL we did for TaskRun). Required for delete events to carry full column values.
The GET /api/v1/sessions/:session loader authorizes across both URL forms: a JWT minted for either the friendlyId or the externalId form is accepted on both. Action routes are URL-form-specific, matching how the SDK mints PATs.
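
A startup check for the two replication variables might look like the sketch below. How the webapp actually validates its environment is not shown in this description. The function name, the return shape, and the decision to throw when the consumer is enabled without a URL are all assumptions.

```typescript
type Env = Record<string, string | undefined>;

interface SessionReplicationConfig {
  enabled: boolean;
  clickhouseUrl: string | null;
}

// Read the replication settings from an environment-like object.
// Assumption: only the literal value "1" enables the consumer.
function readSessionReplicationConfig(env: Env): SessionReplicationConfig {
  const enabled = env["SESSION_REPLICATION_ENABLED"] === "1";
  const rawUrl = env["SESSION_REPLICATION_CLICKHOUSE_URL"];
  const clickhouseUrl = rawUrl !== undefined && rawUrl !== "" ? rawUrl : null;

  // Fail fast rather than start a consumer with nowhere to write.
  if (enabled && clickhouseUrl === null) {
    throw new Error(
      "SESSION_REPLICATION_ENABLED=1 requires SESSION_REPLICATION_CLICKHOUSE_URL"
    );
  }
  return { enabled, clickhouseUrl };
}
```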

Verification

Webapp typecheck clean (10/10).
apps/webapp/test/sessionsReplicationService.test.ts — round-trip tests for insert/update/delete through Postgres logical replication into ClickHouse via testcontainers.
Live end-to-end against local dev: create + retrieve (both forms) + update + close, .out.initialize + .out.append x2 + .in.send + .out.subscribe over SSE, list with all filter combinations + pagination, end-and-continue swap, X-Peek-Settled fast-close (verified in browser via reconnect-on-reload and via curl). Replicated row lands in ClickHouse within ~1s.
Multi-round Devin + CodeRabbit review feedback addressed (read-after-write paths use prisma writer, info-leak on auth-routes masked as 403, peek-settled discriminator parsing fix, etc.).

Test plan

pnpm run typecheck --filter webapp
pnpm run test --filter webapp ./test/sessionsReplicationService.test.ts --run
Start the webapp with SESSION_REPLICATION_CLICKHOUSE_URL and SESSION_REPLICATION_ENABLED=1. Confirm the slot and publication auto-create on boot.
POST /api/v1/sessions and verify the row replicates to trigger_dev.sessions_v1 within a couple of seconds.
POST /api/v1/sessions/:id/close, then confirm POST /realtime/v1/sessions/:id/out/append returns 400.
Reuse a closed session's externalId on POST /api/v1/sessions and confirm 409.
GET /realtime/v1/sessions/:id/out with X-Peek-Settled: 1 after a turn completes and confirm X-Session-Settled: true response header + immediate close.
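
The two status-code checks in the plan above (400 on append after close, 409 on reusing a closed session's externalId) can be captured as a small expectation table for a manual or scripted run. The step names and the StepResult shape are hypothetical, and the observed status codes would come from real HTTP calls that this sketch does not make.

```typescript
type StepName = "append-after-close" | "reuse-closed-external-id";

interface StepResult {
  step: StepName;
  status: number; // HTTP status observed for the step
}

// Expected statuses, taken from the test plan.
const expectedStatus: Record<StepName, number> = {
  "append-after-close": 400,
  "reuse-closed-external-id": 409,
};

// Return a human-readable line for every step whose status does not match.
function findFailures(results: StepResult[]): string[] {
  return results
    .filter((r) => expectedStatus[r.step] !== r.status)
    .map((r) => `${r.step}: expected ${expectedStatus[r.step]}, got ${r.status}`);
}
```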