Database commits halved for task dequeues
Database transactions are being merged in the task dequeue flow, cutting commits per operation from two to one to reduce database pressure.
In the engine service, the task dequeue flow now merges execution snapshot creation into the primary task update transaction. Snapshot IDs are pre-generated locally, so no follow-up database read is needed to construct event payloads.
Consolidating these calls halves the commits per dequeue operation (two to one) on the highest-volume of the flows being targeted. This is the first phase of a broader initiative covering five distinct operational flows to lower database resource consumption.
Summary
Nests the TaskRunExecutionSnapshot creation inside the taskRun.update() Prisma call in the dequeue flow, reducing 2 DB commits → 1 per dequeue operation. This is the highest-volume of the five unmerged flows identified in TRI-8450 (~9,200 commits/sec on the engine service).
Pattern: Follows the same nested-write approach already used in the completion path (runAttemptSystem.ts:735) and trigger path (engine/index.ts:674).
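To illustrate why the nested-write pattern saves a commit, here is a minimal sketch using a hypothetical in-memory stand-in (FakeDb, dequeueSeparate, dequeueNested and commitsFor are illustrative names, not code from this PR). It is not Prisma; it only counts commits so the two call shapes can be compared.

```typescript
// Hypothetical in-memory stand-in for a database client. It only counts
// commits so the two call shapes can be compared; it is NOT Prisma.
type Snapshot = { id: string; runId: string; executionStatus: string };
type Run = { id: string; status: string; snapshots: Snapshot[] };

class FakeDb {
  commits = 0;
  runs = new Map<string, Run>();
  private nextSnapshotId = 1;

  private getRun(id: string): Run {
    const run = this.runs.get(id);
    if (!run) throw new Error(`run ${id} not found`);
    return run;
  }

  private buildSnapshot(runId: string, executionStatus: string): Snapshot {
    return { id: `snap_${this.nextSnapshotId++}`, runId, executionStatus };
  }

  // Updates a run and, optionally, creates a snapshot in the same commit.
  updateRun(
    id: string,
    data: { status: string; snapshots?: { create: { executionStatus: string } } }
  ): Run {
    const run = this.getRun(id);
    run.status = data.status;
    if (data.snapshots) {
      run.snapshots.push(this.buildSnapshot(id, data.snapshots.create.executionStatus));
    }
    this.commits += 1; // one commit covers the update and any nested create
    return run;
  }

  // Creates a snapshot on its own, which costs a separate commit.
  createSnapshot(runId: string, executionStatus: string): Snapshot {
    const run = this.getRun(runId);
    const snapshot = this.buildSnapshot(runId, executionStatus);
    run.snapshots.push(snapshot);
    this.commits += 1;
    return snapshot;
  }
}

// Old shape: update the run, then create the snapshot separately (2 commits).
function dequeueSeparate(db: FakeDb, runId: string): void {
  db.updateRun(runId, { status: "DEQUEUED" });
  db.createSnapshot(runId, "PENDING_EXECUTING");
}

// New shape: the snapshot create is nested inside the run update (1 commit).
function dequeueNested(db: FakeDb, runId: string): void {
  db.updateRun(runId, {
    status: "DEQUEUED",
    snapshots: { create: { executionStatus: "PENDING_EXECUTING" } },
  });
}

// Runs one dequeue against a fresh FakeDb and reports the commit count.
function commitsFor(dequeue: (db: FakeDb, runId: string) => void): number {
  const db = new FakeDb();
  db.runs.set("run_1", { id: "run_1", status: "PENDING", snapshots: [] });
  dequeue(db, "run_1");
  return db.commits;
}
```

Under these assumptions, commitsFor(dequeueSeparate) counts two commits and commitsFor(dequeueNested) counts one, while both leave the run with exactly one new snapshot.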
Changes:
- dequeueSystem.ts: Moved snapshot creation into executionSnapshots: { create: {...} } within the existing taskRun.update(). Pre-generates the snapshot ID via generateInternalId() (plain cuid, matching what Prisma's @default(cuid()) produces) so the event emission, heartbeat enqueue, and return value can all be constructed from data already in scope, with no extra DB read needed after the merged write. SnapshotId.toFriendlyId() is used only for the return value's friendlyId field, matching the original createExecutionSnapshot behavior.
- executionSnapshotSystem.ts: Added public enqueueHeartbeatIfNeeded() method that exposes the heartbeat scheduling logic (previously only available internally via createExecutionSnapshot). This is needed because PENDING_EXECUTING requires a heartbeat, unlike the FINISHED status in the completion reference pattern. This method is reusable by future merge targets (retry-immediate, checkpoint, cancel, requeue).
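A rough sketch of the idea behind enqueueHeartbeatIfNeeded(), assuming a status-to-interval lookup. The method signature, the HeartbeatScheduler class, the job shape, and the 60-second interval are all hypothetical; the real implementation in executionSnapshotSystem.ts may differ.

```typescript
// Only the two statuses named in this description are modelled here.
type ExecutionStatus = "PENDING_EXECUTING" | "FINISHED";

// Hypothetical interval; the real timeout values are not given in this PR.
const HEARTBEAT_INTERVAL_MS: Partial<Record<ExecutionStatus, number>> = {
  PENDING_EXECUTING: 60_000,
  // FINISHED has no entry: a finished run needs no heartbeat.
};

type HeartbeatJob = { runId: string; snapshotId: string; availableAt: number };

class HeartbeatScheduler {
  // Stand-in for whatever queue the real system enqueues heartbeat jobs on.
  readonly queue: HeartbeatJob[] = [];

  // Enqueues a heartbeat only when the status has a configured interval.
  // Returns true if a job was enqueued, false otherwise.
  enqueueHeartbeatIfNeeded(args: {
    runId: string;
    snapshotId: string;
    status: ExecutionStatus;
    now: number;
  }): boolean {
    const interval = HEARTBEAT_INTERVAL_MS[args.status];
    if (interval === undefined) return false;
    this.queue.push({
      runId: args.runId,
      snapshotId: args.snapshotId,
      availableAt: args.now + interval,
    });
    return true;
  }
}
```

Exposing the check as its own method lets a caller that writes the snapshot through a nested create still schedule the heartbeat, without going through createExecutionSnapshot.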
Net DB change per dequeue: eliminates 1 write transaction (the separate TaskRunExecutionSnapshot.create). No extra reads added — the snapshot ID is pre-generated and the executionSnapshotCreated event payload is constructed inline from values already available in the closure.
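The pre-generated ID is what makes the read-back unnecessary. The sketch below shows the shape of that idea under stated assumptions: makeInternalId() is a stand-in for generateInternalId() (it does not produce a real cuid), the snapshot_ prefix in toFriendlyId() is assumed, and the DequeueContext and SnapshotCreatedEvent types are hypothetical, using only the field names listed in this description.

```typescript
// Stand-in for generateInternalId(); the real helper produces a cuid.
function makeInternalId(): string {
  return Date.now().toString(36) + Math.random().toString(36).slice(2, 10);
}

// Stand-in for SnapshotId.toFriendlyId(); the prefix is an assumption.
function toFriendlyId(id: string): string {
  return `snapshot_${id}`;
}

// Values assumed to be in scope in the dequeue closure before the write.
type DequeueContext = {
  runId: string;
  attemptNumber: number | null;
  checkpointId: string | null;
  workerId: string | undefined;
  runnerId: string | undefined;
  completedWaitpoints: { id: string }[];
};

type SnapshotCreatedEvent = {
  snapshotId: string;
  runId: string;
  runStatus: "PENDING";
  attemptNumber: number | null;
  checkpointId: string | null;
  workerId: string | undefined;
  runnerId: string | undefined;
  completedWaitpointIds: string[];
};

// Builds both the nested-create data and the event payload from one ID,
// so the event never depends on reading the new row back.
function buildSnapshotWrite(ctx: DequeueContext, snapshotId: string = makeInternalId()) {
  const nestedCreate = {
    id: snapshotId, // supplied explicitly instead of relying on the DB default
    executionStatus: "PENDING_EXECUTING" as const,
    runStatus: "PENDING" as const,
    attemptNumber: ctx.attemptNumber,
    checkpointId: ctx.checkpointId,
    workerId: ctx.workerId,
    runnerId: ctx.runnerId,
  };
  const event: SnapshotCreatedEvent = {
    snapshotId,
    runId: ctx.runId,
    runStatus: nestedCreate.runStatus,
    attemptNumber: ctx.attemptNumber,
    checkpointId: ctx.checkpointId,
    workerId: ctx.workerId,
    runnerId: ctx.runnerId,
    completedWaitpointIds: ctx.completedWaitpoints.map((w) => w.id),
  };
  return { nestedCreate, event, friendlyId: toFriendlyId(snapshotId) };
}
```

Deriving the event from the same values as the nested create narrows the room for drift, but it does not remove the need to confirm the payload against what is actually written.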
Review & Testing Checklist for Human
- Verify manually-constructed event payload matches DB state: The executionSnapshotCreated event is now built inline (not read back from DB). Confirm the field values (runStatus: "PENDING", attemptNumber, checkpointId, workerId, runnerId, completedWaitpointIds) match what Prisma actually writes. A mismatch here would be silent: event consumers would get stale/wrong data.
- Verify attemptNumber source is equivalent: Old code used lockedTaskRun.attemptNumber (post-update result). New code uses result.run.attemptNumber (pre-update). The taskRun.update() data payload does NOT include attemptNumber, so they should be identical, but confirm this assumption holds for all dequeue scenarios (e.g. retried runs).
- Verify isValid defaults to true in schema: The old createExecutionSnapshot explicitly set isValid: error ? false : true. The nested create omits isValid (no error in the dequeue happy path). Confirm the Prisma schema default for TaskRunExecutionSnapshot.isValid is true.
- Verify runStatus: "PENDING" hardcoding matches the mapping: The old code passed lockedTaskRun.status ("DEQUEUED") to createExecutionSnapshot, which mapped it to "PENDING" via run.status === "DEQUEUED" ? "PENDING" : run.status. The new code hardcodes "PENDING" directly. This is correct but brittle if status ever changes from "DEQUEUED" to something else upstream.
- Spot-check completedWaitpoints connect + order logic: The nested create replicates the connect/order logic from createExecutionSnapshot (lines 387-393). Verify the snapshot.completedWaitpoints type provides id and index fields compatible with this usage.
- Verify checkpoint in return value: The return now uses snapshot.checkpoint (from the previous snapshot) instead of reading the newly-created snapshot's checkpoint relation. Since checkpointId is passed through unchanged, they should be identical, but worth a sanity check.
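Two of the checklist items above can be made concrete with a small sketch. snapshotRunStatus() reproduces the mapping expression quoted in the checklist; pendingForDequeue() is a hypothetical guard (not part of this PR) that would make the hardcoded value fail loudly if the upstream status changed. waitpointWrite() is an assumed shape for the connect/order logic, including the assumption that waitpoints without an index are connected but left out of the ordering; it is not copied from createExecutionSnapshot.

```typescript
// The mapping quoted in the checklist: DEQUEUED is recorded as PENDING,
// every other status passes through unchanged.
function snapshotRunStatus(status: string): string {
  return status === "DEQUEUED" ? "PENDING" : status;
}

// Hypothetical guard: returns the hardcoded value only when the run is
// actually DEQUEUED, so an upstream status change is not silently masked.
function pendingForDequeue(status: string): "PENDING" {
  if (status !== "DEQUEUED") {
    throw new Error(`expected DEQUEUED before writing PENDING, got ${status}`);
  }
  return "PENDING";
}

// Assumed shape of a completed waitpoint: id is required, index is optional.
type CompletedWaitpoint = { id: string; index?: number };

// Builds the relation connect list plus an ordered list of ids.
// Assumption: waitpoints without an index are connected but not ordered.
function waitpointWrite(waitpoints: CompletedWaitpoint[]) {
  return {
    connect: waitpoints.map((w) => ({ id: w.id })),
    order: waitpoints
      .filter((w): w is CompletedWaitpoint & { index: number } => w.index !== undefined)
      .sort((a, b) => a.index - b.index) // filter() returned a copy, so the input is untouched
      .map((w) => w.id),
  };
}
```

If the real completedWaitpoints type allows a null index rather than an omitted one, the filter would need to exclude null as well; that is exactly the kind of type mismatch the spot-check is meant to catch.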
Recommended test plan: deploy to staging, run the sample_pg_activity.py sampler for a 5-minute window, and verify the COMMIT count drop on the engine service + proportional IO:XactSync reduction.
Notes
- This only covers the dequeue flow (flow #1 from TRI-8450). The remaining four flows (retry-immediate, checkpoint, requeue, cancel) are separate follow-ups.
- The new enqueueHeartbeatIfNeeded method is deliberately designed for reuse by those follow-up PRs.
- CI note: the priority.test.ts failure in shard 7 is a flaky ordering assertion unrelated to this change (it compares friendlyId values in dequeue order). The audit check is also pre-existing/unrelated.
Link to Devin session: https://app.devin.ai/sessions/034fe0e7224f49278a2de260203e1377
Requested by: @ericallam