Scheduled runs route to dedicated node pools

myftija

Mar 25, 2026 · #3271 feat(supervisor): schedule-tree node affinity

Scheduled runs now prefer a dedicated node pool, reducing contention with on-demand runs that are more sensitive to cold start latency.

Scheduled runs create predictable hourly spikes that compete with on-demand runs for node capacity. When a burst of scheduled runs lands at the top of the hour, it can saturate shared pool resources, causing slower cold starts for users waiting on on-demand results.

This PR introduces soft scheduling affinity rules that route scheduled runs to a dedicated pool while on-demand runs avoid it. The affinity is a preference, not a requirement — runs fall back gracefully if the target pool is out of capacity.
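The "preference, not a requirement" behavior can be illustrated with a deliberately simplified scheduler model. This is not kube-scheduler's real algorithm, which runs many filter and score plugins. The sketch only captures one idea: preferred terms add weight to a node's score but never remove a node from consideration, so a run still lands on the shared pool when the dedicated one is full. The names `SimNode`, `PreferredTerm`, `pickNode`, the pool names, and the slot-based capacity model are all hypothetical.

```typescript
// Toy model of soft (preferred) node affinity. Capacity is the only hard
// filter; preferences only influence scoring among feasible nodes.

interface SimNode {
  name: string;
  pool: string; // hypothetical pool label value
  freeSlots: number; // remaining capacity, simplified to a slot count
}

interface PreferredTerm {
  weight: number; // Kubernetes requires 1-100 for preferred terms
  pool: string;
  operator: "In" | "NotIn";
}

// Sum the weights of every preferred term the node satisfies.
function scoreNode(node: SimNode, prefs: PreferredTerm[]): number {
  let score = 0;
  for (const p of prefs) {
    const matches = p.operator === "In" ? node.pool === p.pool : node.pool !== p.pool;
    if (matches) score += p.weight;
  }
  return score;
}

// Returns the chosen node name, or undefined if no node has capacity.
function pickNode(nodes: SimNode[], prefs: PreferredTerm[]): string | undefined {
  const feasible = nodes.filter((n) => n.freeSlots > 0);
  let best: SimNode | undefined;
  let bestScore = -1;
  for (const n of feasible) {
    const s = scoreNode(n, prefs);
    if (s > bestScore) {
      best = n;
      bestScore = s;
    }
  }
  return best?.name;
}

const schedulePref: PreferredTerm[] = [{ weight: 100, pool: "scheduled-runs", operator: "In" }];
const onDemandPref: PreferredTerm[] = [{ weight: 100, pool: "scheduled-runs", operator: "NotIn" }];

const nodes: SimNode[] = [
  { name: "shared-1", pool: "shared", freeSlots: 4 },
  { name: "sched-1", pool: "scheduled-runs", freeSlots: 2 },
];

console.log(pickNode(nodes, schedulePref)); // sched-1
console.log(pickNode(nodes, onDemandPref)); // shared-1

// Dedicated pool saturated: the scheduled run falls back to the shared pool.
const saturated = nodes.map((n) => (n.pool === "scheduled-runs" ? { ...n, freeSlots: 0 } : n));
console.log(pickNode(saturated, schedulePref)); // shared-1
```

A hard (required) affinity would instead drop non-matching nodes during filtering, leaving the saturated case with no feasible node and the run stuck pending.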

The supervisor checks the run's rootTriggerSource annotation at dequeue time. When it equals "schedule", the workload manager applies soft node affinity preferences toward the scheduled runs pool. Non-scheduled runs apply soft anti-affinity to avoid that pool.
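Below is a minimal sketch of the dequeue-time decision, assuming the affinity is expressed as a standard Kubernetes `preferredDuringSchedulingIgnoredDuringExecution` node affinity term. The interfaces mirror the relevant Kubernetes API fields and are declared locally so the example has no client-library dependency. `POOL_LABEL_KEY`, `RunAnnotations`, and `scheduleAffinity` are hypothetical names. The PR names only the label *value* setting, not the label key or the supervisor's actual code.

```typescript
// Local shapes mirroring the Kubernetes NodeAffinity API fields.
interface NodeSelectorRequirement {
  key: string;
  operator: "In" | "NotIn";
  values: string[];
}

interface PreferredSchedulingTerm {
  weight: number; // 1-100
  preference: { matchExpressions: NodeSelectorRequirement[] };
}

interface NodeAffinity {
  preferredDuringSchedulingIgnoredDuringExecution: PreferredSchedulingTerm[];
}

// Hypothetical node label key identifying the pool a node belongs to.
const POOL_LABEL_KEY = "example.com/node-pool";

// Hypothetical view of the annotations available at dequeue time.
interface RunAnnotations {
  rootTriggerSource?: string;
}

function scheduleAffinity(
  annotations: RunAnnotations,
  poolLabelValue: string,
  weight: number,
): NodeAffinity {
  const isScheduled = annotations.rootTriggerSource === "schedule";
  return {
    preferredDuringSchedulingIgnoredDuringExecution: [
      {
        weight,
        preference: {
          matchExpressions: [
            {
              key: POOL_LABEL_KEY,
              // Scheduled runs prefer the pool; all other runs prefer to avoid it.
              operator: isScheduled ? "In" : "NotIn",
              values: [poolLabelValue],
            },
          ],
        },
      },
    ],
  };
}

const scheduled = scheduleAffinity({ rootTriggerSource: "schedule" }, "scheduled-runs", 50);
const onDemand = scheduleAffinity({ rootTriggerSource: "api" }, "scheduled-runs", 50);
console.log(JSON.stringify(scheduled, null, 2));
console.log(JSON.stringify(onDemand, null, 2));
```

Because both branches emit only a preferred term, neither one can make a pod unschedulable on its own. The scheduler is simply nudged toward or away from the pool.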

In the supervisor app's Kubernetes workload manager, new environment variables control the behavior: KUBERNETES_SCHEDULE_AFFINITY_ENABLED toggles the feature, while KUBERNETES_SCHEDULE_AFFINITY_POOL_LABEL_VALUE and KUBERNETES_SCHEDULE_AFFINITY_WEIGHT configure which pool to target and how strongly. This follows the same pattern as existing large machine affinity, which was also refactored for consistency.
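The three environment variables could be loaded roughly as follows. This is a hedged sketch. The default pool value, the default weight, and the `"true"` string check are assumptions, and the real supervisor's parsing and validation may differ. The 1-100 range check reflects the Kubernetes constraint on preferred-term weights. The environment is passed in as a plain object so the example runs without Node.js type definitions.

```typescript
interface ScheduleAffinityConfig {
  enabled: boolean;
  poolLabelValue: string;
  weight: number;
}

type Env = Record<string, string | undefined>;

function loadScheduleAffinityConfig(env: Env): ScheduleAffinityConfig {
  // Feature is off unless explicitly enabled (assumed convention).
  const enabled = env.KUBERNETES_SCHEDULE_AFFINITY_ENABLED === "true";
  // Hypothetical defaults, for illustration only.
  const poolLabelValue = env.KUBERNETES_SCHEDULE_AFFINITY_POOL_LABEL_VALUE ?? "scheduled-runs";
  const weight = Number(env.KUBERNETES_SCHEDULE_AFFINITY_WEIGHT ?? "50");
  // Kubernetes rejects preferred-term weights outside 1-100.
  if (!Number.isInteger(weight) || weight < 1 || weight > 100) {
    throw new Error(
      `KUBERNETES_SCHEDULE_AFFINITY_WEIGHT must be an integer in 1-100, got "${env.KUBERNETES_SCHEDULE_AFFINITY_WEIGHT}"`,
    );
  }
  return { enabled, poolLabelValue, weight };
}

const cfg = loadScheduleAffinityConfig({
  KUBERNETES_SCHEDULE_AFFINITY_ENABLED: "true",
  KUBERNETES_SCHEDULE_AFFINITY_WEIGHT: "80",
});
console.log(cfg);
```

Failing fast on an out-of-range weight at startup surfaces a misconfiguration immediately, rather than when the API server rejects the first pod spec.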

This is part of an ongoing effort to improve resource isolation and reduce cold start latency for on-demand workloads.

Original GitHub Description

Scheduled runs create predictable hourly spikes that compete with on-demand runs for node capacity. Runs triggered on demand (via the SDK, API, or dashboard) are more sensitive to cold start latency, since users are typically waiting on the result. When a burst of scheduled runs lands at the top of the hour, it can saturate shared pool resources and cause contention, slowing cold starts across the board.

The idea in this change is to absorb these periodic spikes in a dedicated pool so they don't affect the cold starts of on-demand runs. Scheduled runs are inherently less sensitive to cold starts.

Changes in this PR

Follows up on run annotations (#3241), which made trigger origin available on every run in the tree. This PR exposes those annotations to the supervisor at dequeue time, enabling scheduling decisions based on trigger source.

The affinities are soft preferences at schedule time, so runs fall back gracefully if the target pool is out of capacity.