API key rotation now includes a 24-hour grace period

A 24-hour overlap window during API key rotation keeps previous keys active, preventing downtime while environment variables are updated.
Previously, rotating an API key forced a hard cutover: the old key stopped working immediately, breaking live services until the new credentials were deployed.
Now, API keys can be rotated with zero downtime. When a new key is generated, the previous key remains active for a 24-hour grace period. This provides a window to update environment variables across deployments while both keys are accepted.
The grace period can also be ended immediately or extended if needed. This change applies directly to the runtime authentication pipeline, so both old and new keys correctly authenticate and mint valid JWTs for workloads.
Summary
Regenerating a RuntimeEnvironment API key no longer immediately invalidates the previous one. Rotation is now overlap-based: the old key keeps working for 24 hours so customers can roll it out in their env vars without downtime, then stops working.
Design
- New `RevokedApiKey` table (one row per revocation). Holds the archived `apiKey`, a FK to the env, an `expiresAt`, and a `createdAt`. Indexed on `apiKey` (high-cardinality equality, single-row hits) and on `runtimeEnvironmentId`.
- `regenerateApiKey` wraps both writes in a single `$transaction`: insert a `RevokedApiKey` with `expiresAt = now + 24h`, update the env with the new `apiKey`/`pkApiKey`.
- `findEnvironmentByApiKey` does a two-step lookup: primary unique-index hit on `RuntimeEnvironment.apiKey` first; on miss, `RevokedApiKey.findFirst({ apiKey, expiresAt: { gt: now } })` with an `include: { runtimeEnvironment }`. Two-step (not `OR`-join) keeps the hot path identical to today and puts the fallback cost only on keys that miss the primary lookup (rotated-out or invalid). Both lookups use `$replica`.
- Admin endpoint `POST /admin/api/v1/revoked-api-keys/:id` accepts `{ expiresAt }` and updates the row. Setting to `now` ends the grace window immediately; setting to the future extends it.
- Modal copy on the regenerate dialog updated: previously warned of downtime, now explains the 24h overlap.
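The rotation and two-step lookup described above can be illustrated with a small in-memory model. This is a sketch only, not the PR's code: the `KeyStore` class, its maps, and the epoch-millisecond timestamps are assumptions made for illustration, and the real change uses Prisma queries inside a `$transaction` rather than in-memory collections.

```typescript
// Illustrative in-memory model; KeyStore and its fields are hypothetical.
const GRACE_MS = 24 * 60 * 60 * 1000; // 24-hour grace period

interface RuntimeEnvironment {
  id: string;
  apiKey: string;
}

interface RevokedApiKey {
  apiKey: string;
  runtimeEnvironmentId: string;
  expiresAt: number; // epoch milliseconds
  createdAt: number; // epoch milliseconds
}

class KeyStore {
  private readonly envsById = new Map<string, RuntimeEnvironment>();
  // Stands in for the unique index on RuntimeEnvironment.apiKey.
  private readonly envIdByCurrentKey = new Map<string, string>();
  private readonly revoked: RevokedApiKey[] = [];

  addEnvironment(env: RuntimeEnvironment): void {
    this.envsById.set(env.id, env);
    this.envIdByCurrentKey.set(env.apiKey, env.id);
  }

  // Archive the old key with a 24h expiry, then swap in the new key.
  // (The PR performs both writes atomically in one transaction.)
  regenerateApiKey(envId: string, newKey: string, now: number): void {
    const env = this.envsById.get(envId);
    if (!env) throw new Error(`unknown environment: ${envId}`);
    this.revoked.push({
      apiKey: env.apiKey,
      runtimeEnvironmentId: env.id,
      expiresAt: now + GRACE_MS,
      createdAt: now,
    });
    this.envIdByCurrentKey.delete(env.apiKey);
    env.apiKey = newKey;
    this.envIdByCurrentKey.set(newKey, env.id);
  }

  findEnvironmentByApiKey(apiKey: string, now: number): RuntimeEnvironment | undefined {
    // Step 1: hot path, lookup by current key.
    const currentId = this.envIdByCurrentKey.get(apiKey);
    if (currentId !== undefined) return this.envsById.get(currentId);

    // Step 2: only on a miss, look for an unexpired revoked key.
    const row = this.revoked.find((r) => r.apiKey === apiKey && r.expiresAt > now);
    return row ? this.envsById.get(row.runtimeEnvironmentId) : undefined;
  }
}
```

In this model, regenerating twice within a day leaves two rows in `revoked`, each with its own `expiresAt`, so both older keys keep resolving to the same environment until their individual expiries pass.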
Why a separate table instead of columns on RuntimeEnvironment
- Keeps the hot auth path's primary lookup unchanged — no OR/nullable-apiKey semantics to reason about.
- Naturally supports multiple in-flight grace windows (regenerate twice in a day → two old keys valid until their independent expiries).
- FK + cascade cleans up correctly when an env is deleted; nothing to backfill.
Test plan
Verified locally against hello-world with dev and prod env keys:
- baseline: current key authenticates (`GET /api/v1/runs`) → `200`
- regenerate via UI: DB shows old key in `RevokedApiKey` with `expiresAt ≈ now+24h`, env has new key
- grace window: both old and new keys → `200`; bogus key → `401`
- admin endpoint: `expiresAt = now` → old key `401`
- admin endpoint: `expiresAt = +1h` (after early-expire) → old key `200` again
- admin endpoint: `expiresAt = past` → old key `401`
- admin 400 (invalid body), 404 (unknown id), 401 (missing/non-admin PAT)
- same flow exercised end-to-end on a PROD-typed env: behavior identical
- `pnpm run typecheck --filter webapp` passes
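The admin-endpoint checks in the test plan could be driven by a small request builder along these lines. This is a hedged sketch: `buildGraceWindowRequest` and `GraceWindowRequest` are hypothetical names, and two details are assumptions not confirmed by the description, namely that the admin PAT is sent as a bearer token and that `expiresAt` is serialized as an ISO 8601 string. The helper only builds the request; it does not send it.

```typescript
// Hypothetical helper for exercising the admin endpoint from the description.
interface GraceWindowRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildGraceWindowRequest(
  baseUrl: string,
  revokedApiKeyId: string,
  expiresAt: Date,
  adminToken: string,
): GraceWindowRequest {
  // Drop trailing slashes so the joined path has exactly one separator.
  const base = baseUrl.replace(/\/+$/, "");
  return {
    url: `${base}/admin/api/v1/revoked-api-keys/${encodeURIComponent(revokedApiKeyId)}`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Assumption: the admin PAT is passed as a bearer token.
      Authorization: `Bearer ${adminToken}`,
    },
    // Assumption: the endpoint accepts an ISO 8601 timestamp string.
    body: JSON.stringify({ expiresAt: expiresAt.toISOString() }),
  };
}
```

Passing `new Date()` as `expiresAt` corresponds to ending the grace window immediately, while a future date extends it. The result can be sent with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.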