Stale Codex transport metadata now auto-normalizes
OpenClaw now self-heals corrupted openai-codex transport settings, preventing Cloudflare 403 errors from breaking CLI and agent workflows.
OpenClaw users running openai-codex models were hitting a wall: requests failed with Cloudflare challenges because persisted transport metadata contained stale endpoints or missing fields left over from earlier versions. The system routed requests to https://chatgpt.com/backend-api/v1 or left the API type undefined, triggering Cloudflare bot detection instead of reaching the actual Codex endpoint.
A two-part fix now normalizes transport metadata at both the resolution and discovery layers. During model resolution, the embedded runner checks for missing api fields or legacy /backend-api/v1 URLs and rewrites them to the canonical openai-codex-responses transport with the correct base URL. Discovery and listing flows now do the same instead of returning early when api was missing, allowing stale configuration rows to self-heal on access.
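The repair described above can be sketched roughly as follows. This is a minimal illustration, not the actual OpenClaw implementation: the interface shape, constants, and the `normalizeCodexTransport` name are all hypothetical stand-ins.

```typescript
// Hypothetical sketch of the normalization step; real field and function
// names in the OpenClaw codebase may differ.
interface TransportMetadata {
  api?: string;
  baseUrl?: string;
}

const CANONICAL_API = "openai-codex-responses";
const CANONICAL_BASE_URL = "https://chatgpt.com/backend-api";
const LEGACY_BASE_URL = "https://chatgpt.com/backend-api/v1";

// Rewrite stale native Codex transport rows to the canonical shape.
// Rows with a missing `api` field or the legacy /backend-api/v1 base URL
// are repaired; anything else is returned unchanged.
function normalizeCodexTransport(meta: TransportMetadata): TransportMetadata {
  const stale = meta.api === undefined || meta.baseUrl === LEGACY_BASE_URL;
  if (!stale) return meta;
  return { ...meta, api: CANONICAL_API, baseUrl: CANONICAL_BASE_URL };
}
```

Running the same repair at both the resolution and discovery seams is what lets stale rows self-heal on access, regardless of which code path touches them first.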
The change lives in the OpenAI Codex provider plugin, the embedded runner model normalization, and the model discovery pipeline. Users affected by the regression no longer need to hand-edit their stored model metadata or add explicit transport overrides to their config.
View Original GitHub Description
Summary
- Problem: stale `openai-codex` model metadata could survive in both runtime resolution and discovery/listing flows, especially when `api` was missing or the stored base URL was `https://chatgpt.com/backend-api/v1`.
- Why it matters: OpenClaw could route Codex requests through broken transport metadata, leading to HTML/Cloudflare failures that were surfaced downstream as misleading provider errors.
- What changed: `openai-codex` now canonicalizes native transport metadata to `api: "openai-codex-responses"` and `baseUrl: "https://chatgpt.com/backend-api"` across both model resolution and discovery normalization, with regression tests for missing-`api` and stale-`/v1` rows.
- What did NOT change (scope boundary): this PR does not change user-facing HTML/DNS error copy; it fixes the stale Codex transport metadata that triggers the broken request path.
Change Type (select all)
- Bug fix
- Feature
- Refactor required for the fix
- Docs
- Security hardening
- Chore/infra
Scope (select all touched areas)
- Gateway / orchestration
- Skills / tool execution
- Auth / tokens
- Memory / storage
- Integrations
- API / contracts
- UI / DX
- CI/CD / infra
Linked Issue/PR
- Closes #66674
- Closes #67131
- Related #66633, #66969
- This PR fixes a bug or regression
Root Cause (if applicable)
- Root cause: `openai-codex` transport normalization only repaired some stale native metadata paths during model resolution, and discovery normalization returned early unless `api` was already a string.
- Missing detection / guardrail: native Codex routes with missing `api` or legacy `https://chatgpt.com/backend-api/v1` were not normalized consistently across both execution and discovery/listing seams.
- Contributing context (if known): users could carry stale agent-scoped `models.json` entries forward from earlier versions, so the bug persisted even after auth and probes looked healthy.
Regression Test Plan (if applicable)
- Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
- Target test or file:
  - `extensions/openai/base-url.test.ts`
  - `extensions/openai/openai-codex-provider.test.ts`
  - `src/agents/pi-embedded-runner/model.test.ts`
  - `src/agents/pi-model-discovery.auth.test.ts`
  - manual CLI validation via `openclaw infer model run`
- Scenario the test should lock in: stale `openai-codex` rows with missing `api` or `/backend-api/v1` normalize to the canonical native Codex transport in both runtime resolution and discovery/listing flows, and the live CLI path succeeds with local Codex OAuth credentials.
- Why this is the smallest reliable guardrail: the bug lives in normalization seams, so focused provider + model/discovery tests catch it cheaply, and a single live CLI run confirms the end-to-end request path.
- Existing test that already covers this (if any): none before this PR.
- If no new test is added, why not: N/A
User-visible / Behavior Changes
- `openai-codex` models with stale native metadata now self-heal during resolution and discovery/listing.
- Agent and CLI flows that depend on those normalized rows will use the canonical Codex transport without requiring manual config overrides.
Diagram (if applicable)
Before:
[stale openai-codex row] -> [missing api or /backend-api/v1] -> [broken Codex request path]
After:
[stale openai-codex row] -> [normalize to openai-codex-responses + /backend-api] -> [canonical Codex transport]
Security Impact (required)
- New permissions/capabilities? (Yes/No) No
- Secrets/tokens handling changed? (Yes/No) No
- New/changed network calls? (Yes/No) No
- Command/tool execution surface changed? (Yes/No) No
- Data access scope changed? (Yes/No) No
- If any Yes, explain risk + mitigation:
Repro + Verification
Environment
- OS: macOS
- Runtime/container: local `pnpm` workspace checkout
- Model/provider: `openai-codex/gpt-5.4`
- Integration/channel (if any): embedded runner model resolution + discovery/listing seams, plus local CLI infer path
- Relevant config (redacted): stale native Codex metadata represented as missing `api` or `https://chatgpt.com/backend-api/v1`; local `openai-codex:default` OAuth profile present in `~/.openclaw/agents/main/agent/auth-profiles.json`
Steps
- Resolve or discover an `openai-codex` model row with missing `api`.
- Resolve or discover an `openai-codex` model row using `https://chatgpt.com/backend-api/v1`.
- Run the local CLI infer path with temporary agent dirs that contain those stale `models.json` variants and the local Codex OAuth profile:
  `OPENCLAW_AGENT_DIR=<temp-agent-dir> pnpm openclaw infer model run --model openai-codex/gpt-5.4 --prompt 'Reply with exactly OK.' --json`
- Observe the CLI response.
Expected
- Native Codex rows normalize to `api: "openai-codex-responses"` and `baseUrl: "https://chatgpt.com/backend-api"`.
- The live local CLI infer path succeeds for both stale metadata variants.
Actual
- Matches expected in provider normalization, embedded runner resolution, and discovery normalization tests.
- Live CLI validation succeeded for both stale metadata variants and returned `OK`.
Evidence
Attach at least one:
- Failing test/log before + passing after
- Trace/log snippets
- Screenshot/recording
- Perf numbers (if relevant)
Evidence notes:
- Manual live CLI results on this branch:
  - missing `api` case returned: `{ "provider": "openai-codex", "model": "gpt-5.4", "outputs": [{ "text": "OK" }] }`
  - stale `/backend-api/v1` case returned: `{ "provider": "openai-codex", "model": "gpt-5.4", "outputs": [{ "text": "OK" }] }`
Human Verification (required)
What you personally verified (not just CI), and how:
- Verified scenarios:
  - stale `openai-codex` rows with missing `api` normalize correctly in provider tests
  - stale `/backend-api/v1` rows normalize correctly in provider tests
  - embedded runner resolution normalizes both stale cases
  - discovery normalization now repairs the same stale cases instead of returning them unchanged
  - live `pnpm openclaw infer model run --model openai-codex/gpt-5.4` succeeds against local Codex OAuth credentials when the temp agent dir contains a missing-`api` stale row
  - live `pnpm openclaw infer model run --model openai-codex/gpt-5.4` succeeds against local Codex OAuth credentials when the temp agent dir contains a stale `/backend-api/v1` row
- Edge cases checked:
  - canonical `https://chatgpt.com/backend-api` remains recognized
  - `/backend-api/v2` and `/backend-api/codex` remain rejected by the base URL helper
  - custom/non-native routes are not rewritten by these new tests
  - unrelated `memory-lancedb` recall warnings did not block Codex inference during manual verification
- What you did not verify:
  - user-facing DNS/HTML error copy changes (not in scope)
  - gateway/UI-specific flows beyond the local CLI infer path
Review Conversations
- I replied to or resolved every bot review conversation I addressed in this PR.
- I left unresolved only the conversations that still need reviewer or maintainer judgment.
If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.
Compatibility / Migration
- Backward compatible? (Yes/No) Yes
- Config/env changes? (Yes/No) No
- Migration needed? (Yes/No) No
- If yes, exact upgrade steps:
Risks and Mitigations
- Risk: normalization could accidentally rewrite non-native Codex-like routes.
  - Mitigation: normalization stays scoped to known native OpenAI/Codex base URLs, and tests explicitly reject `/backend-api/v2` and `/backend-api/codex`.
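The scoping that backs this mitigation can be sketched as an exact-match allowlist. This is an illustrative stand-in, assuming hypothetical names: the real helper lives in the project's base-url module and may be shaped differently.

```typescript
// Hypothetical sketch of the base-URL guard; name and URL list are
// assumptions, not the actual OpenClaw helper.
const NATIVE_CODEX_BASE_URLS = new Set([
  "https://chatgpt.com/backend-api",    // canonical
  "https://chatgpt.com/backend-api/v1", // legacy, recognized only so it can be repaired
]);

// Only exact, known native base URLs qualify for rewriting; lookalike paths
// such as /backend-api/v2 or /backend-api/codex fall through untouched.
function isNativeCodexBaseUrl(url: string): boolean {
  return NATIVE_CODEX_BASE_URLS.has(url.replace(/\/+$/, ""));
}
```

An exact-match set rather than a prefix check is what keeps `/backend-api/v2` and `/backend-api/codex` out of the rewrite path.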