AAC uploads remapped to M4A for transcription compatibility
Audio uploads in AAC format are now automatically normalized to M4A before hitting OpenAI-compatible transcription endpoints, fixing failures at MIME-sensitive providers.
When users uploaded AAC audio files for transcription, the files kept their original .aac extension instead of being normalized to .m4a. Downstream OpenAI-compatible audio paths often key off filename and container hints, and raw AAC files were being rejected or handled inconsistently at those endpoints.
A new normalization step now catches AAC uploads and remaps them to the widely-accepted M4A container format before the transcription request is built. The fix handles both explicit .aac extensions and cases where the file has no extension but carries an audio/aac MIME type — both now surface as M4A to downstream handlers.
This change lives in the media-understanding package's OpenAI-compatible audio path. No broader audio transcoding or codec conversion logic was modified.
View Original GitHub Description
Summary
Describe the problem and fix in 2–5 bullets:
- Problem: AAC audio uploads could keep an incompatible extension instead of being remapped to a widely accepted M4A container name.
- Why it matters: downstream OpenAI-compatible audio paths reject or mishandle raw AAC uploads more often than M4A-labelled equivalents.
- What changed: remap AAC uploads to M4A in the media-understanding audio normalization path and carry the missing changelog entry.
- What did NOT change (scope boundary): no broader audio transcoding policy or codec conversion logic changed.
- Supersedes #61094.
Change Type (select all)
- Bug fix
- Feature
- Refactor required for the fix
- Docs
- Security hardening
- Chore/infra
Scope (select all touched areas)
- Gateway / orchestration
- Skills / tool execution
- Auth / tokens
- Memory / storage
- Integrations
- API / contracts
- UI / DX
- CI/CD / infra
Linked Issue/PR
- Closes #
- Related #61094
- This PR fixes a bug or regression
Root Cause (if applicable)
- Root cause: the audio normalization path did not translate AAC inputs into the container/extension shape expected by downstream OpenAI-compatible audio handling.
- Missing detection / guardrail: coverage did not lock in AAC-to-M4A remapping for uploads entering the OpenAI-compatible audio path.
- Contributing context (if known): providers often key off filename/container hints even when the underlying audio bytes are otherwise usable.
Regression Test Plan (if applicable)
- Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
- Target test or file: src/media-understanding/openai-compatible-audio.test.ts
- Scenario the test should lock in: AAC inputs are surfaced as M4A for OpenAI-compatible audio handling.
- Why this is the smallest reliable guardrail: the behavior is a local media normalization decision.
- Existing test that already covers this (if any): none.
- If no new test is added, why not: N/A.
User-visible / Behavior Changes
AAC uploads now normalize to M4A where the downstream audio path expects that container label.
Diagram (if applicable)
N/A
Security Impact (required)
- New permissions/capabilities? (
Yes/No) No - Secrets/tokens handling changed? (
Yes/No) No - New/changed network calls? (
Yes/No) No - Command/tool execution surface changed? (
Yes/No) No - Data access scope changed? (
Yes/No) No - If any
Yes, explain risk + mitigation: N/A
Repro + Verification
Environment
- OS: macOS
- Runtime/container: Node 22 /
pnpm test:serial - Model/provider: OpenAI-compatible audio
- Integration/channel (if any): Media understanding
- Relevant config (redacted): targeted fixture/test doubles only
Steps
- Feed an AAC upload into the OpenAI-compatible audio normalization path.
- Run
pnpm test:serial src/media-understanding/openai-compatible-audio.test.ts. - Confirm the surfaced media metadata uses an M4A filename/container shape.
Expected
- AAC uploads normalize to an M4A-labelled file shape before the downstream request is built.
Actual
- Before the fix, AAC could stay in a shape that downstream handlers rejected or handled inconsistently.
Evidence
Attach at least one:
- Failing test/log before + passing after
- Trace/log snippets
- Screenshot/recording
- Perf numbers (if relevant)
Human Verification (required)
What you personally verified (not just CI), and how:
- Verified scenarios:
pnpm test:serial src/media-understanding/openai-compatible-audio.test.ts. - Edge cases checked: AAC inputs entering the OpenAI-compatible audio path.
- What you did not verify: live provider upload behavior outside the targeted unit coverage.
Review Conversations
- I replied to or resolved every bot review conversation I addressed in this PR.
- I left unresolved only the conversations that still need reviewer or maintainer judgment.
Compatibility / Migration
- Backward compatible? (
Yes/No) Yes - Config/env changes? (
Yes/No) No - Migration needed? (
Yes/No) No - If yes, exact upgrade steps: N/A
Risks and Mitigations
- Risk: narrow fix may miss adjacent provider-specific edge cases not covered by the focused regression.
- Mitigation: keep the patch scoped and ship targeted regression coverage for the implicated path only.