A single config.toml gets your local OpenAI Codex CLI talking to GPT-5.4 on apimodels. This page covers the full config, the reasoning-effort variants, how billing works, and a curl sanity check.
In short: set wire_api = "responses" (not "chat"), base_url = "https://apimodels.app/api/v1", and model = "gpt-5.4", then export your API key in the environment variable named by env_key. That's all.
1. Open the apimodels Console and create an sk_… key. New accounts get $1 of free credit.
2. Paste the snippet below into ~/.codex/config.toml (create the file if needed).
3. Export your key, then run codex.
```toml
# ~/.codex/config.toml
model_provider = "apimodels"
model = "gpt-5.4"

[model_providers.apimodels]
name = "apimodels"
base_url = "https://apimodels.app/api/v1"
wire_api = "responses"
env_key = "APIMODELS_API_KEY"
```

```shell
export APIMODELS_API_KEY="sk_…your_key…"
codex
```

| Setting | Value | Why |
|---|---|---|
| wire_api | responses | GPT-5.4 is only served via /v1/responses upstream — chat mode won't reach it. |
| base_url | https://apimodels.app/api/v1 | Our shared /v1 endpoint prefix. |
| model | gpt-5.4 | Or with a suffix: -low / -medium / -high / -xhigh — see the table below. |
| env_key | APIMODELS_API_KEY | Any name — Codex just reads whatever env var you point it at. |
| model_provider | apimodels | Must match the [model_providers.<name>] table key below. |
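Codex only sees the variable named in env_key if it is exported into its environment, and a missing export is behind the 401 listed in the troubleshooting table. A small pre-flight check can catch that before launching. This is a sketch: require_apimodels_key is a hypothetical helper name, and the sk_ prefix test only mirrors the key format shown in the console step, so it warns rather than fails.

```shell
# Hypothetical pre-flight helper: is the variable named by env_key actually exported?
# printenv only sees exported variables, i.e. what a child process like codex receives.
require_apimodels_key() {
  local var="${1:-APIMODELS_API_KEY}" value
  value=$(printenv "$var")

  if [ -z "$value" ]; then
    echo "$var is not exported; run: export $var=\"sk_...\"" >&2
    return 1
  fi

  # Heuristic only: console keys look like sk_..., so warn (but don't fail) otherwise.
  case "$value" in
    sk_*) ;;
    *) echo "warning: $var does not start with sk_" >&2 ;;
  esac
  return 0
}

# Usage: only launch Codex when the key is in the environment.
# require_apimodels_key && codex
```

Pass a different name (for example require_apimodels_key MY_KEY) if you changed env_key.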
Bump reasoning depth by appending a suffix to the model name. All levels share the same per-token price; higher effort just emits more reasoning_tokens (billed as part of output_tokens).
| model | reasoning.effort | Use for |
|---|---|---|
| gpt-5.4 | none | Fast / single-step / simple completions |
| gpt-5.4-low | low | Light reasoning, day-to-day coding |
| gpt-5.4-medium | medium | Multi-step refactors, design tradeoffs |
| gpt-5.4-high | high | Hard debugging, cross-file analysis |
| gpt-5.4-xhigh | xhigh | The hardest problems — give it room to think |
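If you switch effort levels often, a tiny mapping helper keeps the model names aligned with the table above. A sketch only: model_for_effort is a hypothetical shell function, and the usage line relies on Codex's --model flag overriding the model from config.toml for that one run.

```shell
# Hypothetical helper: map a reasoning.effort level to the matching model name.
model_for_effort() {
  case "${1:-none}" in
    none)                  echo "gpt-5.4" ;;
    low|medium|high|xhigh) echo "gpt-5.4-$1" ;;
    *)
      echo "unknown effort '$1' (expected none|low|medium|high|xhigh)" >&2
      return 1
      ;;
  esac
}

# Example: one high-effort session without editing config.toml.
# codex --model "$(model_for_effort high)"
```

To make a level the default instead, put the suffixed name in the model line of config.toml.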
Each call is billed for usage.input_tokens + usage.output_tokens, deducted from your apimodels balance. Pricing for other models is in /docs/llm.
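To see what a single reply counts against your balance, you can pull the usage block out of a saved response. A minimal sketch, assuming jq is installed; billing_summary is a hypothetical name, and the output_tokens_details.reasoning_tokens field follows the OpenAI Responses usage shape. It does not appear in the example reply on this page, so the helper falls back to 0 when it is absent.

```shell
# Hypothetical helper (requires jq): summarize token billing for a /v1/responses reply on stdin.
billing_summary() {
  jq -r '
    .usage as $u
    # reasoning_tokens location is an assumption; default to 0 if the field is missing
    | ($u.output_tokens_details.reasoning_tokens // 0) as $r
    | "input=\($u.input_tokens) output=\($u.output_tokens) (reasoning=\($r)) billed=\($u.input_tokens + $u.output_tokens)"
  '
}

# Usage: curl -s https://apimodels.app/api/v1/responses ... | billing_summary
```

Fed the example reply from the curl check on this page, it prints input=22 output=5 (reasoning=0) billed=27. At -high or -xhigh, expect the reasoning figure to dominate output.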
Before installing Codex, confirm the endpoint and your key work with a single curl:
```shell
curl -s https://apimodels.app/api/v1/responses \
  -H "Authorization: Bearer $APIMODELS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "input": "Reply with exactly: ok",
    "max_output_tokens": 16
  }'
```

Expected response (truncated):
```json
{
  "id": "resp_...",
  "object": "response",
  "model": "gpt-5.4",
  "status": "completed",
  "output": [{
    "type": "message",
    "role": "assistant",
    "content": [{ "type": "output_text", "text": "ok" }]
  }],
  "usage": {
    "input_tokens": 22,
    "output_tokens": 5,
    "total_tokens": 27
  }
}
```

An HTTP 200 with output[0].content[0].text === "ok" means you're good to go.
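The check above reads output[0] directly. With the -low through -xhigh variants, an OpenAI-style Responses reply may place a reasoning item ahead of the message, so collecting every output_text part is a little sturdier. A sketch, assuming jq is installed and replies follow the shape shown above; responses_text and check_ok are hypothetical helper names.

```shell
# Hypothetical helpers (require jq). Both read a /v1/responses JSON body on stdin.

# Concatenate every output_text part from message items, skipping reasoning items.
responses_text() {
  jq -r '[.output[]? | select(.type == "message") | .content[]? | select(.type == "output_text") | .text] | join("")'
}

# Succeed only if the visible reply is exactly "ok" (the sanity-check prompt).
check_ok() {
  local text
  text=$(responses_text) || return 2   # jq failed: body was not valid JSON
  if [ "$text" = "ok" ]; then
    echo "endpoint and key look good"
  else
    # Empty text often means reasoning used up max_output_tokens (see troubleshooting).
    echo "unexpected reply: '$text'" >&2
    return 1
  fi
}

# Usage: curl -s https://apimodels.app/api/v1/responses ... | check_ok
```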
Codex supports both wire_api modes, but GPT-5.4 is served upstream only through /v1/responses, so /chat/completions cannot reach it. Other gpt-* models (gpt-5, gpt-5.1, gpt-5.2) work with either mode; for those you can set wire_api = "chat" and Codex will call /v1/chat/completions. For GPT-5.4 it must be "responses".
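Because a stray wire_api = "chat" only shows up as a 404 at request time, a quick local check can catch it earlier. The sketch below is a naive line-based reader, not a real TOML parser: it assumes the one-provider layout of the config above, with each key = "value" pair on its own line. check_codex_config and _toml_str_value are hypothetical names.

```shell
# Naive line-based lookup (not a TOML parser): print the first quoted value for a key,
# e.g. 'wire_api = "responses"' -> responses. Commented-out lines are ignored.
_toml_str_value() {
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$2" | head -n 1
}

# Hypothetical check: GPT-5.4 models must use wire_api = "responses".
check_codex_config() {
  local cfg="${1:-$HOME/.codex/config.toml}" model wire
  [ -r "$cfg" ] || { echo "cannot read $cfg" >&2; return 2; }

  model=$(_toml_str_value model "$cfg")
  wire=$(_toml_str_value wire_api "$cfg")

  case "$model" in
    gpt-5.4|gpt-5.4-*)
      if [ "$wire" != "responses" ]; then
        echo "$model needs wire_api = \"responses\" (found: ${wire:-unset})" >&2
        return 1
      fi
      ;;
  esac
  echo "ok: model=${model:-unset} wire_api=${wire:-unset}"
}
```

Other gpt-* models pass regardless of wire_api, matching the note that they work with either mode.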
| Symptom | Cause / fix |
|---|---|
| HTTP 401 Invalid or missing API key | Env var not exported, or the key was disabled. Re-export APIMODELS_API_KEY=… or mint a new key in the console. |
| HTTP 400 Unknown model: gpt-5.4-foo | Only the five gpt-5.4[-low/-medium/-high/-xhigh] variants are valid — check for typos. |
| 404 / endpoint not found | Most often wire_api is set to "chat" — Codex then calls /chat/completions, which GPT-5.4 upstream doesn't serve. Set wire_api = "responses". |
| Empty reply or reasoning-only with no visible text | max_output_tokens too small — reasoning tokens ate the budget. Leave several hundred tokens for high/xhigh. |
| Bill higher than expected | output_tokens includes reasoning_tokens — at -high / -xhigh these can be many times the visible output. Pick the lowest effort that meets your quality bar. |