A single config.toml gets the local OpenAI Codex CLI talking to apimodels GPT-5.4 / GPT-5.5. This page shows the full config, reasoning-effort control, pricing, and a curl sanity check.
Use wire_api = "responses" (not chat), base_url = "https://apimodels.app/api/v1", and model = "gpt-5.4" (or "gpt-5.5"); set reasoning depth via model_reasoning_effort, and put your API key in the environment variable named by env_key. Done.
Open the apimodels console and create an sk_… key.
Paste the snippet below into ~/.codex/config.toml (create the file if needed).
Export your key, then just run codex.
~/.codex/config.toml
# ~/.codex/config.toml — drop-in, paste as-is
model_provider = "apimodels"
model = "gpt-5.4" # or "gpt-5.5"
review_model = "gpt-5.5" # model used by /review
model_reasoning_effort = "medium" # low | medium | high | xhigh
disable_response_storage = true
[model_providers.apimodels]
name = "apimodels"
base_url = "https://apimodels.app/api/v1"
wire_api = "responses"
env_key = "APIMODELS_API_KEY"

Shell
# put your apimodels key in the env var the config points at,
# then launch codex (add this line to ~/.zshrc to make it stick):
export APIMODELS_API_KEY="sk_…your_key…"
codex

| Setting | Value | Why |
|---|---|---|
| wire_api | responses | Codex's native mode (recommended). gpt-5-4 / gpt-5-5 now work over both chat and responses, but Codex runs best on responses (native reasoning + multi-turn tool state). |
| base_url | https://apimodels.app/api/v1 | Our shared /v1 endpoint prefix. |
| model | gpt-5.4 / gpt-5.5 | Use Codex's dot names (gpt-5.4 / gpt-5.5); the dash forms gpt-5-4 / gpt-5-5 also work. |
| review_model | gpt-5.5 | Model used by the /review command. Optional — defaults to model above. |
| model_reasoning_effort | low / medium / high / xhigh | Codex turns this into the request-body reasoning.effort field — see below. |
| env_key | APIMODELS_API_KEY | Any name — Codex just reads whatever env var you point it at. |
| model_provider | apimodels | Must match the [model_providers.<name>] table key below. |
Reasoning depth is the reasoning.effort field in the request body (Codex sets it via model_reasoning_effort) — no longer a model-name suffix. All levels share the same per-token price; higher effort just emits more reasoning_tokens (billed as part of output_tokens). When calling the API directly, pass "reasoning": { "effort": "high" }.
| reasoning.effort | Use for |
|---|---|
| low | Fast / single-step / simple completions (default) |
| medium | Multi-step refactors, design tradeoffs |
| high | Hard debugging, cross-file analysis |
| xhigh | The hardest problems — give it room to think |
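For direct API calls, the effort level is simply a field in the JSON body. As a rough illustration, the Python sketch below builds such a body. `build_request` is a hypothetical helper, not an SDK function. The field names are taken from the curl example on this page, and the list of valid levels is the one in the table above.

```python
import json

# Effort levels listed on this page.
VALID_EFFORTS = ("low", "medium", "high", "xhigh")


def build_request(prompt, model="gpt-5.4", effort="low", max_output_tokens=None):
    """Build a request body for POST /responses (hypothetical helper)."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    body = {
        "model": model,
        "input": prompt,
        "reasoning": {"effort": effort},  # replaces the old model-name suffix
    }
    # Only send a cap when the caller asked for one.
    if max_output_tokens is not None:
        body["max_output_tokens"] = max_output_tokens
    return body


body = build_request("Explain this stack trace", effort="high")
print(json.dumps(body, indent=2))
```

Validating the level on the client side surfaces a typo before the request is sent and billed.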
Output tokens include reasoning tokens. Each call is billed against your apimodels balance. Pricing for other models is in /docs/llm.
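To see how much of a bill comes from reasoning, the output count can be split into reasoning and visible tokens. The sketch below assumes the reasoning count is reported at `usage.output_tokens_details.reasoning_tokens`, as in OpenAI's Responses API. This page does not confirm that apimodels returns that field, so a missing field is treated as zero. The numbers are made up for illustration.

```python
def split_output_tokens(usage):
    """Split output_tokens into (reasoning, visible) counts.

    Assumes reasoning tokens are reported at
    usage["output_tokens_details"]["reasoning_tokens"]; a missing
    field is treated as zero.
    """
    output = usage.get("output_tokens", 0)
    details = usage.get("output_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    return reasoning, output - reasoning


# Made-up numbers for illustration only.
usage = {
    "input_tokens": 22,
    "output_tokens": 905,
    "output_tokens_details": {"reasoning_tokens": 896},
}
reasoning, visible = split_output_tokens(usage)
print(f"reasoning={reasoning} visible={visible}")
```

Both parts are billed at the output rate, so a large reasoning share is the usual reason a short answer costs more than expected.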
Before installing Codex, confirm the endpoint and your key work with a single curl:
curl -s https://apimodels.app/api/v1/responses \
-H "Authorization: Bearer $APIMODELS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.4",
"input": "Reply with exactly: ok",
"reasoning": { "effort": "low" },
"max_output_tokens": 16
}'

Expected response (truncated):
{
"id": "resp_...",
"object": "response",
"model": "gpt-5.4",
"status": "completed",
"output": [{
"type": "message",
"role": "assistant",
"content": [{ "type": "output_text", "text": "ok" }]
}],
"usage": {
"input_tokens": 22,
"output_tokens": 5,
"total_tokens": 27
}
}

HTTP 200 with output[].content[0].text === "ok" means you're good to go.
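When scripting the same check, the visible text can be pulled out of the response shape shown above. A minimal Python sketch follows. `output_text` is a hypothetical helper, and the sample is the truncated response from this page, so a real response may carry additional fields or output items.

```python
import json

# The truncated expected response from this page.
SAMPLE = '''
{
  "id": "resp_...",
  "object": "response",
  "model": "gpt-5.4",
  "status": "completed",
  "output": [{
    "type": "message",
    "role": "assistant",
    "content": [{ "type": "output_text", "text": "ok" }]
  }],
  "usage": { "input_tokens": 22, "output_tokens": 5, "total_tokens": 27 }
}
'''


def output_text(response):
    """Concatenate every output_text part found in message items."""
    chunks = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip any non-message items
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                chunks.append(part.get("text", ""))
    return "".join(chunks)


response = json.loads(SAMPLE)
print(output_text(response))  # prints "ok"
```

Walking the whole `output` list instead of indexing `output[0]` keeps the check working if the message is not the first item.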
gpt-5-4 / gpt-5-5 now work over either wire format, chat or responses. For Codex use wire_api = "responses": it is Codex's native mode and handles reasoning and multi-turn tool calls most cleanly. If your client only speaks OpenAI chat format, wire_api = "chat" against /v1/chat/completions reaches these models too. For ordinary chat models (MiniMax-M2.5, grok-4.2, Claude, Gemini, …) just use chat.
| Symptom | Cause / fix |
|---|---|
| HTTP 401 Invalid or missing API key | Env var not exported, or the key was disabled. Re-export APIMODELS_API_KEY=… or mint a new key in the console. |
| HTTP 400 Unknown model | Model name typo. Use gpt-5.4 / gpt-5.5 (dots, Codex default); gpt-5-4 / gpt-5-5 also work. |
| 404 / endpoint not found | Wrong base_url — it must be https://apimodels.app/api/v1 (include /api/v1, no trailing slash). Codex appends /responses itself. |
| Empty reply or reasoning-only with no visible text | max_output_tokens too small — reasoning tokens ate the budget. Leave several hundred tokens for high/xhigh. |
| Bill higher than expected | output_tokens includes reasoning_tokens — at effort high / xhigh these can be many times the visible output. Pick the lowest effort that meets your quality bar. |
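The table above can also be folded into a small triage function for scripts that call the endpoint. This is only a sketch: `diagnose` is a hypothetical helper, the hints paraphrase the table, and a given status code may have causes that the table does not list.

```python
def diagnose(status_code, visible_text=""):
    """Map an HTTP status (and the visible reply text) to a hint from the table."""
    if status_code == 401:
        return "Check that APIMODELS_API_KEY is exported and the key is still enabled."
    if status_code == 400:
        return "Possibly a model name typo: use gpt-5.4 / gpt-5.5 (or gpt-5-4 / gpt-5-5)."
    if status_code == 404:
        return "Check base_url: https://apimodels.app/api/v1 with no trailing slash."
    if status_code == 200:
        if not visible_text.strip():
            # Reasoning tokens may have used up the whole output budget.
            return "No visible text: raise max_output_tokens."
        return "ok"
    return f"Status {status_code} is not covered by the table."


for code in (401, 404, 200):
    print(code, diagnose(code))
```

Passing the visible text in separately keeps the function independent of how the response body was parsed.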