GPT-5.4 / GPT-5.5 · tunable reasoning · pay-per-token

Use apimodels in OpenAI Codex

A single config.toml gets the local OpenAI Codex CLI talking to apimodels GPT-5.4 / GPT-5.5. This page shows the full config, reasoning-effort control, pricing, and a curl sanity check.

TL;DR

Set wire_api = "responses" (Codex's native mode, preferred over chat), base_url = "https://apimodels.app/api/v1", and model = "gpt-5.4" (or "gpt-5.5"). Control reasoning depth with model_reasoning_effort, and export your API key in the env var named by env_key. Done.

Get running in 3 steps

1. Grab an API key
Open the apimodels console and create an sk_… key.
2. Drop the config
Paste the snippet below into ~/.codex/config.toml (create the file if needed).
3. Export + run
Export your key, then run codex.

Full config

~/.codex/config.toml

# ~/.codex/config.toml — drop-in, paste as-is
model_provider           = "apimodels"
model                    = "gpt-5.4"          # or "gpt-5.5"
review_model             = "gpt-5.5"          # model used by /review
model_reasoning_effort   = "medium"           # low | medium | high | xhigh
disable_response_storage = true

[model_providers.apimodels]
name     = "apimodels"
base_url = "https://apimodels.app/api/v1"
wire_api = "responses"
env_key  = "APIMODELS_API_KEY"

Shell

# put your apimodels key in the env var the config points at, then launch codex
# (add the export line to ~/.zshrc to make it persist across shells):
export APIMODELS_API_KEY="sk_…your_key…"
codex

Settings reference

Setting	Value	Why
wire_api	responses	Codex's native mode (recommended). gpt-5.4 / gpt-5.5 now work over both chat and responses, but Codex runs best on responses (native reasoning + multi-turn tool state).
base_url	https://apimodels.app/api/v1	Our shared /v1 endpoint prefix.
model	gpt-5.4 / gpt-5.5	Use Codex's dot names (gpt-5.4 / gpt-5.5); the dash forms gpt-5-4 / gpt-5-5 also work.
review_model	gpt-5.5	Model used by the /review command. Optional — defaults to model above.
model_reasoning_effort	low / medium / high / xhigh	Codex turns this into the request-body reasoning.effort field — see below.
env_key	APIMODELS_API_KEY	Any name — Codex just reads whatever env var you point it at.
model_provider	apimodels	Must match the [model_providers.<name>] table key below.

Reasoning effort

Reasoning depth is the reasoning.effort field in the request body (Codex sets it via model_reasoning_effort) — no longer a model-name suffix. All levels share the same per-token price; higher effort just emits more reasoning_tokens (billed as part of output_tokens). When calling the API directly, pass "reasoning": { "effort": "high" }.

reasoning.effort	Use for
low	Fast / single-step / simple completions (default)
medium	Multi-step refactors, design tradeoffs
high	Hard debugging, cross-file analysis
xhigh	The hardest problems — give it room to think
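
For direct API calls, the effort level travels inside the request body rather than in the model name. The sketch below builds such a body and rejects levels that are not in the table above. The helper name build_responses_body is hypothetical; the field names (model, input, reasoning.effort, max_output_tokens) are the ones used in the curl example on this page.

```python
from typing import Optional

# Effort levels listed in the table above.
ALLOWED_EFFORTS = ("low", "medium", "high", "xhigh")


def build_responses_body(
    model: str,
    prompt: str,
    effort: str = "low",
    max_output_tokens: Optional[int] = None,
) -> dict:
    """Build a /responses request body with reasoning.effort set.

    Hypothetical helper for illustration only.
    """
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"effort must be one of {ALLOWED_EFFORTS}, got {effort!r}")

    body = {
        "model": model,
        "input": prompt,
        "reasoning": {"effort": effort},  # depth is a body field, not a model suffix
    }
    # Only include the cap when the caller asked for one.
    if max_output_tokens is not None:
        body["max_output_tokens"] = max_output_tokens
    return body
```

Serialize the result with json.dumps and POST it to the /responses endpoint. Since reasoning tokens count toward output tokens, a tight max_output_tokens combined with high or xhigh may leave little room for visible text.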

Pricing

Model	Input	Cached	Output
GPT-5.4	$1.50	$0.25	$9.00
GPT-5.5	$3.00	$0.50	$18.00

All prices are per 1M tokens.

Output tokens include reasoning tokens. Each call is billed against your apimodels balance. Pricing for other models is in /docs/llm.
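
To see how these rates turn into a per-call cost, here is a rough estimator using the prices listed above. It is a sketch under one labeled assumption: cached tokens are treated as a subset of input tokens and billed at the cached rate instead of the input rate. The function name estimate_cost is hypothetical, and actual billing may differ.

```python
# Dollars per 1M tokens, copied from the pricing listed above.
PRICES = {
    "gpt-5.4": {"input": 1.5, "cached": 0.25, "output": 9.0},
    "gpt-5.5": {"input": 3.0, "cached": 0.5, "output": 18.0},
}


def estimate_cost(
    model: str, input_tokens: int, output_tokens: int, cached_tokens: int = 0
) -> float:
    """Estimate the dollar cost of one call.

    Assumption (not confirmed by this page): cached_tokens is a subset of
    input_tokens and is billed at the cached rate instead of the input rate.
    output_tokens already includes reasoning tokens.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")

    price = PRICES[model]
    fresh_input = input_tokens - cached_tokens
    total_per_million = (
        fresh_input * price["input"]
        + cached_tokens * price["cached"]
        + output_tokens * price["output"]
    )
    return total_per_million / 1_000_000
```

For example, a gpt-5.4 call with 22 input tokens and 5 output tokens works out to (22 × 1.5 + 5 × 9) / 1,000,000 = $0.000078. Because output is the most expensive rate and includes reasoning tokens, the effort level is usually the main lever on cost.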

30-second sanity check

Before installing Codex, confirm the endpoint and your key work with a single curl:

curl -s https://apimodels.app/api/v1/responses \
  -H "Authorization: Bearer $APIMODELS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "input": "Reply with exactly: ok",
    "reasoning": { "effort": "low" },
    "max_output_tokens": 16
  }'

Expected response (truncated):

{
  "id": "resp_...",
  "object": "response",
  "model": "gpt-5.4",
  "status": "completed",
  "output": [{
    "type": "message",
    "role": "assistant",
    "content": [{ "type": "output_text", "text": "ok" }]
  }],
  "usage": {
    "input_tokens": 22,
    "output_tokens": 5,
    "total_tokens": 27
  }
}

HTTP 200 with output[].content[0].text === "ok" means you're good to go.
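
The same check can be scripted. The sketch below mirrors the curl request using only the Python standard library and pulls the visible text out of a response shaped like the sample above. The function names extract_output_text and sanity_check are hypothetical, and the parsing assumes the response follows the sample's structure.

```python
import json
import urllib.request

BASE_URL = "https://apimodels.app/api/v1"


def extract_output_text(response: dict) -> str:
    """Concatenate all output_text parts from message items in a response."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip any non-message output items
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)


def sanity_check(api_key: str) -> bool:
    """Send the same request as the curl example; True if the reply is "ok"."""
    body = {
        "model": "gpt-5.4",
        "input": "Reply with exactly: ok",
        "reasoning": {"effort": "low"},
        "max_output_tokens": 16,
    }
    request = urllib.request.Request(
        f"{BASE_URL}/responses",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx (e.g. 401 for a bad key).
    with urllib.request.urlopen(request, timeout=30) as resp:
        payload = json.load(resp)
    return extract_output_text(payload).strip() == "ok"
```

Call sanity_check(os.environ["APIMODELS_API_KEY"]) after exporting the key. An HTTPError carries the status code, which maps onto the rows in the troubleshooting table.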

responses or chat?

gpt-5-4 / gpt-5-5 now work either way. For Codex use wire_api = "responses" — its native mode, which handles reasoning and multi-turn tool calls most cleanly. If your client only speaks OpenAI chat format, wire_api = "chat" against /v1/chat/completions reaches these models too. For ordinary chat models (MiniMax-M2.5, grok-4.2, Claude, Gemini, …) just use chat.

Troubleshooting

Symptom	Cause / fix
HTTP 401 Invalid or missing API key	Env var not exported, or the key was disabled. Re-export APIMODELS_API_KEY=… or mint a new key in the console.
HTTP 400 Unknown model	Model name typo. Use gpt-5.4 / gpt-5.5 (dots, Codex default); gpt-5-4 / gpt-5-5 also work.
404 / endpoint not found	Wrong base_url — it must be https://apimodels.app/api/v1 (include /api/v1, no trailing slash). Codex appends /responses itself.
Empty reply or reasoning-only with no visible text	max_output_tokens too small — reasoning tokens ate the budget. Leave several hundred tokens for high/xhigh.
Bill higher than expected	output_tokens includes reasoning_tokens — at effort high / xhigh these can be many times the visible output. Pick the lowest effort that meets your quality bar.
