GPT-5.4 · tunable reasoning · pay-per-token

Use apimodels in OpenAI Codex

A single config.toml gets the local OpenAI Codex CLI talking to apimodels GPT-5.4. This page shows the full config, the reasoning-effort variants, pricing, and a curl sanity check.

TL;DR

Use wire_api = "responses" (not chat), base_url = "https://apimodels.app/api/v1", model = "gpt-5.4", and export your API key as APIMODELS_API_KEY.

Get running in 3 steps

1. Grab an API key
   Open the Console and create an sk_… key. New accounts get $1 of free credit.
2. Drop the config
   Paste the snippet below into ~/.codex/config.toml (create the file if needed).
3. Export + run
   Export your key, then run codex.

Full config

~/.codex/config.toml

# ~/.codex/config.toml
model_provider = "apimodels"
model          = "gpt-5.4"

[model_providers.apimodels]
name     = "apimodels"
base_url = "https://apimodels.app/api/v1"
wire_api = "responses"
env_key  = "APIMODELS_API_KEY"

Shell

export APIMODELS_API_KEY="sk_…your_key…"
codex

Settings reference

Setting	Value	Why
wire_api	responses	GPT-5.4 is only served via /v1/responses upstream — chat mode won't reach it.
base_url	https://apimodels.app/api/v1	Our shared /v1 endpoint prefix.
model	gpt-5.4	Or with a suffix: -low / -medium / -high / -xhigh — see the table below.
env_key	APIMODELS_API_KEY	Any name — Codex just reads whatever env var you point it at.
model_provider	apimodels	Must match the key of the [model_providers.<name>] table in config.toml.

Reasoning effort

Bump reasoning depth by appending a suffix to the model name. All levels share the same per-token price; higher effort just emits more reasoning_tokens (billed as part of output_tokens).

model	reasoning.effort	Use for
gpt-5.4	none	Fast / single-step / simple completions
gpt-5.4-low	low	Light reasoning, day-to-day coding
gpt-5.4-medium	medium	Multi-step refactors, design tradeoffs
gpt-5.4-high	high	Hard debugging, cross-file analysis
gpt-5.4-xhigh	xhigh	The hardest problems — give it room to think
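
The suffix scheme in the table can be sketched as a small lookup. This is an illustration only: model_for_effort is a hypothetical helper name, and the mapping is taken directly from the rows above.

```python
BASE_MODEL = "gpt-5.4"
# Effort levels from the table above; "none" maps to the bare model name.
EFFORTS = ("none", "low", "medium", "high", "xhigh")


def model_for_effort(effort: str) -> str:
    """Return the model name to put in config.toml for a given reasoning effort."""
    if effort not in EFFORTS:
        raise ValueError(f"unknown effort {effort!r}; expected one of {EFFORTS}")
    return BASE_MODEL if effort == "none" else f"{BASE_MODEL}-{effort}"
```

For example, model_for_effort("high") returns "gpt-5.4-high", which is the value to set as model in config.toml.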

Pricing

Input: $0.552 / 1M tokens

Output (incl. reasoning tokens): $4.412 / 1M tokens

Each call is billed against your apimodels balance: usage.input_tokens at the input rate plus usage.output_tokens at the output rate. Pricing for other models is in /docs/llm.
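
As a worked example of the billing arithmetic, the sketch below estimates the charge for one call from its usage block. It assumes the two per-1M-token prices listed in this section and no other fees or rounding rules; estimate_cost_usd is a hypothetical helper name.

```python
# Prices in USD per 1M tokens, copied from the Pricing section.
INPUT_USD_PER_M = 0.552
OUTPUT_USD_PER_M = 4.412  # output_tokens already includes reasoning tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the charge in USD for one call from its usage counts."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000
```

With the usage from the sanity-check response on this page (22 input tokens, 5 output tokens), estimate_cost_usd(22, 5) comes to roughly $0.000034.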

30-second sanity check

Before installing Codex, confirm the endpoint and your key work with a single curl:

curl -s https://apimodels.app/api/v1/responses \
  -H "Authorization: Bearer $APIMODELS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "input": "Reply with exactly: ok",
    "max_output_tokens": 16
  }'

Expected response (truncated):

{
  "id": "resp_...",
  "object": "response",
  "model": "gpt-5.4",
  "status": "completed",
  "output": [{
    "type": "message",
    "role": "assistant",
    "content": [{ "type": "output_text", "text": "ok" }]
  }],
  "usage": {
    "input_tokens": 22,
    "output_tokens": 5,
    "total_tokens": 27
  }
}

HTTP 200 with output[0].content[0].text === "ok" means you're good to go.
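
The same check can be scripted. Below is a minimal sketch using only the Python standard library; the endpoint, headers, and body mirror the curl command above, and the response parsing assumes the shape shown in the expected response. build_request and reply_text are hypothetical helper names.

```python
import json
import os
import urllib.request

BASE_URL = "https://apimodels.app/api/v1"


def build_request(api_key: str) -> urllib.request.Request:
    """Build the same POST /responses request as the curl command."""
    body = {
        "model": "gpt-5.4",
        "input": "Reply with exactly: ok",
        "max_output_tokens": 16,
    }
    return urllib.request.Request(
        f"{BASE_URL}/responses",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def reply_text(response: dict) -> str:
    """Extract output[0].content[0].text from a parsed response."""
    return response["output"][0]["content"][0]["text"]


if __name__ == "__main__":
    key = os.environ.get("APIMODELS_API_KEY")
    if not key:
        print("APIMODELS_API_KEY is not set")
    else:
        # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
        with urllib.request.urlopen(build_request(key), timeout=30) as resp:
            print(resp.status, reply_text(json.load(resp)))
```

Run it with the key exported; a non-2xx status surfaces as an exception whose code can be matched against the Troubleshooting table.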

Why not wire_api = "chat"?

Codex supports both wire_api modes, but GPT-5.4 specifically is only routed via /v1/responses upstream — /chat/completions can't reach it. Other gpt-* models (gpt-5, gpt-5.1, gpt-5.2) work with either; for those you can use wire_api = "chat" against /v1/chat/completions. For GPT-5.4 it must be responses.
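
The routing rule can be illustrated with a short sketch that maps a wire_api value to the URL a client would call and rejects the one combination this page rules out. The paths follow from base_url plus the /responses and /chat/completions suffixes named above; endpoint_for is a hypothetical helper, not a Codex function.

```python
BASE_URL = "https://apimodels.app/api/v1"
# URL path used by each wire_api mode.
PATHS = {"responses": "/responses", "chat": "/chat/completions"}


def endpoint_for(model: str, wire_api: str) -> str:
    """Return the full endpoint URL for a model and wire_api mode."""
    if wire_api not in PATHS:
        raise ValueError(f"unknown wire_api {wire_api!r}")
    # GPT-5.4 and its effort variants are only reachable via /responses.
    if wire_api == "chat" and model.startswith("gpt-5.4"):
        raise ValueError(f'{model} needs wire_api = "responses"')
    return BASE_URL + PATHS[wire_api]
```

For instance, endpoint_for("gpt-5.2", "chat") yields the /chat/completions URL, while endpoint_for("gpt-5.4", "chat") raises an error.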

Troubleshooting

Symptom	Cause / fix
HTTP 401 Invalid or missing API key	Env var not exported, or the key was disabled. Re-export APIMODELS_API_KEY=… or mint a new key in the console.
HTTP 400 Unknown model: gpt-5.4-foo	Only the five gpt-5.4[-low/-medium/-high/-xhigh] variants are valid — check for typos.
404 / endpoint not found	Most often wire_api is set to "chat" — Codex then calls /chat/completions, which GPT-5.4 upstream doesn't serve. Set wire_api = "responses".
Empty reply or reasoning-only with no visible text	max_output_tokens too small — reasoning tokens ate the budget. Leave several hundred tokens for high/xhigh.
Bill higher than expected	output_tokens includes reasoning_tokens — at -high / -xhigh these can be many times the visible output. Pick the lowest effort that meets your quality bar.
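
For scripted checks, the HTTP rows of this table can be folded into a lookup. This is a rough sketch: hint_for is a hypothetical helper, the hints only paraphrase the table, and a given status code may have causes not listed here.

```python
# Status code -> most likely cause, paraphrased from the Troubleshooting table.
HINTS = {
    401: "Key missing or disabled: re-export APIMODELS_API_KEY or create a new key.",
    400: "Unknown model: use gpt-5.4 or its -low/-medium/-high/-xhigh variants.",
    404: 'Endpoint not found: set wire_api = "responses".',
}


def hint_for(status: int) -> str:
    """Return the table's most likely fix for an HTTP status code."""
    return HINTS.get(status, "No hint for this status; inspect the response body.")
```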
