Audio TTS API

Access ElevenLabs and Minimax text-to-speech through a unified API -- multilingual, expressive, and high-quality audio synthesis.

Quick Start

1. Get your API key from the Console
2. Choose an audio provider below
3. POST to create a TTS task
4. Poll GET to retrieve the audio URL

Authentication

Add an Authorization header to every request:

Authorization: Bearer YOUR_API_KEY

Endpoints

POST /api/v1/audio/tts

Create a TTS task

GET /api/v1/audio/tts?task_id=xxx

Query task status and get audio URL

API Reference

ElevenLabs

Industry-leading text-to-speech from ElevenLabs. Ultra-low latency, 70+ languages, and highly expressive voice synthesis.

Models

- ElevenLabs TTS Flash (eleven-tts-flash): $0.10/1K chars. Ultra-fast, 32 languages
- ElevenLabs TTS Turbo (eleven-tts-turbo): $0.10/1K chars. Low latency, 32 languages
- ElevenLabs TTS Multilingual (eleven-tts-multilingual): $0.20/1K chars. High quality, 29 languages
- ElevenLabs TTS v3 (eleven-tts-v3): $0.20/1K chars. Most expressive, 70+ languages
- ElevenLabs Dialogue (eleven-dialogue): $0.20/1K chars. Multi-speaker dialogue (async task)
- Voice Isolator (eleven-isolator): $0.24/min. Voice isolation / denoise (async task, billed by input length)
- AI Dubbing (eleven-dubbing): $0.75/min. Dubbing translation, 29 languages, no watermark (async task)
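
The per-unit prices above can be turned into a rough cost estimate. The following is a minimal sketch, not an official calculator: PRICING and estimateCost are hypothetical names, the figures are copied from the model list above, and the code assumes billing is strictly proportional to usage with no rounding or minimum charge, which this page does not state.

```javascript
// Prices copied from the model list above.
// unit 'chars'   -> USD per 1,000 characters of text
// unit 'minutes' -> USD per minute of audio
const PRICING = {
  'eleven-tts-flash': { unit: 'chars', usd: 0.10 },
  'eleven-tts-turbo': { unit: 'chars', usd: 0.10 },
  'eleven-tts-multilingual': { unit: 'chars', usd: 0.20 },
  'eleven-tts-v3': { unit: 'chars', usd: 0.20 },
  'eleven-dialogue': { unit: 'chars', usd: 0.20 },
  'eleven-isolator': { unit: 'minutes', usd: 0.24 },
  'eleven-dubbing': { unit: 'minutes', usd: 0.75 },
};

// Assumption: cost scales linearly with usage (no rounding, no minimum charge).
// amount is a character count for 'chars' models and minutes for 'minutes' models.
function estimateCost(model, amount) {
  const price = PRICING[model];
  if (!price) throw new Error(`Unknown model: ${model}`);
  const billedUnits = price.unit === 'chars' ? amount / 1000 : amount;
  return billedUnits * price.usd;
}
```

For example, estimateCost('eleven-tts-v3', 2500) prices 2,500 characters at the $0.20/1K rate, which comes to about $0.50.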

Parameters

- model (string, required): eleven-tts-flash / eleven-tts-turbo / eleven-tts-multilingual / eleven-tts-v3
- text (string, required): The text content to convert to speech
- voice_id (string, required): Voice ID to use for synthesis. See popular voices below.
- language_code (string): Language hint, e.g. "en", "zh", "ja". Improves accuracy for multilingual models.
- callback_url (string): Webhook URL called when the task completes
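
The parameter rules above can be checked on the client before a request is sent. This is a minimal sketch under stated assumptions: buildTtsRequest and TTS_MODELS are hypothetical names, and the checks only mirror the required/optional markers listed here; they are a convenience, not a description of the server's own validation.

```javascript
// Model ids accepted by the TTS endpoint, per the parameter list above.
const TTS_MODELS = [
  'eleven-tts-flash',
  'eleven-tts-turbo',
  'eleven-tts-multilingual',
  'eleven-tts-v3',
];

// Builds the JSON body for POST /api/v1/audio/tts.
// Required fields are validated; optional fields are included only when set.
function buildTtsRequest({ model, text, voice_id, language_code, callback_url }) {
  if (!TTS_MODELS.includes(model)) {
    throw new Error(`model must be one of: ${TTS_MODELS.join(', ')}`);
  }
  if (typeof text !== 'string' || text.length === 0) {
    throw new Error('text is required');
  }
  if (typeof voice_id !== 'string' || voice_id.length === 0) {
    throw new Error('voice_id is required');
  }
  const body = { model, text, voice_id };
  if (language_code) body.language_code = language_code;
  if (callback_url) body.callback_url = callback_url;
  return body;
}
```

Failing early on the client saves a round trip that would otherwise end in a 400 response.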

Popular Voice IDs

- Sarah (EXAVITQu4vr4xnSDxMaL): Mature, reassuring female (default)
- Bella (hpp4J3VqNfWAUOO0d1Us): Professional, bright female
- Adam (pNInz6obpgDQGcFmaJgB): Deep male
- George (JBFqnCBsd6RMkjVDRZzb): Warm British storyteller
- Jessica (cgSgspJ2msm6clMCkdW9): Playful, bright female
- Daniel (onwK4e9ZLuTAKqWW03F9): Steady British broadcaster

Notes

- eleven-tts-flash is recommended for real-time applications
- eleven-tts-v3 supports audio tags for emotional control
- Get the full live voice list from GET /api/v1/voices; do not use voice_ids from other systems or outdated lists
- Voice cloning: POST /api/v1/audio/voices/eleven with {name, sample_url} (1-3 minutes of clean speech, ≤10MB) returns a stable cv_-prefixed voice_id that can be used directly in TTS. Cloning is free; you pay only for synthesis. GET on the same path lists your clones, and DELETE /api/v1/audio/voices/eleven/{voice_id} removes one
- Dialogue, Isolator, and Dubbing use the async task endpoint POST /api/v1/audio/generations (then poll GET ?task_id=). Dialogue takes inputs: [{text, voice_id}] (preset voices only; cv_ voices are not yet supported). Isolator takes audio_url. Dubbing takes source_url and target_lang (e.g. "zh", "en", "ja"); its output is watermark-free, and video input yields a dubbed video
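
The three async task types take differently shaped bodies, which the sketch below spells out. It rests on assumptions: the helper names are hypothetical, and the "model" field carrying the ids from the model list is a guess, since the notes above confirm only inputs, audio_url, source_url, and target_lang.

```javascript
// Assumption: the body sent to POST /api/v1/audio/generations includes a
// "model" field. Only the other field names are confirmed by the notes above.

function buildDialogueBody(inputs) {
  if (!Array.isArray(inputs) || inputs.length === 0) {
    throw new Error('inputs must be a non-empty array');
  }
  for (const { text, voice_id } of inputs) {
    if (!text || !voice_id) throw new Error('each input needs text and voice_id');
    // Cloned voices (cv_ prefix) are not yet supported for dialogue.
    if (String(voice_id).startsWith('cv_')) {
      throw new Error(`cloned voice not supported in dialogue: ${voice_id}`);
    }
  }
  return { model: 'eleven-dialogue', inputs };
}

function buildIsolatorBody(audioUrl) {
  if (!audioUrl) throw new Error('audio_url is required');
  return { model: 'eleven-isolator', audio_url: audioUrl };
}

function buildDubbingBody(sourceUrl, targetLang) {
  if (!sourceUrl || !targetLang) throw new Error('source_url and target_lang are required');
  return { model: 'eleven-dubbing', source_url: sourceUrl, target_lang: targetLang };
}
```

Rejecting cv_ voice ids on the client keeps an unsupported dialogue request from being submitted in the first place.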

Code Example

# Step 1: Create TTS task
curl -X POST https://apimodels.app/api/v1/audio/tts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "eleven-tts-v3",
    "text": "Hello, this is a test of ElevenLabs text-to-speech.",
    "voice_id": "EXAVITQu4vr4xnSDxMaL",
    "language_code": "en"
  }'

# Step 2: Poll status
curl "https://apimodels.app/api/v1/audio/tts?task_id=TASK_ID" \
  -H "Authorization: Bearer YOUR_API_KEY"
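
The same two steps can be sketched in Node.js. This is one possible client, not an official SDK: synthesize, fetchImpl, intervalMs, and maxPolls are hypothetical names, the 4-second default interval is picked from the 3-5 second range this page recommends, and the response fields (data.taskId, data.state, data.result, data.failMsg) follow the Response Format section. The global fetch default requires Node 18 or later.

```javascript
const BASE_URL = 'https://apimodels.app';

// Creates a TTS task, then polls until it completes or fails.
// fetchImpl is injectable so the function can be exercised without network access.
async function synthesize(apiKey, body, options = {}) {
  const { fetchImpl = fetch, intervalMs = 4000, maxPolls = 60 } = options;
  const headers = {
    Authorization: `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  };

  // Step 1: create the task
  const createRes = await fetchImpl(`${BASE_URL}/api/v1/audio/tts`, {
    method: 'POST',
    headers,
    body: JSON.stringify(body),
  });
  if (!createRes.ok) throw new Error(`Create failed with HTTP ${createRes.status}`);
  const { taskId } = (await createRes.json()).data;

  // Step 2: poll the task status
  const statusUrl = `${BASE_URL}/api/v1/audio/tts?task_id=${encodeURIComponent(taskId)}`;
  for (let i = 0; i < maxPolls; i++) {
    const res = await fetchImpl(statusUrl, { headers });
    if (!res.ok) throw new Error(`Poll failed with HTTP ${res.status}`);
    const { data } = await res.json();
    // The Failed Response example on this page still shows "code": 200,
    // so the outcome is decided by data.state rather than by code.
    if (data.state === 'completed') return data.result;
    if (data.state === 'failed') throw new Error(data.failMsg || 'TTS task failed');
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Task ${taskId} not finished after ${maxPolls} polls`);
}
```

A call such as synthesize(process.env.API_KEY, { model: 'eleven-tts-v3', text: 'Hello', voice_id: 'EXAVITQu4vr4xnSDxMaL' }) would resolve to the audio URL or reject with the task's failMsg; the environment variable name is only an example.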

Response Format

Create Task Response

{
  "code": 200,
  "msg": "success",
  "data": {
    "taskId": "clxxx...",
    "state": "pending"
  }
}

Success Response

{
  "code": 200,
  "msg": "success",
  "data": {
    "taskId": "clxxx...",
    "state": "completed",
    "result": "https://cdn.example.com/audio.mp3",
    "createTime": 1705123450000,
    "completeTime": 1705123460000
  }
}

Failed Response

{
  "code": 200,
  "msg": "success",
  "data": {
    "taskId": "clxxx...",
    "state": "failed",
    "failMsg": "Invalid voice_id"
  }
}

Webhook Callback (callback_url)

Pass callback_url in the create request. When the task reaches a terminal state (completed or failed), our server sends an HTTP POST to that URL with Content-Type: application/json and no signing header. A failed delivery is retried up to 3 times with exponential backoff (1s/2s/4s) and a 10s timeout per attempt. If delivery still fails, a background job keeps retrying for up to 30 minutes until your endpoint returns 2xx.
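
The 1s/2s/4s backoff follows a simple doubling rule, illustrated below. This only restates the documented schedule as arithmetic; retryDelaysMs is a hypothetical helper and says nothing about how the server is actually implemented.

```javascript
// Delay before each fast retry: baseMs * 2^n for n = 0 .. retries - 1.
function retryDelaysMs(retries = 3, baseMs = 1000) {
  return Array.from({ length: retries }, (_, n) => baseMs * 2 ** n);
}

// Total time spent waiting between the fast retries (excludes request time).
function totalBackoffMs(retries = 3, baseMs = 1000) {
  return retryDelaysMs(retries, baseMs).reduce((sum, delay) => sum + delay, 0);
}
```

With the defaults, retryDelaysMs() gives [1000, 2000, 4000] and totalBackoffMs() gives 7000, so the fast retries add about 7 seconds of waiting on top of the time each attempt takes.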

Payload Structure

POST {your callback_url}
Content-Type: application/json

{
  "code": 200,
  "msg": "success",
  "data": {
    "taskId": "clxxx...",
    "model": "<provider>/<model_name>",
    "state": "completed" | "failed",
    "param": "<JSON string>",            // request params, JSON.parse once
    "resultJson": "<JSON string> | null", // result object, JSON.parse once
    "failCode": null | "CONTENT_MODERATION | INVALID_INPUT | INSUFFICIENT_BALANCE | UPSTREAM_BUSY | UPSTREAM_FAILED | TIMEOUT | INTERNAL_ERROR | OTHER",
    "failMsg": null | "string",
    "retryable": true | false,           // present when state=failed: safe to retry/fallback
    "costTime": 12345,                    // duration in ms
    "completeTime": 1705123460000,        // ms epoch
    "createTime": 1705123450000           // ms epoch
  }
}

Note: data.param and data.resultJson are both JSON strings — call JSON.parse once on each.

Audio task: shape after JSON.parse(data.resultJson)

{
  "resultUrls":    ["https://r2.apimodels.app/audio/xxx.mp3"],
  "audioDuration": 12.5   // optional, seconds
}

resultUrls is an array of R2-hosted audio URLs (length 1 on success). When state=failed, resultJson is typically null or {"resultUrls":[]}, so do not assume an audio link is present.
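
Because resultJson may be null, an empty list, or a JSON string, it helps to normalise the callback payload in one place. The sketch below is one way to do that; readAudioResult and the shape of its return value are hypothetical, while the field names it reads come from the payload structure above.

```javascript
// Normalises a callback payload into a flat result object.
// Never throws on a missing or malformed resultJson.
function readAudioResult(payload) {
  const data = (payload && payload.data) || {};
  let result = null;
  if (typeof data.resultJson === 'string') {
    try {
      result = JSON.parse(data.resultJson); // parse exactly once
    } catch {
      result = null; // malformed JSON is treated like a missing result
    }
  }
  const urls = result && Array.isArray(result.resultUrls) ? result.resultUrls : [];
  return {
    taskId: data.taskId,
    ok: data.state === 'completed' && urls.length > 0,
    audioUrl: urls[0] || null,
    durationSeconds: result && typeof result.audioDuration === 'number' ? result.audioDuration : null,
    failCode: data.failCode || null,
    retryable: data.retryable === true, // only meaningful when state=failed
  };
}
```

A caller can then branch on ok and, for failures, use retryable to decide whether to resubmit the task or fall back.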

Node.js receiver example

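const express = require('express')

const app = express()

// Receives the task callback; the payload fields live under req.body.data
app.post('/webhook/audio', express.json(), (req, res) => {
  const { taskId, state, resultJson, failMsg } = req.body.data || {}
  if (state === 'completed') {
    // resultJson is a JSON string (or null), so parse it exactly once
    const r = JSON.parse(resultJson || '{}') || {}
    const urls = r.resultUrls || []
    console.log('audio ready', taskId, urls[0], r.audioDuration)
  } else {
    console.warn('audio failed', taskId, failMsg)
  }
  res.status(200).end()                 // must be 2xx, otherwise we retry
})

app.listen(3000)                        // port is an arbitrary example value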

Notes

- Retries stop as soon as your endpoint returns a 2xx response; once delivered, a callback is never pushed again.
- Callbacks are not signed today. Embed a random token in your callback_url path and verify it on receipt.
- Use a public HTTPS endpoint that responds within 10 seconds (per-attempt timeout).
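
The token-in-path suggestion above could be implemented roughly as follows. This is a sketch under assumptions: newCallbackToken and isValidCallbackToken are hypothetical helpers, and where the expected token is stored (per task or per account) is left to the application.

```javascript
const nodeCrypto = require('node:crypto');

// Generates an unguessable token to place in the callback_url path.
function newCallbackToken() {
  return nodeCrypto.randomBytes(32).toString('hex'); // 64 hex characters
}

// Constant-time comparison of the token taken from the request path
// with the one stored when the task was created.
function isValidCallbackToken(received, expected) {
  const a = Buffer.from(String(received));
  const b = Buffer.from(String(expected));
  // timingSafeEqual throws when lengths differ, so check the length first.
  return a.length === b.length && nodeCrypto.timingSafeEqual(a, b);
}
```

A receiver would register a callback_url that ends in the generated token and reject any request whose path token fails isValidCallbackToken.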

Task States

- pending: Queued, waiting to start
- processing: Audio is being synthesized
- completed: Done -- audio URL available in result field
- failed: Synthesis failed

Error Codes

- 400: Bad Request -- invalid or missing parameters
- 401: Unauthorized -- invalid API key
- 402: Payment Required -- insufficient credits
- 404: Not Found -- task ID not found
- 500: Internal Server Error

Important Notes

- Audio files are stored for 7 days -- download promptly
- Poll every 3-5 seconds for status updates
- Use callback_url for production workloads
- Keep your API key secure
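
The 7-day retention window can be tracked with a small helper so that downloads are scheduled in time. This sketch assumes the window starts at the task's completeTime (ms epoch), which this page does not state; audioExpiresAt and isLikelyExpired are hypothetical names.

```javascript
const RETENTION_MS = 7 * 24 * 60 * 60 * 1000; // 7 days in milliseconds

// Assumption: retention is counted from completeTime (ms epoch).
function audioExpiresAt(completeTime) {
  return new Date(completeTime + RETENTION_MS);
}

// True once the assumed retention window has passed.
function isLikelyExpired(completeTime, now = Date.now()) {
  return now >= completeTime + RETENTION_MS;
}
```

Treat the result as a conservative reminder to download early rather than as a guarantee that a file is still available.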
