Audio Models API

Text-to-speech APIs powered by ElevenLabs and Minimax. Streaming TTS, async TTS, voice cloning and voice design.

Overview

The Audio API provides text-to-speech from two providers. ElevenLabs offers real-time streaming TTS. Minimax (Hailuo, 海螺) offers synchronous and asynchronous TTS, plus voice cloning and voice design.

ElevenLabs TTS

Real-time streaming TTS with natural voices. Supports 70+ languages.

Endpoints

POST /api/v1/tts/stream

Streaming text-to-speech conversion

GET /api/v1/voices

Get available voices

Available Models

Prices are per 1K characters

Model                    Price   Description
eleven-tts-flash         ¥0.20   Ultra-fast, lowest latency, 32 languages
eleven-tts-turbo         ¥0.20   Fast, low latency, 32 languages
eleven-tts-multilingual  ¥0.40   High quality, 29 languages
eleven-tts-v3            ¥0.50   Most expressive, 70+ languages

TTS Request Parameters

model (required, string)
The model to use (see table above)
text (required, string)
The text to convert to speech
voice_id (string)
Voice ID. Default: 21m00Tcm4TlvDq8ikWAM (Rachel)
language_code (string)
Language code (e.g., "en", "zh", "ja")
output_format (string)
Audio format. Default: mp3_44100_128
voice_settings (object)
Voice customization settings

Voice Settings

stability (number)
0-1. Higher = more consistent. Default: 0.5
similarity_boost (number)
0-1. Higher = more similar to original. Default: 0.75
style (number)
0-1. Style exaggeration. Default: 0
use_speaker_boost (boolean)
Boost presence and clarity. Default: true
speed (number)
0.7-1.2. Speech speed. Default: 1.0
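
The ranges and defaults above can be checked on the client before a request is sent. The following is a minimal sketch under stated assumptions: `build_voice_settings` is a hypothetical helper name, not part of the API, and since this page does not say whether the server rejects or clamps out-of-range values, the sketch simply raises an error.

```python
# Client-side check of the documented voice_settings ranges.
# build_voice_settings is a hypothetical helper, not part of the API.

# (min, max, default) as listed in the Voice Settings section.
RANGES = {
    "stability": (0.0, 1.0, 0.5),
    "similarity_boost": (0.0, 1.0, 0.75),
    "style": (0.0, 1.0, 0.0),
    "speed": (0.7, 1.2, 1.0),
}


def build_voice_settings(use_speaker_boost: bool = True, **overrides: float) -> dict:
    """Return a voice_settings object, filling in the documented defaults."""
    unknown = set(overrides) - set(RANGES)
    if unknown:
        raise ValueError(f"unknown voice setting(s): {sorted(unknown)}")

    settings: dict = {"use_speaker_boost": use_speaker_boost}
    for name, (low, high, default) in RANGES.items():
        value = overrides.get(name, default)
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
        settings[name] = value
    return settings
```

For example, `build_voice_settings(stability=0.7, speed=1.1)` returns a dictionary that can be sent as the `voice_settings` field, while `build_voice_settings(speed=2.0)` raises `ValueError`.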

ElevenLabs - Basic TTS

curl -X POST https://apimodels.app/api/v1/tts/stream \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "eleven-tts-flash",
    "text": "Hello, this is a test of text to speech.",
    "voice_id": "21m00Tcm4TlvDq8ikWAM"
  }' \
  --output audio.mp3

Response

The API returns a streaming audio response. The response headers include:

Content-Type: audio/mpeg
Transfer-Encoding: chunked
X-Credits-Charged: 0.02
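
Because the response is chunked, a client can write the audio to disk as it arrives instead of buffering it all in memory. The sketch below mirrors the curl example using only the Python standard library. The function names `build_tts_request` and `stream_to_file` and the 8 KiB chunk size are illustrative choices, not part of the API.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://apimodels.app"


def build_tts_request(api_key: str, text: str,
                      model: str = "eleven-tts-flash",
                      voice_id: str = "21m00Tcm4TlvDq8ikWAM") -> urllib.request.Request:
    """Build the POST request for /api/v1/tts/stream (no network access here)."""
    body = {"model": model, "text": text, "voice_id": voice_id}
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/tts/stream",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_to_file(request: urllib.request.Request, path: str,
                   chunk_size: int = 8192) -> Optional[str]:
    """Write the audio to `path` chunk by chunk.

    Returns the X-Credits-Charged header value, or None if it is absent.
    """
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        while chunk := response.read(chunk_size):
            out.write(chunk)
        return response.headers.get("X-Credits-Charged")
```

A call such as `stream_to_file(build_tts_request("YOUR_API_KEY", "Hello"), "audio.mp3")` would save the audio and return the charged credits as a string.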

ElevenLabs - With Voice Settings

curl -X POST https://apimodels.app/api/v1/tts/stream \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "eleven-tts-v3",
    "text": "This is expressive speech with custom settings.",
    "voice_id": "21m00Tcm4TlvDq8ikWAM",
    "language_code": "en",
    "output_format": "mp3_44100_192",
    "voice_settings": {
      "stability": 0.7,
      "similarity_boost": 0.8,
      "style": 0.3,
      "use_speaker_boost": true,
      "speed": 1.1
    }
  }' \
  --output audio.mp3

Popular Voices

Name    Voice ID              Description
Rachel  21m00Tcm4TlvDq8ikWAM  Conversational female
Adam    pNInz6obpgDQGcFmaJgB  Deep male
Antoni  ErXwobaYiN019PkySvjV  Friendly male
Bella   EXAVITQu4vr4xnSDxMaL  Warm female
Josh    TxGEqnHWrfWFTfGW9XjX  Deep young male
Sam     yoZ06aMxZJJ28mfd3POQ  Dynamic male

ElevenLabs - Get Voices

curl -X GET "https://apimodels.app/api/v1/voices?source=shared&page=0&page_size=30" \
  -H "Authorization: Bearer YOUR_API_KEY"

Voices Response

{
  "code": 200,
  "msg": "success",
  "data": {
    "voices": [
      {
        "voice_id": "21m00Tcm4TlvDq8ikWAM",
        "name": "Rachel",
        "category": "premade",
        "description": "Calm, friendly female voice",
        "labels": {
          "gender": "female",
          "age": "young",
          "accent": "american"
        },
        "preview_url": "https://..."
      }
    ],
    "has_more": true,
    "page": 0,
    "page_size": 30
  }
}
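
The `has_more`, `page`, and `page_size` fields suggest that the voice list is paginated. The sketch below walks the pages until `has_more` is false. It assumes pages are numbered from 0 in steps of 1, as in the curl example. `fetch_voices_page` and `iter_voices` are hypothetical helper names, and `max_pages` is a safety cap added for this sketch.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

BASE_URL = "https://apimodels.app"


def fetch_voices_page(api_key: str, page: int, page_size: int = 30,
                      source: str = "shared") -> dict:
    """GET one page of /api/v1/voices and return the parsed JSON body."""
    query = urllib.parse.urlencode(
        {"source": source, "page": page, "page_size": page_size})
    request = urllib.request.Request(
        f"{BASE_URL}/api/v1/voices?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_voices(fetch_page: Callable[[int], dict],
                max_pages: int = 100) -> Iterator[dict]:
    """Yield every voice, requesting pages until has_more is false."""
    for page in range(max_pages):
        data = fetch_page(page)["data"]
        yield from data["voices"]
        if not data.get("has_more"):
            break
```

Passing the page fetcher as an argument keeps the pagination logic independent of the HTTP call, for example `iter_voices(lambda p: fetch_voices_page("YOUR_API_KEY", p))`.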

Output Formats

mp3_44100_128  MP3, 44.1 kHz, 128 kbps (default)
mp3_44100_192  MP3, 44.1 kHz, 192 kbps (higher quality)
pcm_16000      PCM, 16 kHz (raw audio)
pcm_22050      PCM, 22.05 kHz (raw audio)
pcm_44100      PCM, 44.1 kHz (raw audio)
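
Raw PCM output has no header, so most players cannot open it directly. One way to make it playable is to wrap it in a WAV container, as in the sketch below. The sample rate is read from the format name, but the channel count and sample width are not given on this page: mono, 16-bit samples are an assumption here and should be verified against real output. `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, output_format: str, channels: int = 1,
               sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV header.

    The sample rate is taken from the format name (e.g. "pcm_16000").
    channels=1 and sample_width=2 (16-bit) are assumptions, not documented values.
    """
    prefix, _, rate = output_format.partition("_")
    if prefix != "pcm" or not rate.isdigit():
        raise ValueError(f"not a PCM output format: {output_format!r}")

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(int(rate))
        wav.writeframes(pcm)
    return buffer.getvalue()
```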

Minimax TTS

Synchronous and asynchronous text-to-speech by Minimax (Hailuo, 海螺). Supports voice cloning, voice design, and pronunciation dictionaries.

Endpoints

POST /api/v1/minimax/tts/stream

Sync TTS - returns audio directly (best for short text)

POST /api/v1/minimax/tts

Async TTS - create task, poll for result (best for long text)

GET /api/v1/minimax/tts/status?task_id=xxx

Query async TTS task status

GET /api/v1/minimax/files/retrieve?file_id=xxx

Get file download URL

POST /api/v1/minimax/files

Upload audio file (voice clone)

POST /api/v1/minimax/voice/clone

Quick voice clone

POST /api/v1/minimax/voice/design

Voice design from description

GET /api/v1/minimax/voices

Get available voice list

Available Models

Prices are per 1K characters

Model                    Price   Description
minimax-speech-02-turbo  ¥3.20   Fast, cost-effective async TTS
minimax-speech-2.6-hd    ¥5.60   High-definition, rich expressiveness

TTS Request Parameters

model (required, string)
minimax-speech-02-turbo or minimax-speech-2.6-hd
text (required, string)
Text to convert to speech
voice_setting (object)
Voice configuration (voice_id, speed, vol, pitch, etc.)
audio_setting (object)
Audio output settings (sample_rate, bitrate, format, channel)
language_boost (string)
Boost specific language recognition (e.g., "zh", "en", "ja")
pronunciation_dict (object)
Custom pronunciation dictionary

Minimax - Sync TTS (Stream)

Returns audio data directly. Best for short text (under ~5000 characters).

curl -X POST https://apimodels.app/api/v1/minimax/tts/stream \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-02-turbo",
    "text": "Hello, this is Minimax text to speech.",
    "voice_setting": {
      "voice_id": "male-qn-qingse",
      "speed": 1.0,
      "vol": 1.0,
      "pitch": 0
    },
    "audio_setting": {
      "sample_rate": 32000,
      "bitrate": 128000,
      "format": "mp3"
    }
  }' \
  --output audio.mp3
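
Since the sync endpoint is recommended for short text and the async endpoint for long text, a client can pick the endpoint from the text length. The sketch below treats the approximate 5000-character guideline as a hard cutoff, which is a simplification. `pick_minimax_endpoint` is a hypothetical helper name, and counting characters with `len()` is an assumption about how the service measures length.

```python
# Approximate guideline from this page, used here as a fixed threshold.
SYNC_CHAR_LIMIT = 5000

SYNC_PATH = "/api/v1/minimax/tts/stream"   # returns audio directly
ASYNC_PATH = "/api/v1/minimax/tts"         # returns a task_id to poll


def pick_minimax_endpoint(text: str, limit: int = SYNC_CHAR_LIMIT) -> str:
    """Return the sync path for short text and the async path otherwise."""
    return SYNC_PATH if len(text) < limit else ASYNC_PATH
```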

Async TTS Workflow

1. POST /api/v1/minimax/tts to create a task and get a task_id
2. GET /api/v1/minimax/tts/status?task_id=xxx to poll the task status
3. When the status indicates the task is done, read file_id from the response
4. GET /api/v1/minimax/files/retrieve?file_id=xxx to get the download URL

Minimax - Async TTS

# Step 1: Create async TTS task
curl -X POST https://apimodels.app/api/v1/minimax/tts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-02-turbo",
    "text": "Hello, this is Minimax text to speech.",
    "voice_setting": {
      "voice_id": "male-qn-qingse",
      "speed": 1.0,
      "vol": 1.0,
      "pitch": 0
    },
    "audio_setting": {
      "sample_rate": 32000,
      "bitrate": 128000,
      "format": "mp3"
    }
  }'

TTS Response

{
  "task_id": "xxx-xxx-xxx",
  "base_resp": {
    "status_code": 0,
    "status_msg": "success"
  }
}

Status Response

{
  "task_id": "xxx-xxx-xxx",
  "status": 2,
  "file_id": "yyy-yyy-yyy",
  "base_resp": {
    "status_code": 0,
    "status_msg": "success"
  }
}
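
Steps 2 to 4 of the async workflow can be sketched as a polling loop. This page does not list the possible `status` values, so the sketch makes two assumptions: a response that carries a non-empty `file_id` is treated as finished, and a non-zero `base_resp.status_code` is treated as a failure. `wait_for_file` and the `get_json(path, params)` callback are hypothetical names. The callback is expected to perform an authenticated GET and return the parsed JSON body.

```python
import time
from typing import Callable

STATUS_PATH = "/api/v1/minimax/tts/status"
RETRIEVE_PATH = "/api/v1/minimax/files/retrieve"


def wait_for_file(get_json: Callable[[str, dict], dict], task_id: str,
                  interval: float = 2.0, timeout: float = 300.0) -> dict:
    """Poll the task status, then fetch the file info once a file_id appears."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_json(STATUS_PATH, {"task_id": task_id})

        # Assumption: a non-zero status_code signals an error.
        base = status.get("base_resp", {})
        if base.get("status_code", 0) != 0:
            raise RuntimeError(f"task {task_id} failed: {base.get('status_msg')}")

        # Assumption: a non-empty file_id means the task is done.
        file_id = status.get("file_id")
        if file_id:
            return get_json(RETRIEVE_PATH, {"file_id": file_id})

        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout}s")
        time.sleep(interval)
```

The shape of the retrieve response is not documented here, so the function returns it unchanged and leaves extracting the download URL to the caller.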

Voice Clone Workflow

1. POST /api/v1/minimax/files to upload an audio sample (purpose: voice_clone)
2. POST /api/v1/minimax/voice/clone with the file_id to create the cloned voice
3. Use the cloned voice_id in TTS requests

Note: The first use of a cloned or designed voice for TTS incurs an additional ¥9.9 fee.

Minimax - Voice Clone

# Step 1: Upload audio sample
curl -X POST https://apimodels.app/api/v1/minimax/files \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "purpose=voice_clone" \
  -F "file=@sample.mp3"

# Step 2: Clone voice (use file_id from step 1)
curl -X POST https://apimodels.app/api/v1/minimax/voice/clone \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "file_id": "FILE_ID_FROM_UPLOAD",
    "voice_id": "my-cloned-voice-001"
  }'

# Step 3: Use cloned voice in TTS
curl -X POST https://apimodels.app/api/v1/minimax/tts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-02-turbo",
    "text": "This uses my cloned voice.",
    "voice_setting": {
      "voice_id": "my-cloned-voice-001"
    }
  }'

Voice Design

Create custom voices from text descriptions: send POST /api/v1/minimax/voice/design with a prompt describing the desired voice characteristics.

Minimax - Get Voice List

Returns the full Minimax system voice list (327 voices). Optionally filter by language with ?language=xxx.

curl -X GET "https://apimodels.app/api/v1/minimax/voices" \
  -H "Authorization: Bearer YOUR_API_KEY"

# Filter by language
curl -X GET "https://apimodels.app/api/v1/minimax/voices?language=English" \
  -H "Authorization: Bearer YOUR_API_KEY"

Response

{
  "voices": [
    {
      "voice_id": "male-qn-qingse",
      "name": "青涩青年音色",
      "language": "中文 (普通话)"
    },
    {
      "voice_id": "English_Graceful_Lady",
      "name": "Graceful Lady",
      "language": "English"
    }
  ],
  "total": 327
}

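Supported language values, passed verbatim (English glosses in brackets): 中文 (普通话) [Mandarin Chinese], 中文 (粤语) [Cantonese], English, 日文 [Japanese], 韩文 [Korean], 西班牙文 [Spanish], 葡萄牙文 [Portuguese], 法文 [French], 印尼文 [Indonesian], 德文 [German], 俄文 [Russian], 意大利文 [Italian], 阿拉伯文 [Arabic], 土耳其文 [Turkish] ...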

Billing

ElevenLabs Billing

Credits based on character count:

Credits = (characters / 1000) * model_price * 10

Minimum charge: 10% of model price per request
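
As a worked example of the formula, the sketch below estimates ElevenLabs credits from a character count, using the prices from the model table. Two points are assumptions: the minimum charge is interpreted as 10% of the per-1K price expressed in credits (the page does not state its unit), and how the service counts characters is not specified. `elevenlabs_credits` is a hypothetical helper name.

```python
from decimal import Decimal

# Prices in yuan per 1K characters, from the ElevenLabs model table.
ELEVENLABS_PRICE_YUAN = {
    "eleven-tts-flash": Decimal("0.20"),
    "eleven-tts-turbo": Decimal("0.20"),
    "eleven-tts-multilingual": Decimal("0.40"),
    "eleven-tts-v3": Decimal("0.50"),
}


def elevenlabs_credits(model: str, characters: int) -> Decimal:
    """Credits = (characters / 1000) * model_price * 10, with a minimum charge."""
    credits_per_1k = ELEVENLABS_PRICE_YUAN[model] * 10
    usage = Decimal(characters) / 1000 * credits_per_1k
    # Assumption: the minimum is 10% of the per-1K price, in credits.
    minimum = credits_per_1k / 10
    return max(usage, minimum)
```

Under these assumptions, 2,500 characters with eleven-tts-v3 come to 2.5 × 0.50 × 10 = 12.5 credits, and any eleven-tts-flash request under 100 characters is charged the 0.2-credit minimum.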

Minimax Billing

Credits based on character count:

Credits = (characters / 1000) * credits_per_1K

Minimum charge: 10% of model price per request

Voice clone/design: an additional ¥9.9 (99 credits) is charged on the first use of a custom voice
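
The Minimax formula can be sketched the same way. The page does not define `credits_per_1K`. This sketch assumes it is the listed price multiplied by 10, because ¥9.9 is stated as 99 credits. It also assumes that the minimum charge is 10% of that per-1K value and that the first-use fee is added on top of the usage charge. `minimax_credits` is a hypothetical helper name.

```python
from decimal import Decimal

# Assumption: credits_per_1K = listed price in yuan * 10 (since ¥9.9 = 99 credits).
MINIMAX_CREDITS_PER_1K = {
    "minimax-speech-02-turbo": Decimal("32"),
    "minimax-speech-2.6-hd": Decimal("56"),
}

# One-time fee on the first TTS use of a cloned or designed voice.
CUSTOM_VOICE_FIRST_USE_CREDITS = Decimal("99")


def minimax_credits(model: str, characters: int,
                    first_use_of_custom_voice: bool = False) -> Decimal:
    """Credits = (characters / 1000) * credits_per_1K, with a minimum charge."""
    credits_per_1k = MINIMAX_CREDITS_PER_1K[model]
    usage = Decimal(characters) / 1000 * credits_per_1k
    # Assumption: the minimum is 10% of the per-1K value, in credits.
    total = max(usage, credits_per_1k / 10)
    if first_use_of_custom_voice:
        total += CUSTOM_VOICE_FIRST_USE_CREDITS
    return total
```

Under these assumptions, 2,000 characters with minimax-speech-02-turbo cost 64 credits, or 163 credits if the request is the first to use a cloned or designed voice.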

Error Codes

400  Invalid request parameters
401  Invalid or missing API key
402  Insufficient credits
404  Model or voice not found
500  Internal server error
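
A client can turn these status codes into a single exception type that carries the documented meaning. The sketch below is one possible approach. `ApiError` is a hypothetical class name, and marking only 500 as retryable is a design choice made for this sketch, not documented behavior.

```python
# Status codes and meanings as listed in the Error Codes section.
ERROR_MEANINGS = {
    400: "Invalid request parameters",
    401: "Invalid or missing API key",
    402: "Insufficient credits",
    404: "Model or voice not found",
    500: "Internal server error",
}

# Assumption: only server-side errors are worth retrying unchanged.
RETRYABLE = {500}


class ApiError(Exception):
    """Error raised for a non-success HTTP status from the Audio API."""

    def __init__(self, status: int):
        self.status = status
        self.retryable = status in RETRYABLE
        meaning = ERROR_MEANINGS.get(status, "Unknown error")
        super().__init__(f"HTTP {status}: {meaning}")
```

With `urllib`, this could be raised from an `except urllib.error.HTTPError as exc:` handler as `raise ApiError(exc.code) from exc`.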