Audio Models API
Text-to-speech APIs powered by ElevenLabs and Minimax. Streaming TTS, async TTS, voice cloning and voice design.
Overview
The Audio API provides text-to-speech from multiple providers. ElevenLabs offers real-time streaming TTS, while Minimax (Hailuo, 海螺) offers both synchronous and asynchronous TTS, plus voice cloning and voice design.
ElevenLabs TTS
Real-time streaming TTS with natural-sounding voices. Language coverage depends on the model, up to 70+ languages with eleven-tts-v3.
Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/tts/stream | Streaming text-to-speech conversion |
| GET | /api/v1/voices | List available voices |
Available Models
Prices are per 1K characters
| Model | Price | Description |
|---|---|---|
| eleven-tts-flash | ¥0.20 | Ultra-fast, lowest latency, 32 languages |
| eleven-tts-turbo | ¥0.20 | Fast, low latency, 32 languages |
| eleven-tts-multilingual | ¥0.40 | High quality, 29 languages |
| eleven-tts-v3 | ¥0.50 | Most expressive, 70+ languages |
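TTS Request Parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | Model name from the table above, e.g. eleven-tts-flash |
| text | string | Text to convert to speech |
| voice_id | string | Voice to use; see Popular Voices or the Get Voices endpoint |
| language_code | string | Language of the text, e.g. en (omitted in the basic example) |
| output_format | string | Audio encoding; see Output Formats (default mp3_44100_128) |
| voice_settings | object | Voice fine-tuning; see Voice Settings (omitted in the basic example) |
Voice Settings
| Field | Type | Example value |
|---|---|---|
| stability | number | 0.7 |
| similarity_boost | number | 0.8 |
| style | number | 0.3 |
| use_speaker_boost | boolean | true |
| speed | number | 1.1 |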
ElevenLabs - Basic TTS
```bash
curl -X POST https://apimodels.app/api/v1/tts/stream \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "eleven-tts-flash",
    "text": "Hello, this is a test of text to speech.",
    "voice_id": "21m00Tcm4TlvDq8ikWAM"
  }' \
  --output audio.mp3
```
Response
The API returns a streaming audio response. The response headers include:
```text
Content-Type: audio/mpeg
Transfer-Encoding: chunked
X-Credits-Charged: 0.02
```
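Because the body is chunked audio, a client should write it to disk as it arrives instead of buffering it. The following is a minimal Python sketch using only the standard library and mirroring the basic curl request. The helper names `build_tts_request` and `synthesize_to_file` are hypothetical. The X-Credits-Charged value is returned as a raw string because this page does not state its unit.

```python
import json
import shutil
import urllib.request
from typing import Optional

API_BASE = "https://apimodels.app/api/v1"


def build_tts_request(api_key: str, text: str, voice_id: str,
                      model: str = "eleven-tts-flash",
                      voice_settings: Optional[dict] = None) -> urllib.request.Request:
    """Build the same POST /tts/stream request as the curl example."""
    payload = {"model": model, "text": text, "voice_id": voice_id}
    if voice_settings:  # only sent when the caller provides it
        payload["voice_settings"] = voice_settings
    return urllib.request.Request(
        f"{API_BASE}/tts/stream",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(api_key: str, text: str, voice_id: str, path: str) -> str:
    """Stream the audio body to `path` in chunks; return the X-Credits-Charged header."""
    req = build_tts_request(api_key, text, voice_id)
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out, 16 * 1024)  # copy 16 KiB at a time
        return resp.headers.get("X-Credits-Charged", "")
```

For example, `synthesize_to_file("YOUR_API_KEY", "Hello", "21m00Tcm4TlvDq8ikWAM", "audio.mp3")` produces the same file as the curl command.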
ElevenLabs - With Voice Settings
```bash
curl -X POST https://apimodels.app/api/v1/tts/stream \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "eleven-tts-v3",
    "text": "This is expressive speech with custom settings.",
    "voice_id": "21m00Tcm4TlvDq8ikWAM",
    "language_code": "en",
    "output_format": "mp3_44100_192",
    "voice_settings": {
      "stability": 0.7,
      "similarity_boost": 0.8,
      "style": 0.3,
      "use_speaker_boost": true,
      "speed": 1.1
    }
  }' \
  --output audio.mp3
```
Popular Voices
| Name | Voice ID | Description |
|---|---|---|
| Rachel | 21m00Tcm4TlvDq8ikWAM | Conversational female |
| Adam | pNInz6obpgDQGcFmaJgB | Deep male |
| Antoni | ErXwobaYiN019PkySvjV | Friendly male |
| Bella | EXAVITQu4vr4xnSDxMaL | Warm female |
| Josh | TxGEqnHWrfWFTfGW9XjX | Deep young male |
| Sam | yoZ06aMxZJJ28mfd3POQ | Dynamic male |
ElevenLabs - Get Voices
```bash
curl -X GET "https://apimodels.app/api/v1/voices?source=shared&page=0&page_size=30" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
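The voice list is paginated. Requests pass `page` (starting at 0) and `page_size`, and the response's `data` object reports `has_more`. Below is a minimal sketch of collecting every voice. The page fetcher is passed in as a callable so the loop can be tested without network access. `fetch_voice_page` and `iter_voices` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

API_BASE = "https://apimodels.app/api/v1"


def fetch_voice_page(api_key: str, page: int, page_size: int = 30,
                     source: str = "shared") -> dict:
    """GET /voices for one page and return the `data` object of the envelope."""
    query = urllib.parse.urlencode(
        {"source": source, "page": page, "page_size": page_size})
    req = urllib.request.Request(f"{API_BASE}/voices?{query}",
                                 headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    if body.get("code") != 200:
        raise RuntimeError(f"voices request failed: {body.get('msg')}")
    return body["data"]


def iter_voices(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every voice, requesting pages until has_more is false."""
    page = 0
    while True:
        data = fetch_page(page)
        yield from data.get("voices", [])
        if not data.get("has_more"):
            break
        page += 1
```

Usage: `voices = list(iter_voices(lambda p: fetch_voice_page("YOUR_API_KEY", p)))`.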
Voices Response
```json
{
  "code": 200,
  "msg": "success",
  "data": {
    "voices": [
      {
        "voice_id": "21m00Tcm4TlvDq8ikWAM",
        "name": "Rachel",
        "category": "premade",
        "description": "Calm, friendly female voice",
        "labels": {
          "gender": "female",
          "age": "young",
          "accent": "american"
        },
        "preview_url": "https://..."
      }
    ],
    "has_more": true,
    "page": 0,
    "page_size": 30
  }
}
```
Output Formats
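Each format name encodes the codec, the sample rate in Hz and, for MP3, the bitrate in kbps. The PCM formats are raw samples with no container header, so most players need them wrapped first. Below is a minimal sketch that parses a format name and wraps PCM bytes in a WAV file with Python's standard `wave` module. It assumes 16-bit little-endian mono samples, which this page does not state, so check this against real output. Both function names are hypothetical.

```python
import wave


def parse_output_format(fmt: str) -> dict:
    """Split names like 'mp3_44100_128' or 'pcm_16000' into their parts."""
    parts = fmt.split("_")
    info = {"codec": parts[0], "sample_rate": int(parts[1])}
    if len(parts) > 2:  # MP3 names also carry a bitrate in kbps
        info["bitrate_kbps"] = int(parts[2])
    return info


def pcm_to_wav(pcm: bytes, wav_path: str, sample_rate: int,
               channels: int = 1, sample_width: int = 2) -> None:
    """Wrap raw PCM in a WAV header (assumes 16-bit mono unless told otherwise)."""
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)  # bytes per sample: 2 means 16-bit
        w.setframerate(sample_rate)
        w.writeframes(pcm)
```

Usage: `pcm_to_wav(raw_bytes, "out.wav", parse_output_format("pcm_16000")["sample_rate"])`.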
| Format | Description |
|---|---|
| mp3_44100_128 | MP3, 44.1 kHz, 128 kbps (default) |
| mp3_44100_192 | MP3, 44.1 kHz, 192 kbps (higher quality) |
| pcm_16000 | PCM, 16 kHz (raw audio) |
| pcm_22050 | PCM, 22.05 kHz (raw audio) |
| pcm_44100 | PCM, 44.1 kHz (raw audio) |
Minimax TTS
Text-to-speech by Minimax (Hailuo, 海螺), available as synchronous streaming or asynchronous tasks. Supports voice cloning, voice design, and pronunciation dictionaries.
Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/minimax/tts/stream | Sync TTS: returns audio directly (best for short text) |
| POST | /api/v1/minimax/tts | Async TTS: create a task, then poll for the result (best for long text) |
| GET | /api/v1/minimax/tts/status?task_id=xxx | Query async TTS task status |
| GET | /api/v1/minimax/files/retrieve?file_id=xxx | Get a file's download URL |
| POST | /api/v1/minimax/files | Upload an audio file (for voice cloning) |
| POST | /api/v1/minimax/voice/clone | Quick voice clone |
| POST | /api/v1/minimax/voice/design | Design a voice from a text description |
| GET | /api/v1/minimax/voices | List available voices |
Available Models
Prices are per 1K characters
| Model | Price | Description |
|---|---|---|
| minimax-speech-02-turbo | ¥3.20 | Fast, cost-effective TTS |
| minimax-speech-2.6-hd | ¥5.60 | High-definition, rich expressiveness |
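TTS Request Parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | Model name from the table above |
| text | string | Text to convert to speech |
| voice_setting.voice_id | string | System voice (e.g. male-qn-qingse) or a cloned/designed voice ID |
| voice_setting.speed | number | Speaking speed (1.0 in the examples) |
| voice_setting.vol | number | Volume (1.0 in the examples) |
| voice_setting.pitch | number | Pitch adjustment (0 in the examples) |
| audio_setting.sample_rate | integer | Sample rate in Hz, e.g. 32000 |
| audio_setting.bitrate | integer | Bitrate in bps, e.g. 128000 |
| audio_setting.format | string | Output format, e.g. mp3 |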
Minimax - Sync TTS (Stream)
Returns audio data directly. Best for short text (under ~5000 characters).
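In the examples on this page, the sync and async endpoints take the same request body, so one payload builder can serve both. Below is a minimal sketch that treats the "~5000 characters" guidance as a simple cut-off. `SYNC_CHAR_LIMIT`, `minimax_tts_payload` and `choose_endpoint` are hypothetical names, and the cut-off is only approximate.

```python
API_BASE = "https://apimodels.app/api/v1/minimax"
SYNC_CHAR_LIMIT = 5000  # approximate guidance from the docs, not a documented hard limit


def minimax_tts_payload(text: str, voice_id: str = "male-qn-qingse",
                        model: str = "minimax-speech-02-turbo",
                        speed: float = 1.0, vol: float = 1.0, pitch: int = 0,
                        sample_rate: int = 32000, bitrate: int = 128000,
                        fmt: str = "mp3") -> dict:
    """Request body shared by the sync (/tts/stream) and async (/tts) examples."""
    return {
        "model": model,
        "text": text,
        "voice_setting": {"voice_id": voice_id, "speed": speed,
                          "vol": vol, "pitch": pitch},
        "audio_setting": {"sample_rate": sample_rate, "bitrate": bitrate,
                          "format": fmt},
    }


def choose_endpoint(text: str) -> str:
    """Send short text to the sync stream endpoint and long text to the async task API."""
    if len(text) < SYNC_CHAR_LIMIT:
        return f"{API_BASE}/tts/stream"
    return f"{API_BASE}/tts"
```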
```bash
curl -X POST https://apimodels.app/api/v1/minimax/tts/stream \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-02-turbo",
    "text": "Hello, this is Minimax text to speech.",
    "voice_setting": {
      "voice_id": "male-qn-qingse",
      "speed": 1.0,
      "vol": 1.0,
      "pitch": 0
    },
    "audio_setting": {
      "sample_rate": 32000,
      "bitrate": 128000,
      "format": "mp3"
    }
  }' \
  --output audio.mp3
```
Async TTS Workflow
1. POST /api/v1/minimax/tts to create a task and get its task_id.
2. GET /api/v1/minimax/tts/status?task_id=xxx to poll the task status.
3. When the status is done, take the file_id from the response.
4. GET /api/v1/minimax/files/retrieve?file_id=xxx to get the download URL.

Minimax - Async TTS
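Steps 2 and 3 of the workflow amount to a polling loop. Below is a minimal sketch of that loop. It assumes that a non-empty `file_id` means the task is done and that a non-zero `base_resp.status_code` means it failed. This page does not define the numeric `status` values (the sample shows 2), so they are not interpreted here. The response of step 4 is not shown on this page, so the sketch stops at the file_id. The helper names are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable

API_BASE = "https://apimodels.app/api/v1/minimax"


def make_status_getter(api_key: str) -> Callable[[str], dict]:
    """Return a function that calls GET /tts/status for one task_id."""
    def get_status(task_id: str) -> dict:
        query = urllib.parse.urlencode({"task_id": task_id})
        req = urllib.request.Request(f"{API_BASE}/tts/status?{query}",
                                     headers={"Authorization": f"Bearer {api_key}"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return get_status


def wait_for_file_id(get_status: Callable[[str], dict], task_id: str,
                     interval: float = 2.0, timeout: float = 300.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the status response carries a file_id, then return it.

    Assumption: a non-empty file_id means the task is done; a non-zero
    base_resp.status_code is treated as a failure.
    """
    waited = 0.0
    while True:
        resp = get_status(task_id)
        base = resp.get("base_resp", {})
        if base.get("status_code", 0) != 0:
            raise RuntimeError(f"task {task_id} failed: {base.get('status_msg')}")
        if resp.get("file_id"):
            return resp["file_id"]
        if waited >= timeout:
            raise TimeoutError(f"task {task_id} not finished after {timeout}s")
        sleep(interval)
        waited += interval
```

Usage: `file_id = wait_for_file_id(make_status_getter("YOUR_API_KEY"), task_id)`. The `sleep` parameter exists so tests can skip real waiting.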
```bash
# Step 1: Create async TTS task
curl -X POST https://apimodels.app/api/v1/minimax/tts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-02-turbo",
    "text": "Hello, this is Minimax text to speech.",
    "voice_setting": {
      "voice_id": "male-qn-qingse",
      "speed": 1.0,
      "vol": 1.0,
      "pitch": 0
    },
    "audio_setting": {
      "sample_rate": 32000,
      "bitrate": 128000,
      "format": "mp3"
    }
  }'
```
TTS Response
```json
{
  "task_id": "xxx-xxx-xxx",
  "base_resp": {
    "status_code": 0,
    "status_msg": "success"
  }
}
```
Status Response
```json
{
  "task_id": "xxx-xxx-xxx",
  "status": 2,
  "file_id": "yyy-yyy-yyy",
  "base_resp": {
    "status_code": 0,
    "status_msg": "success"
  }
}
```
Voice Clone Workflow
1. POST /api/v1/minimax/files to upload an audio sample (purpose: voice_clone).
2. POST /api/v1/minimax/voice/clone with the file_id to create the cloned voice.
3. Use the cloned voice_id in TTS requests.

Minimax - Voice Clone
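The three clone steps can be chained in code. Below is a minimal sketch with two parts. The first is a standard-library multipart encoder that builds the same form body as `curl -F`. The second is an orchestrator whose HTTP helpers are supplied by the caller. This page does not show the upload response, so the sketch assumes `file_id` is either top-level or nested under a `file` object. The `audio/mpeg` content type for the sample is also an assumption. All function names are hypothetical.

```python
import uuid
from typing import Callable, Dict, Tuple


def encode_multipart(fields: Dict[str, str], file_field: str, filename: str,
                     content: bytes, content_type: str = "audio/mpeg") -> Tuple[bytes, str]:
    """Encode form fields plus one file like `curl -F`; return (body, Content-Type)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [f"--{boundary}".encode(),
                  f'Content-Disposition: form-data; name="{name}"'.encode(),
                  b"", value.encode("utf-8")]
    parts += [f"--{boundary}".encode(),
              f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
              f"Content-Type: {content_type}".encode(),
              b"", content,
              f"--{boundary}--".encode(), b""]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def clone_and_speak(upload: Callable[[str, str], dict],
                    post_json: Callable[[str, dict], dict],
                    sample_path: str, voice_id: str, text: str) -> dict:
    """Run the three clone steps using caller-supplied HTTP helpers."""
    # Step 1: upload the sample with purpose=voice_clone.
    uploaded = upload(sample_path, "voice_clone")
    # Assumption: file_id is either top-level or nested under "file".
    file_id = uploaded.get("file_id") or uploaded.get("file", {}).get("file_id")
    if not file_id:
        raise ValueError(f"no file_id in upload response: {uploaded}")
    # Step 2: create the cloned voice under the chosen voice_id.
    post_json("/api/v1/minimax/voice/clone", {"file_id": file_id, "voice_id": voice_id})
    # Step 3: async TTS with the cloned voice; poll its task as in the async workflow.
    return post_json("/api/v1/minimax/tts", {
        "model": "minimax-speech-02-turbo",
        "text": text,
        "voice_setting": {"voice_id": voice_id},
    })
```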
```bash
# Step 1: Upload audio sample
curl -X POST https://apimodels.app/api/v1/minimax/files \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "purpose=voice_clone" \
  -F "file=@sample.mp3"

# Step 2: Clone voice (use file_id from step 1)
curl -X POST https://apimodels.app/api/v1/minimax/voice/clone \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "file_id": "FILE_ID_FROM_UPLOAD",
    "voice_id": "my-cloned-voice-001"
  }'

# Step 3: Use cloned voice in TTS
curl -X POST https://apimodels.app/api/v1/minimax/tts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-02-turbo",
    "text": "This uses my cloned voice.",
    "voice_setting": {
      "voice_id": "my-cloned-voice-001"
    }
  }'
```
Voice Design
Create custom voices from text descriptions: send a POST request to /api/v1/minimax/voice/design with a prompt describing the desired voice characteristics.
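Below is a minimal sketch of the voice design request. It assumes the description goes in a JSON field literally named `prompt`. This page names the concept but not the field, and it lists no other fields, so check the request schema before relying on it. Under Billing, first use of a designed voice carries a one-time 99-credit charge. `build_voice_design_request` is a hypothetical helper.

```python
import json
import urllib.request

API_BASE = "https://apimodels.app/api/v1/minimax"


def build_voice_design_request(api_key: str, prompt: str) -> urllib.request.Request:
    """POST /voice/design with a natural-language voice description.

    Assumption: the description is sent in a JSON field named "prompt".
    """
    if not prompt.strip():
        raise ValueError("prompt must describe the desired voice")
    return urllib.request.Request(
        f"{API_BASE}/voice/design",
        # ensure_ascii=False keeps non-English prompts readable; bytes are UTF-8
        data=json.dumps({"prompt": prompt}, ensure_ascii=False).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```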
Minimax - Get Voice List
Returns the full Minimax system voice list (327 voices). Optionally filter by language with ?language=xxx.
```bash
curl -X GET "https://apimodels.app/api/v1/minimax/voices" \
  -H "Authorization: Bearer YOUR_API_KEY"

# Filter by language
curl -X GET "https://apimodels.app/api/v1/minimax/voices?language=English" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
Response
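```json
{
  "voices": [
    {
      "voice_id": "male-qn-qingse",
      "name": "青涩青年音色",
      "language": "中文 (普通话)"
    },
    {
      "voice_id": "English_Graceful_Lady",
      "name": "Graceful Lady",
      "language": "English"
    }
  ],
  "total": 327
}
```
Supported language values include:
| language value | Language |
|---|---|
| 中文 (普通话) | Mandarin Chinese |
| 中文 (粤语) | Cantonese |
| English | English |
| 日文 | Japanese |
| 韩文 | Korean |
| 西班牙文 | Spanish |
| 葡萄牙文 | Portuguese |
| 法文 | French |
| 印尼文 | Indonesian |
| 德文 | German |
| 俄文 | Russian |
| 意大利文 | Italian |
| 阿拉伯文 | Arabic |
| 土耳其文 | Turkish |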
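Most language values contain non-ASCII characters, and some contain spaces and parentheses, e.g. 中文 (普通话). They must therefore be percent-encoded in the query string. Below is a minimal sketch using `urllib.parse.urlencode`. Unlike the ElevenLabs voices endpoint, this response has `voices` at the top level, as in the sample above. `voices_url` and `list_voices` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional

API_BASE = "https://apimodels.app/api/v1/minimax"


def voices_url(language: Optional[str] = None) -> str:
    """Build the voice-list URL, percent-encoding the language value if given."""
    if language is None:
        return f"{API_BASE}/voices"
    return f"{API_BASE}/voices?" + urllib.parse.urlencode({"language": language})


def list_voices(api_key: str, language: Optional[str] = None) -> List[dict]:
    """Fetch the Minimax voice list, optionally filtered by language."""
    req = urllib.request.Request(voices_url(language),
                                 headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["voices"]
```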
Billing
ElevenLabs Billing
Credits are based on character count:
Credits = (characters / 1000) * model_price * 10
Minimum charge per request: 10% of the model's per-1K-character price, i.e. the charge for 100 characters.
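The formula can be worked through directly. Below is a minimal sketch with two assumptions that this page does not spell out. First, "characters" means Python's `len()` (Unicode code points). Second, the minimum is 10% of the per-1K price converted with the same ×10 factor. For example, 2,000 characters on eleven-tts-v3 cost 2 × 0.50 × 10 = 10 credits. A 5-character request on eleven-tts-flash is raised to the 0.2-credit minimum.

```python
ELEVENLABS_PRICE_PER_1K = {  # ¥ per 1K characters, from the model table
    "eleven-tts-flash": 0.20,
    "eleven-tts-turbo": 0.20,
    "eleven-tts-multilingual": 0.40,
    "eleven-tts-v3": 0.50,
}
CREDITS_PER_YUAN = 10  # the "* 10" factor in the formula


def elevenlabs_credits(text: str, model: str) -> float:
    """Credits for one request, never below the per-request minimum."""
    price = ELEVENLABS_PRICE_PER_1K[model]
    credits = len(text) / 1000 * price * CREDITS_PER_YUAN
    # Assumption: the 10% minimum is converted with the same factor,
    # which equals the charge for 100 characters.
    minimum = 0.10 * price * CREDITS_PER_YUAN
    return max(credits, minimum)
```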
Minimax Billing
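Credits are based on character count:
Credits = (characters / 1000) * credits_per_1K
where credits_per_1K is the model's per-1K-character price expressed in credits (¥1 = 10 credits, so ¥3.20 is 32 credits).
Minimum charge per request: 10% of the model's per-1K-character price.
Voice clone/design: an additional ¥9.9 (99 credits) the first time a custom voice is used.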
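Below is a minimal sketch of the Minimax calculation, including the one-time custom-voice charge. It reuses the ¥1 = 10 credits conversion implied by "¥9.9 (99 credits)". It also assumes "characters" means `len()` and that the first-use charge is tracked per voice_id. The `used_voices` set is a hypothetical bookkeeping aid that the function updates. For example, 1,500 characters on minimax-speech-02-turbo cost 1.5 × 32 = 48 credits, or 147 credits when the request is the first use of a cloned voice.

```python
from typing import Set

MINIMAX_PRICE_PER_1K = {  # ¥ per 1K characters, from the model table
    "minimax-speech-02-turbo": 3.20,
    "minimax-speech-2.6-hd": 5.60,
}
CREDITS_PER_YUAN = 10   # ¥9.9 = 99 credits
CUSTOM_VOICE_FEE = 99   # credits, charged once per custom voice


def minimax_credits(text: str, model: str, voice_id: str,
                    custom_voices: Set[str], used_voices: Set[str]) -> float:
    """Credits for one request; records first use of a custom voice in used_voices."""
    credits_per_1k = MINIMAX_PRICE_PER_1K[model] * CREDITS_PER_YUAN
    credits = max(len(text) / 1000 * credits_per_1k, 0.10 * credits_per_1k)
    if voice_id in custom_voices and voice_id not in used_voices:
        credits += CUSTOM_VOICE_FEE
        used_voices.add(voice_id)  # later requests with this voice skip the fee
    return credits
```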