
kling-custom-voice
Kling Custom Voice creates a reusable custom voice from an audio sample: upload 5–30 seconds of clean, single-speaker audio (.mp3/.wav/.mp4/.mov) or reference a historical video ID. The resulting voice can be used in Kling TTS and the Kling Lip-Sync models, so a digital human or narration can speak in your proprietary voice and then be lip-synced to video.
Upload .mp3/.wav/.mp4/.mov samples
Use a historical video ID as source
Create reusable voice profiles
$0.006 per voice creation
Clean single voice, 5-30 seconds, no background noise
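The sample requirements listed above (supported file types, 5 to 30 seconds) can be checked locally before spending a call. The following is a minimal sketch, not part of any official SDK: the function name `check_voice_sample` is hypothetical, and it assumes you already know the clip's duration in seconds. It does not check for background noise or multiple speakers.

```python
from pathlib import Path

# Constraints taken from the feature list: accepted file types and clip length.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".mp4", ".mov"}
MIN_SECONDS = 5.0
MAX_SECONDS = 30.0


def check_voice_sample(filename: str, duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means these checks passed.

    Hypothetical helper: it only checks the extension and the duration.
    """
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        problems.append(f"duration {duration_seconds:.1f}s is outside 5-30s")
    return problems
```

Calling `check_voice_sample("voice.wav", 12.0)` returns an empty list, while a 3-second `.flac` file would be reported for both its type and its length.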
Kling Custom Voice is an Audio & Speech API provided by Kling. It creates a reusable custom voice from an audio sample: upload 5–30 seconds of clean, single-speaker audio (.mp3/.wav/.mp4/.mov) or reference a historical video ID. The resulting voice can be used in Kling TTS and the Kling Lip-Sync models, so a digital human or narration can speak in your proprietary voice and then be lip-synced to video. Through the API Models platform, you can access this model via a unified API at prices significantly lower than official rates. Current pricing: $0.006 per call.
Generate professional-grade voiceovers for videos, animations, and ads with diverse voice options.
Quickly produce podcast audio content with support for multi-character dialogue.
Convert text content into natural, fluid speech for audiobook production.
AI-powered multilingual dubbing and translation to help content reach global audiences.
Kling Custom Voice is available through API Models at $0.006 per call. This is up to 95% cheaper than official pricing.
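As a quick worked example of the per-call price, the sketch below estimates the cost of creating a batch of voices. It assumes one voice creation equals one billed call at $0.006. The function name `voice_creation_cost` is made up for illustration.

```python
from decimal import Decimal

# Price stated on this page: $0.006 per voice creation call.
PRICE_PER_VOICE_USD = Decimal("0.006")


def voice_creation_cost(num_voices: int) -> Decimal:
    """Estimated cost in USD, assuming one billed call per voice created."""
    if num_voices < 0:
        raise ValueError("num_voices must be non-negative")
    return PRICE_PER_VOICE_USD * num_voices


print(voice_creation_cost(500))  # prints 3.000
```

`Decimal` is used instead of `float` so that small per-call prices add up without binary rounding error.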
Sign up at API Models, get your API key, and call our unified API endpoint. We provide detailed API documentation with code examples in cURL, Python, and Node.js.
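A request for this model needs exactly one voice source: an uploaded sample or a historical video ID. The sketch below only builds a request body and sends nothing. The field names `model`, `voice_name`, `audio_url`, and `video_id` are assumptions made for illustration, and the model string is taken from the slug on this page, so check the API Models documentation for the real parameter names and endpoint.

```python
from typing import Optional


def build_voice_request(
    voice_name: str,
    audio_url: Optional[str] = None,
    video_id: Optional[str] = None,
) -> dict:
    """Build a request body with exactly one voice source.

    All field names here are hypothetical placeholders, not the documented API.
    """
    # Reject the call when both sources or neither source is supplied.
    if (audio_url is None) == (video_id is None):
        raise ValueError("provide exactly one of audio_url or video_id")
    body = {"model": "kling-custom-voice", "voice_name": voice_name}
    if audio_url is not None:
        body["audio_url"] = audio_url  # uploaded 5-30 second sample
    else:
        body["video_id"] = video_id  # historical video used as the source
    return body
```

Keeping the model name as a single string in the body reflects the page's point that switching models means changing one value.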
API Models offers the same Kling Custom Voice model at 60-95% lower cost through our aggregation platform. We provide a unified API interface, so you do not need separate accounts for each provider: one API key gives access to all models.
It creates a custom voice from an audio sample: upload 5–30 seconds of clean, single-speaker audio (.mp3/.wav/.mp4/.mov) or reference a historical video ID. The resulting voice can be used in Kling TTS and the Lip-Sync models.
Once the voice is cloned, select it in Kling TTS or Kling Lip-Sync TTS to synthesize speech, so that a digital human or narration speaks in your proprietary voice, then pair it with lip-sync video.
On API Models, Kling Custom Voice runs alongside 60+ models on one API key and one balance, so choosing is about fit, not lock-in. Its listed features are Custom Voice, Audio Upload, Video Reference, and For TTS/Lip Sync. You can weigh it on price and capability against other Audio & Speech models, then switch by changing a single model-name string, with no new account or integration. Browse every Audio & Speech option with live pricing at apimodels.app/models.
Kling Custom Voice supports: Custom Voice, Audio Upload, Video Reference, For TTS/Lip Sync. See the API Models docs for full parameters and call examples.
Yes. API Models exposes Kling Custom Voice through a single unified API and one key — no separate provider accounts, and no need to handle each provider's regional network access yourself.
We support Stripe (Visa, Mastercard, and other international cards) and Alipay. Credits are available instantly after payment.