
rh-lip-sync-tts
Kling Lip-Sync TTS is text-to-speech designed for lip-sync, with multi-language and multi-dialect support, speed control, emotion styles and voice cloning. It generates the voice track that feeds directly into Kling Lip-Sync Video for mouth alignment, so you can synthesize emotionally styled speech (optionally in a cloned voice) and sync it to a character's mouth in one workflow.
Chinese and English voices
Custom voice replication
0.8x to 2x playback speed
Multiple voice personas
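As a rough illustration of the 0.8x to 2x speed range listed above, the following Python sketch checks a requested speed and estimates how long a clip would run at that speed. The helper names are hypothetical and not part of any Kling or API Models SDK, and the estimate assumes duration scales inversely with the speed multiplier, which the page does not state.

```python
# Speed limits taken from the feature list above (0.8x to 2x).
MIN_SPEED = 0.8
MAX_SPEED = 2.0


def validate_speed(speed: float) -> float:
    """Return the speed unchanged if it lies in the supported range (hypothetical helper)."""
    if not (MIN_SPEED <= speed <= MAX_SPEED):
        raise ValueError(
            f"speed {speed} is outside the supported range {MIN_SPEED}x to {MAX_SPEED}x"
        )
    return speed


def estimated_duration(base_seconds: float, speed: float) -> float:
    """Estimate clip length at a given speed.

    Assumption: duration is inversely proportional to the speed multiplier.
    """
    return base_seconds / validate_speed(speed)


if __name__ == "__main__":
    # A 6-second clip at 1x, played back at 2x.
    print(estimated_duration(6.0, 2.0))
```

A check like this can be run client-side before a request is sent, so an out-of-range speed fails early instead of consuming a paid call.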
Kling lip-sync is a 3-step flow. Run the steps in order; intermediate values carry forward automatically.
Upload or paste a public video URL; recognition returns sessionId + faceId.
Trimmed audio must be ≥2s; the insert window must overlap the face window by ≥2s.
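The two timing constraints above can be checked before submitting a job. The sketch below is a minimal Python illustration under stated assumptions: the `Window` type and function names are hypothetical, times are in seconds, and "overlap" is taken to mean the length of the intersection of the insert window and the face window.

```python
from dataclasses import dataclass
from typing import List

MIN_SECONDS = 2.0  # minimum stated in the text above


@dataclass(frozen=True)
class Window:
    """A time span in seconds (hypothetical helper type)."""
    start: float
    end: float

    @property
    def duration(self) -> float:
        return max(0.0, self.end - self.start)


def overlap_seconds(a: Window, b: Window) -> float:
    """Length of the intersection of two windows, or 0.0 if they do not intersect."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def check_timing(audio_trim: Window, insert: Window, face: Window) -> List[str]:
    """Return a list of human-readable problems; an empty list means both checks pass."""
    problems = []
    if audio_trim.duration < MIN_SECONDS:
        problems.append(
            f"trimmed audio is {audio_trim.duration:.2f}s, needs at least {MIN_SECONDS}s"
        )
    overlap = overlap_seconds(insert, face)
    if overlap < MIN_SECONDS:
        problems.append(
            f"insert/face overlap is {overlap:.2f}s, needs at least {MIN_SECONDS}s"
        )
    return problems


if __name__ == "__main__":
    # Audio trimmed to 0-3s, inserted at 1-4s, face visible at 2-6s.
    print(check_timing(Window(0.0, 3.0), Window(1.0, 4.0), Window(2.0, 6.0)))
```

Returning a list of problems rather than raising on the first failure lets a caller report both violations at once.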
Kling Lip-Sync TTS is an Audio & Speech API provided by Kling. Through the API Models platform, you can access this model via a unified API at prices significantly lower than official rates. Current pricing: $0.01 per call.
Generate professional-grade voiceovers for videos, animations, and ads with diverse voice options.
Quickly produce podcast audio content with support for multi-character dialogue.
Convert text content into natural, fluid speech for audiobook production.
AI-powered multilingual dubbing and translation to help content reach global audiences.
Kling Lip-Sync TTS is available through API Models at $0.01 per call. This is up to 95% cheaper than official pricing.
Sign up at API Models, get your API key, and call our unified API endpoint. We provide detailed API documentation with code examples in cURL, Python, and Node.js.
API Models offers the same Kling Lip-Sync TTS model at 60-95% lower cost through our aggregation platform. We provide a unified API interface, so you do not need separate accounts for each provider: one API key gives access to all models.
It's text-to-speech designed for lip-sync: multi-language and multi-dialect, with speed control, emotion styles and voice cloning. The generated voice feeds directly into Kling Lip-Sync Video for mouth alignment.
For anyone generating the voice track for a lip-sync video: synthesize text into emotionally styled speech at the right pace (optionally with a cloned voice), then align it to a character's mouth in the video.
On API Models, Kling Lip-Sync TTS runs alongside 60+ models on one API key and one balance, so choosing is about fit, not lock-in. It supports Text to Speech, Multi-Language, Voice Clone, and Speed Control. You can weigh it on price and capability against other Audio & Speech models, then switch by changing a single model-name string, with no new account or integration. Browse every Audio & Speech option with live pricing at apimodels.app/models.
Kling Lip-Sync TTS supports Text to Speech, Multi-Language, Voice Clone, and Speed Control. See the API Models docs for full parameters and call examples.
Yes. API Models exposes Kling Lip-Sync TTS through a single unified API and one key, with no separate provider accounts and no need to handle each provider's regional network access yourself.
We support Stripe (Visa, Mastercard, and other international cards) and Alipay. Credits are available instantly after payment.