
rh-lip-sync-video
Kling Lip-Sync Video performs frame-level lip synchronization, matching the mouth movements of a character in a video to an audio track. It works with real humans and with 3D and 2D animated characters, accepts local audio upload or online TTS, and supports minute-level durations. The typical flow is to run Kling Face Recognition first to get a faceId, then align audio (uploaded or from Kling Lip-Sync TTS) to that face. Ideal for digital-human voiceover, dub-to-lip-sync, and talking animated characters.
Precise lip-audio alignment
Real human, 3D, 2D support
Upload or online TTS
Minute-level video generation
Kling lip-sync is a 3-step flow. Run the steps in order; intermediate values carry forward automatically.
Upload or paste a public video URL; recognition returns sessionId + faceId.
Trimmed audio must be ≥2s; the insert window must overlap the face window by ≥2s.
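The two timing constraints above can be checked locally before a job is submitted. The following is a minimal sketch, assuming all times are in seconds, that the insert window runs from the insert point for the length of the trimmed audio, and that the face window is the interval in which the recognized face is visible. The function and parameter names are illustrative and are not part of the Kling or API Models API.

```python
MIN_SECONDS = 2.0  # minimum length and minimum overlap stated above


def overlap_seconds(a_start, a_end, b_start, b_end):
    """Length of the intersection of [a_start, a_end] and [b_start, b_end]."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def check_lip_sync_timing(audio_start, audio_end, insert_at, face_start, face_end):
    """Return a list of problems; an empty list means both constraints hold.

    audio_start, audio_end: trim points inside the audio clip.
    insert_at: where the trimmed audio begins on the video timeline (assumed).
    face_start, face_end: interval in which the face is visible (assumed).
    """
    problems = []
    trimmed = audio_end - audio_start
    if trimmed < MIN_SECONDS:
        problems.append(
            f"trimmed audio is {trimmed:.2f}s, needs >= {MIN_SECONDS:.0f}s"
        )
    # Assumed insert window: starts at insert_at, lasts as long as the trimmed audio.
    insert_end = insert_at + max(trimmed, 0.0)
    shared = overlap_seconds(insert_at, insert_end, face_start, face_end)
    if shared < MIN_SECONDS:
        problems.append(
            f"insert window overlaps face window by {shared:.2f}s, "
            f"needs >= {MIN_SECONDS:.0f}s"
        )
    return problems
```

For example, `check_lip_sync_timing(0, 5, 8, 0, 9)` reports one problem, because the audio inserted at 8s shares only 1s with a face window that ends at 9s.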
Kling Lip-Sync Video is a Video Generation API provided by Kling. Through the API Models platform, you can access this model via a unified API at prices significantly lower than official rates. Current pricing: $0.065 per 5s.
Quickly generate brand promotion videos for ad campaigns and social media marketing.
Create compelling short-form video content for platforms like TikTok, Instagram, and YouTube.
Generate product feature demonstrations and tutorials to improve user conversion.
Produce course explanations, knowledge explainers, and training videos at low cost.
Kling Lip-Sync Video is available through API Models at $0.065 per 5s. This is up to 95% cheaper than official pricing.
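To illustrate the per-5s rate, the sketch below estimates the cost of a job from its duration. It assumes that each started 5-second unit is billed in full, which this page does not state; check the API Models docs for the actual rounding rule. The function name is illustrative.

```python
import math
from decimal import Decimal

PRICE_PER_UNIT = Decimal("0.065")  # USD per 5s, from the pricing above
UNIT_SECONDS = 5


def estimate_cost_usd(duration_seconds):
    """Estimate cost in USD, ASSUMING partial 5s units are rounded up."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    units = math.ceil(duration_seconds / UNIT_SECONDS)
    return PRICE_PER_UNIT * units
```

Under that assumption, a 60-second video is 12 units, or $0.78.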
Sign up at API Models, get your API key, and call our unified API endpoint. We provide detailed API documentation with code examples in cURL, Python, and Node.js.
API Models offers the same Kling Lip-Sync Video model at 60-95% lower cost through our aggregation platform. We provide a unified API interface, so you do not need separate accounts for each provider: one API key gives access to all models.
It does frame-level lip synchronization — aligning an audio track to the mouth movements of a character in a video, for real humans, 3D and 2D animated characters, with local audio upload or online TTS and minute-level duration. Good for digital-human voiceover, dub-to-lip-sync, and talking animated characters.
Typical flow: first run Kling Face Recognition (kling-identify-face) to detect a face in the video and get a faceId, then align audio (uploaded or generated via Kling Lip-Sync TTS) to that face to produce the lip-synced video.
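This flow can be sketched as a small orchestration function. It is an outline under stated assumptions, not the platform's client: `identify_face` and `lip_sync` are hypothetical callables that you would implement against the API Models docs, and the `sessionId` and `faceId` keys follow the field names mentioned on this page.

```python
def run_lip_sync(video_url, audio_url, identify_face, lip_sync):
    """Run face recognition, then lip sync, carrying the IDs forward.

    identify_face(video_url) -> dict with "sessionId" and "faceId"
        (hypothetical wrapper around kling-identify-face).
    lip_sync(session_id, face_id, audio_url) -> result
        (hypothetical wrapper around the lip-sync call).
    """
    face = identify_face(video_url)
    # Fail early if recognition did not return the values the next step needs.
    missing = [key for key in ("sessionId", "faceId") if not face.get(key)]
    if missing:
        raise ValueError(
            "face recognition response missing: " + ", ".join(missing)
        )
    return lip_sync(face["sessionId"], face["faceId"], audio_url)
```

Passing the two steps in as callables keeps the ordering logic separate from the HTTP details, so the same function works with uploaded audio or with audio produced by a TTS step.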
On API Models, Kling Lip-Sync Video runs alongside 60+ models on one API key and one balance, so choosing is about fit, not lock-in. It supports Lip Sync, Multi-Character, Audio Alignment, and Minute-Level Duration. You can weigh it on price and capability against other Video Generation models, then switch by changing a single model-name string, with no new account or integration. Browse every Video Generation option with live pricing at apimodels.app/models.
Kling Lip-Sync Video supports Lip Sync, Multi-Character, Audio Alignment, and Minute-Level Duration. See the API Models docs for full parameters and call examples.
Yes. API Models exposes Kling Lip-Sync Video through a single unified API and one key — no separate provider accounts, and no need to handle each provider's regional network access yourself.
We support Stripe (Visa, Mastercard, and other international cards) and Alipay. Credits are available instantly after payment.