Kling Lip-Sync Video

rh-lip-sync-video

Kling Lip-Sync Video performs frame-level lip synchronization, matching the mouth movements of a character in a video to an audio track. It works with real humans as well as 3D and 2D animated characters, accepts either uploaded audio or online TTS, and supports minute-level durations. The typical flow is to run Kling Face Recognition first to get a faceId, then align audio (uploaded or from Kling Lip-Sync TTS) to that face. It is well suited to digital-human voiceover, dub-to-lip-sync, and talking animated characters.

Lip Sync, Multi-Character, Audio Alignment, Minute-Level Duration

per 5s: $0.065/s

API Docs

Kling lip-sync is a 3-step flow — run them in order; intermediate values carry forward automatically.

Step 1 · Face Recognition

Source video: MP4 or MOV, 2-60s, 720p or 1080p, with a clearly visible face.

Upload a file or paste a public video URL; recognition returns sessionId and faceId.
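The source-video limits above can be checked locally before submitting anything. The sketch below is only an illustration: the `VideoInfo` fields and the `video_problems` helper are hypothetical names, and it assumes that "720p/1080p" refers to the shorter side of the frame, which this page does not confirm.

```python
from dataclasses import dataclass

# Limits copied from the Step 1 description; all names here are hypothetical.
ALLOWED_CONTAINERS = {"mp4", "mov"}
MIN_DURATION_S, MAX_DURATION_S = 2.0, 60.0
ALLOWED_SHORT_SIDES = {720, 1080}  # assumption: "720p/1080p" means the shorter side


@dataclass(frozen=True)
class VideoInfo:
    container: str     # file container, e.g. "mp4"
    duration_s: float  # clip length in seconds
    width: int         # frame width in pixels
    height: int        # frame height in pixels


def video_problems(info: VideoInfo) -> list[str]:
    """Return a list of human-readable problems; an empty list means the clip fits."""
    problems = []
    if info.container.lower() not in ALLOWED_CONTAINERS:
        problems.append(f"container {info.container!r} is not MP4 or MOV")
    if not MIN_DURATION_S <= info.duration_s <= MAX_DURATION_S:
        problems.append(f"duration {info.duration_s}s is outside 2-60s")
    if min(info.width, info.height) not in ALLOWED_SHORT_SIDES:
        problems.append(f"resolution {info.width}x{info.height} is not 720p or 1080p")
    return problems
```

A clear face cannot be verified this way; that check still happens in the recognition step itself.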

Step 2 · Prepare Audio

Step 3 · Generate Lip-Sync Video

Timing parameters (ms): soundStartTime, soundEndTime, soundInsertTime

Trimmed audio must be ≥2s; the insert window must overlap the face window by ≥2s.
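The two timing rules can be expressed as a small pre-flight check. This is a sketch under stated assumptions, not the service's own validation: it assumes soundStartTime and soundEndTime trim the audio, that soundInsertTime is the position on the video timeline where the trimmed audio begins, and that the face window is available as a start and end time in milliseconds (the `face_start_ms` and `face_end_ms` arguments are hypothetical).

```python
MIN_MS = 2000  # the 2-second minimum stated above, in milliseconds


def timing_problems(
    sound_start_ms: int,
    sound_end_ms: int,
    sound_insert_ms: int,
    face_start_ms: int,  # hypothetical: when the face first appears in the video
    face_end_ms: int,    # hypothetical: when the face last appears in the video
) -> list[str]:
    """Return a list of violated rules; an empty list means both rules hold."""
    problems = []

    # Rule 1: the trimmed audio must last at least 2 seconds.
    trimmed_ms = sound_end_ms - sound_start_ms
    if trimmed_ms < MIN_MS:
        problems.append(f"trimmed audio is {trimmed_ms}ms, need at least {MIN_MS}ms")

    # Rule 2: the insert window [insert, insert + trimmed] must overlap
    # the face window [face_start, face_end] by at least 2 seconds.
    insert_end_ms = sound_insert_ms + trimmed_ms
    overlap_ms = min(insert_end_ms, face_end_ms) - max(sound_insert_ms, face_start_ms)
    if overlap_ms < MIN_MS:
        problems.append(f"overlap with face window is {max(overlap_ms, 0)}ms, need at least {MIN_MS}ms")

    return problems
```

For example, 5 seconds of audio inserted at 1000 ms against a face window of 0-8000 ms overlaps for the full 5 seconds and passes; the same audio inserted at 7000 ms overlaps for only 1 second and fails the second rule.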

Last updated: 2026-06-21

TL;DR: Kling Lip-Sync Video is a Kling video generation model, callable through the API Models unified API (model name `rh-lip-sync-video`). Pricing: $0.065 per 5s. One API key covers all image, video, LLM, and audio models, at 60-95% below official pricing.

About Kling Lip-Sync Video

Kling Lip-Sync Video is a Video Generation API provided by Kling. It performs frame-level lip synchronization for real humans and for 3D and 2D animated characters, using uploaded audio or online TTS. Through the API Models platform, you can access this model via a unified API at prices significantly lower than official rates. Current pricing: $0.065 per 5s.

Key Features

Frame-Level Sync -- Precise lip-audio alignment
Multi-Character -- Real human, 3D, 2D support
Audio Modes -- Upload or online TTS
Long Duration -- Minute-level video generation

Use Cases

Marketing Videos

Quickly generate brand promotion videos for ad campaigns and social media marketing.

Social Media Content

Create compelling short-form video content for platforms like TikTok, Instagram, and YouTube.

Product Demos

Generate product feature demonstrations and tutorials to improve user conversion.

Educational Content

Produce course explanations, knowledge explainers, and training videos at low cost.

Why API Models

Unified API -- One API key to access all models, no need to register on multiple platforms
Cost Savings -- 60-95% cheaper than official pricing, ideal for indie developers and startups
Instant Access -- Start using immediately after signup, supports Stripe and Alipay payments
Full Documentation -- Detailed API docs with code examples in cURL, Python, and Node.js

Frequently Asked Questions

How much does Kling Lip-Sync Video cost?

Kling Lip-Sync Video is available through API Models at $0.065 per 5s. This is up to 95% cheaper than official pricing.
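As a rough budgeting aid, the listed price can be turned into a per-clip estimate. The sketch below assumes that $0.065 is charged for every started 5-second block of output, rounding up; the page does not spell out the rounding rule, so treat the result as an estimate and check your actual billing. The function name is hypothetical.

```python
from decimal import Decimal

PRICE_PER_BLOCK_USD = Decimal("0.065")  # listed price per 5s
BLOCK_MS = 5000                         # assumption: billing rounds up to whole 5s blocks


def estimate_cost_usd(duration_ms: int) -> Decimal:
    """Estimate the cost of a clip, charging for every started 5-second block."""
    if duration_ms <= 0:
        raise ValueError("duration_ms must be positive")
    blocks = -(-duration_ms // BLOCK_MS)  # integer ceiling division
    return blocks * PRICE_PER_BLOCK_USD
```

Under that assumption, a 12-second clip spans three blocks, so `estimate_cost_usd(12_000)` returns `Decimal('0.195')`.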

How do I use the Kling Lip-Sync Video API?

Sign up at API Models, get your API key, and call our unified API endpoint. We provide detailed API documentation with code examples in cURL, Python, and Node.js.

What is the difference between API Models and the official Kling API?

API Models offers the same Kling Lip-Sync Video model at 60-95% lower cost through our aggregation platform. We provide a unified API interface, so you do not need separate accounts for each provider; one API key gives access to all models.

What is Kling Lip-Sync Video?

It does frame-level lip synchronization — aligning an audio track to the mouth movements of a character in a video, for real humans, 3D and 2D animated characters, with local audio upload or online TTS and minute-level duration. Good for digital-human voiceover, dub-to-lip-sync, and talking animated characters.

How do I use Kling Lip-Sync Video?

Typical flow: first run Kling Face Recognition (kling-identify-face) to detect a face in the video and get a faceId, then align audio (uploaded or generated via Kling Lip-Sync TTS) to that face to produce the lip-synced video.
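The flow above can be sketched as a short orchestration function. Everything about the client here is hypothetical: the `LipSyncClient` interface, its method names, and the dict shape of the recognition result are illustrative stand-ins, not the documented API Models interface. Only the sessionId and faceId names come from this page. See the API docs for the real endpoints and parameters.

```python
from typing import Protocol


class LipSyncClient(Protocol):
    """Hypothetical client interface; method names are illustrative only."""

    def identify_face(self, video_url: str) -> dict: ...
    def submit_lip_sync(self, session_id: str, face_id: str, audio_url: str) -> str: ...


def run_lip_sync(client: LipSyncClient, video_url: str, audio_url: str) -> str:
    """Run face recognition, then lip-sync the audio to the detected face."""
    # Step 1: face recognition yields the identifiers the later step needs.
    face = client.identify_face(video_url)
    session_id, face_id = face["sessionId"], face["faceId"]
    # Steps 2-3: the audio is assumed to be prepared already (uploaded or TTS output).
    return client.submit_lip_sync(session_id, face_id, audio_url)


class StubClient:
    """In-memory stand-in used to exercise the flow without any network calls."""

    def identify_face(self, video_url: str) -> dict:
        return {"sessionId": "session-1", "faceId": "face-1"}

    def submit_lip_sync(self, session_id: str, face_id: str, audio_url: str) -> str:
        return f"{session_id}/{face_id}/{audio_url}"
```

Passing the client in as a parameter keeps the ordering logic testable with a stub like the one shown, and lets a real HTTP client be swapped in later without changing `run_lip_sync`.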

How does Kling Lip-Sync Video compare to other Video Generation models?

On API Models, Kling Lip-Sync Video runs alongside 60+ models on one API key and one balance, so choosing is about fit, not lock-in. It supports Lip Sync, Multi-Character, Audio Alignment, Minute-Level Duration, and you can weigh it on price and capability against other Video Generation models, then switch by changing a single model-name string — no new account or integration. Browse every Video Generation option with live pricing at apimodels.app/models.

What can Kling Lip-Sync Video do?

Kling Lip-Sync Video supports: Lip Sync, Multi-Character, Audio Alignment, Minute-Level Duration. See the API Models docs for full parameters and call examples.

Can I access the Kling Lip-Sync Video API from anywhere (including China)?

Yes. API Models exposes Kling Lip-Sync Video through a single unified API and one key — no separate provider accounts, and no need to handle each provider's regional network access yourself.

What payment methods are supported?

We support Stripe (Visa, Mastercard, and other international cards) and Alipay. Credits are available instantly after payment.