
Kling Lip-Sync TTS

rh-lip-sync-tts

Kling Lip-Sync TTS is text-to-speech designed for lip-sync, with multi-language and multi-dialect support, speed control, emotion styles and voice cloning. It generates the voice track that feeds directly into Kling Lip-Sync Video for mouth alignment, so you can synthesize emotionally styled speech (optionally in a cloned voice) and sync it to a character's mouth in one workflow.

Text to Speech · Multi-Language · Voice Clone · Speed Control

Pricing: $0.01 per call

Multi-Language

Chinese and English voices

Voice Cloning

Custom voice replication

Speed Control

0.8x to 2x playback speed

Emotion Styles

Multiple voice personas

API Docs

Kling lip-sync is a 3-step flow — run them in order; intermediate values carry forward automatically.

Step 1 · Face Recognition

Source video (MP4/MOV, 2-60s, 720p/1080p, clear face)


Upload or paste a public video URL; recognition returns sessionId + faceId.
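The source-video limits listed for Step 1 can be checked locally before uploading. The following is a minimal sketch, not part of the API: `SourceVideo` and `source_video_problems` are hypothetical names, "720p/1080p" is assumed to mean a frame height of 720 or 1080 pixels, and the 2-60s range is assumed to be inclusive. The "clear face" requirement cannot be checked this way.

```python
from dataclasses import dataclass

# Limits taken from the Step 1 description; how they are interpreted is an assumption.
ALLOWED_CONTAINERS = {"mp4", "mov"}
ALLOWED_HEIGHTS = {720, 1080}   # assumed meaning of "720p/1080p"
MIN_DURATION_S = 2.0            # assumed inclusive
MAX_DURATION_S = 60.0           # assumed inclusive


@dataclass(frozen=True)
class SourceVideo:
    """Hypothetical container for metadata you already know about the clip."""
    filename: str
    duration_s: float
    height_px: int


def source_video_problems(video: SourceVideo) -> list[str]:
    """Return a list of human-readable problems; an empty list means the clip looks acceptable."""
    problems = []
    ext = video.filename.rsplit(".", 1)[-1].lower() if "." in video.filename else ""
    if ext not in ALLOWED_CONTAINERS:
        problems.append(f"container '{ext}' is not MP4 or MOV")
    if not MIN_DURATION_S <= video.duration_s <= MAX_DURATION_S:
        problems.append(f"duration {video.duration_s}s is outside 2-60s")
    if video.height_px not in ALLOWED_HEIGHTS:
        problems.append(f"height {video.height_px}px is not 720 or 1080")
    return problems
```

Calling `source_video_problems(SourceVideo("clip.mp4", 10.0, 1080))` returns an empty list, so the clip can be sent on to face recognition.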

Step 2 · Prepare Audio

Step 3 · Generate Lip-Sync Video

Timing parameters, all in milliseconds: soundStartTime, soundEndTime, soundInsertTime.

Trimmed audio must be ≥2s; the insert window must overlap the face window by ≥2s.
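The two timing rules can be expressed as simple interval arithmetic. This is a sketch under stated assumptions: the trimmed audio is taken to run from soundStartTime to soundEndTime, the insert window is assumed to start at soundInsertTime and last as long as the trimmed audio, and the face window (`face_start_ms`, `face_end_ms`) is a hypothetical pair of values for when the recognized face is on screen. The function names are illustrative only.

```python
MIN_MS = 2000  # both rules use a 2-second minimum


def overlap_ms(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Length of the intersection of two intervals, or 0 if they do not intersect."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def timing_problems(
    sound_start_ms: int,
    sound_end_ms: int,
    sound_insert_ms: int,
    face_start_ms: int,   # hypothetical: start of the face window in the video
    face_end_ms: int,     # hypothetical: end of the face window in the video
) -> list[str]:
    """Check both rules and return a list of violations (empty means valid)."""
    problems = []
    trimmed = sound_end_ms - sound_start_ms
    if trimmed < MIN_MS:
        problems.append(f"trimmed audio is {trimmed} ms, needs at least {MIN_MS} ms")
    # Assumed insert window: starts at soundInsertTime, lasts as long as the trimmed audio.
    insert_end_ms = sound_insert_ms + max(trimmed, 0)
    shared = overlap_ms(sound_insert_ms, insert_end_ms, face_start_ms, face_end_ms)
    if shared < MIN_MS:
        problems.append(f"insert window overlaps the face window by {shared} ms, needs at least {MIN_MS} ms")
    return problems
```

For example, a 3-second clip inserted at 9000 ms into a video whose face window ends at 10000 ms passes the length rule but fails the overlap rule, because only 1000 ms of the audio falls inside the face window.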

Last updated: 2026-06-21

TL;DR: Kling Lip-Sync TTS is a Kling audio and speech model, callable through API Models' unified API (model name `rh-lip-sync-tts`). Pricing: $0.01 per call. One API key covers all image, video, LLM, and audio models, at 60-95% less than official pricing.

About Kling Lip-Sync TTS

Kling Lip-Sync TTS is an Audio & Speech API provided by Kling. It is text-to-speech designed for lip-sync, with multi-language and multi-dialect support, speed control, emotion styles, and voice cloning. It generates the voice track that feeds directly into Kling Lip-Sync Video for mouth alignment, so you can synthesize emotionally styled speech (optionally in a cloned voice) and sync it to a character's mouth in one workflow. Through the API Models platform, you can access this model via a unified API at prices significantly lower than official rates. Current pricing: $0.01 per call.

Key Features

Multi-Language -- Chinese and English voices
Voice Cloning -- Custom voice replication
Speed Control -- 0.8x to 2x playback speed
Emotion Styles -- Multiple voice personas
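Because the supported speed range is stated as 0.8x to 2x, a client can validate or clamp a requested speed before sending it. This is a minimal sketch; the helper names are hypothetical, and treating the endpoints of the range as allowed values is an assumption.

```python
MIN_SPEED = 0.8  # lower bound from the feature list
MAX_SPEED = 2.0  # upper bound from the feature list


def check_speed(speed: float) -> float:
    """Return the speed unchanged, or raise if it is outside the documented range."""
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed {speed} is outside {MIN_SPEED}x-{MAX_SPEED}x")
    return speed


def clamp_speed(speed: float) -> float:
    """Pull an out-of-range speed back to the nearest documented bound."""
    return min(MAX_SPEED, max(MIN_SPEED, speed))
```

`check_speed` suits user-facing input where an error message is useful; `clamp_speed` suits batch jobs where a silent correction is preferable to a failed call.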

Use Cases

Voiceover & Narration

Generate professional-grade voiceovers for videos, animations, and ads with diverse voice options.

Podcast Production

Quickly produce podcast audio content with support for multi-character dialogue.

Audiobook Creation

Convert text content into natural, fluid speech for audiobook production.

Multilingual Dubbing

AI-powered multilingual dubbing and translation to help content reach global audiences.

Why API Models

Unified API -- One API key to access all models, no need to register on multiple platforms
Cost Savings -- 60-95% cheaper than official pricing, ideal for indie developers and startups
Instant Access -- Start using immediately after signup, supports Stripe and Alipay payments
Full Documentation -- Detailed API docs with code examples in cURL, Python, and Node.js

Frequently Asked Questions

How much does Kling Lip-Sync TTS cost?

Kling Lip-Sync TTS is available through API Models at $0.01 per call. This is up to 95% cheaper than official pricing.
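With flat per-call pricing, a budget estimate is a single multiplication. The sketch below assumes every synthesis request counts as exactly one billable call; `estimate_cost_usd` is a hypothetical helper, and `Decimal` is used to avoid floating-point rounding on currency.

```python
from decimal import Decimal

PRICE_PER_CALL_USD = Decimal("0.01")  # price quoted on this page


def estimate_cost_usd(calls: int) -> Decimal:
    """Estimated spend for a number of calls, assuming one billable call per request."""
    if calls < 0:
        raise ValueError("calls must be non-negative")
    return PRICE_PER_CALL_USD * calls
```

For instance, `estimate_cost_usd(250)` gives `Decimal("2.50")`, so 250 voice tracks would cost about $2.50 under this assumption.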

How do I use the Kling Lip-Sync TTS API?

Sign up at API Models, get your API key, and call our unified API endpoint. We provide detailed API documentation with code examples in cURL, Python, and Node.js.
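As an illustration of what such a call might look like in Python, the sketch below only builds the HTTP request and does not send it. Everything except the model name `rh-lip-sync-tts` is an assumption: the endpoint URL is a placeholder, the Bearer-token header is a common convention rather than a confirmed requirement, and the payload field names (`input`, `speed`) are hypothetical. Take the real endpoint, authentication scheme, and parameters from the API Models documentation.

```python
import json
import urllib.request

# Placeholder only: replace with the endpoint given in the API Models docs.
HYPOTHETICAL_ENDPOINT = "https://api.example.invalid/v1/audio/speech"


def build_tts_request(
    api_key: str,
    text: str,
    *,
    model: str = "rh-lip-sync-tts",  # model name from this page
    speed: float = 1.0,              # hypothetical field name
    endpoint: str = HYPOTHETICAL_ENDPOINT,
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a text-to-speech call."""
    payload = {"model": model, "input": text, "speed": speed}  # field names are assumptions
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Once the real endpoint and fields are filled in, the request can be sent with `urllib.request.urlopen(request)`. Keeping the model name as a parameter also reflects the platform's claim that switching models means changing one string.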

What is the difference between API Models and the official Kling API?

API Models offers the same Kling Lip-Sync TTS model at 60-95% lower cost through our aggregation platform. We provide a unified API interface, so you do not need separate accounts for each provider: one API key gives access to all models.

What is Kling Lip-Sync TTS?

It's text-to-speech designed for lip-sync: multi-language and multi-dialect, with speed control, emotion styles and voice cloning. The generated voice feeds directly into Kling Lip-Sync Video for mouth alignment.

Who is Kling Lip-Sync TTS for?

For anyone generating the voice track for a lip-sync video: synthesize text into emotionally styled speech at the right pace (optionally with a cloned voice), then align it to a character's mouth in the video.

How does Kling Lip-Sync TTS compare to other Audio & Speech models?

On API Models, Kling Lip-Sync TTS runs alongside 60+ models on one API key and one balance, so choosing is about fit, not lock-in. It supports Text to Speech, Multi-Language, Voice Clone, and Speed Control. You can weigh it on price and capability against other Audio & Speech models, then switch by changing a single model-name string, with no new account or integration. Browse every Audio & Speech option with live pricing at apimodels.app/models.

What can Kling Lip-Sync TTS do?

Kling Lip-Sync TTS supports Text to Speech, Multi-Language, Voice Clone, and Speed Control. See the API Models docs for full parameters and call examples.

Can I access the Kling Lip-Sync TTS API from anywhere, including China?

Yes. API Models exposes Kling Lip-Sync TTS through a single unified API and one key — no separate provider accounts, and no need to handle each provider's regional network access yourself.

What payment methods are supported?

We support Stripe (Visa, Mastercard, and other international cards) and Alipay. Credits are available instantly after payment.