Kling
Create custom voice profiles from audio samples. Upload a .mp3, .wav, .mp4, or .mov file (5-30 seconds long) or reference a video ID.
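The file-type and length limits above can be checked locally before uploading. The sketch below is only an illustration: `check_voice_sample` is a hypothetical helper, not part of any Kling SDK, and it assumes the caller already knows the clip duration and that the 5-30 second bounds are inclusive.

```python
from pathlib import Path

# Limits taken from the description above; inclusive bounds are an assumption.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".mp4", ".mov"}
MIN_SECONDS, MAX_SECONDS = 5.0, 30.0


def check_voice_sample(filename: str, duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means the sample looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()  # compare extensions case-insensitively
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if not (MIN_SECONDS <= duration_seconds <= MAX_SECONDS):
        problems.append(f"duration {duration_seconds:g}s is outside 5-30s")
    return problems
```

For example, `check_voice_sample("take1.WAV", 12.5)` returns an empty list, while a 2-second `.ogg` clip is reported with two problems.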
Kling
Sync one or multiple faces in a video with custom audio. Supports precise timing control.
Kling
Identify faces in video for advanced lip-sync. Returns session ID and face IDs.
Kling
Generate sound effects from text descriptions. Produces 3-10 seconds of natural-sounding audio.
Kling
Auto-generate sound effects and background music for videos. Supports ASMR mode for immersive content.
Kling
Text-to-speech with multiple voice options. Adjustable speed and multi-language support.
Minimax
High-definition async TTS by Minimax (Hailuo, 海螺). Rich expressiveness with natural prosody. Supports voice clone and voice design.
Minimax
Fast and cost-effective async TTS by Minimax (Hailuo, 海螺). Supports voice clone, voice design, and pronunciation dictionaries.
ElevenLabs
Ultra low latency model in 32 languages. Ideal for real-time conversational use cases.
ElevenLabs
High quality, low latency model in 32 languages. Best for developer use cases where speed matters.
ElevenLabs
Most life-like, emotionally rich model in 29 languages. Best for voice overs, audiobooks, post-production.
ElevenLabs
Most expressive model with 70+ languages. Supports audio tags like [laughs], [whispers] for emotional control.
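Audio tags are written inline in the script text. As a rough sketch of how a script might be inspected before it is sent for synthesis, the hypothetical helper below separates the tags from the spoken text. It assumes tags are lowercase words in square brackets, as in the two examples above; the model's actual tag grammar may be broader.

```python
import re

# Assumed tag shape: lowercase words inside square brackets, e.g. [laughs].
TAG_PATTERN = re.compile(r"\[([a-z][a-z ]*)\]")


def split_audio_tags(script: str) -> tuple[str, list[str]]:
    """Return (spoken text without tags, tags in order of appearance)."""
    tags = TAG_PATTERN.findall(script)
    plain = TAG_PATTERN.sub("", script)
    plain = re.sub(r"\s+", " ", plain).strip()  # tidy gaps left by removed tags
    return plain, tags
```

This makes it easy to count spoken characters separately from tags or to list which tags a script relies on.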
ElevenLabs
Multi-speaker dialogue generation with natural conversation flow. Perfect for podcasts and audiobooks.
ElevenLabs
Isolate speech from background noise, music, and ambient sounds to produce clean audio.
ElevenLabs
Translate audio/video while preserving emotion, timing and tone. Automatic lip-sync.