ByteDance
ByteDance Seedance 2.0 cinematic video via the official Volcengine API, stable under high concurrency. Text, image and multimodal video generation with per-second pricing.
Gemini Omni Flash: a unified video generator for both text-to-video and image-to-video (1 or 3 reference images). Supports 720p, 1080p or 4K resolution, 4, 6, 8 or 10 s durations, and optional 16:9 or 9:16 framing. One model slug covers both modes: supply a prompt, and optionally add reference images.
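The option matrix above is easy to get wrong client-side, so a pre-flight check can catch invalid combinations before a request is sent. This is a minimal sketch that encodes only the values listed in the entry; the class name, field names and string labels (for example `"4k"`) are hypothetical and may not match the real request schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Allowed values taken from the catalog entry; the string labels are hypothetical.
ALLOWED_RESOLUTIONS = {"720p", "1080p", "4k"}
ALLOWED_DURATIONS_S = {4, 6, 8, 10}
ALLOWED_ASPECT_RATIOS = {None, "16:9", "9:16"}  # None = no explicit framing
ALLOWED_IMAGE_COUNTS = {0, 1, 3}  # 0 -> text-to-video; 1 or 3 -> image-to-video


@dataclass
class OmniFlashRequest:
    # Hypothetical request shape, not the provider's documented schema.
    prompt: str
    resolution: str
    duration_s: int
    aspect_ratio: Optional[str] = None
    reference_images: List[str] = field(default_factory=list)

    def mode(self) -> str:
        """The same slug serves both modes; images switch it to image-to-video."""
        return "image-to-video" if self.reference_images else "text-to-video"

    def validate(self) -> None:
        """Raise ValueError if any option falls outside the listed values."""
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.resolution not in ALLOWED_RESOLUTIONS:
            raise ValueError(f"unsupported resolution: {self.resolution!r}")
        if self.duration_s not in ALLOWED_DURATIONS_S:
            raise ValueError(f"unsupported duration: {self.duration_s}s")
        if self.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio!r}")
        if len(self.reference_images) not in ALLOWED_IMAGE_COUNTS:
            raise ValueError("image-to-video needs exactly 1 or 3 reference images")
```

For instance, a request with two reference images fails validation, because the entry only lists 1 or 3.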
ByteDance
Film-grade edition of Seedance 2.0, tuned for cinematic lighting, mood and camera motion, with strong identity locking from reference images (up to 4). Its output quality is visibly above Ark-direct Seedance 2.0 and Seedance 2.0 Fast, but generation takes longer (typically 60-180 s, occasionally several minutes at peak load). Choose it when look and feel matter more than turnaround time.
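Because this tier typically takes 60-180 s and can take several minutes at peak, a client should poll with a generous timeout rather than give up after a short one. The sketch below assumes an asynchronous submit-then-poll workflow; `fetch_status`, its status strings, and the 600 s default timeout are hypothetical and illustrative, not documented values for this channel.

```python
import time
from typing import Callable


def wait_for_video(
    fetch_status: Callable[[], str],
    timeout_s: float = 600.0,
    initial_interval_s: float = 5.0,
    max_interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll a long-running generation job until it finishes or times out.

    fetch_status is a hypothetical callable returning "queued", "running",
    "succeeded" or "failed"; substitute the real status call for your channel.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    interval = initial_interval_s
    while True:
        status = fetch_status()
        if status in ("succeeded", "failed"):
            return status  # terminal state reached
        if clock() + interval > deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s}s")
        sleep(interval)
        # Exponential backoff, capped so long jobs are still checked regularly.
        interval = min(interval * 2, max_interval_s)
```

Backing off to a 30 s cap keeps request volume low during the long middle of a job while still noticing completion promptly.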
ByteDance
Seedance 2.0 Fast via the official Volcengine API: faster and lower-cost than Seedance 2.0. Text, image and multimodal video generation, stable under high concurrency.
ByteDance
ByteDance DreamActor V2 motion transfer: animate any character image using the motion from a reference video. Supports multiple people, anime characters and pets.
Kling
Kling AI lip-sync video generation. Frame-level lip synchronization to audio for real people as well as 3D and 2D characters.
Kling
Kling text-to-speech synthesis with multi-language support, voice cloning, speed control and emotion styles.
Smooth cinematic transitions between a required first frame and a required last frame. Outputs 720p or 1080p with native audio. Official stable channel: pricier than V3.1-fast but reliable, suited to production use.
Google VEO 3.1 Lite via an OpenAI-style /v1/videos API. Cost-effective video generation with reference image support and 4s/6s/8s durations.
Google VEO 3.1 Lite 4K via an OpenAI-style /v1/videos API. 4K resolution, reference image support, 4s/6s/8s durations.
VEO 3.1 Fast HD (720p) video generation via GeminiGen. Fixed 8s duration, 16:9 aspect ratio, reference image support.
VEO 3.1 Fast Full HD (1080p) video generation via GeminiGen. Fixed 8s duration, 16:9 aspect ratio, reference image support.
xAI
Grok Video 3 via RunningHub rhart-video-g (an alias of grok-video-3 with the same upstream). Priced at $0.01/s for 6-30s outputs; supports both text-to-video (T2V) and image-to-video (I2V).
RunningHub
VEO 3.1 Fast 4K video generation via RunningHub. Requires a start-frame image; also supports start-and-end-frame video generation.
Pruna AI
Fast video generation in about 10 seconds. Text-, image- and audio-to-video, with a draft mode for 4x faster previews. Built-in audio generation, up to 1080p at 48 FPS.
Kling
Kling V3 video via the Stable-QN channel. Supports text-to-video and image-to-video, 3-15s, with optional audio.
Kling
Kling V3 Omni-Video via the Stable-QN channel. Multimodal input through image_list and video_list, with a keep-original-sound option.
Vidu
Fast video generation by Vidu Q3 Turbo. Supports text-to-video, image-to-video and start-end-frame-to-video, 1-16s, 540p to 1080p.
Kling
Generate videos with character motion control: provide a reference image and a motion video to animate the character.
High-performance start-end frame video. Provide a first frame and an optional last frame; the model interpolates the motion between them within seconds. Budget channel: cheaper than the official VEO channel but less stable.
xAI
Grok Video 3 via RunningHub rhart-video-g. 10-second video at $0.01/s. Supports text-to-video and image-to-video (up to 7 reference images).
xAI
Grok Video 3 via RunningHub rhart-video-g/image-to-video. Priced at $0.01/s for 6-30 second outputs. Supports both T2V (omit images) and I2V (1-7 reference images).
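The pricing and input rules in the Grok Video 3 entries reduce to a small pre-flight check. This sketch assumes only what the entry states ($0.01/s, 6-30 s, no images for T2V, 1-7 for I2V); the function name and return shape are hypothetical, and it does not call RunningHub.

```python
from decimal import Decimal
from typing import List, Tuple

PRICE_PER_SECOND_USD = Decimal("0.01")  # per the catalog entry
MIN_SECONDS, MAX_SECONDS = 6, 30
MAX_REFERENCE_IMAGES = 7


def plan_grok_video(seconds: int, reference_images: List[str]) -> Tuple[str, Decimal]:
    """Return (mode, estimated cost in USD) for a hypothetical Grok Video 3 request."""
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(f"duration must be {MIN_SECONDS}-{MAX_SECONDS}s, got {seconds}")
    if len(reference_images) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images allowed")
    # Omitting images selects text-to-video; 1-7 images select image-to-video.
    mode = "I2V" if reference_images else "T2V"
    # Decimal avoids binary floating-point rounding in money arithmetic.
    return mode, PRICE_PER_SECOND_USD * seconds
```

At $0.01/s, a 10-second clip costs $0.10 and a 30-second clip costs $0.30.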