ByteDance
Seedance 2.0 Real Person — full multimodal video generation that ACCEPTS real faces (unlike Ark-direct Seedance, which rejects them). Combine up to 3 reference images (character / scene / lighting), a driving video for camera and motion transfer, and an audio track for voice, plus a text prompt. 4-15s, 480p to 4k. With a driving video, billing follows the minimum-billing table (input + output seconds). Use only with consented subjects.
ByteDance
Seedance 2.0 Real Person Fast — faster, lower-cost real-person video. One reference image plus an optional driving video for motion/camera transfer, with a text prompt. Real-person supported. 4-15s, 480p to 4k. Use only with consented subjects.
Lightricks
LTX-2.3 unified text-to-video and image-to-video. Send a prompt for T2V, or add one reference image for I2V — fast, with 480p / 720p / 1080p output. Billed per second by resolution.
xAI
Grok Imagine Video 1.5 (Beta) — an alternative RunningHub channel for xAI Grok image-to-video. Turn one reference image into a cinematic clip with an optional prompt. Simple flat per-clip pricing by duration (5 / 8 / 10 / 12 / 15s), 480p or 720p.
xAI
xAI Grok Imagine Video 1.5 Preview — image-to-video with native synchronized audio. #1 on the Image-to-Video Arena, with lifelike motion, strong prompt adherence and consistent characters. 480p / 720p, 1-15s.
Omni Flash (Stable) — lower-cost, full-suite Gemini Omni video. Text / image (up to 7 refs) / video-to-video, plus reusable voices and consistent characters. 720p / 1080p / 4k, 4 / 6 / 8 / 10s, 16:9 or 9:16, optional seed.
ByteDance
ByteDance Seedance 2.0 cinematic video — direct official Volcengine API, stable and high-concurrency. Text, image and multimodal generation with friendly per-second pricing.
Gemini Omni Flash — unified video generator for both text-to-video and image-to-video (1 or 3 reference images). 720p / 1080p / 4k, 4 / 6 / 8 / 10s, optional 16:9 or 9:16 framing. One slug, two modes — drop in a prompt, optionally drop in images.
ByteDance
Film-grade edition of Seedance 2.0 — cinematic lighting, mood and camera motion, and it ACCEPTS real-person / realistic human reference images (unlike Ark-direct Seedance 2.0, which rejects real faces). Up to 4 reference images for identity-locked image-to-video — ideal for film-grade portrait and character work. Quality tier sits above the Ark variants; generation takes longer (typically 60-180s). Use only with consented subjects.
ByteDance
Seedance 2.0 Fast — direct official Volcengine API, faster and lower-cost. Text, image and multimodal video generation, stable under high concurrency.
ByteDance
ByteDance DreamActor V2 motion transfer. Drive any character image with the motion from a reference video. Supports multi-person scenes, anime characters and pets.
Kling
Kling AI lip-sync video generation. Frame-level lip synchronization with audio for real humans, 3D and 2D characters.
Kling
Kling text-to-speech synthesis with multi-language support, voice cloning, speed control and emotion styles.
Smooth cinematic transitions between a required first frame and required last frame. Outputs 720p or 1080p with native audio. Official stable channel — pricier than V3.1-fast but reliable, ideal for production.
VEO 3.1 Fast HD (720p) video generation. 8s fixed duration, 16:9 aspect ratio, reference image support.
VEO 3.1 Fast Full HD (1080p) video generation. 8s fixed duration, 16:9 aspect ratio, reference image support.
xAI
Grok Video 3 (alias of grok-video-3, same upstream). Per-second pricing $0.01/s, 6-30s output, T2V + I2V supported.
VEO 3.1 Fast 4K video generation. Requires a start-frame image. Also supports start-end frame video generation.
Pruna AI
Fast video generation in ~10 seconds. Text-, image- and audio-to-video, with a draft mode for 4x faster previews. Built-in audio generation, up to 1080p at 48 FPS.
Kling
Generate videos with character motion control. Provide a reference image and a motion video to create animated content.
Kling
Kling V3 Omni-Video with extended duration and keep-original-sound support for video editing. Flat $0.15/s billing.
Kling
Latest Kling V3 video generation. Supports 3-15s flexible duration, text-to-video and image-to-video with optional audio.
High-performance start-end frame video. Provide a first frame and an optional last frame; the model interpolates motion between them in seconds. Budget channel: cheaper than the official VEO, but less stable.
xAI
Grok Video 3. 10-second video at $0.01/s. Supports text-to-video and image-to-video (up to 7 reference images).
xAI
Grok Video 3. Per-second pricing $0.01/s, 6-30 second output. Both T2V (omit images) and I2V (1-7 reference images) supported.
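
For the entries above that quote a flat per-second rate, a clip's cost can be estimated by multiplying the rate by the output duration. The sketch below is illustrative only: it uses the $0.01/s rate and 6-30s range listed for Grok Video 3 and the $0.15/s rate listed for Kling V3 Omni-Video. The key `kling-v3-omni` is a hypothetical label, not a confirmed slug, and the function name is an assumption. Actual billing may include factors not stated in this listing.

```python
# Flat per-second rates quoted in the listing, stored in integer cents
# to avoid floating-point rounding when multiplying.
RATE_CENTS_PER_SECOND = {
    "grok-video-3": 1,     # $0.01/s, as listed for Grok Video 3
    "kling-v3-omni": 15,   # $0.15/s; hypothetical key for Kling V3 Omni-Video
}

# Duration limits in seconds, only where the listing states them.
DURATION_RANGE = {
    "grok-video-3": (6, 30),
}


def estimate_cost_usd(model: str, seconds: int) -> float:
    """Return the estimated cost in USD for a clip of the given length."""
    if model not in RATE_CENTS_PER_SECOND:
        raise KeyError(f"no per-second rate known for {model!r}")
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    if model in DURATION_RANGE:
        low, high = DURATION_RANGE[model]
        if not low <= seconds <= high:
            raise ValueError(f"{model} supports {low}-{high}s, got {seconds}s")
    return RATE_CENTS_PER_SECOND[model] * seconds / 100


if __name__ == "__main__":
    print(estimate_cost_usd("grok-video-3", 10))   # 0.1
    print(estimate_cost_usd("kling-v3-omni", 10))  # 1.5
```

Under these assumptions a 10-second Grok Video 3 clip comes to $0.10, while the same duration on Kling V3 Omni-Video comes to $1.50.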