OpenAI
Frontier model for complex professional work and agentic coding. Served via the Responses API with adjustable reasoning effort (low–xhigh), web search, and function calling.
Zhipu
Zhipu GLM-5.2 — a reasoning model with strong function-calling / tool-use, served via the OpenAI-compatible chat-completions endpoint.
OpenAI
OpenAI’s advanced reasoning model for agentic coding, knowledge work, scientific research, and complex multi-step task execution. Served via the Responses API with adjustable reasoning effort (low–xhigh), web search, and function calling.
API Models
Remove hardcoded subtitles and burned-in on-screen text from any video, leaving a clean background. Pick subtitle-only or any-text mode, quality or smaller-size output, and optionally target only chosen regions. Billed $0.01 per second of input video.
ByteDance
The fullest, most all-around capable Seedance 2.0 channel — full multimodal generation from text, a first / first+last frame, up to 9 reference images, reference video, reference audio, web search and native audio. Supports real people and asset creation. 480p / 720p / 1080p, 4-15s, per-second pricing.
ByteDance
The faster, cheaper variant of the fullest Seedance 2.0 channel — same full multimodal capability (text / first+last frame / up to 9 reference images / reference video / reference audio / web search / native audio), supports real people and asset creation. 480p / 720p, 4-15s, per-second pricing.
Google's gemini-3.1-flash-image (GA release). High-quality image generation and conversational editing at low latency. Priced by resolution: 512 $0.04, 1K/2K $0.06, 4K $0.10.
Google's gemini-3-pro-image (GA release). Top-quality, high-fidelity image generation and editing with advanced reasoning. Priced by resolution: 1K/2K $0.12, 4K $0.21.
ByteDance
Seedance 2.0 Real Person — full multimodal video generation that ACCEPTS real faces (unlike Ark-direct Seedance, which rejects them). Combine up to 3 reference images (character / scene / lighting), a driving video for camera and motion transfer, and an audio track for voice, plus a text prompt. 4-15s, 480p to 4k. With a driving video, billing follows the minimum-billing table (input + output seconds). Use only with consented subjects.
ByteDance
Seedance 2.0 Real Person Fast — faster, lower-cost real-person video. One reference image plus an optional driving video for motion/camera transfer, with a text prompt. Real-person supported. 4-15s, 480p to 4k. Use only with consented subjects.
Lightricks
LTX-2.3 unified text-to-video and image-to-video. Send a prompt for T2V, or add one reference image for I2V — fast, with 480p / 720p / 1080p output. Billed per second by resolution.
Minimax
Latest high-fidelity TTS by MiniMax (海螺). Predicts emotion and intonation from context for ultra-natural, expressive, personalized speech. Supports voice clone and voice design.
Minimax
Latest fast, cost-effective async TTS by MiniMax (海螺). Great quality-to-price for high-volume synthesis. Supports voice clone and voice design.
Minimax
High-fidelity TTS by MiniMax (海螺). Predicts emotion and intonation from context to produce ultra-natural, expressive, personalized speech — built for social, podcasts, audiobooks, news, education and digital humans. Supports voice clone and voice design.
xAI
Grok Imagine Video 1.5 (Beta) — an alternative RunningHub channel for xAI Grok image-to-video. Turn one reference image into a cinematic clip with an optional prompt. Simple flat per-clip pricing by duration (5 / 8 / 10 / 12 / 15s), 480p or 720p.
xAI
xAI Grok Imagine Video 1.5 Preview — image-to-video with native synchronized audio. #1 on the Image-to-Video Arena, with lifelike motion, strong prompt adherence and consistent characters. 480p / 720p, 1-15s.
Kling
Kling V3 image generation. Text-to-image and single-reference image-to-image, 1K/2K resolution. $0.05 per image.
Kling
Kling V3 Omni image generation. Multi-image reference & fusion, element consistency, single/series output, 1K/2K/4K — 1K/2K $0.05, 4K $0.10 per image.
Omni Flash (Stable) — lower-cost, full-suite Gemini Omni video. Text / image (up to 7 refs) / video-to-video, plus reusable voices and consistent characters. 720p / 1080p / 4k, 4 / 6 / 8 / 10s, 16:9 or 9:16, optional seed.
Anthropic
Anthropic's most capable model yet — built to autonomously carry long, complex work end to end. Ideal for big projects, building agents, and high-stakes scenarios demanding top quality and autonomy.
Anthropic
Latest Opus model with 1M context, 128K max output, and adaptive thinking — same tools and platform features as Opus 4.6.
Anthropic
Claude Opus 4.7 with extended thinking explicitly enabled for the most complex reasoning tasks.
GA release. Our most intelligent Flash model — consistent leadership on agentic execution, coding, and long-horizon tasks at scale.
xAI
Multimodal AI image generation by X platform. Generates high-quality images from text descriptions.
xAI
Upgraded multimodal AI model by X platform with stronger understanding and finer detail generation for higher precision images.
ByteDance
ByteDance Seedance 2.0 cinematic video — direct official Volcengine API, stable and high-concurrency. Text, image and multimodal generation with friendly per-second pricing.
Gemini Omni Flash — unified video generator for both text-to-video and image-to-video (1 or 3 reference images). 720p / 1080p / 4k, 4 / 6 / 8 / 10s, optional 16:9 or 9:16 framing. One slug, two modes — drop in a prompt, optionally drop in images.
ByteDance
Film-grade edition of Seedance 2.0 — cinematic lighting, mood and camera motion, and it ACCEPTS real-person / realistic human reference images (unlike Ark-direct Seedance 2.0, which rejects real faces). Up to 4 reference images for identity-locked image-to-video — ideal for film-grade portrait and character work. Quality tier sits above the Ark variants; generation takes longer (typically 60-180s). Use only with consented subjects.
ByteDance
Seedance 2.0 Fast — direct official Volcengine API, faster and lower-cost. Text, image and multimodal video generation, stable under high concurrency.
ByteDance
ByteDance DreamActor V2 motion transfer. Drive any character image with reference video motion, supporting multi-person, anime and pets.
Kling
Kling AI lip-sync video generation. Frame-level lip synchronization with audio for real humans, 3D and 2D characters.
Kling
Kling text-to-speech synthesis with multi-language support, voice cloning, speed control and emotion styles.
Doubao
Doubao Seedream 5.0 Lite via ByteDance Volcano Ark official API. Unified text-to-image and image-to-image (pass image for I2I, omit for T2I). 2K / 4K output, no watermark, PNG.
Smooth cinematic transitions between a required first frame and required last frame. Outputs 720p or 1080p with native audio. Official stable channel — pricier than V3.1-fast but reliable, ideal for production.
OpenAI
Cheapest OpenAI gpt-image-2 channel. Sync image-to-image edit API with multi-image fusion (up to 10), 1K/2K/4K output and quality control. Flat $0.03 per image.
OpenAI
OpenAI GPT Image 2 (beta channel). Text-to-image and multi-image editing (up to 16 reference images), aspect-ratio controlled output. Independent channel from the primary gpt-image-2 route for redundancy. Priced by resolution: $0.03 (1K) / $0.045 (2K) / $0.06 (4K).
OpenAI
OpenAI gpt-image-2. Text-to-image and multi-image editing (up to 10 reference images), aspect-ratio control, flat $0.03 per image at Medium/High quality.
Gemini 3 Pro Image via a budget channel. Professional asset creation with advanced reasoning and high-fidelity text rendering.
Gemini 3.1 Flash Image via a budget channel. High-performance image generation optimized for speed and high-volume use.
VEO 3.1 Fast HD (720p) video generation. 8s fixed duration, 16:9 aspect ratio, reference image support.
VEO 3.1 Fast Full HD (1080p) video generation. 8s fixed duration, 16:9 aspect ratio, reference image support.
VEO 3.1 Fast 4K video generation. Requires start frame image. Supports start-end frame video generation.
SparkPix
Sub 1 second text-to-image model built for production use cases. State-of-the-art speed, quality, and text rendering.
SparkPix
Sub 1 second multi-image editing model. Fast, affordable AI image editing with precise prompt adherence and multi-image support.
Pruna AI
Fast video generation in ~10 seconds. Text/image/audio-to-video with draft mode for 4x faster previews. Built-in audio generation, up to 1080p 48FPS.
Budget-friendly Gemini 3.1 Flash image generation. Text-to-image and image editing — 1K/2K $0.05, 4K $0.08 per image.
MiniMax
MiniMax M2.5 reaches or sets new SOTA in coding, tool calling, search, and office productivity tasks.
Most cost-effective multimodal model with fastest performance for high-frequency lightweight tasks.
Latest Pro model with enhanced reasoning and multimodal capabilities.
Kling
Create custom voice profiles from audio samples. Upload .mp3/.wav/.mp4/.mov (5-30s) or reference a video ID.
Kling
Generate videos with character motion control. Provide a reference image and motion video to create animated content.
Kling
Identify faces in a video and return a session ID and face IDs for Kling lip-sync video generation.
Anthropic
Latest Opus model with ultimate performance and reasoning capabilities.
Anthropic
Claude Opus 4.6 with extended thinking capability for the most complex reasoning tasks.
Anthropic
Latest Sonnet model with best performance and efficiency.
Anthropic
Claude Sonnet 4.6 with extended thinking capability for complex reasoning tasks.
Kling
AI image generation and editing by Kling (omni-image, model kling-image-o1). Supports 1K/2K resolution and multi-image input. $0.05 per image.
Kling
Generate sound effects from text descriptions. 3-10 second audio with natural quality.
Kling
Auto-generate sound effects and background music for videos. Supports ASMR mode for immersive content.
SeedVR
AI image upscaling and enhancement. Upscale images to 2K or 4K resolution with high quality detail preservation.
Kling
Text-to-speech with multiple voice options. Adjustable speed and multi-language support.
Kling
Kling V3 Omni-Video with extended duration and keep-original-sound support for video editing. Flat $0.15/s billing.
xAI
Trillion-parameter model with 16-Agent cluster collaboration, real-time data processing and self-evolution.
Fast image generation powered by Gemini 3.1 Flash. Supports text-to-image and image editing — 1K/2K $0.05, 4K $0.08 per image.
Minimax
High-definition async TTS by Minimax (海螺). Rich expressiveness with natural prosody. Supports voice clone and voice design.
Budget-friendly image editing powered by Gemini 3.1 Flash. Image-to-image only — 1K/2K $0.04, 4K $0.07 per image.
Minimax
Fast and cost-effective async TTS by Minimax (海螺). Supports voice clone, voice design, and pronunciation dictionaries.
xAI
Image generation and editing powered by Grok 4.2. Supports text-to-image creation and image editing with mask inpainting.
Fast and efficient multimodal model. Great for quick responses and simple tasks.
Advanced multimodal reasoning model with superior capabilities.
Gemini 3 Pro with extended thinking capability for complex reasoning tasks.
Kling
Latest Kling V3 video generation. Supports 3-15s flexible duration, text-to-video and image-to-video with optional audio.
ElevenLabs
Ultra low latency model in 32 languages. Ideal for real-time conversational use cases.
ElevenLabs
High quality, low latency model in 32 languages. Best for developer use cases where speed matters.
ElevenLabs
Most life-like, emotionally rich mode in 29 languages. Best for voice overs, audiobooks, post-production.
ElevenLabs
Most expressive model with 70+ languages. Supports audio tags like [laughs], [whispers] for emotional control.
ElevenLabs
Multi-speaker dialogue generation with natural conversation flow. Perfect for podcasts and audiobooks.
ElevenLabs
Extract speech from background noise, music and ambient sounds. Clean audio extraction.
ElevenLabs
Translate audio/video while preserving emotion, timing and tone. Automatic lip-sync.
Doubao
High quality Doubao Seedream 4.5 image generation. Supports text-to-image and image editing with 2K/4K resolution.
High-performance start-end frame video. Provide first + optional last frame, the model interpolates motion between them in seconds. Budget channel — cheaper than the official VEO, less stable.
Anthropic
Latest Opus model with enhanced capabilities and improved reasoning.
Anthropic
Claude Opus 4.5 with extended thinking capability for the most complex reasoning tasks.
xAI
Grok Video 3. 10-second video at $0.01/s. Supports text-to-video and image-to-video (up to 7 reference images).
xAI
Grok Video 3. Per-second pricing $0.01/s, 6 / 10 / 15 second output. Both T2V (omit images) and I2V (1-7 reference images) supported.
Anthropic
Fast and affordable model for lightweight tasks. Best for simple queries and quick responses.
Anthropic
Claude Haiku 4.5 with extended thinking capability for complex reasoning tasks.
Anthropic
Latest Sonnet model with improved performance and efficiency.
Anthropic
Claude Sonnet 4.5 with extended thinking capability for complex reasoning tasks.
Premium image generation powered by Gemini 3 Pro. 99% success rate. Best quality and reliability.
High quality image generation powered by Gemini 3 Pro. 97% success rate. Supports text-to-image and image editing.
Powerful multimodal model for complex tasks with excellent performance.
Gemini 2.5 Pro with extended thinking capability for complex reasoning.
Fast and cost-effective multimodal model. Best balance of speed and quality.
Gemini 2.5 Flash with extended thinking capability for reasoning tasks.
Fast image generation powered by Gemini 2.5 Flash. Supports text-to-image and image editing with natural language.
Lightweight and ultra-fast model. Best for simple tasks and high volume.
Anthropic
Most capable model with superior reasoning and analysis capabilities.
Anthropic
Claude Opus 4 with extended thinking capability for the most complex reasoning tasks.
Anthropic
Balanced model with excellent performance and cost efficiency. Great for most tasks.
Anthropic
Claude Sonnet 4 with extended thinking capability for complex reasoning tasks.
OpenAI
Small embedding model, efficient and cost-effective for most use cases.
OpenAI
Large embedding model for higher accuracy and flexible dimensions.