Video asset URL. Supported formats: MP4, QuickTime (MOV), Matroska (MKV). Maximum file size: 500MB.
Target pure vocal audio URL: the clean vocal track used to drive the video's lip movements. Supported formats: MPEG, WAV, X-WAV, AAC, MP4, OGG. Maximum file size: 10MB.
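The upload limits above can be checked on the client before a job is submitted. The following Python sketch is illustrative only. The full MIME type strings (such as video/x-matroska) are inferred from the subtypes the form lists, and it assumes "MB" means 1024 * 1024 bytes, which the form does not specify.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: the form's "MB" means mebibytes

# MIME types inferred from the subtypes listed on the upload form (assumption).
VIDEO_TYPES = {"video/mp4", "video/quicktime", "video/x-matroska"}
AUDIO_TYPES = {"audio/mpeg", "audio/wav", "audio/x-wav", "audio/aac", "audio/mp4", "audio/ogg"}
VIDEO_MAX_BYTES = 500 * MB
AUDIO_MAX_BYTES = 10 * MB


@dataclass
class InputFile:
    mime_type: str
    size_bytes: int


def _check(kind: str, f: InputFile, allowed: set, max_bytes: int) -> list:
    """Collect problems for one file against its allowed types and size cap."""
    problems = []
    if f.mime_type not in allowed:
        problems.append(f"{kind}: unsupported type {f.mime_type}")
    if f.size_bytes > max_bytes:
        problems.append(f"{kind}: {f.size_bytes} bytes exceeds {max_bytes // MB}MB limit")
    return problems


def check_inputs(video: InputFile, audio: InputFile) -> list:
    """Return every problem found; an empty list means both files pass."""
    return (_check("video", video, VIDEO_TYPES, VIDEO_MAX_BYTES)
            + _check("audio", audio, AUDIO_TYPES, AUDIO_MAX_BYTES))
```

Rejecting an oversized or unsupported file locally avoids spending an upload and a queued task on a job the service would refuse anyway.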
Service identifier.
Vocal separation: enable to separate the vocals and suppress background noise.
Scene segmentation and speaker identification: whether to enable them. Supported only in Basic mode.
Looping (Lite mode): whether to loop the video when the audio is longer than the video.
Reverse looping (Lite mode): whether to loop the video backward. Requires align_audio to be set to true.
Template start time (Lite mode): start time of the template video, in seconds.
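The two dependencies stated in the parameter list (scene segmentation is Basic-mode only, and reverse looping requires align_audio) can be enforced before submission. In this Python sketch only align_audio comes from the parameter list; mode, scene_detection, align_audio_reverse, and template_start_seconds are hypothetical names, and the non-negative start time check is an added assumption.

```python
from dataclasses import dataclass


@dataclass
class LipSyncOptions:
    mode: str                            # hypothetical: "lite" or "basic"
    scene_detection: bool = False        # hypothetical name: scene segmentation + speaker ID
    align_audio: bool = False            # loop the video when the audio is longer
    align_audio_reverse: bool = False    # hypothetical name: loop the video backward
    template_start_seconds: float = 0.0  # hypothetical name: template video start time


def validate(opts: LipSyncOptions) -> list:
    """Return rule violations; an empty list means the options are consistent."""
    errors = []
    if opts.mode not in ("lite", "basic"):
        errors.append(f"unknown mode: {opts.mode}")
    # Stated rule: scene segmentation is supported only in Basic mode.
    if opts.scene_detection and opts.mode != "basic":
        errors.append("scene segmentation requires Basic mode")
    # Stated rule: reverse looping requires align_audio to be true.
    if opts.align_audio_reverse and not opts.align_audio:
        errors.append("reverse looping requires align_audio=True")
    # Added sanity check (assumption): a start offset cannot be negative.
    if opts.template_start_seconds < 0:
        errors.append("template start time must be >= 0")
    return errors
```

The looping and start-time options are described as supported in Lite mode, but the listing does not say "only", so the sketch does not reject them in Basic mode.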
Volcengine Video-to-Video Lip Sync API: AI Lip Sync & Video Dubbing API
Integrate the Volcengine Video-to-Video Lip Sync API for AI lip sync and video dubbing. Its high-accuracy mouth sync engine supports multi-language video translation at scale. Available on Kie.ai.

Key Features of the Volcengine Lip Sync API
Frame-Accurate Lip Synchronization
Where simpler lip sync methods produce only loose alignment, Volcengine's deep learning model matches mouth shapes to the audio frame by frame, preserving subtle articulations such as the "p" and "b" plosives. The result is natural-looking speech that avoids the uncanny valley.

Multi-Language Video Dubbing Pipeline
Combine the Lip Sync API with Volcengine's Video Translation API to dub videos automatically into 20+ languages. The pipeline detects the original speech, translates it, synthesizes a voice track in the target language, and syncs the lip movements to it, all in a single async workflow with no manual intervention.
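The four pipeline stages can be sketched as one async workflow: speech is transcribed once, then each target language is translated, voiced, and lip-synced concurrently. This is a Python sketch under stated assumptions. Every stage is a hypothetical async callable standing in for a real service call; it is not the Volcengine Video Translation API itself.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class DubbingStages:
    # Hypothetical stand-ins for the real service calls.
    transcribe: Callable[[str], Awaitable[str]]       # video URL -> source transcript
    translate: Callable[[str, str], Awaitable[str]]   # (text, language) -> translated text
    synthesize: Callable[[str, str], Awaitable[str]]  # (text, language) -> audio URL
    lip_sync: Callable[[str, str], Awaitable[str]]    # (video URL, audio URL) -> output video URL


async def dub_video(video_url: str, languages: list, stages: DubbingStages) -> dict:
    """Return a mapping of language code -> lip-synced output video URL."""
    # Detect the original speech once and reuse it for every target language.
    transcript = await stages.transcribe(video_url)

    async def one_language(lang: str) -> tuple:
        text = await stages.translate(transcript, lang)
        audio_url = await stages.synthesize(text, lang)
        return lang, await stages.lip_sync(video_url, audio_url)

    # Languages are independent, so run them concurrently.
    results = await asyncio.gather(*(one_language(lang) for lang in languages))
    return dict(results)
```

Running languages concurrently means total wall-clock time tracks the slowest language rather than the sum of all of them.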

Native Ecosystem Integration
Built directly into Volcengine's Intelligent Vision Service, the API works out of the box with ByteDance's infrastructure: automatic transcoding, CDN delivery, watermark removal, and video moderation. No need to stitch together multiple services — one API key gives you the full video generation stack.

High-Throughput Async Task Processing
The API follows Volcengine's asynchronous CVSubmitTask/CVGetResult pattern: you submit a job, receive a task ID, and fetch the result later, so hundreds of lip sync jobs can be submitted at once without blocking. Each job is processed independently, with progress tracking, callback URL delivery, and automatic retry on failure, which suits production-scale content pipelines.
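A submit-then-poll loop with capped exponential backoff illustrates the pattern. This Python sketch makes several assumptions: submit and get_result are hypothetical callables wrapping whatever client you use, and the pending status strings in_queue and generating should be checked against the actual CVGetResult response schema.

```python
import time
from typing import Callable

# Assumed pending statuses; any other status is treated as final.
PENDING = {"in_queue", "generating"}


def submit_and_wait(submit: Callable[[dict], str],
                    get_result: Callable[[str], dict],
                    request: dict,
                    timeout_s: float = 600.0,
                    first_delay_s: float = 2.0,
                    max_delay_s: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Submit one task, then poll until it leaves a pending state or times out."""
    task_id = submit(request)
    deadline = time.monotonic() + timeout_s
    delay = first_delay_s
    while True:
        result = get_result(task_id)
        if result.get("status") not in PENDING:
            return result  # done or failed: let the caller inspect it
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"task {task_id} still pending after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # back off, but never past the cap
```

For hundreds of jobs, submit them all first and then poll (or wait for callbacks), rather than waiting on each job in turn.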

How to Use the Volcengine Video-to-Video Lip Sync API
Get started in three steps:
Step 1: Sign Up for Volcengine Lip Sync API Access on Kie.ai
Sign up on Kie.ai and generate a secure Volcengine Lip Sync API key. The key authenticates your requests and gives you access to the full Volcengine Video-to-Video Lip Sync feature set, including multi-language dubbing and batch processing.
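Keeping the key out of source code is a sensible default. This Python sketch reads it from an environment variable; the variable name KIE_API_KEY is hypothetical, and sending the key as a Bearer token in the Authorization header is an assumption to confirm against the Kie.ai documentation.

```python
import os


def auth_headers(env_var: str = "KIE_API_KEY") -> dict:
    """Build request headers from an API key stored in the environment."""
    api_key = os.environ.get(env_var)  # hypothetical variable name
    if not api_key:
        raise RuntimeError(f"set {env_var} to your Kie.ai API key")
    # Assumption: the key is sent as a Bearer token.
    return {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
```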
Step 2: Submit a Lip Sync Task to the API
Use your API key to send a POST request containing the input video URL, the target audio URL, and any configuration parameters. The Volcengine Lip Sync API processes the request asynchronously via CVSubmitTask, running frame-by-frame mouth movement analysis and audio-visual alignment; jobs complete within minutes.
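A submission request might be assembled as follows. This Python sketch uses only the standard library and builds the request without sending it, so it can be inspected first. ENDPOINT is a placeholder, and the body field names (video_url, audio_url, callback_url) are hypothetical; take the real ones from the Kie.ai API reference.

```python
import json
import urllib.request
from typing import Optional

# Placeholder: substitute the endpoint from the Kie.ai documentation.
ENDPOINT = "https://api.example.com/v1/lip-sync/tasks"


def build_submit_request(api_key: str, video_url: str, audio_url: str,
                         callback_url: Optional[str] = None,
                         **options) -> urllib.request.Request:
    """Build (but do not send) a POST request for one lip sync task."""
    # Hypothetical field names; extra options (e.g. align_audio=True) pass through.
    body = {"video_url": video_url, "audio_url": audio_url, **options}
    if callback_url:
        body["callback_url"] = callback_url
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen` inside a `with` block and read the task details from the response with `json.load`.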
Step 3: Retrieve and Deliver the Synced Video
After processing, the API returns the task status and the output video URL. If you provide a callback URL, the API also delivers the synced video result to your application automatically, so it can flow straight into your content pipeline.
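On the receiving side, a small HTTP handler can accept the callback. This Python sketch assumes a JSON payload with task_id, status, and video_url fields; those names are hypothetical and should be mapped to the real callback schema.

```python
import json
from dataclasses import dataclass
from http.server import BaseHTTPRequestHandler
from typing import Optional


@dataclass
class LipSyncResult:
    task_id: str
    status: str
    video_url: Optional[str]


def parse_callback(raw_body: bytes) -> LipSyncResult:
    """Parse a callback body; raises ValueError on malformed input."""
    payload = json.loads(raw_body)  # json.JSONDecodeError is a ValueError
    # Hypothetical field names.
    if not isinstance(payload, dict) or "task_id" not in payload or "status" not in payload:
        raise ValueError("callback payload missing task_id or status")
    return LipSyncResult(str(payload["task_id"]), str(payload["status"]), payload.get("video_url"))


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            result = parse_callback(self.rfile.read(length))
        except ValueError:
            self.send_response(400)  # reject malformed deliveries
            self.end_headers()
            return
        # Hand the result off to your pipeline here.
        print(f"task {result.task_id}: {result.status} {result.video_url}")
        self.send_response(200)
        self.end_headers()
```

For local testing, `HTTPServer(("", 8080), CallbackHandler).serve_forever()` (with `HTTPServer` also imported from `http.server`) serves the handler; a production receiver would also verify that the request really came from the service.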
Popular Use Cases for Video-to-Video Lip Sync API
E-Learning & Course Localization
Translate and lip-sync educational video courses for global audiences. A university in Singapore used the API to dub 200+ hours of lecture content into Mandarin, Hindi, and Bahasa Indonesia — reducing localization costs by 70% while maintaining instructor authenticity through accurate lip sync.
Social Media Content Repurposing
Creators on TikTok and YouTube Shorts can turn a single source video into lip-synced versions in multiple languages. A travel vlogger repurposed one English video into 12 languages using Volcengine's pipeline and grew their international subscriber base by 340% in 3 months, without reshooting a single frame.
Corporate Video & Global Marketing
Enterprises use the Lip Sync API to quickly adapt product demos, CEO announcements, and training materials for regional markets. A Fortune 500 tech company localized 500 product tutorial videos into Japanese, Korean, and Thai, achieving lip sync accuracy that passed internal QA on the first review — no reshoots needed.
Film & TV Dubbing Automation
Post-production studios integrate the API to automate the initial lip sync pass for foreign-language dubbing. Volcengine's engine handles the rough first sync, cutting manual VFX artist hours by 60%: sync artists only fine-tune emotionally extreme moments, shortening the dubbing pipeline from weeks to days.
Frequently Asked Questions About Volcengine Video-to-Video Lip Sync API
Find answers to common questions about our service.