Input

prompt *

The text prompt used for video generation

image_url *

Supported formats: JPEG, PNG, WEBP. Maximum file size: 10MB.

URL of the input image. If the input image does not match the chosen aspect ratio, it is resized and center cropped
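To preview how an image might be framed before uploading, the resize and center crop can be approximated locally. This is a minimal sketch, assuming the common "scale to cover, then crop the middle" approach; the page does not document the service's exact scaling or rounding, and the function name and target dimensions are illustrative only.

```python
def resize_and_center_crop_box(src_w: int, src_h: int, target_w: int, target_h: int):
    """Return (scaled_w, scaled_h, left, top, right, bottom).

    Assumption: the image is scaled uniformly until it covers the target
    size, then the centered target-sized window is kept.
    """
    if min(src_w, src_h, target_w, target_h) <= 0:
        raise ValueError("all dimensions must be positive")
    # Use the larger scale factor so both sides are at least the target size.
    scale = max(target_w / src_w, target_h / src_h)
    scaled_w = max(target_w, round(src_w * scale))
    scaled_h = max(target_h, round(src_h * scale))
    # Center the crop window inside the scaled image.
    left = (scaled_w - target_w) // 2
    top = (scaled_h - target_h) // 2
    return scaled_w, scaled_h, left, top, left + target_w, top + target_h
```

For a square 1000x1000 source and a 1280x720 target, the image is scaled to 1280x1280 and 280 pixels are trimmed from the top and from the bottom.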

audio_url *

Supported formats: MP3, WAV, OGG, M4A, FLAC, AAC, X-MS-WMA, MPEG. Maximum file size: 10MB.

The URL of the audio file

num_frames

Number of frames to generate. Must be between 40 and 120 and a multiple of 4
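The frame-count constraint can be checked on the client before a request is sent. The sketch below is illustrative: the helper names are hypothetical, and rounding ties upward in the second helper is a choice made here, not documented behavior.

```python
def validate_num_frames(num_frames: int) -> int:
    """Return num_frames unchanged if it meets the documented constraints."""
    if not 40 <= num_frames <= 120:
        raise ValueError(f"num_frames must be between 40 and 120, got {num_frames}")
    if num_frames % 4 != 0:
        raise ValueError(f"num_frames must be a multiple of 4, got {num_frames}")
    return num_frames


def nearest_valid_num_frames(requested: int) -> int:
    """Clamp to [40, 120], then round to the nearest multiple of 4 (ties go up)."""
    clamped = max(40, min(120, requested))
    # 40 and 120 are multiples of 4, so the rounded value stays in range.
    return (clamped + 2) // 4 * 4
```

For example, a requested count of 81 would be adjusted to 80, and anything above 120 would be capped at 120.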

frames_per_second

Frames per second of the generated video. Must be between 4 and 60. When using interpolation and adjust_fps_for_interpolation is set to true (the default), the final FPS will be multiplied by the number of interpolated frames plus one. For example, if the generated frames per second is 16 and the number of interpolated frames is 1, the final frames per second will be 32. If adjust_fps_for_interpolation is set to false, this value will be used as-is
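The interpolation rule reduces to a single multiplication. This is a minimal sketch of that arithmetic; num_interpolated_frames is a hypothetical name, since the page describes the interpolated-frame count without naming its parameter.

```python
def final_fps(frames_per_second: int,
              num_interpolated_frames: int,
              adjust_fps_for_interpolation: bool = True) -> int:
    """Compute the output frame rate as described in the parameter text."""
    if not 4 <= frames_per_second <= 60:
        raise ValueError("frames_per_second must be between 4 and 60")
    if num_interpolated_frames < 0:
        raise ValueError("num_interpolated_frames cannot be negative")
    if adjust_fps_for_interpolation:
        # Each generated frame is followed by the interpolated frames,
        # so the frame rate scales by (interpolated frames + 1).
        return frames_per_second * (num_interpolated_frames + 1)
    # With adjustment disabled, the requested value is used as-is.
    return frames_per_second
```

With the values from the example in the text, final_fps(16, 1) gives 32, and final_fps(16, 1, adjust_fps_for_interpolation=False) gives 16.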

resolution

Resolution of the generated video (480p, 580p, or 720p)

negative_prompt

Negative prompt for video generation

seed

Random seed for reproducibility. If None, a random seed is chosen

num_inference_steps

Number of inference steps for sampling. Higher values give better quality but take longer

guidance_scale

Classifier-free guidance scale. Higher values give better adherence to the prompt but may decrease quality

shift

Shift value for the video. Must be between 1.0 and 10.0

enable_safety_checker

If set to true, input data will be checked for safety before processing

nsfw_checker

A configurable parameter. Defaults to true in the Playground.
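The input fields above can be gathered into a single request body. The sketch below is an assumption about how a client might do this: the field names come from this page, but the flat JSON shape, the build_input helper, and the decision to omit unset fields are illustrative. It checks only the required fields, the resolution values, and the shift range; other range checks are left out for brevity.

```python
from typing import Any

REQUIRED_FIELDS = ("prompt", "image_url", "audio_url")
OPTIONAL_FIELDS = {
    "num_frames", "frames_per_second", "resolution", "negative_prompt",
    "seed", "num_inference_steps", "guidance_scale", "shift",
    "enable_safety_checker", "nsfw_checker",
}
ALLOWED_RESOLUTIONS = {"480p", "580p", "720p"}


def build_input(prompt: str, image_url: str, audio_url: str, **optional: Any) -> dict:
    """Assemble a request body from the documented input fields."""
    payload = {"prompt": prompt, "image_url": image_url, "audio_url": audio_url}
    for name in REQUIRED_FIELDS:
        if not payload[name]:
            raise ValueError(f"{name} is required")
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    resolution = optional.get("resolution")
    if resolution is not None and resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError("resolution must be 480p, 580p, or 720p")
    shift = optional.get("shift")
    if shift is not None and not 1.0 <= shift <= 10.0:
        raise ValueError("shift must be between 1.0 and 10.0")
    # Leave out unset fields so the service can apply its own defaults.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Passing seed=None leaves the seed out of the body, which matches the documented behavior that a random seed is chosen when none is given.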

Output

output type: video

README

Wan 2.2 A14B API Speech to Video: Transform Audio into Stunning Videos

Elevate your digital storytelling with Wan 2.2 A14B Turbo API Speech to Video. This AI model turns static images and audio clips into dynamic, expressive videos for creators, marketers, and educators. It is available now on Kie.ai.

What is Wan 2.2 A14B API Speech to Video?

Wan 2.2 A14B API is an advanced open-source AI model designed for speech-to-video generation. Here's a breakdown in three key points:

Audio-Driven Animation:

It synchronizes audio inputs with visual elements, creating lifelike movements from a single image and sound clip.

High-Resolution Output:

Supports 480p, 580p, and 720p resolutions, ensuring crisp, professional-grade videos for various applications.

MoE Architecture Power:

Built on a Mixture-of-Experts framework with 14 billion parameters, delivering efficient and high-fidelity results.

Key Features of Wan 2.2 A14B Speech to Video API

Echoes of Innovation: Audio-to-Video Mastery

Wan 2.2 A14B Speech to Video API transforms audio clips and static images into realistic animations with precise gestures and expressions. With advanced synchronization, it captures emotional nuances for immersive storytelling, making it ideal for cinematic content creation.

Waves of Clarity: High-Resolution Rendering

Produce crisp videos at 480P to 720P with Wan 2.2 API, supporting 24 fps for smooth playback. This ensures professional quality on standard hardware, perfect for high-definition applications in marketing and education.

Symphony of Speed: Ultra-Fast Processing

Wan 2.2 A14B API accelerates video generation with optimized inference, completing 720P clips in 20-48 seconds. Its MoE architecture boosts efficiency, allowing rapid iterations for creators under tight deadlines.

Harmony in Motion: Advanced Lip-Sync Tech

Achieve flawless audio-visual sync in Wan 2.2 A14B Turbo API Speech to Video, mapping phonemes to natural mouth and facial movements. It handles diverse accents and emotions, delivering lifelike performances across languages.

Rhythm of Customization: LoRA Integration

Customize outputs with LoRA adapters in Wan 2.2 API, enabling style-specific fine-tuning with low VRAM needs. This fosters creativity for branded or experimental videos without full model retraining.

Melody of Efficiency: MoE Architecture

Wan 2.2 A14B Speech to Video API uses a 14B parameter MoE framework for efficient generation, supporting text-to-video and image-to-video modes. It maintains frame consistency and adds bilingual overlays for scalable, resource-smart applications.

How to Use Wan 2.2 A14B API Speech to Video

Get started in a few simple steps:

Sign Up and Access API:

Prepare Inputs:

Upload a static image and audio clip, ensuring compatibility with supported formats.

Generate Video:

Use the API endpoint to submit your request, specifying resolution and parameters.

Download and Refine:

Retrieve the output video and iterate with LoRAs if needed for customization.
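The steps above can be sketched as a single HTTP request. Everything specific here is an assumption: this page does not give the endpoint URL, the authentication scheme, or the response format, so the URL below is a placeholder, Bearer authentication is a guess, and the request is only constructed, not sent. Consult the provider's API reference for the real values.

```python
import json
import urllib.request

# Hypothetical placeholder; the real endpoint is not given on this page.
API_URL = "https://api.example.com/v1/wan-2-2-speech-to-video"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a generation job."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            # Assumption: the API key is sent as a Bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_request(
    "YOUR_API_KEY",
    {
        "prompt": "A presenter speaking to the camera",
        "image_url": "https://example.com/portrait.png",
        "audio_url": "https://example.com/speech.mp3",
        "resolution": "720p",
    },
)
# Sending would be urllib.request.urlopen(request); it is omitted here
# because the endpoint above is not real.
```

Building the request separately from sending it makes the body and headers easy to inspect before any credits are spent.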

Frequently Asked Questions

Find answers to common questions about our service.

What is Wan 2.2 A14B Turbo API?

What are the model variants in Wan 2.2 14B API?

How does Turbo mode improve video generation?

Do I need local GPUs to run Wan 2.2 14B API?

Can I test Wan AI API for free?

What resolution and frame rate does Wan 2.2 A14B Turbo API support?

How is Wan 2.2 A14B different from Wan 2.1?