Alibaba has introduced Wan2.1-VACE, an all-in-one AI model designed to revolutionize the video creation industry. It combines multiple video processing functions into a single model, streamlining workflows and increasing efficiency. As part of Alibaba’s Wan2.1 series, VACE is the first open-source model to provide a unified solution for a wide range of video generation and editing tasks. It supports multi-modal inputs and offers comprehensive editing capabilities, including image referencing, video repainting, selective editing of video areas, and spatio-temporal extension.
With this advanced tool, users can generate videos containing specific interacting subjects based on image samples and bring static images to life by adding natural movement effects. They can also enjoy advanced video repainting functions such as pose transfer, motion control, depth control, and recolorization.
The model also supports adding, modifying, or deleting content in selected areas of a video without affecting the surrounding regions, and it can extend video boundaries while intelligently filling in content to enrich the visual experience.
As an all-in-one AI model, Wan2.1-VACE delivers exceptional versatility, enabling users to combine multiple functions seamlessly and unlock new creative possibilities. Users can turn a static image into a video while controlling the movement of objects by specifying a motion trajectory. They can seamlessly replace characters or objects with specified references, animate referenced characters, control poses, and expand a vertical image horizontally to create a horizontal video while adding new elements through referencing.
Wan2.1-VACE is built on several innovative technologies that account for the needs of different video editing tasks. Its unified interface, the Video Condition Unit (VCU), supports unified processing of multimodal inputs such as text, images, video, and masks.
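The announcement does not spell out the VCU's internal structure, but the general idea of a unified condition interface can be pictured as a single container that carries every modality, so that all tasks reduce to the same input shape. The sketch below is purely illustrative; the class name and fields are assumptions, not the actual Wan2.1-VACE API:

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class VideoConditionUnit:
    """Hypothetical sketch of a VCU-style input bundle.

    One container for all modalities means every task (generation,
    repainting, region editing, extension) can be expressed through
    the same interface, which is what "unified processing" suggests.
    """
    prompt: str                          # text condition
    frames: Optional[np.ndarray] = None  # (T, H, W, C) context/reference video
    images: Optional[np.ndarray] = None  # (N, H, W, C) reference images
    masks: Optional[np.ndarray] = None   # (T, H, W), 1 = region to edit/fill

# A region-editing task and a pure text-to-video task share one entry point:
edit_task = VideoConditionUnit(
    prompt="replace the car with a bicycle",
    frames=np.zeros((16, 480, 832, 3), np.uint8),
    masks=np.ones((16, 480, 832), np.uint8),
)
t2v_task = VideoConditionUnit(prompt="a cat surfing at sunset")
```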
The model also employs a Context Adapter structure that injects various task concepts using formalized representations of temporal and spatial dimensions, an innovative design that enables it to flexibly handle a wide range of video synthesis tasks.
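Again as a rough illustration only: adapter-style injection in the literature typically means projecting task or context features and adding them into a backbone's hidden states. The minimal PyTorch sketch below shows that general pattern under assumed names and shapes; it is not Wan2.1-VACE's actual implementation:

```python
import torch
import torch.nn as nn

class ContextAdapterBlock(nn.Module):
    """Hypothetical sketch of adapter-style task-concept injection.

    Context features (e.g. derived from a VCU) are projected into the
    backbone's hidden dimension and added with a learned gate, so one
    backbone can be steered toward many editing tasks.
    """
    def __init__(self, hidden_dim: int, context_dim: int):
        super().__init__()
        self.proj = nn.Linear(context_dim, hidden_dim)
        self.gate = nn.Parameter(torch.zeros(1))  # zero-init: starts as a no-op

    def forward(self, hidden: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # hidden:  (B, L, hidden_dim) spatio-temporal tokens
        # context: (B, L, context_dim) formalized task representation
        return hidden + self.gate * self.proj(context)

# Usage with made-up dimensions:
block = ContextAdapterBlock(hidden_dim=1024, context_dim=256)
h = torch.randn(2, 77, 1024)
c = torch.randn(2, 77, 256)
out = block(h, c)  # same shape as h
```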
Thanks to these architectural advances, Wan2.1-VACE can be applied widely: rapid production of short videos for social media, content creation for advertising and marketing, post-production and special-effects work in film and television, and the generation of educational and training videos.
Training video foundation models requires immense computing resources and vast amounts of high-quality training data. Open access helps lower the barrier for more businesses to leverage AI, enabling them to create high-quality visual content tailored to their needs, quickly and cost-effectively.
Alibaba is open-sourcing Wan2.1-VACE in two versions: a 14-billion-parameter (14B) model and a 1.3-billion-parameter (1.3B) model. Both are available to download for free on Hugging Face and GitHub, as well as on ModelScope, Alibaba Cloud’s open-source community.
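For readers who want to try the weights, a minimal sketch of fetching them from Hugging Face is shown below. The repository ID is an assumption based on the Wan-AI naming used for the Wan2.1 series; verify the exact ID on huggingface.co before running:

```python
# Minimal download sketch using the huggingface_hub client.
# Assumed repo ID "Wan-AI/Wan2.1-VACE-1.3B"; confirm on Hugging Face.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="Wan-AI/Wan2.1-VACE-1.3B")
print(f"Model files downloaded to: {local_dir}")
```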
As one of the earliest major global tech companies to open-source its self-developed large-scale AI models, Alibaba open-sourced four Wan2.1 models in February 2025 and, last month, a video generation model that supports video creation with start and end frames. To date, the models have attracted over 3.3 million downloads on Hugging Face and ModelScope.