Wan 2.7

Wan 2.7 AI Video Generator

Wan 2.7 is Alibaba's Tongyi Wanxiang video model featuring Thinking Mode — a built-in chain-of-thought reasoning layer that plans composition, subject placement, and motion logic before generating a single pixel. It supports four generation modes: text-to-video, image-to-video with first-and-last keyframe control, reference-to-video for subject consistency, and instruction-based video editing — all at up to 1080p with native audio.

Thinking Mode: chain-of-thought reasoning plans composition, subject placement, and motion logic before generation starts
Four generation modes in one model: T2V, I2V with keyframe control, R2V for subject consistency, and Video Edit
First-and-last frame keyframe control: define the exact visual start and end of every transition
Multi-reference support: up to 9 reference images and video references for consistent character and object identity

Wan 2.7

Released April 2026 by Alibaba's Tongyi Lab. Thinking Mode pre-processes your prompt through chain-of-thought reasoning for more coherent compositions. Use T2V for prompts, I2V for keyframe control, R2V for subject consistency, and Video Edit for instruction-based modification.

Wan 2.7 Thinking Mode preview

Wan 2.7 reasons through your prompt before generating, which produces more accurate compositions for complex multi-element scenes.

Wan 2.7 AI video generator features

Thinking Mode reasoning

Wan 2.7's Thinking Mode runs a chain-of-thought reasoning layer before generation begins. The model parses your prompt, plans subject placement, motion direction, camera composition, and audio cues, then verifies the plan is coherent before generating any video frames. This produces significantly more accurate compositions, fewer spatial artifacts, and stronger adherence to complex multi-subject prompts that simpler models distort.

Four unified generation modes

Wan 2.7 covers text-to-video for pure prompt-driven generation with Thinking Mode, image-to-video with first-and-last keyframe control for precise scene transitions, reference-to-video (R2V) for multi-reference subject and object consistency, and video-edit for instruction-based modification of existing clips. All four modes share the same Wan 2.7 API infrastructure and unified credit system.

First-and-last keyframe control

Upload a start frame image, an end frame image, or both to precisely define the visual boundaries of a generated clip. Wan 2.7 interpolates coherent motion between the specified frames, producing a controlled transition that honors the composition, color, and subject positions in both images. This makes it ideal for product reveals, environment transformations, and scene-to-scene cuts.

Reference-to-video subject consistency

Upload image or video references as inputs to the R2V mode. Wan 2.7 extracts character appearance, clothing color, material texture, and object identity from the references and applies them consistently throughout the generated video. Both image references and video references are supported, enabling character and product consistency across different scenes and camera angles.

Instruction-based video editing

The Video Edit mode accepts an existing source video and a natural-language instruction describing the target change. Wan 2.7 applies edits such as style transfer, color changes, object replacement, and background modification while preserving the original motion structure and temporal consistency. Add up to 5 reference images to specify the target visual appearance for the edited output.

How to use Wan 2.7

01

Select the generation mode: T2V for prompts, I2V for keyframe control, R2V for reference consistency, or Video Edit for modification

02

Write a detailed prompt — Thinking Mode will reason through it before generation, so complex multi-element prompts work particularly well

03

For I2V, upload a first frame image, last frame image, or both to set the exact visual start and end points of the clip

04

For R2V, upload reference images and videos to establish consistent subject and object appearance throughout the generated video

05

Set resolution (720p or 1080p), aspect ratio (16:9, 9:16, or 1:1), duration, and frame rate before submitting
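
The settings named in the steps above can be checked before submitting. The following is a minimal sketch only: the class name, field names, and mode labels are hypothetical and do not come from any published Wan 2.7 API; only the allowed resolutions and aspect ratios are taken from the text.

```python
from __future__ import annotations

from dataclasses import dataclass

# Allowed values come from the steps above; every identifier here is hypothetical.
RESOLUTIONS = {"720p", "1080p"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
MODES = {"t2v", "i2v", "r2v", "video_edit"}


@dataclass
class GenerationSettings:
    mode: str
    prompt: str
    resolution: str = "1080p"
    aspect_ratio: str = "16:9"

    def problems(self) -> list[str]:
        """Return human-readable problems; an empty list means the settings look valid."""
        found = []
        if self.mode not in MODES:
            found.append(f"unknown mode: {self.mode}")
        if not self.prompt.strip():
            found.append("prompt is empty")
        if self.resolution not in RESOLUTIONS:
            found.append(f"unsupported resolution: {self.resolution}")
        if self.aspect_ratio not in ASPECT_RATIOS:
            found.append(f"unsupported aspect ratio: {self.aspect_ratio}")
        return found
```

Collecting every problem in one pass, rather than raising on the first, lets a form or script report all invalid settings at once.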

Best Wan 2.7 use cases

01

Scene transitions and reveals: use first-and-last frame I2V to produce precise product reveal or environment transformation sequences

02

Character-consistent content series: use R2V with reference images to generate multiple clips featuring the same person, avatar, or product

03

Video localization and re-skinning: use Video Edit to apply new clothing, backgrounds, or color palettes to existing campaign footage

04

Complex narrative scenes: use Thinking Mode to handle multi-subject, multi-camera prompts that require strong spatial coherence

05

Audio-visual short clips: provide an audio file to drive beat-matched motion or lip-sync for music video or speaker content

06

Product catalog videos: use keyframe control to produce consistent start-and-reveal sequences for every product in a catalog

Wan 2.7 prompting tips

Write detailed multi-element prompts — Thinking Mode is optimized for complex instructions that would confuse simpler direct-generation models
Use first-and-last frame control to define scene transitions with precision, especially for product reveals or environment transformations
Provide reference images in R2V mode for each distinct subject — more reference angles give the model better material to maintain consistency
For Video Edit mode, describe the desired final state of the video rather than the change operation — positive descriptions produce cleaner results
Specify audio environment details in T2V prompts when audio generation is enabled — ambient sound, dialogue cues, and music type influence the output

How to use Wan 2.7 by mode

Use T2V mode with a detailed prompt and let Thinking Mode handle composition planning for complex multi-subject or multi-action scenes
Set first and last keyframes in I2V mode to generate a precise visual transition between two defined states, for example a product before and after, or an environment change
Upload character or product reference images in R2V mode to maintain consistent appearance across generated video segments and camera angles
Use Video Edit mode to upload an existing clip and modify clothing, background, color grade, or style with a text instruction and optional reference images
Provide audio input files in WAV or MP3 format to drive lip-sync or beat-matched motion in T2V and I2V modes

Wan 2.7 FAQ

What is Thinking Mode in Wan 2.7?

Thinking Mode is a chain-of-thought reasoning layer built into Wan 2.7. Before generating any frames, the model parses your prompt, plans composition, determines subject placement and motion direction, and verifies spatial coherence. This produces better results than models that generate directly from text without a planning stage, particularly on complex prompts with multiple subjects, intricate scene layouts, or detailed camera instructions.

What is the difference between I2V and R2V modes in Wan 2.7?

Image-to-video (I2V) uses keyframe images — specifically the first frame, last frame, or both — to define the visual start and end states of the clip. Reference-to-video (R2V) uses reference images and videos to establish consistent subject appearance, clothing, and object identity throughout the clip, regardless of camera angle or scene changes. I2V controls scene boundaries; R2V controls subject consistency.
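
The distinction above can be expressed as a small decision helper. This is an illustrative sketch, not part of Wan 2.7: the function name, argument names, mode labels, and the priority order (source video first, then keyframes, then references) are all assumptions.

```python
from typing import Optional, Sequence


def suggest_mode(
    source_video: Optional[str] = None,
    first_frame: Optional[str] = None,
    last_frame: Optional[str] = None,
    references: Sequence[str] = (),
) -> str:
    """Suggest a mode label from the inputs a user has on hand (hypothetical helper)."""
    if source_video:
        return "video_edit"  # an existing clip to modify
    if first_frame or last_frame:
        return "i2v"  # keyframes define the scene boundaries
    if references:
        return "r2v"  # references define subject consistency
    return "t2v"  # prompt only
```

For example, `suggest_mode(last_frame="end.png")` returns `"i2v"`, while `suggest_mode(references=["hero.png"])` returns `"r2v"`.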

How does video editing work in Wan 2.7?

Video Edit mode accepts an existing video clip and a natural-language instruction. Wan 2.7 applies the edit while preserving motion structure and temporal consistency. Edits can be local (changing a specific attribute like clothing color or product detail) or global (changing overall scene lighting or visual style). Up to 5 reference images can be provided to specify the target visual appearance for the edited output.

Does Wan 2.7 support audio generation?

Yes. Wan 2.7 supports native audio generation including lip-sync for spoken content and ambient soundscapes. You can also provide audio input files in WAV or MP3 format (3–30 seconds, up to 15MB) to drive beat-matched motion or direct lip-sync generation. Audio inputs are supported in T2V and I2V modes.
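
The audio input limits above lend themselves to a pre-upload check. The sketch below is an assumption-laden illustration: the function and mode labels are hypothetical, the caller is expected to supply the duration and file size, and the 15 MB cap is interpreted as 15 × 1024 × 1024 bytes, which the source does not specify.

```python
from pathlib import Path
from typing import List

ALLOWED_SUFFIXES = {".wav", ".mp3"}
AUDIO_MODES = {"t2v", "i2v"}  # hypothetical labels for the two modes that accept audio
MIN_SECONDS = 3.0
MAX_SECONDS = 30.0
MAX_BYTES = 15 * 1024 * 1024  # assumption: "15MB" read as 15 MiB


def audio_input_problems(
    mode: str, filename: str, duration_seconds: float, size_bytes: int
) -> List[str]:
    """Return reasons an audio input would fall outside the stated limits."""
    problems = []
    if mode not in AUDIO_MODES:
        problems.append(f"audio input is not listed for mode {mode}")
    if Path(filename).suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append("file must be WAV or MP3")
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        problems.append("duration must be between 3 and 30 seconds")
    if size_bytes > MAX_BYTES:
        problems.append("file must be at most 15 MB")
    return problems
```

The check relies on the file extension only; it does not inspect the audio container, so a mislabeled file would still pass.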

What resolution and duration does Wan 2.7 support?

Wan 2.7 generates 720p or 1080p video at 16fps or 24fps. T2V and I2V modes support clips up to 15 seconds; R2V and Video Edit modes support clips up to 10 seconds. Aspect ratios include 16:9, 9:16, and 1:1. The T2V-14B variant delivers maximum quality; T2V-1.3B Turbo offers faster generation at lower credit cost.
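
The per-mode duration caps and frame rates above can be captured in a lookup table. This is a minimal sketch: the limits are taken from the answer above, while the mode labels and function names are hypothetical.

```python
# Maximum clip length in seconds per mode, as listed above; labels are hypothetical.
MAX_SECONDS_BY_MODE = {"t2v": 15, "i2v": 15, "r2v": 10, "video_edit": 10}
FRAME_RATES = (16, 24)


def clamp_duration(mode: str, requested_seconds: float) -> float:
    """Cap a requested duration at the limit listed for the mode."""
    limit = MAX_SECONDS_BY_MODE[mode]  # raises KeyError for an unknown mode
    return min(requested_seconds, limit)


def is_supported_frame_rate(fps: int) -> bool:
    """True when the frame rate is one of the two listed options."""
    return fps in FRAME_RATES
```

For example, `clamp_duration("r2v", 12)` returns `10`, and `is_supported_frame_rate(30)` returns `False`.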

How does Wan 2.7 compare to HappyHorse 1.0?

Wan 2.7 brings Thinking Mode reasoning for complex compositional prompts, keyframe control for precise scene transitions, and four generation modes from one model. HappyHorse 1.0 focuses on joint audio-video generation in a single pass with native lip-sync in 7 languages and a video-edit mode supported by up to 5 reference images. Both models are available on Lovimg and serve different production workflows.