What should every video prompt include?
Effective prompts contain two essential parts: a visual description and a motion description. The visual description covers what we see—subjects, environment, lighting, composition and style—while the motion description covers how things move, including the subject’s action, camera movement, motion style and timing. These two together guide the AI toward your desired look and feel.
How should I structure my prompt?
Use this simple template:
[Camera shot] of [subject] [action] in [environment].
[Style + Motion + Lighting details]
For example:
"Medium shot of a mechanic in gray overalls repairing a motorcycle engine in a garage with LED lights and vintage posters. Hand‑held camera slowly circles left while warm, cinematic lighting creates deep shadows."
This format makes prompts easy to understand and edit.
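If you assemble prompts programmatically, the template above can be sketched as a small helper. This is only an illustration: the function name build_video_prompt and its parameters are hypothetical and not part of any video tool's API.

```python
def build_video_prompt(shot, subject, action, environment, details):
    """Fill the two-part template: a scene sentence, then style/motion/lighting details."""
    scene = f"{shot} of {subject} {action} in {environment}."
    # Normalize each detail so it ends with exactly one period.
    extras = " ".join(d.strip().rstrip(".") + "." for d in details if d.strip())
    return f"{scene} {extras}".strip()


prompt = build_video_prompt(
    shot="Medium shot",
    subject="a mechanic in gray overalls",
    action="repairing a motorcycle engine",
    environment="a garage with LED lights and vintage posters",
    details=[
        "Hand-held camera slowly circles left",
        "Warm, cinematic lighting creates deep shadows",
    ],
)
print(prompt)
```

Keeping each slot as a separate argument makes it easy to change one element, such as the camera shot, without rewriting the rest of the prompt.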
Do I need to specify everything?
No. Start with the most critical elements and iterate. Leaving some details open lets the model fill in creative gaps. When you need precision, clearly describe the important objects and motions, and avoid mixing conflicting descriptors.
Avoid:
“fast-paced action” and “dreamy slow motion” in the same scene
How do I describe camera and object motion?
Be explicit. Use phrases like “smooth forward tracking shot,” “slow upward tilt,” or “360° rotation”. For moving subjects, describe the action and timing (e.g., “A golden retriever chases a blue frisbee across a green yard; the camera starts at eye level, then slowly rises to reveal the yard”).
What style details should I include?
Include:
Lighting: natural, studio, dramatic, soft.
Color palette: warm, cool, vibrant, muted.
Visual style: cinematic, commercial, documentary, artistic.
Camera angle: wide shot, close-up, aerial view.
Time of day: morning, sunset, night.
Stick to one consistent style; mixing styles can confuse the AI.
Should I use keywords or natural language?
Both can work, but natural language usually provides more control because it conveys context. Keywords are useful for broad direction, but full sentences help the AI understand relationships between elements and actions.
How long should my video prompt be?
There’s no perfect length—clarity matters more than word count. Longer prompts may be necessary for complex scenes with multiple motions, but avoid overly long sentences that could create conflicting instructions.
What common mistakes should I avoid?
Lacking detail: Vague prompts like “a dog in a yard” produce generic outputs; add specifics (“A golden retriever chases a blue frisbee across a green backyard with butterflies fluttering around”).
Unclear motion instructions: Phrases like “make it move” don’t tell the AI how to animate; specify camera and subject movements.
Conflicting styles: Don’t mix opposing styles (e.g., “fast‑paced action” with “dreamy slow motion”).
Ambiguous adjectives: Replace subjective terms (“beautiful,” “nice”) with concrete descriptors (“soft, warm backlighting with subtle shadows”).
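The mistakes listed above can be caught with a rough pre-flight check before you submit a prompt. The sketch below is an assumption-heavy illustration: the function check_prompt, the word lists, and the six-word threshold are all hypothetical choices, not rules enforced by any video tool.

```python
import re

# Illustrative lists only; extend them for your own prompts.
VAGUE_ADJECTIVES = {"beautiful", "nice"}
VAGUE_MOTION = ["make it move"]
CONFLICTING_PAIRS = [("fast-paced action", "dreamy slow motion")]
MIN_WORDS = 6  # arbitrary threshold, assumed for this sketch


def check_prompt(prompt):
    """Return a list of warnings for the common mistakes described above."""
    # Lowercase and treat non-breaking hyphens as plain hyphens.
    text = prompt.lower().replace("\u2011", "-")
    warnings = []
    words = set(re.findall(r"[a-z]+", text))
    for adjective in sorted(VAGUE_ADJECTIVES & words):
        warnings.append(f"ambiguous adjective: {adjective}")
    for phrase in VAGUE_MOTION:
        if phrase in text:
            warnings.append(f"unclear motion instruction: {phrase}")
    for first, second in CONFLICTING_PAIRS:
        if first in text and second in text:
            warnings.append(f"conflicting styles: {first} / {second}")
    if len(text.split()) < MIN_WORDS:
        warnings.append(f"lacking detail: fewer than {MIN_WORDS} words")
    return warnings


print(check_prompt("a dog in a yard"))
print(check_prompt("A beautiful dog, make it move"))
```

A check like this only flags wording; you still need to review the generated video to judge whether the prompt is specific enough.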
Can I reuse prompts across different video tools?
Yes. The core elements (subject, environment, motion and style) apply across Image‑to‑Video, Video‑to‑Video, Text‑to‑Video and Frames‑to‑Video.
For example:
Scene: White ceramic mug on a marble countertop in a modern kitchen with morning light.
Style: Soft side lighting with warm beige tones.
Motion: The camera slowly rotates 360° around the mug, pausing briefly at the handle.
Adjust details depending on the tool:
Image‑to‑Video: Focus on describing camera movement and style; the image already provides subject/environment.
Video‑to‑Video: Describe how you want to modify or restyle existing footage.
Best for restyling footage, for example as anime, LEGO, or pixel art.
Text‑to‑Video: Combine full scene descriptions with detailed motion instructions for complex narratives.
Frames‑to‑Video: Write individual prompts for each transition between frames.
Best for short ad videos.
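One way to reuse the same core prompt across tools is to keep the Scene, Style, and Motion parts separate and combine them per tool. The sketch below covers only the two tools whose guidance maps directly onto those parts; the function prompt_for_tool and the tool identifiers are hypothetical names chosen for illustration. Video‑to‑Video and Frames‑to‑Video need their own wording (restyling instructions and per-transition prompts), so they are deliberately left out.

```python
def prompt_for_tool(tool, scene, style, motion):
    """Choose which core parts to include, following the per-tool guidance above."""
    if tool == "image-to-video":
        # The image already supplies subject and environment, so skip the scene.
        parts = [motion, style]
    elif tool == "text-to-video":
        # No source image: send the full scene plus style and motion.
        parts = [scene, style, motion]
    else:
        raise ValueError(f"no rule sketched for tool: {tool}")
    return " ".join(parts)


scene = "White ceramic mug on a marble countertop in a modern kitchen with morning light."
style = "Soft side lighting with warm beige tones."
motion = "The camera slowly rotates 360° around the mug, pausing briefly at the handle."

print(prompt_for_tool("image-to-video", scene, style, motion))
print(prompt_for_tool("text-to-video", scene, style, motion))
```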
What language can I use for prompts?
You can write prompts in any language, and we will still generate results. However, we recommend using English whenever possible, as it generally provides the most accurate understanding, richer details, and more consistent outputs—especially for complex scenes, motion descriptions, and style instructions.





