Seedance 2.0 Text-to-Video

Seedance 2.0 Text-to-Video is a cinematic AI video generation model that transforms text prompts into production-grade videos with native synchronized audio, realistic motion, and director-level creative control.

Overview

Seedance 2.0 Text-to-Video is built on a unified multimodal architecture capable of handling text, image, audio, and video inputs within a single creative workflow. The model generates cinematic videos directly from text prompts while automatically synchronizing dialogue, ambience, music, and visual motion.

It is optimized for filmmakers, advertisers, content creators, and production teams seeking high-end AI-generated cinematic content with smooth motion stability and strong prompt adherence.

Why it looks great

Unified multimodal architecture: Supports text, image, audio, and video-guided generation workflows.
Native audio-visual synchronization: Automatically generates synchronized sound, music, ambience, and visuals together.
Director-level creative control: Supports cinematic instructions for lighting, camera movement, atmosphere, and performance.
Production-grade cinematic quality: Generates dramatic lighting, film-like color grading, and realistic motion.
Exceptional motion stability: Maintains coherent subject movement, scene continuity, and fluid transitions.
Strong prompt understanding: Follows detailed cinematic scene descriptions with high consistency.

Limits and Performance

Supported durations: 4–15 seconds
Supported resolutions: 480p, 720p, 1080p
Supported aspect ratios: 16:9, 9:16, 4:3, 3:4, 1:1, 21:9
Reference image support: Yes
Reference video support: Yes
Reference audio support: Yes
Native audio generation: Included
Best for: Cinematic storytelling, commercials, music videos, and premium social content
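
The limits above can be checked on the client before a request is submitted. The following is a minimal sketch, not part of any official SDK: the function name `validate_settings` and the constant names are hypothetical, and only the allowed values themselves come from the list above.

```python
# Allowed values copied from the "Limits and Performance" list.
SUPPORTED_RESOLUTIONS = {"480p", "720p", "1080p"}
SUPPORTED_ASPECT_RATIOS = {"16:9", "9:16", "4:3", "3:4", "1:1", "21:9"}
MIN_DURATION_S = 4
MAX_DURATION_S = 15


def validate_settings(duration, resolution, aspect_ratio):
    """Return a list of problems; an empty list means the settings are in range.

    Hypothetical helper for illustration only.
    """
    problems = []
    if not MIN_DURATION_S <= duration <= MAX_DURATION_S:
        problems.append(
            f"duration {duration}s is outside {MIN_DURATION_S}-{MAX_DURATION_S}s"
        )
    if resolution not in SUPPORTED_RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution}")
    if aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect_ratio}")
    return problems


if __name__ == "__main__":
    print(validate_settings(10, "720p", "16:9"))
    print(validate_settings(20, "4k", "2:1"))
```

Returning a list of problems rather than raising on the first one lets a caller report every out-of-range setting at once.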

Pricing

Pricing depends on resolution, duration, and whether reference videos are used.

Resolution	Duration	Without Reference Videos	With Reference Videos
480p	5s	$0.60	$1.20
480p	10s	$1.20	$2.40
480p	15s	$1.80	$3.60
720p	5s	$1.20	$2.40
720p	10s	$2.40	$4.80
720p	15s	$3.60	$7.20
1080p	5s	$3.00	$6.00
1080p	10s	$6.00	$12.00
1080p	15s	$9.00	$18.00

Billing Rule

480p is the base pricing tier.
720p costs 2× the 480p rate.
1080p costs 5× the 480p rate.
Reference video workflows double the standard generation price.
Pricing scales linearly with duration between 4 and 15 seconds; the table implies a rate of $0.12 per second at 480p without reference videos.
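
The billing rules can be expressed as a small cost estimator. This is a sketch under stated assumptions: the $0.12 per second base rate is derived from the table ($0.60 for 5 seconds at 480p), the function name `estimate_price` is hypothetical, and rounding to the nearest cent is an assumption, since the source does not say how fractional amounts are billed. Prices for durations not listed in the table are extrapolated from the linear rule.

```python
from decimal import Decimal, ROUND_HALF_UP

# Derived from the pricing table: $0.60 / 5 s at 480p without reference videos.
BASE_RATE_PER_SECOND = Decimal("0.12")

# Multipliers relative to the 480p base tier, per the billing rules.
RESOLUTION_MULTIPLIER = {
    "480p": Decimal(1),
    "720p": Decimal(2),
    "1080p": Decimal(5),
}


def estimate_price(resolution, duration_seconds, uses_reference_videos=False):
    """Estimate the price in USD for one generation (hypothetical helper)."""
    if resolution not in RESOLUTION_MULTIPLIER:
        raise ValueError(f"unsupported resolution: {resolution}")
    if not 4 <= duration_seconds <= 15:
        raise ValueError("duration must be between 4 and 15 seconds")

    price = (
        BASE_RATE_PER_SECOND
        * Decimal(str(duration_seconds))
        * RESOLUTION_MULTIPLIER[resolution]
    )
    if uses_reference_videos:
        price *= 2  # reference video workflows double the price
    # Assumption: round to the nearest cent.
    return price.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(estimate_price("720p", 10))
    print(estimate_price("1080p", 15, uses_reference_videos=True))
```

`Decimal` is used instead of `float` so that the results match the table to the cent.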

How to Use

Write a cinematic text prompt describing the scene, action, lighting, and atmosphere.
Select the desired aspect ratio and output resolution.
Choose the target video duration between 4 and 15 seconds.
Optionally upload reference images, videos, or audio clips.
Submit the generation request.
Preview and download the generated video with synchronized audio.

Input Parameters

Parameter	Required	Description
prompt	Yes	Cinematic description of the desired video
aspect_ratio	No	Output format such as 16:9, 9:16, or 21:9
duration	No	Video length from 4 to 15 seconds
resolution	No	Output resolution: 480p, 720p, or 1080p
reference_images	No	Reference images for style or character guidance
reference_videos	No	Reference videos for motion or scene guidance
reference_audios	No	Reference audio clips for synchronization guidance
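
As an illustration of how these parameters might fit together, the sketch below assembles a request body as a plain dictionary. Only the parameter names come from the table above. The helper `build_request` is hypothetical, the source does not specify an endpoint or transport, and passing references as lists of URL strings is an assumption.

```python
import json


def build_request(prompt, aspect_ratio=None, duration=None, resolution=None,
                  reference_images=None, reference_videos=None,
                  reference_audios=None):
    """Assemble a request body using the documented parameter names.

    Hypothetical helper: optional parameters that are not set are omitted
    so that server-side defaults (whatever they are) can apply.
    """
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")

    payload = {"prompt": prompt.strip()}
    optional = {
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "resolution": resolution,
        # Assumption: references are supplied as lists of URL strings.
        "reference_images": reference_images,
        "reference_videos": reference_videos,
        "reference_audios": reference_audios,
    }
    for name, value in optional.items():
        if value is not None:
            payload[name] = value
    return payload


if __name__ == "__main__":
    body = build_request(
        "A lighthouse keeper climbs a spiral staircase at dusk, tracking shot",
        aspect_ratio="16:9",
        duration=8,
        resolution="720p",
        reference_images=["https://example.com/lighthouse.jpg"],
    )
    print(json.dumps(body, indent=2))
```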

Output Format

MP4 cinematic video output
Native synchronized audio generation
Film-style lighting and motion rendering
Portrait, landscape, square, and cinematic aspect ratios
Production-ready short-form video content

Pro tips for best quality

Write prompts like a film director with detailed camera and lighting instructions.
Use cinematic phrases such as “dramatic rim lighting”, “tracking shot”, or “golden-hour atmosphere”.
Keep scenes focused on a single action or emotional moment for more coherent motion.
Use 16:9 for cinematic storytelling and 9:16 for premium vertical content.
Start with shorter clips to refine visual style before generating longer scenes.
Include detailed character expressions and environmental descriptions for richer outputs.
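
One way to apply the tips above consistently is to build prompts from separate parts: a single focused action plus optional camera, lighting, and atmosphere details. The helper below is a hypothetical convenience, not something the model requires; the model accepts any free-form text prompt.

```python
def compose_prompt(subject_action, camera=None, lighting=None, atmosphere=None):
    """Join a focused scene description with optional cinematic details.

    Hypothetical helper: trailing periods are stripped from each part and
    the parts are joined with commas into a single sentence.
    """
    parts = [subject_action.strip().rstrip(".")]
    for detail in (camera, lighting, atmosphere):
        if detail:
            parts.append(detail.strip().rstrip("."))
    return ", ".join(parts) + "."


if __name__ == "__main__":
    print(compose_prompt(
        "A violinist plays alone on a rain-soaked rooftop",
        camera="slow tracking shot circling the performer",
        lighting="dramatic rim lighting",
        atmosphere="golden-hour atmosphere",
    ))
```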

Note

Native audio generation is included automatically. Videos are optimized for short-form cinematic storytelling and support continuous durations between 4 and 15 seconds.

ByteDance Seedance 2.0 text-to-video model
