MAGI-1: Autoregressive Video Generation at Scale
Open-Source AI Video Model By Sand AI
Meet MAGI-1, the groundbreaking open-source video generation model from Sand.ai. Combining diffusion with autoregressive generation, MAGI-1 delivers exceptional quality, precise control, and real-time streaming, outperforming leading open-source alternatives such as Wan-2.1 and HunyuanVideo and rivaling closed-source models like Hailuo.
ABOUT MAGI-1
Pioneering Autoregressive AI Video Generation
Innovative Architecture, Superior Results
Autoregressive Approach
MAGI-1 uses an autoregressive denoising approach, generating video chunk by chunk (24 frames each). This pipelined design processes up to four chunks simultaneously, enabling efficient streaming generation and seamless extension to longer videos. Because it is trained to denoise per-chunk noise that increases monotonically over time, MAGI-1 supports causal temporal modeling.
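To make the pipeline concrete, here is a minimal sketch of chunk-wise streaming denoising. Everything in it (the function names, the linear noise schedule, the latent shape) is an illustrative assumption rather than the MAGI-1 implementation; a real implementation would batch the active window into a single forward pass with block-causal attention.

```python
import torch

CHUNK_FRAMES = 24       # frames per chunk
PIPELINE_DEPTH = 4      # chunks denoised concurrently

def denoise_window(model, window, noise_levels):
    # One pipelined update: every active chunk is partially denoised, with
    # newer chunks held at higher noise than older ones (causal ordering).
    return [model(chunk, t) for chunk, t in zip(window, noise_levels)]

def stream_generate(model, num_chunks, latent_shape=(CHUNK_FRAMES // 4, 16, 32, 32)):
    finished, window = [], []
    for step in range(num_chunks + PIPELINE_DEPTH - 1):
        if step < num_chunks:
            window.append(torch.randn(latent_shape))    # feed a fresh noisy chunk
        # Noise increases monotonically across the window: the oldest chunk is
        # nearly clean, the newest is close to pure noise.
        levels = torch.linspace(0.1, 0.9, steps=len(window)).tolist()
        window = denoise_window(model, window, levels)
        if len(window) == PIPELINE_DEPTH or step >= num_chunks:
            finished.append(window.pop(0))               # stream out the cleanest chunk
    return torch.cat(finished, dim=0)                    # concatenate along time
```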

Advanced Physics Understanding
Thanks to its architecture, MAGI-1 exhibits a remarkable understanding of physical laws, achieving a Physics-IQ benchmark score of 56.02% – far surpassing existing models in predicting physical behavior and ensuring realistic motion.
Leading Performance & Control
Internal evaluations show MAGI-1 significantly outperforming open-source models like Wan-2.1 and HunyuanVideo. Its instruction following and action quality are highly competitive, rivaling even closed-source commercial models like Kling 1.6. MAGI-1 further supports controllable generation via chunk-wise prompting, enabling smooth scene transitions, long-horizon synthesis, and fine-grained text-driven control.

DiT Architecture
MAGI-1's Diffusion Transformer architecture incorporates block-causal attention and parallel processing optimizations, enabling its powerful autoregressive capabilities while maintaining computational efficiency.

We believe MAGI-1 offers a promising direction for unifying high-fidelity video generation with flexible instruction control and real-time deployment.
Fully open-source code and models make MAGI-1 a true game changer, empowering the community with state-of-the-art AI video synthesis.
Top Benchmark Scores
Leading performance vs. open-source models.
Master Quality Video
Exceptional visual fidelity and detail.
High Physics IQ
Superior understanding of physical laws.
Efficient Streaming
Real-time video output via pipelining.
MAGI-1 Technical Features
Delving into the advanced components powering MAGI-1.
High-Performance VAE
MAGI-1 employs a powerful Transformer-based VAE (614M parameters) for efficient video compression. It achieves 8x spatial and 4x temporal compression, surpassing Wan-2.1 in reconstruction quality, and decodes faster on average than HunyuanVideo thanks to early spatial downsampling via Conv3D.
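As a rough illustration of what those ratios mean for tensor sizes (the resolution and the 16 latent channels below are assumptions for the example; the channel count of MAGI-1's latents is not stated here):

```python
import torch

frames, height, width = 24, 720, 1280            # one 24-frame chunk of RGB video
latent_t = frames // 4                           # 4x temporal compression -> 6 latent frames
latent_h, latent_w = height // 8, width // 8     # 8x spatial compression  -> 90 x 160

video  = torch.randn(1, 3,  frames,   height,   width)      # (B, C, T, H, W)
latent = torch.randn(1, 16, latent_t, latent_h, latent_w)    # latent channels assumed to be 16

print(video.numel() / latent.numel())            # ~48x fewer elements for the DiT to model
```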
Optimized Diffusion Transformer (DiT)
Built upon the DiT framework, MAGI-1 incorporates key improvements for stability, efficiency, and autoregressive modeling:
- Block-Causal Attention: Tailored for autoregressive generation (see the sketch below).
- Parallel Attention Block: Enhances training efficiency.
- QK-Norm & GQA: Improve stability and performance.
- Sandwich Normalization & SwiGLU (FFN): Boost model effectiveness.
- Softcap Modulation: Refines conditioning integration.
MAGI-1 uses T5 for text feature extraction and sinusoidal embeddings for timesteps. The 24B-parameter version is currently open-sourced, with a 4.5B version coming soon.
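A small sketch of what a block-causal mask looks like (illustrative only; MAGI-1's own attention kernels are provided by the MagiAttention component): tokens attend bidirectionally within their own chunk and causally to every earlier chunk, but never to future chunks.

```python
import torch

def block_causal_mask(num_chunks: int, tokens_per_chunk: int) -> torch.Tensor:
    total = num_chunks * tokens_per_chunk
    chunk_id = torch.arange(total) // tokens_per_chunk
    # mask[i, j] is True where query token i may attend to key token j:
    # full attention inside a chunk, causal attention across chunks.
    return chunk_id.unsqueeze(1) >= chunk_id.unsqueeze(0)

mask = block_causal_mask(num_chunks=3, tokens_per_chunk=2)
# A boolean mask like this can be passed as attn_mask to
# torch.nn.functional.scaled_dot_product_attention (True = may attend).
```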
Distillation Algorithm
We adopt a shortcut distillation approach, training a single velocity-based model that serves variable inference budgets. By enforcing self-consistency (one large step ≈ two smaller steps), the model approximates flow-matching trajectories across multiple step sizes (cyclically sampled from 8). Classifier-free guidance distillation preserves conditional alignment, enabling efficient inference with minimal fidelity loss. Distilled and quantized models are available.
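The self-consistency constraint can be sketched in a few lines. The signature v(x, t, d) and the plain MSE below are assumptions made for clarity, not the exact MAGI-1 training objective:

```python
import torch.nn.functional as F

def self_consistency_loss(v, x, t, d):
    # Two small Euler steps of size d...
    x_mid  = x + d * v(x, t, d)
    target = x_mid + d * v(x_mid, t + d, d)
    # ...should match one large step of size 2d taken by the same model.
    pred = x + 2 * d * v(x, t, 2 * d)
    return F.mse_loss(pred, target.detach())
```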
Model Zoo & Requirements
Access pre-trained weights for MAGI-1 and check hardware recommendations.
| Model | Download Link | Recommended Hardware |
|---|---|---|
| MAGI-1-24B | Hugging Face | H100 / H800 * 8 |
| MAGI-1-24B-distill | Hugging Face | H100 / H800 * 8 |
| MAGI-1-24B-distill+fp8_quant | Hugging Face | H100 / H800 * 4 or RTX 4090 * 8 |
| MAGI-1-4.5B | (Coming Soon - End of April) | RTX 4090 * 1 |
| MAGI-1-VAE | Hugging Face | Included |
| T5 | Hugging Face | Included |
Note: If running the 24B model on RTX 4090 * 8, set pp_size: 2 and cp_size: 4 in your configuration.
Evaluation & Benchmarks
MAGI-1 demonstrates state-of-the-art performance in various evaluations.
In-house Human Evaluation
MAGI-1 achieves state-of-the-art performance among open-source models like Wan-2.1 and HunyuanVideo, and compares favorably against closed-source models like Hailuo (i2v-01). It particularly excels in instruction following and motion quality, positioning it as a strong potential competitor to commercial models such as Kling.
Physical Evaluation (Physics-IQ)
Thanks to the natural advantages of its autoregressive architecture, MAGI-1 achieves far superior precision in predicting physical behavior through video continuation on the Physics-IQ benchmark, significantly outperforming all existing models.
Physics-IQ Benchmark Scores
| Model | Physics-IQ Score ↑ | Spatial IoU ↑ | Spatiotemporal IoU ↑ | Weighted Spatial IoU ↑ | MSE ↓ |
|---|---|---|---|---|---|
| Magi (V2V) | 56.02 | 0.367 | 0.270 | 0.304 | 0.005 |
| VideoPoet (V2V) | 29.50 | 0.204 | 0.164 | 0.137 | 0.010 |
| Magi (I2V) | 30.23 | 0.203 | 0.151 | 0.154 | 0.012 |
| Kling1.6 (I2V) | 23.64 | 0.197 | 0.086 | 0.144 | 0.025 |
| VideoPoet (I2V) | 20.30 | 0.141 | 0.126 | 0.087 | 0.012 |
| Gen 3 (I2V) | 22.80 | 0.201 | 0.115 | 0.116 | 0.015 |
| Wan2.1 (I2V) | 20.89 | 0.153 | 0.100 | 0.112 | 0.023 |
| Sora (I2V) | 10.00 | 0.138 | 0.047 | 0.063 | 0.030 |
| Ground Truth | 100.0 | 0.678 | 0.535 | 0.577 | 0.002 |
Installation & Usage
Get started with running the MAGI-1 model locally.
Environment Preparation
We provide two ways to set up your environment. Docker is recommended.
1. Docker Environment (Recommended)
docker pull sandai/magi:latest
docker run -it --gpus all --privileged --shm-size=32g \
--name magi --net=host --ipc=host \
--ulimit memlock=-1 --ulimit stack=67108864 \
sandai/magi:latest /bin/bash
2. Run with Source Code
# Create a new environment
conda create -n magi python==3.10.12
# Install pytorch (adjust cuda version if needed)
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
# Install other dependencies
pip install -r requirements.txt
# Install ffmpeg
conda install -c conda-forge ffmpeg=4.4
# Install MagiAttention (Refer to its repo for details)
# git clone git@github.com:SandAI-org/MagiAttention.git
# cd MagiAttention
# git submodule update --init --recursive
# pip install --no-build-isolation .
Note: Ensure you have Git and Conda installed. Refer to the MagiAttention repository for specific installation details.
Inference Command
Run the MagiPipeline by modifying parameters in the example scripts:
# Run 24B MAGI-1 model
bash example/24B/run.sh
# Run 4.5B MAGI-1 model (when available)
# bash example/4.5B/run.sh
Key Parameter Descriptions
- --config_file: Path to the configuration file (e.g., example/24B/24B_config.json).
- --mode: Operation mode: t2v (Text to Video), i2v (Image to Video), or v2v (Video to Video).
- --prompt: Text prompt for generation (e.g., "Good Boy").
- --image_path: Path to the input image (for i2v mode).
- --prefix_video_path: Path to the prefix video (for v2v mode).
- --output_path: Path to save the generated video.
Customizing Parameters (Example)
Modify run.sh for different modes:
# Image to Video (i2v)
--mode i2v \
--image_path example/assets/image.jpeg \
--prompt "Your prompt here" \
...
# Video to Video (v2v)
--mode v2v \
--prefix_video_path example/assets/prefix_video.mp4 \
--prompt "Your prompt here" \
...
Useful Configs (in config.json)
- seed: Random seed for generation.
- video_size_h / video_size_w: Video dimensions.
- num_frames: Number of frames to generate (controls video duration).
- fps: Frames per second (4 video frames = 1 latent frame).
- cfg_number: Use 3 for the base model, 1 for distill/quantized models.
- load: Directory containing the model checkpoint.
- t5_pretrained / vae_pretrained: Paths to the T5 and VAE models.
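For example, a quick back-of-the-envelope for frame counts (the numbers below are placeholders, not defaults):

```python
num_frames = 96                    # frames to generate
fps = 24                           # playback rate
duration_s = num_frames / fps      # 4.0 seconds of output video
latent_frames = num_frames // 4    # 24 latent frames (4 video frames = 1 latent frame)
```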
Resources & Community
Access the code, models, documentation, and connect with the community.
Technical Report
Dive deep into the architecture, methodology, and benchmark results of the MAGI-1 model.
Open-Source Code
Explore the complete inference code, including the MagiAttention component, on GitHub.
Model Weights (24B+)
Download the open-sourced 24B parameter model weights, plus distilled & quantized versions, from Hugging Face.