{"id":1913,"date":"2026-03-12T19:18:40","date_gmt":"2026-03-12T11:18:40","guid":{"rendered":"https:\/\/gaga.art\/blog\/?p=1913"},"modified":"2026-03-12T19:18:42","modified_gmt":"2026-03-12T11:18:42","slug":"helios","status":"publish","type":"post","link":"https:\/\/gaga.art\/blog\/helios\/","title":{"rendered":"Helios AI: Real-Time 14B Video Model That Changed Everything"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"572\" src=\"https:\/\/gaga.art\/blog\/wp-content\/uploads\/2026\/03\/helios-1024x572.webp\" alt=\"helios\" class=\"wp-image-1914\" srcset=\"https:\/\/gaga.art\/blog\/wp-content\/uploads\/2026\/03\/helios-1024x572.webp 1024w, https:\/\/gaga.art\/blog\/wp-content\/uploads\/2026\/03\/helios-300x167.webp 300w, https:\/\/gaga.art\/blog\/wp-content\/uploads\/2026\/03\/helios-768x429.webp 768w, https:\/\/gaga.art\/blog\/wp-content\/uploads\/2026\/03\/helios.webp 1376w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"key-takeaways\" style=\"font-size:24px\"><strong>Key Takeaways<\/strong><\/h2>\n\n\n\n<p><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Helios is a 14-billion-parameter autoregressive diffusion model for real-time, long-video generation \u2014 developed jointly by Peking University and ByteDance, released March 2026.<\/li>\n\n\n\n<li>It runs at 19.5 FPS on a single NVIDIA H100 GPU \u2014 matching the speed of 1.3B models while delivering 14B-level quality.<\/li>\n\n\n\n<li>It supports T2V, I2V, and V2V (text-to-video, image-to-video, video-to-video) with videos up to 1452 frames (~60 seconds at 24 FPS).<\/li>\n\n\n\n<li>It achieves all of this without KV-cache, quantization, sparse attention, or any standard long-video anti-drifting heuristics.<\/li>\n\n\n\n<li>With Group Offloading, Helios runs on as little as ~6 GB of VRAM.<\/li>\n\n\n\n<li>All code, weights, and three model variants 
(Base, Mid, Distilled) are open-source under Apache 2.0.<\/li>\n\n\n\n<li>Bonus at the end: How Gaga AI pairs with Helios \u2014 image-to-video, audio infusion, AI avatar, voice cloning, and TTS.<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-rank-math-toc-block has-custom-cd-994-c-color has-text-color has-link-color wp-elements-803eb71805297e81acbfaebd3fab7f4e\" id=\"rank-math-toc\"><p>Table of Contents<\/p><nav><ul><li><a href=\"#key-takeaways\">Key Takeaways<\/a><\/li><li><a href=\"#what-is-helios\">What Is Helios?<\/a><\/li><li><a href=\"#why-helios-matters-the-problem-it-solves\">Why Helios Matters: The Problem It Solves<\/a><ul><li><a href=\"#wall-1-long-video-drift\">Wall 1: Long-Video Drift<\/a><\/li><li><a href=\"#wall-2-speed-vs-quality\">Wall 2: Speed vs. Quality<\/a><\/li><li><a href=\"#wall-3-memory-constraints\">Wall 3: Memory Constraints<\/a><\/li><\/ul><\/li><li><a href=\"#how-helios-works-architecture-and-key-innovations\">How Helios Works: Architecture and Key Innovations<\/a><ul><li><a href=\"#core-architecture\">Core Architecture<\/a><\/li><li><a href=\"#innovation-1-easy-anti-drifting-no-heuristics-required\">Innovation 1: Easy Anti-Drifting (No Heuristics Required)<\/a><\/li><li><a href=\"#innovation-2-real-time-speed-without-shortcuts\">Innovation 2: Real-Time Speed Without Shortcuts<\/a><\/li><li><a href=\"#innovation-3-training-without-parallelism-frameworks\">Innovation 3: Training Without Parallelism Frameworks<\/a><\/li><\/ul><\/li><li><a href=\"#the-three-model-variants-which-one-should-you-use\">The Three Model Variants: Which One Should You Use?<\/a><\/li><li><a href=\"#video-length-reference-frames-to-seconds\">Video Length Reference: Frames to Seconds<\/a><\/li><li><a href=\"#how-to-install-and-run-helios-step-by-step\">How to Install and Run Helios: Step-by-Step<\/a><ul><li><a href=\"#requirements\">Requirements<\/a><\/li><li><a href=\"#step-1-clone-the-repository\">Step 1 \u2014 Clone the Repository<\/a><\/li><li><a 
href=\"#step-2-create-the-conda-environment\">Step 2 \u2014 Create the Conda Environment<\/a><\/li><li><a href=\"#step-3-install-py-torch-choose-your-cuda-version\">Step 3 \u2014 Install PyTorch (choose your CUDA version)<\/a><\/li><li><a href=\"#step-4-install-dependencies\">Step 4 \u2014 Install Dependencies<\/a><\/li><li><a href=\"#step-5-download-model-weights\">Step 5 \u2014 Download Model Weights<\/a><\/li><li><a href=\"#step-6-run-inference\">Step 6 \u2014 Run Inference<\/a><\/li><li><a href=\"#step-7-low-vram-mode-group-offloading-6-gb\">Step 7 \u2014 Low-VRAM Mode (Group Offloading, ~6 GB)<\/a><\/li><li><a href=\"#step-8-multi-gpu-with-context-parallelism\">Step 8 \u2014 Multi-GPU with Context Parallelism<\/a><\/li><li><a href=\"#optional-use-via-diffusers-pipeline\">Optional: Use via Diffusers Pipeline<\/a><\/li><li><a href=\"#optional-use-via-v-llm-omni-or-sg-lang-diffusion\">Optional: Use via vLLM-Omni or SGLang-Diffusion<\/a><\/li><\/ul><\/li><li><a href=\"#helios-vs-competing-video-models-performance-comparison\">Helios vs. 
Competing Video Models: Performance Comparison<\/a><\/li><li><a href=\"#helios-training-pipeline-how-its-built\">Helios Training Pipeline: How It&#8217;s Built<\/a><ul><li><a href=\"#stage-1-base-architectural-adaptation\">Stage 1 \u2014 Base (Architectural Adaptation)<\/a><\/li><li><a href=\"#stage-2-mid-token-compression\">Stage 2 \u2014 Mid (Token Compression)<\/a><\/li><li><a href=\"#stage-3-distilled-sampling-step-reduction\">Stage 3 \u2014 Distilled (Sampling Step Reduction)<\/a><\/li><\/ul><\/li><li><a href=\"#ecosystem-and-day-0-integrations\">Ecosystem and Day-0 Integrations<\/a><\/li><li><a href=\"#common-issues-and-fixes\">Common Issues and Fixes<\/a><ul><li><a href=\"#first-chunk-appears-static-in-i-2-v-mode\">First Chunk Appears Static in I2V Mode<\/a><\/li><li><a href=\"#out-of-memory-oom-errors\">Out of Memory (OOM) Errors<\/a><\/li><li><a href=\"#videos-have-repetitive-motion\">Videos Have Repetitive Motion<\/a><\/li><li><a href=\"#non-multiple-of-33-frame-count\">Non-Multiple-of-33 Frame Count<\/a><\/li><\/ul><\/li><li><a href=\"#bonus-supercharge-helios-outputs-with-gaga-ai\">Bonus: Supercharge Helios Outputs with Gaga AI<\/a><ul><li><a href=\"#image-to-video-ai\">Image-to-Video AI<\/a><\/li><li><a href=\"#video-and-audio-infusion\">Video and Audio Infusion<\/a><\/li><li><a href=\"#ai-voice-clone-tts\">AI Voice Clone + TTS<\/a><\/li><li><a href=\"#the-complete-pipeline-helios-gaga-ai\">The Complete Pipeline: Helios + Gaga AI<\/a><\/li><\/ul><\/li><li><a href=\"#frequently-asked-questions\">Frequently Asked Questions<\/a><ul><li><a href=\"#what-is-helios-ai\">What is Helios AI?<\/a><\/li><\/ul><\/li><\/ul><\/nav><\/div>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"what-is-helios\"><strong>What Is Helios?<\/strong><\/h2>\n\n\n\n<p><\/p>\n\n\n\n<p>Helios is the world&#8217;s first 14B real-time video generation model, capable of producing minute-long, high-quality videos at 19.5 FPS on a single H100 GPU \u2014 without relying on any 
conventional acceleration or anti-drifting techniques.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"517\" src=\"https:\/\/gaga.art\/blog\/wp-content\/uploads\/2026\/03\/helios-demo-1024x517.webp\" alt=\"helios demo\" class=\"wp-image-1916\" srcset=\"https:\/\/gaga.art\/blog\/wp-content\/uploads\/2026\/03\/helios-demo-1024x517.webp 1024w, https:\/\/gaga.art\/blog\/wp-content\/uploads\/2026\/03\/helios-demo-300x151.webp 300w, https:\/\/gaga.art\/blog\/wp-content\/uploads\/2026\/03\/helios-demo-768x388.webp 768w, https:\/\/gaga.art\/blog\/wp-content\/uploads\/2026\/03\/helios-demo-1536x775.webp 1536w, https:\/\/gaga.art\/blog\/wp-content\/uploads\/2026\/03\/helios-demo-2048x1034.webp 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Released on March 4, 2026 (arXiv:2603.04379), Helios was built by a team from Peking University, ByteDance, Canva, and Chengdu Anu Intelligence. Within 24 hours of publication, it ranked #2 Paper of the Day on Hugging Face and accumulated over 1,100 GitHub stars in the first week.<\/p>\n\n\n\n<p>The model&#8217;s name is fitting. In Greek mythology, Helios was the god of the Sun \u2014 the source of light that moves fast, reaches far, and illuminates everything. 
The model lives up to the name: it is simultaneously the fastest, the most memory-efficient, and among the highest-quality open-source video generation systems available today.<\/p>\n\n\n\n<p>Where to access Helios:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\ud83d\udcc4 <a href=\"https:\/\/arxiv.org\/abs\/2603.04379\" rel=\"nofollow noopener\" target=\"_blank\">arXiv Paper: 2603.04379<\/a><\/li>\n\n\n\n<li>\ud83d\udcbb <a href=\"https:\/\/github.com\/PKU-YuanGroup\/Helios\" rel=\"nofollow noopener\" target=\"_blank\">GitHub: PKU-YuanGroup\/Helios<\/a> \u2014 1.1K stars, Apache 2.0<\/li>\n\n\n\n<li>\ud83e\udd17 <a href=\"https:\/\/huggingface.co\/spaces\/BestWishYsh\/Helios-14B-RealTime\" rel=\"nofollow noopener\" target=\"_blank\">Hugging Face Demo: Helios-14B-RealTime<\/a><\/li>\n\n\n\n<li>\ud83c\udf10 <a href=\"https:\/\/pku-yuangroup.github.io\/Helios-Page\" rel=\"nofollow noopener\" target=\"_blank\">Project Page with Video Demos<\/a><\/li>\n\n\n\n<li>\ud83c\udfa5 <a href=\"https:\/\/www.youtube.com\/watch?v=vd_AgHtOUFQ\" rel=\"nofollow noopener\" target=\"_blank\">Official Demo Video<\/a><\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"why-helios-matters-the-problem-it-solves\"><strong>Why Helios Matters: The Problem It Solves<\/strong><\/h2>\n\n\n\n<p><\/p>\n\n\n\n<p>Before Helios, real-time long-video generation was considered a contradiction in terms \u2014 you could have speed or quality or length, but not all three simultaneously.<\/p>\n\n\n\n<p>Existing video generation models faced three hard walls:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"wall-1-long-video-drift\" style=\"font-size:24px\"><strong>Wall 1: Long-Video Drift<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>The longer a video, the more likely an autoregressive model &#8220;drifts&#8221; \u2014 characters change appearance, motion becomes repetitive, and scene coherence breaks down. 
Prior solutions (self-forcing, error-banks, keyframe sampling, inverted sampling) added complexity without fully solving the root problem.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"wall-2-speed-vs-quality\" style=\"font-size:24px\"><strong>Wall 2: Speed vs. Quality<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Large models (14B parameters) were painfully slow. Acceleration techniques like KV-cache, sparse attention, and quantization helped, but came with quality trade-offs. Smaller distilled models (1.3B) were fast but couldn&#8217;t match the visual fidelity of larger ones.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"wall-3-memory-constraints\" style=\"font-size:24px\"><strong>Wall 3: Memory Constraints<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Training and running 14B models required expensive multi-GPU setups with parallelism frameworks (FSDP, Megatron), making experimentation inaccessible to most researchers.<\/p>\n\n\n\n<p>Helios breaks all three walls simultaneously \u2014 that&#8217;s what makes it a genuine research breakthrough.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"how-helios-works-architecture-and-key-innovations\"><strong>How Helios Works: Architecture and Key Innovations<\/strong><\/h2>\n\n\n\n<p><\/p>\n\n\n\n<p>Helios is a 14B autoregressive diffusion model with a unified input representation that natively handles T2V, I2V, and V2V tasks through a three-stage progressive training pipeline.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"core-architecture\" style=\"font-size:24px\"><strong>Core Architecture<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Component<\/strong><\/td><td><strong>Description<\/strong><\/td><\/tr><tr><td>Model type<\/td><td>Autoregressive Diffusion Model<\/td><\/tr><tr><td>Parameters<\/td><td>14 Billion<\/td><\/tr><tr><td>Input representation<\/td><td>Unified \u2014 handles text, image, and video prompts in one 
architecture<\/td><\/tr><tr><td>Context compression<\/td><td>Heavy compression of historical and noisy context<\/td><\/tr><tr><td>Sampling steps (Base)<\/td><td>50 steps with HeliosScheduler<\/td><\/tr><tr><td>Sampling steps (Distilled)<\/td><td>3 steps with HeliosDMDScheduler<\/td><\/tr><tr><td>Frame chunk size<\/td><td>33 frames per autoregressive chunk<\/td><\/tr><tr><td>Max length<\/td><td>1452 frames (~60s at 24 FPS)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"innovation-1-easy-anti-drifting-no-heuristics-required\" style=\"font-size:24px\"><strong>Innovation 1: Easy Anti-Drifting (No Heuristics Required)<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Helios solves long-video drift by explicitly simulating drifting conditions during training itself \u2014 rather than patching it with post-hoc strategies.<\/p>\n\n\n\n<p>Instead of applying self-forcing, error-banks, or keyframe sampling at inference, Helios uses:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Unified History Injection \u2014 feeds compressed historical context directly into the model&#8217;s forward pass<\/li>\n\n\n\n<li>Easy Anti-Drifting \u2014 trains the model on deliberately degraded context so it learns to recover<\/li>\n\n\n\n<li>Multi-Term Memory Patchification \u2014 encodes long-range temporal relationships without quadratic memory cost<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>The result: Helios generates 1452-frame videos with strong temporal coherence that competing models can&#8217;t match \u2014 even with their heuristics enabled.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"innovation-2-real-time-speed-without-shortcuts\" style=\"font-size:24px\"><strong>Innovation 2: Real-Time Speed Without Shortcuts<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Helios achieves 19.5 FPS on a single H100 without KV-cache, sparse attention, causal masking, TinyVAE, quantization, or any other standard acceleration trick.<\/p>\n\n\n\n<p>Instead, 
efficiency comes from:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Pyramid Unified Predictor Corrector (PUPC) \u2014 aggressively reduces the number of noisy tokens per step, cutting compute at the architecture level rather than through approximation<\/li>\n\n\n\n<li>Context token compression \u2014 historical frames are compressed to a fraction of their original token count before being fed back into the model<\/li>\n\n\n\n<li>Reduced sampling steps \u2014 the distilled version uses only 3 steps (vs. 50 for the base), eliminating the need for classifier-free guidance (CFG) entirely<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>The computational cost ends up comparable to or lower than 1.3B models \u2014 at 14B parameter quality.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"innovation-3-training-without-parallelism-frameworks\" style=\"font-size:24px\"><strong>Innovation 3: Training Without Parallelism Frameworks<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Helios trains without FSDP, Megatron, or any tensor\/pipeline parallelism \u2014 enabling image-diffusion-scale batch sizes while fitting up to four 14B models in 80 GB of GPU memory.<\/p>\n\n\n\n<p>This is achieved through:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Infrastructure-level memory optimizations specific to autoregressive diffusion<\/li>\n\n\n\n<li>A three-stage progressive training pipeline that avoids the need for large distributed clusters<\/li>\n\n\n\n<li>Group Offloading \u2014 at inference time, reduces VRAM to approximately 6 GB<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"the-three-model-variants-which-one-should-you-use\"><strong>The Three Model Variants: Which One Should You Use?<\/strong><\/h2>\n\n\n\n<p><\/p>\n\n\n\n<p>Helios ships as three checkpoints optimised for different use cases.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Model<\/strong><\/td><td><strong>Best 
For<\/strong><\/td><td><strong>Scheduler<\/strong><\/td><td><strong>Sampling Steps<\/strong><\/td><td><strong>Quality<\/strong><\/td><\/tr><tr><td>Helios-Base<\/td><td>Highest visual quality, research<\/td><td>HeliosScheduler (v-pred + CFG)<\/td><td>50<\/td><td>\u2b50\u2b50\u2b50\u2b50\u2b50<\/td><\/tr><tr><td>Helios-Mid<\/td><td>Intermediate checkpoint<\/td><td>CFG-Zero* + HeliosScheduler<\/td><td>Between Base and Distilled<\/td><td>\u2b50\u2b50\u2b50\u2b50<\/td><\/tr><tr><td>Helios-Distilled<\/td><td>Speed and efficiency priority<\/td><td>HeliosDMDScheduler (x0-pred)<\/td><td>3<\/td><td>\u2b50\u2b50\u2b50\u2b50<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>\ud83d\udca1 All three share the same architecture. Helios-Mid is an intermediate training checkpoint and may not always meet quality expectations. For most users, Helios-Base for quality or Helios-Distilled for speed is the right choice.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"video-length-reference-frames-to-seconds\"><strong>Video Length Reference: Frames to Seconds<\/strong><\/h2>\n\n\n\n<p><\/p>\n\n\n\n<p>Helios generates in chunks of 33 frames. 
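<\/p>

<p>Since generation proceeds in fixed 33-frame chunks, a requested length is effectively rounded up to the next chunk boundary. Here is a small sketch of that arithmetic (a hypothetical helper for planning num_frames, not part of the Helios codebase):<\/p>

```python
CHUNK = 33  # Helios generates video in fixed chunks of 33 frames

def plan_frames(target_seconds: float, fps: int = 24) -> tuple[int, float]:
    """Round a requested clip length up to a whole number of
    33-frame chunks and return (num_frames, actual_seconds)."""
    requested = round(target_seconds * fps)
    chunks = -(-requested // CHUNK)  # ceiling division
    num_frames = chunks * CHUNK
    return num_frames, num_frames / fps

# A ~5 s clip at 24 FPS rounds up to 132 frames (33 x 4), i.e. 5.5 s
print(plan_frames(5, fps=24))  # (132, 5.5)
```

<p>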
Set num_frames to a multiple of 33 for optimal performance.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Target Length<\/strong><\/td><td><strong>Num Frames (Adjusted)<\/strong><\/td><td><strong>At 24 FPS<\/strong><\/td><td><strong>At 16 FPS<\/strong><\/td><\/tr><tr><td>~5 seconds<\/td><td>132 (33\u00d74)<\/td><td>5.5s<\/td><td>8s<\/td><\/tr><tr><td>~11 seconds<\/td><td>264 (33\u00d78)<\/td><td>11s<\/td><td>16s<\/td><\/tr><tr><td>~30 seconds<\/td><td>726 (33\u00d722)<\/td><td>30s<\/td><td>45s<\/td><\/tr><tr><td>~60 seconds<\/td><td>1452 (33\u00d744)<\/td><td>60s<\/td><td>90s<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"how-to-install-and-run-helios-step-by-step\"><strong>How to Install and Run Helios: Step-by-Step<\/strong><\/h2>\n\n\n\n<p><\/p>\n\n\n\n<p>Helios runs via a conda environment on CUDA 12.6, 12.8, or 13.0. With Group Offloading, it requires as little as ~6 GB of VRAM.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"requirements\" style=\"font-size:24px\"><strong>Requirements<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Python 3.11.2<\/li>\n\n\n\n<li>PyTorch 2.10.0<\/li>\n\n\n\n<li>CUDA 12.6 \/ 12.8 \/ 13.0<\/li>\n\n\n\n<li>NVIDIA GPU (H100 recommended; consumer GPUs work with Group Offloading)<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-1-clone-the-repository\" style=\"font-size:24px\"><strong>Step 1 \u2014 Clone the Repository<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>git clone --depth=1 https:\/\/github.com\/PKU-YuanGroup\/Helios.git<\/p>\n\n\n\n<p>cd Helios<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-2-create-the-conda-environment\" style=\"font-size:24px\"><strong>Step 2 \u2014 Create the Conda Environment<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>conda create -n helios python=3.11.2<\/p>\n\n\n\n<p>conda activate helios<\/p>\n\n\n\n<h3 
class=\"wp-block-heading\" id=\"step-3-install-py-torch-choose-your-cuda-version\" style=\"font-size:24px\"><strong>Step 3 \u2014 Install PyTorch (choose your CUDA version)<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p># CUDA 12.6<\/p>\n\n\n\n<p>pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 \\<\/p>\n\n\n\n<p>&nbsp;&nbsp;--index-url https:\/\/download.pytorch.org\/whl\/cu126<\/p>\n\n\n\n<p># CUDA 12.8<\/p>\n\n\n\n<p>pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 \\<\/p>\n\n\n\n<p>&nbsp;&nbsp;--index-url https:\/\/download.pytorch.org\/whl\/cu128<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-4-install-dependencies\" style=\"font-size:24px\"><strong>Step 4 \u2014 Install Dependencies<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>bash install.sh<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-5-download-model-weights\" style=\"font-size:24px\"><strong>Step 5 \u2014 Download Model Weights<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>pip install &quot;huggingface_hub[cli]&quot;<\/p>\n\n\n\n<p># Download Helios-Base (best quality)<\/p>\n\n\n\n<p>huggingface-cli download BestWishYSH\/Helios-Base --local-dir BestWishYSH\/Helios-Base<\/p>\n\n\n\n<p># Download Helios-Distilled (fastest)<\/p>\n\n\n\n<p>huggingface-cli download BestWishYSH\/Helios-Distilled --local-dir BestWishYSH\/Helios-Distilled<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-6-run-inference\" style=\"font-size:24px\"><strong>Step 6 \u2014 Run Inference<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Text-to-Video:<\/p>\n\n\n\n<p>cd scripts\/inference<\/p>\n\n\n\n<p>bash helios-base_t2v.sh &nbsp; &nbsp; &nbsp; # Base model<\/p>\n\n\n\n<p>bash helios-distilled_t2v.sh&nbsp; # Distilled model<\/p>\n\n\n\n<p>Image-to-Video:<\/p>\n\n\n\n<p>bash helios-base_i2v.sh<\/p>\n\n\n\n<p>bash helios-distilled_i2v.sh<\/p>\n\n\n\n<p>Video-to-Video:<\/p>\n\n\n\n<p>bash helios-base_v2v.sh<\/p>\n\n\n\n<p>bash helios-distilled_v2v.sh<\/p>\n\n\n\n<h3 
class=\"wp-block-heading\" id=\"step-7-low-vram-mode-group-offloading-6-gb\" style=\"font-size:24px\"><strong>Step 7 \u2014 Low-VRAM Mode (Group Offloading, ~6 GB)<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>For consumer GPUs or machines with limited memory, enable Group Offloading in your inference script:<\/p>\n\n\n\n<p># Add this parameter when loading the pipeline<\/p>\n\n\n\n<p>pipe.enable_group_offload()<\/p>\n\n\n\n<p>This drops peak VRAM from ~80 GB to approximately 6 GB, at the cost of slower inference due to CPU\u27f7GPU data movement.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-8-multi-gpu-with-context-parallelism\" style=\"font-size:24px\"><strong>Step 8 \u2014 Multi-GPU with Context Parallelism<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Helios supports Ulysses Attention, Ring Attention, Unified Attention, and Ulysses Anything Attention for multi-GPU distribution.<\/p>\n\n\n\n<p>Example for 4 GPUs:<\/p>\n\n\n\n<p>bash scripts\/inference\/helios-base_cp4gpu.sh<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"optional-use-via-diffusers-pipeline\" style=\"font-size:24px\"><strong>Optional: Use via Diffusers Pipeline<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>pip install git+https:\/\/github.com\/huggingface\/diffusers.git<\/p>\n\n\n\n<p>Then use the standard DiffusionPipeline interface \u2014 Helios has Day-0 Diffusers support.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"optional-use-via-v-llm-omni-or-sg-lang-diffusion\" style=\"font-size:24px\"><strong>Optional: Use via vLLM-Omni or SGLang-Diffusion<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Both frameworks support Helios natively for production serving at scale:<\/p>\n\n\n\n<p># vLLM-Omni<\/p>\n\n\n\n<p>pip install git+https:\/\/github.com\/vllm-project\/vllm-omni.git<\/p>\n\n\n\n<p># SGLang-Diffusion<\/p>\n\n\n\n<p>pip install git+<a href=\"https:\/\/github.com\/sgl-project\/sglang.git\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/github.com\/sgl-project\/sglang.git<\/a><\/p>\n\n\n\n<h2 
class=\"wp-block-heading\" id=\"helios-vs-competing-video-models-performance-comparison\"><strong>Helios vs. Competing Video Models: Performance Comparison<\/strong><\/h2>\n\n\n\n<p><\/p>\n\n\n\n<p>Helios outperforms all comparable models on both short- and long-video benchmarks while running at speeds previously reserved for sub-2B parameter models.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Model<\/strong><\/td><td><strong>Parameters<\/strong><\/td><td><strong>FPS (Single H100)<\/strong><\/td><td><strong>Long Video Quality<\/strong><\/td><td><strong>Anti-Drift Strategy<\/strong><\/td><\/tr><tr><td>Helios<\/td><td>14B<\/td><td>19.5<\/td><td>\u2b50\u2b50\u2b50\u2b50\u2b50<\/td><td>Architecture-level<\/td><\/tr><tr><td>SkyReels V2 DF 14B<\/td><td>14B<\/td><td>Slower<\/td><td>\u2b50\u2b50\u2b50\u2b50<\/td><td>Standard heuristics<\/td><\/tr><tr><td>SkyReels V2 DF 1.3B<\/td><td>1.3B<\/td><td>~19 FPS<\/td><td>\u2b50\u2b50\u2b50<\/td><td>Standard heuristics<\/td><\/tr><tr><td>CausVid<\/td><td>\u2014<\/td><td>Moderate<\/td><td>\u2b50\u2b50\u2b50<\/td><td>Self-forcing<\/td><\/tr><tr><td>Self Forcing<\/td><td>\u2014<\/td><td>Moderate<\/td><td>\u2b50\u2b50\u2b50<\/td><td>Self-forcing<\/td><\/tr><tr><td>MAGI-1<\/td><td>\u2014<\/td><td>Slow<\/td><td>\u2b50\u2b50\u2b50\u2b50<\/td><td>Keyframe sampling<\/td><\/tr><tr><td>Pyramid Flow<\/td><td>\u2014<\/td><td>Slow<\/td><td>\u2b50\u2b50\u2b50<\/td><td>Progressive pyramid<\/td><\/tr><tr><td>InfinityStar<\/td><td>\u2014<\/td><td>Slow<\/td><td>\u2b50\u2b50\u2b50<\/td><td>Error banks<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>The benchmark data is reproducible via HeliosBench \u2014 a specialized evaluation framework included in the repository for assessing real-time long-video generation models.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"helios-training-pipeline-how-its-built\"><strong>Helios Training Pipeline: How It&#8217;s 
Built<\/strong><\/h2>\n\n\n\n<p><\/p>\n\n\n\n<p>Helios uses a three-stage progressive training pipeline that converts a bidirectional pretrained model into a fully autoregressive, distilled video generator.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"stage-1-base-architectural-adaptation\" style=\"font-size:24px\"><strong>Stage 1 \u2014 Base (Architectural Adaptation)<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Apply Unified History Injection to enable autoregressive context conditioning<\/li>\n\n\n\n<li>Apply Easy Anti-Drifting to simulate and correct drift at training time<\/li>\n\n\n\n<li>Apply Multi-Term Memory Patchification for efficient long-range temporal modeling<\/li>\n\n\n\n<li>Output: Helios-Base<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"stage-2-mid-token-compression\" style=\"font-size:24px\"><strong>Stage 2 \u2014 Mid (Token Compression)<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Introduce Pyramid Unified Predictor Corrector (PUPC) to aggressively reduce noisy token count<\/li>\n\n\n\n<li>Reduce computation while preserving output quality<\/li>\n\n\n\n<li>Apply CFG-Zero* for improved guidance efficiency<\/li>\n\n\n\n<li>Output: Helios-Mid<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"stage-3-distilled-sampling-step-reduction\" style=\"font-size:24px\"><strong>Stage 3 \u2014 Distilled (Sampling Step Reduction)<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Apply Adversarial Hierarchical Distillation to reduce sampling from 50 steps to 3<\/li>\n\n\n\n<li>Eliminate classifier-free guidance (CFG) entirely<\/li>\n\n\n\n<li>Apply dynamic shifting across all timestep-dependent operations<\/li>\n\n\n\n<li>Output: Helios-Distilled<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"ecosystem-and-day-0-integrations\"><strong>Ecosystem and Day-0 
Integrations<\/strong><\/h2>\n\n\n\n<p><\/p>\n\n\n\n<p>Helios launched with immediate support across all major AI inference and training frameworks.<\/p>\n\n\n\n<p>On March 4, 2026 (Day 0), the following integrations went live:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\u2705 Diffusers (Hugging Face) \u2014 full pipeline support<\/li>\n\n\n\n<li>\u2705 SGLang-Diffusion \u2014 end-to-end unified pipeline with optimized kernels<\/li>\n\n\n\n<li>\u2705 vLLM-Omni \u2014 fully disaggregated serving for production deployment<\/li>\n\n\n\n<li>\u2705 Ascend-NPU (Huawei) \u2014 runs at ~10 FPS on Ascend hardware<\/li>\n\n\n\n<li>\u2705 Cache-DiT \u2014 hybrid cache acceleration and parallelism (added March 6, 2026)<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>Additional updates post-launch:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>March 8, 2026: Full Group Offloading (~6 GB VRAM) and Context Parallelism support released<\/li>\n\n\n\n<li>March 6, 2026: Official Gradio Demo released on Hugging Face Spaces<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"common-issues-and-fixes\"><strong>Common Issues and Fixes<\/strong><\/h2>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"first-chunk-appears-static-in-i-2-v-mode\" style=\"font-size:24px\"><strong>First Chunk Appears Static in I2V Mode<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Cause: Image-to-video training is based on text-to-video conditioning, so the first chunk sometimes remains too close to the input frame.<\/p>\n\n\n\n<p>Fix: Enable is_skip_first_chunk=True in the inference config, or increase image_noise_sigma_min and image_noise_sigma_max to inject more diversity into the first generation unit.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"out-of-memory-oom-errors\" style=\"font-size:24px\"><strong>Out of Memory (OOM) Errors<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Cause: Default inference uses full GPU memory allocation.<\/p>\n\n\n\n<p>Fix: Enable Group Offloading via 
pipe.enable_group_offload(). This reduces peak VRAM to ~6 GB at the cost of inference speed.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"videos-have-repetitive-motion\" style=\"font-size:24px\"><strong>Videos Have Repetitive Motion<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Cause: The model&#8217;s anti-drifting strategy suppresses error accumulation, but prompt quality affects motion variety significantly.<\/p>\n\n\n\n<p>Fix: Use more specific, dynamic motion descriptions in your text prompt. Avoid vague prompts like &#8220;a person walking&#8221; \u2014 instead use &#8220;a person walking briskly through a crowded Tokyo street at rush hour, camera tracking from the side.&#8221;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"non-multiple-of-33-frame-count\" style=\"font-size:24px\"><strong>Non-Multiple-of-33 Frame Count<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Cause: Helios generates in chunks of exactly 33 frames.<\/p>\n\n\n\n<p>Fix: Always set num_frames to a multiple of 33. Non-multiple values are automatically rounded up, but this wastes compute. Use 132, 264, 726, or 1452 for clean output.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"bonus-supercharge-helios-outputs-with-gaga-ai\"><strong>Bonus: Supercharge Helios Outputs with Gaga AI<\/strong><\/h2>\n\n\n\n<p><\/p>\n\n\n\n<p>Helios generates the video. 
<a href=\"https:\/\/gaga.art\/en\/\">Gaga AI<\/a> adds the voice, the face, the audio, and the narrative intelligence \u2014 turning a raw video generation into a complete production.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"623\" src=\"https:\/\/gaga.art\/blog\/wp-content\/uploads\/2026\/02\/gaga-ai-video-generation-1024x623.webp\" alt=\"gaga ai video generation\" class=\"wp-image-1426\" srcset=\"https:\/\/gaga.art\/blog\/wp-content\/uploads\/2026\/02\/gaga-ai-video-generation-1024x623.webp 1024w, https:\/\/gaga.art\/blog\/wp-content\/uploads\/2026\/02\/gaga-ai-video-generation-300x183.webp 300w, https:\/\/gaga.art\/blog\/wp-content\/uploads\/2026\/02\/gaga-ai-video-generation-768x467.webp 768w, https:\/\/gaga.art\/blog\/wp-content\/uploads\/2026\/02\/gaga-ai-video-generation-1536x935.webp 1536w, https:\/\/gaga.art\/blog\/wp-content\/uploads\/2026\/02\/gaga-ai-video-generation-2048x1246.webp 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Here&#8217;s how the two tools pair:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"image-to-video-ai\" style=\"font-size:24px\"><strong>Image-to-Video AI<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p><a href=\"https:\/\/gaga.art\/en\/image-to-video-ai\">Gaga AI&#8217;s image-to-video engine<\/a> animates a still image into a motion video clip using a text-directed motion prompt \u2014 complementing Helios&#8217;s I2V mode with a simpler, browser-based interface that requires no GPU setup.<\/p>\n\n\n\n<p>Use together: Generate a base scene with Helios at cinematic quality, then use Gaga AI&#8217;s image-to-video to extend specific frames into focused close-up sequences or alternate camera angles.<\/p>\n\n\n\n<p>Best for: Product demos, portrait animation, social content at scale.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-buttons is-content-justification-center is-layout-flex 
wp-container-core-buttons-is-layout-a89b3969 wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link wp-element-button\" href=\"http:\/\/gaga.art\/app\" target=\"_blank\" rel=\"noreferrer noopener\">Generate Video Free<\/a><\/div>\n\n\n\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/gaga.art\/\">Learn Gaga AI<\/a><\/div>\n<\/div>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"video-and-audio-infusion\" style=\"font-size:24px\"><strong>Video and Audio Infusion<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Gaga AI analyzes the visual content of a video and generates synchronized audio \u2014 ambient sound, music, environmental effects \u2014 matched to what the camera sees.<\/p>\n\n\n\n<p>Helios outputs visually rich but silent video. Gaga AI&#8217;s audio infusion layer reads the scene and generates a matching soundscape automatically.<\/p>\n\n\n\n<p>Best for: Nature scenes, cinematic shorts, branded content, e-commerce product videos.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"GAGA-1: The Holistic AI Actor. 
Voice, Lipsync, and Performance as One | Gaga AI\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/bpblbmmkc78?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"ai-avatar\" style=\"font-size:24px\"><strong>AI Avatar<\/strong><\/h3>\n\n\n\n<p>Gaga AI generates a photorealistic, lip-synced talking avatar from a reference photo and any audio input \u2014 giving your Helios-generated environments a human presenter without filming anyone.<\/p>\n\n\n\n<p>Pipeline:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Generate a visually rich background scene with Helios (T2V or I2V)<\/li>\n\n\n\n<li>Create a Gaga AI avatar to present in front of that background<\/li>\n\n\n\n<li>Composite using Gaga AI&#8217;s output transparency layer<\/li>\n<\/ol>\n\n\n\n<p>Best for: Training videos, multilingual content, AI spokesperson series.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"ai-voice-clone-tts\" style=\"font-size:24px\"><strong>AI Voice Clone + TTS<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Gaga AI clones a speaker&#8217;s voice from a 30-second audio sample and converts any text to speech in that voice \u2014 enabling scalable narration in multiple languages without re-recording.<\/p>\n\n\n\n<p>Workflow:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Record 30\u201360 seconds of the target voice<\/li>\n\n\n\n<li>Clone it inside Gaga AI<\/li>\n\n\n\n<li>Generate narration from a text script<\/li>\n\n\n\n<li>Layer narration over your Helios video<\/li>\n<\/ol>\n\n\n\n<p>Best for: Multilingual campaigns, consistent brand voice, AI-driven content pipelines.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"the-complete-pipeline-helios-gaga-ai\" style=\"font-size:24px\"><strong>The Complete Pipeline: Helios + Gaga AI<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Text Prompt \/ Reference 
Image<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u2193<\/p>\n\n\n\n<p>Helios \u2500\u2500 Real-time video (T2V \/ I2V \/ V2V) at 19.5 FPS<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u2193<\/p>\n\n\n\n<p>Gaga AI Audio Infusion \u2500\u2500 Synchronized ambient sound + music<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u2193<\/p>\n\n\n\n<p>Gaga AI Avatar \u2500\u2500 On-screen presenter in front of Helios scene<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u2193<\/p>\n\n\n\n<p>Gaga AI Voice Clone + TTS \u2500\u2500 Multilingual narration in cloned voice<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u2193<\/p>\n\n\n\n<p>Final Video \u2500\u2500 Studio-quality, fully produced, ready to publish<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"frequently-asked-questions\"><strong>Frequently Asked Questions<\/strong><\/h2>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"what-is-helios-ai\" style=\"font-size:24px\"><strong>What is Helios AI?<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Helios is a 14-billion-parameter autoregressive diffusion model for real-time long video generation, jointly developed by Peking University and ByteDance. It runs at 19.5 FPS on a single NVIDIA H100 GPU and supports text-to-video, image-to-video, and video-to-video tasks with videos up to 60 seconds long. The paper was published on arXiv as 2603.04379 on March 4, 2026.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"how-fast-is-helios-compared-to-other-video-models\" style=\"font-size:24px\"><strong>How fast is Helios compared to other video models?<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Helios runs at 19.5 FPS end-to-end on a single H100 GPU \u2014 matching the speed of 1.3B distilled models while operating at 14B parameter scale. 
It achieves this without KV-cache, quantization, sparse attention, or any other standard acceleration technique.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"is-helios-open-source\" style=\"font-size:24px\"><strong>Is Helios open source?<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Yes. Helios is fully open source under the Apache 2.0 license. The code, three model checkpoints (Base, Mid, Distilled), training scripts, and HeliosBench evaluation framework are all available on GitHub at PKU-YuanGroup\/Helios.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"what-gpu-do-i-need-to-run-helios\" style=\"font-size:24px\"><strong>What GPU do I need to run Helios?<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Helios was tested on NVIDIA H100 GPUs. With Group Offloading enabled, it can run on consumer GPUs with as little as ~6 GB of VRAM. Without Group Offloading, a GPU with 80 GB of memory (H100\/A100-class) is required for the full model.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"what-is-the-difference-between-helios-base-helios-mid-and-helios-distilled\" style=\"font-size:24px\"><strong>What is the difference between Helios-Base, Helios-Mid, and Helios-Distilled?<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Helios-Base uses 50 sampling steps and produces the highest visual quality. Helios-Distilled uses only 3 sampling steps (via adversarial hierarchical distillation) for maximum speed, with only a modest quality trade-off. Helios-Mid is an intermediate training checkpoint that falls between the two in both speed and quality.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"how-does-helios-solve-the-long-video-drift-problem\" style=\"font-size:24px\"><strong>How does Helios solve the long-video drift problem?<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Helios solves drift at the architecture and training level rather than through inference-time heuristics. 
It uses Unified History Injection (feeding compressed historical context into the model), Easy Anti-Drifting (training on deliberately degraded context to teach recovery), and Multi-Term Memory Patchification (efficient long-range temporal encoding). This eliminates the need for self-forcing, error-banks, or keyframe sampling.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"what-tasks-does-helios-support\" style=\"font-size:24px\"><strong>What tasks does Helios support?<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Helios natively supports three video generation tasks through a unified input representation: Text-to-Video (T2V), Image-to-Video (I2V), and Video-to-Video (V2V). An experimental Interactive Video mode is also available in the repository.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"how-long-can-videos-be-in-helios\" style=\"font-size:24px\"><strong>How long can videos be in Helios?<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Helios can generate videos up to 1452 frames, which equals approximately 60 seconds at 24 FPS or 90 seconds at 16 FPS. Videos are generated in chunks of 33 frames, so the total frame count should be a multiple of 33.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"can-i-train-my-own-version-of-helios\" style=\"font-size:24px\"><strong>Can I train my own version of Helios?<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Yes. The full three-stage training pipeline (Base \u2192 Mid \u2192 Distilled) is included in the repository along with configuration files, data preparation guides, DDP training scripts, and DeepSpeed training scripts. 
A toy training dataset is also provided for testing the pipeline.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"who-built-helios\" style=\"font-size:24px\"><strong>Who built Helios?<\/strong><\/h3>\n\n\n\n<p><\/p>\n\n\n\n<p>Helios was built by Shenghai Yuan (Peking University \/ ByteDance), Yuanyang Yin (ByteDance \/ Chengdu Anu Intelligence), Zongjian Li (Peking University), Xinwei Huang (ByteDance), Xiao Yang (Canva), and Li Yuan (Peking University). The project leads are Li Yuan and Xiao Yang.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Helios is the first 14B AI video model running at 19.5 FPS on one H100 \u2014 generating minute-long videos in real time. Free to use. Full guide inside.<\/p>\n","protected":false},"author":2,"featured_media":1914,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3,10],"tags":[],"class_list":["post-1913","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-p-r","category-video"],"_links":{"self":[{"href":"https:\/\/gaga.art\/blog\/wp-json\/wp\/v2\/posts\/1913","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gaga.art\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gaga.art\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gaga.art\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/gaga.art\/blog\/wp-json\/wp\/v2\/comments?post=1913"}],"version-history":[{"count":2,"href":"https:\/\/gaga.art\/blog\/wp-json\/wp\/v2\/posts\/1913\/revisions"}],"predecessor-version":[{"id":1917,"href":"https:\/\/gaga.art\/blog\/wp-json\/wp\/v2\/posts\/1913\/revisions\/1917"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/gaga.art\/blog\/wp-json\/wp\/v2\/media\/1914"}],"wp:attachment":[{"href":"https:\/\/gaga.art\/blog\/wp-json\/wp\/v2\/media?parent=1913"}],"wp:term":[{"taxonomy":"category","emb
eddable":true,"href":"https:\/\/gaga.art\/blog\/wp-json\/wp\/v2\/categories?post=1913"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gaga.art\/blog\/wp-json\/wp\/v2\/tags?post=1913"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}