Breaking barriers in AI video generation, Meituan releases a powerful 13.6B-parameter model under the MIT license

The AI video generation landscape just got more exciting. Meituan, the Chinese tech giant best known for its food delivery services, has surprised the AI community by open-sourcing LongCat-Video—a foundational video generation model that’s challenging commercial solutions with its impressive capabilities and remarkably open licensing.
What Makes LongCat-Video Stand Out?
LongCat-Video isn’t just another video generation model. With 13.6 billion parameters, it delivers performance that rivals leading commercial solutions while being completely free and open for commercial use. Here’s what sets it apart:
1. Exceptional Long-Form Video Generation
The standout feature is right in the name: LongCat-Video can generate videos up to 4 minutes long. Unlike many AI video models that struggle with consistency over longer durations, LongCat-Video maintains quality without color drift or degradation, a common failure mode in extended video generation.
The model achieves this through native pretraining on Video-Continuation tasks, learning to extend footage much as a series adds episodes. Users can generate one segment, then continue with additional prompts to build longer narratives.
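To make the episode-by-episode idea concrete, here is a minimal sketch of the driving loop an application might run. The pipeline object, its method names, and its parameters are hypothetical placeholders for illustration, not the repository's actual API:

```python
from typing import List

# Conceptual sketch of segment-wise long-video generation. `pipeline`, its
# methods, and all parameters are hypothetical placeholders, not the real API.
def generate_long_video(pipeline, prompts: List[str], seconds_per_segment: int = 15):
    # The first segment is an ordinary text-to-video call.
    video = pipeline.text_to_video(prompts[0], duration=seconds_per_segment)
    # Each later prompt extends the clip, conditioned on the frames already
    # generated; that conditioning is what keeps color and identity stable.
    for prompt in prompts[1:]:
        segment = pipeline.continue_video(
            video=video,                    # condition on everything so far
            prompt=prompt,                  # describe what happens next
            duration=seconds_per_segment,
        )
        video = video.concat(segment)
    return video
```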
2. Unified Multi-Task Architecture
Rather than requiring separate models for different tasks, LongCat-Video handles three critical functions within a single framework:
- Text-to-Video (T2V): Generate videos from text descriptions
- Image-to-Video (I2V): Animate static images
- Video-Continuation: Extend existing video clips
This unified approach means developers only need to deploy one model to access all these capabilities, significantly simplifying workflows.
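Conceptually, the unification boils down to a single sampler whose behavior is selected by what you condition on. The sketch below is an assumption about the interface shape, not LongCat-Video's actual code:

```python
# One model, three tasks, distinguished only by conditioning. All names here
# (`model.sample`, `condition_frames`) are illustrative assumptions.
def generate(model, prompt, image=None, video=None):
    if video is not None:
        # Video-Continuation: condition on the existing clip's frames.
        return model.sample(prompt=prompt, condition_frames=video.frames)
    if image is not None:
        # Image-to-Video: the image acts as a single conditioning frame.
        return model.sample(prompt=prompt, condition_frames=[image])
    # Text-to-Video: no visual conditioning at all.
    return model.sample(prompt=prompt, condition_frames=[])
```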
3. Competitive Performance
According to Meituan’s comprehensive evaluations, LongCat-Video holds its own against top-tier models:
- Comparable to open-source leaders like Wan 2.2-T2V-A14B
- Matches commercial solutions including PixVerse-V5
- Even competes with Google’s Veo3 on certain metrics
In text-to-video evaluations, LongCat-Video achieved an overall quality score of 3.38 (out of 5) and a visual quality score of 3.25. For image-to-video tasks, it scored 3.27 in visual quality, demonstrating consistent performance across different generation modes.
Technical Innovation: What’s Under the Hood?
Efficient Inference Strategy
LongCat-Video generates 720p videos at 30fps within minutes using a clever coarse-to-fine generation strategy. The model works along both temporal and spatial axes, building videos progressively rather than attempting to generate everything at once.
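As a rough illustration of what coarse-to-fine means in practice, a staged sampler might look like the following; the stage boundaries, resolutions, and method names are all assumptions made for illustration, not the model's actual implementation:

```python
# Hedged sketch of a coarse-to-fine schedule; stages, resolutions, and method
# names are illustrative assumptions, not LongCat-Video's implementation.
def coarse_to_fine(model, prompt):
    # Stage 1: cheap draft at low resolution and low frame rate.
    draft = model.sample(prompt, resolution=(480, 270), fps=15)
    # Stage 2: temporal refinement up to the target frame rate.
    smooth = model.refine_temporal(draft, target_fps=30)
    # Stage 3: spatial refinement, super-resolving frames to 720p.
    return model.refine_spatial(smooth, target_resolution=(1280, 720))
```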
Block Sparse Attention further enhances efficiency, particularly crucial when working with high-resolution outputs. This means users can generate quality videos without requiring prohibitively expensive hardware.
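For intuition, here is a generic block sparse pattern in PyTorch. This shows the general technique, not LongCat-Video's specific kernel, which would pair a mask like this with optimized sparse attention code:

```python
import torch

# Generic block sparse attention sketch (not LongCat-Video's exact kernel):
# each token attends only to tokens whose block index is close to its own,
# replacing the full quadratic mask with a banded, block-structured one.
def block_sparse_mask(num_tokens: int, block: int, keep_blocks: int) -> torch.Tensor:
    block_ids = torch.arange(num_tokens) // block            # block index per token
    dist = (block_ids[:, None] - block_ids[None, :]).abs()   # block distance matrix
    return dist <= keep_blocks                               # True = allowed to attend

mask = block_sparse_mask(num_tokens=1024, block=64, keep_blocks=1)
# Boolean masks like this can be passed as `attn_mask` to
# torch.nn.functional.scaled_dot_product_attention (True = attend).
```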
Multi-Reward RLHF
The model leverages Group Relative Policy Optimization (GRPO) with multiple reward signals during training. This multi-reward approach helps the model balance various quality factors—text alignment, motion quality, visual fidelity—resulting in more balanced and realistic outputs.
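A minimal sketch of the group-relative idea with a weighted mix of rewards looks like this; the reward names and weights are illustrative assumptions, not Meituan's published training configuration:

```python
import torch

# Minimal sketch of group-relative advantages over a weighted reward mix.
# Reward names and weights are illustrative assumptions, not Meituan's values.
def group_relative_advantages(rewards, weights):
    # Each entry holds one reward per rollout in a group sampled from the
    # same prompt, shape [group_size].
    total = sum(w * rewards[name] for name, w in weights.items())
    # GRPO baselines each rollout against its own group, so no separate
    # value network is needed.
    return (total - total.mean()) / (total.std() + 1e-6)

rewards = {
    "text_alignment": torch.tensor([0.8, 0.5, 0.9, 0.4]),
    "motion_quality": torch.tensor([0.6, 0.7, 0.5, 0.8]),
    "visual_fidelity": torch.tensor([0.7, 0.6, 0.8, 0.5]),
}
weights = {"text_alignment": 0.4, "motion_quality": 0.3, "visual_fidelity": 0.3}
advantages = group_relative_advantages(rewards, weights)
```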
Architecture Efficiency
With 13.6B parameters (all activated), LongCat-Video achieves results comparable to mixture-of-experts (MoE) models with 28B total parameters. This dense architecture offers better efficiency and simpler deployment while maintaining competitive performance.
Real-World Applications
The examples Meituan shared demonstrate practical versatility:
- Creative content: A woman in a white dress performing ballet on a lake surface, with realistic reflections and natural movement
- Action sequences: Skateboard tricks captured mid-air with smooth, believable physics
- E-commerce: Product demonstrations perfect for the approaching shopping season
- Narrative sequences: A 22-second clip showing a woman in a bathroom, from adjusting the mirror to washing and drying her hands—all generated through sequential prompts
Getting Started with LongCat-Video
Licensing: Truly Open Source
Perhaps the most significant aspect is Meituan’s choice of the MIT License. This means:
- Free for personal and commercial use
- No restrictions on creating commercial products
- Minimal legal overhead
- Maximum flexibility for developers and businesses
This is unusually generous for such a capable model, especially from a major commercial company.
Technical Requirements
The model is available on Hugging Face and GitHub, with comprehensive documentation for deployment. Key requirements include:
- Python 3.10
- PyTorch 2.6.0+
- FlashAttention-2 (or FlashAttention-3/xformers as alternatives)
- CUDA-compatible GPU(s)
The model supports both single-GPU and multi-GPU inference, with context parallelization available for distributed generation of longer videos.
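A quick preflight check against those requirements can save a failed deployment; this snippet uses only standard PyTorch calls and mirrors the list above:

```python
import torch

# Preflight check mirroring the requirements above: PyTorch 2.6.0+ and at
# least one CUDA-capable GPU.
major, minor = (int(x) for x in torch.__version__.split("+")[0].split(".")[:2])
assert (major, minor) >= (2, 6), f"PyTorch 2.6.0+ required, found {torch.__version__}"
assert torch.cuda.is_available(), "A CUDA-compatible GPU is required"
print(f"GPUs visible: {torch.cuda.device_count()}")  # >1 enables multi-GPU inference

try:
    import flash_attn                                 # FlashAttention-2
    print("flash_attn", flash_attn.__version__)
except ImportError:
    print("flash_attn not found; FlashAttention-3 or xformers are the listed fallbacks")
```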
Deployment Options
Meituan provides multiple inference scripts (an illustrative launch example follows the list):
- Text-to-video generation
- Image-to-video generation
- Video continuation
- Long-video generation
- Interactive video generation
- Streamlit web interface for easier experimentation
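A launch might look something like the sketch below; the script name and flags are hypothetical placeholders, so consult the GitHub README for the actual entry points:

```python
import subprocess

# Hypothetical launch sketch: the script name and flags are illustrative
# placeholders, not the repository's actual CLI; check the GitHub README
# for the real entry points. torchrun itself ships with PyTorch.
subprocess.run([
    "torchrun", "--nproc_per_node=2",             # context parallelism over 2 GPUs
    "run_demo_text_to_video.py",                  # hypothetical script name
    "--checkpoint_dir", "weights/LongCat-Video",  # hypothetical flag
    "--prompt", "A skateboarder landing a kickflip in slow motion",
], check=True)
```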
Currently, there’s no official online demo, but the technical barrier to local deployment is relatively low for those with appropriate hardware.
Community Reception and Extensions
The AI community has already begun building on LongCat-Video. CacheDiT, for instance, has implemented cache acceleration support using DBCache and TaylorSeer, achieving nearly 1.7x speedup without noticeable quality loss.
This rapid community adoption suggests strong developer interest and hints at an ecosystem of tools and improvements emerging around the model.
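To give a flavor of what cache acceleration does inside a diffusion loop, here is a heavily simplified sketch in the spirit of DBCache; the threshold and structure are assumptions, not CacheDiT's actual implementation:

```python
# Simplified residual-style cache in the spirit of DBCache: reuse a block's
# output from the previous denoising step when its input has barely changed.
# The threshold and structure are illustrative assumptions, not CacheDiT code.
def cached_forward(block, x, cache, key, tol=5e-3):
    prev_in, prev_out = cache.get(key, (None, None))
    if prev_in is not None and (x - prev_in).abs().mean() < tol:
        return prev_out                       # cache hit: skip recomputation
    out = block(x)                            # cache miss: run the block
    cache[key] = (x.detach(), out.detach())   # remember input/output for next step
    return out
```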
Limitations and Considerations
While impressive, LongCat-Video isn’t without limitations:
- Hardware requirements: Quality video generation still requires substantial computational resources
- Not universally evaluated: The model hasn’t been tested for every possible application
- Performance variations: Like all large models, results may vary across different types of content
- No online demo yet: Users must deploy locally to experiment
Meituan notes that developers should carefully assess accuracy, safety, and fairness before deploying in sensitive scenarios, and comply with all applicable laws regarding data protection and content safety.
Bonus: Want Instant Video Generation? Try Gaga AI
While LongCat-Video offers impressive capabilities for developers and researchers, not everyone has the technical expertise or hardware to deploy open-source models locally. If you’re looking for an immediate, hassle-free video generation solution, Gaga AI Video Generator provides the perfect alternative.

Why Choose Gaga AI?
No Setup Required – Access powerful AI video generation directly through your browser. No installations, no GPU requirements, no configuration headaches.
Instant Results – Generate high-quality videos in minutes without needing to understand Python environments or CUDA versions.
User-Friendly Interface – Simply enter your text prompt or upload an image, and let Gaga AI handle the technical complexity behind the scenes.
Perfect for Non-Technical Users – Content creators, marketers, social media managers, and businesses can create professional videos without coding knowledge.
Commercial-Ready – Create videos for your business, marketing campaigns, or social media content immediately.
When to Use Each Platform
Choose LongCat-Video if you:
- Have technical expertise and GPU infrastructure
- Need complete control over model parameters
- Want to build video generation into your applications
- Require customization and fine-tuning capabilities
- Are conducting AI research or development
Choose Gaga AI if you:
- Need quick results without technical setup
- Don’t have access to powerful GPU hardware
- Prefer a web-based, intuitive interface
- Want to focus on creative content rather than technical deployment
- Need reliable, consistent video generation for business use
Get Started with Gaga AI
Ready to create stunning AI-generated videos in minutes? Visit Gaga AI Video Generator and start transforming your ideas into captivating visual content today—no technical skills required!
Final Words: A Significant Open-Source Contribution
LongCat-Video represents a substantial addition to the open-source AI video generation landscape. Its combination of long-form capability, multi-task architecture, competitive performance, and genuinely open licensing makes it an attractive option for developers, researchers, and businesses looking to integrate video generation into their products.
For those building AI applications, creating content at scale, or simply exploring the cutting edge of generative AI, LongCat-Video offers a powerful new tool without the typical restrictions of commercial models.
The model’s release raises the bar for what the open-source community can expect from video generation AI, potentially accelerating innovation across the entire field. As community tools and optimizations continue to emerge, LongCat-Video’s accessibility and capabilities position it as a foundation for the next wave of AI-powered video applications.
Resources:
- Project Homepage: LongCat-Video
- Model Download: Hugging Face
- GitHub Repository: LongCat-Video
Have you experimented with LongCat-Video? Share your experiences and projects in the comments below!