# VideoRAG **Repository Path**: soon14/VideoRAG ## Basic Information - **Project Name**: VideoRAG - **Description**: https://github.com/HKUDS/VideoRAG - **Primary Language**: Python - **License**: Not specified - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2026-01-11 - **Last Updated**: 2026-01-11 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README
Vimo: Chat with Your Videos

VideoRAG: Chat with Your Videos โ€ข Vimo Desktop

HKUDS%2FVideoRAG | Trendshift [![Blog](https://img.shields.io/badge/Blog-LearnOpenCV-blue?style=flat&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAJYAAACWCAMAAAAL34HQAAAAilBMVEVHcEwuLi4qKio3NzdQUFBjY2NoaGhiYmJ8fHx4eHiGhoabm5t1dXW2tranp6eOjo7Pz8/////9/f35+fn29vbz8/Pv7+/t7e3q6urn5+fj4+Pg4OA4svIyrfAsp+0noesinekemOcalOUchMVjZGQXaZxOT08OT3oMPFwvMTIONEoKIS8OFx0DAwPBWB/1AAAAEXRSTlMACxw5ZXqQmqG1vdfv8ff7/XwvPHUAAAnaSURBVHja3ZyJkqI6GIUbXHpcwHSUbtuMdtuiLcu8/+vdCIRDjElYxLLuz701W03NV+ccAmT5XzqV4w6Go/FkPl/4RS3m88l4NBy4zsuDC0Sv05nnEcLYhvH/ebHsp4R43mz6+ng2ZzD8M+NAOcaGI6Hy38ngZn+Gg4eRuYPR1LsQVWk+iwtVoE1HA/cROo1nORPLcW4WBwSaNxv3rJk7nGRMGwlpfV2CrSDbEG8ydPsTajQjFZXWJc/H+gMFuAteqdls1I9k3D1Sca4AQr3zyn4EXa6bAOvBS2fw6mdQYCpheAWVKyuwZaIVYP7rfcHcUqkKkwC6UQWeICsl44q595NqOCUQCkwFwuq6gJaTQbINBxs6dwrVxMuUUphAtFwti1ot+a8A914hK7z0JoN7+Dea5VLlUIJpFeQ8uip0WwkyDgbBRm53qWSokglItKg3KqpkA5kM1lEwZzgjhX+AqjDREgklACtkAMuc7Jowd+wJqQAlmMBzq3I2QQYwIZjX/pYcTAkT/mVQEMqIBDReQrIMDE4yMm1pZGkgoCDUW80CGcAuXGzDjWwVK5+xMlWZfRDKjqOC5VZCMMb8odOcymOsTBWkAlMzslIwJIwxrymXM/JgoALVFQxcI6cR1VihAlQXMImLlzd2GlMVBopUAaotmJQwnlpwNXDQKFV3wYRe8NGadpUKUF3qysiMq27uVSpIdRfBwFXcj7Wo/AqVRqqugkl6Mb8G12CWj1f9UMFI5Iux2cBG5U4J02jVHxeZurahgfSmFbjeJK5PzjV2LHHXUPWslzc0B0s8BzlVAKoe9BK5z8d7Y7zc7A25ORWly+DyD/AnQu2/UXCJYYJM9PEaIVh1qehqTfxFGJ6irE5huPDJekXr6gWukcVCBItSGxPxwyhN0yQ+FxUn/JdR6BMbGaWIl9FGBxbWolqyxSlKk/Pv8XD4+fnZ/+z3/IfD4fh7TtLotGDLmlzCRsdwF66FhRaqFZlzJo70c6M4Giebk5WZCzbq70Z3imBZqQISRskZTLfIzkkUksDOVdo4dTUDqWShwT4ScqHApCPjkoXEYOW1jWNHm3d7sOjHIkoBZQZLo8UHtcZLm3rnNRfLbiElpzQGlKWOcXoi1G5jLtercy2WX9PCwI/gX536TSI/qGmjP1CTVSvv6/lFqv1PgzrE6XxdJ/VIF5IliaWjoixMz6pUe7kUrnMaMqrjglxqukYQy2QhO6W/GiQ9GNf2Nz0xk42QaySNWRDLYCHlVMfbUN9l3SY7ci6qt1HIxdjMrQ7wuA0NYilUQEIBTeEyyIWbcVh9GmLM0ou1DmWqK6av7JLIZK5wbZQrd5FNHATew5ilFSuYy7kC1NdVAUwaKNJ5oJULY5eH0IvRwXQbUj89a6DAZQQ7pz413YxijFACbxCLkig+qFSCYycKZArXIY4INcglxghX8VAv1scp0VJdaPiV/8eropjMlZzWerngIgYtW+CXi2qwAAWdpJIEq8ZrsbSEHkOXM7V7KCxUqQTXtrgA9q1ycRvtLk6duh4GYSpTyVJtqyUpJrgwSgR1XRzKHt7MOyxUqS4ogKqgcSrBBRs9nVxwcZh5+Mfq4Sqs5D2nunBdKfWXX5cSZn5Br0rqw5XVxT8OhgeThyRSxZKp/qK2kl6KXBGxujhzRbSMT+nlXIgFKkBlTIASYLutFC/INV/q70WEyx4tyiKM79AKVGqVgsFGyMVqheuVWKJFF+nxplgyFMfYKlyqjcd0QW/KhXCRV4xa+mitTvBQthBUh2Oc8IqPh62GCy6eVtpwYeRC4hEtNfAQq7QQWu2OcfqvqDQ+7gQXbIRcCL0hXDM3T7wxWj48hFg80YLqEP+TKj6AC3LBxZsvEteZH3oi8StgqYOWilVSJf+uKgFXRS5p6DJn3hviOa1L/Do6yx5CLFDJlR5kuWQXz9H6Ta1sQMXTekwsiceDRxYLVGol4FJd/E2JJfNk/DJhFiwpWhArp9oiV1LFWe4hVwVrz8NlwWKTl7nu0YNR66B6KCw8/tPUUWCp9+IBI5fu8TN/mW/M4wMNEwULHsY6rHgLFwUWMn8Ta1libeYvCwnrRhTDWI1WmfcUIJrU3whXHC7NI8Rm8eIzM1ZwOktYiod2F3fXWKfgJhZM9DmWeTR9j4Bl9xAVq5nHCPGuVas9FkzcJnqspBqub+lWPEcfdixupWmQX19hfVdM3JmwdsiWota6o1oq1ldzLFUtYKEUtYxYH7fU2tYx0aDWR2e1Ak227JHXZ8se+f4HCHXcOp8CG9YzDacYtxZ9P3x2moePeZSf9/uohloNH9WtXmyAtbO+2Kg3Yq0XmzFhbV4Dd+1fA481XgNbvzRvDVyJ8aU51gzy0kszPjGCOp8Y+KQ2f2KoHlpmR+RPjBYfZMCyfZDtWn+Qtf983dk/X7cNP18xZdPxY/+v6WN/1/5jXz81grrn1MgeUyNKtDA1UmuOUjuRZJ5H6jSRhMxrw0XnycE0GajHajvtdqdJShkJWrWdpKw3pbtUp3QFlzynK5hA1XxKl2VTuoZwocwT4CBTqNpNgNdcLqDm5YILh4Sko9ofk3rLBd0XV+TaVplaL67UXoqi+qWoL4WJU+06LkWpLi7rLdxh3RVkQLrHwp3TeZlzl7Hxy7DMua+9zNl1URhg+kXhfeNF4WZL6Cv9Evp3myV0lLKEfr8NB9hvAChIVXvDwYZNnIbbM+iDt2d038zyvecsF572m1kQeJRj2foDLuQLZGope7iglVLK1p+eNkr9dN4ohaq5rYzm28rU2udX921lLTfhrRpuwju02YSHqr1lcdloy+Kh5ZbF/jd4rhts8Oy0HZY+YDssxq4mm4e9MMKGZt225ij0mmwenrlPu9W6+8Z0io3pYOu6Mb37Nn7+h5238cPCzWzwDIceqHLo4RmOiLypR0Se4EANrXGgBuU86PjRm3r86AkOa1GJCgOpvtxp71yqVmTqPsVBwJWgWuMgYKNjk8FDjk0+3SFTxP1JjuSCyml6BL3/A8xNDqI7Y3DdTzBACSoc936Gw/FBcTj+c8NA9RStBODgczZeePY2Fd2betCuTT1Ed5b/SwsUNIxhbRvGUEPDGIaGMQ9qr0PL0rfX2XRt4OQMJkQIBjCQAa1aS6nlD6CEVGQy6Kl1U6C0bspYUCswaVo39d/oSuHpt9EV2oIVYCBDozK5JVi1LxiEQr8ypOrxTdT4b9uaqPXfck5uOlcUWs4BCi3nHtKgL4MoWN4/QASmDRr03b2kdoY5GuBQ6LX4iHaGaP4IMgH3WW2tiOaPn4IJzR/7AuNeVttSantlPqhVJsqRGosKPFwbILFHNBYFGNqwSnQA0rRhfXDTWtTmPk1rn7TF739gu3see8j9YQAAAABJRU5ErkJggg==)](https://learnopencv.com/videorag-long-context-video-comprehension/) [![Platform](https://img.shields.io/badge/platform-macOS%20|%20Windows%20|%20Linux-lightgrey.svg)]() **๐ŸŽฌ Intelligent Video Conversations | Powered by Advanced AI | Extreme Long-Context Processing**

Vimo is a revolutionary desktop application that lets you **chat with your videos** using cutting-edge AI technology. Built on the powerful [VideoRAG framework](https://arxiv.org/abs/2502.01549), Vimo can understand and analyze videos of any length - from short clips to hundreds of hours of content - and answer your questions with remarkable accuracy. ### ๐ŸŽฅ Watch Vimo in Action See how Vimo transforms video interaction with intelligent conversations and deep understanding capabilities.
Vimo Introduction Video

๐Ÿ‘† Click to watch the Vimo demo video

## โœจ Key Features ### For Everyone - **Drag & Drop Upload**: Simply drag video files into Vimo - **Smart Conversations**: Ask questions in natural language - **Multi-Format Support**: Works with MP4, MKV, AVI, and more - **Cross-Platform**: Available on macOS, Windows, and Linux ### For Power Users - **Extreme Long Videos**: Process videos up to hundreds of hours - **Multi-Video Analysis**: Compare and analyze multiple videos simultaneously - **Advanced Retrieval**: Find specific moments and scenes with precision - **Export Capabilities**: Save insights and references for later use ### For Researchers - **VideoRAG Framework**: Access to cutting-edge retrieval-augmented generation - **Benchmark Dataset**: LongerVideos benchmark with 134+ hours of content - **Performance Metrics**: Detailed evaluation against existing methods - **Extensible Architecture**: Build upon our open-source foundation ## ๐ŸŒŸ Why Vimo? **For Video Enthusiasts & Professionals:** - **Effortless Video Analysis**: Upload any video and start asking questions immediately - **Natural Conversations**: Chat with your videos as if talking to a human expert - **No Length Limits**: Process everything from 30-second clips to 100+ hour documentaries - **Deep Understanding**: Combines visual content, audio, and context for comprehensive answers **For Researchers & Developers:** - **State-of-the-Art Algorithm**: Built on VideoRAG, featuring graph-driven knowledge indexing - **Benchmark Performance**: Evaluated on 134+ hours across lectures, documentaries, and entertainment - **Open Source**: Full access to VideoRAG implementation and research findings - **Scalable Architecture**: Efficient processing with single GPU (RTX 3090) capability ## ๐Ÿ“‹ Table of Contents - [๐Ÿš€ Quick Start](#-quick-start) - [โœจ Key Features](#-key-features) - [๐Ÿ”ฌ VideoRAG Algorithm](#-videorag-algorithm) - [๐Ÿ› ๏ธ Development Setup](#๏ธ-development-setup) - [๐Ÿงช Benchmarks & Evaluation](#-benchmarks--evaluation) - [๐Ÿ“– Citation](#-citation) - [๐Ÿค Contributing](#-contributing) - [๐Ÿ™ Acknowledgement](#-acknowledgement) ## ๐Ÿš€ Quick Start of Vimo ### Option 1: Download Vimo App (Coming Soon) > [!NOTE] > We are preparing the **Beta release** for macOS Apple Silicon first, with Windows and Linux versions coming soon!
Coming Soon - Mac Release
### Option 2: Run from Source Code For detailed setup instructions: - **Vimo Desktop App**: See [Vimo-desktop](Vimo-desktop) for complete installation and configuration steps **Quick Overview:** 1. Set up the Python backend environment and start the VideoRAG server 2. Launch the Electron frontend application 3. Start chatting with your videos! ## ๐Ÿ”ฌ VideoRAG Algorithm

VideoRAG Architecture

VideoRAG introduces a novel dual-channel architecture that combines: - **Graph-Driven Knowledge Indexing**: Multi-modal knowledge graphs for structured video understanding - **Hierarchical Context Encoding**: Preserves spatiotemporal visual patterns across long sequences - **Adaptive Retrieval**: Dynamic retrieval mechanisms optimized for video content - **Cross-Video Understanding**: Semantic relationship modeling across multiple videos ### Technical Highlights - **Efficient Processing**: Handle hundreds of hours on a single RTX 3090 (24GB) - **Structured Indexing**: Distill long videos into concise knowledge representations - **Multi-Modal Retrieval**: Align textual queries with visual and audio content - **LongerVideos Benchmark**: 160+ videos, 134+ hours across diverse domains ### Performance Comparison Our VideoRAG algorithm significantly outperforms existing methods in long-context video understanding:
Performance Comparison
### Experiments and Evaluation See [VideoRAG-algorithm](VideoRAG-algorithm) for detailed development setup including: - Conda environment creation - Model checkpoints download - Dependencies installation - Evaluation scripts ## ๐Ÿงช LongerVideos Benchmark We created the LongerVideos benchmark to evaluate long-context video understanding: | Video Type | #Collections | #Videos | #Queries | Avg. Duration | |------------------|-------------|---------|----------|---------------| | **Lectures** | 12 | 135 | 376 | ~64.3 hours | | **Documentaries**| 5 | 12 | 114 | ~28.5 hours | | **Entertainment**| 5 | 17 | 112 | ~41.9 hours | | **Total** | 22 | 164 | 602 | ~134.6 hours | For detailed evaluation instructions and reproduction scripts, see [VideoRAG-algorithm/reproduce](VideoRAG-algorithm/reproduce). ## ๐Ÿ“– Citation If you find Vimo or VideoRAG helpful in your research, please cite our paper: ```bibtex @article{VideoRAG, title={VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos}, author={Ren, Xubin and Xu, Lingrui and Xia, Long and Wang, Shuaiqiang and Yin, Dawei and Huang, Chao}, journal={arXiv preprint arXiv:2502.01549}, year={2025} } ``` ## ๐Ÿค Contributing We welcome contributions from the community! Whether you're: - **Reporting bugs** or suggesting features for Vimo - **Improving VideoRAG algorithms** or adding new capabilities - **Enhancing documentation** or creating tutorials - **Designing UI/UX improvements** for better user experience Feel free to submit issues and pull requests. Together, we're building the future of intelligent video interaction! ## ๐Ÿ™ Acknowledgement Vimo builds upon the incredible work of the open-source community: - **[VideoRAG](https://arxiv.org/abs/2502.01549)**: The core algorithm powering Vimo's intelligence - **[nano-graphrag](https://github.com/gusye1234/nano-graphrag)** & **[LightRAG](https://github.com/HKUDS/LightRAG)**: Graph-based retrieval foundations - **[ImageBind](https://github.com/facebookresearch/ImageBind)**: Multi-modal representation learning - **[uitars-desktop](https://github.com/bytedance/UI-TARS-desktop)**: Desktop application architecture inspiration **๐ŸŒŸ Transform how you interact with videos. Start your journey with Vimo today!** ---
Built with โค๏ธ by the VideoRAG@HKUDS team.