# MiniCPM-V-CookBook
**Repository Path**: hspghost/MiniCPM-V-CookBook
## Basic Information
- **Project Name**: MiniCPM-V-CookBook
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-11-12
- **Last Updated**: 2025-11-12
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# MiniCPM-V & o Cookbook
[Main Repository](https://github.com/OpenBMB/MiniCPM-o) | [Full Documentation](https://minicpm-o.readthedocs.io/en/latest/index.html)
Cook up amazing multimodal AI applications effortlessly with MiniCPM-o, bringing vision, speech, and live-streaming capabilities right to your fingertips! For version-specific deployment instructions, see the files in the [deployment](./deployment/) directory.
## What Makes Our Recipes Special?
### Easy Usage Documentation
Our comprehensive [documentation website](https://minicpm-o.readthedocs.io/en/latest/index.html) presents every recipe in a clear, well-organized manner.
All features are displayed at a glance, making it easy for you to quickly find exactly what you need.
### Broad User Spectrum
We support a wide range of users, from individuals to enterprises and researchers.
* **Individuals**: Enjoy effortless inference using [Ollama](./deployment/ollama/minicpm-v4_5_ollama.md) and [Llama.cpp](./deployment/llama.cpp/minicpm-v4_5_llamacpp.md) with minimal setup.
* **Enterprises**: Achieve high-throughput, scalable performance with [vLLM](./deployment/vllm/minicpm-v4_5_vllm.md) and [SGLang](./deployment/sglang/MiniCPM-v4_5_sglang.md).
* **Researchers**: Leverage advanced frameworks including [Transformers](./finetune/finetune_full.md), [LLaMA-Factory](./finetune/finetune_llamafactory.md), [SWIFT](./finetune/swift.md), and [Align-anything](./finetune/align_anything.md) to enable flexible model development and cutting-edge experimentation.
### Versatile Deployment Scenarios
Our ecosystem delivers optimal solutions for a variety of hardware environments and deployment demands.
* **Web demo**: Launch an interactive multimodal AI web demo with [FastAPI](./demo/README.md).
* **Quantized deployment**: Maximize efficiency and minimize resource consumption using [GGUF](./quantization/gguf/minicpm-v4_5_gguf_quantize.md), [BNB](./quantization/bnb/minicpm-v4_5_bnb_quantize.md), and [AWQ](./quantization/awq/minicpm-v4_awq_quantize.md).
* **Edge devices**: Bring powerful AI experiences to [iPhone and iPad](./demo/ios_demo/ios.md), supporting offline and privacy-sensitive applications.
## Live Demonstrations
Explore real-world examples of MiniCPM-V deployed on edge devices using our curated recipes. These demos highlight the model's high efficiency and robust performance in practical scenarios.
- Run locally on iPhone with the [iOS demo](./demo/ios_demo/ios.md).
- Run locally on iPad with the [iOS demo](./demo/ios_demo/ios.md), where the model observes the process of drawing a rabbit.
## Inference Recipes
> *Ready-to-run examples; a minimal single-image QA sketch follows the table.*
| Recipe | Description |
|---------|:-------------|
| **Vision Capabilities** | |
| [Single-image QA](./inference/minicpm-v4_5_single_image.md) | Question answering on a single image |
| [Multi-image QA](./inference/minicpm-v4_5_multi_images.md) | Question answering with multiple images |
| [Video QA](./inference/minicpm-v4_5_video_understanding.md) | Video-based question answering |
| [Document Parser](./inference/minicpm-v4_5_pdf_parse.md) | Parse and extract content from PDFs and webpages |
| [Text Recognition](./inference/minicpm-v4_5_ocr.md) | Reliable OCR for photos and screenshots |
| [Grounding](./inference/minicpm-v4_5_grounding.md) | Visual grounding and object localization in images |
| **Audio Capabilities** | |
| [Speech-to-Text](./inference/speech2text.md) | Multilingual speech recognition |
| [Text-to-Speech](./inference/text2speech.md) | Instruction-following speech synthesis |
| [Voice Cloning](./inference/voice_clone.md) | Realistic voice cloning and role-play |
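As a quick taste of the vision recipes above, here is a minimal single-image QA sketch with Hugging Face Transformers. The model id and the `chat()` call follow the interface used by recent MiniCPM-V releases, but treat them as assumptions and refer to the linked recipes for the exact, version-specific arguments.

```python
# Minimal single-image QA sketch (assumes a CUDA GPU and the model id below;
# see the single-image recipe for the authoritative, version-specific code).
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-V-4_5"  # assumed Hugging Face model id
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,          # MiniCPM-V ships its own modeling code
    torch_dtype=torch.bfloat16,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")
msgs = [{"role": "user", "content": [image, "What is shown in this image?"]}]

# chat() accepts interleaved PIL images and text inside the message content list
answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(answer)
```

Multi-image and video QA follow the same pattern by passing several images (or sampled video frames) in the content list; the corresponding recipes cover the details.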
## Fine-tuning Recipes
> *Customize your model with your own ingredients*
**Data preparation**
Follow the [guidance](./finetune/dataset_guidance.md) to set up your training datasets.
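For orientation, the sketch below writes one supervised example in a common conversation-style layout (an `<image>` placeholder plus user/assistant turns). The field names here are illustrative assumptions; the dataset guidance above defines the schema the training scripts actually expect.

```python
# Illustrative training sample in a conversation-style JSON layout.
# Field names ("image", "conversations", roles) are assumptions for illustration;
# follow the dataset guidance for the exact schema.
import json

sample = {
    "id": "0",
    "image": "images/example.jpg",
    "conversations": [
        {"role": "user", "content": "<image>\nWhat is shown in this picture?"},
        {"role": "assistant", "content": "A tabby cat sleeping on a windowsill."},
    ],
}

# Training data is typically a single JSON file containing a list of such entries.
with open("train.json", "w", encoding="utf-8") as f:
    json.dump([sample], f, ensure_ascii=False, indent=2)
```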
**Training**
We provide training methods for different needs, as summarized below; a generic LoRA sketch follows the table.
| Framework | Description |
|----------|--------|
| [Transformers](./finetune/finetune_full.md) | Most flexible for customization |
| [LLaMA-Factory](./finetune/finetune_llamafactory.md) | Modular fine-tuning toolkit |
| [SWIFT](./finetune/swift.md) | Lightweight and fast parameter-efficient tuning |
| [Align-anything](./finetune/align_anything.md) | Visual instruction alignment for multimodal models |
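To illustrate the parameter-efficient end of this spectrum, here is a generic LoRA sketch built on the PEFT library. The model id and target module names are assumptions, and MiniCPM-V-specific details (vision tower, chat template, data collation) are intentionally omitted; use the linked recipes for real training runs.

```python
# Generic LoRA attachment sketch with PEFT; not the cookbook's training script.
# Model id and target_modules are assumptions about the LLM backbone's layer names.
import torch
from transformers import AutoModel
from peft import LoraConfig, get_peft_model

model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-V-4_5",      # assumed model id
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)

lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the small LoRA adapters are trainable
```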
## Serving Recipes
> *Deploy your model efficiently; a sample client request follows the table.*
| Method | Description |
|-------------------------------------------|----------------------------------------------|
| [vLLM](./deployment/vllm/minicpm-v4_5_vllm.md)| High-throughput GPU inference |
| [SGLang](./deployment/sglang/MiniCPM-v4_5_sglang.md)| High-throughput GPU inference |
| [Llama.cpp](./deployment/llama.cpp/minicpm-v4_5_llamacpp.md)| Fast CPU inference on PC, iPhone and iPad |
| [Ollama](./deployment/ollama/minicpm-v4_5_ollama.md)| User-friendly setup |
| [OpenWebUI](./demo/web_demo/openwebui) | Interactive Web demo with Open WebUI |
| [Gradio](./demo/web_demo/gradio/README.md) | Interactive Web demo with Gradio |
| [FastAPI](./demo/README.md) | Interactive Omni Streaming demo with FastAPI |
| [iOS](./demo/ios_demo/ios.md) | Interactive iOS demo with llama.cpp |
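Once a model is served behind an OpenAI-compatible endpoint (for example via vLLM or SGLang), any OpenAI client can talk to it. The port, served model name, and launch flags below are assumptions; the vLLM and SGLang recipes give the exact commands.

```python
# Query an OpenAI-compatible endpoint (e.g. vLLM/SGLang) with an image question.
# Assumes the server runs on localhost:8000 and registered the model name below.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="openbmb/MiniCPM-V-4_5",  # must match the name the server was launched with
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```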
## Quantization Recipes
> *Compress your model to improve efficiency; a 4-bit loading sketch follows the table.*
| Format | Key Feature |
|-----------------------------------------|------------------------------------|
| [GGUF](./quantization/gguf/minicpm-v4_5_gguf_quantize.md)| Simplest and most portable format |
| [BNB](./quantization/bnb/minicpm-v4_5_bnb_quantize.md) | Simple and easy-to-use quantization method |
| [AWQ](./quantization/awq/minicpm-v4_awq_quantize.md) | High-performance quantization for efficient inference |
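As a flavor of on-the-fly quantization, the sketch below loads the model in 4-bit with bitsandbytes through Transformers' quantization config. The model id is an assumption; the BNB recipe above covers the MiniCPM-specific steps, while GGUF and AWQ use their own toolchains.

```python
# On-the-fly 4-bit loading with bitsandbytes (requires the bitsandbytes package).
# The model id is an assumption; see the BNB recipe for MiniCPM-specific details.
import torch
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

model_id = "openbmb/MiniCPM-V-4_5"
bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    quantization_config=bnb_cfg,
    device_map="auto",            # place quantized weights on the available GPU(s)
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
```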
## Framework Support Matrix
If you'd like us to prioritize support for another open-source framework, please let us know via this
[short form](https://docs.google.com/forms/d/e/1FAIpQLSdyTUrOPBgWqPexs3ORrg47ZcZ1r4vFQaA4ve2iA7L9sMfMWw/viewform).
## Awesome Works using MiniCPM-V & o
- [text-extract-api](https://github.com/CatchTheTornado/text-extract-api): Document extraction API using OCR and Ollama-supported models
- [comfyui_LLM_party](https://github.com/heshengtao/comfyui_LLM_party): Build LLM workflows and integrate them into existing image workflows
- [Ollama-OCR](https://github.com/imanoop7/Ollama-OCR): OCR package that uses VLMs through Ollama to extract text from images and PDFs
- [comfyui-mixlab-nodes](https://github.com/MixLabPro/comfyui-mixlab-nodes): ComfyUI node suite that supports Workflow-to-App, GPT & 3D, and more
- [OpenAvatarChat](https://github.com/HumanAIGC-Engineering/OpenAvatarChat): Interactive digital human conversation implementation on a single PC
- [pensieve](https://github.com/arkohut/pensieve): A privacy-focused passive recording project that captures screen content
- [paperless-gpt](https://github.com/icereed/paperless-gpt): Use LLMs to handle paperless-ngx with AI-powered titles, tags, and OCR
- [Neuro](https://github.com/kimjammer/Neuro): A recreation of Neuro-Sama, but running on local models on consumer hardware 
## Community
### Contributing
We love new recipes! Please share your creative dishes:
1. Fork the repository
2. Create your recipe
3. Submit a pull request
### Issues & Support
- Found a bug? [Open an issue](https://github.com/OpenBMB/MiniCPM-o/issues)
- Need help? Join our [Discord](https://discord.gg/rftuRMbqzf)
## Institutions
This cookbook is developed by [OpenBMB](https://github.com/openbmb) and [OpenSQZ](https://github.com/opensqz).
## License
This cookbook is served under the [Apache-2.0 License](LICENSE) - cook freely, share generously!
## Citation
If you find our model/code/paper helpful, please consider citing our papers and giving us a star!
```bibtex
@misc{yu2025minicpmv45cookingefficient,
  title={MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe},
  author={Tianyu Yu and Zefan Wang and Chongyi Wang and Fuwei Huang and Wenshuo Ma and Zhihui He and Tianchi Cai and Weize Chen and Yuxiang Huang and Yuanqian Zhao and Bokai Xu and Junbo Cui and Yingjing Xu and Liqing Ruan and Luoyuan Zhang and Hanyu Liu and Jingkun Tang and Hongyuan Liu and Qining Guo and Wenhao Hu and Bingxiang He and Jie Zhou and Jie Cai and Ji Qi and Zonghao Guo and Chi Chen and Guoyang Zeng and Yuxuan Li and Ganqu Cui and Ning Ding and Xu Han and Yuan Yao and Zhiyuan Liu and Maosong Sun},
  year={2025},
  eprint={2509.18154},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2509.18154},
}
@article{yao2024minicpm,
  title={MiniCPM-V: A GPT-4V Level MLLM on Your Phone},
  author={Yao, Yuan and Yu, Tianyu and Zhang, Ao and Wang, Chongyi and Cui, Junbo and Zhu, Hongji and Cai, Tianchi and Li, Haoyu and Zhao, Weilin and He, Zhihui and others},
  journal={Nature Communications},
  volume={16},
  pages={5509},
  year={2025}
}
```