# Hybrid-SD

**Repository Path**: ByteDance/Hybrid-SD

## Basic Information

- **Project Name**: Hybrid-SD
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-10-30
- **Last Updated**: 2026-03-12

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

Hybrid SD: Edge-Cloud Collaborative Inference for Stable Diffusion Models

## **Introduction**

Hybrid SD is a novel framework for edge-cloud collaborative inference of Stable Diffusion models. By combining large, high-quality models on cloud servers with efficient small models on edge devices, Hybrid SD achieves state-of-the-art parameter efficiency on edge devices with competitive visual quality.

## Installation

```bash
conda create -n hybrid_sd python=3.9.2
conda activate hybrid_sd
pip install -r requirements.txt
```

## Pretrained Models

We provide the following pretrained models:

- Our pruned U-Net (224M): [hybrid-sd-224m](https://huggingface.co/cqyan/hybrid-sd-224m)
- Our tiny VAE: [hybrid-sd-tinyvae](https://huggingface.co/cqyan/hybrid-sd-tinyvae) and its SDXL version: [hybrid-sd-tinyvae-xl](https://huggingface.co/cqyan/hybrid-sd-tinyvae-xl). Additionally, we provide decoder-pruned VAEs (**speed up 20%+**) for SD1.5, [hybrid-sd-small-vae](https://huggingface.co/cqyan/hybrid-sd-small-vae), and for SDXL, [hybrid-sd-small-vae-xl](https://huggingface.co/cqyan/hybrid-sd-small-vae-xl). **Visual results can be found in [Results](#Results).**
- [SD-v1.4](https://huggingface.co/cqyan/hybrid-sd-v1-4-lcm) and our pruned LCM (224M): [hybrid-sd-v1-4-lcm-224](https://huggingface.co/cqyan/hybrid-sd-v1-4-lcm-224)

## Hybrid Inference

### **SD Models**

To run hybrid inference with SD models, launch `scripts/hybrid_sd/hybird_sd.sh` and specify the large and small models. For hybrid inference with SDXL models, use `scripts/hybrid_sd/hybird_sdxl.sh` in the same way.
Optional arguments:

- `PATH_MODEL_LARGE`: the large model path.
- `PATH_MODEL_SMALL`: the small model path.
- `--step`: the steps distributed to the different models (e.g., "10,15" means the first 10 steps are assigned to the large model and the last 15 steps are shifted to the small model).
- `--seed`: the random seed.
- `--img_sz`: the image size.
- `--prompts_file`: a .txt file containing the prompts.
- `--output_dir`: the output directory for saving generated images.
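
The `--step` split described above can be illustrated with a small, self-contained sketch. It is not taken from the repository: the helper names `parse_step_split` and `assign_models` and the labels `"large"` and `"small"` are hypothetical, and the actual scripts may parse the argument differently. The sketch only shows how a string such as "10,15" maps each denoising step to a model.

```python
from typing import List


def parse_step_split(step_arg: str) -> List[int]:
    """Parse a comma-separated step string such as "10,15" into [10, 15]."""
    counts = [int(part) for part in step_arg.split(",") if part.strip()]
    if not counts or any(c < 0 for c in counts):
        raise ValueError(f"invalid step split: {step_arg!r}")
    return counts


def assign_models(step_arg: str, model_names: List[str]) -> List[str]:
    """Return, for every denoising step in order, the model that handles it."""
    counts = parse_step_split(step_arg)
    if len(counts) != len(model_names):
        raise ValueError("need exactly one step count per model")
    schedule: List[str] = []
    for name, count in zip(model_names, counts):
        # The first `count` remaining steps go to this model.
        schedule.extend([name] * count)
    return schedule


# Hypothetical labels standing in for PATH_MODEL_LARGE and PATH_MODEL_SMALL.
schedule = assign_models("10,15", ["large", "small"])
print(len(schedule))               # total number of denoising steps
print(schedule[9], schedule[10])   # model at the hand-over point
```

With this layout, the early steps run on the large (cloud) model and the remaining steps on the small (edge) model, which is the hand-over that the `--step` example describes.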
### **Latent Consistency Models (LCMs)**

To run hybrid inference with LCMs, launch `scripts/hybrid_sd/hybird_lcm.sh` and specify the large and small models. You also need to pass `TEACHER_MODEL_PATH` to load the VAE, tokenizer, and text encoder.

### Evaluation on MS-COCO Benchmark

* Evaluate hybrid inference with SD models on MS-COCO 2014 30K.

  ```bash
  bash scripts/hybrid_sd/generate_dpm_eval.sh
  ```

* Evaluate hybrid inference with LCMs on MS-COCO 2014 30K.

  ```bash
  bash scripts/hybrid_sd/generate_lcm_eval.sh
  ```

## Training

### Pruning U-Net

```bash
# pruning U-Net through significance score.
bash scripts/prune_sd/prune_tiny.sh
# finetuning the pruned U-Net.
bash scripts/prune_sd/kd_finetune_tiny.sh
```

Following [BK-SDM](https://github.com/Nota-NetsPresso/BK-SDM), we use the dataset preprocessed_212k.

### Training our lightweight VAE

```bash
bash scripts/optimize_vae/train_tinyvae.sh
```
Note:

- We use data from [Laion_aesthetics_5plus_1024_33M](https://huggingface.co/datasets/MuhammadHanif/Laion_aesthetics_5plus_1024_33M).
- We optimize the VAE with an LPIPS loss and an adversarial loss.
- We adopt the discriminator from StyleGAN-T, along with several data augmentation and degradation techniques, for VAE enhancement.
## Training LCMs

Train accelerated Latent Consistency Models (LCMs) with the following scripts.

1. Distilling SD models to LCMs

   ```bash
   bash scripts/hybrid_sd/lcm_t2i_sd.sh
   ```

2. Distilling pruned SD models to LCMs

   ```bash
   bash scripts/hybrid_sd/lcm_t2i_tiny.sh
   ```

## Results

### Hybrid SDXL Inference
### VAEs

#### Our tiny VAE vs. TAESD

Our VAE shows better visual quality and finer detail than TAESD. It also achieves better FID scores than TAESD on the MSCOCO 2017 5K dataset.
#### Our small VAE vs. Baseline

| Model (fp16) | Latency on V100 (ms) | GPU Memory (MiB) |
|---|:---:|:---:|
| SDXL baseline vae | 802.7 | 19203 |
| SDXL [small vae](https://huggingface.co/cqyan/hybrid-sd-small-vae-xl) (Ours) | 611.8 | 17469 |
| SDXL [tiny vae](https://huggingface.co/cqyan/hybrid-sd-tiny-vae-xl) (Ours) | 61.1 | 8017 |
| SD1.5 baseline vae | 186.6 | 12987 |
| SD1.5 [small vae](https://huggingface.co/cqyan/hybrid-sd-small-vae) (Ours) | 135.6 | 9087 |
| SD1.5 [tiny vae](https://huggingface.co/cqyan/hybrid-sd-tiny-vae) (Ours) | 16.4 | 6929 |
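
To make the table easier to read, the following sketch computes the relative latency and memory reduction of each VAE against its baseline. The numbers are copied from the table above; the helper `reduction_pct` and the dictionary layout are illustrative only and are not part of the repository. For the small VAEs, the latency reduction works out to roughly 24% (SDXL) and 27% (SD1.5).

```python
# (latency in ms on V100, GPU memory in MiB), copied from the table above.
results = {
    "SDXL": {"baseline": (802.7, 19203), "small": (611.8, 17469), "tiny": (61.1, 8017)},
    "SD1.5": {"baseline": (186.6, 12987), "small": (135.6, 9087), "tiny": (16.4, 6929)},
}


def reduction_pct(baseline: float, value: float) -> float:
    """Relative reduction of `value` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - value) / baseline


for family, rows in results.items():
    base_latency, base_memory = rows["baseline"]
    for variant in ("small", "tiny"):
        latency, memory = rows[variant]
        print(
            f"{family} {variant} vae: "
            f"latency -{reduction_pct(base_latency, latency):.1f}%, "
            f"memory -{reduction_pct(base_memory, memory):.1f}%"
        )
```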
## Acknowledgments

- [CompVis](https://github.com/CompVis/latent-diffusion), [Runway](https://runwayml.com/), and [Stability AI](https://stability.ai/) for the pioneering research on Stable Diffusion.
- [Diffusers](https://github.com/huggingface/diffusers), [BK-SDM](https://github.com/Nota-NetsPresso/BK-SDM/), [TAESD](https://github.com/madebyollin/taesd) for their valuable contributions.

## Citation

If you find our work helpful, please cite it!

```
@article{yan2024hybrid,
  title={Hybrid SD: Edge-Cloud Collaborative Inference for Stable Diffusion Models},
  author={Yan, Chenqian and Liu, Songwei and Liu, Hongjian and Peng, Xurui and Wang, Xiaojian and Chen, Fangming and Fu, Lean and Mei, Xing},
  journal={arXiv preprint arXiv:2408.06646},
  year={2024}
}
```

## License

This project is licensed under the [Apache-2.0 License](LICENSE).