# wanda

**Repository Path**: kilszfdjs/wanda

## Basic Information

- **Project Name**: wanda
- **Description**: None
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-08-15
- **Last Updated**: 2023-08-15

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Pruning LLMs by Weights and Activations

Official PyTorch implementation of **Wanda** (Pruning by **W**eights **and a**ctivations), as presented in our paper:

[A Simple and Effective Pruning Approach for Large Language Models](https://arxiv.org/abs/2306.11695).

[Mingjie Sun*](https://eric-mingjie.github.io/), [Zhuang Liu*](https://liuzhuang13.github.io/), [Anna Bair](https://annaebair.github.io/), [J. Zico Kolter](http://zicokolter.com/) (* indicates equal contribution)

Carnegie Mellon University, Meta AI and Bosch Center for AI

---

Compared to magnitude pruning, which removes weights based solely on their magnitudes, our pruning approach **Wanda** removes weights on a per-output basis, using the product of weight magnitudes and input activation norms. (A minimal illustrative sketch of this scoring rule is included at the end of this README.)

## Setup

Installation instructions can be found in [INSTALL.md](INSTALL.md).

## Usage

The [scripts](scripts) directory contains all the bash commands to replicate the main results (Table 2) in our paper.

Below is an example command for pruning LLaMA-7B with Wanda to achieve unstructured 50% sparsity.

```sh
python main.py \
    --model decapoda-research/llama-7b-hf \
    --prune_method wanda \
    --sparsity_ratio 0.5 \
    --sparsity_type unstructured \
    --save out/llama_7b/unstructured/wanda/
```

We provide a quick overview of the arguments:

- `--model`: The identifier for the LLaMA model on the Hugging Face model hub.
- `--cache_dir`: Directory for loading or storing LLM weights. The default is `llm_weights`.
- `--prune_method`: We have implemented three pruning methods, namely [`magnitude`, `wanda`, `sparsegpt`].
- `--sparsity_ratio`: Denotes the percentage of weights to be pruned.
- `--sparsity_type`: Specifies the type of sparsity [`unstructured`, `2:4`, `4:8`].
- `--use_variant`: Whether to use the Wanda variant; the default is `False`.
- `--save`: Specifies the directory where the result will be stored.

For structured N:M sparsity, set the argument `--sparsity_type` to "2:4" or "4:8". An illustrative command is provided below:

```sh
python main.py \
    --model decapoda-research/llama-7b-hf \
    --prune_method wanda \
    --sparsity_ratio 0.5 \
    --sparsity_type 2:4 \
    --save out/llama_7b/2-4/wanda/
```

For pruning image classifiers, see the [image_classifiers](image_classifiers) directory for details.

## Acknowledgement

This repository is built upon the [SparseGPT](https://github.com/IST-DASLab/sparsegpt) repository.

## License

This project is released under the MIT license. Please see the [LICENSE](LICENSE) file for more information.

## Questions

Feel free to discuss papers/code with us through issues/emails!

mingjies at cs.cmu.edu
liuzhuangthu at gmail.com

## Citation

If you found this work useful, please consider citing:

```
@article{sun2023simple,
  title={A Simple and Effective Pruning Approach for Large Language Models},
  author={Sun, Mingjie and Liu, Zhuang and Bair, Anna and Kolter, Zico},
  year={2023},
  journal={arXiv preprint arXiv:2306.11695}
}
```
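
## Appendix: Illustrative Pruning Sketch

For intuition about the per-output scoring rule described at the top of this README (weight magnitude times input-activation norm), here is a minimal PyTorch sketch. It is not the repository's actual API; the function name `wanda_prune_linear` and the toy tensor shapes are assumptions made for this example only, and the real pruning code is invoked through `main.py`.

```python
import torch

def wanda_prune_linear(weight: torch.Tensor,
                       activations: torch.Tensor,
                       sparsity_ratio: float = 0.5) -> torch.Tensor:
    """Zero out low-score weights of one linear layer, row by row.

    weight:       (out_features, in_features) weight matrix
    activations:  (num_tokens, in_features) calibration inputs to this layer
    """
    # Wanda score S_ij = |W_ij| * ||X_j||_2, where the L2 norm of input
    # feature j is taken over the calibration tokens.
    act_norm = activations.norm(p=2, dim=0)    # shape: (in_features,)
    scores = weight.abs() * act_norm           # shape: (out_features, in_features)

    # Per-output pruning: within each output row, zero the lowest-scoring fraction.
    num_prune = int(weight.shape[1] * sparsity_ratio)
    pruned = weight.clone()
    if num_prune > 0:
        _, prune_idx = torch.topk(scores, num_prune, dim=1, largest=False)
        pruned.scatter_(1, prune_idx, 0.0)
    return pruned

# Toy usage with random tensors (purely illustrative):
W = torch.randn(128, 512)     # a small "linear layer"
X = torch.randn(2048, 512)    # calibration activations feeding that layer
W_sparse = wanda_prune_linear(W, X, sparsity_ratio=0.5)
```

Note that no weight update or retraining is involved; the comparison group is each output row, which is what distinguishes this per-output scheme from plain layer-wide magnitude pruning.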