# PoseLLM

**Repository Path**: comptart/PoseLLM

## Basic Information

- **Project Name**: PoseLLM
- **Description**: No description available
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-08-06
- **Last Updated**: 2025-08-06

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# PoseLLM: Enhancing Language-Guided Human Pose Estimation with Multilayer Perceptron Alignment

[[`arXiv`](https://arxiv.org/abs/2507.09139)]

![overview](./img/architecture_posellm.jpg)

## Installation

### 1. Clone the code

```shell
git clone https://github.com/Ody-trek/PoseLLM
cd ./PoseLLM
```

### 2. Create a conda environment for this repo

```shell
conda create -n PoseLLM python=3.10
conda activate PoseLLM
```

### 3. Install CUDA 11.7 (other versions may not work)

```shell
conda install -c conda-forge cudatoolkit-dev
```

### 4. Install PyTorch following the official instructions (must match the CUDA version)

```shell
conda install pytorch==2.0.1 torchvision==0.15.2 pytorch-cuda=11.7 -c pytorch -c nvidia
```

### 5. Install the other Python dependencies (do not change package versions)

```shell
pip install pycocotools
pip install opencv-python
pip install accelerate==0.21.0
pip install sentencepiece==0.1.99
pip install transformers==4.31.0
```

### 6. Prepare datasets

Download [COCO](https://cocodataset.org/#home), [MPII](http://human-pose.mpi-inf.mpg.de/) and [Human-Art](https://idea-research.github.io/HumanArt/) from their websites and arrange the files under the `./data` directory following the structure below; the name in parentheses, e.g. (person_keypoints_train2017.json), is the original file name.

```
./data
├── coco
│   ├── annotations
│   │   ├── coco_train.json (person_keypoints_train2017.json)
│   │   └── coco_val.json (person_keypoints_val2017.json)
│   └── images
│       ├── train2017
│       │   └── 000000000009.jpg
│       └── val2017
│           └── 000000000139.jpg
├── HumanArt
│   ├── annotations
│   │   └── validation_humanart.json
│   └── images
│       └── 2D_virtual_human
└── mpii
    ├── annot
    │   ├── valid.json
    │   └── gt_valid.mat
    └── images
        └── 000001163.jpg
```

## Usage

### 1. Download the trained model

```shell
git lfs install
git clone https://huggingface.co/KTrek/PoseLLM

mkdir checkpoints
mkdir checkpoints/ckpts
mv PoseLLM/coco checkpoints/ckpts

# for training
mkdir checkpoints/model_weights
mv PoseLLM/pretrained/dinov2_vitl14_pretrain.pth checkpoints/model_weights

# clone vicuna1.5
cd checkpoints/model_weights
git clone https://huggingface.co/lmsys/vicuna-7b-v1.5
```

(A quick sanity check for the environment and these paths is sketched at the end of this README.)

### 2. Evaluate Model

Change the `IDX` option in each script to specify the GPU ids used for evaluation; multiple ids enable multi-GPU evaluation.

```shell
# evaluate on the COCO val set
bash scripts/valid_coco.sh

# evaluate on the Human-Art set
bash scripts/valid_humanart.sh

# evaluate on the MPII set
bash scripts/valid_mpii.sh
```

### 3. Train Model

```shell
# train on COCO
bash scripts/train_coco.sh
```

Note that each GPU should have no less than 24 GB of memory; training on 2 RTX A6000 GPUs takes about 4 days.

## Citations

If you find this code useful for your research, please cite our paper:

```
@article{zhang2025posellm,
  title={PoseLLM: Enhancing Language-Guided Human Pose Estimation with MLP Alignment},
  author={Zhang, Dewen and Hussain, Tahir and An, Wangpeng and Shouno, Hayaru},
  journal={arXiv preprint arXiv:2507.09139},
  year={2025}
}
```

## Acknowledgement

The code is largely built upon [LocLLM](https://github.com/kennethwdk/LocLLM).
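
## Sanity Check

Before running evaluation or training, the snippet below can help confirm the setup. It is a minimal sketch based only on the versions and paths given above (PyTorch 2.0.1 with CUDA 11.7, and the checkpoint/dataset layout from the Installation and Usage sections); adjust the paths if your layout differs.

```shell
# Verify that the installed PyTorch build matches CUDA 11.7 and can see the GPUs
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# expected output: 2.0.1 11.7 True

# Verify that the downloaded weights and annotations sit at the expected paths
test -f checkpoints/model_weights/dinov2_vitl14_pretrain.pth && echo "DINOv2 weights OK"
test -d checkpoints/model_weights/vicuna-7b-v1.5 && echo "Vicuna-7B weights OK"
test -f data/coco/annotations/coco_val.json && echo "COCO val annotations OK"
```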