# PoseLLM

**Repository Path**: comptart/PoseLLM

## Basic Information

- **Project Name**: PoseLLM
- **Description**: No description available
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-08-06
- **Last Updated**: 2025-08-06

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# PoseLLM: Enhancing Language-Guided Human Pose Estimation with Multilayer Perceptron Alignment

[[`arXiv`](https://arxiv.org/abs/2507.09139)]

![overview](./img/architecture_posellm.jpg)

## Installation

### 1. Clone the code

```shell
git clone https://github.com/Ody-trek/PoseLLM
cd ./PoseLLM
```

### 2. Create a conda environment for this repo

```shell
conda create -n PoseLLM python=3.10
conda activate PoseLLM
```

### 3. Install CUDA 11.7 (other versions may not work)

```shell
conda install -c conda-forge cudatoolkit-dev
```

### 4. Install PyTorch following the official instructions (must match the CUDA version)

```shell
conda install pytorch==2.0.1 torchvision==0.15.2 pytorch-cuda=11.7 -c pytorch -c nvidia
```

### 5. Install the other Python dependencies (do not change package versions)

```shell
pip install pycocotools
pip install opencv-python
pip install accelerate==0.21.0
pip install sentencepiece==0.1.99
pip install transformers==4.31.0
```

### 6. Prepare datasets

Download [COCO](https://cocodataset.org/#home), [MPII](http://human-pose.mpi-inf.mpg.de/) and [Human-Art](https://idea-research.github.io/HumanArt/) from their websites and arrange the files under the `./data` directory following the structure below; the name in parentheses, e.g. (person_keypoints_train2017.json), is the original file name.

```
./data
├── coco
│   ├── annotations
│   │   ├── coco_train.json (person_keypoints_train2017.json)
│   │   └── coco_val.json (person_keypoints_val2017.json)
│   └── images
│       ├── train2017
│       │   └── 000000000009.jpg
│       └── val2017
│           └── 000000000139.jpg
├── HumanArt
│   ├── annotations
│   │   └── validation_humanart.json
│   └── images
│       └── 2D_virtual_human
└── mpii
    ├── annot
    │   ├── valid.json
    │   └── gt_valid.mat
    └── images
        └── 000001163.jpg
```

## Usage

### 1. Download the trained model

```shell
git lfs install
git clone https://huggingface.co/KTrek/PoseLLM

mkdir checkpoints
mkdir checkpoints/ckpts
mv PoseLLM/coco checkpoints/ckpts

# for training
mkdir checkpoints/model_weights
mv PoseLLM/pretrained/dinov2_vitl14_pretrain.pth checkpoints/model_weights

# clone vicuna1.5
cd checkpoints/model_weights
git clone https://huggingface.co/lmsys/vicuna-7b-v1.5
```

(A quick sanity check for the environment and these paths is sketched at the end of this README.)

### 2. Evaluate Model

Change the `IDX` option in each script to specify the GPU ids used for evaluation; multiple ids enable multi-GPU evaluation.

```shell
# evaluate on the COCO val set
bash scripts/valid_coco.sh

# evaluate on the Human-Art set
bash scripts/valid_humanart.sh

# evaluate on the MPII set
bash scripts/valid_mpii.sh
```

### 3. Train Model

```shell
# train on COCO
bash scripts/train_coco.sh
```

Note that each GPU should have no less than 24 GB of memory; training on 2 RTX A6000 GPUs takes about 4 days.

## Citations

If you find this code useful for your research, please cite our paper:

```
@article{zhang2025posellm,
  title={PoseLLM: Enhancing Language-Guided Human Pose Estimation with MLP Alignment},
  author={Zhang, Dewen and Hussain, Tahir and An, Wangpeng and Shouno, Hayaru},
  journal={arXiv preprint arXiv:2507.09139},
  year={2025}
}
```

## Acknowledgement

The code is largely built upon [LocLLM](https://github.com/kennethwdk/LocLLM).
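
## Sanity Check

Before running evaluation or training, the snippet below can help confirm the setup. It is a minimal sketch based only on the versions and paths given above (PyTorch 2.0.1 with CUDA 11.7, and the checkpoint/dataset layout from the Installation and Usage sections); adjust the paths if your layout differs.

```shell
# Verify that the installed PyTorch build matches CUDA 11.7 and can see the GPUs
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# expected output: 2.0.1 11.7 True

# Verify that the downloaded weights and annotations sit at the expected paths
test -f checkpoints/model_weights/dinov2_vitl14_pretrain.pth && echo "DINOv2 weights OK"
test -d checkpoints/model_weights/vicuna-7b-v1.5 && echo "Vicuna-7B weights OK"
test -f data/coco/annotations/coco_val.json && echo "COCO val annotations OK"
```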