# MS-G3D
**Repository Path**: alexbd/MS-G3D
## Basic Information
- **Project Name**: MS-G3D
- **Description**: PyTorch implementation of "Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition", CVPR 2020 Oral
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-10-20
- **Last Updated**: 2024-10-17
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# MS-G3D
PyTorch implementation of "Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition", CVPR 2020 Oral.
[[PDF](https://arxiv.org/pdf/2003.14111.pdf)][[Demo](https://youtu.be/5TcHIIece38)][[Abstract/Supp](https://openaccess.thecvf.com/content_CVPR_2020/html/Liu_Disentangling_and_Unifying_Graph_Convolutions_for_Skeleton-Based_Action_Recognition_CVPR_2020_paper.html)]

## Dependencies
- Python >= 3.6
- PyTorch >= 1.2.0
- [NVIDIA Apex](https://github.com/NVIDIA/apex) (for automatic mixed precision training)
- PyYAML, tqdm, tensorboardX
## Data Preparation
*Disk usage warning: after preprocessing, the total sizes of datasets are around 38GB, 77GB, 63GB for NTU RGB+D 60, NTU RGB+D 120, and Kinetics 400, respectively. The raw/intermediate sizes may be larger.*
### Download Datasets
There are 3 datasets to download:
- NTU RGB+D 60 Skeleton
- NTU RGB+D 120 Skeleton
- Kinetics 400 Skeleton
#### NTU RGB+D 60 and 120
1. Request dataset here: http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp
2. Download the skeleton-only datasets:
- `nturgbd_skeletons_s001_to_s017.zip` (NTU RGB+D 60)
- `nturgbd_skeletons_s018_to_s032.zip` (NTU RGB+D 120, on top of NTU RGB+D 60)
- Total size should be 5.8GB + 4.5GB.
3. Download missing skeletons lookup files [from the authors' GitHub repo](https://github.com/shahroudy/NTURGB-D#samples-with-missing-skeletons):
- NTU RGB+D 60 Missing Skeletons:
`wget https://raw.githubusercontent.com/shahroudy/NTURGB-D/master/Matlab/NTU_RGBD_samples_with_missing_skeletons.txt`
- NTU RGB+D 120 Missing Skeletons:
`wget https://raw.githubusercontent.com/shahroudy/NTURGB-D/master/Matlab/NTU_RGBD120_samples_with_missing_skeletons.txt`
- Remember to remove the first few lines of descriptive text in these files so that only the sample names remain (a small cleanup sketch follows this list)
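The downloaded lookup files begin with a few lines of explanatory text before the actual sample names. A minimal cleanup sketch, assuming the sample names follow the usual NTU pattern `SxxxCxxxPxxxRxxxAxxx` (e.g. `S001C002P003R001A004`); this helper is not part of the repo, so check the file contents before overwriting:

```python
# strip_header.py -- hypothetical helper, not part of this repo.
# Keeps only lines that look like NTU sample names (SxxxCxxxPxxxRxxxAxxx).
import re

def strip_header(path):
    with open(path) as f:
        lines = f.readlines()
    # Assumption: every real entry matches the S###C###P###R###A### pattern.
    samples = [ln for ln in lines
               if re.match(r'^S\d{3}C\d{3}P\d{3}R\d{3}A\d{3}\s*$', ln)]
    with open(path, 'w') as f:
        f.writelines(samples)

for name in ['NTU_RGBD_samples_with_missing_skeletons.txt',
             'NTU_RGBD120_samples_with_missing_skeletons.txt']:
    strip_header(name)
```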
#### Kinetics Skeleton 400
1. Download dataset from ST-GCN repo: https://github.com/yysijie/st-gcn/blob/master/OLD_README.md#kinetics-skeleton
2. [This](https://silicondales.com/tutorials/g-suite/how-to-wget-files-from-google-drive/) might be useful if you want to `wget` the dataset from Google Drive
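Alternatively, the [`gdown`](https://github.com/wkentaro/gdown) package (not a dependency of this repo) is a common way to fetch Google Drive files from the command line or Python. A minimal sketch with a placeholder file ID and output name, both hypothetical:

```python
# Hypothetical download helper using gdown (pip install gdown); not part of this repo.
# Replace FILE_ID with the Google Drive file ID taken from the ST-GCN share link.
import gdown

FILE_ID = "<google-drive-file-id>"  # placeholder
gdown.download(f"https://drive.google.com/uc?id={FILE_ID}",
               "kinetics-skeleton.zip",  # output name is arbitrary
               quiet=False)
```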
### Data Preprocessing
#### Directory Structure
Put downloaded data into the following directory structure:
```
- data/
  - kinetics_raw/
    - kinetics_train/
      ...
    - kinetics_val/
      ...
    - kinetics_train_label.json
    - kinetics_val_label.json
  - nturgbd_raw/
    - nturgb+d_skeletons/      # from `nturgbd_skeletons_s001_to_s017.zip`
      ...
    - nturgb+d_skeletons120/   # from `nturgbd_skeletons_s018_to_s032.zip`
      ...
    - NTU_RGBD_samples_with_missing_skeletons.txt
    - NTU_RGBD120_samples_with_missing_skeletons.txt
```
#### Generating Data
1. NTU RGB+D
- `cd data_gen`
- `python3 ntu_gendata.py`
- `python3 ntu120_gendata.py`
- Time estimate is ~ 3hrs to generate NTU 120 on a single core (feel free to parallelize the code :))
2. Kinetics
- `python3 kinetics_gendata.py`
- ~ 70 mins to generate Kinetics data
3. Generate the bone data with:
- `python gen_bone_data.py --dataset ntu`
- `python gen_bone_data.py --dataset ntu120`
- `python gen_bone_data.py --dataset kinetics`
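The bone stream produced in step 3 represents each skeleton as vectors between connected joints rather than raw joint coordinates. A minimal sketch of that idea, assuming joint data of shape `(N, C, T, V, M)`; the `pairs` list and the file paths below are hypothetical examples, the real (child, parent) joint pairs and paths are defined in `data_gen/gen_bone_data.py`:

```python
# Sketch of the bone modality: bone = child joint - parent joint.
import numpy as np

# Path is illustrative; the generated joint data lives under data/<dataset>/.
joints = np.load('data/ntu/xsub/train_data_joint.npy')   # (N, C, T, V, M)
pairs = [(0, 1), (1, 20), (2, 20)]                        # hypothetical (child, parent) indices

bones = np.zeros_like(joints)
for child, parent in pairs:
    # Each bone is the vector from the parent joint to the child joint.
    bones[:, :, :, child, :] = joints[:, :, :, child, :] - joints[:, :, :, parent, :]

np.save('data/ntu/xsub/train_data_bone.npy', bones)
```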
## Pretrained Models
- Download pretrained models for producing the final results on NTU RGB+D 60, NTU RGB+D 120, Kinetics Skeleton 400: [[Dropbox](https://www.dropbox.com/s/9n9897cu1ft1khg/msg3d-pretrained-models.zip)][[GoogleDrive](https://drive.google.com/open?id=1y3VbEnINtyriy82apiTZJtBV1a3cywa-)][[WeiYun](https://share.weiyun.com/DlR4Jse1)]
- Put the folder of pretrained models at the repo root:
```
- MS-G3D/
  - pretrained-models/
  - main.py
  - ...
```
- Run `bash eval_pretrained.sh`
## Training & Testing
- The general training template command:
```
python3 main.py
  --config <config file>
  --work-dir <directory for weights, checkpoints, and logs>
  --device <GPU IDs to use>
  --half                         # mixed precision training with NVIDIA Apex (default O1), for GPUs with ~11GB memory
  [--base-lr <base learning rate>]
  [--batch-size <total batch size>]
  [--weight-decay <weight decay>]
  [--forward-batch-size <batch size per forward pass, via gradient accumulation>]
  [--eval-start <epoch at which to start evaluating>]
```
- The general testing template command:
```
python3 main.py
  --config <config file>
  --work-dir <directory for logs and scores>
  --device <GPU IDs to use>
  --weights <path to model weights>
  [--test-batch-size <...>]
```
- Template for joint-bone two-stream fusion (see the score-fusion sketch at the end of this section):
```
python3 ensemble.py
  --dataset <dataset/benchmark to ensemble>
  --joint-dir <work dir of the joint-stream test run>
  --bone-dir <work dir of the bone-stream test run>
```
- Use the corresponding config files from `./config` to train/test different datasets
- Examples
- Train on NTU 120 XSub Joint
- Train with 1 GPU:
- `python3 main.py --config ./config/nturgbd120-cross-subject/train_joint.yaml`
- Train with 2 GPUs:
- `python3 main.py --config ./config/nturgbd120-cross-subject/train_joint.yaml --batch-size 32 --forward-batch-size 32 --device 0 1`
- Test on NTU 120 XSet Bone
- `python3 main.py --config ./config/nturgbd120-cross-setup/test_bone.yaml`
- Batch size 32 on 1 GPU (16 samples per forward pass, accumulating gradients; see the sketch at the end of this section):
- `python3 main.py --config <...> --batch-size 32 --forward-batch-size 16 --device 0`
- Resume training from checkpoint
```
python3 main.py
  ...                            # same params as before
  --start-epoch <0-indexed epoch to resume from>
  --weights <path to model weights>
  --checkpoint <path to training checkpoint>
```
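For reference, setting `--forward-batch-size` smaller than `--batch-size` works by gradient accumulation: the effective batch is split into chunks, gradients are summed over the chunks, and the optimizer steps once per effective batch. A minimal sketch of the idea, not the repo's actual training loop; `model`, `criterion`, and `loader` are hypothetical stand-ins:

```python
# Sketch of gradient accumulation: effective batch size 32, 16 samples per forward pass.
import torch

batch_size, forward_batch_size = 32, 16
accum_steps = batch_size // forward_batch_size             # 2 forward passes per optimizer step

optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
optimizer.zero_grad()
for i, (x, y) in enumerate(loader):                         # loader yields forward_batch_size samples
    loss = criterion(model(x), y) / accum_steps             # scale so accumulated gradients average out
    loss.backward()                                         # gradients accumulate in .grad
    if (i + 1) % accum_steps == 0:
        optimizer.step()                                    # one update per effective batch
        optimizer.zero_grad()
```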
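`ensemble.py` performs late fusion of the joint and bone streams by combining the per-class scores saved during testing. A minimal sketch of that score-fusion idea; the file names below are hypothetical placeholders, the actual score files are written into each `--work-dir` by the test command:

```python
# Sketch of joint-bone score fusion: add per-class scores, then take the argmax.
import pickle
import numpy as np

with open('joint_work_dir/test_scores.pkl', 'rb') as f:    # hypothetical path
    joint_scores = dict(pickle.load(f))                     # sample name -> per-class scores
with open('bone_work_dir/test_scores.pkl', 'rb') as f:     # hypothetical path
    bone_scores = dict(pickle.load(f))

predictions = {
    name: int(np.argmax(joint_scores[name] + bone_scores[name]))
    for name in joint_scores
}
```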
## Notes
- It's recommended to keep ~16 samples per GPU per forward pass and, with more than 2 GPUs, to scale the base LR linearly with the total batch size (https://arxiv.org/pdf/1706.02677.pdf, Section 2.1); e.g.
- 1 GPU: `--base-lr 0.05 --device 0 --batch-size 32 --forward-batch-size 16`
- 2 GPUs: `--base-lr 0.05 --device 0 1 --batch-size 32 --forward-batch-size 32`
- 4 GPUs: `--base-lr 0.1 --device 0 1 2 3 --batch-size 64 --forward-batch-size 64`
- Unfortunately, different PyTorch/CUDA versions and GPU setups can lead to different levels of memory usage, so you may hit out-of-memory (OOM) errors on some machines but not others
- 1080Ti GPUs with `--half` and `--amp-opt-level 1` (default) are relatively more stable
- If OOM occurs, try using Apex O2 by setting `--amp-opt-level 2`. However, note that
- NVIDIA Apex does not yet support `nn.DataParallel` for O2
- https://github.com/NVIDIA/apex/issues/227#issuecomment-566843218
- This means you may need to train on a single GPU when using O2
- It may also impact the stability of training and/or the final performance
- Default hyperparameters are stored in the config files; you can tune them & add extra training techniques to boost performance
- The best joint-bone fusion result may not come from the best single-stream models; for example, of the 3 pretrained models we provide for the NTU RGB+D 60 XSub joint stream, the best fusion performance comes from the slightly underperforming model (~89.3%) rather than the reported model (~89.4%) or the slightly better retrained model (~89.6%).
## Acknowledgements
This repo is based on
- [2s-AGCN](https://github.com/lshiwjx/2s-AGCN)
- [ST-GCN](https://github.com/yysijie/st-gcn)

Thanks to the original authors for their work!
## Citation
Please cite this work if you find it useful:
```
@inproceedings{liu2020disentangling,
title={Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition},
author={Liu, Ziyu and Zhang, Hongwen and Chen, Zhenghao and Wang, Zhiyong and Ouyang, Wanli},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={143--152},
year={2020}
}
```
## Contact
Please email `kenziyuliu AT outlook.com` for further questions.