# Open-VCLIP
**Repository Path**: teslatasy/Open-VCLIP
## Basic Information
- **Project Name**: Open-VCLIP
- **Description**: Open-VCLIP, proposed by Fudan University and Meta
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-07-04
- **Last Updated**: 2023-07-04
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization
**This repository contains the official PyTorch implementation of our paper: "Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization (ICML 2023, Poster)"**
Zejia Weng, Xitong Yang, Ang Li, Zuxuan Wu, Yu-Gang Jiang
[[`Paper`]](https://arxiv.org/abs/2302.00624)
# Introduction
We introduce a simple yet effective approach, **Open-VCLIP**, which transforms CLIP into strong zero-shot video classifiers and can better recognize unseen actions and events at test time.
# Dependency
The main dependencies are PyTorch 1.11.0, torchvision 0.12.0, and [`PySlowFast`](https://github.com/facebookresearch/SlowFast).
Detailed installation instructions can be found in [`INSTALL.md`](https://github.com/wengzejia1/Open-VCLIP/blob/main/INSTALL.md).
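If you want to verify that the pinned versions are installed, here is a minimal sanity check (an illustrative sketch, not part of the repo):
```python
import torch
import torchvision

# The repo pins PyTorch 1.11.0 and torchvision 0.12.0.
assert torch.__version__.startswith("1.11"), torch.__version__
assert torchvision.__version__.startswith("0.12"), torchvision.__version__
print("PyTorch", torch.__version__, "| torchvision", torchvision.__version__)
```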
# Checkpoint
We provide the checkpoints of Open-VCLIP, which can be downloaded through the following link:
- checkpoint: [`https://drive.google.com/drive/folders/1VhwPFESkrr9Ed40yU5NEPIkrGIf6WQ9N?usp=share_link`](https://drive.google.com/drive/folders/1VhwPFESkrr9Ed40yU5NEPIkrGIf6WQ9N?usp=share_link)
# Data Preparation
- **Kinetics-400.**
We obtained the compressed version of the Kinetics-400 dataset, in which the videos have been resized to 256, from the [`VoV3d Repo`](https://github.com/youngwanLEE/VoV3D/blob/main/DATA.md#kinetics-400). That repository provides the download link: [[`Kinetics-400 dataset link`](https://dl.dropbox.com/s/419u0zljf2brsbt/compress.tar.gz)]. After downloading and extracting the data, rename the folders "train_256" and "val_256" to "train" and "val", respectively. Note also that the video "val/crossing_river/ZVdAl-yh9m0.mp4" is invalid and needs to be replaced: download a new copy from [`here`](https://drive.google.com/file/d/15M07kKQlZEoVzUezppITSnICs83fch8A/view?usp=share_link) and swap it in (see the sketch after this list).
- **UCF-101.**
We download the UCF-101 dataset with the [`script`](https://github.com/open-mmlab/mmaction2/blob/main/tools/data/ucf101/download_videos.sh) provided by MMAction2.
- **HMDB-51.**
We download the HMDB-51 dataset with the [`script`](https://github.com/open-mmlab/mmaction2/blob/main/tools/data/hmdb51/download_videos.sh) provided by MMAction2.
- **Kinetics-600 testing.**
The Kinetics-600 validation data we used can be downloaded from this [`link`](https://pan.baidu.com/s/1d6wI-n3igMdE1rJ2xP2MsA?pwd=c5mu).
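For reference, below is a minimal Python sketch of the Kinetics-400 post-download steps described above. The `K400_ROOT` value and the replacement file name are assumptions taken from this README's examples; adjust them to your setup.
```python
import os
import shutil

K400_ROOT = "/dev/shm/k400"  # assumed extraction directory; change as needed

# Rename the extracted folders as required by the expected data layout.
os.rename(os.path.join(K400_ROOT, "train_256"), os.path.join(K400_ROOT, "train"))
os.rename(os.path.join(K400_ROOT, "val_256"), os.path.join(K400_ROOT, "val"))

# Overwrite the invalid validation video with the re-downloaded copy.
shutil.copyfile(
    "ZVdAl-yh9m0.mp4",  # replacement downloaded from the Google Drive link above
    os.path.join(K400_ROOT, "val", "crossing_river", "ZVdAl-yh9m0.mp4"),
)
```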
# Training
Training scripts for Open-VCLIP are provided in **/script/training**. An example script is shown below. After training, run **weight_average.py** to perform stochastic weight averaging (SWA) over the saved checkpoints; an illustrative sketch of that step follows the script.
```bash
ROOT=/PATH/TO/Open-VCLIP
CKPT=/PATH/FOR/SAVING/CKPT/
cd $ROOT
# DATA.PATH_PREFIX        # replace with the path to the dataset
# TRAIN.CLIP_ORI_PATH     # replace with the path to the original CLIP weights
# MODEL.TEMPORAL_MODELING_TYPE  # choice of temporal modeling module
python -W ignore -u tools/run_net.py \
--cfg configs/Kinetics/TemporalCLIP_vitb16_8x16_STAdapter.yaml \
--opts DATA.PATH_TO_DATA_DIR $ROOT/label_db/weng_compress_full_splits \
DATA.PATH_PREFIX /dev/shm/k400 \
DATA.PATH_LABEL_SEPARATOR , \
DATA.INDEX_LABEL_MAPPING_FILE $ROOT/label_db/k400-index2cls.json \
TRAIN.ENABLE True \
OUTPUT_DIR $CKPT/basetraining/temporalclip_vitb16_8x16_interpolation_bugfix_0.5ratio_rand0.0_0.6sample \
TRAIN.BATCH_SIZE 64 \
TEST.BATCH_SIZE 240 \
TEST.NUM_ENSEMBLE_VIEWS 3 \
TEST.NUM_SPATIAL_CROPS 1 \
NUM_GPUS 8 \
SOLVER.MAX_EPOCH 22 \
SOLVER.WARMUP_EPOCHS 2.0 \
SOLVER.BASE_LR 3.33e-6 \
SOLVER.WARMUP_START_LR 3.33e-8 \
SOLVER.COSINE_END_LR 3.33e-8 \
TRAIN.MIXED_PRECISION True \
DATA.DECODING_BACKEND "pyav" \
MODEL.NUM_CLASSES 400 \
MODEL.TEMPORAL_MODELING_TYPE 'expand_temporal_view' \
MIXUP.ENABLE False \
AUG.ENABLE False \
AUG.NUM_SAMPLE 1 \
TRAIN.EVAL_PERIOD 1 \
TRAIN.CHECKPOINT_PERIOD 1 \
MODEL.LOSS_FUNC soft_cross_entropy \
TRAIN.LINEAR_CONNECT_CLIMB True \
TRAIN.CLIP_ORI_PATH ~/.cache/clip/ViT-B-16.pt \
TRAIN.LINEAR_CONNECT_LOSS_RATIO 0.5 \
TRAIN.LINEAR_CONNECT_SAMPLE_L 0.0 \
TRAIN.LINEAR_CONNECT_SAMPLE_R 0.6
```
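The checkpoint-averaging step lives in **weight_average.py**; as a rough illustration, the sketch below averages a range of saved checkpoints uniformly. The file-name pattern, the epoch range (2..22, guessed from the released `swa_2_22.pth` name), and the `"model_state"` key are assumptions based on common PySlowFast conventions, not the repo's exact code.
```python
import torch

# Assumed checkpoint layout; see weight_average.py for the real implementation.
ckpt_paths = [f"checkpoints/checkpoint_epoch_{e:05d}.pyth" for e in range(2, 23)]

avg_state = None
for path in ckpt_paths:
    state = torch.load(path, map_location="cpu")["model_state"]
    if avg_state is None:
        avg_state = {k: v.clone().float() for k, v in state.items()}
    else:
        for k, v in state.items():
            avg_state[k] += v.float()

# Uniform average over all selected checkpoints (SWA).
avg_state = {k: v / len(ckpt_paths) for k, v in avg_state.items()}
torch.save({"model_state": avg_state}, "swa_2_22.pth")
```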
# Evaluation
Download the [checkpoints](https://drive.google.com/drive/folders/1VhwPFESkrr9Ed40yU5NEPIkrGIf6WQ9N?usp=share_link) and evaluate the models with the scripts in **/script/testing/**. An example script is shown below.
#### Example: Testing on UCF-101 (B/16)
Note: changing the value of **TEST.PATCHING_RATIO** changes the weight interpolation factor; a sketch of this interpolation follows the script.
```bash
ROOT=/PATH/TO/Open-VCLIP
CKPT=/PATH/FOR/SAVING/CKPT/
OUT_DIR=$CKPT/testing
LOAD_CKPT_FILE=/PATH/TO/openvclip-b16/swa_2_22.pth
PATCHING_RATIO=0.5
# DATA.PATH_TO_DATA_DIR   # options: ucf101_full / ucf101_split1 / ucf101_split2 / ucf101_split3
# DATA.PATH_PREFIX        # replace with the path to the dataset
# TEST.CUSTOM_LOAD_FILE   # path of the checkpoint to load
# TEST.PATCHING_RATIO     # weight interpolation ratio: old_w * ratio + new_w * (1 - ratio)
# TEST.CLIP_ORI_PATH      # replace with the path to the original CLIP weights
# MODEL.TEMPORAL_MODELING_TYPE  # choice of temporal modeling module
cd $ROOT
python -W ignore -u tools/run_net.py \
--cfg configs/Kinetics/TemporalCLIP_vitb16_8x16_STAdapter.yaml \
--opts DATA.PATH_TO_DATA_DIR $ROOT/zs_label_db/ucf101_full \
DATA.PATH_PREFIX /dev/shm/ucf/UCF-101 \
DATA.PATH_LABEL_SEPARATOR , \
DATA.INDEX_LABEL_MAPPING_FILE $ROOT/zs_label_db/ucf101-index2cls.json \
TRAIN.ENABLE False \
OUTPUT_DIR $OUT_DIR \
TEST.BATCH_SIZE 480 \
NUM_GPUS 8 \
DATA.DECODING_BACKEND "pyav" \
MODEL.NUM_CLASSES 101 \
MODEL.TEMPORAL_MODELING_TYPE 'expand_temporal_view' \
TEST.CUSTOM_LOAD True \
TEST.CUSTOM_LOAD_FILE $LOAD_CKPT_FILE \
TEST.SAVE_RESULTS_PATH temp.pyth \
TEST.NUM_ENSEMBLE_VIEWS 3 \
TEST.NUM_SPATIAL_CROPS 1 \
TEST.PATCHING_MODEL True \
TEST.PATCHING_RATIO $PATCHING_RATIO \
TEST.CLIP_ORI_PATH /root/.cache/clip/ViT-B-16.pt
```
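For intuition, here is a minimal sketch of the weight patching controlled by **TEST.PATCHING_RATIO**. It implements the interpolation noted in the comments above, `old_w * ratio + new_w * (1 - ratio)`; the function name and the fallback for non-matching keys are illustrative assumptions, not the repo's exact loading code.
```python
def patch_weights(clip_state, finetuned_state, ratio):
    """Interpolate original CLIP weights (old) with fine-tuned weights (new):
    patched = old_w * ratio + new_w * (1 - ratio), per TEST.PATCHING_RATIO."""
    patched = {}
    for k, new_w in finetuned_state.items():
        old_w = clip_state.get(k)
        if old_w is not None and old_w.shape == new_w.shape:
            patched[k] = old_w.float() * ratio + new_w.float() * (1.0 - ratio)
        else:
            # Parameters with no CLIP counterpart (e.g. added temporal
            # modules) are kept as-is; this fallback is an assumption.
            patched[k] = new_w
    return patched
```
With `PATCHING_RATIO=0.5`, the evaluated model sits halfway between the original CLIP weights and the fine-tuned video model.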
# Acknowledgement
This repository is built upon [`PySlowFast`](https://github.com/facebookresearch/SlowFast) and [`CLIP`](https://github.com/openai/CLIP). Thanks to the authors of these well-organized codebases.
# Citation
```
@inproceedings{weng2023transforming,
title={Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization},
author={Weng, Zejia and Yang, Xitong and Li, Ang and Wu, Zuxuan and Jiang, Yu-Gang},
booktitle={ICML},
year={2023}
}
```