# TF-CLIP
**Repository Path**: teslatasy/TF-CLIP
## Basic Information
- **Project Name**: TF-CLIP
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-08-31
- **Last Updated**: 2024-08-31
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
Large-scale language-image pre-trained models (e.g., CLIP) have shown superior performances on many cross-modal retrieval tasks.
However, the problem of transferring the knowledge learned from such models to video-based person re-identification (ReID) has barely been explored.
In addition, there is a lack of decent text descriptions in current ReID benchmarks.
To address these issues, in this work, we propose a novel one-stage text-free CLIP-based learning framework named **TF-CLIP** for video-based person ReID.
## :loudspeaker:News
- [2024/01/01] We make TF-CLIP public. Happy New Year!!!
## :fire: Highlight
* We propose a novel one-stage text-free CLIP-based learning framework named **TF-CLIP** for video-based person ReID.
To our best knowledge, we are the first to extract identity-specific sequence features to replace the text features of CLIP.
Meanwhile, we further design a Sequence-Specific Prompt (SSP) module to update the CLIP-Memory online.
* We propose a Temporal Memory Diffusion (TMD) module to capture temporal information.
The frame-level memories in a sequence first communicate with each other to extract temporal information.
The temporal information is then further diffused to each token, and finally aggregated to obtain more robust temporal features.
## :memo: Results
* Performance
* Pretrained Models
- [x] MARS : [Model&Code](https://pan.baidu.com/s/1k4MR3w6NPiyA49FAB5aRlQ?pwd=1234) PASSWORD: 1234
- [x] LSVID : [Model&Code](https://pan.baidu.com/s/1prJcECyiiJsN-3wBRzdMjQ?pwd=1234) PASSWORD: 1234
- [x] iLIDS : [Model&Code](https://pan.baidu.com/s/1XEqOZQPMnsAUQN5jYMkrPg?pwd=1234) PASSWORD: 1234
* t-SNE Visualization
## :bookmark_tabs:Installation
* Install the conda environment
```
conda create -n tfclip python=3.8
conda activate tfclip
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch
```
* Install the required packages:
```
pip install yacs
pip install timm
pip install scikit-image
pip install tqdm
pip install ftfy
pip install regex
```
* Prepare Datasets
```
Download the datasets (MARS, LS-VID and iLIDS-VID), and then unzip them to your_dataset_dir.
```
## :car:Run TF-CLIP
For example,if you want to run method on MARS, you need to modify the bottom of configs/vit_base.yml to
```
DATASETS:
NAMES: ('MARS')
ROOT_DIR: ('your_dataset_dir')
OUTPUT_DIR: 'your_output_dir'
```
Then, run
```
CUDA_VISIBLE_DEVICES=0 python train-main.py
```
## :car:Evaluation
For example, if you want to test methods on MARS, run
```
CUDA_VISIBLE_DEVICES=0 python eval-main.py
```
## :hearts: Acknowledgment
This project is based on [CLIP-ReID](https://github.com/Syliz517/CLIP-ReID) and [XCLIP](https://github.com/microsoft/VideoX/tree/master/X-CLIP).
Thanks for these excellent works.
## :hearts: Concat
If you have any questions, please feel free to send an email to yuchenyang@mail.dlut.edu.cn or asuradayuci@gmail.com. .^_^.
## :book: Citation
If you find TF-CLIP useful for you, please consider citing :mega:
```bibtex
@article{tfclip,
Title={TF-CLIP: Learning Text-Free CLIP for Video-Based Person Re-identification},
Author = {Chenyang Yu, Xuehu Liu, Yingquan Wang, Pingping Zhang, Huchuan Lu},
Volume={38},
Number={7},
Pages = {6764-6772},
Year = {2024},
booktitle= = {AAAI}
}
```
## :book: LICENSE
TF_CLIP is released under the [MIT License](https://github.com/AsuradaYuci/TF-CLIP/blob/main/LICENSE).