# AR-Seg
**Repository Path**: lulululala/AR-Seg

## Basic Information
- **License**: MIT
- **Default Branch**: main
- **Created**: 2025-02-11
- **Last Updated**: 2025-02-11

## README
[[`Paper`](https://openaccess.thecvf.com/content/CVPR2023/html/Hu_Efficient_Semantic_Segmentation_by_Altering_Resolutions_for_Compressed_Videos_CVPR_2023_paper.html)] [[`Video`](https://www.youtube.com/watch?v=WN9ok0xd0po)] [[`BibTeX`](#citation)]
> Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos
[Yubin Hu](https://github.com/AlbertHuyb), [Yuze He](https://github.com/hyz317/), Yanghao Li, Jisheng Li, Yuxing Han, Jiangtao Wen, Yong-Jin Liu
CVPR 2023
## Introduction
AR-Seg is an efficient video semantic segmentation framework for compressed videos. It consists of an HR (high-resolution) branch for keyframes and an LR (low-resolution) branch for non-keyframes.
We design a Cross Resolution Feature Fusion (CReFF) module and a Feature Similarity Training (FST) strategy to compensate for the accuracy drop caused by the reduced resolution.
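The two-branch idea can be sketched in a few lines. This is a minimal toy illustration, not the paper's actual CReFF implementation: the fusion here is a plain average of aligned features, and all function names are hypothetical.

```python
import numpy as np

def downsample(img, scale):
    # Naive strided downsampling as a stand-in for bilinear resizing.
    step = int(round(1.0 / scale))
    return img[::step, ::step]

def upsample(feat, out_hw):
    # Nearest-neighbour upsampling of a feature map to a target size.
    h, w = feat.shape[:2]
    ys = np.arange(out_hw[0]) * h // out_hw[0]
    xs = np.arange(out_hw[1]) * w // out_hw[1]
    return feat[ys][:, xs]

def fuse(lr_feat, hr_keyframe_feat):
    # Toy stand-in for CReFF: align the LR features to HR resolution and
    # blend them with the cached HR keyframe features.
    aligned = upsample(lr_feat, hr_keyframe_feat.shape[:2])
    return 0.5 * aligned + 0.5 * hr_keyframe_feat

# Frame 0 of a GOP is the keyframe (HR branch); later frames use the LR branch.
hr_frame = np.random.rand(64, 64, 3)
hr_feat = hr_frame.mean(axis=2, keepdims=True)        # pretend backbone feature
lr_frame = downsample(np.random.rand(64, 64, 3), 0.5)
lr_feat = lr_frame.mean(axis=2, keepdims=True)
fused = fuse(lr_feat, hr_feat)
print(fused.shape)  # (64, 64, 1)
```

The point of the sketch is only the data flow: non-keyframes are processed at low resolution, then their features are lifted back to HR and enriched with keyframe features before prediction.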
## Environment
### Create from Conda Config
```
conda env create -f environment.yml
conda activate AR-Seg
```
### Create with Separate Steps
```
conda create -n AR-Seg python=3.6
conda activate AR-Seg
conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=11.0 -c pytorch
pip install -r requirements.txt
```
## Dataset & Pre-processing
Please refer to the [documentation](./pre-process/README.md).
## Evaluation
We provide sample code, checkpoints and processed data for evaluating AR-Seg on the CamVid and Cityscapes datasets.
### Checkpoints
Please download the checkpoints from [TsinghuaCloud](https://cloud.tsinghua.edu.cn/f/bb4bedf8c7af4ec8a5b2/) / [GoogleDrive](https://drive.google.com/file/d/1u3CUNoRRDi1V1Y4b5Hv8gFwYGdKjXJtp/view?usp=share_link), then unzip them into the `./checkpoints/` directory. After unzipping, the directory structure should look like `./checkpoints/camvid-bise18/HR/`.
We release checkpoints trained on CamVid for LR branch resolutions ranging from 0.3x to 0.9x. For Cityscapes, we release the checkpoints trained for the 0.5x LR resolution.
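For reference, the LR input sizes implied by these scales can be computed directly. This assumes CamVid's native 960x720 resolution and simple rounding; the actual resize logic in the code base may differ.

```python
# Map each released LR-branch scale to the input size it implies for CamVid.
HR_W, HR_H = 960, 720  # CamVid frame size (assumption for this sketch)
for scale in (0.3, 0.5, 0.7, 0.9):
    print(f"{scale}x -> {round(HR_W * scale)}x{round(HR_H * scale)}")
```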
### Processed Data
You can pre-process the CamVid and Cityscapes datasets following the instructions in the [documentation](./pre-process/README.md) and place the processed data under `./data/`.
Alternatively, you can download our example processed CamVid data from [TsinghuaCloud](https://cloud.tsinghua.edu.cn/d/f358201e9ac14c4e801a/) / [GoogleDrive](https://drive.google.com/drive/folders/1EMDyP59-WE2OK8FqYFY_f3SAd7PgZ7Ld?usp=share_link), then unzip it into the `./data/` directory. After unzipping, the directory structure should look like `./data/camvid-sequence/3M-GOP12`.
### Run the Evaluation Script
You can run the evaluation script with different backbones on different datasets.
```
python evaluation.py --dataset [camvid(default) or cityscapes] --backbone [psp18(default) or bise18] --mode [1 1 1(default) or 0 0 1 or etc.]
```
For example, if you want to evaluate the HR branch performance with BiseNet-18 on CamVid, you can run the script below.
```
python evaluation.py --dataset camvid --backbone bise18 --mode 1 0 0
```
### Check the Evaluation Results
The evaluation results will be stored under `./evaluation-result`. Each file contains $L$ $mIoU_d$ values, one for each reference distance $d$ from $0$ to $L-1$, followed by the average $mIoU$ in the last row. In the example case, $L=12$.
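The result files can be read back with a few lines of Python. Note the exact file layout here is an assumption based on the description above: one $mIoU_d$ value per line, with the average appended as the final line.

```python
from io import StringIO

# Synthetic stand-in for one result file: 12 per-distance mIoU values
# (d = 0..11) followed by their average in the last row.
sample = StringIO("\n".join(
    [f"{0.70 + 0.001 * d:.4f}" for d in range(12)] + ["0.7055"]
))

values = [float(line) for line in sample]
miou_per_distance, avg_miou = values[:-1], values[-1]
print(f"L = {len(miou_per_distance)}, avg mIoU = {avg_miou:.4f}")
```

For a real run, replace `StringIO(...)` with `open("./evaluation-result/<file>")`.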
## Training
### Soft Link of the Processed Dataset
CamVid:
```
ln -s camvid_root ./data/CamVid
ln -s camvid_sequence_root ./data/camvid-sequence
```
Note that `camvid_root` and `camvid_sequence_root` are the same paths you set when processing the dataset following the [documentation](./pre-process/README.md).
Cityscapes:
```
ln -s cityscapes_root ./data/cityscapes
ln -s cityscapes_root/leftImg8bit_sequence ./data/cityscapes-sequence
```
Note that `cityscapes_root` is the same path you set when processing the dataset following the [documentation](./pre-process/README.md).
### Phase 1: Training of the HR branch
For phase 1, you can either use a pre-trained image segmentation model or train one from scratch.
Train on CamVid:
```
## PSPNet-18
python train.py --data-path=./data/CamVid/ --models-path=./exp/pspnet18-camvid/scale1.0_epoch100_pure --backend='resnet18' --batch-size=8 --epochs=100 --scale=1.0 --gpu=4
## BiseNet-18
python train.py --data-path=./data/CamVid/ --models-path=./exp/bisenet18-camvid/scale1.0_epoch100_pure --backend='resnet18' --batch-size=8 --epochs=100 --scale=1.0 --gpu=7 --model_type=bisenet
```
Train on Cityscapes:
```
## PSPNet-18
python train.py --data-path=./data/cityscapes --models-path=./exp/pspnet18-cityscapes/scale1.0_epoch200_pure_bs8_0.5-2.0-aug-512x1024-lr-0.01-semsegPSP --backend='resnet18' --batch-size=8 --epochs=200 --scale=1.0 --gpu=4 --start-lr=0.01 --model_type=pspnet --dataset=cityscapes
## For BiseNet-18, we directly use a pretrained model and convert its format.
```
### Phase 2: Training of the LR branch
Train on CamVid:
```
## PSPNet-18
python train_pair.py --data-path=./data/CamVid/ --sequence-path=./data/camvid-sequence --models-path=./exp/pspnet18-camvid/paper/camvid-psp18-scale0.5-3M-GOP12-30fps/ --backend='resnet18' --batch-size=8 --epochs=100 --scale=0.5 --gpu=0,1 --feat_loss=mse --stage1_epoch=50 --ref_gap=12 --with_motion=1
## BiseNet-18
python train_pair.py --data-path=./data/CamVid/ --sequence-path=./data/camvid-sequence --models-path=./exp/bisenet18-camvid/paper/camvid-bise18-scale0.5-3M-GOP12-30fps/ --backend='resnet18' --batch-size=8 --epochs=100 --scale=0.5 --gpu=0 --feat_loss=mse --stage1_epoch=50 --ref_gap=12 --with_motion=1 --model_type=bisenet
```
Train on Cityscapes:
```
## PSPNet-18
python convert_model_for_cityscapes.py --backbone psp18
python train_pair.py --data-path=./data/cityscapes --sequence-path=./data/cityscapes-sequence --models-path=./exp/pspnet18-cityscapes/paper/cityscapes-psp18-scale0.5-5M-GOP12-30fps_0.01_epoch200-semseg-auxLoss/ --backend='resnet18' --batch-size=8 --epochs=200 --scale=0.5 --gpu=1,2 --feat_loss=mse --stage1_epoch=0 --ref_gap=12 --with_motion=1 --model_type=pspnet --start-lr=0.01 --dataset=cityscapes --bitrate=5
## BiseNet-18
python convert_model_for_cityscapes.py --backbone bise18
python train_pair.py --data-path=./data/cityscapes --sequence-path=./data/cityscapes-sequence --models-path=./exp/bisenet18-cityscapes/paper/cityscapes-bise18-scale0.5-5M-GOP12-30fps_0.01_epoch200 --backend='resnet18' --batch-size=16 --epochs=200 --scale=0.5 --gpu=2 --feat_loss=mse --start-lr=0.01 --stage1_epoch=0 --ref_gap=12 --with_motion=1 --model_type=bisenet --dataset=cityscapes --bitrate=5
```
To train on the Cityscapes dataset, please first download the initialization checkpoints of BiseNet from [TsinghuaCloud](https://cloud.tsinghua.edu.cn/f/fa77fe3f16d04e57bc7b/) / [GoogleDrive](https://drive.google.com/file/d/1chFwwhlpvhb3IIWxvR5p06F5den7qPnP/view?usp=share_link), then unzip them into the `./cityscapes_pretrained/` directory.
## Citation
```
@InProceedings{Hu_2023_CVPR,
author = {Hu, Yubin and He, Yuze and Li, Yanghao and Li, Jisheng and Han, Yuxing and Wen, Jiangtao and Liu, Yong-Jin},
title = {Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2023},
pages = {22627-22637}
}
```