# AR-Seg **Repository Path**: lulululala/AR-Seg ## Basic Information - **Project Name**: AR-Seg - **Description**: No description available - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2025-02-11 - **Last Updated**: 2025-02-11 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # AR-Seg [[`Paper`](https://openaccess.thecvf.com/content/CVPR2023/html/Hu_Efficient_Semantic_Segmentation_by_Altering_Resolutions_for_Compressed_Videos_CVPR_2023_paper.html)] [[`Video`](https://www.youtube.com/watch?v=WN9ok0xd0po)] [[`BibTeX`](#citation)] > Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos
[Yubin Hu](https://github.com/AlbertHuyb), [Yuze He](https://github.com/hyz317/), Yanghao Li, Jisheng Li, Yuxing Han, Jiangtao Wen, Yong-Jin Liu
CVPR 2023 ## Introduction AR-Seg is an efficient video semantic segmentation framework for compressed videos. It consists of an HR branch for keyframes and an LR branch for non-keyframes.

We design a Cross Resolution Feature Fusion (CReFF) module and a Feature Similarity Training (FST) strategy to compensate for the performance drop because of low-resolution.

## Environment ### Create from Conda Config ``` conda env create -f environment.yml conda activate AR-Seg ``` ### Create with Separate Steps ``` conda create -n AR-Seg python=3.6 conda activate AR-Seg conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=11.0 -c pytorch pip install -r requirements.txt ``` ## Dataset & Pre-processing Please refer to the [documentation](./pre-process/README.md). ## Evaluation We provide the sample code, checkpoints and processed data for evaluting AR-Seg on CamVid and Cityscapes datasets. ### Checkpoints Please download the checkpoints from [TsinghuaCloud](https://cloud.tsinghua.edu.cn/f/bb4bedf8c7af4ec8a5b2/) / [GoogleDrive](https://drive.google.com/file/d/1u3CUNoRRDi1V1Y4b5Hv8gFwYGdKjXJtp/view?usp=share_link). And then unzip the files into directory `./checkpoints/`. After unzipping, the directory structure should look like `./checkpoints/camvid-bise18/HR/`. We release the checkpoints trained on CamVid for different LR branch resolutions, ranging from 0.3x to 0.9x. As for Cityscapes, we release the checkpoints trained for 0.5x LR resolution. ### Processed Data You can pre-process the CamVid and Cityscapes dataset following the instructions in [documentation](./pre-process/README.md), and then place the processed data under `./data/`. Or you can download our example processed data of CamVid from [TsinghuaCloud](https://cloud.tsinghua.edu.cn/d/f358201e9ac14c4e801a/) / [GoogleDrive](https://drive.google.com/drive/folders/1EMDyP59-WE2OK8FqYFY_f3SAd7PgZ7Ld?usp=share_link). And then unzip the files into directory `./data/`. After unzipping, the directory structure should look like `./data/camvid-sequence/3M-GOP12`. ### Run the Evaluation Script You can run the evaluation script with different backbones on different datasets. ``` python evaluation.py --dataset [camvid(default) or cityscapes] --backbone [psp18(default) or bise18] --mode [1 1 1(default) or 0 0 1 or etc.] ``` For example, if you want to evaluate the HR branch performance with BiseNet-18 on CamVid, you can run the script below. ``` python evaluation.py --dataset camvid --backbone bise18 --mode 1 0 0 ``` ### Check the Evaluation Results The evaluation results will be stored under `./evaluation-result`. Each file contains $L$ $mIoU_d$ values for each reference distance, ranging from 0 to $L-1$, and the average $mIoU$ in the last row. In the example case, we have $L=12$. ## Training ### Soft Link of the Processed Dataset CamVid: ``` ln -s camvid_root ./data/CamVid ln -s camvid_sequence_root ./data/camvid-sequence ``` Note that the `camvid_root` and `camvid_sequence_root` is the same to the one you set when processing the dataset following [documentation](./pre-process/README.md). Cityscapes: ``` ln -s cityscapes_root ./data/cityscapes ln -s cityscapes_root/leftImg8bit_sequence ./data/cityscapes-sequence ``` Note that the `cityscapes_root` is the same to the one you set when processing the dataset following [documentation](./pre-process/README.md). ### Phase 1: Training of the HR branch For phase 1, you can use a pre-trained image segmentation model or train an image segmentation model from scratch. Train on CamVid: ``` ## PSPNet-18 python train.py --data-path=./data/CamVid/ --models-path=./exp/pspnet18-camvid/scale1.0_epoch100_pure --backend='resnet18' --batch-size=8 --epochs=100 --scale=1.0 --gpu=4 ## BiseNet-18 python train.py --data-path=./data/CamVid/ --models-path=./exp/bisenet18-camvid/scale1.0_epoch100_pure --backend='resnet18' --batch-size=8 --epochs=100 --scale=1.0 --gpu=7 --model_type=bisenet ``` Train on Cityscapes: ``` ## PSPNet-18 python train.py --data-path=./data/cityscapes --models-path=./exp/pspnet18-cityscapes/scale1.0_epoch200_pure_bs8_0.5-2.0-aug-512x1024-lr-0.01-semsegPSP --backend='resnet18' --batch-size=8 --epochs=200 --scale=1.0 --gpu=4 --start-lr=0.01 --model_type=pspnet --dataset=cityscapes ## For BiseNet18, we directy use a pretrained model and convert its format. ``` ### Phase 2: Training of the LR branch Train on CamVid: ``` ## PSPNet-18 python train_pair.py --data-path=./data/CamVid/ --sequence-path=./data/camvid-sequence --models-path=./exp/pspnet18-camvid/paper/camvid-psp18-scale0.5-3M-GOP12-30fps/ --backend='resnet18' --batch-size=8 --epochs=100 --scale=0.5 --gpu=0,1 --feat_loss=mse --stage1_epoch=50 --ref_gap=12 --with_motion=1 ## BiseNet-18 python train_pair.py --data-path=./data/CamVid/ --sequence-path=./data/camvid-sequence --models-path=./exp/bisenet18-camvid/paper/camvid-bise18-scale0.5-3M-GOP12-30fps/ --backend='resnet18' --batch-size=8 --epochs=100 --scale=0.5 --gpu=0 --feat_loss=mse --stage1_epoch=50 --ref_gap=12 --with_motion=1 --model_type=bisenet ``` Train on Cityscapes: ``` ## PSPNet-18 python convert_model_for_cityscapes.py --backbone psp18 python train_pair.py --data-path=./data/cityscapes --sequence-path=./data/cityscapes-sequence --models-path=./exp/pspnet18-cityscapes/paper/cityscapes-psp18-scale0.5-5M-GOP12-30fps_0.01_epoch200-semseg-auxLoss/ --backend='resnet18' --batch-size=8 --epochs=200 --scale=0.5 --gpu=1,2 --feat_loss=mse --stage1_epoch=0 --ref_gap=12 --with_motion=1 --model_type=pspnet --start-lr=0.01 --dataset=cityscapes --bitrate=5 ## BiseNet-18 python convert_model_for_cityscapes.py --backbone bise18 python train_pair.py --data-path=./data/cityscapes --sequence-path=./data/cityscapes-sequence --models-path=./exp/bisenet18-cityscapes/paper/cityscapes-bise18-scale0.5-5M-GOP12-30fps_0.01_epoch200 --backend='resnet18' --batch-size=16 --epochs=200 --scale=0.5 --gpu=2 --feat_loss=mse --start-lr=0.01 --stage1_epoch=0 --ref_gap=12 --with_motion=1 --model_type=bisenet --dataset=cityscapes --bitrate=5 ``` If you want to train on the Cityscapes dataset, please download the initialization checkpoints of BiseNet from [TsinghuaCloud](https://cloud.tsinghua.edu.cn/f/fa77fe3f16d04e57bc7b/) / [GoogleDrive](https://drive.google.com/file/d/1chFwwhlpvhb3IIWxvR5p06F5den7qPnP/view?usp=share_link). And then unzip the files into directory `./cityscapes_pretrained/`. ## Citation ``` @InProceedings{Hu_2023_CVPR, author = {Hu, Yubin and He, Yuze and Li, Yanghao and Li, Jisheng and Han, Yuxing and Wen, Jiangtao and Liu, Yong-Jin}, title = {Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2023}, pages = {22627-22637} } ```