# PdPaper-1
**Repository Path**: linstcl/PdPaper-1
## Basic Information
- **Project Name**: PdPaper-1
- **Description**: No description, website, or topics provided.
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: paddle
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-07-27
- **Last Updated**: 2022-07-28
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# TResNet: High Performance GPU-Dedicated Architecture
> From PaddlePaddle, AI Studio@Baidu
[paperV2](https://arxiv.org/pdf/2003.13630.pdf) |
[pretrained models](MODEL_ZOO.md)
PaddlePaddle reimplementation, based on the official PyTorch implementation.
> Tal Ridnik, Hussam Lawen, Asaf Noy, Itamar Friedman, Emanuel Ben Baruch, Gilad Sharir
> DAMO Academy, Alibaba Group
**Abstract**
> Many deep learning models, developed in recent years, reach higher
> ImageNet accuracy than ResNet50, with fewer or comparable FLOPS count.
> While FLOPs are often seen as a proxy for network efficiency, when
> measuring actual GPU training and inference throughput, vanilla
> ResNet50 is usually significantly faster than its recent competitors,
> offering better throughput-accuracy trade-off. In this work, we
> introduce a series of architecture modifications that aim to boost
> neural networks' accuracy, while retaining their GPU training and
> inference efficiency. We first demonstrate and discuss the bottlenecks
> induced by FLOPs-optimizations. We then suggest alternative designs
> that better utilize GPU structure and assets. Finally, we introduce a
> new family of GPU-dedicated models, called TResNet, which achieve
> better accuracy and efficiency than previous ConvNets. Using a TResNet
> model, with similar GPU throughput to ResNet50, we reach 80.8%
> top-1 accuracy on ImageNet. Our TResNet models also transfer well and
> achieve state-of-the-art accuracy on competitive datasets such as
> Stanford Cars (96.0%), CIFAR-10 (99.0%), CIFAR-100 (91.5%) and
> Oxford-Flowers (99.1%). They also perform well on multi-label classification and object detection tasks.
## Main Article Results
#### TResNet Models
TResNet models accuracy and GPU throughput on ImageNet, compared to ResNet50. All measurements were done on Nvidia V100 GPU, with mixed precision. All models are trained on input resolution of 224.
| Model | Top Training Speed (img/sec) | Top Inference Speed (img/sec) | Max Train Batch Size | Top-1 Acc. (%) |
|---|---|---|---|---|
| ResNet50 | 805 | 2830 | 288 | 79.0 |
| EfficientNetB1 | 440 | 2740 | 196 | 79.2 |
| TResNet-M | 730 | 2930 | 512 | 80.8 |
| TResNet-L | 345 | 1390 | 316 | 81.5 |
| TResNet-XL | 250 | 1060 | 240 | 82.0 |
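
Throughput (img/sec) figures like those above are typically obtained by timing repeated forward passes over a fixed batch, after a few warm-up iterations. The sketch below shows the general measurement pattern only; the `measure_throughput` helper and the dummy model are illustrative and not part of this repository, and the paper's numbers were measured on a V100 GPU with mixed precision, which this CPU-only sketch does not replicate.

```python
import time

def measure_throughput(model, batch, batch_size, n_warmup=3, n_iters=10):
    """Return images/sec for `model` over repeated forward passes.

    Hypothetical harness: warm-up runs are excluded from timing so that
    one-time setup costs do not distort the steady-state rate.
    """
    for _ in range(n_warmup):          # warm-up, not timed
        model(batch)
    start = time.perf_counter()
    for _ in range(n_iters):           # timed forward passes
        model(batch)
    elapsed = time.perf_counter() - start
    return n_iters * batch_size / elapsed

# Stand-in "model": any callable taking a batch; here a trivial function.
dummy_model = lambda batch: [sum(x) for x in batch]
batch = [[1.0] * 64 for _ in range(32)]   # a batch of 32 fake "images"
ips = measure_throughput(dummy_model, batch, batch_size=32)
print(f"{ips:.0f} img/sec")
```

With a real framework, the callable would be the network's forward pass and, on GPU, a device synchronization would be needed before reading the clock.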

#### Comparison to Other Networks

| Model | Top Training Speed (img/sec) | Top Inference Speed (img/sec) | Top-1 Acc. (%) | FLOPs [G] |
|---|---|---|---|---|
| ResNet50 | 805 | 2830 | 79.0 | 4.1 |
| ResNet50-D | 600 | 2670 | 79.3 | 4.4 |
| ResNeXt50 | 490 | 1940 | 79.4 | 4.3 |
| EfficientNetB1 | 440 | 2740 | 79.2 | 0.6 |
| SEResNeXt50 | 400 | 1770 | 79.9 | 4.3 |
| MixNet-L | 400 | 1400 | 79.0 | 0.5 |
| TResNet-M | 730 | 2930 | 80.8 | 5.5 |
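
The FLOPs column counts arithmetic work, not wall-clock speed: note that MixNet-L has roughly a tenth of TResNet-M's FLOPs yet about half its training throughput. Per-layer FLOPs for a convolution follow the standard estimate of two operations per multiply-accumulate; the helper below is an illustrative sketch (not from this repo), and reported model totals sum this over all layers.

```python
def conv2d_flops(c_in, c_out, kernel, h_out, w_out):
    """Estimated FLOPs of one 2D convolution layer.

    Illustrative formula: one multiply-accumulate per (input channel x
    output channel x kernel element x output pixel), counted as 2 FLOPs.
    """
    macs = c_in * c_out * kernel * kernel * h_out * w_out
    return 2 * macs

# Example: a 3x3 conv, 64 -> 64 channels, on a 56x56 output feature map.
flops = conv2d_flops(64, 64, 3, 56, 56)
print(flops / 1e9, "GFLOPs")
```

Because GPUs execute wide, regular convolutions far more efficiently than fragmented depthwise/grouped ones, two models with similar FLOPs totals can differ greatly in measured img/sec, which is the gap the table illustrates.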
#### Transfer Learning Results

| Dataset | Model | Top-1 Acc. (%) | Speed (img/sec) | Input Size |
|---|---|---|---|---|
| CIFAR-10 | Gpipe | 99.0 | - | 480 |
| CIFAR-10 | TResNet-XL | 99.0 | 1060 | 224 |
| CIFAR-100 | EfficientNet-B7 | 91.7 | 70 | 600 |
| CIFAR-100 | TResNet-XL | 91.5 | 1060 | 224 |
| Stanford Cars | EfficientNet-B7 | 94.7 | 70 | 600 |
| Stanford Cars | TResNet-L | 96.0 | 500 | 368 |
| Oxford-Flowers | EfficientNet-B7 | 98.8 | 70 | 600 |
| Oxford-Flowers | TResNet-L | 99.1 | 500 | 368 |