# PdPaper-1
**Repository Path**: linstcl/PdPaper-1
## Basic Information
- **Project Name**: PdPaper-1
- **Description**: No description, website, or topics provided.
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: paddle
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-07-27
- **Last Updated**: 2022-07-28
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# TResNet: High Performance GPU-Dedicated Architecture
> From PaddlePaddle, AI Studio@Baidu
[paperV2](https://arxiv.org/pdf/2003.13630.pdf) |
[pretrained models](MODEL_ZOO.md)
PaddlePaddle reimplementation, based on the official PyTorch implementation.
> Tal Ridnik, Hussam Lawen, Asaf Noy, Itamar Friedman, Emanuel Ben Baruch, Gilad Sharir
> DAMO Academy, Alibaba Group
**Abstract**
> Many deep learning models, developed in recent years, reach higher
> ImageNet accuracy than ResNet50, with fewer or comparable FLOPS count.
> While FLOPs are often seen as a proxy for network efficiency, when
> measuring actual GPU training and inference throughput, vanilla
> ResNet50 is usually significantly faster than its recent competitors,
> offering better throughput-accuracy trade-off. In this work, we
> introduce a series of architecture modifications that aim to boost
> neural networks' accuracy, while retaining their GPU training and
> inference efficiency. We first demonstrate and discuss the bottlenecks
> induced by FLOPs-optimizations. We then suggest alternative designs
> that better utilize GPU structure and assets. Finally, we introduce a
> new family of GPU-dedicated models, called TResNet, which achieve
> better accuracy and efficiency than previous ConvNets. Using a TResNet
> model, with similar GPU throughput to ResNet50, we reach 80.8%
> top-1 accuracy on ImageNet. Our TResNet models also transfer well and
> achieve state-of-the-art accuracy on competitive datasets such as
> Stanford Cars (96.0%), CIFAR-10 (99.0%), CIFAR-100 (91.5%) and
> Oxford-Flowers (99.1%). They also perform well on multi-label classification and object detection tasks.
## Main Article Results
#### TResNet Models
TResNet models accuracy and GPU throughput on ImageNet, compared to ResNet50. All measurements were done on Nvidia V100 GPU, with mixed precision. All models are trained on input resolution of 224.
| Model | Top Training Speed (img/sec) | Top Inference Speed (img/sec) | Max Train Batch Size | Top-1 Acc. (%) |
|---|---|---|---|---|
| ResNet50 | 805 | 2830 | 288 | 79.0 |
| EfficientNetB1 | 440 | 2740 | 196 | 79.2 |
| TResNet-M | 730 | 2930 | 512 | 80.8 |
| TResNet-L | 345 | 1390 | 316 | 81.5 |
| TResNet-XL | 250 | 1060 | 240 | 82.0 |
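
Throughput (img/sec) figures like those above are typically obtained by timing repeated forward passes over a fixed batch, after a few warm-up iterations. The sketch below shows the general measurement pattern only; the `measure_throughput` helper and the dummy model are illustrative and not part of this repository, and the paper's numbers were measured on a V100 GPU with mixed precision, which this CPU-only sketch does not replicate.

```python
import time

def measure_throughput(model, batch, batch_size, n_warmup=3, n_iters=10):
    """Return images/sec for `model` over repeated forward passes.

    Hypothetical harness: warm-up runs are excluded from timing so that
    one-time setup costs do not distort the steady-state rate.
    """
    for _ in range(n_warmup):          # warm-up, not timed
        model(batch)
    start = time.perf_counter()
    for _ in range(n_iters):           # timed forward passes
        model(batch)
    elapsed = time.perf_counter() - start
    return n_iters * batch_size / elapsed

# Stand-in "model": any callable taking a batch; here a trivial function.
dummy_model = lambda batch: [sum(x) for x in batch]
batch = [[1.0] * 64 for _ in range(32)]   # a batch of 32 fake "images"
ips = measure_throughput(dummy_model, batch, batch_size=32)
print(f"{ips:.0f} img/sec")
```

With a real framework, the callable would be the network's forward pass and, on GPU, a device synchronization would be needed before reading the clock.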

#### Comparison to Other Networks

| Model | Top Training Speed (img/sec) | Top Inference Speed (img/sec) | Top-1 Acc. (%) | FLOPs [G] |
|---|---|---|---|---|
| ResNet50 | 805 | 2830 | 79.0 | 4.1 |
| ResNet50-D | 600 | 2670 | 79.3 | 4.4 |
| ResNeXt50 | 490 | 1940 | 79.4 | 4.3 |
| EfficientNetB1 | 440 | 2740 | 79.2 | 0.6 |
| SEResNeXt50 | 400 | 1770 | 79.9 | 4.3 |
| MixNet-L | 400 | 1400 | 79.0 | 0.5 |
| TResNet-M | 730 | 2930 | 80.8 | 5.5 |
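
The FLOPs column counts arithmetic work, not wall-clock speed: note that MixNet-L has roughly a tenth of TResNet-M's FLOPs yet about half its training throughput. Per-layer FLOPs for a convolution follow the standard estimate of two operations per multiply-accumulate; the helper below is an illustrative sketch (not from this repo), and reported model totals sum this over all layers.

```python
def conv2d_flops(c_in, c_out, kernel, h_out, w_out):
    """Estimated FLOPs of one 2D convolution layer.

    Illustrative formula: one multiply-accumulate per (input channel x
    output channel x kernel element x output pixel), counted as 2 FLOPs.
    """
    macs = c_in * c_out * kernel * kernel * h_out * w_out
    return 2 * macs

# Example: a 3x3 conv, 64 -> 64 channels, on a 56x56 output feature map.
flops = conv2d_flops(64, 64, 3, 56, 56)
print(flops / 1e9, "GFLOPs")
```

Because GPUs execute wide, regular convolutions far more efficiently than fragmented depthwise/grouped ones, two models with similar FLOPs totals can differ greatly in measured img/sec, which is the gap the table illustrates.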
#### Transfer Learning Results

| Dataset | Model | Top-1 Acc. (%) | Speed (img/sec) | Input Size |
|---|---|---|---|---|
| CIFAR-10 | Gpipe | 99.0 | - | 480 |
| CIFAR-10 | TResNet-XL | 99.0 | 1060 | 224 |
| CIFAR-100 | EfficientNet-B7 | 91.7 | 70 | 600 |
| CIFAR-100 | TResNet-XL | 91.5 | 1060 | 224 |
| Stanford Cars | EfficientNet-B7 | 94.7 | 70 | 600 |
| Stanford Cars | TResNet-L | 96.0 | 500 | 368 |
| Oxford-Flowers | EfficientNet-B7 | 98.8 | 70 | 600 |
| Oxford-Flowers | TResNet-L | 99.1 | 500 | 368 |