# RTM3D

**Repository Path**: birdflyto/RTM3D

## Basic Information

- **Project Name**: RTM3D
- **Description**: The official PyTorch Implementation of RTM3D and KM3D for Monocular 3D Object Detection
- **License**: MIT
- **Default Branch**: master

## Statistics

- **Created**: 2021-11-28
- **Last Updated**: 2021-11-28

## README

## RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving
## Monocular 3D Detection with Geometric Constraints Embedding and Semi-supervised Training (KM3D)

RTM3D (ECCV 2020) and KM3D (also called RTM3D++) are efficient and accurate monocular 3D object detection methods for autonomous driving. We replaced the post-processing of RTM3D with KM3D's Geometric Reasoning Module (GRM) to increase inference speed.

[**KM3D**](https://arxiv.org/abs/2009.00764), [**RTM3D**](https://arxiv.org/abs/2001.03343)

## Introduction

RTM3D is a novel one-stage, keypoint-based framework for monocular 3D object detection. RTM3D is the first real-time system (FPS > 24) for monocular image 3D detection while achieving state-of-the-art performance on the KITTI benchmark. KM3D reformulates the geometric constraints in a differentiable form and embeds them into the network to reduce running time while maintaining the consistency of model outputs in an end-to-end fashion. KM3D achieves 46 FPS and SOTA performance on the KITTI benchmark. RTM3D and KM3D require only RGB images, without synthetic data, instance segmentation, CAD models, or a depth generator.

## Highlights

- **Fast:** 47 FPS single-image test speed on the KITTI benchmark at 384×1280 resolution.
- **Accuracy:** SOTA on the KITTI benchmark.
- **Anchor Free:** No 2D or 3D anchors are required.
- **Differentiable geometric reasoning module:** Improves running efficiency and jointly optimizes the network outputs.
  Combines the strengths of both CNNs and geometric constraints.
- **Easy to deploy:** RTM3D and KM3D use only conventional convolution and upsampling operations, and the geometry module only needs to solve an SVD, so they are easy to deploy and accelerate.

## KM3D Baseline and Model Zoo

All experiments are tested with Ubuntu 16.04, PyTorch 1.0.0, CUDA 9.0, Python 3.6, and a single NVIDIA 1080Ti.

IoU Setting 1: Car IoU > 0.5, Pedestrian IoU > 0.25, Cyclist IoU > 0.25

IoU Setting 2: Car IoU > 0.7, Pedestrian IoU > 0.5, Cyclist IoU > 0.5

- Training on KITTI train split and evaluation on val split.
- Backbone: ResNet-18
- FPS: 46.7
- Model: ([Google Drive](https://drive.google.com/file/d/14ww6mxtitO9aDszZN3ai8N7U1doehvi8/view?usp=sharing)), ([Baidu Cloud](https://pan.baidu.com/s/1zt-O6UzcBVGF-6vg5LzGpA), extraction code: 60ks)

| Class | AP BEV IoU Setting 1 | AP 3D IoU Setting 1 | AP BEV IoU Setting 2 | AP 3D IoU Setting 2 |
| :----: | :----: | :----: | :----: | :----: |
| - | Easy, Moderate, Hard | Easy, Moderate, Hard | Easy, Moderate, Hard | Easy, Moderate, Hard |
| Car | 55.65, 40.95, 35.61 | 49.10, 35.75, 32.27 | 23.83, 17.94, 16.98 | 17.51, 13.99, 12.73 |
| Pedestrian | 22.35, 18.50, 17.64 | 21.68, 18.13, 16.95 | 4.50, 3.87, 3.92 | 3.62, 3.75, 3.03 |
| Cyclist | 21.25, 15.12, 14.80 | 21.04, 14.77, 14.65 | 10.70, 9.09, 9.09 | 10.01, 9.09, 9.09 |

- Training on KITTI train split and evaluation on val split.
- Backbone: DLA-34
- FPS: 28.6
- Model: ([Google Drive](https://drive.google.com/file/d/16IjRxXtGfS1eDv9IeDZkJUUjx4olEYnK/view?usp=sharing)), ([Baidu Cloud](https://pan.baidu.com/s/1pjr-WDY256xBBusULjqL8A), extraction code: 1h6s)

| Class | AP BEV IoU Setting 1 | AP 3D IoU Setting 1 | AP BEV IoU Setting 2 | AP 3D IoU Setting 2 |
| :----: | :----: | :----: | :----: | :----: |
| - | Easy, Moderate, Hard | Easy, Moderate, Hard | Easy, Moderate, Hard | Easy, Moderate, Hard |
| Car | 60.98, 45.74, 42.93 | 54.97, 42.68, 36.95 | 25.96, 21.88, 18.88 | 19.19, 16.70, 16.14 |
| Pedestrian | 30.38, 26.09, 23.80 | 28.63, 25.09, 20.14 | 11.55, 11.23, 10.76 | 11.37, 10.85, 10.11 |
| Cyclist | 28.69, 18.77, 18.03 | 27.68, 18.30, 17.74 | 9.67, 6.12, 6.21 | 9.14, 5.97, 5.86 |

- Training on KITTI train split with right-image augmentation and evaluation on val split.
- Backbone: ResNet-18
- FPS: 46.7
- Model: ([Google Drive](https://drive.google.com/file/d/1svqj6ef79bzkiwuNIzpiLw_inDjJnSUZ/view?usp=sharing)), ([Baidu Cloud](https://pan.baidu.com/s/1gcAe2t3vmtWaST3tZPHUrg), extraction code: sr23)

| Class | AP BEV IoU Setting 1 | AP 3D IoU Setting 1 | AP BEV IoU Setting 2 | AP 3D IoU Setting 2 |
| :----: | :----: | :----: | :----: | :----: |
| - | Easy, Moderate, Hard | Easy, Moderate, Hard | Easy, Moderate, Hard | Easy, Moderate, Hard |
| Car | 53.79, 39.83, 34.86 | 47.54, 34.97, 31.77 | 25.03, 18.53, 17.45 | 17.50, 14.06, 12.62 |
| Pedestrian | 23.15, 19.29, 18.25 | 22.33, 18.84, 17.63 | 6.21, 6.13, 5.53 | 5.19, 5.32, 4.55 |
| Cyclist | 19.49, 12.43, 12.28 | 19.53, 12.43, 12.28 | 10.77, 9.58, 9.59 | 10.33, 9.09, 9.09 |

- Training on KITTI train split with right-image augmentation and evaluation on val split.
- Backbone: DLA-34
- FPS: 28.6
- Model: ([Google Drive](https://drive.google.com/file/d/1oVroM_VOdxvR4qkWe40T2rtahhA795h0/view?usp=sharing)), ([Baidu Cloud](https://pan.baidu.com/s/1rT46n6fajVQ_19gtkaXU4w), extraction code: qqk6)

| Class | AP BEV IoU Setting 1 | AP 3D IoU Setting 1 | AP BEV IoU Setting 2 | AP 3D IoU Setting 2 |
| :----: | :----: | :----: | :----: | :----: |
| - | Easy, Moderate, Hard | Easy, Moderate, Hard | Easy, Moderate, Hard | Easy, Moderate, Hard |
| Car | 63.23, 50.35, 44.56 | 59.10, 44.23, 38.04 | 30.05, 23.07, 21.86 | 22.29, 17.45, 16.86 |
| Pedestrian | 32.42, 27.20, 21.51 | 31.86, 26.75, 21.33 | 14.73, 12.54, 11.74 | 12.92, 11.62, 11.06 |
| Cyclist | 34.64, 21.98, 22.07 | 34.01, 21.73, 19.68 | 16.89, 11.18, 10.24 | 14.35, 9.42, 9.25 |

## Installation

Please refer to [INSTALL.md](readme/INSTALL.md)

## Dataset preparation

Please download the official [KITTI 3D object detection](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d) dataset and organize the downloaded files as follows:

```
KM3DNet
├── kitti_format
│   ├── data
│   │   ├── kitti
│   │   │   ├── annotations
│   │   │   ├── calib /000000.txt .....
│   │   │   ├── image (left [0-7480], right [7481-14961] input augmentation)
│   │   │   ├── label /000000.txt .....
│   │   │   ├── train.txt val.txt trainval.txt
├── src
├── demo_kitti_format
├── readme
├── requirements.txt
```

## Quick Demo

Please refer to [DEMO.md](readme/DEMO.md) for a quick demo that tests a pretrained model and visualizes the predicted results on your custom data or the original KITTI data.

## Getting Started

Please refer to [GETTING_STARTED.md](readme/GETTING_STARTED.md) to learn more about using this project.

## Acknowledgement

- [**CenterNet**](https://github.com/xingyizhou/CenterNet)

## License

RTM3D and KM3D are released under the MIT License (refer to the LICENSE file for details).
Portions of the code are borrowed from [CenterNet](https://github.com/xingyizhou/CenterNet), [dla](https://github.com/ucbdrive/dla) (DLA network), [DCNv2](https://github.com/CharlesShang/DCNv2) (deformable convolutions), [iou3d](https://github.com/sshaoshuai/PointRCNN) and [kitti_eval](https://github.com/prclibo/kitti_eval) (KITTI dataset evaluation). Please refer to the original licenses of these projects (see [NOTICE](NOTICE)).

## Citation

If you find this project useful for your research, please use the following BibTeX entries.

    @misc{2009.00764,
      Author = {Peixuan Li},
      Title = {Monocular 3D Detection with Geometric Constraints Embedding and Semi-supervised Training},
      Year = {2020},
      Eprint = {arXiv:2009.00764},
    }

    @misc{2001.03343,
      Author = {Peixuan Li and Huaici Zhao and Pengfei Liu and Feidao Cao},
      Title = {RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving},
      Year = {2020},
      Eprint = {arXiv:2001.03343},
    }
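
The Highlights section notes that the geometry module only needs to solve an SVD. The following is a minimal NumPy sketch of that kind of computation; it is not code from this repository. Given projected box keypoints, box dimensions, and yaw, it recovers the 3D position by linear least squares through the SVD-based pseudo-inverse, assuming a plain pinhole camera. The function names (`box_corners`, `solve_position`, `project`), the use of the 8 box corners as keypoints, the (h, w, l) dimension order with the origin at the bottom-face center, and the example intrinsics are all assumptions made for illustration.

```python
import numpy as np


def box_corners(dims, yaw):
    """Return the 8 box corners (8x3), rotated by `yaw` about the camera Y axis.

    Assumption: dims = (h, w, l), origin at the center of the bottom face,
    camera Y axis pointing down.
    """
    h, w, l = dims
    x = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * l / 2.0
    y = np.array([0, 0, 0, 0, -1, -1, -1, -1]) * h
    z = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * w / 2.0
    corners = np.stack([x, y, z], axis=1)
    c, s = np.cos(yaw), np.sin(yaw)
    rot_y = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return corners @ rot_y.T


def project(points_3d, K):
    """Pinhole projection of Nx3 camera-frame points to Nx2 pixel coordinates."""
    uvw = points_3d @ K.T
    return uvw[:, :2] / uvw[:, 2:3]


def solve_position(keypoints_2d, dims, yaw, K):
    """Least-squares 3D position from 2D keypoints, dimensions, and yaw.

    For each rotated corner (x, y, z) and pixel (u, v), pinhole projection gives
      fx * tx - (u - cx) * tz = (u - cx) * z - fx * x
      fy * ty - (v - cy) * tz = (v - cy) * z - fy * y
    Stacking both rows for every keypoint yields A @ T = b.
    """
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    pts = box_corners(dims, yaw)
    u = keypoints_2d[:, 0] - cx
    v = keypoints_2d[:, 1] - cy

    A = np.zeros((2 * len(pts), 3))
    b = np.zeros(2 * len(pts))
    A[0::2, 0] = fx
    A[0::2, 2] = -u
    b[0::2] = u * pts[:, 2] - fx * pts[:, 0]
    A[1::2, 1] = fy
    A[1::2, 2] = -v
    b[1::2] = v * pts[:, 2] - fy * pts[:, 1]

    # np.linalg.pinv computes the pseudo-inverse through an SVD.
    return np.linalg.pinv(A) @ b


# Hypothetical intrinsics and object, used only to exercise the sketch.
K = np.array([[720.0, 0.0, 610.0], [0.0, 720.0, 173.0], [0.0, 0.0, 1.0]])
dims, yaw = (1.5, 1.6, 3.9), 0.3
true_position = np.array([2.0, 1.5, 20.0])
keypoints = project(box_corners(dims, yaw) + true_position, K)
print(solve_position(keypoints, dims, yaw, K))
```

With noise-free keypoints the recovered position matches the one used for projection up to floating-point error; with predicted keypoints, the same solve returns the least-squares fit.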