# ndt_omp

**Repository Path**: jaredjp/ndt_omp

## Basic Information

- **Project Name**: ndt_omp
- **Description**: An NDT acceleration library based on the x86 architecture and OpenMP. It requires PCL 1.7; do not install PCL 1.8 on the machine. Only RELEASE mode is supported, so select RELEASE when building with Qt.
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 2
- **Created**: 2021-11-18
- **Last Updated**: 2021-12-24

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

**Building on ARM requires modifying CMakeLists.txt**

```
#add_definitions(-std=c++11 -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2)
#set(CMAKE_CXX_FLAGS "-std=c++11 -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2")
set(CMAKE_CXX_FLAGS "-std=c++11 -fopenmp")
```

**On 18.04 systems, PCL is version 1.8 and, for reasons that are unclear, pcl_ros is missing some VTK libraries, so CMakeLists.txt must be modified**

```
# -mavx causes a lot of errors!!
add_definitions(-std=c++11 -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2)
set(CMAKE_CXX_FLAGS "-std=c++11 -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2")

# pcl 1.7 causes a segfault when it is built with debug mode
set(CMAKE_BUILD_TYPE "RELEASE")

find_package(catkin REQUIRED COMPONENTS
  roscpp
  pcl_ros
)

find_package(PCL 1.8 REQUIRED)
include_directories(${PCL_INCLUDE_DIRS})
link_directories(${PCL_LIBRARY_DIRS})
add_definitions(${PCL_DEFINITIONS})

message(STATUS "PCL_INCLUDE_DIRS:" ${PCL_INCLUDE_DIRS})
message(STATUS "PCL_LIBRARY_DIRS:" ${PCL_LIBRARY_DIRS})
message(STATUS "PCL_DEFINITIONS:" ${PCL_DEFINITIONS})
```

2019.1.10: Tried to save the NDT object to disk (because passing in the point cloud and initializing the object takes too long), but failed. Saving a C++ object like this is really a serialization problem, and the main existing serialization options are [Google's protobuf and Boost.Serialization](https://www.cnblogs.com/mfrbuaa/p/3940854.html). protobuf is efficient but too lightweight; Boost is comprehensive and supports the STL. However, PCL relies heavily on inheritance, derived classes, and virtual base classes, which makes it too hard to serialize in the standard way. [Detailed tutorial](https://blog.csdn.net/chenaqiao/article/details/48371597)

# ndt_omp

This package provides an OpenMP-boosted Normal Distributions Transform (and GICP) algorithm derived from pcl. The NDT algorithm is modified to be SSE-friendly and multi-threaded.
It can run up to 10 times faster than its original version in pcl.

### Benchmark (on Core i7-6700K)

```
$ roscd ndt_omp/data
$ rosrun ndt_omp align 251370668.pcd 251371071.pcd
--- pcl::NDT ---
single : 282.222[msec]
10times: 2921.92[msec]
fitness: 0.213937

--- pclomp::NDT (KDTREE, 1 threads) ---
single : 207.697[msec]
10times: 2059.19[msec]
fitness: 0.213937

--- pclomp::NDT (DIRECT7, 1 threads) ---
single : 139.433[msec]
10times: 1356.79[msec]
fitness: 0.214205

--- pclomp::NDT (DIRECT1, 1 threads) ---
single : 34.6418[msec]
10times: 317.03[msec]
fitness: 0.208511

--- pclomp::NDT (KDTREE, 8 threads) ---
single : 54.9903[msec]
10times: 500.51[msec]
fitness: 0.213937

--- pclomp::NDT (DIRECT7, 8 threads) ---
single : 63.1442[msec]
10times: 343.336[msec]
fitness: 0.214205

--- pclomp::NDT (DIRECT1, 8 threads) ---
single : 17.2353[msec]
10times: 100.025[msec]
fitness: 0.208511
```

Several methods for neighbor voxel search are implemented. If you select pclomp::KDTREE, the results will be exactly the same as those of the original pcl::NDT. We recommend pclomp::DIRECT7, which is faster and stable. If you need extremely fast registration, choose pclomp::DIRECT1, but it might be a bit unstable.

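To make the difference between the search methods concrete, here is a minimal, self-contained sketch of how many voxels each DIRECT variant would examine per query point. It assumes, based on the method names rather than on anything stated in this README, that DIRECT1 looks only at the voxel containing the point and DIRECT7 adds its six face-adjacent neighbors. The names `SearchMethod`, `VoxelIndex`, `voxelOf`, and `candidateVoxels` are hypothetical and are not part of the ndt_omp API.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <initializer_list>
#include <vector>

// Hypothetical names for illustration only; this is not the ndt_omp API.
enum class SearchMethod { DIRECT1, DIRECT7 };

using VoxelIndex = std::array<int, 3>;

// Map a point to the integer index of the voxel that contains it.
VoxelIndex voxelOf(double x, double y, double z, double resolution) {
  assert(resolution > 0.0);
  return {static_cast<int>(std::floor(x / resolution)),
          static_cast<int>(std::floor(y / resolution)),
          static_cast<int>(std::floor(z / resolution))};
}

// List the voxels that would be looked up for one query point.
// Assumption: DIRECT1 = containing voxel only,
//             DIRECT7 = containing voxel plus six face neighbors.
std::vector<VoxelIndex> candidateVoxels(const VoxelIndex& center,
                                        SearchMethod method) {
  std::vector<VoxelIndex> out{center};
  if (method == SearchMethod::DIRECT7) {
    for (int axis = 0; axis < 3; ++axis) {
      for (int step : {-1, 1}) {
        VoxelIndex neighbor = center;
        neighbor[axis] += step;  // move one voxel along this axis
        out.push_back(neighbor);
      }
    }
  }
  return out;
}
```

Under these assumptions, a point at (2.3, -0.4, 5.9) with a resolution of 1.0 falls in voxel (2, -1, 5); DIRECT1 would examine that one voxel and DIRECT7 would examine seven. Examining fewer voxels is cheaper, which is consistent with DIRECT1 being the fastest variant in the benchmark and with its slightly different fitness value.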
Red: target, Green: source, Blue: aligned

AERO 15:

```
--- pcl::NDT ---
single : 540.319[msec]
10times: 5289.08[msec]
fitness: 0.213937

--- pclomp::NDT (KDTREE, 1 threads) ---
single : 355.901[msec]
10times: 3518.03[msec]
fitness: 0.213937

--- pclomp::NDT (DIRECT7, 1 threads) ---
single : 287.231[msec]
10times: 2848.42[msec]
fitness: 0.214205

--- pclomp::NDT (DIRECT1, 1 threads) ---
single : 68.8867[msec]
10times: 649.162[msec]
fitness: 0.208511

--- pclomp::NDT (KDTREE, 8 threads) ---
single : 77.1235[msec]
10times: 717.681[msec]
fitness: 0.213937

--- pclomp::NDT (DIRECT7, 8 threads) ---
single : 57.8502[msec]
10times: 555.979[msec]
fitness: 0.214205

--- pclomp::NDT (DIRECT1, 8 threads) ---
single : 18.9918[msec]
10times: 149.262[msec]
fitness: 0.208511
```

After modifying CMakeLists.txt to disable SSE:

```
#add_definitions(-std=c++11 -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2)
#set(CMAKE_CXX_FLAGS "-std=c++11 -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2")
set(CMAKE_CXX_FLAGS "-std=c++11")
```

Results:

```
--- pcl::NDT ---
single : 536.016[msec]
10times: 5277.47[msec]
fitness: 0.213937

--- pclomp::NDT (KDTREE, 1 threads) ---
single : 369.782[msec]
10times: 3660.63[msec]
fitness: 0.213937

--- pclomp::NDT (DIRECT7, 1 threads) ---
single : 305.225[msec]
10times: 3013.47[msec]
fitness: 0.214205

--- pclomp::NDT (DIRECT1, 1 threads) ---
single : 71.9889[msec]
10times: 686.992[msec]
fitness: 0.208511

--- pclomp::NDT (KDTREE, 8 threads) ---
single : 96.6104[msec]
10times: 747.321[msec]
fitness: 0.213937

--- pclomp::NDT (DIRECT7, 8 threads) ---
single : 61.2044[msec]
10times: 589.004[msec]
fitness: 0.214205

--- pclomp::NDT (DIRECT1, 8 threads) ---
single : 19.3979[msec]
10times: 160.325[msec]
fitness: 0.208511
```

Desktop, Intel® Core™ i7-8700 CPU @ 3.20GHz × 12:

```
--- pcl::NDT ---
single : 223.293[msec]
10times: 2185.31[msec]
fitness: 0.213937

--- pclomp::NDT (KDTREE, 1 threads) ---
single : 211.347[msec]
10times: 2057.26[msec]
fitness: 0.213937

--- pclomp::NDT (DIRECT7, 1 threads) ---
single : 82.9705[msec]
10times: 808.223[msec]
fitness: 0.214205

--- pclomp::NDT (DIRECT1, 1 threads) ---
single : 22.7415[msec]
10times: 205.486[msec]
fitness: 0.208511

--- pclomp::NDT (KDTREE, 12 threads) ---
single : 36.593[msec]
10times: 311.669[msec]
fitness: 0.213937

--- pclomp::NDT (DIRECT7, 12 threads) ---
single : 16.9687[msec]
10times: 150.324[msec]
fitness: 0.214205

--- pclomp::NDT (DIRECT1, 12 threads) ---
single : 7.12142[msec]
10times: 48.8959[msec]
fitness: 0.208511
```

On TX2, after modifying CMakeLists.txt:

```
--- pcl::NDT ---
single : 967.739[msec]
10times: 9643.9[msec]
fitness: 0.213937

--- pclomp::NDT (KDTREE, 1 threads) ---
single : 697.156[msec]
10times: 7116.16[msec]
fitness: 0.213937

--- pclomp::NDT (DIRECT7, 1 threads) ---
single : 370.99[msec]
10times: 3648.4[msec]
fitness: 0.214205

--- pclomp::NDT (DIRECT1, 1 threads) ---
single : 106.453[msec]
10times: 955.57[msec]
fitness: 0.208511

--- pclomp::NDT (KDTREE, 4 threads) ---
single : 208.352[msec]
10times: 2055.04[msec]
fitness: 0.213937

--- pclomp::NDT (DIRECT7, 4 threads) ---
single : 114.866[msec]
10times: 1158.24[msec]
fitness: 0.214205

--- pclomp::NDT (DIRECT1, 4 threads) ---
single : 39.8362[msec]
10times: 291.261[msec]
fitness: 0.208511
```

Results on APEX (Xavier):

```
--- pcl::NDT ---
single : 647.187[msec]
10times: 5923.38[msec]
fitness: 0.213937

--- pclomp::NDT (KDTREE, 1 threads) ---
single : 573.673[msec]
10times: 5307.98[msec]
fitness: 0.213937

--- pclomp::NDT (DIRECT7, 1 threads) ---
single : 231.879[msec]
10times: 2015.26[msec]
fitness: 0.214205

--- pclomp::NDT (DIRECT1, 1 threads) ---
single : 62.265[msec]
10times: 535.759[msec]
fitness: 0.208511

--- pclomp::NDT (KDTREE, 8 threads) ---
single : 113.542[msec]
10times: 997.613[msec]
fitness: 0.213937

--- pclomp::NDT (DIRECT7, 8 threads) ---
single : 67.8999[msec]
10times: 629.635[msec]
fitness: 0.214205

--- pclomp::NDT (DIRECT1, 8 threads) ---
single : 38.7986[msec]
10times: 322.459[msec]
fitness: 0.208511
```

Results on HiSilicon 3559A:

```
--- pcl::NDT ---
single : 1056.54[msec]
10times: 10484.1[msec]
fitness: 0.213937

--- pclomp::NDT (KDTREE, 1 threads) ---
single : 771.621[msec]
10times: 7625.17[msec]
fitness: 0.213937

--- pclomp::NDT (DIRECT7, 1 threads) ---
single : 471.365[msec]
10times: 4625.03[msec]
fitness: 0.214205

--- pclomp::NDT (DIRECT1, 1 threads) ---
single : 127.452[msec]
10times: 1190.68[msec]
fitness: 0.208511

--- pclomp::NDT (KDTREE, 4 threads) ---
single : 403.844[msec]
10times: 4198.75[msec]
fitness: 0.213937

--- pclomp::NDT (DIRECT7, 4 threads) ---
single : 265.707[msec]
10times: 2889.43[msec]
fitness: 0.214205

--- pclomp::NDT (DIRECT1, 4 threads) ---
single : 81.731[msec]
10times: 719.643[msec]
fitness: 0.208511
```
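
The listings above are easier to compare as speedup ratios. The following sketch divides the pcl::NDT `10times` figure by the multi-threaded pclomp::NDT (DIRECT7) `10times` figure for each platform. The timings are copied from the listings; the names `BenchmarkRun`, `benchmarkRuns`, `speedup`, and `printSpeedups` are hypothetical helpers introduced only for this illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// One row per platform. Times are the "10times" totals in msec,
// copied from the benchmark listings in this README.
struct BenchmarkRun {
  const char* platform;
  int threads;        // thread count used in the multi-threaded run
  double pcl_ms;      // pcl::NDT
  double direct7_ms;  // pclomp::NDT (DIRECT7, multi-threaded)
};

// Speedup of a candidate over a baseline: values above 1 mean faster.
double speedup(double baseline_ms, double candidate_ms) {
  assert(candidate_ms > 0.0);
  return baseline_ms / candidate_ms;
}

std::vector<BenchmarkRun> benchmarkRuns() {
  return {
      {"Core i7-6700K", 8, 2921.92, 343.336},
      {"AERO 15 (SSE)", 8, 5289.08, 555.979},
      {"AERO 15 (no SSE)", 8, 5277.47, 589.004},
      {"Core i7-8700", 12, 2185.31, 150.324},
      {"TX2", 4, 9643.9, 1158.24},
      {"APEX (Xavier)", 8, 5923.38, 629.635},
      {"HiSilicon 3559A", 4, 10484.1, 2889.43},
  };
}

// Print one line per platform with the DIRECT7 speedup over pcl::NDT.
void printSpeedups() {
  for (const BenchmarkRun& run : benchmarkRuns()) {
    std::printf("%-18s %2d threads  %6.2fx\n", run.platform, run.threads,
                speedup(run.pcl_ms, run.direct7_ms));
  }
}
```

Calling `printSpeedups()` from a `main` function prints the ratios. The same `speedup` helper can be applied to the DIRECT1 or KDTREE figures by substituting the corresponding `10times` values.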