Research on 3D Object Detection Based on Laser Point Cloud and Image Fusion
Abstract: 3D object detection based on the fusion of lidar and camera data has received extensive attention, but most fusion algorithms struggle to accurately detect small objects such as pedestrians and cyclists. This paper therefore proposes a point cloud feature fusion network based on the self-attention mechanism, which takes local feature information into account to achieve accurate 3D object detection. First, Faster-RCNN is improved to generate candidate boxes, and the frustum point cloud inside each image box is extracted according to the projection relationship between the lidar and the camera, which reduces the computational scale and spatial search range of the point cloud. Second, a Self-Attention PointNet structure based on the self-attention mechanism is proposed to perform instance segmentation on the raw point cloud within the frustum. Third, a bounding-box regression PointNet and a lightweight T-Net predict the 3D bounding box parameters of the target point cloud, and a regularization term is added to the loss function to improve detection accuracy. Finally, the method is validated on the KITTI dataset. The results show that the proposed method clearly outperforms the widely used F-PointNet: detection accuracy for cars, pedestrians, and cyclists improves on the easy, moderate, and hard tasks, with the largest improvement for cyclists. The method also achieves higher accuracy than many mainstream 3D object detection networks.
Keywords:
- lidar
- 3D object detection
- point cloud fusion
- attention mechanism
- deep learning
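The frustum extraction step described in the abstract (keeping only the lidar points whose image projection falls inside a 2D detection box) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `project_point` and `extract_frustum_points`, the toy projection matrix `P_TOY`, and the sample points are all hypothetical, and the sketch assumes the points have already been transformed into the camera frame. On KITTI, lidar points would first need the velodyne-to-camera and rectification transforms from the calibration files before a camera projection matrix is applied.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Box2D = Tuple[float, float, float, float]  # (u_min, v_min, u_max, v_max) in pixels


def project_point(P: Sequence[Sequence[float]], pt: Point) -> Tuple[float, float, float]:
    """Project a 3D point (assumed to be in the camera frame) with a 3x4 matrix P.

    Returns (u, v, depth); u and v are NaN when the point is not in front
    of the camera (depth <= 0).
    """
    hom = (pt[0], pt[1], pt[2], 1.0)
    row = [sum(P[r][c] * hom[c] for c in range(4)) for r in range(3)]
    depth = row[2]
    if depth <= 0.0:
        return (float("nan"), float("nan"), depth)
    return (row[0] / depth, row[1] / depth, depth)


def extract_frustum_points(
    points: Sequence[Point], P: Sequence[Sequence[float]], box: Box2D
) -> List[Point]:
    """Keep the points that lie in front of the camera and project inside `box`."""
    u_min, v_min, u_max, v_max = box
    kept: List[Point] = []
    for pt in points:
        u, v, depth = project_point(P, pt)
        if depth > 0.0 and u_min <= u <= u_max and v_min <= v <= v_max:
            kept.append(pt)
    return kept


# Hypothetical pinhole camera: focal length 100 px, principal point (50, 50).
P_TOY = [
    [100.0, 0.0, 50.0, 0.0],
    [0.0, 100.0, 50.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
CLOUD = [
    (0.0, 0.0, 10.0),   # projects to pixel (50, 50): inside the box
    (5.0, 0.0, 10.0),   # projects to pixel (100, 50): outside the box
    (0.0, 0.0, -5.0),   # behind the camera: rejected
    (0.5, -0.5, 5.0),   # projects to pixel (60, 40): inside the box
]
BOX = (30.0, 30.0, 70.0, 70.0)  # stand-in for one 2D detector output
FRUSTUM = extract_frustum_points(CLOUD, P_TOY, BOX)
print(FRUSTUM)
```

Only the surviving points would be passed on to segmentation, which is how the frustum reduces the search range. A real pipeline would typically vectorize this loop with an array library rather than iterate point by point.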
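Table 1. Object detection accuracy under different decay rates

| Regularization coefficient | Car Easy | Car Moderate | Car Hard | Pedestrian Easy | Pedestrian Moderate | Pedestrian Hard | Cyclist Easy | Cyclist Moderate | Cyclist Hard |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 82.18 | 68.93 | 61.02 | 64.45 | 55.36 | 48.43 | 68.52 | 50.53 | 47.14 |
| 0.01 | 83.75 | 69.17 | 62.38 | 64.54 | 55.71 | 48.75 | 71.11 | 53.36 | 50.29 |
| 0.001 | 82.93 | 69.16 | 61.95 | 67.54 | 57.62 | 50.84 | 68.61 | 51.83 | 48.12 |
| 0.0001 | 84.25 | 69.75 | 63.07 | 62.89 | 54.54 | 48.09 | 67.82 | 52.51 | 48.32 |

Table 2. Effect of each component on the AP of 3D object detection

| Regularization loss | Self-Attention | Car Easy | Car Moderate | Car Hard | Pedestrian Easy | Pedestrian Moderate | Pedestrian Hard | Cyclist Easy | Cyclist Moderate | Cyclist Hard |
|---|---|---|---|---|---|---|---|---|---|---|
| – | – | 82.18 | 68.93 | 61.02 | 64.45 | 55.36 | 48.43 | 68.52 | 50.53 | 47.14 |
| ✓ | – | 82.93 | 69.16 | 61.95 | 67.54 | 57.62 | 50.84 | 68.61 | 51.83 | 48.12 |
| – | ✓ | 84.79 | 71.50 | 63.54 | 67.18 | 58.15 | 51.25 | 77.04 | 57.91 | 54.16 |
| ✓ | ✓ | 84.10 | 71.01 | 63.39 | 67.07 | 58.08 | 51.21 | 80.77 | 58.61 | 54.37 |

Table 3. Comparison of 3D object detection AP between the proposed model and other models (Car class only)

| Method | Car Easy | Car Moderate | Car Hard |
|---|---|---|---|
| F-PointNet(v2) | 83.76 | 70.92 | 63.56 |
| PointFusion | 77.92 | 63.00 | 53.27 |
| RT3D | 72.85 | 61.64 | 64.38 |
| MV3D | 71.29 | 62.68 | 56.56 |
| VoxelNet | 81.97 | 65.46 | 62.85 |
| Ours | 84.79 | 71.50 | 63.54 |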
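The statement that cyclists gain the most can be checked with simple arithmetic on the Table 2 figures. The sketch below subtracts the first row of Table 2 (neither component enabled) from the last row (both enabled). Treating the first row as the baseline is an assumption about how the ablation was set up, and the names `BASELINE`, `FULL`, and `ap_gain` are illustrative only.

```python
LEVELS = ("Easy", "Moderate", "Hard")

# AP values copied from Table 2: first row (no regularization loss, no Self-Attention).
BASELINE = {
    "Car": (82.18, 68.93, 61.02),
    "Pedestrian": (64.45, 55.36, 48.43),
    "Cyclist": (68.52, 50.53, 47.14),
}
# AP values copied from Table 2: last row (regularization loss and Self-Attention).
FULL = {
    "Car": (84.10, 71.01, 63.39),
    "Pedestrian": (67.07, 58.08, 51.21),
    "Cyclist": (80.77, 58.61, 54.37),
}


def ap_gain(baseline, full):
    """Per-class AP difference (full minus baseline), rounded to two decimals."""
    return {
        cls: tuple(round(f - b, 2) for b, f in zip(baseline[cls], full[cls]))
        for cls in baseline
    }


GAINS = ap_gain(BASELINE, FULL)
# Class with the largest total gain across the three difficulty levels.
BEST_CLASS = max(GAINS, key=lambda cls: sum(GAINS[cls]))

for cls, deltas in GAINS.items():
    pairs = ", ".join(f"{lvl} {d:+.2f}" for lvl, d in zip(LEVELS, deltas))
    print(f"{cls}: {pairs}")
print("Largest total gain:", BEST_CLASS)
```

Under that assumption the gains are about 1.9 to 2.4 AP for cars, 2.6 to 2.8 AP for pedestrians, and 7.2 to 12.3 AP for cyclists, which agrees with the abstract's observation about cyclists.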