Research on 3D Object Detection Based on Laser Point Cloud and Image Fusion

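LIU Yonggang (1, 2, *); YU Fengning (1); ZHANG Xinjie (2); CHEN Zheng (3); QIN Datong (1)

doi:10.3901/JME.2022.24.289

1. State Key Laboratory of Mechanical Transmissions, College of Mechanical and Vehicle Engineering, Chongqing University, Chongqing 400044
2. State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130025
3. Faculty of Transportation Engineering, Kunming University of Science and Technology, Kunming 650500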

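Funding:

National Natural Science Foundation of China (51775063)

Open Fund of the State Key Laboratory of Automotive Simulation and Control (20201101)

Open-competition ("Jiebang Guashuai") project of the Chongqing Collaborative Innovation Center for Independent-Brand Automobiles (2022CDJDX-004)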

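About the authors:
YU Fengning, male, born in 1997, is a master's student. His research focuses on lidar-based 3D object detection for intelligent vehicles. E-mail: yufengning@cqu.edu.cn

ZHANG Xinjie, male, born in 1984, PhD, is a professor and doctoral supervisor. His research interests include vehicle dynamics and control, testing and evaluation of intelligent vehicles, and driver models. E-mail: x_jzhang@jlu.edu.cn

CHEN Zheng, male, born in 1982, PhD, is a professor and doctoral supervisor. His research interests include power battery management, intelligent vehicle control, and energy management of hybrid electric vehicles. E-mail: chen@kust.edu.cn

QIN Datong, male, born in 1956, PhD, is a professor and doctoral supervisor. His research interests include mechanical transmission systems and vehicle powertrains and their intelligent control. E-mail: dtqin@cqu.edu.cn

Corresponding author:
LIU Yonggang (corresponding author), male, born in 1982, PhD, is a professor and doctoral supervisor. His research interests include key technologies for decision-making and control of intelligent vehicles, optimization and control of new energy vehicle powertrains, and automatic vehicle transmissions and their integrated control. E-mail: andyliuyg@cqu.edu.cn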

CLC number: TG156
Publication history
- Received: 2022-01-19
- Revised: 2022-09-26
- Published online: 2024-03-07
- Issue date: 2022-12-20

Abstract

Object detection based on the fusion of lidar and camera data has received extensive attention, yet most fusion algorithms struggle to detect small objects such as pedestrians and cyclists accurately. To address this, a point cloud feature fusion network based on the self-attention mechanism is proposed, which fully exploits local feature information to achieve accurate 3D object detection. First, an improved Faster-RCNN network generates 2D candidate boxes, and the frustum point cloud inside each image box is extracted according to the projection relationship between the lidar and the camera; this reduces both the number of points to process and the spatial search range. Second, a Self-Attention PointNet based on the self-attention mechanism is proposed to perform instance segmentation on the raw point cloud within the frustum. Third, a bounding box regression PointNet and a lightweight T-Net predict the 3D bounding box parameters of the object point cloud, and a regularization term is added to the loss function to improve convergence and detection accuracy. Finally, the method is validated on the KITTI dataset. The results show that the proposed method clearly outperforms the widely used F-PointNet: the detection accuracy for cars, pedestrians and cyclists improves substantially on the easy, moderate and hard tasks, with the largest gain for cyclists. The method also achieves higher accuracy than many mainstream 3D object detection networks.

Keywords:
- lidar
- 3D object detection
- point cloud fusion
- attention mechanism
- deep learning
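The frustum-extraction step described in the abstract (keeping only the lidar points whose image projection falls inside a 2D detection box) can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' implementation: it assumes a single 3x4 matrix `P` that already maps homogeneous lidar points to pixels (with KITTI calibration files one would typically compose `P2`, `R0_rect` and `Tr_velo_to_cam` into such a matrix), and the function names and the `min_depth` cut-off are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Box2D = Tuple[float, float, float, float]  # (u_min, v_min, u_max, v_max) in pixels


def project_to_image(P: Sequence[Sequence[float]], point: Point,
                     min_depth: float = 0.1) -> Optional[Tuple[float, float]]:
    """Project a 3D point with a 3x4 matrix P; return pixel (u, v) or None.

    Points whose homogeneous depth is below min_depth (behind or too close to
    the camera) have no valid projection and are rejected.
    """
    hom = (point[0], point[1], point[2], 1.0)
    u, v, w = (sum(P[r][c] * hom[c] for c in range(4)) for r in range(3))
    if w < min_depth:
        return None
    return u / w, v / w


def extract_frustum(points: List[Point], P: Sequence[Sequence[float]],
                    box: Box2D) -> List[Point]:
    """Keep only the points whose image projection falls inside the 2D box."""
    u_min, v_min, u_max, v_max = box
    frustum = []
    for p in points:
        uv = project_to_image(P, p)
        if uv is not None and u_min <= uv[0] <= u_max and v_min <= uv[1] <= v_max:
            frustum.append(p)
    return frustum


# Toy pinhole camera (f = 700 px, principal point (600, 180)); for simplicity the
# points are given directly in camera coordinates (x right, y down, z forward).
P = [[700.0, 0.0, 600.0, 0.0],
     [0.0, 700.0, 180.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
points = [(0.0, 0.0, 10.0), (1.0, 0.5, 20.0), (5.0, 0.0, 10.0), (0.0, 0.0, -5.0)]
print(extract_frustum(points, P, (550.0, 150.0, 650.0, 210.0)))
# → [(0.0, 0.0, 10.0), (1.0, 0.5, 20.0)]
```

The third point projects outside the box and the fourth lies behind the camera, so only the first two survive; on real data the frustum typically still contains background points, which is why a segmentation network follows.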

Figure 1 Improved Faster-RCNN network structure

Figure 2 Structure of the Self-Attention Block
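Figure 2 gives the structure of the Self-Attention Block, but the figure itself is not reproduced here. As a hedged illustration of the underlying operation, the sketch below applies standard scaled dot-product self-attention to a set of per-point feature vectors; the residual connection, the weight shapes and all function names are assumptions rather than details taken from the paper.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product (rows of a times columns of b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix,
                   residual: bool = True) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product self-attention over N per-point feature vectors.

    x: N x C point features; w_q, w_k: C x d projections; w_v: C x C here so
    that the (assumed) residual connection x + attention(x) is shape-compatible.
    Returns the updated features and the N x N attention weights.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = 1.0 / math.sqrt(len(w_k[0]))
    # scores[i][j] = <q_i, k_j> / sqrt(d): how strongly point i attends to point j
    scores = [[scale * sum(qi * kj for qi, kj in zip(q_row, k_row)) for k_row in k]
              for q_row in q]
    weights = [softmax(row) for row in scores]
    out = matmul(weights, v)
    if residual:
        out = [[o + f for o, f in zip(o_row, x_row)] for o_row, x_row in zip(out, x)]
    return out, weights


identity = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = self_attention(features, identity, identity, identity)
print([round(sum(row), 6) for row in attn])  # each row of weights sums to 1
```

Because every point attends to every other point through the same weights, permuting the input points only permutes the output rows, which is the property an operator on unordered point clouds needs. A practical version would use a tensor library and learned weights instead of nested lists.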

Figure 3 Self-Attention PointNet network structure

Figure 4 Architecture of the feature-fusion-based 3D point cloud object detection network

Figure 5 Calibration of the 2D detection image against the 3D point cloud data

Figure 6 Frustum cropping and extraction of the target frustum candidate region

Figure 7 Frustum orientation adjustment
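Figure 7 refers to adjusting the frustum orientation. Assuming the paper follows the F-PointNet convention of rotating each frustum about the camera's vertical axis so that the ray through the 2D box center becomes the forward (z) axis, the normalization can be sketched as below. Camera coordinates follow the KITTI convention (x right, y down, z forward); the helper names and the use of only the horizontal box center are simplifying assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def frustum_center_angle(box: Tuple[float, float, float, float],
                         fx: float, cx: float) -> float:
    """Angle (rad) between the camera z axis and the ray through the box center, in the x-z plane."""
    u_center = 0.5 * (box[0] + box[2])
    # For a pinhole camera the back-projected ray has x/z = (u - cx) / fx.
    return math.atan2((u_center - cx) / fx, 1.0)


def rotate_to_frustum_frame(points: List[Point], angle: float) -> List[Point]:
    """Rotate points about the camera y axis so the frustum center ray becomes the +z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [(x * c - z * s, y, x * s + z * c) for x, y, z in points]


angle = frustum_center_angle((600.0, 100.0, 740.0, 200.0), fx=700.0, cx=600.0)
# (1.0, 0.5, 10.0) lies on the center ray, so after rotation its x is ~0.
print(rotate_to_frustum_frame([(1.0, 0.5, 10.0)], angle))
```

Normalizing every frustum to a canonical viewing direction means the downstream networks see objects at roughly the same orientation regardless of where they appear in the image; the predicted box must be rotated back by the same angle afterwards.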

Figure 8 Coordinate transformation of the target point cloud
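Figure 8 shows a coordinate transformation of the segmented target point cloud. A common choice, used in F-PointNet and assumed here, is to translate the points classified as object so that their centroid becomes the origin, which makes the subsequent box regression less sensitive to where the object sits inside the frustum. The function name `to_mask_frame` is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def to_mask_frame(points: Sequence[Point], mask: Sequence[bool]) -> Tuple[List[Point], Point]:
    """Translate the points labelled as object so that their centroid becomes the origin.

    Returns the object points in the local frame and the centroid, which is needed
    later to map the predicted box center back into the frustum frame.
    """
    obj = [p for p, keep in zip(points, mask) if keep]
    if not obj:
        raise ValueError("segmentation mask selects no points")
    n = len(obj)
    centroid = (sum(p[0] for p in obj) / n,
                sum(p[1] for p in obj) / n,
                sum(p[2] for p in obj) / n)
    local = [(p[0] - centroid[0], p[1] - centroid[1], p[2] - centroid[2]) for p in obj]
    return local, centroid


local, centroid = to_mask_frame([(1.0, 0.0, 10.0), (3.0, 0.0, 12.0), (50.0, 2.0, 60.0)],
                                [True, True, False])
print(centroid)  # (2.0, 0.0, 11.0)
print(local)     # [(-1.0, 0.0, -1.0), (1.0, 0.0, 1.0)]
```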

Figure 9 T-Net and bounding box regression network structures
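Figure 9 pairs a lightweight T-Net with a bounding box regression network. The exact output encoding is not given on this page; the sketch below assumes an F-PointNet-style parameterization, in which the box center is the mask centroid refined by two predicted offsets, the heading is a classified angle bin plus a residual, and the size is a class template plus a residual. The function names, the bin count and all numeric values in the example are illustrative only.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def add3(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def decode_box(mask_centroid: Vec3, tnet_offset: Vec3, box_offset: Vec3,
               heading_scores: Sequence[float], heading_residuals: Sequence[float],
               size_template: Vec3, size_residual: Vec3) -> Tuple[Vec3, Vec3, float]:
    """Combine network outputs into (center, size, heading) of a 3D box.

    center: mask centroid, refined first by the T-Net and then by the box network.
    heading: the best-scoring angle bin plus that bin's predicted residual.
    size: a class-specific template plus a predicted residual.
    """
    center = add3(add3(mask_centroid, tnet_offset), box_offset)
    num_bins = len(heading_scores)
    best = max(range(num_bins), key=lambda i: heading_scores[i])
    heading = best * (2.0 * math.pi / num_bins) + heading_residuals[best]
    heading = (heading + math.pi) % (2.0 * math.pi) - math.pi  # wrap to [-pi, pi)
    size = add3(size_template, size_residual)
    return center, size, heading


scores = [0.0] * 12      # 12 heading bins, chosen for illustration
scores[3] = 5.0
residuals = [0.0] * 12
residuals[3] = 0.1
center, size, heading = decode_box((1.0, 1.0, 10.0), (0.1, 0.0, 0.2), (-0.05, 0.0, 0.1),
                                   scores, residuals, (3.9, 1.6, 1.56), (0.1, 0.0, -0.06))
print(center, size, round(heading, 4))
```

Splitting the heading into a bin classification plus a small residual avoids regressing a single angle across the discontinuity at plus or minus pi.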

Figure 10 Loss convergence curve and test accuracy curve during 3D object detection training

Figure 11 Visualization of detection results on the KITTI dataset

Table 1 Object detection accuracy (3D AP, %) under different decay rates (regularization coefficients)

Regularization coefficient	Car			Pedestrian			Cyclist
	Easy	Moderate	Hard	Easy	Moderate	Hard	Easy	Moderate	Hard
0	82.18	68.93	61.02	64.45	55.36	48.43	68.52	50.53	47.14
0.01	83.75	69.17	62.38	64.54	55.71	48.75	71.11	53.36	50.29
0.001	82.93	69.16	61.95	67.54	57.62	50.84	68.61	51.83	48.12
0.0001	84.25	69.75	63.07	62.89	54.54	48.09	67.82	52.51	48.32
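Table 1 varies the coefficient of the regularization term that the abstract says is added to the loss, and its caption calls these values decay rates. Interpreting the term as an L2 penalty on the network weights (an assumption; this page does not give the formula), the sketch below shows the regularized loss and why the coefficient behaves as a weight-decay rate under gradient descent. Function names are hypothetical, and frameworks differ on whether the penalty is written as coeff * ||w||^2 or (coeff / 2) * ||w||^2.

```python
from typing import List, Sequence


def l2_penalty(params: Sequence[Sequence[float]]) -> float:
    """Sum of squared weights over all parameter tensors (flattened lists here)."""
    return sum(w * w for tensor in params for w in tensor)


def regularized_loss(task_loss: float, params: Sequence[Sequence[float]],
                     coeff: float) -> float:
    """Detection loss plus coeff * ||w||^2 (coeff plays the role of Table 1's coefficient)."""
    return task_loss + coeff * l2_penalty(params)


def sgd_step(weights: List[float], grads: Sequence[float], lr: float,
             coeff: float) -> List[float]:
    """One SGD step on the regularized loss.

    d/dw [coeff * w^2] = 2 * coeff * w, so each step shrinks ("decays") every weight
    by a factor (1 - 2 * lr * coeff) before applying the task gradient.
    """
    return [w - lr * (g + 2.0 * coeff * w) for w, g in zip(weights, grads)]


for coeff in (0.0, 0.01, 0.001, 0.0001):  # the coefficients compared in Table 1
    print(coeff, regularized_loss(1.5, [[1.0, 2.0], [3.0]], coeff))
```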

Table 2 Effect of each component on 3D object detection AP (%)

Regularization loss	Self-Attention	Car			Pedestrian			Cyclist
		Easy	Moderate	Hard	Easy	Moderate	Hard	Easy	Moderate	Hard
–	–	82.18	68.93	61.02	64.45	55.36	48.43	68.52	50.53	47.14
✓	–	82.93	69.16	61.95	67.54	57.62	50.84	68.61	51.83	48.12
–	✓	84.79	71.50	63.54	67.18	58.15	51.25	77.04	57.91	54.16
✓	✓	84.10	71.01	63.39	67.07	58.08	51.21	80.77	58.61	54.37
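The abstract's statement that cyclists benefit most can be checked against Table 2 with simple arithmetic. Taking the row without either component as the baseline and the row with both as the full model (this reading of the table is an assumption), the script below computes the AP gain for every class and difficulty level.

```python
CLASSES = ("Car", "Pedestrian", "Cyclist")
LEVELS = ("Easy", "Moderate", "Hard")

# 3D AP values copied from Table 2: neither component vs. both components.
baseline = [82.18, 68.93, 61.02, 64.45, 55.36, 48.43, 68.52, 50.53, 47.14]
full_model = [84.10, 71.01, 63.39, 67.07, 58.08, 51.21, 80.77, 58.61, 54.37]

gains = {}
for idx, (base, full) in enumerate(zip(baseline, full_model)):
    cls, level = CLASSES[idx // 3], LEVELS[idx % 3]
    gains[(cls, level)] = round(full - base, 2)  # AP points gained

for (cls, level), gain in gains.items():
    print(f"{cls:<10} {level:<8} +{gain:.2f}")

best = max(gains, key=gains.get)
print("largest gain:", best, gains[best])  # largest gain: ('Cyclist', 'Easy') 12.25
```

Every cyclist gain (7.23 to 12.25 points) exceeds every car and pedestrian gain (at most 2.78 points), which is consistent with the abstract.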

Table 3 Comparison of 3D object detection AP (%) between the proposed model and other models (Car class only)

Method	Car
	Easy	Moderate	Hard
F-PointNet(v2)	83.76	70.92	63.56
PointFusion	77.92	63.00	53.27
RT3D	72.85	61.64	64.38
MV3D	71.29	62.68	56.56
VoxelNet	81.97	65.46	62.85
Ours	84.79	71.50	63.54
