Chinese title: | Research on Road Target Detection Based on Millimeter Wave Radar and Vision Fusion |
Name: | |
Student ID: | SZ2016105 |
Confidentiality level: | Public |
Thesis language: | Chinese |
Discipline code: | 085404 |
Discipline name: | Engineering - Electronic Information |
Student type: | Master's |
Degree: | Professional master's degree |
University: | Nanjing University of Aeronautics and Astronautics |
Department: | |
Major: | |
First supervisor: | |
First supervisor's affiliation: | |
Completion date: | 2023-01-04 |
Defense date: | 2023-03-17 |
English title: |
Research on Road Target Detection Based on Millimeter Wave Radar and Vision Fusion |
Chinese keywords: | |
English keywords: | Machine vision; Multi-sensor fusion; Millimeter-wave radar; Attention mechanism; Road target detection |
Chinese abstract: |
In today's complex traffic environment, the development of autonomous driving technology is vital to passenger safety. Road perception, an essential component of autonomous driving, has drawn attention from both industry and academia. Target detection, one of the key technologies in road perception, is used to identify vehicles and pedestrians on the road. The mainstream perception sensors in most driver assistance systems, such as those of Tesla and Baidu Apollo, are millimeter-wave radar, vision cameras, and lidar. Given the good environmental adaptability of millimeter-wave radar and the superior resolution of vision cameras, fusing the information they collect is of great significance for reducing the missed and false detections that a single sensor suffers under complex road environments and changing weather. However, the large differences in intrinsic properties and representation between millimeter-wave radar point clouds and visual images pose a major challenge to fusion. This thesis therefore examines the shortcomings of existing feature-level fusion schemes from two angles, unifying the form of multimodal data and aligning multimodal data, and proposes two road target detection methods based on millimeter-wave radar and cameras.

First, to address the loss of 3D spatial information when point clouds are projected onto images, which degrades detection accuracy, a voxel-based point cloud feature extraction method is proposed. The method fully accounts for the sparse and unordered nature of point clouds, unifying the representations of point clouds and images by converting irregular, unordered 3D point clouds into regular, ordered 2D feature maps. Meanwhile, because distant targets occupy a small portion of the image and offer little visual information, effective features are hard to extract; to address this, a point-cloud-based spatial attention module, RSA, is proposed. The 2D point cloud feature map is fed into this module to obtain a representation of the spatial locations of potential targets, and the result is used to strengthen the feature representation of image targets, improving the visual detection accuracy of distant targets. Further, by adding convolutional layers of different kernel sizes to the RSA module to extract information at different receptive fields and obtain a stronger image representation, a point-cloud-based multi-scale spatial attention aggregation module, M-RSA, is proposed. Finally, the processed point cloud output is fed, together with the image, into a RetinaNet network as an additional branch for target detection.

Second, to address the problem that calibration errors may cause the hard association between point cloud and image data to fail, which degrades detection accuracy, a coarse-to-fine point cloud-pixel alignment module is proposed. The module has two parts: (1) using image context information, the projected point cloud image is segmented into foreground and background, achieving coarse alignment; (2) a self-attention mechanism is then introduced to reason about the relations between foreground points and image pixels, yielding the optimal association results. After these two levels of consistency enhancement, effective alignment between point clouds and pixels is ensured. Finally, this module is added to the CenterNet network to build a radar-vision fusion road target detection algorithm.

Experimental results on the public nuScenes dataset show that the two proposed methods achieve better accuracy and robustness on the road target detection task. |
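The voxel-style unification of sparse, unordered radar returns into a regular, ordered 2D feature map, as described in the abstract, can be sketched as follows. This is a minimal illustrative sketch, not the thesis implementation: the grid extents, mean-pooling aggregation, and the single scalar feature per point are all simplifying assumptions.

```python
import numpy as np

def voxelize_radar_points(points, grid_shape=(32, 32),
                          x_range=(0.0, 64.0), y_range=(-32.0, 32.0)):
    """Scatter sparse, unordered radar points (x, y, feature) into a
    regular 2D grid, averaging the features of points that fall into the
    same cell. Empty cells stay zero, yielding an ordered, image-like
    feature map that can be processed alongside camera features."""
    h, w = grid_shape
    grid = np.zeros((h, w), dtype=np.float64)
    counts = np.zeros((h, w), dtype=np.int64)
    for x, y, feat in points:
        # Skip returns outside the region of interest.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        row = int((x - x_range[0]) / (x_range[1] - x_range[0]) * h)
        col = int((y - y_range[0]) / (y_range[1] - y_range[0]) * w)
        grid[row, col] += feat
        counts[row, col] += 1
    nonzero = counts > 0
    grid[nonzero] /= counts[nonzero]  # mean-pool features per occupied cell
    return grid
```

A real pipeline would keep a feature vector per cell (e.g. range, velocity, RCS) rather than one scalar, but the key idea is the same: the irregular point set becomes a dense grid with a fixed layout.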
English abstract: |
In today's complex traffic environment, the development of autonomous driving technology is vital to ensuring passenger safety. Road perception is an important component of autonomous driving technology and has received attention from both industry and academia. Target detection is one of the key technologies in road perception and is used to identify vehicles and pedestrians on the road. The mainstream perception sensors in most driver assistance systems, such as those of Tesla and Baidu Apollo, are millimeter-wave radar, vision cameras and lidar. Given the good environmental adaptability of millimeter-wave radar and the superior resolution of vision cameras, fusing the information collected by the two is of great significance for reducing the missed and false detections of a single sensor under complex road environments and weather changes. However, the vast differences in intrinsic properties and representation between millimeter-wave radar point clouds and visual images pose a major challenge to fusion. Thus, this thesis considers the shortcomings of existing feature-level fusion methods from two aspects, the formal unification of multimodal data and the alignment of multimodal data, and proposes two road target detection methods based on millimeter-wave radar and camera.

Firstly, a voxel-based point cloud feature extraction method is proposed to address the loss of 3D spatial information, which degrades detection accuracy, when point clouds are projected onto images. The method fully considers the sparse and unordered characteristics of point clouds and unifies the representations of point clouds and images by transforming irregular, unordered 3D point clouds into regular, ordered 2D feature maps. Meanwhile, because distant targets occupy a small proportion of the image and provide little visual information, effective features are difficult to extract; for this problem, the point-cloud-based spatial attention module RSA is proposed. The 2D point cloud feature map is used as the input of RSA to obtain a representation of the spatial locations of potential targets, which is then used to enhance the feature representation of image targets, thus improving the visual detection accuracy of distant targets. Further, the point-cloud-based multi-scale spatial attention aggregation module M-RSA is proposed by adding convolutional layers of different kernel sizes to RSA, extracting information at different receptive fields and obtaining a stronger image representation. Finally, the processed point cloud output is fed, together with the image, into RetinaNet as an additional branch for target detection.

Secondly, a coarse-to-fine point cloud-pixel alignment module is proposed to address the problem that calibration errors may cause the hard association between point cloud and image data to fail, which degrades detection accuracy. The module consists of two components: (1) image context information is used to segment the projected point cloud image into foreground and background for coarse alignment; (2) a self-attention mechanism is then introduced to reason about the associations between foreground point clouds and image pixels to obtain the optimal association results. After these two consistency enhancements, effective alignment of point clouds with pixels is ensured. Finally, the proposed module is added to CenterNet to construct a road target detection algorithm based on millimeter-wave radar and vision fusion.

Experimental results on the publicly available nuScenes dataset show that the two proposed methods achieve better accuracy and robustness in the road target detection task. |
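The fine-alignment step, in which an attention mechanism reasons about associations between foreground radar points and image pixels, can be illustrated with a minimal scaled dot-product attention sketch. This is not the thesis implementation: the function names, feature dimensions, and the use of plain cross-attention from point features to pixel features (rather than the module's exact self-attention design) are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def associate_points_to_pixels(point_feats, pixel_feats):
    """Soft association between P foreground radar points and N candidate
    pixels: each point attends over the pixel features via scaled
    dot-product attention. The attention weights serve as a soft
    point-to-pixel correspondence, robust to small calibration errors,
    and the weighted sum gives an aligned feature per point."""
    d = point_feats.shape[-1]
    scores = point_feats @ pixel_feats.T / np.sqrt(d)  # (P, N) similarities
    weights = softmax(scores, axis=-1)                 # each row sums to 1
    aligned = weights @ pixel_feats                    # (P, d) aligned features
    return weights, aligned
```

The coarse stage would first restrict `point_feats` to points projected onto the segmented image foreground; the attention weights then resolve the residual ambiguity that a hard one-to-one projection cannot.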
CLC number: | U471.15 |
Accession number: | 2023-016-0286 |
Release date: | 2023-10-03 |