MFLM-FPN and GAFF-driven Underwater Target Detection Algorithms and Class Balancing Strategies
-
Abstract: To address the scarcity of feature information for underwater targets, this paper proposes a multi-feature-layer mapping feature pyramid network (MFLM-FPN) combined with a global attention feature fusion (GAFF) mechanism. Each proposal box is mapped to several feature layers, and region-of-interest pooling then yields four feature layers of identical size that carry complementary information. Global attention fuses these layers so that the information in every layer is used, which alleviates the feature scarcity of underwater targets. To address class imbalance in underwater datasets, a copy-paste class balancing strategy is designed to increase the network's attention to scarce categories such as sea cucumbers, starfish, and scallops. To address the loss of detection accuracy caused by an insufficiently strong penalty in the loss function, the normalized distance between the predicted box and the target box is added to the smooth L1 loss as a penalty term, which significantly improves the localization accuracy for multi-scale underwater targets. Experiments on the National Underwater Robotics Competition dataset show that the proposed method reaches a recognition accuracy of 81.93%, 5.71% higher than the Faster R-CNN baseline, and reduces missed and false detections in complex underwater environments.
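The abstract does not give the exact form of the distance-penalized loss. The Python sketch below is only one plausible reading: it assumes the penalty is the squared distance between box centers divided by the squared diagonal of the smallest box enclosing both boxes (a DIoU-style normalization, which is an assumption here, not the paper's confirmed formula). The weight `lam` and all function names are hypothetical, and the smooth L1 term is applied to raw corner coordinates for simplicity rather than to encoded regression offsets.

```python
def smooth_l1(x, beta=1.0):
    """Standard smooth L1 on a single residual x."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def normalized_center_distance(pred, target):
    """Squared center distance over the squared diagonal of the smallest
    enclosing box. Boxes are (x1, y1, x2, y2). Assumed form of the penalty."""
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    ex1, ey1 = min(pred[0], target[0]), min(pred[1], target[1])
    ex2, ey2 = max(pred[2], target[2]), max(pred[3], target[3])
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    if diag2 == 0:
        return 0.0  # degenerate boxes: no meaningful distance
    return ((pcx - tcx) ** 2 + (pcy - tcy) ** 2) / diag2


def penalized_smooth_l1(pred, target, lam=1.0, beta=1.0):
    """Smooth L1 over the four coordinates plus the weighted distance penalty.
    `lam` is a hypothetical weighting factor."""
    base = sum(smooth_l1(p - t, beta) for p, t in zip(pred, target))
    return base + lam * normalized_center_distance(pred, target)
```

Because the penalty is divided by the enclosing-box diagonal, it stays below 1 regardless of box size, so small and large targets contribute on a comparable scale.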
-
Keywords:
- deep learning
- underwater target detection
- feature pyramid
- feature fusion
- attention mechanism
- computer vision
-
Table 1. Comparison of datasets before and after augmentation

| Category | Original data | After augmentation |
| --- | --- | --- |
| Sea cucumber | 5 537 | 16 972 |
| Sea urchin | 22 343 | 22 343 |
| Starfish | 6 841 | 18 280 |
| Scallop | 6 720 | 18 125 |
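To make the effect of the class balancing in Table 1 concrete, the following Python sketch computes the ratio between the largest and smallest class before and after augmentation, and the growth factor of each class. The counts are copied from Table 1, reading the garbled sea urchin entry as 22 343 (an assumption); the function names are illustrative only.

```python
# Instance counts per class, taken from Table 1.
before = {"sea cucumber": 5537, "sea urchin": 22343,
          "starfish": 6841, "scallop": 6720}
after = {"sea cucumber": 16972, "sea urchin": 22343,
         "starfish": 18280, "scallop": 18125}


def imbalance_ratio(counts):
    """Largest class count divided by the smallest class count."""
    return max(counts.values()) / min(counts.values())


def growth_factors(old, new):
    """How many times larger each class is after augmentation."""
    return {name: new[name] / old[name] for name in old}


print(round(imbalance_ratio(before), 2))
print(round(imbalance_ratio(after), 2))
print({k: round(v, 2) for k, v in growth_factors(before, after).items()})
```

Under these counts the ratio drops from roughly 4.0 to roughly 1.3, while the sea urchin class, already the largest, is left unchanged.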
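Table 2. Accuracy comparison of different fusion methods (%)

| ResNet50+FPN | Addition | Concatenation | GAFF | mAP_small (IoU=0.50:0.95) | mAP_medium (IoU=0.50:0.95) | mAP_large (IoU=0.50:0.95) | mAP_all (IoU=0.50) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| √ | | | | 18.0 | 35.1 | 45.6 | 75.07 |
| √ | √ | | | 19.0 | 37.4 | 47.8 | 77.54 |
| √ | | √ | | 18.2 | 37.0 | 47.1 | 77.12 |
| √ | | | √ | 19.8 | 37.9 | 48.6 | 78.46 |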
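The three fusion modes compared in Table 2 can be contrasted on toy data. The Python sketch below fuses four equally sized feature vectors (standing in for the four RoI-pooled feature layers) by element-wise addition, by concatenation, and by an attention-weighted sum. The attention scoring used here, a softmax over each layer's global mean, is a placeholder assumption and not the actual GAFF design, which this section does not specify.

```python
import math


def fuse_add(features):
    """Element-wise sum of equally sized feature vectors."""
    return [sum(vals) for vals in zip(*features)]


def fuse_concat(features):
    """Concatenate feature vectors along the channel axis."""
    return [v for feat in features for v in feat]


def fuse_attention(features):
    """Weighted sum of feature vectors. The weights are a softmax over each
    vector's global mean, a stand-in for a learned attention module."""
    scores = [sum(f) / len(f) for f in features]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [sum(w * v for w, v in zip(weights, vals))
             for vals in zip(*features)]
    return fused, weights


# Four toy descriptors of the same size, one per feature layer.
levels = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [0.5, 0.5]]
fused, weights = fuse_attention(levels)
```

Addition and the weighted sum keep the channel count fixed, whereas concatenation multiplies it by the number of layers; the weighted sum additionally lets more informative layers dominate the result.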
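Table 3. Ablation experiment (%)

| Algorithm and model | Sea urchin | Sea cucumber | Scallop | Starfish | Precision | Recall | mAP_all (IoU=0.50) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Faster R-CNN | 86.23 | 64.07 | 69.12 | 80.86 | 78.2 | 73.5 | 75.07 |
| 1 | 88.50 | 65.80 | 71.30 | 81.40 | 80.1 | 75.8 | 76.75 |
| 2 | 90.70 | 67.13 | 72.58 | 83.43 | 82.3 | 77.5 | 78.46 |
| 3 | 87.23 | 64.87 | 70.93 | 81.73 | 79.4 | 74.8 | 76.19 |
| 4 | 90.30 | 68.43 | 74.38 | 85.40 | 83.9 | 78.6 | 79.62 |
| 5 | 91.1 | 70.74 | 78.10 | 87.78 | 85.2 | 80.3 | 81.93 |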
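The mAP column of Table 3 appears to be the unweighted mean of the four per-class values. The short Python check below illustrates this for the baseline row and for model 5, assuming the per-class columns are average precision at IoU = 0.50 (an interpretation, since the table does not label them explicitly).

```python
# Per-class values (%) in the column order of Table 3:
# sea urchin, sea cucumber, scallop, starfish; then the reported mAP.
table3_rows = {
    "Faster R-CNN": ([86.23, 64.07, 69.12, 80.86], 75.07),
    "5": ([91.1, 70.74, 78.10, 87.78], 81.93),
}


def mean_ap(per_class_ap):
    """Unweighted mean over classes."""
    return sum(per_class_ap) / len(per_class_ap)


for name, (per_class, reported) in table3_rows.items():
    print(name, round(mean_ap(per_class), 2), reported)
```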
Table 4. Comparison of accuracy of different detection algorithms (%)

| Algorithm | Sea urchin | Sea cucumber | Scallop | Starfish | mAP, area=all (IoU=0.50) |
| --- | --- | --- | --- | --- | --- |
| YOLOv4 | 88.60 | 61.10 | 66.80 | 85.10 | 75.40 |
| YOLOv5 | 86.60 | 65.80 | 71.00 | 86.60 | 77.50 |
| SA-FPN | 74.10 | 74.24 | 83.67 | 75.96 | 76.99 |
| RefineDet | 86.10 | 67.10 | 71.80 | 81.10 | 71.80 |
| FERNet | 92.00 | 71.90 | 52.70 | 82.50 | 74.70 |
| YOLOv11n | 87.90 | 69.80 | 72.70 | 81.8 | 78.05 |
| DETR | 88.60 | 71.10 | 75.20 | 80.9 | 78.95 |
| Faster R-CNN | 86.83 | 64.67 | 69.72 | 81.46 | 76.82 |
| Proposed algorithm | 89.80 | 77.37 | 76.90 | 83.73 | 81.93 |
Table 5. Small object detection counts on a single underwater image in a simple scene
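Table 6. Total detection counts per category on underwater images in complex scenes

| Algorithm | Sea urchin | Sea cucumber | Scallop | Starfish |
| --- | --- | --- | --- | --- |
| Ground-truth labels | 17+1 (missed annotation) | 6 | 27 | 1+1 (missed annotation) |
| YOLOv4 | 16 | 4 | 14 | 2 |
| YOLOv5 | 16 | 4 | 22 | 2 |
| SA-FPN | 16 | 3 | 24 | 1 |
| RefineDet | 18 | 6 | 26 | 2 |
| FERNet | 18 | 4 | 23 | 1 |
| Faster R-CNN | 18 | 1 | 30 | 2 |
| Proposed algorithm | 18 | 4 | 24 | 2 |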
Table 7. Generalization experiments based on the TrashCan dataset (%)

| Algorithm | Precision | Recall | mAP, area=all (IoU=0.50) |
| --- | --- | --- | --- |
| Faster R-CNN | 87.1 | 74.2 | 87.1 |
| Proposed algorithm | 92.3 | 80.1 | 93.4 |
-
