MFLM-FPN and GAFF-driven Underwater Target Detection Algorithms and Class Balancing Strategies

Zhao Yan; Li Jinxin; Jia Rujian

doi:10.11993/j.issn.2096-3920.2026-0007

Tianjin Falconix Technology Co., Ltd., Tianjin 300010, China

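Author biography:
Zhao Yan (b. 1993), male, master's degree, intermediate-level artificial intelligence engineer. Research interests: computer vision, object detection, semantic segmentation, and unsupervised defect detection.

CLC number: TJ630.34; U674.941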
Publication history
- Received: 2026-01-08
- Revised: 2026-02-08
- Accepted: 2026-03-04
- Published online: 2026-05-19

MFLM-FPN and GAFF-driven Underwater Target Detection Algorithms and Class Balancing Strategies

Tianjin Falconix Technology Co., Ltd., Tianjin 300010, China

Abstract

To address the scarcity of feature information for underwater targets, this paper proposes a multi-feature-layer mapping feature pyramid network (MFLM-FPN) and a global attention feature fusion (GAFF) mechanism. Each proposal box is mapped to several feature layers, and region-of-interest pooling yields four feature maps of identical size that carry complementary information. Global attention then fuses these maps so that the information in every layer is used, which alleviates the feature scarcity of underwater targets. To address class imbalance in underwater datasets, a copy-paste class-balancing strategy is designed to increase the network's attention to scarce categories such as sea cucumbers, starfish, and scallops. To address the loss of detection accuracy caused by an insufficient penalty in the loss function, the normalized distance between the predicted box and the target box is added to the smooth L1 loss as a penalty term, which improves the localization accuracy for multi-scale underwater targets. On the National Underwater Robotics Competition dataset, the proposed method achieves a detection accuracy (mAP at IoU = 0.50) of 81.93%, 5.71% higher than the Faster R-CNN baseline, and reduces missed and false detections in complex underwater environments.

Keywords:
- underwater target detection
- deep learning
- feature pyramid
- feature fusion
- attention mechanism
- computer vision

Figure 1. Feature pyramid structure

Figure 2. Underwater target detection algorithm based on multi-feature fusion

Figure 3. GAFF feature fusion scheme

Figure 4. Data augmentation
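
The abstract describes a copy-paste class-balancing strategy, but this page does not give its details. As a rough illustration only, the sketch below crops one annotated instance from a source image and pastes it into a target image, returning the new bounding box so that the label set can be extended. The function names `crop` and `paste_instance`, the grayscale nested-list image format, and the exclusive-maximum box convention are assumptions made for brevity and are not taken from the paper.

```python
def crop(image, box):
    """Return the pixels inside box = (x_min, y_min, x_max, y_max), max edges exclusive."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def paste_instance(target, patch, top, left):
    """Paste patch into a copy of target and return (new_image, new_box)."""
    patch_h, patch_w = len(patch), len(patch[0])
    target_h, target_w = len(target), len(target[0])
    if top < 0 or left < 0 or top + patch_h > target_h or left + patch_w > target_w:
        raise ValueError("patch does not fit inside the target image")
    out = [row[:] for row in target]  # copy so the original image is untouched
    for r in range(patch_h):
        out[top + r][left:left + patch_w] = patch[r]
    # The pasted region becomes a new annotation for the scarce class.
    return out, (left, top, left + patch_w, top + patch_h)


# Hypothetical 4x4 source image holding a 2x2 instance at box (1, 1, 3, 3).
source = [[0, 0, 0, 0],
          [0, 7, 8, 0],
          [0, 9, 6, 0],
          [0, 0, 0, 0]]
target = [[0] * 5 for _ in range(5)]

patch = crop(source, (1, 1, 3, 3))
augmented, new_box = paste_instance(target, patch, top=2, left=3)
print(new_box)  # (3, 2, 5, 4)
```

A practical version would also need to choose paste locations that avoid covering existing objects and to blend the patch edges; both are omitted here.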

Figure 5. Distance between the predicted box and the target box
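
The abstract states that the normalized distance between the predicted box and the target box is added to the smooth L1 loss as a penalty term, without giving the formula on this page. The sketch below shows one plausible form under explicit assumptions: the distance is the squared center distance divided by the squared diagonal of the smallest box enclosing both boxes (a DIoU-style normalization), the weight `lam` is hypothetical, and smooth L1 is applied directly to corner coordinates rather than to encoded regression offsets.

```python
def smooth_l1(x, beta=1.0):
    """Smooth L1 of a scalar residual: quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def center_distance_penalty(pred, target):
    """Squared center distance over squared diagonal of the enclosing box.

    Boxes are (x_min, y_min, x_max, y_max). This normalization is an assumption.
    """
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    ex0, ey0 = min(pred[0], target[0]), min(pred[1], target[1])
    ex1, ey1 = max(pred[2], target[2]), max(pred[3], target[3])
    diag_sq = (ex1 - ex0) ** 2 + (ey1 - ey0) ** 2
    if diag_sq == 0:
        return 0.0
    return ((pcx - tcx) ** 2 + (pcy - tcy) ** 2) / diag_sq


def box_loss(pred, target, beta=1.0, lam=1.0):
    """Smooth L1 over the four coordinates plus the weighted distance penalty."""
    regression = sum(smooth_l1(p - t, beta) for p, t in zip(pred, target))
    return regression + lam * center_distance_penalty(pred, target)


# Each coordinate is off by 1, so smooth L1 contributes 4 * 0.5 = 2.0.
# Centers (1, 1) and (2, 2) are sqrt(2) apart; the enclosing box (0, 0, 3, 3)
# has squared diagonal 18, so the penalty is 2 / 18.
print(round(box_loss((0, 0, 2, 2), (1, 1, 3, 3)), 4))  # 2.1111
```

Because the penalty is scaled by the enclosing-box diagonal, it stays between 0 and 1 regardless of object size, which is one reason such a term is convenient for multi-scale targets.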

Figure 6. Examples of augmented data

Figure 7. Visualization of detection results

Figure 8. Visual comparison between models

Table 1. Number of instances per class before and after augmentation

Class	Original	Augmented
Sea cucumber	5 537	16 972
Sea urchin	22 343	22 343
Starfish	6 841	18 280
Scallop	6 720	18 125
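
To make the effect of the class-balancing step in Table 1 concrete, the short calculation below computes how many instances were added per class and the ratio between the largest and smallest class before and after augmentation. It reads the sea urchin count as 22 343 in both columns, on the assumption that the majority class was left unchanged; the dictionary and function names are illustrative.

```python
# Instance counts transcribed from Table 1 (sea urchin assumed unchanged at 22 343).
before = {"sea cucumber": 5537, "sea urchin": 22343, "starfish": 6841, "scallop": 6720}
after = {"sea cucumber": 16972, "sea urchin": 22343, "starfish": 18280, "scallop": 18125}


def imbalance_ratio(counts):
    """Ratio of the most frequent class to the least frequent class."""
    return max(counts.values()) / min(counts.values())


added = {name: after[name] - before[name] for name in before}

print(added)
print(round(imbalance_ratio(before), 2), round(imbalance_ratio(after), 2))
```

Under this reading, each of the three scarce classes gains roughly 11 400 instances, and the largest-to-smallest class ratio falls from about 4.04 to about 1.32.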

Table 2. Accuracy comparison of different fusion methods (%)

ResNet50+FPN	Addition	Concatenation	GAFF	mAP_small (IoU=0.50:0.95)	mAP_medium (IoU=0.50:0.95)	mAP_large (IoU=0.50:0.95)	mAP_all (IoU=0.50)
√				18.0	35.1	45.6	75.07
√	√			19.0	37.4	47.8	77.54
√		√		18.2	37.0	47.1	77.12
√			√	19.8	37.9	48.6	78.46
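
Table 2 compares three ways of combining the same-sized ROI features taken from the pyramid levels: element-wise addition, concatenation, and GAFF. The sketch below contrasts the first two with a generic attention-weighted sum, in which per-level scores are turned into softmax weights. It is a simplified stand-in and not the GAFF module itself: the features are flattened to plain lists of equal length, and the per-level scores are passed in by hand, whereas a real network would produce them with a learned attention branch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_add(level_feats):
    """Element-wise sum across levels; output size equals one level's size."""
    return [sum(vals) for vals in zip(*level_feats)]


def fuse_concat(level_feats):
    """Concatenation; output size grows with the number of levels."""
    return [v for feat in level_feats for v in feat]


def fuse_attention(level_feats, level_scores):
    """Weighted sum across levels using softmax-normalized scores."""
    weights = softmax(level_scores)
    fused = [sum(w * v for w, v in zip(weights, vals)) for vals in zip(*level_feats)]
    return fused, weights


# Four hypothetical pooled ROI features, one per pyramid level, flattened to length 2.
feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

print(fuse_add(feats))          # [16.0, 20.0]
print(len(fuse_concat(feats)))  # 8
fused, weights = fuse_attention(feats, [0.0, 0.0, 0.0, 0.0])
print(fused, weights)           # equal scores reduce to the plain average
```

With equal scores the attention fusion reduces to the mean of the levels; unequal scores let the network emphasize whichever level carries the most useful information for a given proposal.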

Table 3. Ablation experiments (%)

Algorithm/model	Sea urchin	Sea cucumber	Scallop	Starfish	Precision	Recall	mAP_all (IoU=0.50)
Faster R-CNN	86.23	64.07	69.12	80.86	78.2	73.5	75.07
1	88.50	65.80	71.30	81.40	80.1	75.8	76.75
2	90.70	67.13	72.58	83.43	82.3	77.5	78.46
3	87.23	64.87	70.93	81.73	79.4	74.8	76.19
4	90.30	68.43	74.38	85.40	83.9	78.6	79.62
5	91.10	70.74	78.10	87.78	85.2	80.3	81.93
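
The per-class columns in Table 3 appear to be average precision values, since their mean reproduces the mAP column. The check below recomputes mAP as the unweighted mean of the four class values for every row and compares it with the reported figure; the variable names and the 0.01 rounding tolerance are choices made for this illustration.

```python
# (sea urchin, sea cucumber, scallop, starfish, reported mAP) from Table 3.
ablation = {
    "Faster R-CNN": (86.23, 64.07, 69.12, 80.86, 75.07),
    "1": (88.50, 65.80, 71.30, 81.40, 76.75),
    "2": (90.70, 67.13, 72.58, 83.43, 78.46),
    "3": (87.23, 64.87, 70.93, 81.73, 76.19),
    "4": (90.30, 68.43, 74.38, 85.40, 79.62),
    "5": (91.10, 70.74, 78.10, 87.78, 81.93),
}


def mean_ap(per_class_ap):
    """mAP as the unweighted mean of per-class average precision."""
    return sum(per_class_ap) / len(per_class_ap)


for name, (*per_class, reported) in ablation.items():
    computed = mean_ap(per_class)
    # Allow a 0.01 difference for rounding in the published table.
    matches = abs(computed - reported) <= 0.01 + 1e-9
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}, match={matches}")
```

Every row agrees with the reported mAP to within 0.01, which supports reading the mAP column as a simple class average rather than an instance-weighted one.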

Table 4. Accuracy comparison of different detection algorithms (%)

Algorithm	Sea urchin	Sea cucumber	Scallop	Starfish	mAP (area=all, IoU=0.50)
YOLOv4	88.60	61.10	66.80	85.10	75.40
YOLOv5	86.60	65.80	71.00	86.60	77.50
SA-FPN	74.10	74.24	83.67	75.96	76.99
RefineDet	86.10	67.10	71.80	81.10	71.80
FERNet	92.00	71.90	52.70	82.50	74.70
YOLOv11n	87.90	69.80	72.70	81.80	78.05
DETR	88.60	71.10	75.20	80.90	78.95
Faster R-CNN	86.83	64.67	69.72	81.46	76.82
Proposed method	89.80	77.37	76.90	83.73	81.93

Table 5. Small-object detection counts on a single underwater image in a simple scene

Algorithm	Sea urchin	Sea cucumber	Scallop	Starfish
Ground truth	3	3+2 (unlabeled)	0	2+3 (unlabeled)
YOLOv4 [20]	5	6	0	4
YOLOv5 [21]	3	3	0	1
SA-FPN	4	4	0	4
RefineDet [22]	4	3	0	3
FERNet [23]	4	4	0	3
Faster R-CNN	4	4	0	4
Proposed method	5	4	0	5

Table 6. Total detection counts per category on underwater images in complex scenes

Algorithm	Sea urchin	Sea cucumber	Scallop	Starfish
Ground truth	17+1 (unlabeled)	6	27	1+1 (unlabeled)
YOLOv4	16	4	14	2
YOLOv5	16	4	22	2
SA-FPN	16	3	24	1
RefineDet	18	6	26	2
FERNet	18	4	23	1
Faster R-CNN	18	1	30	2
Proposed method	18	4	24	2

Table 7. Generalization experiments on the TrashCan dataset (%)

Algorithm	Precision	Recall	mAP (area=all, IoU=0.50)
Faster R-CNN	87.1	74.2	87.1
Proposed method	92.3	80.1	93.4

References (23)

[1]	Wei N, Yang W K, Zhou W J, et al. Underwater object detection method with enhanced wavelet transform features[J]. Journal of Unmanned Undersea Systems, 2025, 33(2): 204-211.
[2]	Jiao W P, Li J, Zhang C Y, et al. Intelligent perception algorithms for sonar images: A review[J]. Journal of Unmanned Undersea Systems, 2025, 33(3): 559-572.
[3]	Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE conference on computer vision and pattern recognition, 2014: 580-587.
[4]	Girshick R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 1440-1448.
[5]	Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. Advances in Neural Information Processing Systems, 2015, 28: 91-99. doi: 10.1109/tpami.2016.2577031
[6]	Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]//European Conference on Computer Vision, 2016: 21-37.
[7]	Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE conference on computer vision and pattern recognition, 2016: 779-788.
[8]	Zhang H M, Wang X Y, Wen X B, et al. REL-YOLO: An underwater object detection method fusing edge enhancement and multi-scale attention[J/OL]. Journal of Optoelectronics·Laser. [2026-01-31]. https://link.cnki.net/urlid/12.1182.o4.20260130.1241.004.
[9]	Liang X M, Zhang T, Yu H F, et al. Underwater object detection algorithm based on improved YOLOv8[J]. Computer Engineering and Design, 2025, 46(9): 2599-2607.
[10]	Wang R N, Feng C, Zhao Z Q, et al. Analysis of detection algorithm for underwater low-resolution small targets[J]. Ship Engineering, 2026, 48(2): 98-108. doi: 10.13788/j.cnki.cbgc.2026.02.12
[11]	Li H L, Huang S G, Rao X C. Adaptive cross-scale feature fusion for underwater object detection algorithm[J]. Electronic Measurement Technology, 2025, 48(13): 129-138.
[12]	Shen X L, Li D F. Real-time underwater object detection with frequency-domain recalibration and an adaptive sparse pyramid[J/OL]. Laser & Optoelectronics Progress, [2026-01-31]. https://link.cnki.net/urlid/31.1690.TN.20260121.1736.048.
[13]	Zhang H R, Feng W M, Yang L X, et al. CSAF-YOLO: An improved underwater small object detection algorithm based on YOLO11[J/OL]. Journal of Computer Applications, [2026-01-31]. https://link.cnki.net/urlid/51.1307.TP.20260108.1256.004.
[14]	He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 2961-2969.
[15]	He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition, 2016: 770-778.
[16]	Wang X, Girshick R, Gupta A, et al. Non-local neural networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition, 2018: 7794-7803.
[17]	Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition, 2018: 7132-7141.
[18]	Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks[C]//Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011: 315-323.
[19]	Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[C]//International conference on machine learning, 2015: 448-456.
[20]	Bochkovskiy A, Wang C Y, Liao H Y M. YOLOv4: Optimal speed and accuracy of object detection[PP/OL]. V1. arXiv (2020-04-23)[2026-02-07]. https://doi.org/10.48550/arXiv.2004.10934.
[21]	Jocher G. YOLOv5: GitHub repository[EB/OL]. (2020-06-09)[2021-07-09]. https://github.com/ultralytics/yolov5.
[22]	Zhang S, Wen L, Bian X, et al. Single-shot refinement neural network for object detection[C]//Proceedings of the IEEE conference on computer vision and pattern recognition, 2018: 4203-4212.
[23]	Fan B, Chen W, Cong Y, et al. Dual refinement underwater object detection network[C]//European Conference on Computer Vision, 2020: 275-291.