MFLM-FPN and GAFF-driven Underwater Target Detection Algorithms and Class Balancing Strategies

Zhao Yan (赵岩); Li Jinxin (李金鑫); Jia Rujian (贾如建)

doi:10.11993/j.issn.2096-3920.2026-0007

Tianjin Falconix Technology Co., Ltd., Tianjin 300010, China

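Corresponding author:
Jia Rujian (b. 1997), male, master's degree, intermediate artificial intelligence engineer; main research interests: computer vision, object detection, semantic segmentation, and unsupervised defect detection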

CLC number: TJ630.34; U674.941
Publication history
- Received: 2026-01-08
- Revised: 2026-02-08
- Accepted: 2026-03-04
- Published online: 2026-05-19

Abstract

Abstract: To address the scarcity of feature information for underwater targets, this paper proposes an underwater target detection algorithm that combines a multi-feature-layer mapping feature pyramid network (MFLM-FPN) with a global attention feature fusion (GAFF) mechanism. First, MFLM-FPN maps each proposal box to the different feature layers, and region-of-interest pooling then yields four feature maps of identical size with complementary information. GAFF fuses these maps, making full use of the features in each layer and alleviating the shortage of features for underwater targets. Second, to address class imbalance in underwater datasets, a copy-paste class balancing strategy is designed to increase the network's attention to scarce categories such as sea cucumbers, starfish, and scallops. Third, because an insufficient penalty in the loss function degrades detection accuracy, the normalized distance between the predicted box and the target box is added to the smooth L1 loss as a penalty term, improving localization accuracy for multi-scale underwater targets. Experiments on the National Underwater Robotics Competition dataset show that the proposed algorithm achieves a recognition accuracy of 81.93%, 5.71% higher than the baseline Faster R-CNN, and reduces missed detections and false detections in complex underwater environments.
Keywords: underwater target detection; feature pyramid network; global attention feature fusion; class imbalance; loss function
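
The mapping step described in the abstract can be illustrated with a short sketch. It assumes a standard four-level pyramid (P2 to P5) with strides 4, 8, 16, and 32, and contrasts the usual single-level assignment heuristic from the FPN literature with mapping one proposal onto every level. The stride values, the 224-pixel canonical size, and the function names are assumptions made for illustration; they are not taken from the paper.

```python
import math

# Assumed strides of the four pyramid levels P2..P5 (typical for ResNet50+FPN).
STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}


def single_level(box, k0=4, canonical=224, k_min=2, k_max=5):
    """Conventional FPN heuristic: pick exactly one level from the box size."""
    x1, y1, x2, y2 = box
    scale = math.sqrt((x2 - x1) * (y2 - y1))
    k = math.floor(k0 + math.log2(scale / canonical))
    return f"P{min(max(k, k_min), k_max)}"


def project_to_all_levels(box):
    """Multi-level mapping: express the same proposal in each level's coordinates."""
    return {name: tuple(c / s for c in box) for name, s in STRIDES.items()}


proposal = (64.0, 96.0, 128.0, 160.0)  # a 64x64 box in image pixels
print("single-level assignment:", single_level(proposal))
for name, coords in project_to_all_levels(proposal).items():
    print(name, coords)  # one region of interest per level
```

Pooling each of the four projected regions to a fixed grid would then give four same-sized feature maps for the fusion stage, whereas the single-level heuristic would pool from one level only.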

Figure 1. Structure of the feature pyramid network

Figure 2. Underwater target detection algorithm based on multi-feature fusion

Figure 3. GAFF feature fusion scheme

Figure 4. Data augmentation

Figure 5. Distance between the predicted and ground-truth boxes

Figure 6. Examples of augmented data

Figure 7. Visualization of detection results

Figure 8. Visual comparison of models
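
Figure 5 concerns the distance between the predicted and ground-truth boxes, which the abstract adds to the smooth L1 loss as a normalized penalty term. The sketch below shows one way such a term could look. It assumes a DIoU-style normalization (squared center distance divided by the squared diagonal of the smallest enclosing box) and applies smooth L1 directly to corner coordinates for simplicity; the exact formula, the `weight` parameter, and the function names are assumptions, since this page does not give the paper's definition.

```python
def smooth_l1(diff, beta=1.0):
    """Quadratic near zero, linear for large errors."""
    a = abs(diff)
    return 0.5 * a * a / beta if a < beta else a - 0.5 * beta


def normalized_center_distance(pred, target):
    """Squared center distance over the squared diagonal of the enclosing box."""
    px, py = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tx, ty = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    ex1, ey1 = min(pred[0], target[0]), min(pred[1], target[1])
    ex2, ey2 = max(pred[2], target[2]), max(pred[3], target[3])
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return ((px - tx) ** 2 + (py - ty) ** 2) / diag2 if diag2 > 0 else 0.0


def box_loss(pred, target, weight=1.0):
    """Smooth L1 over the four coordinates plus the assumed distance penalty."""
    base = sum(smooth_l1(p - t) for p, t in zip(pred, target))
    return base + weight * normalized_center_distance(pred, target)


pred = (0.0, 0.0, 2.0, 2.0)    # boxes as (x1, y1, x2, y2)
target = (1.0, 1.0, 3.0, 3.0)
print(box_loss(pred, target))
print(box_loss(target, target))  # identical boxes give zero loss
```

Because the penalty is divided by the enclosing-box diagonal, it stays between 0 and 1 regardless of box size, which is one plausible reason a normalized term would suit multi-scale targets.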

Table 1. Comparison of sample quantities before and after augmentation

Class	Original data	Augmented data
Sea cucumber	5 537	16 972
Sea urchin	22 343	22 343
Starfish	6 841	18 280
Scallop	6 720	18 125
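
The effect of the class balancing step can be read off Table 1 with a little arithmetic. The sketch below uses only the counts in the table and computes how many instances each class gained and how the ratio between the largest and smallest class changes. Using the max/min ratio as an imbalance measure is a choice made here for illustration, not a metric the source reports.

```python
# Instance counts copied from Table 1 as (original, augmented).
counts = {
    "sea cucumber": (5537, 16972),
    "sea urchin": (22343, 22343),
    "starfish": (6841, 18280),
    "scallop": (6720, 18125),
}


def imbalance_ratio(values):
    """Largest class count divided by smallest class count."""
    return max(values) / min(values)


added = {name: after - before for name, (before, after) in counts.items()}
before_ratio = imbalance_ratio([b for b, _ in counts.values()])
after_ratio = imbalance_ratio([a for _, a in counts.values()])

for name, n in added.items():
    print(f"{name}: +{n} instances")
print(f"max/min ratio before: {before_ratio:.2f}, after: {after_ratio:.2f}")
```

On these figures the three scarce classes each gain roughly 11 400 instances while sea urchin is left unchanged, and the max/min ratio falls from about 4.04 to about 1.32.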

Table 2. Accuracy comparison of different fusion methods (%)

ResNet50+FPN	Add	Concat	GAFF	mAP_small (IoU = 0.50:0.95)	mAP_medium (IoU = 0.50:0.95)	mAP_large (IoU = 0.50:0.95)	mAP_all (IoU = 0.50)
√				18.0	35.1	45.6	75.07
√	√			19.0	37.4	47.8	77.54
√		√		18.2	37.0	47.1	77.12
√			√	19.8	37.9	48.6	78.46
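
The three fusion options compared in Table 2 differ in how the four same-sized feature maps are combined. The toy sketch below contrasts element-wise addition, concatenation, and a generic attention-weighted sum in which each level gets a softmax weight derived from its global average. The attention variant is a stand-in only: GAFF's actual structure is not described on this page, a real module would learn its scores rather than use the raw mean, and all function names here are hypothetical.

```python
import math


def fuse_add(levels):
    """Element-wise sum of same-shaped feature vectors."""
    return [sum(vals) for vals in zip(*levels)]


def fuse_concat(levels):
    """Concatenation: the output grows with the number of levels."""
    return [v for level in levels for v in level]


def fuse_global_attention(levels):
    """Weight each level by a softmax over its global average, then sum."""
    scores = [sum(level) / len(level) for level in levels]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [sum(w * v for w, v in zip(weights, vals)) for vals in zip(*levels)]
    return fused, weights


# Four toy feature vectors, one per pyramid level, already pooled to the same size.
rois = [[0.2, 0.1, 0.0], [0.5, 0.4, 0.3], [1.0, 0.9, 0.8], [0.1, 0.1, 0.1]]

fused, weights = fuse_global_attention(rois)
print("add:", fuse_add(rois))
print("concat length:", len(fuse_concat(rois)))
print("attention weights:", [round(w, 3) for w in weights])
```

Addition treats every level equally and concatenation defers the weighting to later layers, while the attention variant lets the levels with stronger responses contribute more to the fused feature.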

Table 3. Ablation experiment results (%)

Model	Sea urchin AP	Sea cucumber AP	Scallop AP	Starfish AP	Precision	Recall	mAP_all (IoU = 0.50)
Baseline	86.23	64.07	69.12	80.86	78.2	73.5	75.07
1	88.50	65.80	71.30	81.40	80.1	75.8	76.75
2	90.70	67.13	72.58	83.43	82.3	77.5	78.46
3	87.23	64.87	70.93	81.73	79.4	74.8	76.19
4	90.30	68.43	74.38	85.40	83.9	78.6	79.62
5	91.10	70.74	78.10	87.78	85.2	80.3	81.93
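
The last column of Table 3 can be related to the per-class columns with a short calculation. The sketch assumes that mAP here is the unweighted mean of the four per-class AP values at IoU = 0.50, which is the usual definition but is not stated on this page, and compares that mean with the reported figure for each row.

```python
# Per-class AP (%) copied from Table 3, in the order
# sea urchin, sea cucumber, scallop, starfish, paired with the reported mAP.
rows = {
    "Baseline": ([86.23, 64.07, 69.12, 80.86], 75.07),
    "1": ([88.50, 65.80, 71.30, 81.40], 76.75),
    "2": ([90.70, 67.13, 72.58, 83.43], 78.46),
    "3": ([87.23, 64.87, 70.93, 81.73], 76.19),
    "4": ([90.30, 68.43, 74.38, 85.40], 79.62),
    "5": ([91.10, 70.74, 78.10, 87.78], 81.93),
}


def mean_ap(aps):
    """Unweighted mean of per-class average precision values."""
    return sum(aps) / len(aps)


gaps = {}
for name, (aps, reported) in rows.items():
    computed = mean_ap(aps)
    gaps[name] = abs(computed - reported)
    print(f"{name}: computed {computed:.4f}, reported {reported:.2f}")
```

Under that assumption every row agrees with the reported mAP to within about 0.01, the size of a rounding difference.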

Table 4. Comparison of accuracy of different detection algorithms (%)

Algorithm	Sea urchin AP	Sea cucumber AP	Scallop AP	Starfish AP	mAP_all (IoU = 0.50)
YOLOv4	88.60	61.10	66.80	85.10	75.40
YOLOv5	86.60	65.80	71.00	86.60	77.50
SA-FPN	74.10	74.24	83.67	75.96	76.99
RefineDet	86.10	67.10	71.80	81.10	71.80
FERNet	92.00	71.90	52.70	82.50	74.70
YOLOv11n	87.90	69.80	72.70	81.80	78.05
DETR	88.60	71.10	75.20	80.90	78.95
Faster R-CNN	86.83	64.67	69.72	81.46	76.82
Proposed algorithm	89.80	77.37	76.90	83.73	81.93

Table 5. Small object detection counts on a single underwater image in a simple scene

Algorithm	Sea urchin	Sea cucumber	Scallop	Starfish
Ground-truth labels	3	3+2 (unannotated)	0	2+3 (unannotated)
YOLOv4^[18]	5	6	0	4
YOLOv5^[19]	3	3	0	1
SA-FPN	4	4	0	4
RefineDet^[21]	4	3	0	3
FERNet^[20]	4	4	0	3
Faster R-CNN	4	4	0	4
Proposed algorithm	5	4	0	5

Table 6. Total detection counts on underwater images in complex scenes

Algorithm	Sea urchin	Sea cucumber	Scallop	Starfish
Ground-truth labels	17+1 (unannotated)	6	27	1+1 (unannotated)
YOLOv4	16	4	14	2
YOLOv5	16	4	22	2
SA-FPN	16	3	24	1
RefineDet	18	6	26	2
FERNet	18	4	23	1
Faster R-CNN	18	1	30	2
Proposed algorithm	18	4	24	2

Table 7. Generalization experiments based on the TrashCan dataset (%)

Algorithm	Precision	Recall	mAP_all (IoU = 0.50)
Faster R-CNN	87.1	74.2	87.1
Proposed algorithm	92.3	80.1	93.4
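
Precision and recall in Table 7 can be folded into a single number for a quick side-by-side reading. The sketch below computes the F1 score, the harmonic mean of the two, from the table's values. F1 is not reported in the source; it is derived here purely as an illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, in the same units as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) in percent, copied from Table 7.
table7 = {
    "Faster R-CNN": (87.1, 74.2),
    "Proposed algorithm": (92.3, 80.1),
}

f1 = {name: f1_score(p, r) for name, (p, r) in table7.items()}
for name, value in f1.items():
    print(f"{name}: F1 = {value:.2f}")
```

With the table's values this gives roughly 80.13 for Faster R-CNN and 85.77 for the proposed algorithm, consistent with the direction of the precision, recall, and mAP columns.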

References (21)

[1]	Wei N, Yang W K, Zhou W J, et al. Underwater object detection method with enhanced wavelet transform features[J]. Journal of Unmanned Undersea Systems, 2025, 33(2): 204-211.
[2]	Jiao W P, Li J, Zhang C Y, et al. Intelligent perception algorithms for sonar images: A review[J]. Journal of Unmanned Undersea Systems, 2025, 33(3): 559-572.
[3]	Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014: 580-587.
[4]	Girshick R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 1440-1448.
[5]	Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[C]//Advances in Neural Information Processing Systems, 2015: 91-99.
[6]	Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]//European Conference on Computer Vision, 2016: 21-37.
[7]	Zhang H M, Wang X Y, Wen X B, et al. REL-YOLO: An underwater object detection method integrating edge enhancement and multi-scale attention[J/OL]. Journal of Optoelectronics·Laser. 2026-01-31. https://link.cnki.net/urlid/12.1182.o4.20260130.1241.004. (in Chinese)
[8]	Liang X M, Zhang T, Yu H F, et al. Underwater object detection algorithm based on improved YOLOv8[J]. Computer Engineering and Design, 2025, 46(9): 2599-2607.
[9]	Wang R N, Feng C, Zhao Z Q, et al. Analysis of detection algorithm for underwater low-resolution small targets[J]. Ship Engineering, 2026, 48(2): 98-108. doi: 10.13788/j.cnki.cbgc.2026.02.12
[10]	Li H L, Huang S G, Rao X C. Adaptive cross-scale feature fusion for underwater object detection algorithm[J]. Electronic Measurement Technology, 2025, 48(13): 129-138.
[11]	Shen X L, Li D F. Real-time underwater object detection with frequency-domain recalibration and adaptive sparse pyramid[J/OL]. Laser & Optoelectronics Progress, 2026-01-31. https://link.cnki.net/urlid/31.1690.TN.20260121.1736.048. (in Chinese)
[12]	Zhang H R, Feng W M, Yang L X, et al. CSAF-YOLO: An improved underwater small object detection algorithm based on YOLO11[J/OL]. Journal of Computer Applications, 2026-01-31. https://link.cnki.net/urlid/51.1307.TP.20260108.1256.004. (in Chinese)
[13]	He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 2961-2969.
[14]	He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[15]	Wang X, Girshick R, Gupta A, et al. Non-local neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 7794-7803.
[16]	Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 7132-7141.
[17]	Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks[C]//Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011: 315-323.
[18]	Bochkovskiy A, Wang C Y, Liao H Y M. YOLOv4: Optimal speed and accuracy of object detection[PP/OL]. V1. arXiv (2020-04-23)[2026-02-07]. https://doi.org/10.48550/arXiv.2004.10934.
[19]	Jocher G. YOLOv5: GitHub repository[EB/OL]. (2020-06-09)[2021-07-09]. https://github.com/ultralytics/yolov5.
[20]	Fan B, Chen W, Cong Y, et al. Dual refinement underwater object detection network[C]//European Conference on Computer Vision, 2020: 275-291.
[21]	Zhang S, Wen L, Bian X, et al. Single-shot refinement neural network for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 4203-4212.