
Real-Time Transformer Detection of Underwater Objects Based on Lightweight Gated Convolutional Network

LI Yuhui, CUI Huixia, LI Yaomin, JIA Senping

Citation: LI Yuhui, CUI Huixia, LI Yaomin, JIA Senping. Real-Time Transformer Detection of Underwater Objects Based on Lightweight Gated Convolutional Network[J]. Journal of Unmanned Undersea Systems, 2025, 33(2): 229-237. doi: 10.11993/j.issn.2096-3920.2024-0182


doi: 10.11993/j.issn.2096-3920.2024-0182
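Funding: Supported by the Open Fund of the State Key Laboratory of Robotics (2024-O23).
Details
    Author biography:

    LI Yuhui (born 2001), male, master's student. His main research interest is underwater object detection.

  • CLC number: TJ630.34; U674.941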

Real-Time Transformer Detection of Underwater Objects Based on Lightweight Gated Convolutional Network

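  • Abstract: Underwater object detection algorithms struggle with image feature processing and suffer from redundant model structures and large parameter counts. To address these problems, this paper proposes a real-time Transformer detection method for underwater objects based on a lightweight gated convolutional network. The method first builds a convolutional gated linear unit on the gating idea to dynamically regulate how features are passed on, and on that basis proposes a gated channel interaction module (GCIM). The module fully decouples the token mixer from the channel mixer and applies structural re-parameterization to the token mixer, which greatly reduces the computational cost of the model at inference. A hybrid encoder then performs intra-scale information interaction and multi-scale feature fusion on the three features extracted by the gated backbone network, tightly fusing shallow high-frequency information with deep semantic and spatial information. The model was evaluated in extensive experiments on several datasets of different modalities. The results show an mAP@0.5 of 0.849, a total of 23.3×10^6 parameters, and a detection rate of 136.8 frames/s. The model keeps a high detection accuracy while achieving a small parameter count and a high detection speed, and its overall performance is better than that of the other models compared. Compared with a range of strong object detection models, the proposed model offers high detection performance and efficient real-time detection.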
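The two mechanisms named in the abstract, gating and structural re-parameterization, can be illustrated with a minimal pure-Python sketch. It is not the authors' GCIM: it works on a 1-D signal, uses the sigmoid gate of the original gated linear unit, ignores batch normalization, and all names (`glu`, `conv1d_same`, `merge_branches`) and kernel values are hypothetical. It shows only that a gate scales each feature by a factor in (0, 1), and that parallel linear branches used at training time can be folded into a single kernel for inference without changing the output.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def glu(values, gates):
    """Gated linear unit: each value is scaled by sigmoid(gate), a factor in (0, 1)."""
    return [v * sigmoid(g) for v, g in zip(values, gates)]

def conv1d_same(signal, kernel):
    """Zero-padded 1-D cross-correlation with an odd-length kernel (same output length)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]

def merge_branches(k3, k1):
    """Fold a 3-tap branch, a 1-tap branch and an identity shortcut into one 3-tap kernel."""
    return [k3[0], k3[1] + k1[0] + 1.0, k3[2]]

# Illustrative input and kernels (assumed values, not taken from the paper).
x = [0.5, -1.0, 2.0, 0.0, 1.5]
k3, k1 = [0.2, -0.4, 0.1], [0.3]

# Training-time form: three parallel branches whose outputs are summed.
multi_branch = [a + b + c for a, b, c in zip(conv1d_same(x, k3), conv1d_same(x, k1), x)]

# Inference-time form: a single convolution with the merged kernel.
single_branch = conv1d_same(x, merge_branches(k3, k1))

# Gate the mixed features; here the gate signal is simply the input itself.
gated = glu(single_branch, gates=x)
```

Because convolution is linear, `multi_branch` and `single_branch` agree up to floating-point rounding, so the extra branches cost nothing at inference time.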

     

  • Figure 1. Structure of GC-DETR

    Figure 2. Schematic diagram of the gated linear unit (GLU)

    Figure 3. Structure of the gated channel interaction module (GCIM)

    Figure 4. Structure of the gated backbone network (GLbone)

    Figure 5. Confusion matrix of GC-DETR on the datasets

    Figure 6. Visual results of underwater object detection

    Figure 7. Visual results of specialized algorithms for underwater object detection

    Figure 8. Visual results of object features

    Table 1. Comparative experimental results on the DUO dataset

    Model             mAP@0.5  mAP@0.5:0.95  Params (×10^6)  FLOPs (×10^9)  FPS (frames/s)
    Faster R-CNN      0.819    0.613         41.14           63.26          19.2
    Cascade R-CNN     0.839    0.500         69.00           72.00          27.3
    RetinaNet         0.704    0.461         36.17           52.62          39.2
    GFL               0.837    0.655         74.20           47.50          19.8
    YOLOv5s           0.813    0.492         7.03            16.00          164.2
    YOLOv7            0.801    0.559         6.00            13.30          191.0
    YOLOv8            0.812    0.573         3.20            8.70           205.6
    Deformable DETR   0.844    0.637         41.30           193.00         16.0
    RTDETR            0.844    0.635         38.60           57.00          131.1
    RTDETR+GLbone     0.849    0.639         23.30           30.60          136.8
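The effect of swapping in the gated backbone is easiest to read as a relative change against the RTDETR baseline. The short sketch below does that arithmetic on the RTDETR and RTDETR+GLbone rows of Table 1; the dictionary keys and the helper name `relative_change` are illustrative choices, not names from the paper.

```python
# Rows copied from Table 1 (DUO dataset): params in 1e6, FLOPs in 1e9, FPS in frames/s.
baseline = {"map50": 0.844, "params": 38.60, "flops": 57.00, "fps": 131.1}  # RTDETR
proposed = {"map50": 0.849, "params": 23.30, "flops": 30.60, "fps": 136.8}  # RTDETR+GLbone

def relative_change(new, old):
    """Signed relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old

changes = {key: round(relative_change(proposed[key], baseline[key]), 1) for key in baseline}
print(changes)
```

On these figures the gated backbone uses about 39.6% fewer parameters and 46.3% fewer FLOPs than RTDETR, while the frame rate rises by about 4.3% and mAP@0.5 by 0.005 in absolute terms.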

    Table 2. Comparison of underwater object detection algorithms on the DUO dataset

    Model           mAP@0.5  Precision                                Recall                                   F1-score
                             Holothurian  Echinus  Scallop  Starfish  Holothurian  Echinus  Scallop  Starfish
    Boosting-R-CNN  0.832    0.710        0.745    0.416    0.793     0.903        0.919    0.746    0.910     0.754
    RoIA            0.828    0.839        0.731    0.451    0.849     0.750        0.896    0.422    0.843     0.723
    Rtmdet          0.846    0.755        0.825    0.589    0.794     0.870        0.893    0.644    0.894     0.781
    GCC-Net         0.839    0.839        0.731    0.452    0.849     0.750        0.896    0.422    0.843     0.723
    Proposed method 0.849    0.842        0.864    0.783    0.857     0.800        0.859    0.783    0.857     0.811
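Precision and recall in Table 2 are reported per class, while the F1-score is a single value per model. As a reading aid, the sketch below computes a per-class F1 (the harmonic mean of precision and recall) for the proposed method from the table's own numbers. How the authors aggregate their single F1-score column is not stated in this section, so the sketch does not try to reproduce that column; the function name `f1_score` is an illustrative choice.

```python
# Precision and recall of the proposed method per class, copied from Table 2.
precision = {"Holothurian": 0.842, "Echinus": 0.864, "Scallop": 0.783, "Starfish": 0.857}
recall = {"Holothurian": 0.800, "Echinus": 0.859, "Scallop": 0.783, "Starfish": 0.857}

def f1_score(p, r):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

per_class_f1 = {name: f1_score(precision[name], recall[name]) for name in precision}

for name, value in per_class_f1.items():
    print(f"{name}: {value:.3f}")
```

When precision and recall are equal, as for Scallop and Starfish here, the harmonic mean equals that common value.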

    Table 3. Comparative experimental results on the Trashcan dataset

    Model             mAP@0.5  mAP@0.5:0.95  Params (×10^6)  FLOPs (×10^9)  FPS (frames/s)
    Faster R-CNN      0.553    0.312         41.14           63.26          19.2
    Cascade R-CNN     0.543    0.341         42.00           270.30         9.6
    Dino              0.286    0.183         47.56           226.30         61.2
    YOLOv7            0.434    0.241         6.00            13.30          191.0
    Deformable DETR   0.569    0.361         41.30           193.00         16.0
    RTDETR            0.578    0.387         38.60           57.00          131.1
    RTDETR+GLbone     0.577    0.384         23.30           30.60          136.8
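Both accuracy columns in these tables rest on intersection over union (IoU) between a predicted box and a ground-truth box. Under the usual convention, mAP@0.5 counts a detection as correct when IoU is at least 0.5, and mAP@0.5:0.95 averages the result over the thresholds 0.50, 0.55, ..., 0.95. The sketch below shows that matching step only; it assumes axis-aligned boxes given as `(x1, y1, x2, y2)`, and the helper names are hypothetical. The exact evaluation protocol used in the paper is not described in this section.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# Conventional thresholds 0.50, 0.55, ..., 0.95 behind mAP@0.5:0.95.
thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]

def matches_at(pred, truth):
    """Thresholds at which `pred` would count as a true positive for `truth`."""
    value = iou(pred, truth)
    return [t for t in thresholds if value >= t]

pred, truth = (0, 0, 10, 10), (0, 0, 9, 8)
print(iou(pred, truth), matches_at(pred, truth))
```

A box pair with IoU 0.72, as in this example, counts as a hit at the five thresholds from 0.50 to 0.70 and as a miss at the stricter ones, which is why mAP@0.5:0.95 is always the lower of the two columns.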

    Table 4. Per-class experimental results on the URPC dataset

    Object          Precision  Recall  mAP@0.5
    (unlabeled)     0.929      0.938   0.945
    Circular cage   0.943      0.947   0.936
    Cube            0.963      0.999   0.995
    Cylinder        0.946      0.846   0.902
    Human body      0.901      0.898   0.907
    Metal bucket    0.771      0.931   0.928
    Square cage     0.894      0.881   0.837
    Tyre            0.960      0.728   0.783
Publication history
  • Received: 2024-12-31
  • Revised: 2025-03-12
  • Accepted: 2025-03-13
  • Published online: 2025-03-20
