• Indexed in the China Science and Technology Core Journals
  • Indexed in JST
  • Indexed in Scopus
  • Indexed in DOAJ


Lightweight Real-Time Underwater Object Detection Transformer Based on Gated Convolution Network

LI Yuhui, CUI Huixia, LI Yaomin, JIA Senping

Citation: LI Yuhui, CUI Huixia, LI Yaomin, JIA Senping. Lightweight Real-Time Underwater Object Detection Transformer Based on Gated Convolution Network[J]. Journal of Unmanned Undersea Systems, 2025, 33(2): 1-10. doi: 10.11993/j.issn.2096-3920.2024-0182


doi: 10.11993/j.issn.2096-3920.2024-0182
Funding: Supported by the Open Fund of the State Key Laboratory of Robotics (2024-O23).
Author biography:

    LI Yuhui (2001-), male, master's degree candidate; main research interest: underwater object detection.

  • CLC number: TJ630.34; U674.941


  • Abstract: To address the difficulty of processing underwater image features, redundant model structures, and large parameter counts in underwater object detection algorithms, a real-time Transformer underwater object detection method based on a lightweight gated convolutional network is proposed. The method first constructs a convolutional gated linear unit based on the gating idea, which dynamically regulates feature transmission, and on this basis proposes a gated channel interaction module. By fully decoupling the token mixer and the channel mixer, and applying structural re-parameterization to the token mixer, this module greatly reduces the computational cost of the model during inference. A hybrid encoder performs intra-scale information interaction and multi-scale feature fusion on the three feature maps extracted by the gated backbone, achieving a deep fusion of shallow high-frequency information and deep semantic spatial information. The proposed model was evaluated extensively on several datasets of different modalities, reaching an mAP@0.5 of 0.849 with a total of 23.3M parameters and a detection rate of 136.8 FPS. It preserves the excellent detection accuracy of this model family while achieving a smaller parameter count and a higher detection frame rate, outperforming the other models overall. The results show that, compared with a series of strong object detection models, the proposed model offers excellent detection performance and efficient real-time detection capability.
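The convolutional gated linear unit described in the abstract follows the classic GLU pattern: the input is projected into a value branch and a gate branch, and the values are scaled elementwise by a sigmoid of the gate, which is what "dynamically regulates feature transmission" refers to. The sketch below is a hypothetical NumPy illustration using 1x1 convolutions, not the authors' implementation; the names `conv_glu`, `w_value`, and `w_gate` are invented here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_glu(x, w_value, w_gate):
    """Convolutional gated linear unit (hypothetical sketch).

    x       : (C_in, H, W) input feature map
    w_value : (C_out, C_in) 1x1-conv weights for the value branch
    w_gate  : (C_out, C_in) 1x1-conv weights for the gate branch

    The gate branch produces per-position weights in (0, 1) that
    dynamically scale the value branch: out = value * sigmoid(gate).
    """
    # A 1x1 convolution is a channel-wise linear map at each spatial position.
    value = np.einsum('oc,chw->ohw', w_value, x)
    gate = np.einsum('oc,chw->ohw', w_gate, x)
    return value * sigmoid(gate)

# Toy example: 4 input channels, 2 output channels, an 8x8 map.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w_v = rng.standard_normal((2, 4)) * 0.1
w_g = rng.standard_normal((2, 4)) * 0.1
y = conv_glu(x, w_v, w_g)
print(y.shape)  # (2, 8, 8)
```

Because the sigmoid is bounded in (0, 1), the gate can only attenuate the value branch, never amplify it, which gives the layer a soft, learnable feature-selection behavior.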

     
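The structural re-parameterization mentioned for the token mixer trains with multiple parallel branches but folds them algebraically into a single operator for inference, which is where the reduced inference cost comes from. The following is a minimal hypothetical sketch of the general technique using 1x1 convolutions with an identity shortcut, not the paper's actual branch design; `multi_branch` and `reparameterized` are names invented here.

```python
import numpy as np

def multi_branch(x, w):
    """Training-time form: a 1x1-conv branch plus an identity shortcut.

    x : (C, H, W) feature map; w : (C, C) conv weights (C_out == C_in).
    """
    return np.einsum('oc,chw->ohw', w, x) + x

def reparameterized(x, w):
    """Inference-time form: both branches folded into one kernel.

    The identity shortcut is absorbed into the weights as W + I,
    so inference runs a single convolution instead of two branches.
    """
    w_merged = w + np.eye(w.shape[0])
    return np.einsum('oc,chw->ohw', w_merged, x)

rng = np.random.default_rng(1)
x = rng.standard_normal((3, 5, 5))
w = rng.standard_normal((3, 3)) * 0.1
# The two forms are numerically identical; only the merged one is deployed.
assert np.allclose(multi_branch(x, w), reparameterized(x, w))
```

The same algebra extends to merging parallel 3x3 and 1x1 convolutions and batch-norm layers, which is the trick used by RepVGG-style backbones.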

  • Figure 1. The structure of GC-DETR

    Figure 2. Schematic diagram of the gated linear unit (GLU)

    Figure 3. The structure of the gated channel interaction module (GCIM)

    Figure 4. The structure of GLbone

    Figure 5. Confusion matrix of GC-DETR on the datasets

    Figure 6. Visual results of underwater object detection

    Figure 7. Visual results of underwater object detection

    Figure 8. Visual results of object features

    Table 1. Comparative experimental results on the DUO dataset

    Model             mAP@0.5   mAP@0.5:0.95   Params(M)   FLOPs(G)   FPS
    Faster R-CNN      0.819     0.613          41.14       63.26      19.2
    Cascade R-CNN     0.839     0.500          69.0        72.0       27.3
    RetinaNet         0.704     0.461          36.17       52.62      39.2
    GFL               0.837     0.655          74.2        47.5       19.8
    YOLOv5s           0.813     0.492          7.03        16.0       164.2
    YOLOv7            0.801     0.559          6.0         13.3       191.0
    YOLOv8            0.812     0.573          3.2         8.7        205.6
    Deformable DETR   0.844     0.637          41.3        193.0      16.0
    RTDETR            0.844     0.635          38.6        57.0       131.1
    RTDETR+GLbone     0.849     0.639          23.3        30.6       136.8

    Table 2. Comparison of underwater object detection algorithms on the DUO dataset

    Model            mAP@0.5   Precision (per class)           Recall (per class)              F1-score
                               Hol.    Ech.    Scal.   Star.   Hol.    Ech.    Scal.   Star.
    Boosting-R-CNN   0.832     0.710   0.745   0.416   0.793   0.903   0.919   0.746   0.910   0.754
    RoIA             0.828     0.839   0.731   0.451   0.849   0.750   0.896   0.422   0.843   0.723
    Rtmdet           0.846     0.755   0.825   0.589   0.794   0.870   0.893   0.644   0.894   0.781
    GCC-Net          0.839     0.839   0.731   0.452   0.849   0.750   0.896   0.422   0.843   0.723
    Proposed method  0.849     0.842   0.864   0.783   0.857   0.800   0.859   0.783   0.857   0.811
    (Per-class columns: Hol. = Holothurian, Ech. = Echinus, Scal. = Scallop, Star. = Starfish)

    Table 3. Comparative experimental results on the Trashcan dataset

    Model             mAP@0.5   mAP@0.5:0.95   Params(M)   FLOPs(G)   FPS
    Faster R-CNN      0.553     0.312          41.14       63.26      19.2
    Cascade R-CNN     0.543     0.341          42.00       270.30     9.6
    Dino              0.286     0.183          47.56       226.30     61.2
    YOLOv7            0.434     0.241          6.00        13.30      191.0
    Deformable DETR   0.569     0.361          41.30       193.00     16.0
    RTDETR            0.578     0.387          38.60       57.00      131.1
    RTDETR+GLbone     0.577     0.384          23.30       30.60      136.8

    Table 4. Per-class experimental results on the URPC dataset

    Class          Precision   Recall   mAP@0.5
    Ball           0.929       0.938    0.945
    Circle cage    0.943       0.947    0.936
    Cube           0.963       0.999    0.995
    Cylinder       0.946       0.846    0.902
    Human body     0.901       0.898    0.907
    Metal bucket   0.771       0.931    0.928
    Square cage    0.894       0.881    0.837
    Tyre           0.960       0.728    0.783
Figures(8) / Tables(4)
Publication history
  • Received: 2024-12-31
  • Revised: 2025-03-12
  • Accepted: 2025-03-13
  • Published online: 2025-03-20
