Volume 33 Issue 2
May 2025
LI Yuhui, CUI Huixia, LI Yaomin, JIA Senping. Real-Time Transformer Detection of Underwater Objects Based on Lightweight Gated Convolutional Network[J]. Journal of Unmanned Undersea Systems, 2025, 33(2): 229-237. doi: 10.11993/j.issn.2096-3920.2024-0182

Real-Time Transformer Detection of Underwater Objects Based on Lightweight Gated Convolutional Network

doi: 10.11993/j.issn.2096-3920.2024-0182
  • Received Date: 2024-12-31
  • Revised Date: 2025-03-12
  • Accepted Date: 2025-03-13
  • Available Online: 2025-03-20
  • To address the difficulties faced by underwater object detection algorithms, namely hard-to-process image features, redundant model architectures, and excessive parameter counts, this paper proposes a real-time Transformer detection method for underwater objects based on a lightweight gated convolutional network. The method first constructs a convolutional gated linear unit based on a gating mechanism to dynamically modulate feature transmission. Building on this unit, a gated channel interaction module is proposed that fully decouples the token mixer from the channel mixer. For the token mixer, structural reparameterization is introduced to significantly reduce the computational cost of the model during inference. A hybrid encoder then performs intra-scale information exchange and multi-scale feature fusion on the three feature maps extracted by the gated backbone network, thereby closely fusing shallow high-frequency information with deep semantic and spatial information. Extensive experiments were conducted on datasets of different modalities. The results show that the model reaches an mAP@0.5 of 0.849 with 23.3×10⁶ parameters in total and a detection speed of 136.8 frames per second (FPS). While maintaining excellent detection accuracy, the model has fewer parameters and a higher detection speed than the other models compared, giving it better overall performance. These results indicate that, compared with a series of strong object detection models, the proposed model combines sound detection performance with efficient real-time detection.
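The abstract states that structural reparameterization reduces the token mixer's inference cost. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a RepVGG/RepViT-style mixer on a single channel with three parallel branches (a 3×3 conv followed by BatchNorm, a 1×1 conv followed by BatchNorm, and an identity); the paper's exact branch layout is not given here. All function names and numeric values are hypothetical illustrations. The code folds each BatchNorm into its convolution and merges the three branches into one 3×3 kernel plus bias, then checks that the single-conv output matches the multi-branch output.

```python
import math

# Hypothetical sketch: the branch layout (3x3+BN, 1x1+BN, identity) is an assumption
# in the style of RepVGG/RepViT, not taken from the paper.

def conv2d_same(x, k, bias=0.0):
    """Single-channel 2D cross-correlation (as in DL frameworks) with zero padding.

    Assumes an odd-sized kernel, so the output has the same height and width as x.
    """
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = bias
            for a in range(kh):
                for b in range(kw):
                    r, c = i + a - ph, j + b - pw
                    if 0 <= r < h and 0 <= c < w:  # zero padding outside the image
                        acc += k[a][b] * x[r][c]
            row.append(acc)
        out.append(row)
    return out

def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm on a single channel (fixed running statistics)."""
    scale = gamma / math.sqrt(var + eps)
    return [[scale * (v - mean) + beta for v in row] for row in y]

def fold_bn(k, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm into the preceding bias-free conv, returning (kernel, bias)."""
    scale = gamma / math.sqrt(var + eps)
    return [[v * scale for v in row] for row in k], beta - mean * scale

def multi_branch(x, k3, bn3, k1, bn1):
    """Training-time mixer: 3x3 conv+BN, 1x1 conv+BN and identity, summed."""
    y3 = batch_norm(conv2d_same(x, k3), *bn3)
    y1 = batch_norm(conv2d_same(x, k1), *bn1)
    return [[a + b + c for a, b, c in zip(r3, r1, rx)]
            for r3, r1, rx in zip(y3, y1, x)]

def reparameterize(k3, bn3, k1, bn1):
    """Merge the three branches into a single 3x3 kernel and bias for inference."""
    w3, b3 = fold_bn(k3, *bn3)
    w1, b1 = fold_bn(k1, *bn1)
    merged = [row[:] for row in w3]
    merged[1][1] += w1[0][0]  # a 1x1 kernel equals a 3x3 kernel with only its centre set
    merged[1][1] += 1.0       # the identity branch is a 3x3 kernel with a 1 at the centre
    return merged, b3 + b1

# Usage with arbitrary illustrative values (not trained weights).
x = [[float((3 * i + 5 * j) % 7) - 3.0 for j in range(6)] for i in range(5)]
k3 = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.1, -0.3]]
k1 = [[0.7]]
bn3 = (1.2, 0.1, 0.05, 0.9)  # (gamma, beta, running mean, running var)
bn1 = (0.8, -0.2, -0.1, 1.5)

y_train = multi_branch(x, k3, bn3, k1, bn1)
k_merged, b_merged = reparameterize(k3, bn3, k1, bn1)
y_infer = conv2d_same(x, k_merged, bias=b_merged)
max_err = max(abs(a - b) for ra, rb in zip(y_train, y_infer) for a, b in zip(ra, rb))
print(f"max |train - infer| = {max_err:.2e}")  # differs only by floating-point rounding
```

Because convolution is linear in its kernel and inference-mode BatchNorm is an affine map, the merged kernel reproduces the multi-branch output up to rounding, while the inference graph contains one convolution instead of three branches.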

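    Corresponding author: CHEN Bin, bchen63@163.com
    1. School of Materials Science and Engineering, Shenyang University of Chemical Technology, Shenyang 110142, China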

    Figures (8) / Tables (4)
