Intelligent Perception Algorithms for Sonar Images: A Review
Abstract: Intelligent perception algorithms for sonar images play a vital role in ocean exploration and underwater rescue, and deep learning has brought marked progress to these tasks in recent years. This paper reviews the field from three angles: sonar image datasets and data augmentation techniques, classical sonar image processing algorithms, and deep learning-based sonar image processing methods. First, it summarizes the open-source datasets available for different tasks and the commonly used data augmentation techniques, providing data support for subsequent research. Second, it systematically reviews the application and development of algorithms across different tasks, from classical methods to advanced deep learning-based ones, giving researchers a structured overview of the field. Finally, it outlines future research directions, pointing out that the interpretation of sonar images can be further improved by acquiring larger-scale sonar image data, designing more robust algorithms, and developing task settings better suited to real underwater scenes.
Key words:
- sonar image
- intelligent perception
- ocean exploration
- deep learning
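Table 1. Summary of existing sonar image classification datasets

| Dataset | Class | Count | Total | Sonar type | Imbalance factor |
| --- | --- | --- | --- | --- | --- |
| Seabed Objects-KLSG | Aircraft | 62 | 447 | Side-scan | 6.21 |
| | Shipwreck | 385 | | | |
| FLSMDD | Bottle | 449 | 2 364 | Forward-looking | 6.91 |
| | Can | 367 | | | |
| | Chain | 226 | | | |
| | Beverage bottle | 349 | | | |
| | Hook | 133 | | | |
| | Propeller | 137 | | | |
| | Shampoo bottle | 99 | | | |
| | Standing bottle | 65 | | | |
| | Tire | 331 | | | |
| | Valve | 208 | | | |
| NKSID | Large propeller | 203 | 2 617 | Forward-looking | 47.55 |
| | Cylinder | 288 | | | |
| | Fishing net | 20 | | | |
| | Floating ball | 951 | | | |
| | Steel pipe | 112 | | | |
| | Small propeller | 94 | | | |
| | Hose | 115 | | | |
| | Tire | 834 | | | |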
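The imbalance factor column in Table 1 is not defined on this page. The listed values are consistent with the ratio of the largest class count to the smallest class count, so the minimal Python sketch below assumes that definition. The names `TABLE_1`, `imbalance_factor`, and `summarize` are illustrative only and do not come from the paper; the class counts are copied from Table 1.

```python
# Per-class sample counts copied from Table 1, in the order listed there.
TABLE_1 = {
    "Seabed Objects-KLSG": [62, 385],
    "FLSMDD": [449, 367, 226, 349, 133, 137, 99, 65, 331, 208],
    "NKSID": [203, 288, 20, 951, 112, 94, 115, 834],
}


def imbalance_factor(counts):
    """Assumed definition: largest class size divided by smallest class size."""
    return max(counts) / min(counts)


def summarize(table):
    """Return {dataset: (total samples, imbalance factor rounded to 2 places)}."""
    return {
        name: (sum(counts), round(imbalance_factor(counts), 2))
        for name, counts in table.items()
    }


if __name__ == "__main__":
    for name, (total, factor) in summarize(TABLE_1).items():
        print(f"{name}: total={total}, imbalance factor={factor:.2f}")
```

Under this assumed definition, the computed totals and ratios can be compared directly against the Total and Imbalance factor columns. NKSID stands out because its smallest class, the fishing net, has only 20 samples.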
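Table 2. Summary of existing sonar image target detection datasets

| Dataset | Class | Count | Total | Sonar type |
| --- | --- | --- | --- | --- |
| FLS-DD | Aircraft | 1 105 | 3 752 | Forward-looking |
| | Shipwreck | 884 | | |
| | Drowning victim | 1 363 | | |
| SCTD | Aircraft | 57 | 357 | Side-scan |
| | Shipwreck | 34 | | |
| | Drowning victim | 266 | | |
| UATD | Cube | 2 987 | 9 200 | Forward-looking |
| | Sphere | 3 463 | | |
| | Cylinder | 657 | | |
| | Drowning victim | 1 434 | | |
| | Tire | 1 368 | | |
| | Circular cage | 860 | | |
| | Square cage | 1 318 | | |
| | Metal bucket | 487 | | |
| | Aircraft | 1 065 | | |
| | Underwater vehicle | 1 000 | | |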