Style Transfer-Based Augmentation for Side-Scan Sonar Images

BAI Zhongyu; XU Hongli; RU Jingyu; QIU Shaoxiong

doi:10.11993/j.issn.2096-3920.2025-0045

Volume 33 Issue 4

Aug 2025



BAI Zhongyu, XU Hongli, RU Jingyu, QIU Shaoxiong. Style Transfer-Based Augmentation for Side-Scan Sonar Images[J]. Journal of Unmanned Undersea Systems, 2025, 33(4): 599-605. doi: 10.11993/j.issn.2096-3920.2025-0045



Faculty of Robot Science and Engineering, Northeastern University, Shenyang 110819, China

Received Date: 2025-03-13
Revised Date: 2025-04-07
Accepted Date: 2025-04-16

Available Online: 2025-07-07

Abstract

Side-scan sonar (SSS) is widely used in ocean exploration because it operates stably and efficiently when deployed on autonomous undersea vehicles (AUVs). However, SSS images are difficult to acquire and training samples are scarce, which severely constrains the performance of deep neural network (DNN)-based SSS image classification. To mitigate this limitation, this paper proposes a multi-scale attention network (MSANet) that uses optical-acoustic image pairs for data augmentation to improve the generalization capacity of SSS image classification models. First, shallow and deep features are extracted from multiple encoder layers to capture both content and style information. Next, a multi-scale attention module (MSAM) extracts both local and global contextual information from the style features along the channel dimension. These style features are then fused with the optical features so that the optical and acoustic features are spatially aligned. Finally, the fused multi-scale features are aligned and fed to a decoder to generate high-fidelity SSS images, which are subsequently used to train the classification network. Experiments on real-world SSS datasets demonstrate that the proposed style transfer-based augmentation strategy generates high-quality simulated SSS image samples and thereby improves the performance of DNN-based SSS image classification.
- side-scan sonar
- style transfer
- multi-scale attention network
- image augmentation
- image classification
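
The fusion idea summarized in the abstract (a channel-wise gate built from local and global context of the style features, then used to blend style and optical features at several scales) can be illustrated with a minimal sketch. This is not the authors' MSAM: the abstract does not give its layers or weights, so the sketch assumes an unlearned gate (sigmoid of each value plus its channel mean) and a simple convex blend. All function names here are hypothetical.

```python
import math

# A feature map is stored as [channel][flattened spatial position].
Feature = list[list[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_gate(style: Feature) -> Feature:
    """Gate in (0, 1) built from local and global context of each style channel.

    Assumption: local context is the value at a position, global context is
    the spatial mean of the channel (global average pooling). A real module
    would pass both through learned layers first.
    """
    gates = []
    for channel in style:
        global_ctx = sum(channel) / len(channel)
        gates.append([sigmoid(value + global_ctx) for value in channel])
    return gates


def fuse(content: Feature, style: Feature) -> Feature:
    """Blend style and content features position by position using the gate."""
    gates = channel_gate(style)
    return [
        [g * s + (1.0 - g) * c for g, s, c in zip(g_ch, s_ch, c_ch)]
        for g_ch, s_ch, c_ch in zip(gates, style, content)
    ]


def fuse_multi_scale(content_feats: list[Feature], style_feats: list[Feature]) -> list[Feature]:
    """Apply the same fusion independently at every encoder scale."""
    return [fuse(c, s) for c, s in zip(content_feats, style_feats)]
```

In this sketch each fused value always lies between the content value and the style value, so the gate only decides how much acoustic style is injected at each position; a decoder would then map the fused features back to an image.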

