• Chinese Science and Technology Core Journal
  • Indexed in Scopus
  • Indexed in DOAJ
  • Indexed in JST
  • Indexed in Euro Pub


Ocean sound separation algorithm based on time-frequency interleaved attention

WANG Yudi, YANG Mingzhong, LIU Lixin

Citation: WANG Yudi, YANG Mingzhong, LIU Lixin. Ocean sound separation algorithm based on time-frequency interleaved attention[J]. Journal of Unmanned Undersea Systems, 2026, 34(1): 1-9. doi: 10.11993/j.issn.2096-3920.2025-0127

doi: 10.11993/j.issn.2096-3920.2025-0127
Funding: Supported by the 2024 Guangdong Provincial Marine Economy Development Project (GDNRC[2024]44)
Details
    About the authors:

    WANG Yudi (2001-), male, master's student; research interest: underwater audio signal processing

    Corresponding author:

    LIU Lixin (1985-), male, Ph.D., associate researcher; research interests: underwater computer vision, signal and image processing

  • CLC number: TB561; TN911.72


  • Abstract: Ocean sounds carry rich information, but the complex ocean acoustic environment and the highly variable nature of underwater target signals pose severe challenges to fine-grained perception and discrimination of underwater acoustic features. To faithfully recover the sound of an underwater target of interest, this paper proposes an ocean sound source separation algorithm based on an integrated filtering module (IFM). A band-split strategy is adopted: an encoder converts the mixed audio into a time-frequency spectrogram, a multi-scale attention mechanism cross-extracts time and frequency gains, and the IFM strengthens underwater sound separation. Specifically, the IFM uses an adaptive weighting mechanism to efficiently fuse the features extracted by a multi-scale convolutional spatial-filtering path and a self-attention feature-dependency path with the original features; the fused features are then fed into a decoder to reconstruct high-quality clean target audio, enhancing target signal detail while effectively filtering out background noise and interference. Experiments on a dataset of typical ocean sounds show that the proposed algorithm significantly improves separation of the target audio of interest: in humpback whale vs. passenger ship and killer whale vs. passenger ship separation experiments, the source-to-distortion ratio improvement (SDRi) reaches 8.56 dB and 10.74 dB, respectively, and the algorithm also outperforms existing baseline models on several other metrics.
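The adaptive weighted fusion of the three feature paths described in the abstract can be sketched as follows. This is a minimal illustration of the general idea, not the authors' implementation: the flat feature vectors, the softmax-normalized scalar weights, and the `spatial`/`attn` stand-in inputs are all assumptions for the sake of a runnable example.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scalars."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_fusion(original, spatial_feat, attn_feat, logits):
    """Convex combination of three equal-length feature vectors.

    `logits` stand in for learned weighting parameters; the softmax keeps
    the three weights positive and summing to 1, so the original features
    are never fully discarded and no single path can dominate unboundedly.
    """
    w0, w1, w2 = softmax(logits)
    return [w0 * o + w1 * s + w2 * a
            for o, s, a in zip(original, spatial_feat, attn_feat)]

# Toy 1-D vectors standing in for flattened time-frequency features
orig = [1.0, 0.0, -1.0, 2.0]
spatial = [0.5, 0.5, 0.5, 0.5]   # stand-in: conv spatial-filtering path output
attn = [0.0, 1.0, 0.0, 1.0]      # stand-in: self-attention path output

fused = adaptive_fusion(orig, spatial, attn, logits=[0.0, 0.0, 0.0])
print(fused)  # equal logits -> weights 1/3 each, ≈ [0.5, 0.5, -0.167, 1.167]
```

In a trained model the logits (or per-channel weights) would be learned jointly with the two paths, so the network itself decides how much of the spatially filtered and attention-refined features to blend back into the original representation.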

     

  • Figure 1. Basic model of BSS

    Figure 2. Overall pipeline of TFIIF-Net

    Figure 3. Schematic diagram of the separator module

    Figure 4. Schematic diagram of the integrated filtering module (IFM)

    Figure 5. Time-frequency spectrogram of humpback whale audio separation results

    Figure 6. Time-frequency spectrogram of killer whale audio separation results

    Figure 7. Time-frequency spectrograms of real ocean audio separation results

    Figure 8. Time-frequency spectrograms of marine organism audio separation results

    Table 1. Comparison of performance metrics of audio separation models

    Experiment     Method        SAR/dB    SIR/dB    SDR/dB    SDRi/dB
    Experiment 1   TasNet        10.09     10.68      6.71      6.17
                   Conv-TasNet   11.34     11.84      7.96      7.42
                   DPRNN         11.33     12.76      8.32      7.78
                   DPTNet        11.43     11.03      7.63      7.09
                   TDANet        10.98     11.60      7.71      7.17
                   TFIIF-Net     11.98     13.32      9.10      8.56
    Experiment 2   TasNet        10.09     11.31      7.11      7.01
                   Conv-TasNet   11.63     13.58      9.04      8.94
                   DPRNN         11.04     13.44      8.59      8.49
                   DPTNet         7.11      6.10      2.28      2.18
                   TDANet        10.99     12.49      8.08      7.98
                   TFIIF-Net     12.80     16.21     10.84     10.74
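The SDR and SDRi columns above can be illustrated with a toy computation. This is a generic sketch of the standard definitions (SDR as the log power ratio of the reference signal to the residual error; SDRi as the gain over scoring the unprocessed mixture), assuming the simple form without the optimal-scaling projection that full BSS-eval toolkits apply; the signals below are invented for illustration.

```python
import math

def sdr(reference, estimate):
    """Signal-to-distortion ratio in dB for equal-length signals.

    Simplified form: 10*log10(||s||^2 / ||s - s_hat||^2).
    """
    sig = sum(s * s for s in reference)
    err = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10(sig / err)

def sdri(reference, estimate, mixture):
    """SDR improvement: separated estimate vs. the raw mixture."""
    return sdr(reference, estimate) - sdr(reference, mixture)

# Toy example: target tone plus an interfering tone
n = 1000
target = [math.sin(0.05 * i) for i in range(n)]
interference = [0.5 * math.sin(0.31 * i) for i in range(n)]
mixture = [t + v for t, v in zip(target, interference)]
# Imperfect "separated" output: target with 10% residual interference
estimate = [t + 0.1 * v for t, v in zip(target, interference)]

improvement = sdri(target, estimate, mixture)
print(round(improvement, 2))  # 20.0 dB: residual error power dropped 100x
```

Because the residual error here is exactly a scaled copy of the mixture's error, the improvement reduces to 10*log10(100) = 20 dB, which is the kind of quantity the SDRi column reports.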

    Table 2. Ablation experiment

    Method         SAR/dB    SIR/dB    SDR/dB    SDRi/dB
    Base           11.83     13.18      8.97      8.43
    Spatial        11.95     13.30      9.06      8.52
    Feature        11.93     12.83      8.84      8.30
    Equal weight   12.68     11.55      8.67      8.13
    TFIIF-Net      11.98     13.32      9.10      8.56
Publication history
  • Received: 2025-09-15
  • Revised: 2025-09-30
  • Accepted: 2025-10-10
  • Published online: 2026-01-14