• China Science and Technology Core Journal
  • Indexed in JST
  • Indexed in Scopus
  • Indexed in DOAJ

Research on Multi-Underwater Targets Interception Strategy Based on Deep Reinforcement Learning

GAN Wenhao, PENG Yunfei, QIAO Lei

Citation: GAN Wenhao, PENG Yunfei, QIAO Lei. Research on Multi-Underwater Targets Interception Strategy Based on Deep Reinforcement Learning[J]. Journal of Unmanned Undersea Systems, 2025, 33(2): 1-9. doi: 10.11993/j.issn.2096-3920.2025-0004

doi: 10.11993/j.issn.2096-3920.2025-0004
Funding: National Natural Science Foundation of China (52101365); Young Elite Scientists Sponsorship Program by CAST (2021QNRC001); Shanghai Sailing Program (21YF1419800).
Details
    About the author:

    GAN Wenhao (1998-), male, Ph.D. candidate; his research interest is intelligent underwater game confrontation.

    Corresponding author:

    QIAO Lei (1989-), male, associate professor, doctoral supervisor; his research interests are marine intelligent robots and unmanned systems.

  • CLC number: U674.941; TP242.6

Research on Multi-Underwater Targets Interception Strategy Based on Deep Reinforcement Learning

  • Abstract: When multiple autonomous undersea vehicles (AUVs) intercept underwater targets, each AUV must make precise decisions based on information about both adversaries and teammates, under the dual challenges of competition and cooperation. Existing studies mostly address single-target interception in simple environments and lack an in-depth exploration of cooperative mechanisms for multi-target interception in complex environments. To address this problem, this paper proposes a multi-agent deep reinforcement learning framework that enables AUVs to learn interception strategies in environments with complex obstacles and time-varying ocean currents, with an emphasis on developing their cooperative mechanisms in many-versus-many scenarios. First, a hierarchical maneuvering framework is designed that enhances AUV decision-making through three nested loops. Then, based on the multi-agent proximal policy optimization algorithm, scalable state and action spaces are constructed and a composite reward function is designed to improve the interception efficiency and cooperation of the AUVs. Finally, under a centralized-training, distributed-execution architecture, a population-expansion, curriculum-learning training scheme is proposed to help AUVs acquire generalizable cooperative strategies. Training results show that the interception strategy under the proposed framework converges quickly and maintains a high success rate. Simulation experiments show that the trained AUV team can use a single set of models across various population configurations, cooperatively and effectively intercepting multiple intruding targets while avoiding obstacles.
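The composite reward function itself is not reproduced on this page. Purely as an illustration, the sketch below combines three commonly used shaping terms for a pursuit task (range-closing progress, an obstacle-safety penalty, and a sparse interception bonus), weighted by the coefficients $ ({k_{rp}},{k_s},{k_I}) = (1.5, 0.4, 50) $ and the 5 m safety radius listed in Table 1; the term definitions and the helper `composite_reward` are assumptions for exposition, not the paper's actual formulation:

```python
# Illustrative composite reward for one defending AUV. The three term forms
# are ASSUMED; only the coefficients and the safety radius come from Table 1.
K_RP, K_S, K_I = 1.5, 0.4, 50.0   # reward coefficients (k_rp, k_s, k_I)
R_SAFE = 5.0                      # safety radius R_T^safe / m

def composite_reward(d_prev, d_now, d_obstacle, intercepted):
    """d_prev/d_now: distance to the assigned target at the previous/current
    step; d_obstacle: distance to the nearest obstacle; intercepted: whether
    the target was captured this step."""
    r_progress = K_RP * (d_prev - d_now)             # reward closing the range
    r_safety = -K_S * max(0.0, R_SAFE - d_obstacle)  # penalize near-collisions
    r_intercept = K_I if intercepted else 0.0        # sparse terminal bonus
    return r_progress + r_safety + r_intercept

print(composite_reward(100.0, 98.0, 20.0, False))  # → 3.0 (progress term only)
```

Dense progress and safety terms of this kind are a standard way to keep the sparse interception bonus learnable under PPO-style policy gradients.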

     

  • Figure 1. Many-to-many AUV interception scenario

    Figure 2. Hierarchical decision-making maneuvering framework for AUV interception tasks

    Figure 3. Interception strategy of the AUV defensive team

    Figure 4. Interception scene built in Unity3D

    Figure 5. Change of interception success rate during training (the shaded area represents the 95% confidence interval)

    Figure 6. Training results of the AUV interception strategy (solid lines represent mean values over five random seeds; shaded areas indicate the corresponding standard deviation)

    Figure 7. Simulation results of 2 defenders vs. 1 attacker

    Figure 8. Simulation results of 3 defenders vs. 2 attackers

    Figure 9. Simulation results of 2 defenders vs. 3 attackers

    Figure 10. Statistical results of AUV interception simulation experiments

    Table 1. Reward and environment-related parameters

    Name                                               Value
    Reward coefficients $ ({k_{rp}},{k_s},{k_I}) $     (1.5, 0.4, 50)
    Number of defending AUVs $ {N_D} $                 1~3
    Number of attacking AUVs $ {N_A} $                 1~3
    Vortex center positions                            (−60 m, −30 m), (0 m, 80 m)
    Vortex strength $ \Gamma $                         8
    Vortex radius $ \delta $                           80 m
    Safety radius $ R_T^{{\text{safe}}} $              5 m
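The page does not state which flow model the vortex parameters in Table 1 feed into. As a hedged sketch only, the snippet below superposes two Lamb-Oseen-style vortices at the listed centers with strength $ \Gamma = 8 $ and radius $ \delta = 80 $ m; the tangential-velocity profile and the helper `current_velocity` are illustrative assumptions (a steady field, whereas the paper describes time-varying currents):

```python
import numpy as np

# ASSUMED Lamb-Oseen-style vortex field built from the Table 1 parameters:
# tangential speed v(r) = Gamma / (2*pi*r) * (1 - exp(-(r/delta)^2)).
VORTEX_CENTERS = np.array([[-60.0, -30.0], [0.0, 80.0]])  # vortex centers / m
GAMMA = 8.0    # vortex strength
DELTA = 80.0   # vortex radius / m

def current_velocity(p):
    """Superpose the velocity induced at point p by each vortex."""
    p = np.asarray(p, dtype=float)
    v = np.zeros(2)
    for c in VORTEX_CENTERS:
        d = p - c
        r = np.linalg.norm(d)
        if r < 1e-6:
            continue  # induced speed vanishes at the vortex core
        speed = GAMMA / (2.0 * np.pi * r) * (1.0 - np.exp(-(r / DELTA) ** 2))
        tangent = np.array([-d[1], d[0]]) / r  # counter-clockwise swirl
        v += speed * tangent
    return v

print(current_velocity([0.0, 0.0]))  # current at the origin
```

A field like this is cheap to query per simulation step, so it can be added directly to each AUV's kinematic update during training.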
  • [1] HU Q, ZHAO Z Y, FENG H B, et al. Progress of AUV intelligent swarm collaborative task[J]. Journal of Unmanned Undersea Systems, 2023, 31(2): 189-200. doi: 10.11993/j.issn.2096-3920.2023-0002
    [2] LIANG X L, YANG A W, ZHANG J Q, et al. Simulation verification and decision-making key technologies of unmanned swarm game confrontation: A survey[J]. Journal of System Simulation, 2024, 36(4): 805-816.
    [3] SUN S, SONG B, WANG P, et al. Real-time mission-motion planner for multi-UUVs cooperative work using tri-level programing[J]. IEEE Transactions on Intelligent Transportation Systems, 2020, 23(2): 1260-1273.
    [4] ANTONIONI E, SURIANI V, RICCIO F, et al. Game strategies for physical robot soccer players: A survey[J]. IEEE Transactions on Games, 2021, 13(4): 342-357. doi: 10.1109/TG.2021.3075065
    [5] ZHAO W, YE J, WANG B. Intelligentized command and control based on artificial intelligence[J]. Information Security and Communications Privacy, 2022(2): 2-8. doi: 10.3969/j.issn.1009-8054.2022.02.001
    [6] QIN J H, MA Q C, LI M, et al. Recent advances on multi-agent collaboration: A cross-perspective of game and control theory[J]. Acta Automatica Sinica, 2024, 51: 1-21.
    [7] LUO B, HU T M, ZHOU Y H, et al. Survey on multi-agent reinforcement learning for control and decision-making[J]. Acta Automatica Sinica, 2024, 51: 1-30.
    [8] HOU Y, HAN G, ZHANG F, et al. Distributional soft actor-critic-based multi-AUV cooperative pursuit for maritime security protection[J]. IEEE Transactions on Intelligent Transportation Systems, 2024, 25(6): 6049-6060. doi: 10.1109/TITS.2023.3341034
    [9] XU J, ZHANG Z, WANG J, et al. Multi-AUV pursuit-evasion game in the Internet of Underwater Things: An efficient training framework via offline reinforcement learning[J]. IEEE Internet of Things Journal, 2024, 11(19): 31273-31286. doi: 10.1109/JIOT.2024.3416616
    [10] ZHANG C, CHENG P, LIN B, et al. DRL-based target interception strategy design for an underactuated USV without obstacle collision[J]. Ocean Engineering, 2023, 280: 114443. doi: 10.1016/j.oceaneng.2023.114443
    [11] YU C D, LIU X Y, CHEN C, et al. Research on game confrontation of unmanned surface vehicles swarm based on multi-agent deep reinforcement learning[J]. Journal of Unmanned Undersea Systems, 2024, 32(1): 79-86. doi: 10.11993/j.issn.2096-3920.2023-0159
    [12] XIA J W, ZHU X F, ZHANG J Q, et al. Research on cooperative hunting method of unmanned surface vehicle based on multi-agent reinforcement learning[J]. Control and Decision, 2023, 38(5): 1438-1447.
    [13] SUN B, QI G L, ZHANG W, et al. Research on multi-AUV interception technology based on particle swarm optimization-artificial potential field method[J]. Control Engineering, 2024, 31(5): 769-777.
    [14] SUN B, MA H, ZHU D. A fusion designed improved elastic potential field method in AUV underwater target interception[J]. IEEE Journal of Oceanic Engineering, 2023, 48(3): 640-648. doi: 10.1109/JOE.2023.3258068
    [15] YU C, VELU A, VINITSKY E, et al. The surprising effectiveness of PPO in cooperative multi-agent games[J]. Advances in Neural Information Processing Systems, 2022, 35: 24611-24624.
    [16] JANOSOV M, VIRÁGH C, VÁSÁRHELYI G, et al. Group chasing tactics: How to catch a faster prey[J]. New Journal of Physics, 2017, 19(5): 053003. doi: 10.1088/1367-2630/aa69e7
    [17] SCHULMAN J, MORITZ P, LEVINE S, et al. High-dimensional continuous control using generalized advantage estimation[EB/OL]. arXiv: 1506.02438, 2015. https://arxiv.org/abs/1506.02438.
    [18] BAO H, ZHU H. Modeling and trajectory tracking model predictive control novel method of AUV based on CFD data[J]. Sensors, 2022, 22(11): 4234. doi: 10.3390/s22114234
Publication history
  • Received: 2025-01-08
  • Revised: 2025-02-06
  • Accepted: 2025-02-08
  • Available online: 2025-03-07
