Research on Multi-Underwater Targets Interception Strategy Based on Deep Reinforcement Learning
Abstract: When multiple autonomous undersea vehicles (AUVs) intercept underwater targets, each AUV must make precise decisions based on information about both adversaries and teammates, under the dual challenges of competition and cooperation. Existing research mostly focuses on single-target interception in simple environments and lacks in-depth exploration of cooperative mechanisms for multi-target interception in complex environments. To address this problem, this paper proposes a multi-agent deep reinforcement learning framework that enables AUVs to learn interception strategies in environments with complex obstacles and time-varying ocean currents, with an emphasis on developing cooperative mechanisms in many-to-many scenarios. First, a hierarchical maneuvering framework is designed that enhances AUV decision-making through a three-layer loop. Next, based on the multi-agent proximal policy optimization (MAPPO) algorithm, a scalable state-action space is constructed and a composite reward function is designed to improve interception efficiency and cooperative ability. Finally, under a centralized training and distributed execution architecture, a population expansion-curriculum learning training scheme is proposed to help the AUVs master generalizable cooperative strategies. Training results show that interception policies learned under the proposed framework converge rapidly and maintain high success rates. Simulation experiments demonstrate that the trained AUV team can use a single set of models across a variety of population configurations, avoiding obstacles while cooperating effectively to intercept multiple intruding targets.
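The abstract describes a population expansion-curriculum learning scheme in which training progresses from small to large team sizes. The paper does not spell out the stage ordering here, so the following is a minimal illustrative sketch, assuming stages are ordered by total population size over the 1-3 defender and 1-3 attacker ranges given in Table 1; the function name `curriculum_stages` and the tie-breaking rule are hypothetical.

```python
from itertools import product

def curriculum_stages(max_defenders=3, max_attackers=3):
    """Order (N_D, N_A) population configurations from simple to complex,
    so training starts at 1-vs-1 and gradually expands both teams.
    Sort key: total population first, then attacker count (fewer attackers
    before more attackers at the same total)."""
    configs = product(range(1, max_defenders + 1), range(1, max_attackers + 1))
    return sorted(configs, key=lambda c: (c[0] + c[1], c[1]))

stages = curriculum_stages()
# The schedule begins with the easiest 1-vs-1 stage and ends at full 3-vs-3.
```

A scalable state-action space, as described in the abstract, is what allows a single policy to be reused unchanged as the curriculum advances through these configurations.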
Table 1. Reward and environment-related parameters of the simulation experiment

| Name | Value |
| --- | --- |
| Reward coefficients $({k_{rp}},{k_s},{k_I})$ | (1.5, 0.4, 50) |
| Number of defending AUVs $ {N_D} $ | 1-3 |
| Number of attacking AUVs $ {N_A} $ | 1-3 |
| Vortex center positions | (−60 m, −30 m), (0 m, 80 m) |
| Vortex strength $ \Gamma $ | 8 |
| Vortex radius $ \delta $ | 80 m |
| Safety radius $ R_T^{{\text{safe}}} $ | 5 m |
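Table 1 lists three reward coefficients, and the abstract states that the composite reward is designed to balance interception efficiency and cooperation. The exact term definitions are not given in this excerpt, so the sketch below is an assumed structure only: $k_{rp}$ weighting a dense progress term, $k_s$ penalizing safety/obstacle-proximity risk, and $k_I$ as a sparse interception bonus. The function name and the three input signals are hypothetical.

```python
def composite_reward(progress, collision_risk, intercepted,
                     k_rp=1.5, k_s=0.4, k_I=50.0):
    """Hypothetical composite reward using the Table 1 coefficients:
    - progress:        dense signal, e.g. reduction in defender-target range
    - collision_risk:  penalty signal, e.g. proximity to obstacles/vortices
    - intercepted:     True when the target enters the safety radius R_T^safe
    """
    sparse_bonus = k_I if intercepted else 0.0
    return k_rp * progress - k_s * collision_risk + sparse_bonus
```

In this kind of shaping, the large sparse bonus ($k_I = 50$) dominates a successful episode, while the smaller dense coefficients keep the gradient signal informative between interception events.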