Research on the Swarm Game Confrontation of Unmanned Surface Vehicles Based on Multi-Agent Deep Reinforcement Learning

YU Changdong, LIU Xinyang, CHEN Cong, LIU Dianyong, LIANG Xiao

Citation: YU Changdong, LIU Xinyang, CHEN Cong, LIU Dianyong, LIANG Xiao. Research on the Swarm Game Confrontation of Unmanned Surface Vehicles Based on Multi-Agent Deep Reinforcement Learning[J]. Journal of Unmanned Undersea Systems, 2024, 32(1): 1-8. doi: 10.11993/j.issn.2096-3920.2023-0159

doi: 10.11993/j.issn.2096-3920.2023-0159
Funding: National Natural Science Foundation of China (52271302); National Basic Scientific Research Program (JCKY2022410C012); Applied Basic Research Program of Liaoning Province (2023JH2/101300198); Dalian Science and Technology Innovation Fund (2021JJ12GX017); Fundamental Research Funds for the Central Universities (3132023512).
Details
    About the first author:

    YU Changdong (1996-), male, Ph.D., lecturer; main research interests: machine learning and swarm intelligence.

    Corresponding author:

    LIANG Xiao (1980-), male, Ph.D., professor; main research interest: maritime unmanned system technology.

  • CLC number: TP181; TP13


  • Abstract: Against the background of future modernized naval warfare, this paper applies a multi-agent deep reinforcement learning scheme, the multi-agent deep deterministic policy gradient (MADDPG) algorithm, to the cooperative round-up task in unmanned surface vehicle (USV) swarm game confrontation. First, according to different combat modes and application scenarios, a multi-agent deep deterministic policy gradient algorithm with distributed execution is proposed and its principles are introduced. Next, the multi-agent network model, the reward function mechanism, and the training strategy are designed on a platform simulating concrete combat scenarios. The experimental results show that the proposed method effectively solves the cooperative round-up decision-making problem against enemy USVs and achieves high efficiency in different combat scenarios, providing a theoretical reference for future research on intelligent USV decision-making in complex combat environments.
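The core structural idea the abstract describes, centralized training with decentralized execution, can be sketched as follows. This is a minimal illustrative skeleton, not the authors' implementation: the linear "networks", layer sizes, agent count, and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

class Actor:
    """Decentralized actor: maps one agent's LOCAL observation to an action."""
    def __init__(self, obs_dim, act_dim):
        self.W = rng.normal(scale=0.1, size=(act_dim, obs_dim))

    def act(self, obs):
        return np.tanh(self.W @ obs)  # bounded continuous action

class CentralCritic:
    """Centralized critic: scores the JOINT observation-action vector, so each
    agent's Q-value conditions on every other agent's behavior during training."""
    def __init__(self, joint_dim):
        self.w = rng.normal(scale=0.1, size=joint_dim)

    def q_value(self, joint_obs, joint_act):
        return float(self.w @ np.concatenate([joint_obs, joint_act]))

# Illustrative sizes: e.g. 3 pursuing USVs in the 3 vs. 1 scenario.
n_agents, obs_dim, act_dim = 3, 4, 2
actors = [Actor(obs_dim, act_dim) for _ in range(n_agents)]
critics = [CentralCritic(n_agents * (obs_dim + act_dim)) for _ in range(n_agents)]

# Execution is decentralized: each actor sees only its own observation.
observations = [rng.normal(size=obs_dim) for _ in range(n_agents)]
actions = [a.act(o) for a, o in zip(actors, observations)]

# Training is centralized: each critic sees all observations and actions.
joint_obs = np.concatenate(observations)
joint_act = np.concatenate(actions)
q_values = [c.q_value(joint_obs, joint_act) for c in critics]
print(len(actions), actions[0].shape, len(q_values))
```

In a full MADDPG implementation each actor and critic would be a deep network updated from a shared replay buffer, with target networks for stability; the sketch only shows the information flow that distinguishes MADDPG from single-agent DDPG.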


  • Figure 1. Round-up scene of unmanned surface vehicles

    Figure 2. Schematic diagram of the interaction process between the USV and environment

    Figure 3. Structure of data transfer of the DDPG algorithm

    Figure 4. Structure of data transfer of the MADDPG algorithm

    Figure 5. Execution process of the MADDPG algorithm

    Figure 6. Return values of each USV in the 3 vs. 1 scenario

    Figure 7. Simulation results of the 3 vs. 1 scenario

    Figure 8. Return values of our USVs in the 6 vs. 2 scenario

    Figure 9. Return values of enemy USVs in the 6 vs. 2 scenario

    Figure 10. Simulation results of the 6 vs. 2 scenario

Publication history
  • Received: 2022-11-02
  • Revised: 2022-11-02
  • Accepted: 2024-01-16
  • Available online: 2024-01-29
