Vector Propulsion AUV Path Planning Method Based on Deep Reinforcement Learning

PANG Zhouqi, LIN Xiaobo, HAO Chengpeng, CHENG Wenxin

Citation: PANG Zhouqi, LIN Xiaobo, HAO Chengpeng, CHENG Wenxin. Vector Propulsion AUV Path Planning Method Based on Deep Reinforcement Learning[J]. Journal of Unmanned Undersea Systems, 2025, 33(4): 638-647. doi: 10.11993/j.issn.2096-3920.2025-0005


doi: 10.11993/j.issn.2096-3920.2025-0005
Article information
    About the author:

    PANG Zhouqi (1998-), male, Ph.D. His main research interest is path planning for autonomous underwater vehicles.

  • CLC number: TJ630; U674.941


  • Abstract: To give autonomous underwater vehicles (AUVs) stronger path planning capability, this paper proposes a joint control scheme that combines rudder plates with a vector thruster and uses deep reinforcement learning to allocate the usage ratio between the rudder plates and the vector nozzle, trading off the high energy efficiency of rudder control against the high maneuverability of vector-nozzle control so that the AUV reaches the target point with low energy consumption. On the one hand, a dynamics model of the vector propulsion AUV is established, which verifies that the vector thruster improves the AUV's maneuverability but also reduces its energy efficiency. On the other hand, an improved proximal policy optimization algorithm (IPPO) is used to solve the path planning problem under joint control: the method first models the policy with a Beta distribution to match the bounded action space of the problem and, considering the characteristics of the vector thruster, increases the penalty on vector-nozzle control in the reward function; it then improves the parameter update strategy of proximal policy optimization (PPO) and introduces a rollback mechanism to raise convergence efficiency. Simulation results show that the proposed algorithm completes path planning tasks under joint control in complex environments and outperforms the unimproved algorithms in convergence speed and path optimality.
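    Two of the algorithmic ingredients named in the abstract lend themselves to a short illustration: a Beta-distributed policy over the bounded rudder/nozzle commands, and a rollback mechanism in the PPO surrogate. The PyTorch sketch below shows one plausible form of both; the class and function names, layer sizes, and the exact rollback expression are assumptions made for illustration, not the authors' IPPO implementation.

    ```python
    # Illustrative sketch only: a Beta-distribution policy head and a PPO surrogate
    # with a rollback-style clip. Names, layer sizes, and the rollback form are
    # assumptions; they are not taken from the paper's IPPO implementation.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torch.distributions import Beta


    class BetaPolicy(nn.Module):
        """Maps the AUV state to Beta(alpha, beta) parameters for each bounded action."""

        def __init__(self, state_dim: int, action_dim: int, hidden: int = 100):
            super().__init__()
            self.body = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, hidden), nn.Tanh(),
            )
            self.alpha_head = nn.Linear(hidden, action_dim)
            self.beta_head = nn.Linear(hidden, action_dim)

        def forward(self, state: torch.Tensor) -> Beta:
            h = self.body(state)
            # softplus(...) + 1 keeps both shape parameters above 1, so the density
            # is unimodal on (0, 1); samples are later rescaled to the actuator limits.
            alpha = F.softplus(self.alpha_head(h)) + 1.0
            beta = F.softplus(self.beta_head(h)) + 1.0
            return Beta(alpha, beta)


    def ppo_rollback_loss(dist: Beta, actions, old_log_probs, advantages,
                          eps: float = 0.2, alpha_rb: float = 5.0) -> torch.Tensor:
        """Clipped surrogate whose objective slopes back toward the trust region
        (with slope -alpha_rb) once the probability ratio leaves [1-eps, 1+eps],
        instead of going flat as in vanilla PPO."""
        log_probs = dist.log_prob(actions).sum(-1)
        ratio = torch.exp(log_probs - old_log_probs)
        surrogate = ratio * advantages
        rollback = torch.where(
            ratio > 1.0 + eps,
            (-alpha_rb * ratio + (1.0 + alpha_rb) * (1.0 + eps)) * advantages,
            torch.where(
                ratio < 1.0 - eps,
                (-alpha_rb * ratio + (1.0 + alpha_rb) * (1.0 - eps)) * advantages,
                surrogate,
            ),
        )
        # Pessimistic choice between the plain and rolled-back objectives, negated as a loss.
        return -torch.min(surrogate, rollback).mean()
    ```

    A Beta sample u in (0, 1) would then be mapped to a physical command, e.g. rudder_angle = lo + (hi - lo) * u, which is where the bounded action space enters.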

     

  • Figure 1. Schematic diagram of path planning for AUV with vector propulsion

    Figure 2. Schematic diagrams of paths under different propulsion modes and the relative power at the same speed

    Figure 3. Schematic diagram of the AUV sonar model and an example of sonar detection data

    Figure 4. Overall framework of the algorithm

    Figure 5. Architecture of the policy network

    Figure 6. Pseudocode of the parameter update procedure

    Figure 7. Pseudocode of the overall algorithm flow

    Figure 8. Algorithm performance under different update ratios

    Figure 9. Reward function curves of different algorithms

    Figure 10. Algorithm performance when using different components

    Figure 11. Paths and control quantity changes of three algorithms in environment 1

    Figure 12. Paths and control quantity changes of three algorithms in environment 2

    Figure 13. Paths and control quantity changes of two control modes in environment 3

    Figure 14. Paths and control quantity changes of two control modes in environment 4

    Table 1. Main hyper-parameters of the algorithm

    Hyperparameter                                            Value
    Hyperparameter $M$ of reward $r_z$                        100
    Hyperparameter $a$ of reward $r_s$                        1
    Hyperparameter $b$ of reward $r_s$                        1
    Hyperparameter $c$ of reward $r_d$                        0.2
    Hyperparameter $d$ of reward $r_d$                        0.5
    Hyperparameter $\lambda$ of $\hat{A}_t$                   0.9
    Hyperparameter $\alpha$ of $L^{\mathrm{true}}(\theta)$    5
    Hyperparameter $\delta$ of $L^{\mathrm{true}}(\theta)$    0.05
    Discount factor $\gamma$                                  0.95
    Loop count $K$                                            1
    Batch size $b_s$                                          64
    Experience pool capacity $M_b$                            2 048
    Exploration coefficient $c_0$                             0.01
    Number of hidden-layer neurons                            100
    Number of concatenation-layer neurons                     16
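
    As a reading aid, the sketch below gathers the Table 1 values into a single Python configuration dictionary; every key name is a hypothetical label chosen for readability, not an identifier from the paper.

    ```python
    # Hypothetical configuration collecting the Table 1 hyper-parameters in one place.
    IPPO_CONFIG = {
        "reward_M": 100,         # hyperparameter M of reward r_z
        "reward_a": 1,           # hyperparameter a of reward r_s
        "reward_b": 1,           # hyperparameter b of reward r_s
        "reward_c": 0.2,         # hyperparameter c of reward r_d
        "reward_d": 0.5,         # hyperparameter d of reward r_d
        "gae_lambda": 0.9,       # lambda in the advantage estimate A_t
        "rollback_alpha": 5,     # alpha in L^true(theta)
        "trust_delta": 0.05,     # delta in L^true(theta)
        "gamma": 0.95,           # discount factor
        "loop_count_K": 1,       # loop count K
        "batch_size": 64,        # batch size b_s
        "buffer_size": 2048,     # experience pool capacity M_b
        "exploration_c0": 0.01,  # exploration coefficient c_0
        "hidden_units": 100,     # neurons per hidden layer
        "concat_units": 16,      # neurons in the concatenation layer
    }
    ```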

    Table 2. Path features of the three algorithms in two environments

    Environment    Algorithm    Planning duration/s    Path length/m
    Environment 1  PPO          220                    1 477.6
                   IPPO         218                    1 471.4
                   A2C          227                    1 533.5
    Environment 2  PPO          165                    1 107.5
                   IPPO         155                    1 046.7
                   A2C          164                    1 083.7

    Table 3. Path features of the two control modes in two environments

    Environment    Control mode                        Planning duration/s    Path length/m
    Environment 3  Rudder angle                        115.0                  706.7
                   Rudder plate + vector propulsion    18.7                   106.9
    Environment 4  Rudder angle
                   Rudder plate + vector propulsion    95                     621.9
Publication history
  • Received: 2025-01-09
  • Revised: 2025-02-06
  • Accepted: 2025-02-12
  • Available online: 2025-07-30
