Vector Propulsion AUV Path Planning Method Based on Deep Reinforcement Learning

PANG Zhouqi; LIN Xiaobo; HAO Chengpeng; CHENG Wenxin

doi:10.11993/j.issn.2096-3920.2025-0005

Volume 33 Issue 4

Aug 2025

PANG Zhouqi, LIN Xiaobo, HAO Chengpeng, CHENG Wenxin. Vector Propulsion AUV Path Planning Method Based on Deep Reinforcement Learning[J]. Journal of Unmanned Undersea Systems, 2025, 33(4): 638-647. doi: 10.11993/j.issn.2096-3920.2025-0005

1. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
2. Program Management Center of Naval Armament Department, Beijing 100071, China

Received Date: 2025-01-09
Revised Date: 2025-02-06
Accepted Date: 2025-02-12

Available Online: 2025-07-30

Abstract

This study proposed a joint “rudder + vector thruster” control method and used deep reinforcement learning to allocate the usage ratio between the rudder and the vector nozzle, improving the path planning capability of autonomous undersea vehicles (AUVs). The method balanced the high energy efficiency of rudder control against the high maneuverability of vector nozzle control, allowing the AUV to reach the target point with lower energy consumption. First, the study established a dynamic model of the vector propulsion AUV and verified that the vector thruster improved maneuverability but reduced energy efficiency. Second, it employed an improved proximal policy optimization (IPPO) algorithm to solve the path planning problem under the joint control mode. IPPO made two changes to proximal policy optimization (PPO). Because the action space of the problem is bounded, the policy distribution was modeled with a Beta distribution, and the reward function increased the penalty on vector nozzle control in line with the characteristics of the vector thruster. In addition, the parameter update strategy of PPO was improved by introducing a “rollback mechanism” to raise the convergence efficiency of the algorithm. Simulation results verified that the proposed algorithm completed path planning tasks in complex environments under joint control and outperformed the unimproved algorithm in both convergence speed and path optimality.
Keywords: autonomous undersea vehicle; deep reinforcement learning; path planning; vector propulsion
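
The algorithmic ideas named in the abstract (a Beta-distributed policy for a bounded action space, a heavier reward penalty on vector nozzle use, and a rollback-style PPO update) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the penalty weights `w_rudder` and `w_nozzle`, and the values of `eps` and `slope` are all hypothetical. The rollback form is assumed to follow the surrogate proposed by Wang et al. in "Truly proximal policy optimization"; the abstract itself does not spell out the exact mechanism.

```python
import math
import random


def sample_beta_action(alpha, beta, low, high, rng=random):
    """Draw u ~ Beta(alpha, beta) on [0, 1] and rescale it to [low, high].

    A Beta distribution has bounded support, so the rescaled action always
    lies inside the actuator limits without clipping.
    Returns (action, u); u is kept for the log-probability computation.
    """
    u = rng.betavariate(alpha, beta)
    return low + (high - low) * u, u


def beta_log_prob(u, alpha, beta):
    """Log-density of Beta(alpha, beta) at u in (0, 1)."""
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return (alpha - 1.0) * math.log(u) + (beta - 1.0) * math.log(1.0 - u) - log_norm


def step_reward(progress, rudder_angle, nozzle_angle, w_rudder=0.1, w_nozzle=0.5):
    """Hypothetical per-step reward: progress toward the goal minus actuator
    penalties, with the vector nozzle penalized more heavily than the rudder."""
    return progress - w_rudder * abs(rudder_angle) - w_nozzle * abs(nozzle_angle)


def rollback_ratio(ratio, eps=0.2, slope=0.3):
    """Assumed rollback function: identity inside [1 - eps, 1 + eps]; outside
    that range, a line with negative slope that pushes the ratio back."""
    if ratio <= 1.0 - eps:
        return -slope * ratio + (1.0 + slope) * (1.0 - eps)
    if ratio >= 1.0 + eps:
        return -slope * ratio + (1.0 + slope) * (1.0 + eps)
    return ratio


def ppo_rollback_objective(ratio, advantage, eps=0.2, slope=0.3):
    """Per-sample surrogate: min of the unclipped and rollback-modified terms."""
    return min(ratio * advantage, rollback_ratio(ratio, eps, slope) * advantage)


if __name__ == "__main__":
    rng = random.Random(0)
    action, u = sample_beta_action(2.0, 5.0, -0.5, 0.5, rng)
    print("action:", action, "log-prob:", beta_log_prob(u, 2.0, 5.0))
    print("surrogate:", ppo_rollback_objective(1.5, 1.0))
```

In standard PPO, the clipped term is flat once the probability ratio leaves the clip range. Here it decreases instead, so the gradient actively pulls the policy back toward the old one, which is the intuition behind a rollback mechanism.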
