Dynamic Obstacle Avoidance for Autonomous Undersea Vehicles via VO-PPO

ZHANG Tao; ZENG Xiangguang; LI Min; XIE Dijie; REN Wenzhe; PENG Bei

doi:10.11993/j.issn.2096-3920.2025-0154

Volume 34 Issue 2

Apr 2026

ZHANG Tao, ZENG Xiangguang, LI Min, XIE Dijie, REN Wenzhe, PENG Bei. Dynamic Obstacle Avoidance for Autonomous Undersea Vehicles via VO-PPO[J]. Journal of Unmanned Undersea Systems, 2026, 34(2): 326-337, 362. doi: 10.11993/j.issn.2096-3920.2025-0154

1. School of Mechanical Engineering, Southwest Jiaotong University, Chengdu 610031, China
2. School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China

Received Date: 2025-11-11
Revised Date: 2025-12-12
Accepted Date: 2025-12-24

Available Online: 2026-03-16

Abstract

Efficient and safe dynamic obstacle avoidance is crucial for autonomous undersea vehicles (AUVs) performing military missions. To address the high collision risk and slow convergence of conventional reinforcement learning-based approaches during AUV obstacle-avoidance training, this paper proposes VO-PPO, a dynamic obstacle-avoidance algorithm for AUVs that integrates an improved velocity obstacle (VO) method with proximal policy optimization (PPO). The algorithm extends the traditional VO framework with a safety margin and a time-window mechanism to make obstacle-avoidance decisions safer and more efficient. It also constructs a “discrete-check-continuous-execution” safe action mask that embeds geometric safety constraints into the policy optimization process. Combined with state-space decoupling and a multi-objective reward design, the method guides the learned policy to balance safety, efficiency, and trajectory smoothness. Simulation results show that, compared with the traditional VO method, VO-PPO generates smoother obstacle-avoidance paths that better match the motion characteristics of AUVs. Compared with a baseline PPO algorithm, it improves the obstacle-avoidance success rate by 53%, accelerates training convergence by 67.5%, and increases the accumulated reward by 56.7%, effectively mitigating the problems of high collision risk and slow convergence.
- autonomous undersea vehicle,
- dynamic obstacle avoidance,
- proximal policy optimization,
- velocity obstacle method,
- action mask
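
The abstract does not give the formulas behind the safety margin, the time window, or the action mask, so the following is only a minimal 2D sketch of how such a mask could be built, not the authors' implementation. It assumes circular obstacles, a fixed AUV speed, and a discretized set of candidate headings. The function names (`vo_collides`, `safe_heading_mask`, `project_heading`) and all parameters (`n_bins`, `margin`, `horizon`) are hypothetical, and the final projection step is one possible reading of "discrete check, continuous execution".

```python
import math


def vo_collides(p_rel, v_rel, radius, horizon):
    """Return True if the obstacle comes within `radius` during [0, horizon].

    p_rel: obstacle position minus AUV position.
    v_rel: AUV velocity minus obstacle velocity.
    """
    px, py = p_rel
    vx, vy = v_rel
    speed_sq = vx * vx + vy * vy
    # Time of closest approach, clamped to the time window.
    if speed_sq == 0.0:
        t = 0.0
    else:
        t = max(0.0, min(horizon, (px * vx + py * vy) / speed_sq))
    dx, dy = px - vx * t, py - vy * t
    return dx * dx + dy * dy <= radius * radius


def safe_heading_mask(auv_pos, speed, obstacles, n_bins=16, margin=1.0, horizon=10.0):
    """Discrete check: mark each candidate heading as safe (True) or unsafe (False).

    obstacles: iterable of (x, y, vx, vy, r), where r is assumed to be the
    combined radius of the AUV and the obstacle; `margin` inflates it.
    """
    mask = []
    for k in range(n_bins):
        heading = 2.0 * math.pi * k / n_bins
        v_auv = (speed * math.cos(heading), speed * math.sin(heading))
        safe = True
        for ox, oy, ovx, ovy, r in obstacles:
            p_rel = (ox - auv_pos[0], oy - auv_pos[1])
            v_rel = (v_auv[0] - ovx, v_auv[1] - ovy)
            if vo_collides(p_rel, v_rel, r + margin, horizon):
                safe = False
                break
        mask.append(safe)
    return mask


def project_heading(heading, mask):
    """Continuous execution (assumed): keep the policy's heading if its bin is
    safe, otherwise fall back to the center of the nearest safe bin."""
    n = len(mask)
    step = 2.0 * math.pi / n
    k = round(heading / step) % n
    if mask[k]:
        return heading
    for offset in range(1, n // 2 + 1):
        for cand in ((k + offset) % n, (k - offset) % n):
            if mask[cand]:
                return cand * step
    return None  # No safe heading found in this discretization.
```

In this sketch the time window simply truncates the collision test: an obstacle that would only be reached after `horizon` seconds does not mask a heading, which is what keeps the check from being overly conservative. In a PPO setting, such a mask would typically be applied to the policy's action before it is executed or scored, but how the paper couples the mask to the policy update is not stated in the abstract.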
