Abstract:
Efficient and safe dynamic obstacle avoidance is crucial for Autonomous Underwater Vehicles (AUVs) performing military missions. To address the high collision risk and slow convergence of conventional reinforcement learning–based approaches to AUV obstacle-avoidance training, this paper proposes a dynamic obstacle-avoidance algorithm for AUVs, termed VO-PPO, which integrates an improved velocity obstacle (VO) method with proximal policy optimization (PPO). Within the traditional VO framework, the algorithm introduces a safety margin and a time-window mechanism to improve the safety and efficiency of obstacle-avoidance decisions. In addition, by constructing a "discrete-check–continuous-execution" safe action mask, it embeds geometric safety constraints into the policy-optimization process. Combined with state-space decoupling and a multi-objective reward design, the proposed method guides the learned policy to balance safety, efficiency, and trajectory smoothness. Simulation results show that, compared with the traditional VO method, VO-PPO generates smoother obstacle-avoidance paths that better match the motion characteristics of an AUV; compared with a baseline PPO algorithm, it improves the obstacle-avoidance success rate by 53%, accelerates training convergence by 67.5%, and increases the accumulated reward by 56.7%, effectively mitigating the problems of high collision risk and slow convergence.
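For intuition only (this sketch is not from the paper), the following minimal 2-D Python illustration shows the kind of VO-style check the abstract describes: a safety margin inflates the combined vehicle and obstacle radii, a time window discards far-future collisions, and the resulting boolean test serves as a discrete safe-action mask over candidate velocities. All names, signatures, and parameter values here are illustrative assumptions.

```python
import numpy as np

def vo_unsafe(p_auv, v_candidate, p_obs, v_obs, r_auv, r_obs,
              safety_margin=0.5, time_window=10.0):
    """Return True if the candidate velocity leads to a predicted collision
    with a circular obstacle within the time window (illustrative sketch)."""
    r = r_auv + r_obs + safety_margin                    # inflated combined radius
    p_rel = np.asarray(p_obs) - np.asarray(p_auv)        # relative position
    v_rel = np.asarray(v_candidate) - np.asarray(v_obs)  # relative velocity

    if np.dot(p_rel, p_rel) <= r * r:                    # already inside inflated disk
        return True

    # Solve |p_rel + t * v_rel|^2 = r^2 for the earliest collision time t >= 0.
    a = np.dot(v_rel, v_rel)
    if a < 1e-12:                                        # no relative motion
        return False
    b = 2.0 * np.dot(p_rel, v_rel)
    c = np.dot(p_rel, p_rel) - r * r
    disc = b * b - 4.0 * a * c
    if disc < 0.0:                                       # relative ray misses the disk
        return False
    t_hit = (-b - np.sqrt(disc)) / (2.0 * a)
    # Unsafe only if the predicted collision falls inside the time window.
    return 0.0 <= t_hit <= time_window

def mask_actions(candidate_velocities, p_auv, obstacles, r_auv, **kw):
    """Discrete check: keep only candidate velocities that are VO-safe
    against every obstacle; the surviving actions can then be executed
    by the continuous policy."""
    return [not any(vo_unsafe(p_auv, v, o["p"], o["v"], r_auv, o["r"], **kw)
                    for o in candidate_velocities and obstacles or obstacles)
            for v in candidate_velocities]
```

In a setup like this, the mask would be applied to the PPO policy's candidate actions at each decision step, so that geometrically unsafe velocities are filtered out before execution; the margin and window values above are placeholders, not the paper's tuned parameters.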