Dynamic Obstacle Avoidance for Autonomous Underwater Vehicles via VO-PPO
Abstract: Efficient and safe dynamic obstacle avoidance is crucial for autonomous underwater vehicles (AUVs) performing military missions. Conventional reinforcement learning (RL) approaches to training AUV obstacle avoidance suffer from high collision risk and slow convergence. To address this, this paper proposes VO-PPO, a dynamic obstacle-avoidance algorithm for AUVs that integrates an improved velocity obstacle (VO) method with proximal policy optimization (PPO). The algorithm extends the traditional VO framework with a safety margin and a time-window mechanism, which make obstacle-avoidance decisions safer and more efficient. It also embeds geometric safety constraints into policy optimization through a “discrete-check–continuous-execution” safe action mask. Together with state-space decoupling and a multi-objective reward design, these components guide the learned policy to balance safety, efficiency, and trajectory smoothness. Simulation results show that, compared with the traditional VO method, VO-PPO generates smoother avoidance paths that better match AUV motion characteristics. Compared with a baseline PPO algorithm, it raises the obstacle-avoidance success rate by 53 percentage points, speeds up training convergence by 67.5%, and increases the accumulated reward by 56.7%. These gains effectively mitigate the problems of high collision risk and slow convergence.
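The safety margin and time window can be illustrated with the truncated velocity obstacle that is common in the VO literature. The following Python sketch makes several assumptions: a 2-D plane, circular obstacles, and constant obstacle velocity. The function names and radii are hypothetical, and the paper's exact formulation may differ. Under this sketch, a candidate AUV velocity is rejected if, within the time window τ, the closest approach to an obstacle falls below the combined radius enlarged by the safety margin.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def in_truncated_vo(p_rel: Vec, v_rel: Vec, radius: float, tau: float) -> bool:
    """Return True if keeping v_rel leads to a conflict within tau seconds.

    p_rel  : obstacle position minus AUV position (m)
    v_rel  : AUV velocity minus obstacle velocity (m/s)
    radius : AUV radius + obstacle radius + safety margin (m)
    tau    : time window (s); conflicts later than tau are ignored
    """
    px, py = p_rel
    vx, vy = v_rel
    speed_sq = vx * vx + vy * vy
    # Time of closest approach along the relative trajectory, clamped to [0, tau].
    t_star = 0.0 if speed_sq == 0.0 else (px * vx + py * vy) / speed_sq
    t_star = min(max(t_star, 0.0), tau)
    # Separation vector (obstacle minus AUV) at that time.
    dx, dy = px - vx * t_star, py - vy * t_star
    return math.hypot(dx, dy) < radius


def velocity_is_safe(p_auv: Vec, v_auv: Vec, p_obs: Vec, v_obs: Vec,
                     r_auv: float, r_obs: float, margin: float, tau: float) -> bool:
    """Hypothetical wrapper: a velocity is safe if it lies outside the truncated VO."""
    p_rel = (p_obs[0] - p_auv[0], p_obs[1] - p_auv[1])
    v_rel = (v_auv[0] - v_obs[0], v_auv[1] - v_obs[1])
    return not in_truncated_vo(p_rel, v_rel, r_auv + r_obs + margin, tau)


if __name__ == "__main__":
    # Stationary obstacle 10 m ahead; AUV at 2 m/s; assumed radii 1 m each, margin 1.5 m.
    print(velocity_is_safe((0, 0), (2, 0), (10, 0), (0, 0), 1.0, 1.0, 1.5, 6.0))  # False
    print(velocity_is_safe((0, 0), (2, 0), (10, 0), (0, 0), 1.0, 1.0, 1.5, 2.0))  # True
    print(velocity_is_safe((0, 0), (0, 2), (10, 0), (0, 0), 1.0, 1.0, 1.5, 6.0))  # True
```

The usage example applies the 1.5 m safety margin listed in Table 1. With a 6 s window, heading straight at the obstacle is flagged as unsafe. With a 2 s window, the same velocity is accepted, because the predicted conflict lies outside the window. This illustrates how τ trades caution against unnecessary early evasion.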
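The abstract names a “discrete-check–continuous-execution” safe action mask but does not detail its mechanics. The sketch below shows one plausible reading and rests on several assumptions:

- The action is a scalar heading change.
- Safety is checked on a hypothetical grid of candidate headings using any geometric test, for example a VO check.
- A continuous policy action whose nearest candidate is unsafe is snapped to the closest safe candidate.

All function names and the snapping rule are illustrative, not the paper's implementation.

```python
from typing import Callable, List, Optional


def build_action_mask(candidates: List[float],
                      is_safe: Callable[[float], bool]) -> List[bool]:
    """Discrete check: evaluate the geometric safety test once per candidate."""
    return [is_safe(a) for a in candidates]


def execute_with_mask(a_cont: float, candidates: List[float],
                      mask: List[bool]) -> Optional[float]:
    """Continuous execution: pass the policy's continuous action through if the
    nearest candidate is safe; otherwise snap to the closest safe candidate.
    Returns None when every candidate is masked out."""
    safe = [c for c, ok in zip(candidates, mask) if ok]
    if not safe:
        return None  # caller needs a fallback, e.g. slow down or stop
    nearest = min(range(len(candidates)), key=lambda i: abs(candidates[i] - a_cont))
    if mask[nearest]:
        return a_cont
    return min(safe, key=lambda c: abs(c - a_cont))


if __name__ == "__main__":
    headings = [-30.0, -15.0, 0.0, 15.0, 30.0]  # hypothetical candidate heading changes (deg)
    # Toy safety test: pretend an obstacle blocks all heading changes >= -10 deg.
    mask = build_action_mask(headings, lambda a: a < -10.0)
    print(mask)                                      # [True, True, False, False, False]
    print(execute_with_mask(5.0, headings, mask))    # -15.0
    print(execute_with_mask(-20.0, headings, mask))  # -20.0
```

Checking safety only on a coarse grid keeps the per-step cost to a handful of geometric tests. The executed command can still vary continuously, which avoids the jerky trajectories of a purely discrete action space.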
Table 1. Main parameter settings of the algorithm

| No. | Parameter | Symbol | Value |
|---|---|---|---|
| 1 | Discount factor | $\gamma$ | 0.99 |
| 2 | GAE coefficient | $\lambda$ | 0.95 |
| 3 | Clipping parameter | $\varepsilon$ | 0.2 |
| 4 | Actor learning rate | / | $1\times10^{-4}$ |
| 5 | Critic learning rate | / | $1\times10^{-4}$ |
| 6 | Value function weight | $c_1$ | 1.0 |
| 7 | Entropy coefficient | $c_2$ | 0.01 |
| 8 | Maximum steps per episode | / | 1 000 |
| 9 | Maximum training steps | / | $15\times10^{5}$ |
| 10 | Safety margin/m | $\delta_{\mathrm{safe}}$ | 1.5 |
| 11 | Buffer margin/m | $\delta_{\mathrm{buffer}}$ | 4 |
| 12 | Batch size | / | 8 |
| 13 | Step size | $\Delta t$ | 0.1 |

Table 2. Comparison of results with different numbers of nearest obstacles

| $n$ | Success rate/% | Average path length/m | Average mission duration/s |
|---|---|---|---|
| 1 | 84 | 113.4 | 63 |
| 2 | 85 | 114.1 | 64 |
| 3 | 88 | 116.6 | 65 |
| 4 | 82 | 120.2 | 68 |
| 5 | 77 | 121.6 | 71 |

Table 3. Comparison under different time windows

| $\tau$/s | Success rate/% | Average path length/m | Average mission duration/s |
|---|---|---|---|
| 4 | 73 | 111.4 | 62 |
| 5 | 78 | 115.9 | 64 |
| 6 | 88 | 116.6 | 65 |
| 7 | 88 | 120.2 | 69 |
| 8 | 86 | 122.6 | 71 |

Table 4. Algorithm performance in different scenarios
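
| Scenario | Success rate/% | Average path length/m | Average mission duration/s |
|---|---|---|---|
| 1 | 94 | 106.7 | 61 |
| 2 | 93 | 108.4 | 62 |
| 3 | 88 | 116.6 | 65 |
| 4 | 82 | 121.4 | 68 |
| 5 | 56 | 136.6 | 75 |

Table 5. Comparison of algorithm results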
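
| Algorithm | Success rate/% | Average path length/m | Average mission duration/s |
|---|---|---|---|
| VO-PPO | 88 | 116.6 | 65 |
| SAC | 41 | 118.6 | 71 |
| VO | 84 | 124.1 | 63 |
| PPO | 35 | 121.4 | 77 |
| PPO-Decoupled | 37 | 122.6 | 75 |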
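Table 1 lists a discount factor of 0.99, a GAE coefficient of 0.95 and a clipping parameter of 0.2. These values enter PPO through generalized advantage estimation (GAE) and the clipped surrogate objective. The sketch below implements the standard formulation from the PPO literature in plain Python. It assumes VO-PPO uses these terms unmodified, and the function names are hypothetical.

```python
from typing import List


def gae_advantages(rewards: List[float], values: List[float], dones: List[bool],
                   last_value: float, gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized advantage estimation over one rollout (computed backwards)."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value  # bootstrap value of the state after the rollout
    for t in reversed(range(len(rewards))):
        nonterminal = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]  # TD error
        running = delta + gamma * lam * nonterminal * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(ratios: List[float], advantages: List[float], eps: float = 0.2) -> float:
    """Mean of min(r*A, clip(r, 1-eps, 1+eps)*A); PPO maximizes this quantity."""
    terms = [min(r * a, min(max(r, 1.0 - eps), 1.0 + eps) * a)
             for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    adv = gae_advantages([0.0, 1.0], [0.0, 0.0], [False, True], last_value=0.0)
    print(adv)                              # approximately [0.9405, 1.0]
    print(clipped_surrogate([1.5], [1.0]))  # ~1.2: ratio clipped at 1 + eps
    print(clipped_surrogate([0.5], [-1.0])) # ~-0.8: clipped term is the minimum
```

The clip keeps the probability ratio between the new and old policies within $[1-\varepsilon, 1+\varepsilon]$ whenever that would otherwise increase the objective. This limits the size of each policy update. The $\lambda$ term trades bias against variance in the advantage estimate.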