Dynamic Obstacle Avoidance for Autonomous Undersea Vehicles via VO-PPO
Abstract: Efficient and safe dynamic obstacle avoidance is crucial for autonomous underwater vehicles (AUVs) performing military missions. Conventional reinforcement learning approaches to AUV obstacle avoidance suffer from high collision risk during training and slow convergence. To address these problems, this paper proposes VO-PPO, a dynamic obstacle-avoidance algorithm for AUVs that integrates an improved velocity obstacle (VO) method with proximal policy optimization (PPO). The algorithm adds a safety margin and a time-window mechanism to the traditional VO framework to make avoidance decisions safer and more efficient. It also constructs a "discrete check, continuous execution" safe action mask that embeds geometric safety constraints into policy optimization. Combined with state-space decoupling and a multi-objective reward design, the method guides the learned policy to balance safety, efficiency, and trajectory smoothness. Simulation results show that, compared with the traditional VO method, VO-PPO generates smoother obstacle-avoidance paths that better match AUV motion characteristics. Compared with a baseline PPO algorithm, it raises the obstacle-avoidance success rate by 53 percentage points, accelerates training convergence by 67.5%, and increases the accumulated reward by 56.7%, effectively mitigating high collision risk and slow convergence.
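The abstract does not give the exact geometry of the improved VO test, so the following is only a minimal sketch of how a safety margin and a time window could be combined into a mask over discretized candidate headings. It assumes a 2D plane, circular obstacles moving at constant velocity, and a fixed AUV speed; the function names and the obstacle tuple layout are hypothetical. The defaults `tau=6.0` and `delta_safe=1.5` are borrowed from Tables 3 and 1. How the continuous action is then executed inside the safe set is not specified in this excerpt and is not modeled here.

```python
import math

def collides_within_window(rel_pos, rel_vel, radius, tau):
    """True if the relative trajectory p + v*t enters `radius` for some t in [0, tau]."""
    px, py = rel_pos
    vx, vy = rel_vel
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        t_star = 0.0  # no relative motion: distance never changes
    else:
        # Time of closest approach, clamped to the time window [0, tau].
        t_star = min(max(-(px * vx + py * vy) / speed_sq, 0.0), tau)
    return math.hypot(px + vx * t_star, py + vy * t_star) < radius

def safe_heading_mask(auv_pos, speed, obstacles, headings, tau=6.0, delta_safe=1.5):
    """Discrete check (assumed form): one boolean per candidate heading.

    obstacles: iterable of (x, y, vx, vy, radius) tuples (hypothetical layout).
    """
    mask = []
    for psi in headings:
        v_auv = (speed * math.cos(psi), speed * math.sin(psi))
        safe = True
        for ox, oy, ovx, ovy, r in obstacles:
            rel_pos = (ox - auv_pos[0], oy - auv_pos[1])
            rel_vel = (ovx - v_auv[0], ovy - v_auv[1])
            # The obstacle radius is inflated by the safety margin.
            if collides_within_window(rel_pos, rel_vel, r + delta_safe, tau):
                safe = False
                break
        mask.append(safe)
    return mask
```

With a static obstacle of radius 1 m placed 10 m dead ahead and an AUV speed of 2 m/s, the heading pointing at the obstacle is masked out under the 6 s window but allowed under a 3 s window, which illustrates how a shorter window ignores conflicts that are still far away in time.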
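Table 1. Main parameter settings of the algorithm

| No. | Parameter | Symbol | Value |
|---|---|---|---|
| 1 | Discount factor | $\gamma$ | 0.99 |
| 2 | GAE coefficient | $\lambda$ | 0.95 |
| 3 | Clipping parameter | $\varepsilon$ | 0.2 |
| 4 | Actor learning rate | - | $1\times10^{-4}$ |
| 5 | Critic learning rate | - | $1\times10^{-4}$ |
| 6 | Value function weight | $c_1$ | 1.0 |
| 7 | Entropy coefficient | $c_2$ | 0.01 |
| 8 | Maximum steps per episode | - | 1000 |
| 9 | Maximum training steps | - | $15\times10^{5}$ |
| 10 | Safety margin/m | $\delta_{\mathrm{safe}}$ | 1.5 |
| 11 | Buffer margin/m | $\delta_{\mathrm{buffer}}$ | 4 |
| 12 | Batch size | - | 8 |
| 13 | Step size | $\Delta t$ | 0.1 |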
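To show where the hyperparameters in Table 1 enter a PPO update, here is a minimal pure-Python sketch of generalized advantage estimation and the standard clipped PPO loss. It follows the textbook PPO formulation, not necessarily the exact loss used in this paper, and the function names are hypothetical. The defaults take $\gamma$, $\lambda$, $\varepsilon$, $c_1$ and $c_2$ from Table 1.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation.

    `values` must hold len(rewards) + 1 entries; the last one is the bootstrap value.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

def clipped_surrogate(ratio, advantage, eps=0.2):
    """Per-sample clipped objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)

def ppo_loss(ratios, advantages, value_errors, entropy, eps=0.2, c1=1.0, c2=0.01):
    """Scalar loss to minimize: -L_clip + c1 * value loss - c2 * entropy bonus."""
    n = len(ratios)
    policy_term = sum(clipped_surrogate(r, a, eps) for r, a in zip(ratios, advantages)) / n
    value_term = sum(e * e for e in value_errors) / n
    return -policy_term + c1 * value_term - c2 * entropy
```

In this sketch a probability ratio of 1.5 with a positive advantage is clipped to 1.2, so a single update cannot push the policy more than $\varepsilon = 0.2$ beyond the behavior policy in the direction that increases the objective.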
Table 2. Performance comparison with different numbers of nearest obstacles

| n | Success rate/% | Average path length/m | Average mission duration/s |
|---|---|---|---|
| 1 | 84 | 113.4 | 63 |
| 2 | 85 | 114.1 | 64 |
| 3 | 88 | 116.6 | 65 |
| 4 | 82 | 120.2 | 68 |
| 5 | 77 | 121.6 | 71 |
Table 3. Performance comparison under different time-window parameters

| $\tau$/s | Success rate/% | Average path length/m | Average mission duration/s |
|---|---|---|---|
| 4 | 73 | 111.4 | 62 |
| 5 | 78 | 115.9 | 64 |
| 6 | 88 | 116.6 | 65 |
| 7 | 88 | 120.2 | 69 |
| 8 | 86 | 122.6 | 71 |
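Table 4. Performance comparison under different obstacle density scenarios

| Scenario | Success rate/% | Average path length/m | Average mission duration/s |
|---|---|---|---|
| 1 | 94 | 106.7 | 61 |
| 2 | 93 | 108.4 | 62 |
| 3 | 88 | 116.6 | 65 |
| 4 | 82 | 121.4 | 68 |
| 5 | 56 | 136.6 | 75 |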
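Table 5. Performance comparison of different algorithms

| Algorithm | Success rate/% | Average path length/m | Average mission duration/s |
|---|---|---|---|
| VO-PPO | 88 | 116.6 | 65 |
| SAC | 41 | 118.6 | 71 |
| VO | 84 | 124.1 | 63 |
| PPO | 35 | 121.4 | 77 |
| PPO + state decoupling | 37 | 122.6 | 75 |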
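The success-rate improvement of 53 quoted in the abstract appears to correspond to the gap, in percentage points, between the VO-PPO and PPO rows of Table 5. The short sketch below reproduces that arithmetic from the table values; the dictionary layout and helper names are hypothetical and serve only to make the comparison explicit.

```python
# Rows of Table 5: (success rate/%, average path length/m, average mission duration/s)
RESULTS = {
    "VO-PPO": (88, 116.6, 65),
    "SAC": (41, 118.6, 71),
    "VO": (84, 124.1, 63),
    "PPO": (35, 121.4, 77),
    "PPO + state decoupling": (37, 122.6, 75),
}

def success_gain_points(method, baseline):
    """Difference in success rate, in percentage points."""
    return RESULTS[method][0] - RESULTS[baseline][0]

def path_length_change_pct(method, baseline):
    """Relative change in average path length, in percent (negative means shorter)."""
    base = RESULTS[baseline][1]
    return 100.0 * (RESULTS[method][1] - base) / base

for name in RESULTS:
    if name != "VO-PPO":
        print(name,
              success_gain_points("VO-PPO", name),
              round(path_length_change_pct("VO-PPO", name), 1))
```

Against the plain VO method the success-rate gap is much smaller (4 points), while the average path is shorter, which is consistent with the abstract's emphasis on path quality over VO and on safety over PPO.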