Dynamic Obstacle Avoidance for Autonomous Undersea Vehicles via VO-PPO

Zhang Tao; Zeng Xiangguang; Li Min; Xie Dijie; Ren Wenzhe; Peng Bei

doi:10.11993/j.issn.2096-3920.2025-0154

Zhang Tao¹, Zeng Xiangguang¹, Li Min¹, Xie Dijie¹, Ren Wenzhe¹, Peng Bei²

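1. School of Mechanical Engineering, Southwest Jiaotong University, Chengdu 610031, China
2. School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China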

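Funding: National Natural Science Foundation of China (52075456); Key Research and Development Program of the Sichuan Provincial Department of Science and Technology (2023YFG0285).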

About the author:
Zhang Tao (2000-), male, master's student; research interests: reinforcement learning and intelligent control.

CLC number: TJ630; U663
Publication history
- Received: 2025-11-11
- Revised: 2025-12-12
- Accepted: 2025-12-24
- Published online: 2026-03-16

Abstract: Efficient and safe dynamic obstacle avoidance is crucial for autonomous undersea vehicles (AUVs) performing military missions. To address the high collision risk and slow convergence of conventional reinforcement learning-based approaches in AUV obstacle-avoidance training, this paper proposes a dynamic obstacle-avoidance algorithm for AUVs, termed VO-PPO, which integrates an improved velocity obstacle (VO) method with proximal policy optimization (PPO). Within the traditional VO framework, the algorithm introduces a safety margin and a time-window mechanism to improve the safety and efficiency of obstacle-avoidance decisions. In addition, a “discrete-check, continuous-execution” safe action mask embeds geometric safety constraints into the policy optimization process. Combined with state-space decoupling and a multi-objective reward design, the method guides the learned policy to balance safety, efficiency, and trajectory smoothness. Simulation results show that, compared with the traditional VO method, VO-PPO generates smoother obstacle-avoidance paths that better match the motion characteristics of AUVs. Compared with a baseline PPO algorithm, it raises the obstacle-avoidance success rate by 53 percentage points (from 35% to 88%), accelerates training convergence by 67.5%, and increases the accumulated reward by 56.7%, effectively mitigating the problems of high collision risk and slow convergence.
Keywords:
- autonomous undersea vehicle
- dynamic obstacle avoidance
- proximal policy optimization
- velocity obstacle method
- action mask


Figure 1. Schematic diagram of the AUV coordinate system

Figure 2. Schematic diagram of the sonar model

Figure 3. Schematic diagram of the velocity obstacle cone construction

Figure 4. Construction of the velocity obstacle cone in different scenarios

Figure 5. Schematic diagram of the improved velocity obstacle cone
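The abstract states that the improved VO adds a safety margin and a time-window mechanism to the classic velocity obstacle cone (Figures 3 to 5), but the exact construction is not reproduced in this section. The Python sketch below is one minimal reading under stated assumptions: planar motion, a circular AUV and obstacle, the combined radius inflated by $ \delta_{\mathrm{safe}} $ (1.5 m in Table 1), and the cone truncated so that conflicts predicted later than $ \tau $ seconds are ignored ($ \tau $ = 6 s gave the best success rate in Table 3). The function name `in_improved_vo` and all radii are hypothetical.

```python
import math


def in_improved_vo(p_a, v_a, r_a, p_b, v_b, r_b, delta_safe=1.5, tau=6.0):
    """Return True if candidate velocity v_a leads to a predicted conflict
    with obstacle B within the time window tau.

    Illustrative assumptions, not the paper's exact formulation:
    - planar motion, both bodies modeled as circles;
    - the combined radius is inflated by the safety margin delta_safe;
    - the cone is truncated: conflicts later than tau seconds are ignored.
    """
    # Position of B relative to A, and velocity of A relative to B
    dx, dy = p_b[0] - p_a[0], p_b[1] - p_a[1]
    vx, vy = v_a[0] - v_b[0], v_a[1] - v_b[1]
    R = r_a + r_b + delta_safe  # inflated collision radius

    dist_sq = dx * dx + dy * dy
    if dist_sq <= R * R:
        return True  # already inside the inflated safety disk

    # Earliest t >= 0 with |d - v_rel * t| = R:
    # (v.v) t^2 - 2 (d.v) t + (d.d - R^2) = 0
    a = vx * vx + vy * vy
    b = -2.0 * (dx * vx + dy * vy)
    c = dist_sq - R * R
    if a == 0.0:
        return False  # no relative motion and not currently overlapping
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return False  # the relative ray misses the inflated disk
    t_hit = (-b - math.sqrt(disc)) / (2.0 * a)
    # With c > 0 both roots share a sign; t_hit < 0 means moving apart
    return 0.0 <= t_hit <= tau
```

Under these assumptions, a head-on obstacle 20 m away closing at 2 m/s with an inflated radius of 2.5 m would be reached in 8.75 s, so it is ignored with a 6 s window but flagged with a 10 s one. This illustrates the trade-off that a shorter $ \tau $ reacts later, which is consistent with the shorter paths and lower success rates at $ \tau $ = 4 to 5 s in Table 3.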

Figure 6. Framework diagram of the VO-PPO algorithm

Figure 7. Schematic diagram of the state space decoupling process

Figure 8. Application of safety masks in the actor network
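Figure 8 places the safety mask in the actor network, and the abstract describes it as “discrete-check, continuous-execution”; the internals are not given in this section. The sketch below is one possible interpretation, not the authors' implementation: candidate headings are discretized into `n_bins` bins and each is checked with a simplified truncated-VO test; if the actor's continuous heading proposal is itself safe it is executed unchanged, otherwise the nearest safe candidate heading is executed. The constant speed, `n_bins = 36`, the radii, the helper names, and the fallback used when no bin is safe are all assumptions.

```python
import math


def conflicts(p_a, v_a, obstacles, r_a=0.5, delta_safe=1.5, tau=6.0):
    """Simplified truncated-VO test against (position, velocity, radius) tuples."""
    for p_b, v_b, r_b in obstacles:
        dx, dy = p_b[0] - p_a[0], p_b[1] - p_a[1]
        vx, vy = v_a[0] - v_b[0], v_a[1] - v_b[1]
        R = r_a + r_b + delta_safe
        c = dx * dx + dy * dy - R * R
        if c <= 0.0:
            return True  # already inside the inflated disk
        a = vx * vx + vy * vy
        b = -2.0 * (dx * vx + dy * vy)
        disc = b * b - 4.0 * a * c
        if a > 0.0 and disc >= 0.0:
            t_hit = (-b - math.sqrt(disc)) / (2.0 * a)
            if 0.0 <= t_hit <= tau:
                return True  # conflict inside the time window
    return False


def angle_diff(a, b):
    """Signed difference a - b wrapped to [-pi, pi)."""
    return (a - b + math.pi) % (2.0 * math.pi) - math.pi


def safe_action(p_a, speed, heading_proposal, obstacles, n_bins=36):
    """Discrete check, continuous execution (hypothetical reading).

    Returns (executed_heading, mask), where mask[k] is True for safe bins.
    """
    # Discrete check: mark every candidate heading as safe or unsafe
    candidates = [2.0 * math.pi * k / n_bins for k in range(n_bins)]
    mask = [not conflicts(p_a, (speed * math.cos(h), speed * math.sin(h)), obstacles)
            for h in candidates]
    # Continuous execution: keep the actor's own heading if it is safe
    v_prop = (speed * math.cos(heading_proposal), speed * math.sin(heading_proposal))
    if not conflicts(p_a, v_prop, obstacles):
        return heading_proposal, mask
    safe = [h for h, ok in zip(candidates, mask) if ok]
    if not safe:
        return heading_proposal, mask  # hypothetical fallback; a real system needs one
    # Otherwise execute the safe candidate closest in angle to the proposal
    best = min(safe, key=lambda h: abs(angle_diff(h, heading_proposal)))
    return best, mask
```

A variant closer to Figure 8 would apply the same boolean `mask` to the actor's output distribution before sampling, so that forbidden actions are never sampled during training; the projection above instead leaves the actor untouched and filters only at execution time.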

Figure 9. AUV underwater obstacle avoidance environment

Figure 10. Reward curves for different numbers of nearest obstacles

Figure 11. Reward curves under different time-window parameters

Figure 12. Comparison of AUV paths under different algorithms

Figure 13. Reward comparison of the algorithms

Table 1. Main parameter settings of the algorithm

No.	Parameter	Symbol	Value
1	Discount factor	$ \gamma $	0.99
2	GAE coefficient	$ \lambda $	0.95
3	Clipping parameter	$ \varepsilon $	0.2
4	Actor learning rate	—	1×10⁻⁴
5	Critic learning rate	—	1×10⁻⁴
6	Value-function weight	$ c_1 $	1.0
7	Entropy coefficient	$ c_2 $	0.01
8	Maximum steps per episode	—	1000
9	Maximum training steps	—	1.5×10⁶
10	Safety margin/m	$ \delta_{\mathrm{safe}} $	1.5
11	Buffer margin/m	$ \delta_{\mathrm{buffer}} $	4
12	Batch size	—	8
13	Time step	$ \Delta t $	0.1

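Table 1 lists the usual PPO hyperparameters: the discount factor $ \gamma $, the GAE coefficient $ \lambda $, the clipping parameter $ \varepsilon $, and the weights $ c_1 $ and $ c_2 $. Assuming the paper uses the standard clipped-surrogate objective with generalized advantage estimation (GAE), which this section does not show explicitly, the following NumPy sketch illustrates how those five values enter the computation. The function names are hypothetical, and the networks, optimizers, and separate actor/critic learning rates are omitted.

```python
import numpy as np

GAMMA, LAM, CLIP_EPS, C1, C2 = 0.99, 0.95, 0.2, 1.0, 0.01  # values from Table 1


def gae(rewards, values, dones, last_value, gamma=GAMMA, lam=LAM):
    """Generalized advantage estimation over one rollout segment.

    dones[t] = 1 means the episode ended after step t, so no bootstrapping
    from the next state's value is applied at that step.
    """
    T = len(rewards)
    adv = np.zeros(T)
    next_value, running = last_value, 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]  # TD error
        running = delta + gamma * lam * nonterminal * running
        adv[t] = running
        next_value = values[t]
    returns = adv + np.asarray(values, dtype=float)  # critic regression targets
    return adv, returns


def ppo_loss(logp_new, logp_old, adv, v_pred, returns, entropy,
             eps=CLIP_EPS, c1=C1, c2=C2):
    """Loss = -E[min(r*A, clip(r, 1-eps, 1+eps)*A)] + c1*E[(V-R)^2] - c2*E[H]."""
    logp_new, logp_old, adv = map(np.asarray, (logp_new, logp_old, adv))
    v_pred, returns, entropy = map(np.asarray, (v_pred, returns, entropy))
    ratio = np.exp(logp_new - logp_old)                  # pi_new / pi_old
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    surrogate = np.minimum(ratio * adv, clipped * adv)   # pessimistic bound
    value_loss = np.mean((v_pred - returns) ** 2)
    return -np.mean(surrogate) + c1 * value_loss - c2 * np.mean(entropy)
```

With $ \varepsilon $ = 0.2, a sample whose probability ratio has grown to 2 with a positive advantage contributes only 1.2 times its advantage, which limits how far a single update can move the policy.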

Table 2. Performance comparison for different numbers of nearest obstacles

n	Success rate/%	Mean path length/m	Mean task duration/s
1	84	113.4	63
2	85	114.1	64
3	88	116.6	65
4	82	120.2	68
5	77	121.6	71

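Table 2 suggests that the policy observes a fixed number n of nearest obstacles, with n = 3 giving the highest success rate (88%) while both mean path length and task time grow with n. The observation layout is not stated in this section, so the sketch below is a hypothetical construction: obstacles are sorted by distance, the n nearest are kept, and empty slots are zero-padded so the network input always has 4n entries. The per-obstacle features (relative position and relative velocity) and the function name are assumptions.

```python
import math


def nearest_obstacle_features(p_a, v_a, obstacles, n=3):
    """Flat, fixed-length feature list for the n nearest obstacles.

    obstacles: list of ((px, py), (vx, vy)) tuples.
    Hypothetical per-obstacle layout: (dx, dy, dvx, dvy) relative to the AUV,
    sorted nearest first; missing slots are zero-padded.
    """
    rel = []
    for (px, py), (vx, vy) in obstacles:
        dx, dy = px - p_a[0], py - p_a[1]
        rel.append((math.hypot(dx, dy), [dx, dy, vx - v_a[0], vy - v_a[1]]))
    rel.sort(key=lambda item: item[0])  # nearest obstacle first
    feats = []
    for _, f in rel[:n]:
        feats.extend(f)
    feats.extend([0.0] * (4 * n - len(feats)))  # pad to a constant length
    return feats
```

A fixed-length vector like this lets the same network handle scenes with any number of obstacles; any obstacle beyond the n nearest is simply invisible to the policy.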

Table 3. Performance comparison under different time-window parameters

$ \tau $/s	Success rate/%	Mean path length/m	Mean task duration/s
4	73	111.4	62
5	78	115.9	64
6	88	116.6	65
7	88	120.2	69
8	86	122.6	71


Table 4. Performance comparison under different obstacle-density scenarios

Scenario	Success rate/%	Mean path length/m	Mean task duration/s
1	94	106.7	61
2	93	108.4	62
3	88	116.6	65
4	82	121.4	68
5	56	136.6	75


Table 5. Performance comparison of different algorithms

Algorithm	Success rate/%	Mean path length/m	Mean task duration/s
VO-PPO	88	116.6	65
SAC	41	118.6	71
VO	84	124.1	63
PPO	35	121.4	77
PPO + state decoupling	37	122.6	75

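The 53-point success-rate gain over baseline PPO quoted in the abstract is a difference in percentage points (88% versus 35% in Table 5); expressed as a relative increase it would be roughly 151%. The short Python snippet below recomputes both readings, plus the relative changes in mean path length and task time, directly from Table 5. The helper name `compare` is illustrative.

```python
# Table 5: (success rate %, mean path length m, mean task duration s)
table5 = {
    "VO-PPO": (88, 116.6, 65),
    "SAC": (41, 118.6, 71),
    "VO": (84, 124.1, 63),
    "PPO": (35, 121.4, 77),
    "PPO + state decoupling": (37, 122.6, 75),
}


def compare(a, b):
    """Differences of algorithm a relative to algorithm b."""
    sa, la, ta = table5[a]
    sb, lb, tb = table5[b]
    return {
        "success_pp": sa - sb,                       # percentage-point gain
        "success_rel_pct": 100.0 * (sa - sb) / sb,   # relative gain in %
        "path_change_pct": 100.0 * (la - lb) / lb,   # negative = shorter path
        "time_change_pct": 100.0 * (ta - tb) / tb,   # negative = faster
    }


for baseline in ("PPO", "VO"):
    print(baseline, {k: round(v, 2) for k, v in compare("VO-PPO", baseline).items()})
```

For example, against classic VO the mean path of VO-PPO is about 6.0% shorter, while its mean task duration is 2 s (about 3.2%) longer.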
