

Dynamic Obstacle Avoidance for Autonomous Underwater Vehicles via VO-PPO

ZHANG Tao, ZENG Xiangguang, LI Min, XIE Dijie, REN Wenzhe, PENG Bei

Citation: ZHANG Tao, ZENG Xiangguang, LI Min, XIE Dijie, REN Wenzhe, PENG Bei. Dynamic Obstacle Avoidance for Autonomous Underwater Vehicles via VO-PPO[J]. Journal of Unmanned Undersea Systems, 2026, 34(2): 1-13. doi: 10.11993/j.issn.2096-3920.2025-0154

Funding: National Natural Science Foundation of China (52075456); Key R&D Program of the Sichuan Provincial Department of Science and Technology (2023YFG0285).
Author biography: ZHANG Tao (b. 2000), male, master's degree; main research interests: reinforcement learning and intelligent control.

  • CLC number: U663; TP18

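  • Abstract: When autonomous underwater vehicles (AUVs) carry out military missions, efficient and safe dynamic obstacle avoidance is essential. To address the high collision risk and slow convergence of conventional reinforcement learning methods in AUV obstacle-avoidance training, this paper proposes VO-PPO, an AUV dynamic obstacle avoidance algorithm that combines an improved velocity obstacle (VO) method with proximal policy optimization (PPO). The algorithm adds a safety margin and a time-window mechanism to the conventional VO framework, which improves the safety and efficiency of avoidance decisions. It also builds a "discrete check, continuous execution" safe-action mask that embeds geometric safety constraints into policy optimization, and combines this with state-space decoupling and a multi-objective reward design so that the policy balances safety, efficiency, and trajectory smoothness. Simulation results show that, compared with the conventional VO method, VO-PPO generates smoother avoidance paths that better match AUV motion characteristics. Compared with the baseline PPO algorithm, it raises the avoidance success rate by 53 percentage points (from 35% to 88% in Table 5), speeds up training convergence by 67.5%, and increases the cumulative reward by 56.7%, which effectively mitigates the high collision risk and slow convergence.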
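
The time-windowed velocity obstacle test and the "discrete check, continuous execution" mask described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the 16-bin heading discretization, the fixed-speed 2D model, the obstacle tuple layout, and the snap-to-nearest-safe-bin rule are all assumptions made for illustration. The default safety margin (1.5 m) is borrowed from Table 1, and the 6 s window is one of the settings compared in Table 3.

```python
import math

def collides_within(p_rel, v_rel, radius, tau):
    """Return True if the relative motion comes within `radius` during [0, tau].

    p_rel: obstacle position minus AUV position (m).
    v_rel: AUV velocity minus obstacle velocity (m/s).
    """
    px, py = p_rel
    vx, vy = v_rel
    vv = vx * vx + vy * vy
    # Time of closest approach, clamped to the time window [0, tau].
    t = 0.0 if vv == 0.0 else min(max((px * vx + py * vy) / vv, 0.0), tau)
    dx, dy = px - vx * t, py - vy * t
    return dx * dx + dy * dy <= radius * radius

def safe_heading_mask(pos, speed, obstacles, n_bins=16, margin=1.5, tau=6.0):
    """Discrete check: one boolean per candidate heading (True = safe).

    obstacles: iterable of (x, y, vx, vy, radius) tuples (hypothetical layout).
    """
    mask = []
    for k in range(n_bins):
        psi = 2.0 * math.pi * k / n_bins
        v = (speed * math.cos(psi), speed * math.sin(psi))
        safe = all(
            not collides_within((ox - pos[0], oy - pos[1]),
                                (v[0] - ovx, v[1] - ovy),
                                r + margin, tau)
            for ox, oy, ovx, ovy, r in obstacles
        )
        mask.append(safe)
    return mask

def execute_heading(psi, mask):
    """Continuous execution: keep psi if its bin is safe, otherwise snap to the
    nearest safe bin centre. Returns None when every bin is masked out."""
    n = len(mask)
    width = 2.0 * math.pi / n
    k = round(psi / width) % n
    if mask[k]:
        return psi
    for d in range(1, n // 2 + 1):
        for j in ((k + d) % n, (k - d) % n):
            if mask[j]:
                return j * width
    return None

# One static obstacle of radius 1 m, 10 m dead ahead of an AUV moving at 2 m/s.
mask = safe_heading_mask((0.0, 0.0), 2.0, [(10.0, 0.0, 0.0, 0.0, 1.0)])
```

In this example only the straight-ahead bin is masked, so a policy output of heading 0 would be redirected to a neighbouring bin, while any other heading passes through unchanged. Checking only bin centres is a coarse approximation; a finer grid or a more conservative margin would be needed before relying on such a mask.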

     

  • Figure 1.  Schematic diagram of the AUV coordinate system

    Figure 2.  Schematic diagram of the sonar model

    Figure 3.  Construction of the velocity obstacle cone

    Figure 4.  Construction of the velocity obstacle cone in different situations

    Figure 5.  Schematic diagram of the improved VO

    Figure 6.  Framework of the VO-PPO algorithm

    Figure 7.  Decoupling framework

    Figure 8.  Application of the safety mask in the Actor network

    Figure 9.  AUV underwater obstacle avoidance environment

    Figure 10.  Reward curves for different numbers of nearest obstacles

    Figure 11.  Reward curves under different time windows $ \tau $

    Figure 12.  AUV paths under different algorithms

    Figure 13.  Comparison of reward curves across algorithms

    Table 1.  Main algorithm parameter settings

    No.  Parameter                    Symbol                             Value
    1    Discount factor              [garbled]                          0.99
    2    GAE coefficient              [garbled]                          0.95
    3    Clipping parameter           [garbled]                          0.2
    4    Actor learning rate          /                                  1×10⁻⁴
    5    Critic learning rate         /                                  1×10⁻⁴
    6    Value function weight        [garbled]                          1.0
    7    Entropy coefficient          [garbled]                          0.01
    8    Maximum steps per episode    /                                  1 000
    9    Maximum training steps       /                                  15×10⁵
    10   Safety margin / m            [garbled]                          1.5
    11   Buffer margin / m            $ {\delta }_{\mathrm{buffer}} $    4
    12   Batch size                   /                                  8
    13   Step size                    [garbled]                          0.1
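
To show where the first three values in Table 1 enter a standard PPO update, here is a minimal sketch of generalized advantage estimation (GAE) and the clipped surrogate term. It follows the textbook PPO formulation rather than the authors' code, and the names `gamma`, `lam`, and `eps` are conventional choices, not the paper's own symbols.

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one trajectory.

    values must hold len(rewards) + 1 entries; the last one is the
    bootstrap value of the state after the final reward.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO clipped objective for one sample (to be maximized).

    ratio is pi_new(a|s) / pi_old(a|s).
    """
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)

adv = gae([1.0, 1.0], [0.0, 0.0, 0.0])
```

With a clipping parameter of 0.2, a probability ratio of 1.5 on a positive advantage contributes as if the ratio were 1.2, which limits how far a single update can move the policy.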

    Table 2.  Results for different numbers of nearest obstacles

    n    Success rate / %    Mean path length / m    Mean mission time / s
    1    84                  113.4                   63
    2    85                  114.1                   64
    3    88                  116.6                   65
    4    82                  120.2                   68
    5    77                  121.6                   71

    Table 3.  Results under different time windows

    $ \tau $ / s    Success rate / %    Mean path length / m    Mean mission time / s
    4               73                  111.4                   62
    5               78                  115.9                   64
    6               88                  116.6                   65
    7               88                  120.2                   69
    8               86                  122.6                   71

    Table 4.  Algorithm performance in different scenarios

    Scenario    Success rate / %    Mean path length / m    Mean mission time / s
    1           94                  106.7                   61
    2           93                  108.4                   62
    3           88                  116.6                   65
    4           82                  121.4                   68
    5           56                  136.6                   75

    Table 5.  Comparison of algorithm results

    Algorithm        Success rate / %    Mean path length / m    Mean mission time / s
    VO-PPO           88                  116.6                   65
    SAC              41                  118.6                   71
    VO               84                  124.1                   63
    PPO              35                  121.4                   77
    PPO-Decoupled    37                  122.6                   75
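
As a quick arithmetic check on Table 5, the short sketch below recomputes the gap between VO-PPO and the other methods. The dictionary layout and helper names are illustrative assumptions; the numbers are copied from the table. The 53-point difference in success rate between VO-PPO and PPO equals the figure quoted in the abstract.

```python
# (success rate %, mean path length m, mean mission time s), copied from Table 5.
table5 = {
    "VO-PPO": (88, 116.6, 65),
    "SAC": (41, 118.6, 71),
    "VO": (84, 124.1, 63),
    "PPO": (35, 121.4, 77),
    "PPO-Decoupled": (37, 122.6, 75),
}

def success_gain_points(a, b):
    """Success-rate difference between algorithms a and b, in percentage points."""
    return table5[a][0] - table5[b][0]

def relative_change(a, b, column):
    """Relative change of a versus b in the given column (0, 1, or 2)."""
    return (table5[a][column] - table5[b][column]) / table5[b][column]

gain = success_gain_points("VO-PPO", "PPO")      # difference in percentage points
path_vs_vo = relative_change("VO-PPO", "VO", 1)  # negative means a shorter path
```

Here `gain` is a difference in percentage points, while `path_vs_vo` is a relative change, so the two should not be read on the same scale.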
Publication history
  • Received:  2025-11-11
  • Revised:  2025-12-12
  • Accepted:  2025-12-24
  • Published online:  2026-03-16