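Reinforcement-Learning Control for the High-Speed AUV Based on the Neural-Network State Estimator

GUO Ke-jian, LIN Xiao-bo, HAO Cheng-peng, HOU Chao-huan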

Citation: GUO Ke-jian, LIN Xiao-bo, HAO Cheng-peng, HOU Chao-huan. Reinforcement-Learning Control for the High-Speed AUV Based on the Neural-Network State Estimator[J]. Journal of Unmanned Undersea Systems, 2022, 30(2): 147-156. doi: 10.11993/j.issn.2096-3920.2022.02.002

doi: 10.11993/j.issn.2096-3920.2022.02.002
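Funding: Supported by the National Natural Science Foundation of China (61971412).

    About the author:

    GUO Ke-jian (1997-), male, master's degree; main research interest: control of high-speed underwater vehicles

  • CLC number: U674.941; U661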

Reinforcement-Learning Control for the High-Speed AUV Based on the Neural-Network State Estimator

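  • Abstract: With the growing scale of ocean research and development, the high-speed autonomous underwater vehicle (AUV) has attracted wide attention as an important unmanned underwater platform. However, its model is multi-input multi-output, strongly coupled, underactuated, and strongly nonlinear, so traditional control methods that rely on an accurate model are often limited in practice. To address this problem, this paper proposes a reinforcement-learning position and attitude controller that does not rely on an accurate model. Through the cooperation of an attitude loop and a position loop, the controller not only stabilizes the attitude of the high-speed AUV quickly but also completes the dive to a specified depth faster. In addition, to reduce the cost of acquiring data for training the reinforcement-learning controller, an improved state estimator for the high-speed AUV is proposed based on neural networks. Given the AUV's current state and the applied control input, the estimator predicts the state at the next time step, thereby providing a large amount of training data for the reinforcement-learning control method. Simulation results show that the estimator achieves high estimation accuracy and that the reinforcement-learning controller trained on the neural-network state estimator controls the original AUV smoothly and quickly, which verifies the feasibility and effectiveness of the proposed method.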
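
    The estimator's role described in the abstract (current state plus control input in, next-step state out, used to generate training data) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the state and control dimensions, the single small network with random placeholder weights, the choice to predict a state increment, and the names `OneStepEstimator`, `rollout`, and `make_random_policy`. The figure captions indicate that the paper's estimator uses separate attitude and velocity networks, which this sketch does not reproduce.

```python
import numpy as np

STATE_DIM = 9    # assumed: 3 attitude angles, 3 angular rates, 3 velocities
CONTROL_DIM = 3  # assumed: three control-surface deflections


class OneStepEstimator:
    """Hypothetical one-step model: (state, control) -> next state.

    The weights are random placeholders; a real estimator would be
    trained on recorded AUV transitions.
    """

    def __init__(self, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (STATE_DIM + CONTROL_DIM, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, STATE_DIM))
        self.b2 = np.zeros(STATE_DIM)

    def predict(self, state, control):
        x = np.concatenate([state, control])
        h = np.tanh(x @ self.w1 + self.b1)
        # Assumed design: predict the state increment, then add it
        # to the current state.
        return state + h @ self.w2 + self.b2


def make_random_policy(seed=1, limit=0.2):
    """Stand-in for the controller being trained: uniform random controls."""
    rng = np.random.default_rng(seed)
    return lambda state: rng.uniform(-limit, limit, CONTROL_DIM)


def rollout(estimator, policy, initial_state, steps):
    """Collect (state, control, next_state) tuples from the learned model."""
    transitions = []
    state = np.asarray(initial_state, dtype=float)
    for _ in range(steps):
        control = policy(state)
        next_state = estimator.predict(state, control)
        transitions.append((state, control, next_state))
        state = next_state
    return transitions


estimator = OneStepEstimator()
data = rollout(estimator, make_random_policy(), np.zeros(STATE_DIM), steps=50)
print(len(data), data[0][2].shape)
```

    In this arrangement the reinforcement-learning controller would take the place of the random policy, so that training transitions come from the learned model rather than from the vehicle or a full physical simulation.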

     

  • Figure 1. Motion coordinate system of the AUV

    Figure 2. The attitude network of the estimator

    Figure 3. The velocity network of the estimator

    Figure 4. Structure of the double-loop controller with reinforcement learning

    Figure 5. The attitude control curves of the original model and the estimator

    Figure 6. The angular speed control curves of the original model and the estimator

    Figure 7. The velocity control curves of the original model and the estimator

    Figure 8. The reward values during the training process of the controller

    Figure 9. The velocity curves of the AUV during operation

    Figure 10. The attitude curves of the AUV controlled by the double-loop complementary controller

    Figure 11. The depth curves of the AUV controlled by the double-loop complementary controller and the PID controller

    Table 1. Parameters of the reference model of the high-speed AUV

    | Parameter | Value | Parameter | Value |
    | --- | --- | --- | --- |
    | $G/({\rm{N}}\cdot {\rm{kg}}^{-1})$ | 9.8 | $C_y^{{\delta _e}}$ | 0.51 |
    | $B/{\rm{N}}$ | 14 671 | $C_y^\alpha $ | 2.32 |
    | $m/{\rm{kg}}$ | 1 840 | $C_y^{{{\bar \omega }_z}}$ | 1.17 |
    | $\rho /({\rm{kg}} \cdot {{\rm{m}}^{ - 3}})$ | 1 019.2 | $m_y^\beta $ | 0.69 |
    | $S/{{\rm{m}}^2}$ | 0.224 | $m_y^{{\delta _r}}$ | −0.11 |
    | $L/{\rm{m}}$ | 7.738 | $m_y^{{{\bar \omega }_x}}$ | 0 |
    | $T /{\rm{N}}$ | 10 672 | $m_y^{{{\bar \omega }_y}}$ | −0.61 |
    | ${C_{xS}}$ | 0.141 | $m_z^{{\delta _r}}$ | −0.20 |
    | $m_x^\beta $ | 0.001 52 | $m_z^\beta $ | −2.32 |
    | $m_x^{{\delta _r}}$ | −0.000 32 | $m_z^{{{\bar \omega }_y}}$ | −1.17 |
    | $m_x^{{\delta _d}}$ | −0.081 2 | $m_z^\alpha $ | 0.69 |
    | $m_x^{{{\bar \omega }_x}}$ | −0.004 4 | $m_z^{{\delta _e}}$ | −0.28 |
    | $m_x^{{{\bar \omega }_y}}$ | 0.000 8 | $m_z^{{{\bar \omega }_z}}$ | −0.61 |
    | $\Delta {M_{xp}}$ | 0 | | |
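
    As a quick plausibility check on the magnitudes in Table 1, the short sketch below compares the vehicle's weight with the listed $B$ and estimates the speed at which the listed $T$ would balance a quadratic drag term. The numbers are copied from the table, but the interpretations are assumptions that the page does not confirm: $B$ as buoyancy, $T$ as thrust, $S$ as the reference area, ${C_{xS}}$ as the drag coefficient referred to $S$, and the standard drag form $\frac{1}{2}\rho v^2 S C_{xS}$ in straight, level motion.

```python
import math

# Values copied from Table 1; the physical interpretations are assumptions.
g = 9.8        # listed as G, N/kg
m = 1840.0     # mass, kg
B = 14671.0    # assumed: buoyancy, N
T = 10672.0    # assumed: thrust, N
rho = 1019.2   # water density, kg/m^3
S = 0.224      # assumed: reference area, m^2
C_xS = 0.141   # assumed: drag coefficient referred to S

weight = m * g                 # gravitational force on the vehicle, N
net_downward = weight - B      # positive means heavier than the displaced water

# Assumed balance in straight, level motion: T = 0.5 * rho * v**2 * S * C_xS
v_balance = math.sqrt(2.0 * T / (rho * S * C_xS))

print(f"weight = {weight:.0f} N, weight - B = {net_downward:.0f} N")
print(f"thrust-drag balance speed = {v_balance:.2f} m/s")
```

    Under these assumptions the weight is 18 032 N, about 3 361 N more than $B$, and the balance speed comes out at roughly 25.7 m/s. Both figures are only a sanity check on the table and are not results reported by the authors.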
Publication history
  • Received: 2021-06-22
  • Revised: 2021-08-03
