Reinforcement-Learning Control for the High-Speed AUV Based on the Neural-Network State Estimator

GUO Ke-jian; LIN Xiao-bo; HAO Cheng-peng; HOU Chao-huan

doi:10.11993/j.issn.2096-3920.2022.02.002

Volume 30 Issue 2

Apr 2022

GUO Ke-jian, LIN Xiao-bo, HAO Cheng-peng, HOU Chao-huan. Reinforcement-Learning Control for the High-Speed AUV Based on the Neural-Network State Estimator[J]. Journal of Unmanned Undersea Systems, 2022, 30(2): 147-156. doi: 10.11993/j.issn.2096-3920.2022.02.002

1. School of Integrated Circuits, University of Chinese Academy of Sciences, Beijing 100049, China
2. China Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China

Received Date: 2021-06-22
Rev Recd Date: 2021-08-03

Abstract

With the development of ocean research and exploitation, high-speed autonomous undersea vehicles (AUVs) have attracted increasing attention as important unmanned underwater platforms. However, the high-speed AUV model is multi-input multi-output (MIMO), strongly coupled, underactuated, and strongly nonlinear; therefore, traditional control methods that rely on an exact model are often limited in practical applications. To address these problems, a position-attitude controller based on reinforcement learning (RL) that does not rely on an exact model is proposed. The RL controller not only regulates the attitude of the AUV but also drives it to the target depth faster with the aid of the attitude and position loops. To decrease the cost of collecting the data used to train the RL controller, an improved state estimator for the high-speed AUV is designed based on a neural network (NN). The improved state estimator estimates the state at the next time instant from the current state of the high-speed AUV and the control input. Simulation results demonstrate that the NN state estimator estimates the state of the high-speed AUV with high precision, and that the RL controller trained with the estimator achieves fast and steady performance, which verifies the feasibility and effectiveness of the proposed method.
Keywords

- autonomous undersea vehicle
- reinforcement learning
- neural network
- state estimation
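
The one-step estimator idea described in the abstract (predict the next state from the current state and the control input) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two-dimensional state (depth, pitch), the `toy_plant` dynamics, the network size, and the learning rate are hypothetical stand-ins and are not the model, architecture, or hyperparameters used by the authors.

```python
import math
import random


def toy_plant(state, control):
    """Hypothetical stand-in dynamics for (depth, pitch); NOT the paper's AUV model."""
    depth, pitch = state
    next_pitch = 0.9 * pitch + 0.1 * control
    next_depth = depth + 0.1 * math.sin(pitch)
    return [next_depth, next_pitch]


class StateEstimator:
    """One-hidden-layer MLP mapping (state, control) to the next state."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def _forward(self, x):
        # Hidden layer uses tanh; output layer is linear.
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sum(w * hi for w, hi in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def predict(self, state, control):
        return self._forward(list(state) + [control])[1]

    def train_step(self, state, control, target, lr):
        """One SGD step on the loss 0.5 * ||prediction - target||^2."""
        x = list(state) + [control]
        h, y = self._forward(x)
        err = [yi - ti for yi, ti in zip(y, target)]
        # Backpropagate through the output layer before its weights change.
        dh = [sum(err[k] * self.w2[k][j] for k in range(len(err))) * (1.0 - h[j] ** 2)
              for j in range(len(h))]
        for k, e in enumerate(err):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * e * hj
            self.b2[k] -= lr * e
        for j, d in enumerate(dh):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d * xi
            self.b1[j] -= lr * d


def mse(model, data):
    """Mean squared one-step prediction error over a dataset."""
    total = 0.0
    for s, u, t in data:
        p = model.predict(s, u)
        total += sum((pi - ti) ** 2 for pi, ti in zip(p, t))
    return total / len(data)


rng = random.Random(0)

# Collect (state, control, next state) samples from the stand-in plant.
data = []
for _ in range(200):
    s = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    u = rng.uniform(-1, 1)
    data.append((s, u, toy_plant(s, u)))

model = StateEstimator(n_in=3, n_hidden=8, n_out=2, rng=rng)
mse_before = mse(model, data)
for _ in range(100):
    for s, u, t in data:
        model.train_step(s, u, t, lr=0.05)
mse_after = mse(model, data)
print(f"MSE before: {mse_before:.4f}, after: {mse_after:.4f}")
```

Once such an estimator is fitted, calling `model.predict(state, control)` repeatedly could generate simulated transitions for training a controller without querying the real vehicle for each sample, which is the data-cost motivation stated in the abstract.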
