Multi-UAV Collaborative Path Planning using Hierarchical Reinforcement Learning and Simulated Annealing

doi:10.23940/ijpe.22.07.p1.463474

Abstract

In practice, classical path optimization algorithms perform poorly in unknown environments; swarm intelligence algorithms need further improvement in agility and accuracy to avoid moving objects in dynamic environments; and reinforcement learning, a common machine learning solution, can suffer from the curse of dimensionality as scenario complexity grows. To address these problems, this paper proposes using the MAXQ hierarchical reinforcement learning method to reduce dimensionality through abstraction, and combining the leader-wingman approach with a dynamic dead zone to model cooperative formation and design a triangular formation. A novel algorithm based on MAXQ and simulated annealing (SA-MAXQ) is designed to solve the unmanned aerial vehicle (UAV) path planning problem, and it is evaluated through grid-based path planning simulation in 2D scenarios. The Q-Learning, ε-Q-Learning, standard MAXQ, and SA-MAXQ algorithms are compared in terms of convergence, time consumption, and search steps. Moreover, the leader-wingman method is combined with a dynamic dead zone to model a triangular formation for multi-UAV collaborative flight. The experimental results indicate that the SA-MAXQ algorithm yields faster convergence, lower volatility, a better learning effect, less time consumed, and a more optimized search route.

Key words: path planning, UAV collaboration, MAXQ hierarchical reinforcement learning, simulated annealing
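
The abstract does not spell out how simulated annealing is coupled with the learner. One common way to combine the two, shown here only as a minimal sketch and not as the authors' SA-MAXQ algorithm, is to replace ε-greedy exploration with a Metropolis-style acceptance rule: a random action is accepted over the greedy one with a probability that shrinks as a temperature cools. The function names, the geometric cooling schedule, and the default parameters below are all assumptions made for illustration.

```python
import math
import random


def sa_select_action(q_values, temperature, rng=random):
    """Pick an action using a Metropolis-style acceptance rule (illustrative).

    q_values: list of Q(s, a) estimates for the current state.
    temperature: current annealing temperature, must be > 0.
    rng: any object exposing random() and randrange(), e.g. random.Random.
    """
    # Greedy action under the current value estimates.
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    # Randomly proposed alternative action.
    candidate = rng.randrange(len(q_values))
    # delta <= 0 because the greedy action has the largest Q-value.
    delta = q_values[candidate] - q_values[greedy]
    # High temperature: almost any candidate is accepted (exploration).
    # Low temperature: only near-greedy candidates are accepted (exploitation).
    if rng.random() < math.exp(delta / temperature):
        return candidate
    return greedy


def cool(temperature, rate=0.95, floor=1e-3):
    """Geometric cooling schedule (assumed); never drops below `floor`."""
    return max(floor, temperature * rate)
```

In a training loop, `sa_select_action` would be called once per step and `cool` once per episode, so that exploration is broad early on and nearly greedy later. In a hierarchical setting, the same rule could in principle be applied at each subtask level, although the paper's actual design may differ.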

Yuting Cheng, Dongcheng Li, W. Eric Wong, Man Zhao, and Dengfeng Mo. Multi-UAV Collaborative Path Planning using Hierarchical Reinforcement Learning and Simulated Annealing [J]. Int J Performability Eng, 2022, 18(7): 463-474.


