Int J Performability Eng, 2022, Vol. 18, Issue (7): 463-474. doi: 10.23940/ijpe.22.07.p1.463474

Multi-UAV Collaborative Path Planning using Hierarchical Reinforcement Learning and Simulated Annealing

Yuting Cheng (a), Dongcheng Li (b), W. Eric Wong (b), Man Zhao (a,*), and Dengfeng Mo (a)

  a. School of Computer Science, China University of Geosciences, Wuhan, 430074, China
  b. Department of Computer Science, University of Texas at Dallas, 75082, USA
  • Contact: * E-mail address:

Abstract: In practice, classical path optimization algorithms perform poorly in unknown environments; swarm intelligence algorithms need further improvement in agility and accuracy to avoid moving objects in dynamic environments; and reinforcement learning algorithms, a common machine learning solution, can suffer from the curse of dimensionality when the scenario is complex. To address these problems, this paper uses the MAXQ hierarchical reinforcement learning method to reduce dimensionality through abstraction, and combines the leader-wingman approach with a dynamic dead zone to model cooperative formation flight in a triangular formation. A novel algorithm based on MAXQ and simulated annealing (SA-MAXQ) is designed to solve the unmanned aerial vehicle (UAV) path planning problem, and is evaluated in grid-based path planning simulations in 2D scenarios. The Q-Learning, ε-Q-Learning, standard MAXQ, and SA-MAXQ algorithms are compared in terms of convergence, time consumption, and number of search steps. In addition, the leader-wingman method with a dynamic dead zone is used to model the triangular formation for multi-UAV collaboration. The experimental results indicate that SA-MAXQ converges faster, fluctuates less, learns more effectively, takes less time, and finds a better route.

Key words: path planning, UAV collaboration, MAXQ hierarchical reinforcement learning, simulated annealing
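
The abstract does not say how simulated annealing is coupled to the learner. The sketch below illustrates one common pairing, offered only as an assumption: a Metropolis acceptance rule replaces ε-greedy exploration, and the temperature is cooled after every episode. For brevity it uses flat tabular Q-learning on a small 2D grid instead of the MAXQ task hierarchy. The grid size, obstacle cells, rewards, and cooling schedule are all hypothetical values chosen for illustration; none come from the paper.

```python
import math
import random

# Four grid moves as (row delta, column delta): right, left, down, up.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def sa_select(q_row, temperature, rng):
    """Metropolis-style action choice (an assumed SA coupling, not the paper's).

    A random candidate replaces the greedy action with probability
    exp((Q[candidate] - Q[greedy]) / temperature).
    """
    greedy = max(range(len(q_row)), key=q_row.__getitem__)
    candidate = rng.randrange(len(q_row))
    delta = q_row[candidate] - q_row[greedy]  # always <= 0
    if rng.random() < math.exp(delta / temperature):
        return candidate
    return greedy


def step(state, action, size, obstacles, goal):
    """Apply one move; return (next_state, reward, done). Rewards are hypothetical."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    nxt = (row + d_row, col + d_col)
    off_grid = not (0 <= nxt[0] < size and 0 <= nxt[1] < size)
    if off_grid or nxt in obstacles:
        return state, -5.0, False  # blocked: stay in place, pay a penalty
    if nxt == goal:
        return nxt, 100.0, True
    return nxt, -1.0, False  # ordinary move: small step cost


def train(size=5, obstacles=frozenset({(1, 1), (2, 3), (3, 1)}),
          start=(0, 0), goal=(4, 4), episodes=500, alpha=0.5, gamma=0.9,
          t0=10.0, cooling=0.98, t_min=0.01, seed=0):
    """Tabular Q-learning with annealed exploration on a 2D grid."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(size) for c in range(size)}
    temperature = t0
    for _ in range(episodes):
        state = start
        for _ in range(200):  # cap on steps per episode
            action = sa_select(q[state], temperature, rng)
            nxt, reward, done = step(state, action, size, obstacles, goal)
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
        # Geometric cooling: exploration shrinks as learning proceeds.
        temperature = max(t_min, temperature * cooling)
    return q


def greedy_path(q, start, goal, size, obstacles, max_steps=50):
    """Follow the greedy policy from start; stop at the goal or after max_steps."""
    path = [start]
    state = start
    for _ in range(max_steps):
        if state == goal:
            break
        action = max(range(len(ACTIONS)), key=q[state].__getitem__)
        state, _, _ = step(state, action, size, obstacles, goal)
        path.append(state)
    return path
```

Calling `greedy_path(train(), (0, 0), (4, 4), 5, frozenset({(1, 1), (2, 3), (3, 1)}))` returns the list of cells visited by the learned greedy policy. At high temperature the selection is close to uniform random; as the temperature falls it approaches pure greedy selection, which is the behaviour an annealed schedule is meant to provide.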