Int J Performability Eng, 2022, Vol. 18, Issue (9): 644-653. doi: 10.23940/ijpe.22.09.p5.644653


Double Deep Q Network with Huber Reward Function for Cart-Pole Balancing Problem

Shaili Mishra and Anuja Arora*   

  1. Department of Computer Science Engineering and Information Technology, Jaypee Institute of Information Technology, Noida, 201307, India
  • Contact: *E-mail address: anuja.arora29@gmail.com
  • About the authors: Anuja Arora is an Associate Professor in the Computer Science Engineering Department of Jaypee Institute of Information Technology. She has 16 years of academic and research experience and 1.5 years of industry experience. She received her Ph.D. degree in Computer Science from Apaji Institute of Mathematics & Applied Computer Technology, Banasthali University, Banasthali, India, in December 2013. She is a Senior Member of IEEE, a member of ACM, SIAM, and INSTICC, and a Life Member of IAENG. She has published more than 80 research papers in peer-reviewed international journals, book chapters, and conferences. Three students have been awarded a Ph.D. under her supervision, and three more are in progress. Her research interests include deep learning, artificial neural networks, social network analysis and mining, sustainable computing, data science, machine learning, data mining, web intelligence, web application development and web technologies, software engineering, software testing, and information retrieval systems.
    Shaili Mishra received her M.Tech. degree in 2014 from Banasthali Vidhyapeeth University. She is currently pursuing her Ph.D. in Computer Science at Jaypee Institute of Information Technology. Her main research area is the use of deep reinforcement learning to predict and compute the properties of physical objects.

Abstract: Reinforcement learning has opened a new research direction in control theory, where feedback influences system behavior in order to achieve a desired output. This research work focuses on the cart-pole balancing problem using deep reinforcement learning (Deep RL) algorithms. Deep RL is a comprehensive learning framework for studying the interplay between environmental input parameters and the corresponding output. That output serves as feedback for further decision making, which designs a new parameter set that yields better output, validated in terms of the achieved reward. In this paper, the deep Q network (DQN) and the double deep Q network (DDQN) are applied to the cart-pole balancing problem, and reward is measured using the Huber loss function. A comparison of DQN trained with mean squared error (MSE) and with Huber loss shows that the Huber loss converges faster. DQN and DDQN are then both evaluated with the Huber loss. The results show that DDQN reduces the Huber loss and also converges much faster than DQN.
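The abstract compares two design choices: the loss used to fit Q-values (MSE versus Huber) and the bootstrap target (DQN versus DDQN). The pure-Python sketch below illustrates both. It is not the authors' implementation. The Huber threshold `delta = 1.0`, the discount `gamma`, the two-action Q-value lists, and all function names are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's code): Huber vs. MSE loss on a
# temporal-difference (TD) error, and DQN vs. Double DQN bootstrap targets.
# Q-values are plain lists indexed by action (cart-pole has two actions).

def mse(td_error):
    # Squared error: its gradient grows linearly with the size of the error.
    return td_error ** 2

def huber(td_error, delta=1.0):
    # Quadratic for |error| <= delta and linear beyond it, so large TD errors
    # give a bounded gradient (magnitude at most delta). delta=1.0 is assumed.
    a = abs(td_error)
    if a <= delta:
        return 0.5 * a ** 2
    return delta * (a - 0.5 * delta)

def argmax(values):
    # Index of the largest value (first one on ties).
    return max(range(len(values)), key=lambda i: values[i])

def dqn_target(reward, done, q_target_next, gamma=0.99):
    # DQN: the target network both selects and evaluates the next action.
    if done:
        return reward
    return reward + gamma * max(q_target_next)

def ddqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    # DDQN: the online network selects the next action and the target
    # network evaluates it, reducing the overestimation of the max operator.
    if done:
        return reward
    best_action = argmax(q_online_next)
    return reward + gamma * q_target_next[best_action]

if __name__ == "__main__":
    q_online_next = [1.0, 3.0]  # hypothetical online-network Q(s', .)
    q_target_next = [2.5, 1.5]  # hypothetical target-network Q(s', .)
    print(dqn_target(1.0, False, q_target_next, gamma=0.5))                  # 2.25
    print(ddqn_target(1.0, False, q_online_next, q_target_next, gamma=0.5))  # 1.75
    print(mse(4.0), huber(4.0))                                              # 16.0 3.5
```

In this toy case DQN bootstraps from the target network's own maximum (2.5), while DDQN uses the action preferred by the online network and so takes the lower estimate (1.5). For a TD error of 4.0, the Huber penalty (3.5) is far smaller than the squared error (16.0), which is one common explanation for its more stable training.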

Key words: reinforcement learning, Cart Pole problem, Q-Learning, deep Q learning, DQN, DDQN