Double Deep Q Network with Huber Reward Function for Cart-Pole Balancing Problem

doi:10.23940/ijpe.22.09.p5.644653

Int J Performability Eng, 2022, Vol. 18, Issue 9: 644-653.

Shaili Mishra and Anuja Arora*

Department of Computer Science Engineering and Information Technology, Jaypee Institute of Information Technology, Noida, 201307, India

Contact: *E-mail address: anuja.arora29@gmail.com
About author: Anuja Arora is an Associate Professor in the Computer Science Engineering Department of Jaypee Institute of Information Technology. She has 16 years of academic and research experience and 1.5 years of industry experience. She received her Ph.D. degree in Computer Science from Apaji Institute of Mathematics & Applied Computer Technology, Banasthali University, Banasthali, India, in December 2013. She is a Senior Member of IEEE, a member of ACM, SIAM, and INSTICC, and a Life Member of IAENG. She has published more than 80 research papers in peer-reviewed international journals, book chapters, and conferences. Three students have been awarded Ph.D. degrees under her supervision, and three more are in progress. Her research interests include Deep Learning, Artificial Neural Networks, Social Network Analysis and Mining, Sustainable Computing, Data Science, Machine Learning, Data Mining, Web Intelligence, Web Application Development and Web Technologies, Software Engineering, Software Testing, and Information Retrieval Systems.
Shaili Mishra received her M.Tech degree in 2014 from Banasthali Vidhyapeeth University. She is currently pursuing her Ph.D. in Computer Science at Jaypee Institute of Information Technology. Her main research area is the use of deep reinforcement learning approaches for the prediction and computation of physical object properties.

Abstract

The emergence of reinforcement learning defines a new research direction in control theory, in which feedback influences system behavior in order to achieve a desired output. This work addresses the cart-pole balancing problem using deep reinforcement learning (Deep RL) algorithms. Deep RL is a learning framework for studying the interplay between environmental input parameters and the corresponding output, which serves as feedback for further decision making: a new parameter set is chosen to obtain a better output, validated in terms of the achieved reward. In this paper, the deep Q network (DQN) and the double deep Q network (DDQN) are applied to the cart-pole balancing problem, and the reward is measured using a novel loss function, the Huber loss. A comparison of DQN trained with mean squared error (MSE) and with the Huber loss shows that the Huber loss converges faster. DQN and DDQN are then compared using the Huber loss itself. The results show that DDQN achieves a lower Huber loss and converges much faster than DQN.
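
The two comparisons described in the abstract (MSE versus Huber loss, and DQN versus DDQN) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names, the Huber threshold `delta=1.0`, the discount factor `gamma`, and the sample Q-values are all assumptions chosen for illustration; the abstract does not state the hyperparameters used. The Q-value lists are assumed to hold one entry per action, for example the two push directions of a standard cart-pole simulator.

```python
def squared_error(td_error):
    """Squared-error (MSE-style) loss for a single temporal-difference error."""
    return td_error ** 2


def huber_loss(td_error, delta=1.0):
    """Huber loss: quadratic for |error| <= delta, linear beyond it.

    delta=1.0 is an assumed default, not a value taken from the paper.
    """
    abs_error = abs(td_error)
    if abs_error <= delta:
        return 0.5 * td_error ** 2
    return delta * (abs_error - 0.5 * delta)


def dqn_target(reward, gamma, q_target_next, done):
    """DQN target: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN target: the online network selects the next action,
    and the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


if __name__ == "__main__":
    # Large errors are penalised quadratically by the squared error
    # but only linearly by the Huber loss.
    for err in (0.5, 3.0):
        print(err, squared_error(err), huber_loss(err))

    # Hypothetical next-state Q-values for a two-action task.
    q_online_next = [3.0, 1.0]
    q_target_next = [1.0, 2.0]
    print(dqn_target(1.0, 0.9, q_target_next, done=False))
    print(double_dqn_target(1.0, 0.9, q_online_next, q_target_next, done=False))
```

Because the Huber loss grows only linearly for large errors, a few large temporal-difference errors have less influence on each update than they would under the squared error. Decoupling action selection from action evaluation in the Double DQN target is the mechanism Van Hasselt et al. proposed to reduce the overestimation of Q-values that the plain `max` can cause.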

Key words: reinforcement learning, Cart Pole problem, Q-Learning, deep Q learning, DQN, DDQN

Shaili Mishra and Anuja Arora. Double Deep Q Network with Huber Reward Function for Cart-Pole Balancing Problem [J]. Int J Performability Eng, 2022, 18(9): 644-653.


