Int J Performability Eng, 2018, Vol. 14, Issue (6): 1291-1299. doi: 10.23940/ijpe.18.06.p20.12911299

• Original articles •

Impact of Hyper Parameter Optimization for Cross-Project Software Defect Prediction

Yubin Qu (a), Xiang Chen (b), Yingquan Zhao (b), and Xiaolin Ju (b)

  (a) School of Mechanical and Electrical Engineering, Jiangsu College of Engineering and Technology, Nantong, 212003, China
  (b) School of Computer Science and Technology, Nantong University, Nantong, 226000, China

Abstract:

Most recent studies of cross-project defect prediction (CPDP) use the default hyper parameter values of the underlying classification methods. However, previous studies of within-project defect prediction (WPDP) found that hyper parameter optimization helps to improve the performance of software defect prediction models. Moreover, the default values of some hyper parameters are not consistent across machine learning libraries (such as Weka and Scikit-learn). To the best of our knowledge, we are the first to conduct an in-depth analysis of the influence of hyper parameter optimization on the performance of CPDP. Using different classification methods, we consider 5 instance selection based CPDP methods in total. In our empirical studies, we choose 8 projects from the AEEEM and Relink datasets as our evaluation subjects, and we use AUC as our model performance measure. The results show that hyper parameter optimization has a non-negligible influence on 4 of these 5 methods. Of the 11 hyper parameters considered across the 5 classification methods, 8 have a non-negligible influence, and these are found mainly in the support vector machine and k nearest neighbor classification methods. Meanwhile, by analyzing the actual computational cost of hyper parameter optimization, we find that the time spent is within an acceptable range. These empirical results show that future CPDP research should consider hyper parameter optimization in its experimental design.
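
To make the idea of hyper parameter optimization with AUC as the performance measure concrete, the following is a minimal sketch and not the procedure used in the paper. It tunes a single hyper parameter, the number of neighbors k of a k nearest neighbor classifier, by grid search with cross-validated AUC on source-project data, and then scores a separate target project. The toy data generator, the candidate grid, the fold count, and all function names are assumptions introduced for illustration; the instance selection step of the CPDP methods is omitted.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (defective, clean) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one instance of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def knn_score(train_X, train_y, x, k):
    """Share of defective modules among the k nearest training modules."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def stratified_folds(labels, n_folds):
    """Deal indices into folds round-robin per class, so that every fold
    contains both classes (needs at least n_folds instances per class)."""
    folds = [[] for _ in range(n_folds)]
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        for j, i in enumerate(members):
            folds[j % n_folds].append(i)
    return folds


def tune_k(X, y, candidates, n_folds=3):
    """Grid search over k; returns the best k and the mean AUC per k.
    Only source-project data is used, so no target labels leak in."""
    folds = stratified_folds(y, n_folds)
    cv_auc = {}
    for k in candidates:
        fold_aucs = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(X)) if i not in held]
            train_X = [X[i] for i in train]
            train_y = [y[i] for i in train]
            scores = [knn_score(train_X, train_y, X[i], k) for i in held_out]
            fold_aucs.append(auc([y[i] for i in held_out], scores))
        cv_auc[k] = sum(fold_aucs) / len(fold_aucs)
    best_k = max(cv_auc, key=cv_auc.get)
    return best_k, cv_auc


def make_toy_project(n, seed):
    """Hypothetical two-metric project: defective modules (label 1)
    are drawn around 1.0, clean modules (label 0) around 0.0."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(label, 1.0), rng.gauss(label, 1.0)])
        y.append(label)
    return X, y


source_X, source_y = make_toy_project(60, seed=1)
target_X, target_y = make_toy_project(40, seed=2)

best_k, cv_auc = tune_k(source_X, source_y, candidates=[1, 3, 5, 7, 9])
for k in sorted({1, best_k}):
    scores = [knn_score(source_X, source_y, x, k) for x in target_X]
    print(f"k={k}: target AUC = {auc(target_y, scores):.3f}")
```

The final loop compares k = 1 with the tuned value on the target project. As one example of inconsistent defaults, Weka's IBk classifier uses k = 1 by default while Scikit-learn's KNeighborsClassifier uses 5 neighbors, so a study that relies on defaults can report different results depending on the library. In practice a library routine such as Scikit-learn's grid search would replace the hand-written loop.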


Submitted on March 2, 2018; Revised on April 16, 2018; Accepted on May 19, 2018
References: 32