Int J Performability Eng, 2018, Vol. 14, Issue 6: 1263-1274. doi: 10.23940/ijpe.18.06.p17.12631274

• Original articles

An Information Flow-based Feature Selection Method for Cross-Project Defect Prediction

Yaning Wu, Song Huang, and Haijin Ji   

  1. Research Center of Software Engineering, Army Engineering University of PLA, Nanjing, 210001, China

Abstract:

Software defect prediction (SDP) helps identify the most defect-prone modules before software testing, so that limited testing resources can be allocated where they are most needed. SDP is most commonly framed as a classification problem. To achieve good prediction accuracy, the classification models must first be trained appropriately. Training data can be obtained from historical software repositories, and the quality of that data affects classification performance to a large extent. To improve data quality, we propose a novel software feature selection method that uses information flows to perform causality analysis on the features of the training datasets. Specifically, we conduct causality analysis between each feature metric and the label metric, bug. Based on the resulting feature ranking list, we then select the top-k features to control redundancy. Finally, we choose the most suitable feature subset according to the F-measure. To demonstrate the effectiveness and practicality of the feature selection method, we use the Nearest Neighbor approach to construct a homogeneous training dataset and run comparison experiments with three commonly used classification models. The experimental results confirm the feasibility and validity of the feature selection method.
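The rank-then-select procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the information-flow formula, so the absolute Pearson correlation is swapped in as a stand-in relevance score, and a toy nearest-centroid classifier replaces the three classification models used in the paper. All function names (`abs_pearson`, `rank_features`, `select_top_k`, and so on) are hypothetical, and labels are assumed to be 0 (clean) or 1 (buggy), with both classes present in the training data.

```python
from math import sqrt

def abs_pearson(xs, ys):
    """Stand-in relevance score (NOT the paper's information-flow measure)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return abs(cov) / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def rank_features(X, y, score=abs_pearson):
    """Return feature indices sorted from most to least relevant to the label."""
    n_features = len(X[0])
    scores = [score([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (buggy) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def nearest_centroid_predict(X_train, y_train, X_test, cols):
    """Toy classifier (an assumption): assign each row to the closer class centroid."""
    centroids = {}
    for label in (0, 1):
        rows = [r for r, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]

    def dist(row, centroid):
        return sum((row[j] - c) ** 2 for j, c in zip(cols, centroid))

    return [min((0, 1), key=lambda lab: dist(row, centroids[lab])) for row in X_test]

def select_top_k(X_train, y_train, X_val, y_val):
    """Try every top-k prefix of the ranking; keep the one with the best F-measure."""
    ranking = rank_features(X_train, y_train)
    best_k, best_f = 0, -1.0
    for k in range(1, len(ranking) + 1):
        pred = nearest_centroid_predict(X_train, y_train, X_val, ranking[:k])
        f = f_measure(y_val, pred)
        if f > best_f:  # strict comparison: ties favour the smaller subset
            best_k, best_f = k, f
    return ranking[:best_k], best_f
```

Passing a different callable as `score` to `rank_features` is where an information-flow estimator would plug in; the top-k search and F-measure comparison stay unchanged.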


Submitted on March 12, 2018; Revised on April 17, 2018; Accepted on May 8, 2018
References: 34