Int J Performability Eng, 2021, Vol. 17, Issue (1): 123-134. doi: 10.23940/ijpe.21.01.p12.123134

• Original Article

Prediction of Number of Software Defects based on SMOTE

Guoqiang Xie(a), Shiyi Xie(a,b), Xiaohong Peng(a,b), and Zhao Li(a,b,*)

  (a) College of Mathematics and Computer, Guangdong Ocean University, Zhanjiang, 524088, China
  (b) Marine Resources Big Data Center of South China Sea, Southern Marine Science and Engineering Guangdong Laboratory (Zhanjiang), Zhanjiang, 524088, China
  • Contact: * Corresponding author. E-mail address: zhaoli@gdou.edu.cn
  • About author:
    Guoqiang Xie is a master's student. His research interests include software testing and big data mining.
    Shiyi Xie is a professor at Guangdong Ocean University. His research interests include big data analysis and intelligent computing.
    Xiaohong Peng is a professor at Guangdong Ocean University. His research interests include big data analysis and machine learning.
    Zhao Li is an associate professor at Guangdong Ocean University. His research interests include software testing and software reliability.
  • Supported by:
    the Program for Guangdong Provincial Key Laboratory of Cyber-Physical Systems (No. 2020B1212060069), the Program for National & Local Joint Engineering Research Center of Intelligent Manufacturing Cyber-Physical Systems, the Southern Marine Science and Engineering Guangdong Laboratory (Zhanjiang) (No. ZJW-2019-06, 013S19006-007), and the Program for Scientific Research Start-up Funds of Guangdong Ocean University

Abstract:

Predicting software defects is an effective way to improve system quality, and it is a key factor affecting the efficiency of defect detection and repair in software components. This study aims to improve the effectiveness of component defect prediction by addressing two problems: the imbalance of training data in defect prediction, and the insufficient ability of regression alone to predict the number of defects in components. First, this study adopts SMOTE to oversample the defective components in the imbalanced sample dataset and construct a balanced one, so that the proportions of the different sample types are taken into account and prediction accuracy improves. Second, this study proposes a multi-step method for predicting the number of defects that applies regression after classification: a support vector machine (SVM) classifies the components, the components classified as non-defective are filtered out, and a regression model is then built to predict the number of defects in the remaining components, which further improves prediction accuracy. The method was evaluated experimentally on open-source datasets. The results show that multi-step prediction is more accurate than prediction by regression alone, and that it has higher overall efficiency and applicability.

Key words: defect number prediction, oversampling, classification, SVM, regression
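
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simplified SMOTE that interpolates between a minority sample and one of its nearest minority neighbours, and it replaces the trained SVM and the fitted regression model with hypothetical stand-in functions (`is_defective`, `predict_count`). The toy metrics and all names are illustrative assumptions.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def two_stage_predict(components, is_defective, predict_count):
    """Classify first; run regression only on components flagged defective.
    Components classified as non-defective are assigned zero defects."""
    return [
        max(0.0, predict_count(x)) if is_defective(x) else 0.0
        for x in components
    ]


# Toy data (assumed): two metrics per component, e.g. size and complexity.
defective = [[9.0, 8.0], [10.0, 9.5], [8.5, 9.0]]
clean = [[1.0, 1.5], [2.0, 1.0], [1.5, 2.0], [2.5, 2.5], [1.0, 1.0], [2.0, 2.0]]

# Oversample the defective class until both classes have the same size.
balanced_defective = defective + smote(defective, len(clean) - len(defective), k=2)

# Hypothetical stand-ins for the trained SVM and the fitted regression model.
is_defective = lambda x: x[0] + x[1] > 10.0
predict_count = lambda x: 0.5 * x[0] - 1.0

predictions = two_stage_predict([[9.0, 9.0], [1.0, 2.0]], is_defective, predict_count)
```

In practice the stand-ins would be replaced by a classifier trained on the balanced dataset and a regression model fitted on the defective components only; the filtering step in `two_stage_predict` stays the same.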