Int J Performability Eng, 2019, Vol. 15, Issue (10): 2701-2708. doi: 10.23940/ijpe.19.10.p16.27012708

• Original Article

Active Learning using Uncertainty Sampling and Query-by-Committee for Software Defect Prediction

Yubin Qu (a), Xiang Chen (b)*, Ruijie Chen (c), Xiaolin Ju (b), and Jiangfeng Guo (a)

  a. Jiangsu College of Engineering and Technology, Nantong, 226001, China
  b. Nantong University, Nantong, 226019, China
  c. Nanjing Foreign Language School, Nanjing, 210008, China
  • Contact: Chen Xiang
  • About author:

    * Corresponding author. E-mail address: xchencs@ntu.edu.cn

  • Supported by:
    This work was supported by the Nantong Science and Technology Project (No. JC2018134).

Abstract:

Constructing datasets for software defect prediction incurs high labeling costs. Active learning with uncertainty sampling can reduce these costs by labeling only the samples about which the model is most uncertain, but the samples with the highest certainty are then always discarded. According to cognitive theory, easy samples can also improve model performance. Therefore, a hybrid active learning query strategy is proposed: the sample with the lowest information entropy is analyzed again by query-by-committee using vote entropy. Empirical studies show that the proposed approach, HIVE, outperforms several state-of-the-art active learning approaches.

Key words: active learning, vote entropy, software defect prediction, uncertainty sampling
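
The two measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions of information (Shannon) entropy over a model's predicted class probabilities and of vote entropy over the labels predicted by a committee. The function names and the `hybrid_query` helper are hypothetical and are not taken from the paper. The abstract does not state what is done with the vote entropy of the most certain sample, so the sketch only computes and returns it.

```python
import math
from collections import Counter


def information_entropy(probs):
    """Shannon entropy (in nats) of one sample's predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def vote_entropy(votes):
    """Vote entropy of the labels a committee predicts for one sample."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())


def hybrid_query(probs_per_sample, votes_per_sample):
    """Hypothetical helper: find the most and least uncertain samples.

    Returns the index of the sample with the highest information entropy,
    the index of the sample with the lowest information entropy, and the
    committee's vote entropy for that lowest-entropy sample.
    """
    entropies = [information_entropy(p) for p in probs_per_sample]
    indices = range(len(entropies))
    most_uncertain = max(indices, key=entropies.__getitem__)
    most_certain = min(indices, key=entropies.__getitem__)
    return most_uncertain, most_certain, vote_entropy(votes_per_sample[most_certain])


if __name__ == "__main__":
    # Illustrative values only: three unlabeled samples, a binary classifier,
    # and a committee of three members voting on each sample.
    probs = [[0.5, 0.5], [0.9, 0.1], [0.99, 0.01]]
    votes = [[0, 1, 1], [0, 0, 0], [0, 0, 1]]
    print(hybrid_query(probs, votes))
```

In this sketch a high vote entropy on the most certain sample would indicate that the committee disagrees about a sample the single model considered easy; how HIVE acts on that signal is described in the full paper, not here.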