Int J Performability Eng, 2020, Vol. 16, Issue (4): 609-617. doi: 10.23940/ijpe.20.04.p12.609617

• Original Article

Active Learning Empirical Research on Cross-Version Software Defect Prediction Datasets

Fang Li (a,b), Yubin Qu (a,b), Junxia Ji (c,*), Dejun Zhang (d), and Long Li (a)

    a. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, 541004, China
    b. Jiangsu College of Engineering and Technology, Nantong, 226001, China
    c. Affiliated Hospital of Nantong University, Nantong, 226000, China
    d. Guangdong Planning and Design Institute of Telecommunications Co., Ltd., Nanjing, 210029, China
  • Contact: Ji Junxia
  • Supported by:
    This work was supported by the Nantong Science and Technology Project (JC2018134), the Nantong Science and Technology Project (JC2019106), Research Topics on Education Informationization in Universities (2019JSETKT064), and the Scientific Research Projects of Jiangsu College of Engineering and Technology (GYKY/2019/9).


Software quality plays an important part in software engineering. Because labeling cost is very high, active learning is introduced to train supervised classifiers with fewer labeled instances. However, in a real software quality assurance process, few labeled instances are available in the initial stage of software development, although a historical dataset developed by the same team may exist. Therefore, knowledge learned from the historical dataset can be used to build an active learning query strategy. In our empirical study, we design and conduct experiments on the PROMISE datasets, which are gathered from real open-source projects. We find that the meta active learning query strategy can perform better than commonly used query strategies when only a small amount of data is labeled.

Key words: active learning, cross-version software defect prediction, random forest
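
To make the setting concrete, the following is a minimal sketch of pool-based active learning with uncertainty sampling, one commonly used query strategy, seeded with labeled data from a previous version. It is not the meta active learning strategy studied in the paper, whose details are not given in the abstract. The function names, the synthetic two-metric data, and the k-nearest-neighbour probability estimate (a dependency-free stand-in for the random forest named in the keywords) are all assumptions made for illustration.

```python
import math
import random


def knn_defect_probability(labeled, x, k=5):
    """Estimate P(defective | x) as the defective share among the k nearest labeled modules.

    Stand-in classifier (assumption): replaces the random forest from the keywords.
    """
    nearest = sorted(labeled, key=lambda item: math.dist(item[0], x))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def uncertainty_query(labeled, pool, k=5):
    """Return the index of the pool instance whose predicted probability is closest to 0.5."""
    return min(
        range(len(pool)),
        key=lambda i: abs(knn_defect_probability(labeled, pool[i], k) - 0.5),
    )


def active_learning(seed, pool, oracle, budget, k=5):
    """Pool-based loop: pick a query, ask the oracle for its label, add it to the labeled set."""
    labeled = list(seed)
    pool = list(pool)
    for _ in range(min(budget, len(pool))):
        i = uncertainty_query(labeled, pool, k)
        x = pool.pop(i)
        labeled.append((x, oracle(x)))
    return labeled, pool


def make_version(rng, n, shift=0.0):
    """Hypothetical synthetic data: two metrics per module, defective if their sum is large."""
    data = []
    for _ in range(n):
        x = (rng.uniform(0, 1) + shift, rng.uniform(0, 1) + shift)
        data.append((x, int(x[0] + x[1] > 1.0 + 2 * shift)))
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    previous_version = make_version(rng, 40)            # labeled historical data
    current_version = make_version(rng, 60, shift=0.1)  # labels hidden behind the oracle
    truth = dict(current_version)
    pool = [x for x, _ in current_version]
    labeled, remaining = active_learning(
        previous_version, pool, truth.__getitem__, budget=10
    )
    print(len(labeled), len(remaining))
```

Here the previous version supplies the initial labeled set, and the labeling budget is spent only on current-version modules the classifier is least sure about. A learned (meta) query strategy would replace `uncertainty_query` with a selection rule derived from the historical data.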