LAL: Meta-Active Learning-based Software Defect Prediction

doi:10.23940/ijpe.20.02.p5.203213

Abstract

Software defect prediction plays an important role in improving the quality of software systems. Active learning can select which unlabeled instances to label when constructing a defect prediction classifier, so that fewer labeled instances and lower labeling costs are needed. However, in the real software quality assurance process, only a few labeled instances are available in the initial stage of software development. Moreover, gathered software modules are naturally class-imbalanced because most modules are defect-free. Therefore, a meta-active learning method is introduced to address these problems. Firstly, the target dataset distribution is learned from historical datasets via learning active learning (LAL) using random forests. Secondly, a regression model is learned from the imbalanced dataset with Gaussian distribution. Finally, the model is used to calculate the loss gain of each unlabeled software module, and the module with the largest loss gain is labeled. In our empirical study, we conduct experiments on the AEEEM, MORPH, and NASA datasets, which are gathered from real open-source projects. Firstly, we analyze the influence of different query strategies and find that LAL achieves the best performance on the three datasets when the proportion of labeled instances is low. Then, we compare the LAL query strategy with five state-of-the-art query strategies with the initial ratio of labeled instances set to 1%, 5%, and 10%, and find that LAL achieves the best performance.
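
The final query step described in the abstract (score each unlabeled module, then label the one with the largest predicted gain) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `select_query`, `active_learning_loop`, `predict_gain`, and `oracle` are hypothetical names, and the toy gain function in the demo is only a stand-in for the regression model learned from historical datasets, which is not reproduced here.

```python
import random
from typing import Callable, Dict, Sequence

Features = Sequence[float]


def select_query(unlabeled: Sequence[Features],
                 predict_gain: Callable[[Features], float]) -> int:
    """Return the position of the unlabeled module with the largest predicted gain."""
    if not unlabeled:
        raise ValueError("no unlabeled modules left to query")
    gains = [predict_gain(x) for x in unlabeled]
    # max() returns the first maximum, so ties go to the earliest candidate.
    return max(range(len(gains)), key=gains.__getitem__)


def active_learning_loop(pool: Sequence[Features],
                         oracle: Callable[[int], int],
                         predict_gain: Callable[[Features], float],
                         initial_ratio: float,
                         budget: int,
                         seed: int = 0) -> Dict[int, int]:
    """Label a random initial fraction of the pool, then query `budget` more modules.

    `oracle(i)` is a hypothetical labeling function (1 = defective, 0 = defect-free).
    Returns a mapping from pool index to label.
    """
    rng = random.Random(seed)
    order = list(range(len(pool)))
    rng.shuffle(order)
    # Initial labeled set, e.g. 1%, 5%, or 10% of the pool (at least one module).
    n_init = max(1, round(initial_ratio * len(pool)))
    labeled = {i: oracle(i) for i in order[:n_init]}
    remaining = order[n_init:]
    for _ in range(min(budget, len(remaining))):
        pos = select_query([pool[i] for i in remaining], predict_gain)
        chosen = remaining.pop(pos)  # move the module out of the unlabeled pool
        labeled[chosen] = oracle(chosen)
    return labeled


if __name__ == "__main__":
    # Toy data: one feature per module; modules with feature > 0.8 are "defective".
    pool = [[i / 100] for i in range(100)]

    def toy_oracle(i: int) -> int:
        return int(pool[i][0] > 0.8)

    def toy_gain(x: Features) -> float:
        # Stand-in score (assumption): prefer modules near a made-up boundary at 0.5.
        return -abs(x[0] - 0.5)

    for ratio in (0.01, 0.05, 0.10):
        labels = active_learning_loop(pool, toy_oracle, toy_gain, ratio, budget=10)
        print(ratio, len(labels))
```

In a fuller setup, `predict_gain` would typically be re-evaluated after every query, because a learned gain regressor depends on the state of the current classifier as well as on the candidate module; the sketch keeps it fixed for brevity.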

Key words: active learning, software defect prediction, class imbalance, random forest

Yubin Qu, Fang Li, and Xiang Chen. LAL: Meta-Active Learning-based Software Defect Prediction [J]. Int J Performability Eng, 2020, 16(2): 203-213.
