A Review of Software Fault Prediction Techniques in Class Imbalance Scenarios

doi:10.23940/ijpe.25.03.p1.123130

Abstract

This work presents a thorough analysis of methods for addressing class imbalance in software fault prediction. Class imbalance, in which faulty modules are far rarer than fault-free ones, is a common problem that substantially degrades the performance of machine learning models and frequently biases their predictions toward the majority class. A range of strategies have been investigated to mitigate it: data-level strategies such as SMOTE and MAHAKIL; algorithm-level strategies such as cost-sensitive learning; and ensemble strategies such as Bagging, Boosting, Stacking, and Two-Stage Ensembles. The review evaluates how well these methods perform on popular benchmark datasets, including PROMISE and NASA, and in cross-project defect prediction (CPDP) settings, using key performance criteria such as accuracy, precision, recall, and stability. Furthermore, hybrid approaches that combine ensemble learning with sampling strategies have shown encouraging results in improving prediction robustness and accuracy. By highlighting the advantages and disadvantages of each strategy, this research aims to help practitioners choose the most suitable techniques for software fault prediction in imbalanced scenarios.
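To make the imbalance problem concrete, the following Python sketch (an illustration, not code from the review) shows two things: why accuracy alone can mislead when faulty modules are rare, and the interpolation idea behind SMOTE. The function names `confusion_metrics` and `smote`, the toy data, and the label convention (1 = faulty, 0 = clean) are hypothetical. The SMOTE routine is a simplified version of the technique (numeric features only, brute-force neighbour search, no feature scaling) and is not a substitute for a library implementation such as imbalanced-learn's `SMOTE`.

```python
import random


def confusion_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (hypothetical convention: 1 = faulty, 0 = clean)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # taken as 0 when nothing is flagged faulty
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE-style oversampling: return n_new synthetic minority samples.

    Each synthetic point is placed at a random position on the line segment
    between a randomly chosen minority sample and one of its k nearest
    minority neighbours (squared Euclidean distance, brute-force search).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # All other minority samples, excluded by index so duplicates are handled.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(
            others, key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p))
        )[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, partner)))
    return synthetic


if __name__ == "__main__":
    # Hypothetical imbalanced test set: 95 clean modules, 5 faulty ones.
    y_true = [0] * 95 + [1] * 5
    # A degenerate model that always predicts "clean" still looks accurate.
    y_pred = [0] * 100
    print(confusion_metrics(y_true, y_pred))  # (0.95, 0.0, 0.0)

    # Oversample a toy 2-D faulty class from 5 to 95 points.
    faulty = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5), (1.8, 2.9)]
    balanced = faulty + smote(faulty, n_new=90, k=3)
    print(len(balanced))  # 95
```

On the hypothetical 95/5 split, the always-clean model scores 0.95 accuracy yet 0.0 recall on the faulty class, which is why precision, recall, and stability are weighed alongside accuracy. Interpolating between neighbours, rather than duplicating existing faulty samples, is the design choice that lets SMOTE add variety to the minority class.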

Key words: software fault prediction, class imbalance, machine learning, ensemble techniques, SMOTE

Ashu Mehta, Navdeep Kaur, and Amandeep Kaur. A Review of Software Fault Prediction Techniques in Class Imbalance Scenarios [J]. Int J Performability Eng, 2025, 21(3): 123-130.

