Adaptive Ensemble Learning for Software Defect Prediction with Imbalanced Data

doi:10.23940/ijpe.26.03.p6.167177

Abstract

Software Fault Prediction (SFP) plays a crucial role in improving software reliability by enabling the early detection of defect-prone modules. However, persistent problems such as extreme class imbalance and unstable classifier performance across heterogeneous datasets limit the effectiveness of current methods. To address these problems, this paper proposes a stability-conscious meta-ensemble learning architecture that combines adaptive sampling with meta-level classifier fusion. Unlike traditional ensemble approaches, which rely on fixed resampling strategies and fixed combinations of models, the proposed architecture dynamically selects sampling techniques according to the properties of the data and learns the best combination of classifiers with a meta-learner. Extensive experiments on benchmark datasets from PROMISE, NASA, AEEEM, ReLink, and SoftLab show consistent improvements over baseline ensemble models in AUC, MCC, and G-mean. Moreover, cross-project fault prediction experiments show strong generalization with little performance degradation. Statistical analyses, including the Wilcoxon signed-rank test, the Cliff's delta effect size, and the Nemenyi post-hoc test, confirm the robustness of the proposed method. Overall, the framework offers a practical and generalizable way to address class imbalance and performance instability in real-world software fault prediction.
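To illustrate why imbalance-aware measures such as MCC and G-mean are reported rather than plain accuracy, and how the Cliff's delta effect size is computed, the following minimal Python sketch evaluates a trivial classifier on a hypothetical project. The function names and all counts and scores are assumptions made up for illustration; they are not taken from the paper's implementation or results.

```python
import math

def accuracy(tp, fp, tn, fn):
    """Fraction of modules classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0 when a marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def g_mean(tp, fp, tn, fn):
    """Geometric mean of recall (defective class) and specificity (clean class)."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(recall * specificity)

def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x in xs, y in ys)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))

# Hypothetical project: 100 modules, 10 of them defective.
# A classifier that labels every module "clean" finds no defects at all.
all_clean = dict(tp=0, fp=0, tn=90, fn=10)
print("accuracy:", accuracy(**all_clean))  # looks good despite zero defects found
print("MCC:     ", mcc(**all_clean))
print("G-mean:  ", g_mean(**all_clean))

# Hypothetical per-dataset scores of two methods, compared by effect size.
method_a = [0.81, 0.78, 0.84]
method_b = [0.74, 0.79, 0.70]
print("Cliff's delta:", cliffs_delta(method_a, method_b))
```

On this made-up example the all-clean classifier reaches high accuracy while MCC and G-mean both collapse to zero, which is the behavior that makes these measures informative under class imbalance.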

Key words: software fault prediction, meta-ensemble learning, class imbalance, adaptive sampling, cross-project prediction

Ashu Mehta. Adaptive Ensemble Learning for Software Defect Prediction with Imbalanced Data [J]. Int J Performability Eng, 2026, 22(3): 167-177.
