Data Driven Software Quality Assessment: Correlation Analysis of Code Metrics and Fault-Proneness

doi:10.23940/ijpe.25.03.p4.149156

Abstract

Predicting software faults is essential for improving program quality and reducing maintenance costs. Early detection of fault-prone modules minimizes debugging effort, helps avoid software failures, and increases overall software reliability. This research analyzes code metrics from NASA's Metrics Data Program (MDP) datasets to identify trends and relationships between software complexity and defectiveness. Using statistical methods and exploratory data analysis, we examine how different code complexity indicators relate to software defects. We find that defect-prone modules are highly correlated with cyclomatic complexity, decision density, and the number of unique operands. By determining threshold values for these key indicators, we provide insight into software quality and identify places where code maintainability could be improved. The analysis underscores the value of empirical investigation, statistical validation, and systematic feature selection in defect prediction. Through a comparative analysis across several NASA datasets, we offer practical suggestions for lowering software complexity and increasing reliability, laying the groundwork for future defect-prevention efforts. These data-driven insights can help developers optimize code architecture and reduce defect risk. Our analysis also highlights the importance of understanding software complexity early in the development process, so that teams can proactively improve maintainability and code quality. Software engineers, quality assurance teams, and organizations seeking to build more reliable, fault-resistant software systems can use these findings as a guide. By systematically identifying defect-prone modules against predetermined thresholds, software teams can improve software lifecycle management, reduce post-release problems, and increase productivity. Future work on real-time monitoring and automated defect detection could further strengthen these efforts, improving the effectiveness and dependability of software development.
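The abstract does not spell out the exact statistical procedure. As a minimal sketch, assuming each module carries one numeric metric and a binary defect label, the correlation and threshold steps could look like the Python below. The point-biserial correlation (Pearson's r against a 0/1 label), Youden's J criterion for choosing a cut-off, the function names, and the toy values are all illustrative assumptions; they are not the authors' method and not NASA MDP data.

```python
"""Sketch: correlate one code metric with a binary defect label and pick a threshold.

Assumptions (not from the paper): the toy data below is invented for illustration,
and point-biserial correlation plus Youden's J are swapped-in techniques.
"""
from math import sqrt


def point_biserial(metric: list[float], defective: list[int]) -> float:
    """Pearson correlation between a numeric metric and a 0/1 defect label."""
    n = len(metric)
    mean_x = sum(metric) / n
    mean_y = sum(defective) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(metric, defective))
    var_x = sum((x - mean_x) ** 2 for x in metric)
    var_y = sum((y - mean_y) ** 2 for y in defective)
    return cov / sqrt(var_x * var_y)


def youden_threshold(metric: list[float], defective: list[int]) -> tuple[float, float]:
    """Return (t, J) maximizing TPR - FPR for the rule 'metric >= t means defective'."""
    positives = sum(defective)
    negatives = len(defective) - positives
    best_t, best_j = metric[0], -1.0
    for t in sorted(set(metric)):
        # Count modules flagged by the rule, split by their true label.
        tp = sum(1 for x, y in zip(metric, defective) if x >= t and y == 1)
        fp = sum(1 for x, y in zip(metric, defective) if x >= t and y == 0)
        j = tp / positives - fp / negatives
        if j > best_j:  # strict '>' keeps the lowest threshold among ties
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical toy modules: cyclomatic complexity and a defect flag (1 = defective).
cyclomatic = [1, 2, 3, 3, 4, 5, 8, 10, 12, 15]
defective = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

r = point_biserial(cyclomatic, defective)
t, j = youden_threshold(cyclomatic, defective)
print(f"point-biserial r = {r:.3f}")
print(f"flag modules with cyclomatic complexity >= {t} (Youden J = {j:.2f})")
```

On real MDP tables one would repeat this per metric (for example decision density or unique operands) and per dataset, and check any chosen cut-off on held-out modules. Youden's J is used here only because it balances detection rate against false alarms with no extra parameters; cost-weighted criteria are an equally plausible choice.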

Key words: software fault prediction, NASA MDP dataset, code metrics, software quality

Seema Kalonia and Amrita Upadhyay. Data Driven Software Quality Assessment: Correlation Analysis of Code Metrics and Fault-Proneness [J]. Int J Performability Eng, 2025, 21(3): 149-156.

