A Two-Stage Feature Weighting Method for Naive Bayes and Its Application in Software Defect Prediction

Volume 14, Number 7, July 2018, pp. 1468-1480
DOI: 10.23940/ijpe.18.07.p10.14681480

Haijin Ji (a,b), Song Huang (a), Xuewei Lv (a,b), Yaning Wu (a), and Zhanwei Hui (a)

(a) Command & Control Engineering College, Army Engineering University of PLA, Nanjing, 210007, China
(b) School of Computer Science and Technology, Huaiyin Normal University, Huaian, 223300, China

(Submitted on April 5, 2018; Revised on May 23, 2018; Accepted on June 20, 2018)


Software defect prediction (SDP) models help software practitioners identify defect-prone modules, so that limited testing resources can be concentrated on those modules to minimize software defects. Among the various SDP models, Naive Bayes (NB) has been widely used because of its simplicity, effectiveness, and robustness. The NB classifier is an effective classification approach, especially for data sets with discrete attributes. NB assumes that the attributes are independent and treats them as equally important. In practice, however, the attributes of software defect data sets are usually continuous or numeric, and because they are designed for different purposes, they contribute differently to prediction. This paper therefore proposes a new NB method called TSWNB, which consists of two stages: feature (i.e., attribute) discretization and feature weighting. In the discretization stage, we compare two discretization methods, equal-width and equal-frequency discretization, and identify the more appropriate one. In the weighting stage, we use a feature weighting technique to relax the equal-importance assumption by incorporating the obtained feature weights into the NB formula and its likelihood estimates. To evaluate the proposed method, we carry out experiments on five software defect data sets from NASA MDP, provided by the PROMISE repository, and include three well-known classification algorithms and two feature weighting techniques for comparison. The experimental results demonstrate the effectiveness and practicality of the two-stage feature weighting method TSWNB.
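
The two stages named in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the TSWNB implementation: the function names are hypothetical, the feature weights are supplied by the caller because the abstract does not say how they are computed, and the weights enter only as exponents on the conditional probabilities, P(c) * prod_j P(x_j | c)^(w_j), which is one common weighted-NB form. The abstract also mentions weighting the likelihood estimates themselves, which this sketch leaves out.

```python
import math
from collections import Counter


def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    # min(..., k - 1) places the maximum value in the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal numbers of points.

    Bins are assigned by rank, so tied values may land in different bins.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = min(rank * k // n, k - 1)
    return bins


def train_nb(X, y):
    """Count class frequencies and (feature, value, class) co-occurrences."""
    class_counts = Counter(y)
    cond_counts = Counter()
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            cond_counts[(j, v, c)] += 1
    return class_counts, cond_counts


def predict_weighted_nb(model, row, weights, n_bins):
    """Return the class maximizing log P(c) + sum_j w_j * log P(x_j | c).

    Assumption: weights[j] is a caller-supplied importance for feature j;
    probabilities use Laplace (add-one) smoothing with n_bins values per feature.
    """
    class_counts, cond_counts = model
    n = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, nc in class_counts.items():
        score = math.log((nc + 1) / (n + len(class_counts)))
        for j, v in enumerate(row):
            p = (cond_counts[(j, v, c)] + 1) / (nc + n_bins)
            score += weights[j] * math.log(p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Toy data: one skewed metric, discretized both ways.
metric = [1, 2, 3, 4, 100]
width_bins = equal_width_bins(metric, 2)
freq_bins = equal_frequency_bins(metric, 2)

# Toy defect data: feature 0 separates the classes, feature 1 is noise.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]
model = train_nb(X, y)
label = predict_weighted_nb(model, [1, 0], weights=[1.0, 0.0], n_bins=2)
```

On the skewed toy metric, equal-width binning puts the four small values into one bin and isolates the outlier, while equal-frequency binning splits the points more evenly. This is the kind of difference that the comparison between the two methods has to settle. Setting a weight to 0 removes a feature from the decision, and setting every weight to 1 recovers standard NB.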



