Int J Performability Eng ›› 2008, Vol. 4 ›› Issue (1): 5-18. doi: 10.23940/ijpe.08.1.p5.mag

• Original articles •

Software Quality Modeling and Estimation with Missing Data

NAEEM SELIYA¹, TAGHI M. KHOSHGOFTAAR²

  1. Computer and Information Science, University of Michigan – Dearborn, 4901 Evergreen Road, Dearborn, MI 48128, USA
  2. Computer Science and Engineering, Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431, USA

January 2008

Abstract:

Software quality estimation models generally rest on the software engineering measurement hypothesis that software metrics encapsulate the underlying quality of a software system. A typical model is trained using software measurements and fault data from a similar, previously developed project. Such a strategy requires fault data for every training module. In practice, however, various software engineering issues mean that fault data are available for only some of the modules in the training data. We present a semi-supervised learning scheme for software defect modeling when prior knowledge of software quality is limited. The scheme uses the EM (expectation-maximization) algorithm, commonly used for estimating missing data values, in conjunction with k-means clustering. An empirical investigation using software measurement and defect data from real-world projects demonstrates the effectiveness and viability of the proposed method. The estimation accuracy of the defect prediction model after semi-supervised learning is generally better than that of a defect prediction model trained only on the available program modules with a known number of faults.
Received on March 15, 2007
References: 31
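
The abstract does not say how the EM algorithm and k-means clustering are combined, so the following is only an illustrative sketch of the general idea, not the authors' procedure. It assumes an EM-style alternation (not full EM): cluster the modules on their metrics plus the current fault estimates, then re-estimate each unknown fault count as the mean of the known counts in the same cluster. The function names `kmeans` and `impute_faults`, the parameters `k` and `rounds`, and the toy data are all hypothetical.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def impute_faults(metrics, faults, k=2, rounds=10):
    """Estimate missing fault counts; faults[i] is None when unknown."""
    known = [f for f in faults if f is not None]
    start = sum(known) / len(known)  # initial guess: mean of the known counts
    est = [f if f is not None else start for f in faults]
    for _ in range(rounds):
        # Re-fit clusters on metrics plus the current fault estimates.
        # (Real data would normally be standardized first; omitted here.)
        points = [list(m) + [e] for m, e in zip(metrics, est)]
        assign = kmeans(points, k)
        # Re-estimate each unknown count from labeled modules in its cluster.
        for i, f in enumerate(faults):
            if f is None:
                peers = [faults[j] for j in range(len(faults))
                         if assign[j] == assign[i] and faults[j] is not None]
                if peers:
                    est[i] = sum(peers) / len(peers)
    return est

# Toy data (hypothetical): two software metrics per module, two unknown labels.
metrics = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [8.0, 9.0], [8.5, 9.5], [9.0, 8.8]]
faults = [0, 1, None, 10, None, 12]
print(impute_faults(metrics, faults))
```

In this toy run each unlabeled module takes its estimate from the labeled modules it clusters with, while the known fault counts are left unchanged. A defect prediction model could then be trained on the completed dataset.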