Int J Performability Eng ›› 2018, Vol. 14 ›› Issue (9): 2105-2115.doi: 10.23940/ijpe.18.09.p19.21052115


Using Cross-Entropy Value of Code for Better Defect Prediction

Xian Zhang*, Kerong Ben, and Jie Zeng   

  1. Department of Computer and Data Engineering, Naval University of Engineering, Wuhan, 430033, China
  • Contact: * E-mail address: tomtomzx@foxmail.com

Abstract: Defect prediction is valuable because it can guide software inspection by predicting defective code locations, thereby improving software reliability. Many software features have been designed to help defect prediction models identify potential bugs, but no single feature set yet performs well in most cases. To improve defect prediction, this paper proposes a new code feature, the cross-entropy value of the sequence of a code fragment’s abstract syntax tree nodes (CE-AST), and develops a neural language model to measure it. To evaluate the effectiveness of CE-AST, we first investigate its power to discriminate defect-proneness. Experiments on 12 Java projects show that CE-AST is more discriminative than 45% of twenty widely used traditional features. We then investigate CE-AST’s contribution to defect prediction. When combined with different traditional feature suites as input to prediction models, CE-AST brings average performance improvements of 4.7% in Precision, 2.5% in Recall, and 3.5% in F1.
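To illustrate the idea behind CE-AST, the sketch below computes the cross-entropy of an AST node-type sequence under a simple language model. This is a minimal, hypothetical example: the paper trains a neural language model, whereas here a bigram model with add-one smoothing stands in, and the AST node names are invented for illustration. The intuition is the same: sequences that look like the training corpus get low cross-entropy; unusual ("unnatural") sequences get high cross-entropy, which the paper links to defect-proneness.

```python
import math
from collections import Counter

def train_bigram(sequences):
    """Count unigrams and bigrams over token sequences (AST node types)."""
    unigrams, bigrams = Counter(), Counter()
    for seq in sequences:
        toks = ["<s>"] + list(seq)          # sentence-start marker
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams, len(unigrams)  # vocab size for smoothing

def cross_entropy(seq, unigrams, bigrams, vsize):
    """Average negative log2 probability per token (bits/token),
    using add-one (Laplace) smoothed bigram probabilities."""
    toks = ["<s>"] + list(seq)
    bits = 0.0
    for prev, cur in zip(toks, toks[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vsize)
        bits += -math.log2(p)
    return bits / (len(toks) - 1)

# Hypothetical corpus of AST node-type sequences from "clean" code.
corpus = [["If", "Name", "Return"],
          ["If", "Name", "Return"],
          ["If", "Call", "Return"]]
uni, bi, v = train_bigram(corpus)

familiar   = cross_entropy(["If", "Name", "Return"], uni, bi, v)
unfamiliar = cross_entropy(["Return", "Call", "If"], uni, bi, v)
# A sequence resembling the corpus scores lower cross-entropy
# than an out-of-pattern one.
```

In the paper's setting, the cross-entropy score produced this way becomes one additional numeric feature per code unit, concatenated with traditional metrics before training the defect predictor.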

Key words: software reliability, defect prediction, natural language processing, language model, code naturalness, cross-entropy