Int J Performability Eng ›› 2018, Vol. 14 ›› Issue (6): 1140-1148. doi: 10.23940/ijpe.18.06.p5.11401148

• Original articles •

A Novel Imbalanced Classification Method based on Decision Tree and Bagging

Hongjiao Guan (a), Yingtao Zhang (a), Hengda Cheng (b), and Xianglong Tang (a)

  (a) School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
  (b) School of Computer Science and Technology, Utah State University, Logan, 84322, USA

Abstract:

Imbalanced classification is a challenging problem in big data research and applications. Complex data distributions, such as small disjuncts and overlapping classes, make it difficult for traditional methods to recognize the minority class and thus lead to low sensitivity. The misclassification costs of the minority class are usually higher than those of the majority class. To deal with imbalanced datasets, typical algorithm-level methods either introduce cost information or simply rebalance the class distribution without considering the distribution of the minority class. In this paper, we propose an optimization embedded bagging (OEBag) approach that increases sensitivity by learning the complex distributions in the minority class more precisely. While learning its base classifiers, OEBag selectively learns the easily misclassified minority examples by referring to the out-of-bag examples. OEBag is implemented using two specialized under-sampling bagging methods. Nineteen real datasets with diverse levels of classification difficulty are used in the experiments. Experimental results demonstrate that OEBag achieves significantly better sensitivity and strong overall performance in terms of AUC (area under the ROC curve) and G-mean when compared with several state-of-the-art methods.
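
As a rough illustration of two ideas the abstract mentions, the sketch below computes sensitivity and G-mean from predictions and draws one class-balanced bag together with its out-of-bag indices. It is not the authors' OEBag procedure: the abstract does not specify the optimization step or the two under-sampling bagging variants, so the function names (`sensitivity_specificity`, `g_mean`, `undersampled_bag`) and the sampling scheme (bootstrap the minority class, draw an equal number of majority examples without replacement) are assumptions chosen for illustration. G-mean is taken here in its common form, the square root of sensitivity times specificity.

```python
import math
import random


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity), treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    sens, spec = sensitivity_specificity(y_true, y_pred, positive)
    return math.sqrt(sens * spec)


def undersampled_bag(labels, minority=1, rng=None):
    """Draw one balanced bag and return (in_bag, out_of_bag) index lists.

    Assumed scheme (hypothetical, not taken from the paper): bootstrap the
    minority class with replacement, then sample the same number of
    majority examples without replacement.
    """
    rng = rng or random.Random(0)
    min_idx = [i for i, y in enumerate(labels) if y == minority]
    maj_idx = [i for i, y in enumerate(labels) if y != minority]
    bag_min = [rng.choice(min_idx) for _ in min_idx]
    bag_maj = rng.sample(maj_idx, min(len(min_idx), len(maj_idx)))
    in_bag = bag_min + bag_maj
    # Examples never drawn into the bag can serve as a held-out check set.
    out_of_bag = sorted(set(range(len(labels))) - set(in_bag))
    return in_bag, out_of_bag
```

In a setup like this, minority examples that a bag's classifier gets wrong on the out-of-bag set would be natural candidates for the "easily misclassified" examples the abstract refers to, although how OEBag actually uses them is described only in the full paper.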


Submitted on March 6, 2018; Revised on April 16, 2018; Accepted on May 21, 2018
References: 22