Int J Performability Eng, 2017, Vol. 13, Issue (6): 945-955. doi: 10.23940/ijpe.17.06.p15.945955

• Original articles •

A Novel Ensemble Classification for Data Streams with Class Imbalance and Concept Drift

Yange Sun(a,b), Zhihai Wang(a,*), Hongtao Li(a), and Yao Li(a)

  (a) School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
  (b) School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China

Abstract: The processing of streaming data imposes new requirements: restricted processing time, a limited amount of memory, and a single scan of incoming instances. One of the biggest challenges in data stream learning is dealing with concept drift, i.e., the underlying distribution of the data may evolve over time. Most approaches in the literature assume that the class distribution is balanced. Unfortunately, class imbalance is common in the real world, and it further increases the difficulty of handling concept drift. Motivated by this challenge, a novel ensemble classification method for mining imbalanced streaming data is proposed to address both issues simultaneously. The algorithm uses under-sampling and over-sampling techniques to balance the positive and negative instances. Moreover, a dynamic weighting strategy is adopted to deal with concept drift. Experimental results on synthetic and real datasets demonstrate that the proposed method performs better than competing algorithms, especially when both concept drift and class imbalance are present.
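
The two ingredients named in the abstract, resampling to balance the classes and dynamic weighting of ensemble members, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give its details, so the chunk-based update, the nearest-centroid base learner, the balanced-accuracy weight, the fixed weight of 1.0 for the newest member, and all names (`rebalance`, `CentroidClassifier`, `WeightedChunkEnsemble`) are assumptions chosen only to keep the example self-contained.

```python
import random
from collections import Counter


def rebalance(chunk, rng):
    """Under-sample large classes and over-sample small ones (assumed scheme).

    Every class is brought to the same target size: the chunk length divided
    by the number of classes present in the chunk.
    """
    by_label = {}
    for x, y in chunk:
        by_label.setdefault(y, []).append((x, y))
    target = len(chunk) // len(by_label)
    balanced = []
    for items in by_label.values():
        if len(items) >= target:
            balanced += rng.sample(items, target)  # random under-sampling
        else:
            extra = rng.choices(items, k=target - len(items))
            balanced += items + extra  # random over-sampling with replacement
    return balanced


class CentroidClassifier:
    """Toy base learner: predicts the class whose feature mean is closest."""

    def fit(self, data):
        sums, counts = {}, Counter()
        for x, y in data:
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[y] += 1
        self.centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda y: sq_dist(self.centroids[y]))


def balanced_accuracy(clf, chunk):
    """Mean per-class recall, so the majority class cannot dominate the score."""
    hits, totals = Counter(), Counter()
    for x, y in chunk:
        totals[y] += 1
        hits[y] += clf.predict(x) == y
    return sum(hits[y] / totals[y] for y in totals) / len(totals)


class WeightedChunkEnsemble:
    """Hypothetical chunk-based ensemble with dynamically updated weights."""

    def __init__(self, max_members=5, seed=0):
        self.max_members = max_members
        self.rng = random.Random(seed)
        self.members = []  # each entry is [classifier, weight]

    def update(self, chunk):
        # Dynamic weighting: re-score every existing member on the newest chunk,
        # so members trained on an outdated concept lose influence.
        for member in self.members:
            member[1] = balanced_accuracy(member[0], chunk)
        # Train a new member on a rebalanced copy of the chunk.
        # Assumption: the newest member starts with the maximum weight 1.0.
        new_clf = CentroidClassifier().fit(rebalance(chunk, self.rng))
        self.members.append([new_clf, 1.0])
        # Keep only the highest-weighted members.
        self.members.sort(key=lambda m: m[1], reverse=True)
        del self.members[self.max_members:]

    def predict(self, x):
        votes = Counter()
        for clf, weight in self.members:
            votes[clf.predict(x)] += weight
        return votes.most_common(1)[0][0]
```

In this sketch, each incoming chunk is a list of `(features, label)` pairs passed to `update`, and `predict` returns the weighted majority vote. Scoring members with balanced accuracy rather than plain accuracy is a design choice made here because plain accuracy on an imbalanced chunk would reward a member that simply predicts the majority class.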

Submitted on July 25, 2017; Revised on August 30, 2017; Accepted on September 15, 2017 (This paper was presented at the Third International Symposium on System and Software Reliability.)
References: 26