Int J Performability Eng, 2017, Vol. 13, Issue (5): 742-753. doi: 10.23940/ijpe.17.05.p17.742753

• Original articles •

A Novel Information Theory-Based Ensemble Feature Selection Framework for High-Dimensional Microarray Data

Jie Cai (a), Jiawei Luo (a, *), Cheng Liang (b), and Sheng Yang (a)

(a) College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China
(b) School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, Shandong, China

Abstract: Ensemble feature selection is an ensemble learning approach in which each base classifier is trained on the output of a feature selection step. It is an effective way to handle high-dimensional, small-sample data such as microarray data. However, ensemble feature selection still needs to deliver more accurate and more stable classification performance. In this paper, we present a novel information-theoretic diversity measure called Sum of Minimal Information Distance (SMID), which maximizes both the relevance between feature subsets and the class label and the diversity among feature subsets. We also propose a novel ensemble feature selection framework that satisfies this criterion. In this framework, features that share more mutual information with the class label and are more diverse with respect to one another are retained. Different feature subsets are obtained by an incremental search method and used to train base classifiers, which are then aggregated into a consensus classifier by majority voting. Compared with three representative feature selection methods and five ensemble learning methods on ten microarray datasets, the proposed method achieves better classification accuracy in the experiments.
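
Two building blocks named in the abstract, mutual information between a feature and the class label and majority voting over base classifiers, can be illustrated with a minimal sketch. It assumes discrete (already binned) feature values. The function names `mutual_information` and `majority_vote` are hypothetical and do not come from the paper. The SMID criterion and the incremental search are not defined on this page, so they are not reproduced here.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two discrete sequences.

    Assumption: xs and ys have equal length and hold hashable, discrete values.
    """
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # Sum of p(x,y) * log2( p(x,y) / (p(x) * p(y)) ) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def majority_vote(predictions):
    """Combine per-classifier prediction lists into one label per sample.

    predictions: one list of labels per base classifier, all the same length.
    Ties go to the label seen first among the votes for that sample.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Toy usage: one feature that copies the label, one that is independent of it.
labels = [0, 0, 1, 1]
informative = [0, 0, 1, 1]
uninformative = [0, 1, 0, 1]
relevance = {
    "informative": mutual_information(informative, labels),
    "uninformative": mutual_information(uninformative, labels),
}

# Three hypothetical base classifiers voting on three samples.
consensus = majority_vote([[1, 0, 1], [1, 1, 0], [0, 1, 1]])
```

In a full pipeline, each base classifier would be trained on its own feature subset before its predictions are passed to the vote. How those subsets are chosen is the part the paper's SMID criterion governs.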

Submitted on March 8, 2017; Revised on July 1, 2017; Accepted on August 27, 2017
References: 30