Int J Performability Eng ›› 2017, Vol. 13 ›› Issue (7): 1159-1164. doi: 10.23940/ijpe.17.07.p19.11591164

• Original articles •

Text Feature Selection based on Feature Dispersion Degree and Feature Concentration Degree

Zhifeng Zhang^a, Yuhua Li^a, and Haodong Zhu^b,*

  ^a School of Software, Zhengzhou University of Light Industry, Zhengzhou, Henan, 450002, P. R. China
  ^b School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou, Henan, 450002, P. R. China

Abstract: Text feature selection is a key step in text classification and therefore affects classification performance. This paper first proposes the feature dispersion degree of between-class documents, which measures how a feature is dispersed across categories (the greater its value, the greater the influence of the feature). It then proposes the feature concentration degree of within-class documents, which measures how concentrated a feature is in the documents of a single category (again, the greater its value, the greater the influence of the feature). A text feature selection method is then presented that combines both degrees to measure the importance of each feature. Comparative experiments show that the proposed method often yields more representative feature subsets and improves text classification performance.
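
To make the idea of combining a between-class and a within-class measure concrete, here is a minimal Python sketch. The abstract does not give the paper's formulas, so everything in it is an assumption made for illustration: dispersion is taken as the variance of a term's per-class document rate, concentration as the term's highest document rate in any single class, and the two are combined by a simple product. The function names `score_features` and `select_features` are hypothetical, and this should not be read as the authors' method.

```python
from collections import defaultdict


def score_features(docs, labels):
    """Score each term with an illustrative dispersion * concentration measure.

    docs: list of token lists; labels: class label for each document.
    The two measures below are assumptions, not the paper's definitions.
    """
    class_sizes = defaultdict(int)
    doc_freq = defaultdict(lambda: defaultdict(int))  # term -> class -> doc count
    for tokens, label in zip(docs, labels):
        class_sizes[label] += 1
        for term in set(tokens):  # count each term once per document
            doc_freq[term][label] += 1

    classes = sorted(class_sizes)
    scores = {}
    for term, per_class in doc_freq.items():
        # Fraction of each class's documents that contain the term.
        rates = [per_class.get(c, 0) / class_sizes[c] for c in classes]
        mean_rate = sum(rates) / len(rates)
        # Assumed between-class dispersion: variance of the rates across classes.
        dispersion = sum((r - mean_rate) ** 2 for r in rates) / len(rates)
        # Assumed within-class concentration: highest rate in any single class.
        concentration = max(rates)
        scores[term] = dispersion * concentration
    return scores


def select_features(docs, labels, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    scores = score_features(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    docs = [
        ["goal", "match", "the"],
        ["goal", "team", "the"],
        ["stock", "market", "the"],
        ["stock", "price", "the"],
    ]
    labels = ["sport", "sport", "finance", "finance"]
    print(select_features(docs, labels, 2))  # prints ['goal', 'stock']
```

In this toy corpus, "the" appears in every document of every class, so its assumed dispersion is zero and it is never selected, while "goal" and "stock" each occur in all documents of exactly one class and score highest.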


Submitted on July 17, 2017; First Revised on October 7, 2017; Second Revised on October 15, 2017; Accepted on October 17, 2017
References: 11