Int J Performability Eng, 2019, Vol. 15, Issue (4): 1131-1140. doi: 10.23940/ijpe.19.04.p8.11311140


Feature Selection Combined Feature Resolution with Attribute Reduction based on Correlation Matrix of Equivalence Classes

Zhifeng Zhang* and Junxia Ma   

  1. School of Software, Zhengzhou University of Light Industry, Zhengzhou, 450002, China
  • Contact: E-mail address: zhuhaodong80@163.com
  • About the authors: Zhifeng Zhang received his B.S. degree from Xi’an University of Electronic Science and Technology in 2001 and his M.S. degree from Xi’an University of Technology in 2006. He is currently an associate professor in the School of Software at Zhengzhou University of Light Industry. His major research interests include cloud computing, intelligent information processing, and data mining. Junxia Ma received her B.S. degree from Henan Normal University in 1996 and her M.S. degree from Zhengzhou University in 2007. She is currently a lecturer in the School of Software at Zhengzhou University of Light Industry. Her major research interests include knowledge engineering and data mining.

Abstract: Feature selection is one of the key steps in text classification, and it can affect classification performance. In this paper, we first propose an optimized document frequency based on both word frequency and document frequency, and then present a feature resolution method based on this optimized document frequency. We also introduce rough set theory into feature selection and provide an attribute reduction algorithm based on the correlation matrix of equivalence classes. Finally, we put forward a feature selection method that combines the presented feature resolution with the attribute reduction algorithm. The method first applies the feature resolution to select valuable text features and filter out useless terms, which reduces the sparsity of the text feature space, and then uses the attribute reduction algorithm to eliminate redundant features. Comparative experimental results show that the proposed method has certain advantages in time consumption, macro-average, micro-average, and average classification accuracy.
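The abstract describes a two-stage pipeline but gives no formulas, so the Python code below is only a minimal sketch under stated assumptions. Stage one scores each term with a hypothetical combination of document frequency and word frequency, df(t)·log(1 + tf(t)), and keeps the top k terms; the paper's actual optimized document frequency may be defined differently. Stage two does not reproduce the correlation-matrix algorithm: it substitutes the standard QuickReduct heuristic, which greedily adds the attribute that most increases the rough-set dependency degree computed from equivalence classes. All function names (`df_tf_scores`, `select_top_terms`, `dependency`, `quick_reduct`, `select_features`) and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict

def df_tf_scores(docs):
    """Hypothetical 'optimized DF' score: df(t) * log(1 + tf(t)).

    df(t) is the number of documents containing term t; tf(t) is the total
    number of occurrences of t in the corpus. This weighting is an assumed
    stand-in, not the formula from the paper.
    """
    df, tf = Counter(), Counter()
    for tokens in docs:
        tf.update(tokens)
        df.update(set(tokens))
    return {t: df[t] * math.log(1 + tf[t]) for t in df}

def select_top_terms(docs, k):
    """Stage 1 (feature resolution sketch): keep the k highest-scoring terms."""
    scores = df_tf_scores(docs)
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]

def dependency(rows, labels, attrs):
    """Rough-set dependency degree gamma_B(D) = |POS_B(D)| / |U|.

    Documents with identical values on the attributes in `attrs` form one
    equivalence class; a class lies in the positive region only if all of
    its documents share the same label.
    """
    labels_in = defaultdict(set)
    sizes = Counter()
    for row, y in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        labels_in[key].add(y)
        sizes[key] += 1
    pos = sum(sizes[key] for key, ys in labels_in.items() if len(ys) == 1)
    return pos / len(rows)

def quick_reduct(rows, labels, n_attrs):
    """Stage 2 sketch: greedy QuickReduct, a standard rough-set reduction used
    here in place of the paper's correlation-matrix algorithm."""
    target = dependency(rows, labels, range(n_attrs))
    reduct = []
    current = dependency(rows, labels, reduct)
    while current < target:
        # Add the attribute that raises the dependency degree the most.
        best = max((a for a in range(n_attrs) if a not in reduct),
                   key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
        current = dependency(rows, labels, reduct)
    return sorted(reduct)

def select_features(docs, labels, k):
    """Combined pipeline: frequency-based filtering, then rough-set reduction."""
    terms = select_top_terms(docs, k)
    # Binary term-presence table: one row per document, one column per kept term.
    rows = [[int(t in s) for t in terms] for s in map(set, docs)]
    return [terms[i] for i in quick_reduct(rows, labels, len(terms))]

docs = [
    ["ball", "goal", "team", "the"],
    ["goal", "team", "match", "the"],
    ["vote", "party", "the"],
    ["vote", "election", "party", "the"],
]
labels = ["sport", "sport", "politics", "politics"]
print(select_top_terms(docs, 3))         # ['the', 'goal', 'party']
print(select_features(docs, labels, 3))  # ['goal']
```

On this toy corpus, "the" scores highest in stage one because it occurs in every document, yet the reduction stage drops it because it does not separate the two classes; "party" is dropped as redundant because "goal" alone already determines every document's class. This mirrors the division of labor the abstract describes: the frequency stage removes useless terms, and the rough-set stage removes redundant ones.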

Key words: feature selection, text classification, rough set, attribute reduction, equivalence class