Feature Selection Combined Feature Resolution with Attribute Reduction based on Correlation Matrix of Equivalence Classes

Volume 15, Number 4, April 2019, pp. 1131-1140
DOI: 10.23940/ijpe.19.04.p8.11311140

Zhifeng Zhang and Junxia Ma

School of Software, Zhengzhou University of Light Industry, Zhengzhou, 450002, China


(Submitted on November 8, 2018; Revised on December 10, 2018; Accepted on January 12, 2019)

Abstract:

Feature selection is one of the key steps in text classification, and it directly affects classification performance. In this paper, we first propose a word frequency and a document frequency that are optimized on the basis of document frequency, and then present a feature resolution based on the optimized document frequency. We also introduce rough set theory into feature selection and provide an attribute reduction algorithm based on the correlation matrix of equivalence classes. Finally, we put forward a feature selection method that combines the presented feature resolution with the provided attribute reduction algorithm. The proposed method first employs the feature resolution to select valuable text features and filter out useless terms, which reduces the sparsity of the text feature space, and then uses the attribute reduction algorithm to eliminate redundant features. Comparative experimental results show that the proposed feature selection method has advantages in time consumed, macro-average, micro-average, and average classification accuracy.
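The two-stage idea described in the abstract (filter terms first, then remove redundant ones) can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract gives no formulas, so stage 1 uses a plain document-frequency threshold as a stand-in for the feature resolution, and stage 2 uses a classical rough-set style backward elimination that preserves the consistency of the decision table as a stand-in for the correlation-matrix algorithm. All function names, the `min_df` parameter, and the toy corpus are assumptions made for illustration.

```python
from collections import defaultdict


def document_frequency_filter(docs, min_df=2):
    """Stage 1 stand-in (assumption): keep terms that occur in at least
    min_df documents. The paper's optimized document frequency and
    feature resolution are not reproduced here."""
    df = defaultdict(int)
    for tokens in docs:
        for term in set(tokens):  # count each term once per document
            df[term] += 1
    return sorted(t for t, n in df.items() if n >= min_df)


def is_consistent(rows, labels, attrs):
    """True if all objects in the same equivalence class (equal values on
    every attribute in attrs) carry the same class label."""
    seen = {}
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, label) != label:
            return False
    return True


def reduce_attributes(rows, labels, attrs):
    """Stage 2 stand-in (assumption): greedy backward elimination that
    drops an attribute whenever the table stays consistent without it."""
    kept = list(attrs)
    if not is_consistent(rows, labels, kept):
        return kept  # inconsistent table: left unchanged in this sketch
    for a in list(kept):
        trial = [x for x in kept if x != a]
        if is_consistent(rows, labels, trial):
            kept = trial
    return kept


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["goal", "match", "team", "the"],
    ["goal", "match", "score", "the"],
    ["stock", "market", "price", "the"],
    ["stock", "market", "trade", "the"],
]
labels = ["sport", "sport", "finance", "finance"]

terms = document_frequency_filter(docs, min_df=2)
rows = [{t: int(t in doc) for t in terms} for doc in docs]
selected = reduce_attributes(rows, labels, terms)
print(terms)
print(selected)
```

In this toy example the first stage discards terms that appear in only one document, and the second stage removes terms that add no discriminating power once the others are known, including a term such as "the" that occurs in every document. The result of a greedy reduction depends on the order in which attributes are tried, so a different ordering may keep a different but equally sufficient subset.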


     

     
