Feature Selection Combined Feature Resolution with Attribute Reduction based on Correlation Matrix of Equivalence Classes

doi:10.23940/ijpe.19.04.p8.11311140

Abstract

Feature selection is a key step in text classification and directly affects classification performance. In this paper, we first propose an optimized document frequency based on word frequency and document frequency, and then present a feature resolution measure based on this optimized document frequency. We also introduce rough sets into feature selection and provide an attribute reduction algorithm based on the correlation matrix of equivalence classes. Finally, we put forward a feature selection method that combines the presented feature resolution with the provided attribute reduction algorithm. The proposed method first uses feature resolution to select valuable text features and filter out useless terms, which reduces the sparsity of the text feature space, and then uses the attribute reduction algorithm to eliminate redundant features. Comparative experimental results show that the proposed method offers advantages in time consumption, macro-average, micro-average, and average classification accuracy.

Key words: feature selection, text classification, rough set, attribute reduction, equivalence class
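
The two-stage idea described in the abstract (filter terms first, then remove redundant ones with rough-set attribute reduction) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give the feature resolution formula or the correlation-matrix procedure, so the sketch assumes a plain document-frequency filter for stage one and a standard greedy positive-region reduction over equivalence classes for stage two. All function names and the thresholds `min_df` and `max_df_ratio` are hypothetical.

```python
from collections import defaultdict


def document_frequency_filter(docs, min_df=2, max_df_ratio=0.9):
    """Stage 1 (assumed stand-in for feature resolution): keep terms whose
    document frequency lies between min_df and max_df_ratio * len(docs)."""
    df = defaultdict(int)
    for tokens in docs:
        for term in set(tokens):
            df[term] += 1
    limit = max_df_ratio * len(docs)
    return sorted(t for t, n in df.items() if min_df <= n <= limit)


def equivalence_classes(rows, attrs):
    """Partition row indices into classes of rows that agree on every attribute."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def positive_region_size(rows, labels, attrs):
    """Count rows that fall in equivalence classes carrying a single label."""
    return sum(
        len(c)
        for c in equivalence_classes(rows, attrs)
        if len({labels[i] for i in c}) == 1
    )


def reduce_attributes(rows, labels, attrs):
    """Stage 2 (assumed): greedy backward elimination. An attribute is dropped
    when removing it leaves the positive region unchanged, i.e. it is redundant."""
    kept = list(attrs)
    target = positive_region_size(rows, labels, kept)
    for a in list(attrs):
        trial = [x for x in kept if x != a]
        if positive_region_size(rows, labels, trial) == target:
            kept = trial
    return kept


def select_features(docs, labels, min_df=2, max_df_ratio=0.9):
    """Run both stages on tokenized documents and return the surviving terms."""
    terms = document_frequency_filter(docs, min_df, max_df_ratio)
    rows = []
    for tokens in docs:
        present = set(tokens)
        rows.append({t: int(t in present) for t in terms})
    return reduce_attributes(rows, labels, terms)


if __name__ == "__main__":
    docs = [
        ["goal", "match", "team", "rare"],
        ["goal", "match", "score"],
        ["vote", "election", "team"],
        ["vote", "election", "score"],
    ]
    labels = ["sport", "sport", "politics", "politics"]
    print(select_features(docs, labels))
```

In this toy run the term "rare" is removed by the frequency filter, and the reduction step keeps only as many terms as are needed to separate the two classes. The result of a greedy reduction depends on the order in which attributes are tried, so a different ordering can yield a different, equally valid reduct.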

Zhifeng Zhang and Junxia Ma. Feature Selection Combined Feature Resolution with Attribute Reduction based on Correlation Matrix of Equivalence Classes [J]. Int J Performability Eng, 2019, 15(4): 1131-1140.
