Four-Layer Feature Selection Method for Scientific Literature based on Optimized K-Medoids and Apriori Algorithms

Volume 15, Number 4, April 2019, pp. 1141-1150
DOI: 10.23940/ijpe.19.04.p9.11411150

 Hongchan Li and Ni Yao

School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou, 450002, China

(Submitted on November 11, 2018; Revised on December 14, 2018; Accepted on January 13, 2019)


As the volume of scientific literature grows, classifying it has become an important task. Selecting representative features effectively is a key step in scientific literature classification and directly influences classification performance. Based on the structural characteristics of scientific literature, we propose a four-layer feature selection method that combines an optimized K-medoids algorithm with the Apriori algorithm. The optimized K-medoids algorithm first uses information entropy to weight the clustering objects and thereby correct the distance function, and then uses the corrected distance function to select the optimal initial clustering centres. The proposed method first divides scientific literature into four layers according to its structural characteristics and selects features layer by layer from the first three layers using the optimized K-medoids algorithm. It then mines the maximal frequent itemsets from the fourth layer with the Apriori algorithm to serve as the features of that layer. Finally, it merges the features selected from every layer and eliminates duplicates to obtain the final feature set. Experimental results show that the proposed four-layer feature selection method achieves higher performance in scientific literature classification.
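
The building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the entropy weighting below is the common "entropy weight" scheme applied per feature column, the distance is a weighted Euclidean distance, and all function names (entropy_weights, weighted_distance, maximal_frequent_itemsets, merge_features) are hypothetical. The selection of initial clustering centres and the K-medoids iteration itself are omitted.

```python
from itertools import combinations
from math import log, sqrt


def entropy_weights(rows):
    """Assumed weighting scheme: columns whose (non-negative) values are
    spread less uniformly across objects have lower entropy and get more weight."""
    n, m = len(rows), len(rows[0])
    diversities = []
    for j in range(m):
        column = [row[j] for row in rows]
        total = sum(column)
        if total <= 0 or n < 2:
            diversities.append(0.0)  # column carries no usable information
            continue
        probs = [v / total for v in column if v > 0]
        entropy = -sum(p * log(p) for p in probs) / log(n)  # normalized to [0, 1]
        diversities.append(max(0.0, 1.0 - entropy))
    scale = sum(diversities)
    if scale == 0:
        return [1.0 / m] * m  # fall back to uniform weights
    return [d / scale for d in diversities]


def weighted_distance(a, b, weights):
    """Euclidean distance with one entropy weight per coordinate."""
    return sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def maximal_frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search; min_support is an absolute count.
    Returns frequent itemsets that have no frequent proper superset."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = []
    k = 1
    while level:
        frequent.extend(level)
        k += 1
        previous = set(level)
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in previous for s in combinations(c, k - 1))
                 and support(c) >= min_support]
    return [s for s in frequent if not any(s < other for other in frequent)]


def merge_features(*layers):
    """Concatenate per-layer feature lists, dropping duplicates, keeping order."""
    seen, merged = set(), []
    for layer in layers:
        for feature in layer:
            if feature not in seen:
                seen.add(feature)
                merged.append(feature)
    return merged
```

In this sketch, the features of the fourth layer would be the terms contained in the itemsets returned by maximal_frequent_itemsets, which are then passed to merge_features together with the feature lists of the first three layers.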


