Int J Performability Eng ›› 2019, Vol. 15 ›› Issue (4): 1141-1150. doi: 10.23940/ijpe.19.04.p9.11411150

Four-Layer Feature Selection Method for Scientific Literature based on Optimized K-Medoids and Apriori Algorithms

Hongchan Li* and Ni Yao   

  1. School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou, 450002, China
  • Contact: E-mail address: zhuhaodong80@163.com
  • About author: Hongchan Li received her B.S. degree from Heilongjiang Bayi Agricultural University in 2007 and her M.S. degree from Sichuan University of Science and Engineering in 2010. She is currently a lecturer in the School of Computer and Communication Engineering at Zhengzhou University of Light Industry. Her major research interests include cloud computing, intelligent information processing, computational intelligence, and data mining. Ni Yao received her M.S. degree from Wuhan University in 2012. She is currently a research assistant in the School of Computer and Communication Engineering at Zhengzhou University of Light Industry. Her major research interests include cloud computing, intelligent information processing, computational intelligence, and data mining.

Abstract: As the volume of scientific literature grows, classifying it has become an important task. Selecting representative features is a key step in scientific literature classification and directly influences classification performance. Based on the structural characteristics of scientific literature, we propose a four-layer feature selection method that combines an optimized K-medoids algorithm with the Apriori algorithm. The optimized K-medoids algorithm first uses information entropy to weight the clustering objects and thereby correct the distance function, and then uses the corrected distance function to select the optimal initial clustering centres. The proposed method first divides a scientific document into four layers according to its structural characteristics and selects features layer by layer from the first three layers with the optimized K-medoids algorithm. It then mines the maximal frequent itemsets from the fourth layer with the Apriori algorithm and uses them as the features of that layer. Finally, it merges the features selected from every layer and eliminates duplicates to obtain the final feature set. Experimental results show that the proposed four-layer feature selection method achieves higher performance in scientific literature classification.

Key words: feature selection, K-medoids algorithm, information entropy, Apriori algorithm
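
The building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the entropy weighting below uses the common entropy weight method as an assumption, the weighted Euclidean distance is one plausible form of a "corrected" distance function, and the selection of initial clustering centres is not shown. All function names and the absolute-count `min_support` parameter are hypothetical.

```python
import math
from itertools import combinations


def entropy_weights(rows):
    """Entropy weight per column of a non-negative matrix (assumed scheme)."""
    n, m = len(rows), len(rows[0])
    diversities = []
    for j in range(m):
        total = sum(row[j] for row in rows)
        if total > 0 and n > 1:
            entropy = 0.0
            for row in rows:
                p = row[j] / total
                if p > 0:
                    entropy -= p * math.log(p)
            entropy /= math.log(n)  # normalize to [0, 1]
        else:
            entropy = 1.0  # a column with no mass carries no information
        diversities.append(1.0 - entropy)
    norm = sum(diversities)
    if norm == 0:
        return [1.0 / m] * m  # fall back to uniform weights
    return [d / norm for d in diversities]


def weighted_distance(a, b, weights):
    """Euclidean distance with per-feature weights (assumed corrected distance)."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def maximal_frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search; keeps itemsets with no frequent proper superset."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support}
    frequent = set(level)
    k = 2
    while level:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
        frequent |= level
        k += 1
    return [s for s in frequent if not any(s < other for other in frequent)]


def merge_features(*layers):
    """Merge per-layer feature lists in order, dropping duplicates."""
    seen, merged = set(), []
    for layer in layers:
        for feature in layer:
            if feature not in seen:
                seen.add(feature)
                merged.append(feature)
    return merged
```

In this sketch, features chosen from the first three layers by a K-medoids run that uses `weighted_distance` would be passed to `merge_features` together with the items of the itemsets returned by `maximal_frequent_itemsets` for the fourth layer.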