Int J Performability Eng ›› 2019, Vol. 15 ›› Issue (1): 307-316. doi: 10.23940/ijpe.19.01.p31.307316


Dimensionality Reduction by Feature Co-Occurrence based Rough Set

Lei La(a,*), Qimin Cao(b), and Ning Xu(b)

  a School of Information Technology & Management, University of International Business and Economics, Beijing, 100029, China
  b Library, China University of Political Science and Law, Beijing, 100088, China
  • Contact: La Lei, E-mail: lalei1984@aliyun.com
  • About the authors: Lei La is an associate professor at the School of Information Technology & Management, University of International Business and Economics. His research interests include semi-supervised machine learning and network data analysis. Qimin Cao is an associate research fellow at the Library, China University of Political Science and Law. Her research interests include system engineering and information analysis. Ning Xu is an undergraduate student at the School of Information Technology & Management, University of International Business and Economics. His research interests include machine learning algorithm development and coding.

Abstract:

Feature selection is a key issue in fields related to unstructured data mining. This paper presents a dimensionality reduction method that uses rough sets as the feature selection tool. Unlike previous rough set based classification algorithms, it takes feature co-occurrence into account when performing attribute reduction, in order to obtain a more accurate feature subset. The novel method, called the Feature Co-occurrence Quick Reduction algorithm, is presented in this article. Experimental results show that it is highly efficient in dimensionality reduction: its time consumption is approximately 23% lower than that of traditional rough set based dimensionality reduction methods. Moreover, classification based on the feature set selected by the Feature Co-occurrence Quick Reduction algorithm is more precise. The proposed algorithm helps refine knowledge from massive unstructured data.

Key words: dimensionality reduction, feature co-occurrence, rough set, time consumption, unstructured data
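
For background, rough set attribute reduction of the kind the abstract builds on can be illustrated with the classical greedy QuickReduct procedure, which adds attributes one at a time until the selected subset has the same dependency degree as the full attribute set. The sketch below is only that classical baseline on a toy decision table. It is not the Feature Co-occurrence Quick Reduction algorithm, whose co-occurrence step is not described in the abstract. The function names (`partition`, `dependency`, `quick_reduct`) and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Dependency degree: share of rows whose equivalence class has one label."""
    positive = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            positive += len(block)  # the block lies in the positive region
    return positive / len(rows)


def quick_reduct(rows, labels):
    """Classical greedy QuickReduct (baseline only, no co-occurrence term)."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    best = dependency(rows, labels, reduct)
    while best < target:
        candidates = [a for a in all_attrs if a not in reduct]
        scores = {a: dependency(rows, labels, reduct + [a]) for a in candidates}
        chosen = max(candidates, key=lambda a: scores[a])
        reduct.append(chosen)
        best = scores[chosen]
    return reduct


# Hypothetical toy decision table: three binary features per document.
rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
labels = [0, 0, 1, 1]  # the class depends only on feature 0
selected = quick_reduct(rows, labels)
print(selected)
```

On this toy table the first feature alone reaches the dependency degree of the full feature set, so the greedy search stops after one step. A co-occurrence-aware variant would presumably change how candidate features are scored or grouped, but that detail would have to come from the full paper.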