Feature Dimension Reduction Optimization Algorithm for Massive Micro-Blog Data based on Hadoop

Volume 15, Number 6, June 2019, pp. 1518-1527
DOI: 10.23940/ijpe.19.06.p3.15181527

Haodong Zhu, Wenqi Li, and Hongchan Li

School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou, 450002, China

(Submitted on March 10, 2019; Revised on April 5, 2019; Accepted on June 7, 2019)


In big data environments, the continuous growth of text data causes a "dimension disaster" that poses great challenges for micro-blog sentiment analysis. To address this problem, this paper proposes an algorithm that fuses the advantages of three traditional feature dimensionality reduction methods: document frequency (DF), mutual information (MI), and the chi-square test (CHI). First, a document frequency factor is added to the MI algorithm to correct its bias toward low-frequency words. Then, a standard score factor is added to the CHI algorithm to address its negative correlation problem. Finally, the scores are averaged so that the advantages of the three algorithms are fused, yielding an improved DF-MI-CHI fusion algorithm. Simulation results show that after micro-blog data is processed with this algorithm, the accuracy of sentiment analysis improves and is maintained at 95%, the recall rate exceeds 90%, and the F value stays between 92% and 94%. These values are higher than those of other improved algorithms and tend to be stable, which indicates that the algorithm can effectively improve the accuracy and efficiency of micro-blog sentiment analysis when dealing with massive micro-blog text data.
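The fusion idea described above can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything beyond the classical DF, MI, and CHI definitions is an assumption: the "document frequency factor" is approximated by multiplying MI by DF, the "standard score factor" is replaced by a simpler stand-in that zeroes CHI for negatively correlated terms, and the scores are min-max normalized before averaging. The names `contingency`, `min_max`, and `rank_terms` are hypothetical, and the sketch runs on a single machine rather than on Hadoop.

```python
import math


def contingency(docs, labels, term, cls):
    """Count documents by term presence and class membership.

    a: in class, has term      b: not in class, has term
    c: in class, lacks term    d: not in class, lacks term
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == cls
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    return a, b, c, d


def df_score(a, b, c, d):
    """Classical document frequency: share of documents containing the term."""
    return (a + b) / (a + b + c + d)


def mi_score(a, b, c, d):
    """Classical pointwise mutual information between term and class."""
    if a == 0:
        return 0.0  # assumption: no co-occurrence is treated as uninformative
    n = a + b + c + d
    return math.log(a * n / ((a + c) * (a + b)))


def chi_score(a, b, c, d):
    """Classical chi-square statistic for term/class independence."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def min_max(values):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_terms(docs, labels, cls):
    """Rank vocabulary terms by the mean of three normalized scores."""
    vocab = sorted(set().union(*docs))
    dfs, mis, chis = [], [], []
    for term in vocab:
        a, b, c, d = contingency(docs, labels, term, cls)
        df = df_score(a, b, c, d)
        dfs.append(df)
        # Assumed DF factor: weight MI by DF to damp low-frequency terms.
        mis.append(df * mi_score(a, b, c, d))
        # Assumed stand-in for the standard score factor: discard
        # terms that are negatively correlated with the class.
        chis.append(chi_score(a, b, c, d) if a * d > b * c else 0.0)
    fused = [
        (x + y + z) / 3
        for x, y, z in zip(min_max(dfs), min_max(mis), min_max(chis))
    ]
    # Highest fused score first; ties are broken alphabetically.
    return sorted(zip(vocab, fused), key=lambda pair: (-pair[1], pair[0]))
```

Each document is passed as a set of tokens together with its class label, and `rank_terms(docs, labels, "pos")` returns `(term, score)` pairs from which the top-ranked terms would be kept as features. The normalization step is a design choice of this sketch: without it, the unbounded CHI values would dominate the average.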



