Int J Performability Eng, 2020, Vol. 16, Issue (6): 968-978. doi: 10.23940/ijpe.20.06.p15.968978


Evaluation of Text Semantic Features using Latent Dirichlet Allocation Model

Chunjie Zhou (a), Nao Li (b, c, *), Chi Zhang (b), and Xiaoyu Yang (a)

  a Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing, 100101, China;
  b Collaborative Innovation Center of eTourism, Tourism College of Beijing Union University, Beijing, 100101, China;
  c Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, 100101, China
  • Contact: * E-mail address: lytlinao@buu.edu.cn
  • About the authors: Chunjie Zhou is a master's student in the Beijing Key Laboratory of Information Service Engineering at Beijing Union University. Her research interests include natural language processing and text analysis.
    Nao Li is a professor at Tourism College of Beijing Union University. Her research interests are big data analysis and mining, computer simulation, and technology applications in tourism.
    Chi Zhang is the Deputy Dean and an associate professor at Tourism College of Beijing Union University. His research interests include tourism economics and tourism information.
    Xiaoyu Yang is a master's student in the Beijing Key Laboratory of Information Service Engineering at Beijing Union University. His research interests include natural language processing and text analysis.
  • Supported by:
    This study was jointly supported by the Funding Project for Postgraduates in Beijing Union University, the Premium Funding Project for Academic Human Resources Development in Beijing Union University, and the State Key Laboratory of Resources and Environmental Information System.

Abstract: Obtaining useful information from the mass of data on the Internet has been a hot topic in information processing research in recent years. The task is more challenging for unstructured data such as online reviews written in natural language. Online consumer reviews reflect customers' real experiences with and opinions on products or services. However, there is a shortage of methods and tools to help potential customers find high-quality, helpful reviews among a large number of reviews. This paper applied the concepts and ideas of creative computing to this problem. TF-IDF, a traditional method for extracting text features, measures the importance of words through word frequency and ignores the semantic information in the text, whereas a topic model makes up for this deficiency. This paper proposed using the vector that the LDA topic model allocates to each review to represent its semantic features. Based on these semantic features, the cosine similarity between the thumbs-up reviews and the other reviews was calculated, which yielded simulated helpfulness scores for all reviews. Then, a linear regression was designed over two kinds of features, i.e., syntactic and semantic features, to determine the simulated helpfulness scores. The proposed method was validated on online tourism reviews of the Forbidden City and Mount Huang collected from three representative Chinese online tourism platforms. The results showed that the proposed method can effectively obtain, and thus compare, the helpfulness of online reviews in a creative way.

Key words: online reviews, creative computing, semantic feature, latent Dirichlet allocation (LDA), information architecture
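
The scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each review already has an LDA topic vector (the toy values below are invented), and it assumes that similarities to the thumbs-up reviews are combined by taking their mean, since the abstract does not state the aggregation rule. The names `cosine_similarity` and `simulated_helpfulness` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def simulated_helpfulness(topic_vectors, thumbs_up_ids):
    """Score every review by its mean cosine similarity to the thumbs-up reviews.

    Averaging is an assumption of this sketch, not taken from the paper.
    """
    if not thumbs_up_ids:
        raise ValueError("at least one thumbs-up review is required")
    references = [topic_vectors[i] for i in thumbs_up_ids]
    return [
        sum(cosine_similarity(vec, ref) for ref in references) / len(references)
        for vec in topic_vectors
    ]


# Toy topic distributions over three topics (invented values).
topic_vectors = [
    [0.8, 0.1, 0.1],  # review 0, marked with a thumbs-up
    [0.7, 0.2, 0.1],  # review 1, similar topic mix to review 0
    [0.1, 0.1, 0.8],  # review 2, dominated by a different topic
]
scores = simulated_helpfulness(topic_vectors, thumbs_up_ids=[0])
for review_id, score in enumerate(scores):
    print(f"review {review_id}: {score:.3f}")
```

In this toy data, review 1 scores higher than review 2 because its topic mix is closer to that of the thumbs-up review. In practice the topic vectors would come from a fitted LDA model rather than being written by hand.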