Int J Performability Eng ›› 2022, Vol. 18 ›› Issue (9): 605-612. doi: 10.23940/ijpe.22.09.p1.605612

Text Independent Data-Level Fusion Network for Multimodal Sentiment Analysis

Sachin Aggarwal and Smriti Sehgal*   

  1. Department of Computer Science and Engineering, Amity University, Noida, 201303, India

Abstract: To accurately understand the human intent behind a text, and to reduce the tensions and misunderstandings caused by sarcasm and ambiguity, multimodal signals, including visual and audio signals, should be used. This work focuses on building a robust model that predicts the emotion depicted in a video by analyzing its visual and acoustic modalities. The paper applies data-level fusion to the features extracted by the separate feature extractors used for each modality of a multimodal dataset. Data-level fusion is chosen for two reasons. First, it reduces computation time, which is a major issue in most model-level fusion networks. Second, because the data is already fused, the same model can be used for all modalities, with no need to build a separate model for each modality and then fuse them. To check the effectiveness of the proposal, this work applies four algorithms (BP Neural Network, Deep Neural Network, RNN, and XGBoost) and measures the improvement in accuracy and in the other evaluation parameters used in the paper. The datasets used are CMU-MOSEI and CMU-MOSI, which were labeled by human annotators, with each video classified as depicting negative or positive emotion. The performance evaluation parameters include cross-validation, error rate, F1 score, recall, and precision. The results show that the error rate of each improved model was reduced by 8-11% relative to its base model.

Key words: F1 score, BP neural network, deep neural network, classification, multimodal
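
The data-level fusion and the evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes that fusion means concatenating the per-video visual and acoustic feature vectors into one input vector, which is one common form of early fusion; the abstract does not state the exact fusion operation. The function names `fuse_features` and `binary_metrics`, the feature dimensions, and the toy labels are all hypothetical and are not taken from the paper.

```python
import numpy as np


def fuse_features(visual, acoustic):
    """Fuse two modalities at the data level by concatenating features.

    Assumption: each row is one video, and both arrays list the videos
    in the same order. Concatenation is an illustrative choice only.
    """
    visual = np.asarray(visual, dtype=float)
    acoustic = np.asarray(acoustic, dtype=float)
    if visual.shape[0] != acoustic.shape[0]:
        raise ValueError("both modalities must cover the same samples")
    # One fused matrix means a single downstream model sees all modalities.
    return np.concatenate([visual, acoustic], axis=1)


def binary_metrics(y_true, y_pred):
    """Compute precision, recall, F1 score, and error rate for labels
    encoded as 1 (positive emotion) and 0 (negative emotion)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    error_rate = float(np.mean(y_true != y_pred))
    return {"precision": precision, "recall": recall,
            "f1": f1, "error_rate": error_rate}


# Toy data: 4 videos, 3 hypothetical visual and 2 hypothetical acoustic features.
visual = np.ones((4, 3))
acoustic = np.zeros((4, 2))
fused = fuse_features(visual, acoustic)  # shape (4, 5)

# Hypothetical labels and predictions from some classifier trained on `fused`.
scores = binary_metrics([1, 0, 1, 1], [1, 0, 0, 1])
```

Because the fused matrix is an ordinary feature matrix, any of the four classifiers named in the abstract could in principle be trained on it without a modality-specific architecture, which is the design advantage the abstract attributes to data-level fusion.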