
Int J Performability Eng, 2026, Vol. 22, Issue (5): 253-262. doi: 10.23940/ijpe.26.05.p3.253262
Emotion-Driven Music Recommender System: A Novel Deep Learning Approach for Enhanced User Experience
Ritika Bidlan* and Sonal Chawla
* Corresponding author. E-mail address: ritika_dcsa@pu.ac.in
Ritika Bidlan and Sonal Chawla. Emotion-Driven Music Recommender System: A Novel Deep Learning Approach for Enhanced User Experience [J]. Int J Performability Eng, 2026, 22(5): 253-262.