Federated Learning for Heterogeneous Multimodal Emotion Recognition on Edge Devices

doi:10.23940/ijpe.26.03.p4.149157

Abstract

The rapid proliferation of artificial intelligence in mental health applications, particularly digital journaling and emotion tracking, has raised significant privacy concerns about the centralization of sensitive user data. Although Federated Learning (FL) offers a decentralized alternative by training models locally on edge devices, existing frameworks predominantly assume unimodal, Independent and Identically Distributed (IID) data. This assumption fails to capture real-world user interactions, in which client inputs vary between text-only entries and multimodal content. To bridge this gap, we present MobileFedFusion, a resource-efficient and privacy-preserving FL architecture designed for heterogeneous edge environments. We propose a Modality Masking mechanism that enables a single global model to aggregate gradients from both text-only and multimodal clients without architectural fragmentation. The system uses a lightweight fusion of MobileBERT and MobileViT engineered to operate within the computational constraints of mobile hardware. Experimental validation on a non-IID partition of the Reddit GoEmotions and Memotion 3.0 datasets shows that our approach achieves a Global Macro F1 score of 0.68. The results indicate that the model converges to state-of-the-art performance while sensitive personal data never leaves the local device.

Key words: federated learning, multimodal emotion recognition, edge computing, privacy-preserving AI, data heterogeneity, digital phenotyping, modality masking, resource-efficient deep learning
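
The abstract does not specify how Modality Masking or the aggregation step is implemented, so the following is only a minimal sketch under stated assumptions. It assumes that a text-only client fills the image slot of the fused input with zeros and appends a mask flag, so that every client trains a model with the same input shape, and that the server combines client parameters with a sample-weighted mean in the style of FedAvg. The names `fuse`, `federated_average`, `TEXT_DIM`, and `IMAGE_DIM` are hypothetical and are not taken from the paper; the plain lists stand in for MobileBERT and MobileViT embeddings.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical embedding sizes, chosen only for illustration.
TEXT_DIM = 4
IMAGE_DIM = 3


def fuse(text_emb: Sequence[float],
         image_emb: Optional[Sequence[float]]) -> List[float]:
    """Build a fixed-size fused input; a missing image is masked to zeros."""
    mask = 0.0 if image_emb is None else 1.0
    image = [0.0] * IMAGE_DIM if image_emb is None else list(image_emb)
    # The trailing mask flag records whether the image slot holds real data.
    return list(text_emb) + [mask * v for v in image] + [mask]


def federated_average(
    updates: Sequence[Tuple[Sequence[float], int]]
) -> List[float]:
    """Sample-weighted mean of client parameter vectors (FedAvg-style)."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


# A text-only client and a multimodal client produce inputs of equal length,
# so both can train the same (assumed) global architecture.
text_only = fuse([0.1, 0.2, 0.3, 0.4], None)
multimodal = fuse([0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7])
assert len(text_only) == len(multimodal) == TEXT_DIM + IMAGE_DIM + 1

# Two clients with 30 and 10 local samples; the larger client weighs more.
global_weights = federated_average([([1.0, 2.0], 30), ([3.0, 4.0], 10)])
print(global_weights)  # [1.5, 2.5]
```

Because the masked and unmasked inputs share one shape, the server in this sketch never needs separate text-only and multimodal model variants, which is one plausible reading of aggregation "without architectural fragmentation".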

Bhawana Sharma and Komal Saxena. Federated Learning for Heterogeneous Multimodal Emotion Recognition on Edge Devices [J]. Int J Performability Eng, 2026, 22(3): 149-157.
