Int J Performability Eng, 2026, Vol. 22, Issue (3): 149-157. doi: 10.23940/ijpe.26.03.p4.149157

Original article

Federated Learning for Heterogeneous Multimodal Emotion Recognition on Edge Devices

Bhawana Sharma and Komal Saxena*   

  1. Amity Institute of Information Technology, Amity University, Noida, India
    * Corresponding author.
    E-mail address: ksaxena1@amity.edu

Abstract:

The rapid proliferation of artificial intelligence in mental health applications, particularly digital journaling and emotion tracking, has raised significant privacy concerns about the centralization of sensitive user data. Federated Learning offers a decentralized alternative by training models locally on edge devices, but existing frameworks predominantly assume independent and identically distributed (IID) unimodal data. This assumption does not reflect real-world user interactions, in which client inputs vary dynamically between text-only entries and multimodal content. To bridge this gap, we present MobileFedFusion, a resource-efficient, privacy-preserving Federated Learning architecture designed for heterogeneous edge environments. We propose a novel Modality Masking mechanism that enables a single global model to aggregate gradients from heterogeneous clients, integrating text-only and multimodal contributors without fragmenting the architecture. The system uses a lightweight fusion of MobileBERT and MobileViT, engineered to operate within the computational constraints of mobile hardware. Experiments on a non-IID partition of the Reddit GoEmotions and Memotion 3.0 datasets show that our approach achieves a global macro-F1 score of 0.68. These results indicate that the model converges to state-of-the-art performance while ensuring that sensitive personal data never leaves the local device.
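The abstract does not say how Modality Masking is implemented. The sketch below shows one plausible reading, under stated assumptions, of how text-only and multimodal clients could share one global model. First, a client with no image zero-fills the image slot and sets a mask flag, so every client feeds the same input width to a shared fusion head. Second, the server averages each parameter group only over the clients that actually reported it, weighted by local sample count (FedAvg restricted by a modality mask). The function names `fuse` and `masked_fedavg` and the group keys `text_encoder`, `image_encoder` and `fusion_head` are hypothetical. Plain Python lists stand in for tensors, and the sample-weighted rule is standard FedAvg, not a confirmed detail of MobileFedFusion.

```python
from typing import Dict, List, Optional

Vector = List[float]  # hypothetical stand-in for a tensor


def fuse(text_emb: Vector, image_emb: Optional[Vector], image_dim: int) -> Vector:
    """Concatenate text and image embeddings plus a 1-element modality mask.

    Assumption: a missing image is zero-filled and flagged with mask 0.0,
    so text-only and multimodal clients produce inputs of equal width.
    """
    if image_emb is None:
        return text_emb + [0.0] * image_dim + [0.0]
    if len(image_emb) != image_dim:
        raise ValueError("image embedding has unexpected width")
    return text_emb + image_emb + [1.0]


def masked_fedavg(updates: List[Dict[str, Vector]],
                  num_samples: List[int]) -> Dict[str, Vector]:
    """Sample-weighted average of each parameter group over reporting clients.

    A text-only client never sends an 'image_encoder' update, so that group
    is averaged only over multimodal clients instead of being diluted.
    """
    totals: Dict[str, Vector] = {}
    weights: Dict[str, int] = {}
    for update, n in zip(updates, num_samples):
        for name, values in update.items():
            if name not in totals:
                totals[name] = [0.0] * len(values)
                weights[name] = 0
            # Accumulate n * value element-wise for this parameter group.
            totals[name] = [t + n * v for t, v in zip(totals[name], values)]
            weights[name] += n
    return {name: [t / weights[name] for t in vec] for name, vec in totals.items()}


if __name__ == "__main__":
    # Hypothetical round: one text-only client and one multimodal client.
    text_only = {"text_encoder": [1.0, 2.0], "fusion_head": [0.0]}
    multimodal = {"text_encoder": [3.0, 4.0], "image_encoder": [5.0],
                  "fusion_head": [1.0]}
    print(masked_fedavg([text_only, multimodal], [100, 300]))
    print(fuse([0.5], None, 2), fuse([0.5], [0.2, 0.3], 2))
```

In this sketch, a parameter group that no client reports in a round is simply absent from the result, so the server would keep its previous global values for it. Zero-filling plus an explicit mask flag is only one design choice; a learned placeholder embedding would be an equally plausible alternative.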

Key words: federated learning, multimodal emotion recognition, edge computing, privacy-preserving AI, data heterogeneity, digital phenotyping, modality masking, resource-efficient deep learning