Int J Performability Eng, 2024, Vol. 20, Issue (10): 610-620. doi: 10.23940/ijpe.24.10.p3.610620

• Original article

A Hybrid Ensemble Learning Approach for Detecting Bots on Twitter

Hannousse Abdelhakim a,b,* and Talha Zied b

  a MIS Laboratory, Université 8 Mai 1945 Guelma, Guelma, Algeria
  b Laboratoire de Vision et d’Intelligence Artificielle (LAVIA), Larbi Tebessi University, Tebessa, Algeria
  • Contact: Hannousse Abdelhakim, E-mail: hannousse.abdelhakim@univ-guelma.dz

Abstract:

The proliferation of social media platforms has revolutionized communication, but it has also given rise to social media bots that spread misinformation, manipulate public opinion, and compromise the integrity of online discourse. This study addresses the detection of social media bots on Twitter, where traditional detection methods often fall short because bots keep evolving and the volume of data is vast. To overcome these challenges, this research proposes a hybrid ensemble model that combines feature engineering and natural language processing techniques. The ensemble model is a meta-model that predicts the nature of a Twitter account based on the outputs of two base models. The first model uses a combination of engineered profile and content-based features, while the second model employs natural language processing features extracted automatically from posted tweets. By integrating these distinct features, the hybrid model captures a broader spectrum of bot behaviors and characteristics, leading to more accurate and robust detection. This combined approach allows the meta-model to identify bots that might evade detection when only one type of feature is used, offering a more holistic understanding of bot behavior across both structural and content dimensions. Extensive experiments demonstrate significant improvements in detection performance, achieving an F1-score of 90.22% on the challenging TwiBot-20 dataset and outperforming state-of-the-art models.
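The two-base-model design described in the abstract resembles stacking. The following is a minimal sketch of that idea only, not the authors' implementation: the accounts, the `SPAM_WORDS` vocabulary, and the two scoring functions are hypothetical stand-ins for the paper's trained base models, and the meta-model is assumed here to be a plain logistic regression over the two base outputs.

```python
import math

# Hypothetical spam vocabulary standing in for learned NLP features.
SPAM_WORDS = {"free", "click", "prize", "win", "crypto"}

# Toy, made-up accounts: (followers, following, tweets, label); label 1 = bot.
ACCOUNTS = [
    (12, 900, ["win free prize click here", "free followers now"], 1),
    (30, 1500, ["click here for free crypto"], 1),
    (5000, 4800, ["click here free crypto prize"], 1),  # human-like profile
    (8, 700, ["lovely weather today"], 1),               # human-like text
    (400, 380, ["had a great lunch with friends"], 0),
    (1200, 300, ["reading a good book tonight"], 0),
    (250, 260, ["my train is late again"], 0),
    (150, 100, ["free concert in the park this evening, lovely"], 0),
]

def profile_score(followers, following):
    """Base model 1 stand-in: share of 'following' among all connections."""
    return following / (followers + following)

def text_score(tweets):
    """Base model 2 stand-in: fraction of tokens found in SPAM_WORDS."""
    tokens = [tok for tweet in tweets for tok in tweet.lower().split()]
    return sum(tok in SPAM_WORDS for tok in tokens) / len(tokens)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_meta_model(scores, labels, lr=0.5, epochs=5000):
    """Meta-model: logistic regression fitted by batch gradient descent
    on the pair of base-model outputs for each account."""
    w1 = w2 = b = 0.0
    n = len(labels)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (s1, s2), y in zip(scores, labels):
            err = sigmoid(w1 * s1 + w2 * s2 + b) - y
            g1 += err * s1
            g2 += err * s2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b

# Each account is reduced to two numbers, one per base model.
scores = [(profile_score(f, g), text_score(t)) for f, g, t, _ in ACCOUNTS]
labels = [y for *_, y in ACCOUNTS]

w1, w2, b = train_meta_model(scores, labels)
predictions = [int(sigmoid(w1 * s1 + w2 * s2 + b) >= 0.5) for s1, s2 in scores]
print(list(zip(labels, predictions)))
```

In this toy data, the third bot has a human-like profile and the fourth has human-like text, so neither score alone separates the classes, while the meta-model over both scores can. With trained base models, the meta-model would normally be fitted on out-of-fold base predictions to avoid leakage; the fixed rules used here sidestep that step.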

Key words: social media, bot detection, Twitter, hybrid approach, ensemble learning, natural language processing