HDBiTweeC: A Novel Dynamic Clustering Algorithm for Tweets
Vineeth Menon, Bibal Benifa J V, and Christy K T
2025, 21(10): 572-582.
doi:10.23940/ijpe.25.10.p4.572582
Abstract
Social media platforms such as X (Twitter) generate vast volumes of continuously evolving data, which makes extracting meaningful insights a challenging task. To address this challenge, this paper proposes HDBiTweeC, a novel hybrid clustering framework designed for time-evolving text data. The framework integrates autoencoder-based dimensionality reduction with HDBSCAN, allowing it to capture evolving patterns in the data while preserving semantic relationships. For empirical evaluation, tweets corresponding to five trending hashtags were collected using snscrape, pre-processed, and transformed into embeddings. The performance of HDBiTweeC was benchmarked against K-Means, K-Means++, and HDBSCAN using clustering quality metrics including the Silhouette Score, the Calinski-Harabasz Index, and the Davies-Bouldin Index. Experimental results demonstrate that HDBiTweeC consistently outperforms these baseline methods. In addition, a customized explainability module that combines z-scoring, polarity-based sentiment analysis, and model-agnostic techniques such as LIME improves the interpretability of the clustering outcomes. The framework thus enables the discovery of crowd patterns and has potential applications in identifying emergent events such as disasters, protests, and the spread of misinformation.