Int J Performability Eng ›› 2026, Vol. 22 ›› Issue (6): 318-330. doi: 10.23940/ijpe.26.06.p3.318330

Cross-Project Generalization Challenges in Transformer-Based Code Smell Detection: An Empirical Study

Bhavana Chowdary Burra*, Seema Shukla, and Mayank Kumar Goyal   

  1. Software Engineering, Computer Science Dept., Sharda University, Greater Noida, India
  • Contact: *E-mail address: burrabhavana@gmail.com

Abstract: Detecting code smells is important for improving software maintainability and reducing technical debt in large-scale software systems. Traditional machine learning methods rely heavily on manually engineered features and can therefore struggle to generalize across projects because of domain differences and class imbalance in the datasets. Although transformer-based pre-trained models have shown great promise in understanding the semantics of source code, there has been limited investigation into how well they perform across different datasets, particularly balanced versus imbalanced ones. In this study, we compare baseline machine learning models and transformer-based models for detecting multiple types of code smells on two heterogeneous datasets with different distribution properties. Our analysis shows that the degree of class imbalance and the differences between the two domains significantly affect the performance and generalization of the models. The experimental results show that, while transformer-based models outperform the baseline machine learning models, the size of their advantage varies with dataset characteristics, indicating that transformer-based models do not generalize well across projects. We also find that domain-specific fine-tuning strategies can improve adaptability and detection performance in real-world use. This study provides insights into dataset characteristics and model behavior across domains, and highlights the need for adaptive learning approaches to develop robust, generalizable code smell detection systems.

Key words: code smell detection, software quality, machine learning, transformer models, dataset imbalance, code smells, cross-project generalization, fine-tuning