Int J Performability Eng, 2025, Vol. 21, Issue (10): 559-571. doi: 10.23940/ijpe.25.10.p3.559571


Understanding Code Quality: A Qualitative Evaluation of LLM-Generated vs. Human-Written Code

Abiha Naqvi, Apeksha Jain, Avisha Goyal, and Ankita Verma*   

  1. Jaypee Institute of Information Technology, Noida, India
  • Contact: * E-mail address: ankita.verma@mail.jiit.ac.in

Abstract: As Large Language Models (LLMs) such as GPT and Gemini become increasingly integrated into software development, understanding their capabilities and limitations is essential. This study evaluates the effectiveness of these models in code generation by comparing AI-generated code with human-written code in C++ and Python. Key software quality metrics, including cyclomatic complexity, lines of code, and time and space complexity, are used to assess the performance, efficiency, and readability of the code. The study also examines how prompt complexity, analyzed at two distinct levels, influences the quality of the code the models produce. By highlighting the strengths and weaknesses of LLMs on programming tasks of varying difficulty, this research offers practical insights for developers, researchers, and industry professionals. The findings aim to inform best practices for integrating AI assistance into development workflows while maintaining a balance between automation and human oversight. Ultimately, this work contributes to more efficient and maintainable coding practices in an AI-augmented development landscape.
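
Two of the metrics named in the abstract, cyclomatic complexity and lines of code, can be computed automatically from source text. The paper does not name its measurement tooling, so the following is a minimal Python sketch, using only the standard-library ast module, of how such measurements might be taken; the function names and the choice of branching nodes are illustrative assumptions, not the authors' pipeline.

# Minimal sketch, assuming Python sources; not the authors' measurement pipeline.
import ast

# Branching constructs counted toward a McCabe-style complexity estimate
# (an approximation; production tools also weight boolean operands, match arms, etc.).
_DECISION_NODES = (ast.If, ast.IfExp, ast.For, ast.While,
                   ast.ExceptHandler, ast.BoolOp, ast.Assert)

def cyclomatic_complexity(source: str) -> int:
    """Rough McCabe complexity: 1 + number of branching constructs."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, _DECISION_NODES) for node in ast.walk(tree))

def lines_of_code(source: str) -> int:
    """Count non-blank, non-comment source lines."""
    return sum(1 for line in source.splitlines()
               if line.strip() and not line.strip().startswith("#"))

if __name__ == "__main__":
    snippet = (
        "def classify(n):\n"
        "    if n < 2:\n"
        "        return 'small'\n"
        "    for d in range(2, n):\n"
        "        if n % d == 0:\n"
        "            return 'composite'\n"
        "    return 'prime'\n"
    )
    print("cyclomatic complexity:", cyclomatic_complexity(snippet))  # 1 + 3 branches = 4
    print("lines of code:", lines_of_code(snippet))                  # 7

Applied to an LLM-generated and a human-written solution of the same task, the same two measurements give a like-for-like comparison of structural simplicity and code size, which is the kind of comparison the study reports.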

Key words: programming, generative AI, code analysis, large language model