[1] Gu T., Liu K., Dolan-Gavitt B. and Garg S., 2019. BadNets: Evaluating backdooring attacks on deep neural networks. IEEE Access, 7, pp. 47230-47244.
[2] Cai H., Zhang P., Dong H., Xiao Y., Koffas S. and Li Y., 2024. Towards stealthy backdoor attacks against speech recognition via elements of sound. IEEE Transactions on Information Forensics and Security.
[3] Zhao S., Ma X., Zheng X., Bailey J., Chen J. and Jiang Y.G., 2020. Clean-label backdoor attacks on video recognition models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 14443-14452).
[4] Lou Q., Liu Y. and Feng B., 2023. TrojText: Test-time invisible textual trojan insertion. arXiv preprint arXiv:2303.02242.
[5] Hadi M.U., Qureshi R., Shah A., Irfan M., Zafar A., Shaikh M.B., Akhtar N., Wu J. and Mirjalili S., 2023. Large language models: A comprehensive survey of its applications, challenges, limitations, and future prospects. Authorea Preprints.
[6] Luitel D., Hassani S. and Sabetzadeh M., 2024. Improving requirements completeness: Automated assistance through large language models. Requirements Engineering, 29(1), pp. 73-95.
[7] Hou X., Zhao Y., Liu Y., Yang Z., Wang K., Li L., Luo X., Lo D., Grundy J. and Wang H., 2023. Large language models for software engineering: A systematic literature review. arXiv preprint arXiv:2308.10620.
[8] Fan A., Gokkaya B., Harman M., Lyubarskiy M., Sengupta S., Yoo S. and Zhang J.M., 2023, May. Large language models for software engineering: Survey and open problems. In 2023 IEEE/ACM International Conference on Software Engineering: Future of Software Engineering (ICSE-FoSE) (pp. 31-53). IEEE.
[9] Wang J., Huang Y., Chen C., Liu Z., Wang S. and Wang Q., 2024. Software testing with large language models: Survey, landscape, and vision. IEEE Transactions on Software Engineering.
[10] Ozkaya I., 2023. Application of large language models to software engineering tasks: Opportunities, risks, and implications. IEEE Software, 40(3), pp. 4-8.
[11] Li Y., Choi D., Chung J., Kushman N., Schrittwieser J., Leblond R., Eccles T., Keeling J., Gimeno F., Dal Lago A., Hubert T. et al., 2022. Competition-level code generation with AlphaCode. Science, 378(6624), pp. 1092-1097.
[12] Chen M., Tworek J., Jun H., Yuan Q., Pinto H.P.D.O., Kaplan J., Edwards H., Burda Y., Joseph N., Brockman G. and Ray A., 2021. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374.
[13] Zheng Q., Xia X., Zou X., Dong Y., Wang S., Xue Y., Wang Z., Shen L., Wang A., Li Y. and Su T., 2023. CodeGeeX: A pre-trained model for code generation with multilingual evaluations on HumanEval-X. arXiv preprint arXiv:2303.17568.
[14] Qu Y., Huang S., Chen X., Wang X. and Yao Y., 2024. Detection of backdoor attacks using targeted universal adversarial perturbations for deep neural networks. Journal of Systems and Software, 207, p. 111859.
[15] Qu Y., Huang S. and Yao Y., 2024. A survey on robustness attacks for deep code models. Automated Software Engineering, 31(2), p. 65.
[16] Brown T.B., 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
[17] Achiam J., Adler S., Agarwal S., Ahmad L., Akkaya I., Aleman F.L., Almeida D., Altenschmidt J., Altman S., Anadkat S. and Avila R., 2023. GPT-4 technical report. arXiv preprint arXiv:2303.08774.
[18] Feng Z., Guo D., Tang D., Duan N., Feng X., Gong M., Shou L., Qin B., Liu T., Jiang D. and Zhou M., 2020. CodeBERT: A pre-trained model for programming and natural languages. arXiv preprint arXiv:2002.08155.
[19] Qu Y., Huang S., Chen X., Yao Y. and Bai T. An input-denoising-based defense against stealthy backdoor attacks in large language models for code. Available at SSRN 4821388.
[20] Li J., Zhao Y., Li Y., Li G. and Jin Z., 2023. AceCoder: Utilizing existing code to enhance code generation. arXiv preprint arXiv:2303.17780.
[21] Austin J., Odena A., Nye M., Bosma M., Michalewski H., Dohan D., Jiang E., Cai C., Terry M., Le Q. and Sutton C., 2021. Program synthesis with large language models. arXiv preprint arXiv:2108.07732.
[22] Gemini Team, Anil R., Borgeaud S., Wu Y., Alayrac J.B., Yu J., Soricut R., Schalkwyk J., Dai A.M., Hauth A. and Millican K., 2023. Gemini: A family of highly capable multimodal models. arXiv preprint arXiv:2312.11805.
[23] Sonoda Y., Kurokawa R., Nakamura Y., Kanzawa J., Kurokawa M., Ohizumi Y., Gonoi W. and Abe O., 2024. Diagnostic performances of GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro in "Diagnosis Please" cases. Japanese Journal of Radiology, pp. 1-5.
[24] Nijkamp E., Pang B., Hayashi H., Tu L., Wang H., Zhou Y., Savarese S. and Xiong C., 2022. CodeGen: An open large language model for code with multi-turn program synthesis. arXiv preprint arXiv:2203.13474.
[25] Fried D., Aghajanyan A., Lin J., Wang S., Wallace E., Shi F., Zhong R., Yih W.T., Zettlemoyer L. and Lewis M., 2022. InCoder: A generative model for code infilling and synthesis. arXiv preprint arXiv:2204.05999.
[26] Shen B., Zhang J., Chen T., Zan D., Geng B., Fu A., Zeng M., Yu A., Ji J., Zhao J. and Guo Y., 2023. PanGu-Coder2: Boosting large language models for code with ranking feedback. arXiv preprint arXiv:2307.14936.
[27] Jiang X., Dong Y., Wang L., Zheng F., Shang Q., Li G., Jin Z. and Jiao W., 2023. Self-planning code generation with large language models. ACM Transactions on Software Engineering and Methodology.