Int J Performability Eng ›› 2026, Vol. 22 ›› Issue (5): 288-296.doi: 10.23940/ijpe.26.05.p6.288296


Strategic Management of Hybrid Retrieval-Augmented Microservices for Long-Horizon Cloud Machine Learning

Deepak Bansal^a, Yojna Arora^b, Hare Ram Singh^c, Rashmi Sharma^d, and Rekha Chaturvedi^e,*

  a. Finance Department, GNIOT Institute of Management Studies, Uttar Pradesh, India;
    b. Department of Computer Science & Engineering, School of Engineering and Technology, Sharda University, Uttar Pradesh, India;
    c. Department of Computer Science & Engineering, KCC Institute of Technology & Management, Uttar Pradesh, India;
    d. Department of Computer Engineering, MPSTME, SVKM's NMIMS University, Maharashtra, India;
    e. Department of Data Science and Engineering, School of Information Security and Data Science, Manipal University Jaipur, Rajasthan, India
  • Contact: * E-mail address: rekha.chaturvedi@jaipur.manipal.edu

Abstract: Long-horizon machine learning must process large volumes of contextual data and reason over extended sequences of time or knowledge. Traditional monolithic machine learning architectures have shown limitations in scalability, knowledge accessibility, and efficient resource utilization in cloud computing. This paper introduces a Hybrid Retrieval-Augmented Microservices Architecture (HRAMA) to improve the efficiency of machine learning systems in the cloud. The hybrid retrieval mechanism integrates semantic vector similarity search with metadata-based filtering to improve the relevance of the retrieved information. The architecture decomposes the machine learning pipeline into independent microservices, which are deployed in containers on cloud infrastructure. Experimental results validate the proposed architecture, showing improved retrieval accuracy, system throughput, and scalability, along with reduced inference latency. The proposed HRAMA framework is thus well suited to long-horizon cloud machine learning applications.
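The hybrid retrieval mechanism described in the abstract combines semantic vector similarity with metadata-based filtering. The paper does not give an implementation; the following is a minimal illustrative sketch, assuming a simple pre-filter-then-rank scheme in which documents are first filtered on exact metadata matches and the survivors are ranked by cosine similarity to the query embedding. All names (`hybrid_retrieve`, the toy corpus, the two-dimensional vectors) are hypothetical, not from the paper.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_retrieve(query_vec, query_meta, corpus, top_k=2):
    """Hybrid retrieval sketch: metadata filter first, then semantic ranking."""
    # 1. Metadata-based filtering: keep only documents whose metadata
    #    matches every key/value constraint in the query.
    candidates = [d for d in corpus
                  if all(d["meta"].get(k) == v for k, v in query_meta.items())]
    # 2. Semantic vector similarity search over the filtered candidates.
    candidates.sort(key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return candidates[:top_k]

# Toy corpus of documents with embeddings and metadata.
corpus = [
    {"id": "a", "vec": [1.0, 0.0], "meta": {"year": 2024}},
    {"id": "b", "vec": [0.9, 0.1], "meta": {"year": 2023}},
    {"id": "c", "vec": [0.0, 1.0], "meta": {"year": 2024}},
]

hits = hybrid_retrieve([1.0, 0.0], {"year": 2024}, corpus)
# Document "b" is semantically closest after "a", but it is excluded
# by the metadata filter; "a" and "c" are returned, ranked by similarity.
```

In a production deployment of the kind the paper targets, the similarity step would be served by a vector database and the filter pushed down into its query engine, with each stage exposed as its own containerized microservice.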

Key words: hybrid retrieval, microservices architecture, retrieval-augmented learning, cloud machine learning, long-horizon reasoning, vector databases, Kubernetes, distributed machine learning