INTELLIGENT LOAD BALANCING IN MICROSERVICE ARCHITECTURE

N. G. Axak; Yu. O. Shelikhov

doi:10.32782/2521-6643-2025-2-70.23

N. G. Axak Kharkiv National University of Radio Electronics https://orcid.org/0000-0001-8372-8432
Yu. O. Shelikhov Kharkiv National University of Radio Electronics https://orcid.org/0009-0009-8970-6571

Keywords: load forecasting, microservice, architecture, cloud computing, distributed system, cloud services, machine learning, artificial intelligence

Abstract

This paper presents an intelligent method for load balancing in microservice architectures that combines parallel (hedged) request routing with the Thompson Sampling Multi-Armed Bandit (MAB) algorithm. The goal is to reduce the tail-latency spikes and performance variability that traditional policies (Round Robin, Least Connections) cannot handle under heterogeneous, bursty workloads. The proposed architecture has three parts: a YARP-based API Gateway that executes weighted hedging, an AI load balancer (FastAPI) that learns routing probabilities from live telemetry, and a Prometheus–Grafana stack that provides continuous feedback for adaptation. The balancer converts observed metrics (latency percentiles, error rate) into rewards and updates per-replica posteriors via Thompson Sampling. This balances exploration and exploitation and prevents persistent bias toward instances that are temporarily fast but unstable. We evaluate four strategies: static round-robin (k=1), static hedging (k=2), adaptive MAB hedging (k=2), and adaptive MAB hedging (k=3). Experiments with up to 1,000 concurrent clients show that adaptive hedging with Thompson Sampling reduces P99 latency by ≈65% and the error rate by ≈45% versus the baseline, with negligible throughput loss and moderate CPU overhead. Increasing parallelism beyond two replicas yields diminishing returns, which confirms that a small k is sufficient when combined with probabilistic weighting and strict idempotency. The findings demonstrate that integrating speculative duplication with Bayesian decision-making provides a lightweight, cloud-native path to tail-tolerant performance. The solution is modular and reproducible, and it generalizes to Kubernetes-based deployments and to IoT/cyber-physical scenarios where real-time, context-aware coordination and reliability are essential.
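
The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the reward function or the posterior family, so the Beta posterior with fractional updates, the linear reward shaping, the `slo_ms` threshold, and the names `ReplicaPosterior`, `ThompsonHedger`, and `reward` are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class ReplicaPosterior:
    """Beta(alpha, beta) belief about how often a replica responds well."""
    alpha: float = 1.0  # prior pseudo-count of good outcomes (assumed uniform prior)
    beta: float = 1.0   # prior pseudo-count of bad outcomes


def reward(latency_ms: float, is_error: bool, slo_ms: float = 200.0) -> float:
    """Map one observation to a reward in [0, 1].

    Hypothetical shaping: errors score 0, and latency scores linearly
    from 1.0 (instant) down to 0.0 (at or beyond slo_ms).
    """
    if is_error:
        return 0.0
    return max(0.0, 1.0 - latency_ms / slo_ms)


class ThompsonHedger:
    """Chooses k replicas per request by Thompson Sampling."""

    def __init__(self, replicas, seed=None):
        self.posteriors = {name: ReplicaPosterior() for name in replicas}
        self.rng = random.Random(seed)

    def choose(self, k: int = 2):
        """Draw one sample per posterior and return the k highest draws."""
        draws = {
            name: self.rng.betavariate(p.alpha, p.beta)
            for name, p in self.posteriors.items()
        }
        return sorted(draws, key=draws.get, reverse=True)[:k]

    def update(self, name: str, r: float) -> None:
        """Fractional Beta update: r counts as success mass, 1 - r as failure mass."""
        p = self.posteriors[name]
        p.alpha += r
        p.beta += 1.0 - r

    def mean(self, name: str) -> float:
        """Posterior mean of the replica's reward."""
        p = self.posteriors[name]
        return p.alpha / (p.alpha + p.beta)


# Simulated feedback loop: replica-a is fast, the others are slow (made-up latencies).
hedger = ThompsonHedger(["replica-a", "replica-b", "replica-c"], seed=0)
for _ in range(200):
    for name in hedger.choose(k=2):
        latency_ms = 40.0 if name == "replica-a" else 180.0
        hedger.update(name, reward(latency_ms, is_error=False))
```

In a deployment along the lines the abstract describes, the gateway would send each request to the `k` chosen replicas in parallel and keep the first successful response, while the observed latencies and errors would come from telemetry rather than from a simulation. Because each replica is ranked by a random draw from its posterior rather than by its mean, replicas with few observations still get selected occasionally, which is the exploration behaviour the abstract refers to.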
