Anomaly Detection and Failure Root Cause Analysis in Microservice-Based Cloud Applications

  • Authors

    • Amina Diallo, Horizon University, Tunisie, USA
    • Noor Al-Hassan, Horizon University, Tunisie, USA

    Published 2026-01-05

  • Keywords

    Anomaly Detection, Root Cause Analysis, Microservice Architectures, Cloud Applications, Causal Discovery, Graph-Based Analysis, Machine Learning, AI-Powered Observability

    Section

    Articles

    How to Cite

    [1]
    A. Diallo and N. Al-Hassan, “Anomaly Detection and Failure Root Cause Analysis in Microservice-Based Cloud Applications”, IJAIDT, vol. 1, no. 1, pp. 29–33, Jan. 2026, Accessed: Mar. 02, 2026. [Online]. Available: https://worldcometresearchgroup.com/index.php/ijaidt/article/view/69
  • Abstract

    Microservice-based cloud applications are distributed and dynamic, which makes their performance and reliability hard to maintain. Detecting anomalies and accurately identifying their root causes is critical to system stability and user satisfaction. This paper surveys existing techniques for anomaly detection and failure root cause analysis in microservice architectures. We categorize these techniques by approach, including causal discovery, graph-based analysis, machine learning, and AI-powered observability. We also discuss the challenges of diagnosing failures in complex microservice ecosystems and highlight research directions that could address them.

