AI-powered Kubernetes incident analysis and Root Cause Analysis
Turn Kubernetes alerts into actionable root-cause analysis — in seconds, not hours.
⭐ If KubeRCA looks useful, please consider starring the main repository. It helps the project reach more operators and brings in more contributors.
A few screens from a running KubeRCA install. Dashboards correlate alerts to incidents, and every incident gets an LLM-generated RCA summary that lands in Slack and the UI together.
KubeRCA is an open-source tool that turns Kubernetes alerts into actionable incident context, AI-assisted analysis, and operator workflows.
It is built for the gap between "an alert fired" and "we understand what happened." Instead of manually gathering evidence across Kubernetes, observability tools, chat, and dashboards, KubeRCA connects alert intake, RCA generation, Slack delivery, and incident search into one operator-facing flow.
KubeRCA is a strong fit for teams that already use Alertmanager, want more consistent RCA, and need searchable incident history instead of one-off alert handling.
- Kubernetes environments with Alertmanager-based alerting
- Teams using Slack threads or dashboards during incident triage
- Workloads where recurring incidents benefit from historical reuse
- Organizations that want LLM-assisted triage without replacing their existing stack
KubeRCA is not a good fit for:
- Log-only workflows without structured alerts
- Fully autonomous remediation expectations
- Generic APM replacement use cases
flowchart TD
AM[Alertmanager]
SL[Slack]
LLM[LLM Provider]
K8S[Kubernetes API]
PR[Prometheus]
TP[Tempo]
subgraph KubeRCA
FE[Frontend]
BE[Backend]
AG[Agent]
DB[(PostgreSQL + pgvector)]
end
AM -->|Webhook| BE
FE <-->|REST + SSE| BE
BE -->|Analyze / Summarize / Chat| AG
BE -->|Thread notifications| SL
BE <-->|Incidents / alerts / embeddings| DB
AG -->|Cluster context| K8S
AG -->|Metrics| PR
AG -.->|Trace context| TP
AG -->|Inference| LLM
- Alertmanager sends alerts to the Backend.
- Backend creates or updates incidents and stores alert history.
- Agent collects Kubernetes and observability context, then runs RCA with an LLM provider.
- Results are published to Slack and streamed to the dashboard.
- Operators can resolve incidents, manually resolve alerts, search similar incidents, leave feedback, and use in-app chat.
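The intake step ("creates or updates incidents and stores alert history") can be pictured with a small sketch. It assumes the standard Alertmanager webhook payload, with a top-level `groupKey` and an `alerts` array whose entries carry `status` and `fingerprint`. The `Incident` class, the `ingest` function, grouping by `groupKey`, and the automatic status roll-up are hypothetical illustrations, not KubeRCA's actual backend logic.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """Hypothetical in-memory incident; KubeRCA's real schema may differ."""
    key: str
    status: str = "firing"
    alerts: dict = field(default_factory=dict)  # fingerprint -> latest alert


def ingest(payload: dict, incidents: dict) -> Incident:
    """Create or update an incident from one Alertmanager webhook payload.

    Assumption: alerts that share a groupKey belong to the same incident.
    """
    key = payload["groupKey"]
    # Reuse the existing incident for this key, or create a new one.
    incident = incidents.setdefault(key, Incident(key=key))
    for alert in payload.get("alerts", []):
        # Keep the latest state per alert fingerprint.
        incident.alerts[alert["fingerprint"]] = alert
    # Assumption: the incident counts as firing while any alert still fires.
    firing = any(a["status"] == "firing" for a in incident.alerts.values())
    incident.status = "firing" if firing else "resolved"
    return incident
```

Calling `ingest` twice with the same `groupKey` returns the same incident object, so a later "resolved" notification updates the existing record instead of opening a new one.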
Read the full runtime walkthrough in the Architecture Details.
- Alert-driven incident intake through Alertmanager
- Kubernetes and observability context collection
- Multi-provider RCA with gemini, openai, and anthropic
- Slack thread delivery for incident and RCA updates
- Realtime dashboard sync with SSE
- Manual resolve, feedback, webhook settings, and context-aware chat
- Similar incident search with PostgreSQL + pgvector
- Local auth and Google OIDC support
- Helm-based deployment for Kubernetes environments
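To give a feel for what similar incident search over embeddings involves, here is a minimal pure-Python sketch that ranks stored incidents by cosine similarity to a query embedding. pgvector can perform this kind of ranking inside PostgreSQL (its `<=>` operator computes cosine distance), but which distance metric KubeRCA uses is not stated here, so the metric, the function names, and the toy two-dimensional vectors are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, incidents, top_k=3):
    """Return up to top_k (incident_id, score) pairs, most similar first.

    `incidents` maps a hypothetical incident id to its embedding vector.
    """
    scored = [
        (cosine_similarity(query, embedding), incident_id)
        for incident_id, embedding in incidents.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(incident_id, score) for score, incident_id in scored[:top_k]]
```

A real deployment would push this ranking into the database so that an index can serve it, rather than scanning every embedding in application code.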
helm upgrade --install kube-rca oci://public.ecr.aws/r5b7j2e4/kube-rca-ecr/charts/kube-rca \
--namespace kube-rca --create-namespace \
-f values.yaml

Point your Alertmanager receiver at:
http://kube-rca-backend.kube-rca.svc.cluster.local:8080/webhook/alertmanager
- Trigger or forward an alert
- Verify analysis arrives in the dashboard
- Enable Slack if you want threaded incident delivery
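For the "trigger or forward an alert" step, one option is to post a hand-built payload to the webhook. The sketch below follows the general shape of an Alertmanager webhook payload. The alert name, labels, annotation text, fingerprint, and receiver name are made up for illustration, and whether the backend needs additional fields or authentication is not covered here, so treat this as a starting point and not a verified test procedure.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Webhook URL from the step above; reachable only from inside the cluster
# (or adjust it to a port-forwarded address).
WEBHOOK_URL = (
    "http://kube-rca-backend.kube-rca.svc.cluster.local:8080"
    "/webhook/alertmanager"
)


def build_test_payload(alertname="KubeRCATestAlert", namespace="default"):
    """Build a payload shaped like an Alertmanager webhook notification."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    labels = {"alertname": alertname, "namespace": namespace, "severity": "warning"}
    annotations = {"summary": "Synthetic alert for a KubeRCA smoke test"}
    return {
        "version": "4",
        "groupKey": '{}:{alertname="%s"}' % alertname,
        "status": "firing",
        "receiver": "kube-rca",  # hypothetical receiver name
        "groupLabels": {"alertname": alertname},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "externalURL": "",
        "alerts": [
            {
                "status": "firing",
                "labels": labels,
                "annotations": annotations,
                "startsAt": now,
                "endsAt": "0001-01-01T00:00:00Z",
                "generatorURL": "",
                "fingerprint": "0000000000000001",  # made-up fingerprint
            }
        ],
    }


def send(payload, url=WEBHOOK_URL, timeout=5):
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Running `send(build_test_payload())` from a pod in the cluster should then let you check whether an incident and its analysis appear in the dashboard.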
For installation details and step-by-step setup, use the documents below.
- Main Repository
- Architecture Details
- Project Background
- Installation Guide (Korean, 한국어)
- Installation Guide (English)
- Troubleshooting
- FAQ
- Helm Chart README
- Backend README
- Agent README
- Frontend README
- GitHub Discussions — questions, ideas, and proposals
- Issues — bug reports and feature requests (use the templates)
- Security — private vulnerability reporting
Issues, pull requests, and design feedback are all welcome. Before opening a PR, please read:
- CONTRIBUTING.md — development setup per component, Conventional Commits, DCO sign-off, PR workflow
- CODE_OF_CONDUCT.md — community expectations
- GOVERNANCE.md — roles, decision making, and how Maintainers are added
- SECURITY.md — how to report vulnerabilities privately
This project is licensed under the Apache License, Version 2.0. See LICENSE and NOTICE for details.
Made for Kubernetes operators who need faster incident context and RCA





