feat(plane-enterprise): native OpenTelemetry APM support (v2.5.1) #241
pratapalakshmi wants to merge 1 commit into
Add first-class OTEL configuration to the backend instead of relying on ad-hoc extraEnv. Bumps chart version to 2.5.1.
- values.yaml: new `observability.otel` block (off by default) with a nested `collector` sub-block for an optional bundled OTLP collector.
- config-secrets/app-env.yaml: when `observability.otel.enabled`, inject OTEL_* into the backend `-app-vars` ConfigMap (scoped to the six workloads that envFrom it: api, worker, beat-worker, automation-consumer, outbox-poller, migrator). Auth headers go into `-app-secrets`. When the endpoint is blank and the bundled collector is enabled, the backend auto-targets the in-cluster collector Service.
- templates/observability/otel-collector.yaml: bundled collector (ConfigMap/Service/Deployment), gated on `observability.otel.collector.enabled`, with an overridable config that defaults to an OTLP-in -> debug-out pipeline.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Walkthrough
This PR adds OpenTelemetry observability support to the Plane Enterprise Helm chart. It introduces OTEL configuration defaults, environment variable injection for application services, and an optional in-cluster OpenTelemetry Collector deployment with OTLP gRPC/HTTP endpoints. The chart version is bumped to 2.5.1.
Changes
OpenTelemetry Observability Integration
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@charts/plane-enterprise/templates/config-secrets/app-env.yaml`:
- Line 122: The OTEL_EXPORTER_OTLP_ENDPOINT value is hardcoded to
"cluster.local" which breaks clusters with custom domains; update the
OTEL_EXPORTER_OTLP_ENDPOINT entry to construct the service FQDN using the same
pattern used elsewhere by interpolating {{ .Release.Name }}, {{
.Release.Namespace }} and the parameterized cluster domain via {{
.Values.env.default_cluster_domain | default "cluster.local" }} so the endpoint
resolves correctly for custom cluster domains while preserving the collector
host and port.
- Around line 121-126: The OTEL_EXPORTER_OTLP_ENDPOINT currently hardcodes port
4317 (gRPC) regardless of .Values.observability.otel.protocol, causing
protocol/port mismatch; update the template that sets
OTEL_EXPORTER_OTLP_ENDPOINT to choose port based on the protocol
(OTEL_EXPORTER_OTLP_PROTOCOL) — use 4317 for "grpc" and 4318 for "http/protobuf"
(or map other protocol values accordingly) when
.Values.observability.otel.collector.enabled is true, keeping the same host
formation ({{ .Release.Name }}-otel-collector.{{ .Release.Namespace
}}.svc.cluster.local) so the exporter speaks the correct port for the configured
protocol.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: dc26bcbe-f3cf-4ff4-8e2b-d3198a45510e
📒 Files selected for processing (4)
- charts/plane-enterprise/Chart.yaml
- charts/plane-enterprise/templates/config-secrets/app-env.yaml
- charts/plane-enterprise/templates/observability/otel-collector.yaml
- charts/plane-enterprise/values.yaml
{{- else if .Values.observability.otel.collector.enabled }}
OTEL_EXPORTER_OTLP_ENDPOINT: "http://{{ .Release.Name }}-otel-collector.{{ .Release.Namespace }}.svc.cluster.local:4317"
{{- else }}
OTEL_EXPORTER_OTLP_ENDPOINT: ""
{{- end }}
OTEL_EXPORTER_OTLP_PROTOCOL: {{ .Values.observability.otel.protocol | default "grpc" | quote }}
Auto-target endpoint hardcodes gRPC port 4317, ignoring protocol.
When endpoint is empty and the bundled collector is enabled, the backend is always wired to port 4317 (gRPC). If the user sets observability.otel.protocol: http/protobuf, the exporter (line 126 emits OTEL_EXPORTER_OTLP_PROTOCOL) will speak HTTP/protobuf against the gRPC port and exports will fail silently. The collector listens on both ports, so the client must select the matching one.
🐛 Proposed fix: pick port by protocol
{{- else if .Values.observability.otel.collector.enabled }}
- OTEL_EXPORTER_OTLP_ENDPOINT: "http://{{ .Release.Name }}-otel-collector.{{ .Release.Namespace }}.svc.cluster.local:4317"
+ {{- if eq (.Values.observability.otel.protocol | default "grpc") "grpc" }}
+ OTEL_EXPORTER_OTLP_ENDPOINT: "http://{{ .Release.Name }}-otel-collector.{{ .Release.Namespace }}.svc.cluster.local:4317"
+ {{- else }}
+ OTEL_EXPORTER_OTLP_ENDPOINT: "http://{{ .Release.Name }}-otel-collector.{{ .Release.Namespace }}.svc.cluster.local:4318"
+ {{- end }}
{{- else }}
{{- if .Values.observability.otel.endpoint }}
OTEL_EXPORTER_OTLP_ENDPOINT: {{ .Values.observability.otel.endpoint | quote }}
{{- else if .Values.observability.otel.collector.enabled }}
OTEL_EXPORTER_OTLP_ENDPOINT: "http://{{ .Release.Name }}-otel-collector.{{ .Release.Namespace }}.svc.cluster.local:4317"
Hardcoded cluster.local breaks custom cluster domains.
Elsewhere in this same template the cluster domain is parameterized (e.g. Line 36 and Line 97 use {{ .Values.env.default_cluster_domain | default "cluster.local" }}). The auto-target endpoint here pins cluster.local, so clusters with a custom DNS domain won't resolve the collector Service.
♻️ Align with the existing pattern
- OTEL_EXPORTER_OTLP_ENDPOINT: "http://{{ .Release.Name }}-otel-collector.{{ .Release.Namespace }}.svc.cluster.local:4317"
+ OTEL_EXPORTER_OTLP_ENDPOINT: "http://{{ .Release.Name }}-otel-collector.{{ .Release.Namespace }}.svc.{{ .Values.env.default_cluster_domain | default "cluster.local" }}:4317"
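The two review findings touch the same line, so they could be addressed together. The following is only a sketch of how the endpoint block in `app-env.yaml` might look with both suggestions applied; it is not the change that was merged. The local variables `$domain`, `$protocol`, and `$port` are illustrative names, and the block assumes any protocol other than `grpc` should use the OTLP HTTP port.

```yaml
{{- /* Sketch only: combines the cluster-domain and protocol/port suggestions. */}}
{{- $domain := .Values.env.default_cluster_domain | default "cluster.local" }}
{{- $protocol := .Values.observability.otel.protocol | default "grpc" }}
{{- /* 4317 is the OTLP gRPC port; 4318 is the OTLP HTTP port. */}}
{{- $port := ternary "4317" "4318" (eq $protocol "grpc") }}
{{- if .Values.observability.otel.endpoint }}
OTEL_EXPORTER_OTLP_ENDPOINT: {{ .Values.observability.otel.endpoint | quote }}
{{- else if .Values.observability.otel.collector.enabled }}
OTEL_EXPORTER_OTLP_ENDPOINT: {{ printf "http://%s-otel-collector.%s.svc.%s:%s" .Release.Name .Release.Namespace $domain $port | quote }}
{{- else }}
OTEL_EXPORTER_OTLP_ENDPOINT: ""
{{- end }}
OTEL_EXPORTER_OTLP_PROTOCOL: {{ $protocol | quote }}
```

Computing the protocol once keeps the port and the emitted `OTEL_EXPORTER_OTLP_PROTOCOL` in agreement, which is the mismatch the first comment describes. An explicitly set `endpoint` is still passed through untouched.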
Summary
Adds first-class OpenTelemetry APM support to the `plane-enterprise` chart so self-hosters can enable backend tracing, metrics, and log correlation via values instead of hand-rolled `extraEnv`. Pairs with the `feat/otel-api-observability` work in `plane-ee` (which adds the `configure_otel()` bootstrap to the Django backend). Chart version bumped 2.5.0 → 2.5.1.
What changed
- `values.yaml`: new `observability.otel` block (off by default) with a nested `collector` sub-block for an optional bundled OTLP collector.
- `templates/config-secrets/app-env.yaml`: when `observability.otel.enabled` is set, injects `OTEL_*` into the backend `-app-vars` ConfigMap. Scoped to the six workloads that `envFrom` it (api, worker, beat-worker, automation-consumer, outbox-poller, migrator); no frontend pods are touched. Auth headers (if any) go into `-app-secrets`. When `endpoint` is blank and the bundled collector is enabled, the backend auto-targets the in-cluster collector Service.
- `templates/observability/otel-collector.yaml`: bundled collector (ConfigMap/Service/Deployment), gated on `observability.otel.collector.enabled`, with an overridable `config` defaulting to an OTLP-in → debug-out pipeline.
How to enable
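A minimal values sketch for the bundled-collector mode, assuming the key names match those described in this PR (`observability.otel.enabled`, `protocol`, `collector.enabled`). The exact defaults in `values.yaml` are not reproduced here, so treat the comments as assumptions to check against the chart.

```yaml
# Sketch of a values override; key names follow the PR description.
observability:
  otel:
    enabled: true        # the block is off by default
    # endpoint is left unset so the backend auto-targets the bundled collector
    protocol: grpc       # the template falls back to "grpc" when this is unset
    collector:
      enabled: true      # deploys the bundled OTLP collector
```

Saved to a file and passed with `-f` to `helm upgrade`, this should render the `OTEL_*` entries into the backend `-app-vars` ConfigMap and deploy the collector alongside the release.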
Point at an external collector instead by setting `observability.otel.endpoint` and leaving `collector.enabled: false`.
Testing
Deployed to a live EKS cluster (namespace `gpotel`, isolated DB, backend image built from `feat/otel-api-observability`) and drove real traffic.
Render gating (`helm template` + `helm lint`):
- `helm lint`: 1 chart(s) linted, 0 chart(s) failed
- `OTEL_*` env: `OTEL_EXPORTER_OTLP_ENDPOINT: ""` (external-collector mode); `http://<release>-otel-collector.<ns>.svc.cluster.local:4317`
Live spans were received by the bundled collector (`kubectl logs deploy/plane-gpotel-otel-collector`).
Log ↔ trace correlation: worker/Celery logs carry a populated `trace_id` (32-hex), confirming the `TraceContextFilter` path. (Follow-up, app-side not chart-side: the API request access log currently emits empty `trace_id`/`span_id` because it logs outside the active span context; tracked against the `plane-ee` branch, not this chart.)
Notes
- `debug` exporter (verification only): set `observability.otel.collector.config` to export to a real backend (Tempo/Datadog/Honeycomb), or disable it and use `observability.otel.endpoint`.
- `otel/opentelemetry-collector-contrib:0.115.1`.
🤖 Generated with Claude Code
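The first note mentions overriding `observability.otel.collector.config` to reach a real backend. A hedged sketch of such an override follows. It assumes `config` accepts a standard OpenTelemetry Collector configuration as a YAML map (the PR does not show its shape), and the Tempo address `tempo.example.svc:4317` is a hypothetical placeholder.

```yaml
# Sketch only: the shape of `config` is an assumption, not confirmed by the PR.
observability:
  otel:
    enabled: true
    collector:
      enabled: true
      config:
        receivers:
          otlp:
            protocols:
              grpc:
                endpoint: 0.0.0.0:4317   # OTLP gRPC in
              http:
                endpoint: 0.0.0.0:4318   # OTLP HTTP in
        exporters:
          otlp:
            endpoint: tempo.example.svc:4317   # hypothetical backend address
            tls:
              insecure: true                   # plaintext in-cluster hop; adjust for real deployments
        service:
          pipelines:
            traces:
              receivers: [otlp]
              exporters: [otlp]
```

This replaces the default debug-out pipeline with an OTLP export for traces only; metrics and logs pipelines would need their own entries under `service.pipelines`.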