Eai 5821 add envoy ai gateway#724
Closed
johnl-amd wants to merge 38 commits into
Closed
Conversation
envoy_gateway_cluster_role_for_ai_resources.yaml referenced the undefined ai-gateway-helm.fullname helper, breaking ArgoCD manifest generation. Use ai-gateway-helm.controller.fullname instead, which is defined in _helpers.tpl and resolves to ai-gateway-controller.
aim-gateway is a thin Helm chart (v0.1.0) that creates the AIGatewayRoute and Backend resources for routing AIM Engine inference traffic through Envoy AI Gateway. The patches directory documents the two fixes applied to the ai-gateway-controller v0.6.0 source to make it work alongside TLS listeners: insertRouterLevelAIGatewayExtProc and insertRequestHeaderToMetadataFilter both returned errors instead of continuing when encountering non-HCM filter chains (e.g. the EmptyCluster default-reject chain on TLS listeners). Both are changed to continue, consistent with other functions in the same codebase.
Add aim-gateway to root/values.yaml and all cluster size enabledApps lists (small/medium/large) with syncWave 5 (after envoy-ai-gateway). Make the backends list empty by default with an example comment so clusters override it in cluster-values rather than modifying chart source. Guard the AIGatewayRoute template against an empty backends list to avoid index-out-of-range failures during Helm render.
The TLSRoute created a non-HCM filter chain on the HTTPS listener that triggered the ai-gateway v0.6.0 bug. Removing it means the stock v0.6.0 image works without patching. The route is out of scope for EAI-5821 and it is unclear whether anything depended on it.
- Add llm_input_token, llm_output_token, llm_total_token dynamic metadata
fields to the Envoy JSON access log via %DYNAMIC_METADATA(envoy.filters.http.ext_proc:...)%
- Add model field from x-ai-eg-model header for per-model attribution
- Add OpenTelemetryCollector DaemonSet to aim-gateway chart that reads
Envoy proxy access logs and emits aim_gateway_inference_requests_total{model}
counter to Mimir via the OTel count connector
- Token values are also available in Loki for sum_over_time queries
Closes EAI-6232
Adds the extensionManager block so Envoy Gateway registers the AI Gateway controller (port 1063) as its xDS extension server. Without this, the ext_proc HTTP filter is never injected into the Envoy listener, leaving llm_input_token / llm_output_token / llm_total_token null in every access log entry despite the extproc sidecar running. Also persists enableBackend: true (required for Backend CRD references) which had previously only been applied as a live cluster patch.
Add Backend + EnvoyExtensionPolicy + GatewayConfig to make llm_input_token, llm_output_token, llm_total_token populate in Envoy access logs for inference requests. Root cause: Envoy's setDynamicMetadata() silently drops ext_proc response metadata unless writableNamespaces lists the namespace. The EnvoyExtensionPolicy now declares io.envoy.ai_gateway as a writable namespace so token metadata reaches StreamInfo. GatewayConfig with globalLLMRequestCosts ensures the sidecar emits token metadata for K8s Service backends (non-InferencePool routes), where aigw_route_name is absent and route-scoped costs never match. Gateway annotation links to the GatewayConfig so the AI Gateway controller includes global costs in the filter config secret.
Replace the aim-gateway DaemonSet filelog collector (which parsed JSON access logs to derive aim_gateway_inference_requests_total) with a single Prometheus scrape job in the cluster-wide otel-collector-metrics-rest. The ai-gateway-extproc sidecar exposes native gen_ai_* metrics (gen_ai_client_token_usage, gen_ai_server_request_duration_seconds, etc.) at port 1064/metrics on the Envoy data-plane pod. Pod discovery filters on app.kubernetes.io/component=proxy in the envoy-gateway-system namespace.
Wires Envoy Gateway native apiKeyAuth into the aim-gateway chart so that inference requests require a valid Bearer token validated against a k8s Secret. The matched client ID flows downstream as x-api-key-id for future rate limiting and metrics labeling. Targets the HTTPRoute (aim-inference-route) rather than the Gateway to avoid conflicting with the cluster-auth ExtAuth SecurityPolicy at Gateway level. Gated by both .Values.backends and .Values.apiKeyAuth.enabled — no-op when disabled or no backends configured. Enables apiKeyAuth for the small cluster profile via values_small.yaml.
…QuotaPolicy Routes x-api-key-id (injected by SecurityPolicy) and x-aim-service-id (injected per backendRef via headerMutation) through to gen_ai_* metric labels via the extproc's native metricsRequestHeaderAttributes mapping. No custom code needed. Adds a QuotaPolicy template that creates per-API-key token quota buckets using clientSelectors type:Distinct on x-api-key-id. Shadow mode by default so quota consumption is observed in metrics before enforcement is turned on.
QuotaPolicy defaultBucket (global safety-net) has no shadowMode field, so it enforces immediately even when per-key rules are in shadow mode. Gate it behind quotaPolicy.globalEnforced (default false) so the initial rollout only observes per-key consumption without hard limits. Add BOOTSTRAP comment to security-policy.yaml documenting that the API key Secret must be pre-created; it is not managed by this chart.
…cy namespace Envoy Gateway requires apiKeyAuth credentialRefs to be in the same namespace as the SecurityPolicy. All AIM SecurityPolicies are created in the AIM deployment namespace (workbench), so the secret must also live there. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ster-auth Per-AIM SecurityPolicies are created in the workload namespace (e.g., workbench) but need to reference the cluster-auth extAuth Service in the cluster-auth namespace. The existing ReferenceGrant only allowed envoy-gateway-system. Without this fix, Envoy Gateway sets direct_response: 500 on all AIGatewayRoute rules, breaking inference. Also includes AIWB RBAC additions for Envoy Gateway and InferencePool resources. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The extproc sidecar exposes gen_ai_client_token_usage with api_key_id and aim_service_id labels on port 1064 (named aigw-admin). Wire Prometheus scraping via a PodMonitor targeting the Envoy pods. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
aim-gateway-controller now owns all gateway resources (AIGatewayRoute, InferencePool, SecurityPolicy) — the static aim-gateway Helm chart is superseded. extproc metrics are scraped by otel-collector-metrics-rest so the per-chart PodMonitor is redundant.
Authentication flows through cluster-auth extAuth — no Kubernetes secret with raw API keys is written or read. The gatewayApiKeys values block and the two GATEWAY_API_KEYS_SECRET_* env vars were from an earlier APIKeyAuth design that was replaced.
…21-add-envoy-ai-gateway
…ok cert The TLSRoute on the shared Gateway caused envoy-gateway xDS translation to fail when inserting the ext-proc request header filter — TLS passthrough filter chains have no HTTPConnectionManager. This blocked all xDS pushes to the envoy proxy, leaving it stuck in a startup probe loop. - Remove k8s-passthrough listener from Gateway spec - Delete tlsroute-k8s-passthrough.yaml - Fix admission_webhook.yaml: hoist cert variable declarations above the cert-manager/self-signed branch so caBundle is always populated when using self-signed certs
- Delete gateway-extension-kgateway-system.yaml and job-restart-kgateway.yaml from cluster-auth 0.5.9 — these were kgateway artifacts carried over from 0.5.0 and have no effect (nothing references them post-migration), but their presence is misleading and wrong. - Add valuesObject for envoy-ai-gateway in root/values.yaml to override the default-insecure-seed with a comment requiring per-deployment override. The seed was previously invisible in root/values.yaml, making it easy to ship the insecure default to production unnoticed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-gateway # Conflicts: # scripts/utils/job-cluster-tls-copy.yaml # sources/cluster-auth/0.5.0/templates/referencegrant.yaml # sources/cluster-auth/0.5.9/templates/job-restart-envoygateway.yaml
…s.yaml Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-crds Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Restores kubectl access to the cluster via k8s.<domain>:443. The TLS passthrough will be moved to a separate Gateway to avoid the ext-proc/HTTPConnectionManager conflict properly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ay xDS The SecurityPolicy targeting the https Gateway caused an xDS translation failure — extAuth cannot be applied to TLS passthrough filter chains which have no HTTPConnectionManager. Removing it unblocks envoy-gateway and restores kubectl access via k8s-passthrough. Will re-implement auth at the HTTPRoute level to avoid the listener conflict. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…fig annotation The aigateway.envoyproxy.io/gateway-config annotation was the root cause of the xDS crash — it tells envoy-ai-gateway to inject ext-proc globally across all Gateway filter chains, including TLS passthrough chains which have no HTTPConnectionManager. Removing it fixes the conflict. The extAuth SecurityPolicy is safe to coexist with TLS passthrough and is restored (it was already live on the cluster without issues). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
6524972 to
5393874
Compare
The InferencePool CRD (inference.networking.k8s.io/v1) is not installed on app-dev-1. Declaring it in backendResources causes envoy-gateway to start a watcher for it at startup. When the CRD is absent the watcher fails immediately and the controller enters CrashLoopBackOff, dropping the xDS connection and taking the whole cluster down.
9a8c4f6 to
99c945d
Compare
Removes the global SecurityPolicy/ReferenceGrant that enforced cluster-auth on the entire gateway with failOpen: false. With the envoy-gateway controller previously crash-looping (due to missing InferencePool CRD), EDS went stale and extAuth requests timed out, causing 403s across all routes. Will be restored once the cluster is stable.
…sources Install InferencePool CRD (inference-extension-crds/v1.5.0) at syncWave -35, before envoy-gateway at -30. This ensures the CRD exists when the controller starts watching for InferencePool resources, preventing the CrashLoopBackOff that took down app-dev-1. Re-enables backendResources: InferencePool in extensionManager now that the CRD installation order is guaranteed. Restores the global SecurityPolicy and ReferenceGrant for cluster-auth extAuth.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.