Eai 5821 add envoy ai gateway by johnl-amd · Pull Request #724 · silogen/cluster-forge

johnl-amd · 2026-05-29T04:47:05Z

No description provided.

envoy_gateway_cluster_role_for_ai_resources.yaml referenced the undefined ai-gateway-helm.fullname helper, breaking ArgoCD manifest generation. Use ai-gateway-helm.controller.fullname instead, which is defined in _helpers.tpl and resolves to ai-gateway-controller.

aim-gateway is a thin Helm chart (v0.1.0) that creates the AIGatewayRoute and Backend resources for routing AIM Engine inference traffic through Envoy AI Gateway. The patches directory documents the two fixes applied to the ai-gateway-controller v0.6.0 source to make it work alongside TLS listeners: insertRouterLevelAIGatewayExtProc and insertRequestHeaderToMetadataFilter both returned errors instead of continuing when encountering non-HCM filter chains (e.g. the EmptyCluster default-reject chain on TLS listeners). Both are changed to continue, consistent with other functions in the same codebase.
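The patch changes one thing: a chain without an HTTP connection manager is skipped instead of causing a failure. A small Python model illustrates this. It is a sketch, not the ai-gateway-controller's Go code. `FilterChain`, its `has_hcm` flag, and `insert_ext_proc` are hypothetical stand-ins for the real xDS types and functions.

```python
from dataclasses import dataclass, field

ROUTER = "envoy.filters.http.router"
EXT_PROC = "envoy.filters.http.ext_proc"


@dataclass
class FilterChain:
    """Hypothetical, simplified view of one listener filter chain."""
    name: str
    has_hcm: bool  # True if the chain terminates in an HTTP connection manager
    http_filters: list = field(default_factory=list)


def insert_ext_proc(chains):
    """Add the ext_proc HTTP filter to every HCM chain; return the names changed."""
    changed = []
    for chain in chains:
        if not chain.has_hcm:
            # Patched behavior: skip chains with no HCM (TLS passthrough, the
            # EmptyCluster default-reject chain) instead of returning an error
            # that aborts translation for the whole listener.
            continue
        if EXT_PROC in chain.http_filters:
            continue  # already present; keep the operation idempotent
        if ROUTER in chain.http_filters:
            # The router must remain the last HTTP filter, so insert before it.
            chain.http_filters.insert(chain.http_filters.index(ROUTER), EXT_PROC)
        else:
            chain.http_filters.append(EXT_PROC)
        changed.append(chain.name)
    return changed


chains = [
    FilterChain("https", has_hcm=True, http_filters=[ROUTER]),
    FilterChain("default-reject", has_hcm=False),
]
print(insert_ext_proc(chains))  # ['https']
```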

Add aim-gateway to root/values.yaml and all cluster size enabledApps lists (small/medium/large) with syncWave 5 (after envoy-ai-gateway). Make the backends list empty by default with an example comment so clusters override it in cluster-values rather than modifying chart source. Guard the AIGatewayRoute template against an empty backends list to avoid index-out-of-range failures during Helm render.
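The sketch below models the empty-backends guard in Python rather than in Helm template syntax. The `backends` entries with `name` and `model` keys, the function name, and the simplified route dictionary are assumptions for illustration. The chart's actual values schema and template are not shown here. In the template itself, the equivalent is typically a wrapper such as `{{- if .Values.backends }}`, because an empty list is falsy in Go templates.

```python
from typing import Optional


def render_ai_gateway_route(values: dict) -> Optional[dict]:
    """Build a simplified AIGatewayRoute dict, or None when no backends are set.

    With an empty list nothing is rendered, so nothing ever indexes element 0.
    """
    backends = values.get("backends") or []
    if not backends:
        return None
    rules = [
        {
            # Hypothetical shape: one rule per backend, selected by the model header.
            "matches": [{"headers": [{"name": "x-ai-eg-model", "value": b["model"]}]}],
            "backendRefs": [{"name": b["name"]}],
        }
        for b in backends
    ]
    return {"kind": "AIGatewayRoute", "spec": {"rules": rules}}


print(render_ai_gateway_route({"backends": []}))  # None
```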

The TLSRoute created a non-HCM filter chain on the HTTPS listener that triggered the ai-gateway v0.6.0 bug. Removing it means the stock v0.6.0 image works without patching. The route is out of scope for EAI-5821 and it is unclear whether anything depended on it.

- Add llm_input_token, llm_output_token, llm_total_token dynamic metadata fields to the Envoy JSON access log via %DYNAMIC_METADATA(envoy.filters.http.ext_proc:...)%
- Add model field from x-ai-eg-model header for per-model attribution
- Add OpenTelemetryCollector DaemonSet to aim-gateway chart that reads Envoy proxy access logs and emits aim_gateway_inference_requests_total{model} counter to Mimir via the OTel count connector
- Token values are also available in Loki for sum_over_time queries

Closes EAI-6232
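The sketch below makes the access-log fields concrete. It aggregates JSON access-log lines in two ways: a per-model request count, as the count connector would produce, and per-model token totals, as a Loki sum_over_time query would. It assumes one JSON object per line, using the field names listed above. The `"unknown"` fallback for a missing model and the function name are hypothetical.

```python
import json
from collections import Counter, defaultdict


def summarize_access_log(lines):
    """Per-model request counts and total-token sums from JSON access-log lines."""
    requests = Counter()
    tokens = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        model = entry.get("model") or "unknown"  # hypothetical fallback label
        requests[model] += 1  # like a count connector: one increment per log record
        total = entry.get("llm_total_token")
        if total is not None:  # null when ext_proc metadata never reached the log
            tokens[model] += int(total)  # like sum_over_time over the token field
    return dict(requests), dict(tokens)


log = [
    '{"model": "llama-3", "llm_input_token": 10, "llm_output_token": 5, "llm_total_token": 15}',
    '{"model": "llama-3", "llm_total_token": null}',
]
print(summarize_access_log(log))  # ({'llama-3': 2}, {'llama-3': 15})
```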

Adds the extensionManager block so Envoy Gateway registers the AI Gateway controller (port 1063) as its xDS extension server. Without this, the ext_proc HTTP filter is never injected into the Envoy listener, leaving llm_input_token / llm_output_token / llm_total_token null in every access log entry even though the extproc sidecar is running. Also persists enableBackend: true (required for Backend CRD references), which had previously been applied only as a live cluster patch.

Add Backend + EnvoyExtensionPolicy + GatewayConfig so that llm_input_token, llm_output_token, and llm_total_token are populated in Envoy access logs for inference requests. Root cause: Envoy's setDynamicMetadata() silently drops ext_proc response metadata unless writableNamespaces lists the namespace. The EnvoyExtensionPolicy now declares io.envoy.ai_gateway as a writable namespace so token metadata reaches StreamInfo. A GatewayConfig with globalLLMRequestCosts ensures the sidecar emits token metadata for K8s Service backends (non-InferencePool routes), where aigw_route_name is absent and route-scoped costs never match. A Gateway annotation links to the GatewayConfig so the AI Gateway controller includes global costs in the filter config secret.
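The writableNamespaces behavior can be modeled in a few lines. This is an illustrative Python sketch, not Envoy's implementation. Dictionaries stand in for StreamInfo dynamic metadata. Envoy drops non-writable namespaces silently, but the hypothetical `apply_ext_proc_metadata` returns the dropped names so that the effect is visible.

```python
def apply_ext_proc_metadata(stream_metadata, response_metadata, writable_namespaces):
    """Merge ext_proc response metadata; return namespaces that were dropped.

    Hypothetical model: only namespaces listed as writable reach the stream.
    """
    dropped = []
    for namespace, fields in response_metadata.items():
        if namespace not in writable_namespaces:
            dropped.append(namespace)  # Envoy would discard this without a trace
            continue
        stream_metadata.setdefault(namespace, {}).update(fields)
    return dropped


response = {"io.envoy.ai_gateway": {"llm_input_token": 10, "llm_output_token": 5, "llm_total_token": 15}}

before = {}
print(apply_ext_proc_metadata(before, response, set()))  # ['io.envoy.ai_gateway']

after = {}
apply_ext_proc_metadata(after, response, {"io.envoy.ai_gateway"})
print(after["io.envoy.ai_gateway"]["llm_total_token"])  # 15
```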

Replace the aim-gateway DaemonSet filelog collector (which parsed JSON access logs to derive aim_gateway_inference_requests_total) with a single Prometheus scrape job in the cluster-wide otel-collector-metrics-rest. The ai-gateway-extproc sidecar exposes native gen_ai_* metrics (gen_ai_client_token_usage, gen_ai_server_request_duration_seconds, etc.) at port 1064/metrics on the Envoy data-plane pod. Pod discovery filters on app.kubernetes.io/component=proxy in the envoy-gateway-system namespace.
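The pod-discovery filter can be sketched as a plain function that turns discovered pods into scrape URLs. The `Pod` record and function name are hypothetical. In a Prometheus-style scrape job, this filtering is usually expressed with `kubernetes_sd_configs` (role `pod`) plus `keep` relabel rules on `__meta_kubernetes_namespace` and `__meta_kubernetes_pod_label_app_kubernetes_io_component`. The namespace, label, and port values come from the commit message, while the example pods are made up.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    """Hypothetical pod record, as a service-discovery step might return it."""
    name: str
    namespace: str
    ip: str
    labels: dict


def extproc_scrape_targets(pods, namespace="envoy-gateway-system",
                           component="proxy", port=1064):
    """Return metrics URLs for Envoy data-plane pods that carry the extproc sidecar."""
    targets = []
    for pod in pods:
        if pod.namespace != namespace:
            continue  # keep only the gateway's namespace
        if pod.labels.get("app.kubernetes.io/component") != component:
            continue  # keep only data-plane proxy pods
        targets.append(f"http://{pod.ip}:{port}/metrics")
    return targets


pods = [
    Pod("envoy-abc", "envoy-gateway-system", "10.0.0.5", {"app.kubernetes.io/component": "proxy"}),
    Pod("other-xyz", "envoy-gateway-system", "10.0.0.6", {"app.kubernetes.io/component": "controller"}),
    Pod("web", "default", "10.0.0.7", {"app.kubernetes.io/component": "proxy"}),
]
print(extproc_scrape_targets(pods))  # ['http://10.0.0.5:1064/metrics']
```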

Wires Envoy Gateway native apiKeyAuth into the aim-gateway chart so that inference requests require a valid Bearer token validated against a k8s Secret. The matched client ID flows downstream as x-api-key-id for future rate limiting and metrics labeling. Targets the HTTPRoute (aim-inference-route) rather than the Gateway to avoid conflicting with the cluster-auth ExtAuth SecurityPolicy at Gateway level. Gated by both .Values.backends and .Values.apiKeyAuth.enabled — no-op when disabled or no backends configured. Enables apiKeyAuth for the small cluster profile via values_small.yaml.
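The sketch below shows the request-side contract of apiKeyAuth in Python: a Bearer token goes in, and the matched client ID comes out as `x-api-key-id`. This is not Envoy Gateway's code. `credentials` stands in for the client-ID-to-key pairs read from the Secret, and header names are assumed to be lowercased already. The sketch does not model whether the original Authorization header is stripped before forwarding.

```python
import hmac
from typing import Optional


def authenticate(headers: dict, credentials: dict) -> Optional[dict]:
    """Validate a Bearer token against client-ID -> key pairs.

    Returns the headers to forward upstream, or None if the request should be
    rejected. Header names are assumed to be lowercased already.
    """
    scheme, _, token = headers.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        return None
    for client_id, key in credentials.items():
        # Constant-time comparison to avoid leaking key contents via timing.
        if hmac.compare_digest(token.encode(), key.encode()):
            forwarded = dict(headers)
            # Overwrite any client-supplied value so the ID cannot be spoofed.
            forwarded["x-api-key-id"] = client_id
            return forwarded
    return None


creds = {"team-a": "example-key-123"}  # hypothetical Secret contents
req = {"authorization": "Bearer example-key-123", "x-api-key-id": "admin"}
print(authenticate(req, creds)["x-api-key-id"])  # team-a
```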

…QuotaPolicy

Routes x-api-key-id (injected by SecurityPolicy) and x-aim-service-id (injected per backendRef via headerMutation) through to gen_ai_* metric labels via the extproc's native metricsRequestHeaderAttributes mapping. No custom code needed.

Adds a QuotaPolicy template that creates per-API-key token quota buckets using clientSelectors type:Distinct on x-api-key-id. Shadow mode by default so quota consumption is observed in metrics before enforcement is turned on.
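The sketch below models the per-key, shadow-mode quota as a hypothetical in-memory class. It covers only the decision logic: one bucket per distinct `x-api-key-id`, and in shadow mode the request is observed but still allowed. It deliberately omits window resets, the post-response token accounting a real rate limiter would perform, and the QuotaPolicy CRD fields.

```python
from dataclasses import dataclass, field


@dataclass
class PerKeyTokenQuota:
    """Hypothetical per-API-key token bucket with an optional shadow mode."""
    limit: int                  # tokens allowed per key (window reset not modeled)
    shadow_mode: bool = True
    used: dict = field(default_factory=dict)
    over_limit_events: int = 0  # what shadow mode lets you observe before enforcing

    def charge(self, api_key_id: str, tokens: int) -> bool:
        """Record token usage for one key; return False only if the request is rejected."""
        # Distinct selector: every x-api-key-id value gets its own bucket.
        new_total = self.used.get(api_key_id, 0) + tokens
        if new_total > self.limit:
            self.over_limit_events += 1
            if not self.shadow_mode:
                return False  # enforcing: reject without charging the bucket
        self.used[api_key_id] = new_total
        return True


shadow = PerKeyTokenQuota(limit=100)
print(shadow.charge("team-a", 80), shadow.charge("team-a", 50), shadow.over_limit_events)  # True True 1
```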

QuotaPolicy defaultBucket (global safety-net) has no shadowMode field, so it enforces immediately even when per-key rules are in shadow mode. Gate it behind quotaPolicy.globalEnforced (default false) so the initial rollout only observes per-key consumption without hard limits. Add BOOTSTRAP comment to security-policy.yaml documenting that the API key Secret must be pre-created; it is not managed by this chart.
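A single decision function can model how per-key shadow mode interacts with the global default bucket. The function and parameter names are hypothetical. `global_enforced` plays the role of the `quotaPolicy.globalEnforced` value: when it is false, the global bucket does not participate at all.

```python
def request_allowed(per_key_allowed: bool, global_used: int, tokens: int,
                    global_limit: int, global_enforced: bool = False) -> bool:
    """Combine a per-key decision with an optional global safety-net bucket.

    The global default bucket has no shadow mode, so leaving it out entirely is
    the only rollout-safe option; global_enforced=False models that.
    """
    if not per_key_allowed:
        return False
    if global_enforced and global_used + tokens > global_limit:
        return False  # the safety net rejects regardless of per-key shadow mode
    return True


print(request_allowed(True, 990, 20, 1000))                        # True
print(request_allowed(True, 990, 20, 1000, global_enforced=True))  # False
```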

…cy namespace

Envoy Gateway requires apiKeyAuth credentialRefs to be in the same namespace as the SecurityPolicy. All AIM SecurityPolicies are created in the AIM deployment namespace (workbench), so the secret must also live there.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ster-auth

Per-AIM SecurityPolicies are created in the workload namespace (e.g., workbench) but need to reference the cluster-auth extAuth Service in the cluster-auth namespace. The existing ReferenceGrant only allowed envoy-gateway-system. Without this fix, Envoy Gateway sets direct_response: 500 on all AIGatewayRoute rules, breaking inference. Also includes AIWB RBAC additions for Envoy Gateway and InferencePool resources.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
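A simplified model of Gateway API ReferenceGrant matching shows why the extra grant entry is needed. A cross-namespace reference is allowed only if a grant in the target namespace lists the referring kind and namespace under `from`. The dataclasses are hypothetical simplifications: real grants list `to` entries by group and kind, optionally by name. The sketch also assumes that the chart's grant uses the `gateway.envoyproxy.io`/`SecurityPolicy` pair.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class From:
    """Hypothetical 'from' entry: who is allowed to make the reference."""
    group: str
    kind: str
    namespace: str


@dataclass
class ReferenceGrant:
    """Simplified ReferenceGrant; it lives in the namespace being referenced."""
    namespace: str
    from_: list = field(default_factory=list)  # list of From entries
    to_kinds: set = field(default_factory=set)  # e.g. {"Service"}


def reference_allowed(grants, src: From, to_namespace: str, to_kind: str) -> bool:
    if src.namespace == to_namespace:
        return True  # same-namespace references need no grant
    return any(
        g.namespace == to_namespace and to_kind in g.to_kinds and src in g.from_
        for g in grants
    )


policy = From("gateway.envoyproxy.io", "SecurityPolicy", "workbench")
grant = ReferenceGrant(
    "cluster-auth",
    from_=[From("gateway.envoyproxy.io", "SecurityPolicy", "envoy-gateway-system")],
    to_kinds={"Service"},
)
print(reference_allowed([grant], policy, "cluster-auth", "Service"))  # False
grant.from_.append(policy)  # the fix: also allow SecurityPolicies from workbench
print(reference_allowed([grant], policy, "cluster-auth", "Service"))  # True
```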

The extproc sidecar exposes gen_ai_client_token_usage with api_key_id and aim_service_id labels on port 1064 (named aigw-admin). Wire Prometheus scraping via a PodMonitor targeting the Envoy pods.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

aim-gateway-controller now owns all gateway resources (AIGatewayRoute, InferencePool, SecurityPolicy) — the static aim-gateway Helm chart is superseded. extproc metrics are scraped by otel-collector-metrics-rest so the per-chart PodMonitor is redundant.

Authentication flows through cluster-auth extAuth — no Kubernetes secret with raw API keys is written or read. The gatewayApiKeys values block and the two GATEWAY_API_KEYS_SECRET_* env vars were from an earlier APIKeyAuth design that was replaced.

…ok cert

The TLSRoute on the shared Gateway caused envoy-gateway xDS translation to fail when inserting the ext-proc request header filter: TLS passthrough filter chains have no HTTPConnectionManager. This blocked all xDS pushes to the envoy proxy, leaving it stuck in a startup probe loop.

- Remove k8s-passthrough listener from Gateway spec
- Delete tlsroute-k8s-passthrough.yaml
- Fix admission_webhook.yaml: hoist cert variable declarations above the cert-manager/self-signed branch so caBundle is always populated when using self-signed certs

- Delete gateway-extension-kgateway-system.yaml and job-restart-kgateway.yaml from cluster-auth 0.5.9. These were kgateway artifacts carried over from 0.5.0 and have no effect (nothing references them post-migration), but their presence is misleading and wrong.
- Add valuesObject for envoy-ai-gateway in root/values.yaml to override the default-insecure-seed, with a comment requiring per-deployment override. The seed was previously invisible in root/values.yaml, making it easy to ship the insecure default to production unnoticed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Restores kubectl access to the cluster via k8s.<domain>:443. The TLS passthrough will be moved to a separate Gateway to avoid the ext-proc/HTTPConnectionManager conflict properly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ay xDS

The SecurityPolicy targeting the https Gateway caused an xDS translation failure: extAuth cannot be applied to TLS passthrough filter chains, which have no HTTPConnectionManager. Removing it unblocks envoy-gateway and restores kubectl access via k8s-passthrough. Will re-implement auth at the HTTPRoute level to avoid the listener conflict.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…fig annotation

The aigateway.envoyproxy.io/gateway-config annotation was the root cause of the xDS crash: it tells envoy-ai-gateway to inject ext-proc globally across all Gateway filter chains, including TLS passthrough chains, which have no HTTPConnectionManager. Removing it fixes the conflict. The extAuth SecurityPolicy is safe to coexist with TLS passthrough and is restored (it was already live on the cluster without issues).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The InferencePool CRD (inference.networking.k8s.io/v1) is not installed on app-dev-1. Declaring it in backendResources causes envoy-gateway to start a watcher for it at startup. When the CRD is absent the watcher fails immediately and the controller enters CrashLoopBackOff, dropping the xDS connection and taking the whole cluster down.

Removes the global SecurityPolicy/ReferenceGrant that enforced cluster-auth on the entire gateway with failOpen: false. While the envoy-gateway controller was crash-looping (due to the missing InferencePool CRD), EDS went stale and extAuth requests timed out, causing 403s across all routes. Will be restored once the cluster is stable.

…sources

Install InferencePool CRD (inference-extension-crds/v1.5.0) at syncWave -35, before envoy-gateway at -30. This ensures the CRD exists when the controller starts watching for InferencePool resources, preventing the CrashLoopBackOff that took down app-dev-1. Re-enables backendResources: InferencePool in extensionManager now that the CRD installation order is guaranteed. Restores the global SecurityPolicy and ReferenceGrant for cluster-auth extAuth.
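The ordering guarantee can be checked mechanically. The sketch below assumes only that Argo CD applies a lower sync wave before a higher one, and it treats equal waves as unordered. The wave numbers come from the commit message; the function name and the misordered example are hypothetical.

```python
def ordering_violations(waves: dict, must_precede: list) -> list:
    """Return (first, second) pairs whose waves do not put `first` strictly earlier."""
    # Equal waves are treated as unordered: nothing here guarantees that one
    # app is in place before the other starts syncing.
    return [(a, b) for a, b in must_precede if waves[a] >= waves[b]]


# Dependency stated in the commit message: the CRD must exist before the controller starts.
deps = [("inference-extension-crds", "envoy-gateway")]

fixed = {"inference-extension-crds": -35, "envoy-gateway": -30}
print(ordering_violations(fixed, deps))  # []

# Hypothetical misconfiguration: both apps in the same wave.
same_wave = {"inference-extension-crds": -30, "envoy-gateway": -30}
print(ordering_violations(same_wave, deps))  # [('inference-extension-crds', 'envoy-gateway')]
```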

johnl-amd and others added 30 commits May 11, 2026 13:40

EAI-5821: Add Envoy AI Gateway v0.6.0 sources and enable in all profiles

e4374de

EAI-5821: Add POC notes and GitOps TODOs for Envoy AI Gateway

5c8b1a7

fix: use correct dynamic metadata namespace for AI Gateway token fields

853c45b

fix: use named port aigw-admin for AI Gateway extproc scrape

060972c

EAI-164: expose gateway API key Secret env vars in AIWB chart

3496fbe

EAI-5821: Remove dead comment file and debug annotation from aim-gateway chart

92ec9bc

EAI-5821: Remove temp migration jobs, unused patch, and vendored upstream README

c16f7d4

EAI-6233: Enable InferencePool backendResources in EG extensionManager

00e0aa2

EAI-6038: Remove stale POC notes doc

e6aa059

Merge branch 'EAI_5821_evaluate_envoy_gateway_merge_main' into EAI-5821-add-envoy-ai-gateway

587014e

Merge remote-tracking branch 'origin/main' into EAI-5821-add-envoy-ai-gateway

Conflicts:
- scripts/utils/job-cluster-tls-copy.yaml
- sources/cluster-auth/0.5.0/templates/referencegrant.yaml
- sources/cluster-auth/0.5.9/templates/job-restart-envoygateway.yaml

1b1b9e1

EAI-5821: Add envoy-ai-gateway and envoy-ai-gateway-crds to components.yaml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ec49c31

johnl-amd and others added 4 commits May 29, 2026 09:09

EAI-5821: Add SBOM metadata for envoy-ai-gateway and envoy-ai-gateway-crds

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

0bc126e

johnl-amd force-pushed the EAI-5821-add-envoy-ai-gateway branch 2 times, most recently from 6524972 to 5393874 on May 29, 2026 13:57

johnl-amd force-pushed the EAI-5821-add-envoy-ai-gateway branch from 9a8c4f6 to 99c945d on June 1, 2026 05:16

johnl-amd added 3 commits June 1, 2026 05:24

EAI-5821: Enable inference-extension-crds in all cluster size profiles

8dc1abb

johnl-amd closed this Jun 10, 2026

johnl-amd wants to merge 38 commits into main from EAI-5821-add-envoy-ai-gateway

johnl-amd commented May 29, 2026

2 participants
