Eai 5821 add envoy ai gateway #724

Closed
johnl-amd wants to merge 38 commits into main from EAI-5821-add-envoy-ai-gateway

@johnl-amd

No description provided.

johnl-amd and others added 30 commits May 11, 2026 13:40
envoy_gateway_cluster_role_for_ai_resources.yaml referenced the
undefined ai-gateway-helm.fullname helper, breaking ArgoCD manifest
generation. Use ai-gateway-helm.controller.fullname instead, which
is defined in _helpers.tpl and resolves to ai-gateway-controller.
aim-gateway is a thin Helm chart (v0.1.0) that creates the
AIGatewayRoute and Backend resources for routing AIM Engine
inference traffic through Envoy AI Gateway.

The patches directory documents the two fixes applied to the
ai-gateway-controller v0.6.0 source to make it work alongside
TLS listeners: insertRouterLevelAIGatewayExtProc and
insertRequestHeaderToMetadataFilter both returned errors instead
of continuing when encountering non-HCM filter chains (e.g. the
EmptyCluster default-reject chain on TLS listeners). Both are
changed to continue, consistent with other functions in the same
codebase.
Add aim-gateway to root/values.yaml and all cluster size enabledApps
lists (small/medium/large) with syncWave 5 (after envoy-ai-gateway).

Make the backends list empty by default with an example comment so
clusters override it in cluster-values rather than modifying chart
source. Guard the AIGatewayRoute template against an empty backends
list to avoid index-out-of-range failures during Helm render.
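The empty-list guard could be sketched as the Helm template below. This is an illustration under assumptions, not the chart's actual template: the Gateway name `aim-gateway`, the per-entry `name` key under `.Values.backends`, and the exact AIGatewayRoute fields are hypothetical and should be checked against the installed CRD version.

```yaml
{{- /* Render nothing when no backends are configured, so nothing indexes into an empty list. */}}
{{- if .Values.backends }}
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: aim-inference-route          # assumed route name
spec:
  parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: aim-gateway              # hypothetical Gateway name
  rules:
    - backendRefs:
        {{- range .Values.backends }}
        - name: {{ .name }}          # hypothetical key on each backends entry
        {{- end }}
{{- end }}
```

With `backends: []` in the default values, the whole document renders empty, and clusters supply the list through cluster-values.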
The TLSRoute created a non-HCM filter chain on the HTTPS listener that
triggered the ai-gateway v0.6.0 bug. Removing it means the stock v0.6.0
image works without patching. The route is out of scope for EAI-5821
and it is unclear whether anything depended on it.
- Add llm_input_token, llm_output_token, llm_total_token dynamic metadata
  fields to the Envoy JSON access log via %DYNAMIC_METADATA(envoy.filters.http.ext_proc:...)%
- Add model field from x-ai-eg-model header for per-model attribution
- Add OpenTelemetryCollector DaemonSet to aim-gateway chart that reads
  Envoy proxy access logs and emits aim_gateway_inference_requests_total{model}
  counter to Mimir via the OTel count connector
- Token values are also available in Loki for sum_over_time queries

Closes EAI-6232
Adds the extensionManager block so Envoy Gateway registers the AI
Gateway controller (port 1063) as its xDS extension server. Without
this, the ext_proc HTTP filter is never injected into the Envoy
listener, leaving llm_input_token / llm_output_token / llm_total_token
null in every access log entry despite the extproc sidecar running.

Also persists enableBackend: true (required for Backend CRD references)
which had previously only been applied as a live cluster patch.
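For orientation, the relevant part of the Envoy Gateway Helm values might look like the fragment below. It is a sketch under assumptions: the controller Service hostname (including its namespace) and the hook list are not taken from this PR and may differ per deployment.

```yaml
config:
  envoyGateway:
    extensionApis:
      enableBackend: true            # required for Backend CRD references
    extensionManager:
      hooks:
        xdsTranslator:
          post:                      # assumed hook set
            - Translation
            - Cluster
            - Route
      service:
        fqdn:
          # Assumed Service DNS name for the AI Gateway controller.
          hostname: ai-gateway-controller.envoy-ai-gateway-system.svc.cluster.local
          port: 1063
```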
Add Backend + EnvoyExtensionPolicy + GatewayConfig to make
llm_input_token, llm_output_token, llm_total_token populate in
Envoy access logs for inference requests.

Root cause: Envoy's setDynamicMetadata() silently drops ext_proc
response metadata unless writableNamespaces lists the namespace.
The EnvoyExtensionPolicy now declares io.envoy.ai_gateway as a
writable namespace so token metadata reaches StreamInfo.

GatewayConfig with globalLLMRequestCosts ensures the sidecar emits
token metadata for K8s Service backends (non-InferencePool routes),
where aigw_route_name is absent and route-scoped costs never match.

Gateway annotation links to the GatewayConfig so the AI Gateway
controller includes global costs in the filter config secret.
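The writable-namespace declaration could look roughly like this EnvoyExtensionPolicy. Only the `io.envoy.ai_gateway` namespace comes from the description above; the resource names and the Gateway target are hypothetical.

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyExtensionPolicy
metadata:
  name: aim-gateway-extproc          # hypothetical
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: aim-gateway              # hypothetical
  extProc:
    - backendRefs:
        - group: gateway.envoyproxy.io
          kind: Backend
          name: ai-gateway-extproc   # hypothetical Backend for the extproc sidecar
      metadata:
        # Without this entry, metadata returned by ext_proc in this
        # namespace is dropped before it reaches StreamInfo.
        writableNamespaces:
          - io.envoy.ai_gateway
```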
Replace the aim-gateway DaemonSet filelog collector (which parsed JSON
access logs to derive aim_gateway_inference_requests_total) with a single
Prometheus scrape job in the cluster-wide otel-collector-metrics-rest.

The ai-gateway-extproc sidecar exposes native gen_ai_* metrics
(gen_ai_client_token_usage, gen_ai_server_request_duration_seconds, etc.)
at port 1064/metrics on the Envoy data-plane pod. Pod discovery filters on
app.kubernetes.io/component=proxy in the envoy-gateway-system namespace.
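A scrape job of this shape, placed in the collector's Prometheus receiver, would match the discovery rules described above. This is a minimal sketch; the job name is hypothetical, and the port filter assumes port 1064 is declared as a container port on the Envoy pod.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: ai-gateway-extproc        # hypothetical job name
          metrics_path: /metrics
          kubernetes_sd_configs:
            - role: pod
              namespaces:
                names:
                  - envoy-gateway-system
          relabel_configs:
            # Keep only Envoy data-plane pods.
            - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_component]
              regex: proxy
              action: keep
            # Keep only the container port that serves the extproc metrics.
            - source_labels: [__meta_kubernetes_pod_container_port_number]
              regex: "1064"
              action: keep
```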
Wires Envoy Gateway native apiKeyAuth into the aim-gateway chart so that
inference requests require a valid Bearer token validated against a k8s Secret.
The matched client ID flows downstream as x-api-key-id for future rate limiting
and metrics labeling.

Targets the HTTPRoute (aim-inference-route) rather than the Gateway to avoid
conflicting with the cluster-auth ExtAuth SecurityPolicy at Gateway level.
Gated by both .Values.backends and .Values.apiKeyAuth.enabled: the template
renders nothing when apiKeyAuth is disabled or no backends are configured.

Enables apiKeyAuth for the small cluster profile via values_small.yaml.
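A template along these lines would produce the gated SecurityPolicy. It is a sketch under assumptions: the policy name and the `.Values.apiKeyAuth.secretName` key are hypothetical, and the `forwardClientIDHeader` field should be verified against the Envoy Gateway version in use.

```yaml
{{- if and .Values.backends .Values.apiKeyAuth.enabled }}
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: SecurityPolicy
metadata:
  name: aim-inference-api-key        # hypothetical
spec:
  targetRefs:
    # Route-level target, so the Gateway-level cluster-auth policy is untouched.
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: aim-inference-route
  apiKeyAuth:
    credentialRefs:
      - group: ""
        kind: Secret
        name: {{ .Values.apiKeyAuth.secretName }}   # hypothetical values key
    extractFrom:
      - headers:
          - Authorization
    # Header that carries the matched client ID downstream.
    forwardClientIDHeader: x-api-key-id
{{- end }}
```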
…QuotaPolicy

Routes x-api-key-id (injected by SecurityPolicy) and x-aim-service-id (injected
per backendRef via headerMutation) through to gen_ai_* metric labels via the
extproc's native metricsRequestHeaderAttributes mapping. No custom code needed.

Adds a QuotaPolicy template that creates per-API-key token quota buckets using
clientSelectors type:Distinct on x-api-key-id. Shadow mode by default so quota
consumption is observed in metrics before enforcement is turned on.
QuotaPolicy defaultBucket (global safety-net) has no shadowMode field,
so it enforces immediately even when per-key rules are in shadow mode.
Gate it behind quotaPolicy.globalEnforced (default false) so the
initial rollout only observes per-key consumption without hard limits.

Add BOOTSTRAP comment to security-policy.yaml documenting that the
API key Secret must be pre-created; it is not managed by this chart.
…cy namespace

Envoy Gateway requires apiKeyAuth credentialRefs to be in the same namespace
as the SecurityPolicy. All AIM SecurityPolicies are created in the AIM
deployment namespace (workbench), so the secret must also live there.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ster-auth

Per-AIM SecurityPolicies are created in the workload namespace (e.g., workbench)
but need to reference the cluster-auth extAuth Service in the cluster-auth namespace.
The existing ReferenceGrant only allowed envoy-gateway-system. Without this fix,
Envoy Gateway sets direct_response: 500 on all AIGatewayRoute rules, breaking inference.
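The widened grant could be sketched as the ReferenceGrant below. The namespaces come from the description above; the resource name is hypothetical, and `workbench` stands in for whichever workload namespaces host per-AIM SecurityPolicies.

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-securitypolicy-to-cluster-auth   # hypothetical
  namespace: cluster-auth                      # lives next to the referenced Service
spec:
  from:
    # Previously the only allowed source namespace.
    - group: gateway.envoyproxy.io
      kind: SecurityPolicy
      namespace: envoy-gateway-system
    # Added so per-AIM policies in the workload namespace can reference extAuth.
    - group: gateway.envoyproxy.io
      kind: SecurityPolicy
      namespace: workbench
  to:
    - group: ""
      kind: Service
```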

Also includes AIWB RBAC additions for Envoy Gateway and InferencePool resources.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The extproc sidecar exposes gen_ai_client_token_usage with api_key_id
and aim_service_id labels on port 1064 (named aigw-admin). Wire
Prometheus scraping via a PodMonitor targeting the Envoy pods.
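A PodMonitor of roughly this shape would cover the wiring described above. The resource name is hypothetical, and the pod selector is assumed to match the label used for Envoy data-plane pods elsewhere in this PR.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: ai-gateway-extproc           # hypothetical
spec:
  namespaceSelector:
    matchNames:
      - envoy-gateway-system
  selector:
    matchLabels:
      app.kubernetes.io/component: proxy   # assumed Envoy pod label
  podMetricsEndpoints:
    - port: aigw-admin               # named container port 1064
      path: /metrics
```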

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
aim-gateway-controller now owns all gateway resources (AIGatewayRoute,
InferencePool, SecurityPolicy) — the static aim-gateway Helm chart is
superseded. extproc metrics are scraped by otel-collector-metrics-rest
so the per-chart PodMonitor is redundant.
Authentication flows through cluster-auth extAuth — no Kubernetes secret
with raw API keys is written or read. The gatewayApiKeys values block and
the two GATEWAY_API_KEYS_SECRET_* env vars were from an earlier APIKeyAuth
design that was replaced.
…ok cert

The TLSRoute on the shared Gateway caused envoy-gateway xDS translation to
fail when inserting the ext-proc request header filter — TLS passthrough
filter chains have no HTTPConnectionManager. This blocked all xDS pushes to
the envoy proxy, leaving it stuck in a startup probe loop.

- Remove k8s-passthrough listener from Gateway spec
- Delete tlsroute-k8s-passthrough.yaml
- Fix admission_webhook.yaml: hoist cert variable declarations above the
  cert-manager/self-signed branch so caBundle is always populated when using
  self-signed certs
- Delete gateway-extension-kgateway-system.yaml and job-restart-kgateway.yaml
  from cluster-auth 0.5.9. These are kgateway artifacts carried over from
  0.5.0; nothing references them post-migration, so they have no effect, but
  their presence is misleading.

- Add valuesObject for envoy-ai-gateway in root/values.yaml to override the
  default-insecure-seed with a comment requiring per-deployment override.
  The seed was previously invisible in root/values.yaml, making it easy to
  ship the insecure default to production unnoticed.
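The admission_webhook.yaml fix relies on Helm variable scoping: a variable first declared inside an if/else branch is not visible outside it. The sketch below illustrates hoisting the declarations to the top of the template. The common names, the `.Values.certManager.enabled` key, and the Secret name are hypothetical, and the real template may be structured differently.

```yaml
{{- /* Declared at the top level so every block below can see them. */}}
{{- $ca := genCA "admission-webhook-ca" 3650 }}
{{- $cert := genSignedCert "webhook.example.svc" nil (list "webhook.example.svc") 3650 $ca }}
{{- if .Values.certManager.enabled }}
# cert-manager branch: the CA bundle is injected by cert-manager,
# so the generated values above go unused.
{{- else }}
apiVersion: v1
kind: Secret
metadata:
  name: admission-webhook-tls        # hypothetical
type: kubernetes.io/tls
data:
  tls.crt: {{ $cert.Cert | b64enc }}
  tls.key: {{ $cert.Key | b64enc }}
  # The same $ca.Cert value is what a webhook's caBundle field would use.
  ca.crt: {{ $ca.Cert | b64enc }}
{{- end }}
```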

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-gateway

# Conflicts:
#	scripts/utils/job-cluster-tls-copy.yaml
#	sources/cluster-auth/0.5.0/templates/referencegrant.yaml
#	sources/cluster-auth/0.5.9/templates/job-restart-envoygateway.yaml
…s.yaml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
johnl-amd and others added 4 commits May 29, 2026 09:09
…-crds

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Restores kubectl access to the cluster via k8s.<domain>:443.
The TLS passthrough will be moved to a separate Gateway, which avoids
the ext-proc/HTTPConnectionManager conflict without removing the route.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ay xDS

The SecurityPolicy targeting the https Gateway caused an xDS translation
failure: extAuth cannot be applied to TLS passthrough filter chains, which
have no HTTPConnectionManager. Removing it unblocks envoy-gateway and
restores kubectl access via k8s-passthrough. Will re-implement auth at the
HTTPRoute level to avoid the listener conflict.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…fig annotation

The aigateway.envoyproxy.io/gateway-config annotation was the root cause of
the xDS crash — it tells envoy-ai-gateway to inject ext-proc globally across
all Gateway filter chains, including TLS passthrough chains which have no
HTTPConnectionManager. Removing it fixes the conflict.

The extAuth SecurityPolicy is safe to coexist with TLS passthrough and is
restored (it was already live on the cluster without issues).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@johnl-amd force-pushed the EAI-5821-add-envoy-ai-gateway branch 2 times, most recently from 6524972 to 5393874 on May 29, 2026 13:57
The InferencePool CRD (inference.networking.k8s.io/v1) is not installed
on app-dev-1. Declaring it in backendResources causes envoy-gateway to
start a watcher for it at startup. When the CRD is absent the watcher
fails immediately and the controller enters CrashLoopBackOff, dropping
the xDS connection and taking the whole cluster down.
@johnl-amd force-pushed the EAI-5821-add-envoy-ai-gateway branch from 9a8c4f6 to 99c945d on June 1, 2026 05:16
johnl-amd added 3 commits June 1, 2026 05:24
Removes the global SecurityPolicy/ReferenceGrant that enforced cluster-auth
on the entire gateway with failOpen: false. With the envoy-gateway controller
previously crash-looping (due to missing InferencePool CRD), EDS went stale
and extAuth requests timed out, causing 403s across all routes.

Will be restored once the cluster is stable.
…sources

Install InferencePool CRD (inference-extension-crds/v1.5.0) at syncWave -35,
before envoy-gateway at -30. This ensures the CRD exists when the controller
starts watching for InferencePool resources, preventing the CrashLoopBackOff
that took down app-dev-1.

Re-enables backendResources: InferencePool in extensionManager now that the
CRD installation order is guaranteed.
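The ordering can be illustrated with Argo CD sync-wave annotations, shown here as metadata-only fragments. The Application names are assumptions and the `spec` sections are omitted; the repository itself sets the wave through a `syncWave` value rather than writing the annotation by hand.

```yaml
# Lower waves sync first: the CRDs land before the controller that watches them.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: inference-extension-crds     # assumed app name
  annotations:
    argocd.argoproj.io/sync-wave: "-35"
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: envoy-gateway                # assumed app name
  annotations:
    argocd.argoproj.io/sync-wave: "-30"
```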

Restores the global SecurityPolicy and ReferenceGrant for cluster-auth extAuth.
@johnl-amd closed this on Jun 10, 2026

2 participants