test(e2e-proxy): exercise application-pod egress through the proxy (Charité setup) #264
The squid harness proved NODE egress (image pulls) but stopped before any application pod, so it never caught client-runtime#119, where the spawned ingestion Job carried no proxy env and dialled the backend directly. Add a section that runs a pod WITH the ingestion-style proxy env (must traverse the squid to reach the backend) and a pod WITHOUT it (must bypass it / go direct), asserting both against the squid access log. Models the Charité proxy-only setup at the application layer; pairs with the behavioural unit tests on client-runtime#119.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
👋 Heads-up: Code review queue is at 32 / 30, above the WIP limit. The team convention is to review existing PRs before opening new work. Open PRs currently in Code review (oldest first):
Pull from review before opening new work. (This is a nudge from the kanban WIP check, not a block.)
Ran this against a real k3d cluster: found a bug, do not merge as-is.
The node image-pull section (§3) is fine. The new app-pod egress section (§4) is broken: a pod cannot resolve host.k3d.internal.
Fix direction: point the app-pod test at a pod-reachable proxy, an in-cluster squid Deployment+Service (reachable via Service DNS, and a closer model of a real corporate proxy reachable by name), rather than the host squid. Locally the Service DNS resolves; I am still finalising the squid pod's serving config. Keeping this as a draft until it runs green.
Note: the routing contract (emitted env → backend via proxy, in-cluster bypassed, no-proxy → direct) is already verified by the behavioural unit tests on client-runtime#123. That is the coverage to gate the #122 merge on; this harness E2E is defence-in-depth.
Running the first version on a real k3d cluster surfaced that a POD cannot resolve host.k3d.internal (it is a node-level alias for image pulls, not pod DNS), so the proxied probe failed with `curl (5) Could not resolve proxy`.
Rework: stand up an in-cluster squid Deployment+Service the test pods reach by Service DNS (also a closer model of a real corporate proxy reachable by name), with a readiness probe gating rollout on squid actually listening (fixes the probe-before-bind race seen in the first attempt). A pod WITH the ingestion proxy env must reach the backend through the squid; a pod WITHOUT it must bypass it. Auth survival stays covered by the host-squid sections (1-3).
bash -n + shellcheck + embedded-YAML parse all clean; Service-DNS resolution verified locally. The full proxied-curl run is exercised by the e2e-proxy CI job.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Reworked + out of draft.
Running it against a real k3d cluster (thanks for the docker) is what flagged the original bug: a pod can't resolve host.k3d.internal.
Fix (latest commit): the app-pod section now stands up an in-cluster squid Deployment+Service that the test pods reach by Service DNS. That is pod-reachable, and a closer model of a real corporate proxy reachable by name. A readiness probe gates rollout on squid actually listening, which fixes the "connect refused after 1ms" race the first in-cluster attempt hit. Auth survival stays covered by §1-3's host squid; this section is purely about proxy-env routing.
Verified locally:
…pod, curl -v)
§4 now uses ONE pod carrying the ingestion-style proxy env that makes two
calls to the same backend: WITH the env it must tunnel via the in-cluster
squid (a CONNECT tunnel); with the env unset it must dial direct. Proof is
taken client-side from `curl -v` (the CONNECT-tunnel lines), not by reading
squid's access.log — that file is buffered by the log daemon and came back
empty when read right after the probe, producing false failures.
Also set the proxy env in BOTH letter cases: curl honours the lower-case
`https_proxy` for HTTPS and the upper-case alone is not reliably picked up,
so the probe must emit both, exactly as the real ingestion env does. A
single pod with a single log also removes the multi-pod scheduling /
log-flush races that made the earlier two-pod form flaky.
Validated end-to-end on k3d:
A (proxy env) -> "Establish HTTP proxy tunnel to api.tracebloc.io:443"
+ "CONNECT tunnel established, response 200" + 200 OK
B (env unset) -> direct connect to the backend IP, no proxy tunnel, 200
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
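The "emit both letter cases" point in the commit message above can be sketched as a small shell helper. This is a minimal illustration, not the harness's actual code: the function name `emit_proxy_env`, the Service DNS name, and the NO_PROXY list in the usage comment are all assumptions made up for the example.

```shell
#!/bin/sh
# emit_proxy_env PROXY_URL NO_PROXY_LIST  (hypothetical helper, not from the PR)
# Prints KEY=VALUE lines for the proxy env in BOTH upper and lower case,
# so tools that read only one spelling still see the proxy.
emit_proxy_env() {
  for name in HTTP_PROXY http_proxy HTTPS_PROXY https_proxy; do
    printf '%s=%s\n' "$name" "$1"
  done
  for name in NO_PROXY no_proxy; do
    printf '%s=%s\n' "$name" "$2"
  done
}

# Usage (the squid Service name and the NO_PROXY entries are assumed values):
#   emit_proxy_env "http://squid.default.svc.cluster.local:3128" \
#                  "localhost,127.0.0.1,.svc,.cluster.local"
```

Generating both spellings from one source value keeps the two cases from drifting apart, which is the failure mode the commit message describes.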
✅ Validated green end-to-end on k3d
Stood up the in-cluster squid + the §4 probe against a live k3d cluster and proved both directions. Pushed in
A (pod carries the ingestion-style proxy env) → backend reached through the in-cluster squid (a real CONNECT tunnel).
B (same pod, proxy env unset) → the same call dials the backend's IP directly.
That is exactly the #119 property: ingestion-style backend egress is proxied when the env is present, and only then.
Two substantive changes from the first cut
Design note
Collapsed the earlier two-pod form into one pod making two calls (with env / env-unset). One pod + one log removes the multi-pod scheduling and log-flush races, so the assertion is deterministic in CI.
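The client-side proof for A and B could look roughly like the sketch below, which classifies a saved `curl -v` log. It is an illustration under assumptions: `classify_egress` is a hypothetical name, and the matched strings are the tunnel lines quoted in this PR, whose exact wording may differ between curl versions.

```shell
#!/bin/sh
# classify_egress LOGFILE  (hypothetical helper, not from the PR)
# Prints "proxied", "direct", or "unknown" for a captured `curl -v` log.
classify_egress() {
  if grep -q 'CONNECT tunnel established, response 200' "$1"; then
    # The proxy confirmed the tunnel: the request went through the squid.
    echo proxied
  elif grep -qi 'proxy' "$1"; then
    # A proxy was mentioned but no confirmed tunnel: do not guess.
    echo unknown
  elif grep -Eq '^< HTTP/[0-9.]+ 200' "$1"; then
    # A 200 response with no proxy lines at all: direct dial.
    echo direct
  else
    echo unknown
  fi
}

# Usage (the request path is an assumed example):
#   curl -sv -o /dev/null https://api.tracebloc.io/ 2>probe-a.log
#   [ "$(classify_egress probe-a.log)" = "proxied" ]
```

Treating any unconfirmed proxy mention as "unknown" is a deliberately conservative choice, so a half-failed tunnel is never counted as a direct dial.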
Draft — needs a CI run to validate (I can't run k3d/squid/docker locally).
What
Extends the e2e-proxy.sh squid harness ("the Charité/hospital archetype") to cover application-pod egress, not just node image pulls. After the cluster is up behind the authenticated squid, it:
runs a pod with the ingestion-style proxy env (HTTP(S)_PROXY = squid, cluster-safe NO_PROXY) → asserts that its backend CONNECT api.tracebloc.io appears in the squid access log (authenticated);
Why
The harness proved NODE egress but stopped before any application pod, so it never caught client-runtime#119: the spawned ingestion Job carried no proxy env and dialled the backend directly (Charité: [Errno 111] Connection refused). This is the layer the fix lives at.
Why draft
bash -n + shellcheck pass, but I can't run k3d/squid/docker in my environment, so the runtime behaviour is unverified locally. The installer-tests.yaml e2e-proxy job is the validator. The one assumption to confirm there: a pod can reach the host's squid via host.k3d.internal:3128 (k3d publishes host.k3d.internal into CoreDNS NodeHosts, so it should, but CI confirms). The behavioural routing contract itself is already verified by the unit tests on client-runtime#119.
Note
In a real proxy-only network (Charité) the no-proxy pod's direct dial is refused; the test cluster's nodes have direct egress, so this asserts the absence of a proxied CONNECT instead. A true "direct refused" would need egress-blocking (k3d's flannel doesn't enforce NetworkPolicy) — tracked as a follow-up if we want it.