test(e2e-proxy): deflake §4 app-egress — hermetic target + non-silent diagnostics#269
Merged
§4 ("APPLICATION-pod egress through a proxy", client-runtime#119) was a flaky
required check ("E2E auth-proxy (squid)") that intermittently red-X'd develop
(~1 in 4; e.g. run 27765964135) and randomly blocked unrelated PRs. Two causes:
1. Silent failure. Under `set -euo pipefail` the diagnostic `grep | sed` lines
ran before the real assertion; an empty section made grep exit 1 → pipefail
→ set -e killed the script with NO output (CI showed only "pod/egress-app
created" then "exit code 1"). Append `|| true` so the diagnostics are
non-fatal and the assertion fires with its reason. Same footgun fixed in §3.
2. External-network dependency (the real flake). §4 curled the real
https://api.tracebloc.io/ through the in-cluster squid, depending on the
runner's internet to a production host at test time. Make it hermetic: target
a reserved-TLD stand-in host (backend.tracebloc-e2e.test) aliased via
hostAliases on both the squid and app pods to the cluster's own kube-apiserver
ClusterIP — a guaranteed in-cluster HTTPS:443 listener. The CONNECT tunnel now
terminates in-cluster with zero external I/O, preserving the #119 intent
(WITH proxy env → CONNECT tunnel via squid; env unset → direct dial).
Validated: 3/3 deterministic local passes; both calls hit 10.43.0.1 in-cluster
(no api.tracebloc.io reachout). bash -n + shellcheck --severity=error clean.
Closes #268
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
shujaatTracebloc
approved these changes
Jun 19, 2026
What & why
`scripts/tests/e2e-proxy.sh` §4, the `egress-app` application-pod egress test (client-runtime#119), is the flaky required check "E2E auth-proxy (squid)". It fails intermittently on `develop` itself (~1 in 4 recent runs; e.g. run 27765964135 failed while neighbouring runs passed), so it randomly blocks unrelated PRs. This is a flaky-required-check fix, branched off `develop` (it's develop's test, not specific to any feature branch).

Two distinct root causes, both fixed:
1. Silent failure masked the real cause

Under `set -euo pipefail`, the diagnostic `grep | sed` lines ran before the real assertion. When the captured curl section was empty (the failure case), `grep` exited 1 → `pipefail` failed the pipeline → `set -e` killed the script with no output. The CI log showed only `pod/egress-app created` then `##[error]Process completed with exit code 1`; the informative `error "App pod WITH the ingestion proxy env did NOT tunnel…"` never ran.

Fix: the two diagnostic `grep | sed` lines now end in `|| true`, so they're non-fatal and the real assertions fire and report the actual reason. (The identical footgun one section up, §3's squid-access-log preview, is fixed the same way.)

2. External-network dependency (the actual flake)

§4's `egress-app` pod curled the real `https://api.tracebloc.io/` through a freshly-deployed in-cluster squid. That depends on the in-cluster squid having working egress to a production host at the exact moment the test runs (DNS/latency/transient unavailability + pod-startup timing), which is inherently flaky on CI runners. The failing run exited ~13s after creating the pod.

Fix: make it hermetic. Section A/B now target a reserved-TLD stand-in host, `backend.tracebloc-e2e.test` (RFC 6761 `.test`, guaranteed never to resolve publicly), aliased via `/etc/hosts` (`hostAliases`) on both the squid pod and the app pod to the cluster's own kube-apiserver ClusterIP, a guaranteed, always-up in-cluster HTTPS:443 listener. The squid's CONNECT tunnel terminates against a real in-cluster TLS endpoint; the test never leaves the cluster.

The client-runtime#119 intent is intact: with the proxy env set, the app pod's call goes through squid as a CONNECT tunnel; with the env unset, it dials directly.

(`-k`: the apiserver presents the cluster-CA cert, untrusted here. We assert proxy routing, not TLS trust, so verification is skipped and both calls complete to a real 401.)

Validation
3/3 deterministic local passes on a real k3d cluster (arm64). In the §4 output, both calls hit `10.43.0.1` (the in-cluster apiserver), with no `api.tracebloc.io` reachout.

`bash -n scripts/tests/e2e-proxy.sh` ✓

`shellcheck --severity=error --shell=bash` (the CI gate) ✓, and `--severity=warning` clean too.

Closes #268
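For reviewers who want to see root cause 1 in isolation, here is a minimal sketch of the `pipefail` footgun. It is not taken from `e2e-proxy.sh`: the `run_preview` helper, the `section` variable and the `CONNECT` pattern are hypothetical stand-ins for the script's diagnostic preview. Each variant runs in a child `bash` so that its `set -e` behaves as it would at the top level of a script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the diagnostic preview; the real variable
# names and grep patterns in e2e-proxy.sh may differ.
run_preview() {
  # $1 is the suffix appended to the diagnostic pipeline: '' or '|| true'.
  # A child bash is used so its set -e is not suppressed by the caller.
  bash -c "set -euo pipefail
section=''   # empty captured section, i.e. the failure case
printf '%s' \"\$section\" | grep 'CONNECT' | sed 's/^/    /' $1
echo 'assertion reached'"
}

# Variant A: bare pipeline. grep matches nothing and exits 1, pipefail
# propagates that status, and set -e ends the child before the echo.
fatal_status=0
fatal_output="$(run_preview '')" || fatal_status=$?

# Variant B: the same pipeline with '|| true', so execution continues.
safe_status=0
safe_output="$(run_preview '|| true')" || safe_status=$?

echo "without || true: status=${fatal_status} output='${fatal_output}'"
echo "with    || true: status=${safe_status} output='${safe_output}'"
```

In variant A the child exits with status 1 and prints nothing, which matches the silent CI failure described above; in variant B the line standing in for the real assertion is reached.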
Note
Low Risk
Test-only changes to an E2E shell script; no production runtime, auth, or deployment logic.
Overview
Stabilizes the E2E auth-proxy check in `scripts/tests/e2e-proxy.sh` by fixing two failure modes in §3–§4.

Diagnostics no longer abort the script: preview `grep | sed` pipelines (squid access log and curl log sections) now end with `|| true`, so under `set -euo pipefail` an empty match cannot exit before the real `error` assertions run.

§4 app-pod egress is hermetic: instead of curling production `api.tracebloc.io` through the in-cluster squid (CI internet flake), both the squid deployment and `egress-app` pod use `hostAliases` to map `backend.tracebloc-e2e.test` to the cluster kube-apiserver ClusterIP. Manifests are applied via unquoted heredocs so `${APISERVER_IP}`/`${BACKEND_HOST}` substitute; curl uses `-k` to assert proxy routing (CONNECT tunnel vs direct dial), not TLS trust. Assertions and pass messaging were updated for the stand-in host.

Reviewed by Cursor Bugbot for commit ca66e5c.
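The unquoted-heredoc plus `hostAliases` approach described in the overview can be sketched roughly as follows. This is an illustration under assumptions, not the manifest from the PR: the variable names `APISERVER_IP` and `BACKEND_HOST` and the value `10.43.0.1` come from the text above, while the `render_app_pod` helper, the container image and the container command are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# 10.43.0.1 is the apiserver ClusterIP reported for the PR's k3d cluster.
# On a live cluster it could be looked up with something like:
#   kubectl get svc kubernetes -n default -o jsonpath='{.spec.clusterIP}'
APISERVER_IP="10.43.0.1"
BACKEND_HOST="backend.tracebloc-e2e.test"

render_app_pod() {
  # Unquoted delimiter (EOF rather than 'EOF'), so the shell expands
  # ${APISERVER_IP} and ${BACKEND_HOST} before the YAML is emitted.
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: egress-app
spec:
  hostAliases:
    - ip: "${APISERVER_IP}"
      hostnames:
        - "${BACKEND_HOST}"
  containers:
    - name: app
      image: curlimages/curl   # hypothetical image choice
      command: ["sleep", "3600"]
EOF
}

manifest="$(render_app_pod)"
printf '%s\n' "$manifest"
# Against a real cluster the output would be piped to: kubectl apply -f -
```

With a quoted delimiter (`<<'EOF'`) the `${...}` placeholders would reach the manifest literally, so the alias would not carry a real IP; that is presumably why the heredocs are unquoted.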