fix(env,sql,cdc): prevent env name collision from corrupting SQL DSNs#367
Merged
Conversation
Why: A single same-named env.variable in any namespace could take down every db.sql.postgres service non-deterministically, crash-looping with "password authentication failed for user \"password=\"" — the fingerprint of an empty resolved field, not a real credential. Boot ordering decided which service lost the race, so the same config booted fine on some restarts and failed on others. What: Fixes a chain of three independent defects so each layer fails safely and loudly. env registry (root cause): env variables were keyed in a global flat namespace by their short Name; a Name collision rejected the second registration before storing it by ID, so full-ID lookups failed. Now the variable is always stored by ID and variablesByName is a first-wins shortcut; the shortcut delete is ownership-guarded so a departing variable cannot remove another variable's shortcut. sql and cdc managers: a configured *_env that failed to resolve silently left the field blank. resolveEnv now fails fast with NewUnresolvedEnvError naming the field and variable, distinguishing "not configured" (keep the static value) from "configured but unresolvable" (hard error). sql DSN builder: unquoted fields let an empty username produce "user= password=secret", which lib/pq misparses by absorbing the next token. buildDSN now refuses empty required fields and quotes all Postgres keyword/value fields; CDC buildDSNs likewise rejects empty resolved fields and runs before the Update teardown so a bad DSN aborts non-destructively.
wolfy-j
approved these changes
Jun 21, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
A single same-named
env.variablein any namespace could take down everydb.sql.postgresservice non-deterministically, crash-looping withpq: password authentication failed for user "password="(or"password=secret") — the fingerprint of an empty resolved field, not a real credential. Boot ordering decided which service lost the race, so the same config booted fine on some restarts and failed on others.What
Fixes a chain of three independent defects so each layer fails safely and loudly:
Name; aNamecollision rejected the second registration before storing it by ID, so full-ID lookups failed. Now always store by ID;variablesByNameis a first-wins shortcut, and the shortcut delete is ownership-guarded so a departing variable cannot remove another's shortcut.*_envthat failed to resolve silently left the field blank.resolveEnvnow fails fast withNewUnresolvedEnvErrornaming the field/var, distinguishing "not configured" (keep static) from "configured but unresolvable" (hard error).user= password=secret, which lib/pq misparses by absorbing the next token.buildDSNnow rejects empty required fields and quotes all Postgres keyword/value fields; CDCbuildDSNslikewise rejects empty resolved fields and runs before the Update teardown.Testing
Validated locally on darwin/arm64 (no live Postgres; CDC
//go:build integrationtests not run):make build-wippy— green.go test -race ./system/env/... ./service/sql/... ./service/cdc/postgres/...— all pass.golangci-lint runon the three packages — 0 issues.environment variable not found) and passes on the fix; a standalone demo reproduces the exact corrupt DSN and error fingerprints.New tests cover cross-namespace coexistence + full-ID lookup, ownership-guarded delete, fail-fast on unresolvable env, and DSN validation/quoting (including the empty-username "absorb next token" case).