fix(env,sql,cdc): prevent env name collision from corrupting SQL DSNs #367

Merged
skhaz merged 1 commit into main from fix/env-name-collision-dsn
Jun 21, 2026

@skhaz commented Jun 19, 2026

Why

A single same-named env.variable in any namespace could take down every db.sql.postgres service non-deterministically, crash-looping with pq: password authentication failed for user "password=" (or "password=secret") — the fingerprint of an empty resolved field, not a real credential. Boot ordering decided which service lost the race, so the same config booted fine on some restarts and failed on others.

What

Fixes a chain of three independent defects so each layer fails safely and loudly:

  • env registry (root cause): variables were keyed in a global flat namespace by their short Name; a Name collision rejected the second registration before storing it by ID, so full-ID lookups failed. The variable is now always stored by ID; variablesByName is a first-wins shortcut, and the shortcut delete is ownership-guarded so a departing variable cannot remove another variable's shortcut.
  • sql/cdc managers: a configured *_env that failed to resolve silently left the field blank. resolveEnv now fails fast with NewUnresolvedEnvError naming the field and variable, distinguishing "not configured" (keep the static value) from "configured but unresolvable" (hard error).
  • sql DSN builder: unquoted fields let an empty username produce user= password=secret, which lib/pq misparses by absorbing the next token. buildDSN now rejects empty required fields and quotes all Postgres keyword/value fields; CDC buildDSNs likewise rejects empty resolved fields and runs before the Update teardown, so a bad DSN aborts non-destructively.
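
For reviewers who want the registry change in miniature, here is a rough sketch of the "always store by ID, first-wins shortcut, ownership-guarded delete" behaviour described above. Only variablesByName is a name taken from this PR; the Registry and Variable types, the variablesByID field, the method names, and the ID format are hypothetical and are not the actual code in system/env.

```go
package main

import (
	"errors"
	"fmt"
)

// Variable is a hypothetical stand-in for a registered env variable.
type Variable struct {
	ID    string // full ID; the "namespace:name" format is assumed for illustration
	Name  string // short name, which may collide across namespaces
	Value string
}

// ErrNotFound mirrors the "environment variable not found" failure mode.
var ErrNotFound = errors.New("environment variable not found")

// Registry keeps every variable by ID; variablesByName is only a shortcut.
type Registry struct {
	variablesByID   map[string]*Variable
	variablesByName map[string]*Variable
}

func NewRegistry() *Registry {
	return &Registry{
		variablesByID:   make(map[string]*Variable),
		variablesByName: make(map[string]*Variable),
	}
}

// Register always stores by ID. The short-name shortcut is first-wins:
// a later variable with the same Name does not replace it and is not rejected.
func (r *Registry) Register(v *Variable) {
	r.variablesByID[v.ID] = v
	if _, taken := r.variablesByName[v.Name]; !taken {
		r.variablesByName[v.Name] = v
	}
}

// Unregister removes the variable by ID and drops the shortcut only when
// the departing variable is the one that owns it.
func (r *Registry) Unregister(id string) {
	v, ok := r.variablesByID[id]
	if !ok {
		return
	}
	delete(r.variablesByID, id)
	if owner, ok := r.variablesByName[v.Name]; ok && owner.ID == id {
		delete(r.variablesByName, v.Name)
	}
}

// Lookup tries the full ID first, then falls back to the short-name shortcut.
func (r *Registry) Lookup(key string) (*Variable, error) {
	if v, ok := r.variablesByID[key]; ok {
		return v, nil
	}
	if v, ok := r.variablesByName[key]; ok {
		return v, nil
	}
	return nil, ErrNotFound
}

func main() {
	r := NewRegistry()
	r.Register(&Variable{ID: "ns.a:DB_USER", Name: "DB_USER", Value: "alice"})
	r.Register(&Variable{ID: "ns.b:DB_USER", Name: "DB_USER", Value: "bob"})

	for _, key := range []string{"ns.a:DB_USER", "ns.b:DB_USER", "DB_USER"} {
		v, err := r.Lookup(key)
		if err != nil {
			fmt.Println(key, "->", err)
			continue
		}
		fmt.Println(key, "->", v.ID, v.Value)
	}

	// The second variable leaving must not remove the first one's shortcut.
	r.Unregister("ns.b:DB_USER")
	v, _ := r.Lookup("DB_USER")
	fmt.Println("after unregister:", v.ID)
}
```

The sketch does not promote another same-named variable when the shortcut owner itself departs; whether the real registry does so is not stated in this PR.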

Testing

Validated locally on darwin/arm64 (no live Postgres; CDC //go:build integration tests not run):

  • make build-wippy — green.
  • go test -race ./system/env/... ./service/sql/... ./service/cdc/postgres/... — all pass.
  • golangci-lint run on the three packages — 0 issues.
  • Mutation testing (gremlins): every mutant in the changed functions killed (sql efficacy 86→95%, cdc 71→80%); surviving mutants are all in pre-existing untouched code or equivalent.
  • Before/after proof: the new cross-namespace test fails on the original registry (environment variable not found) and passes on the fix; a standalone demo reproduces the exact corrupt DSN and error fingerprints.

New tests cover cross-namespace coexistence + full-ID lookup, ownership-guarded delete, fail-fast on unresolvable env, and DSN validation/quoting (including the empty-username "absorb next token" case).
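
To make the DSN validation and quoting concrete, the following is a minimal sketch and not the buildDSN from this PR. The Config type, the choice of which fields count as required, and the quoteValue helper are assumptions for illustration. The quoting follows the libpq keyword/value convention: wrap the value in single quotes and backslash-escape embedded single quotes and backslashes.

```go
package main

import (
	"fmt"
	"strings"
)

// Config is a hypothetical set of already-resolved connection fields.
type Config struct {
	Host, Port, Username, Password, Database string
}

// quoteValue wraps a value in single quotes and escapes backslashes and
// single quotes, so an empty or space-containing value stays a single token.
func quoteValue(v string) string {
	escaper := strings.NewReplacer(`\`, `\\`, `'`, `\'`)
	return "'" + escaper.Replace(v) + "'"
}

// buildDSN rejects empty required fields and quotes every value.
// Treating host, username, and database as required is an assumption.
func buildDSN(c Config) (string, error) {
	required := []struct{ name, value string }{
		{"host", c.Host},
		{"username", c.Username},
		{"database", c.Database},
	}
	for _, f := range required {
		if strings.TrimSpace(f.value) == "" {
			return "", fmt.Errorf("dsn: required field %q is empty", f.name)
		}
	}

	parts := []string{
		"host=" + quoteValue(c.Host),
		"user=" + quoteValue(c.Username),
		"password=" + quoteValue(c.Password),
		"dbname=" + quoteValue(c.Database),
	}
	if c.Port != "" {
		parts = append(parts, "port="+quoteValue(c.Port))
	}
	return strings.Join(parts, " "), nil
}

func main() {
	// The empty-username case: rejected instead of emitting "user= password=secret".
	_, err := buildDSN(Config{Host: "db", Password: "secret", Database: "main"})
	fmt.Println("empty username:", err)

	dsn, err := buildDSN(Config{
		Host: "db", Port: "5432", Username: "app", Password: "p w'd", Database: "main",
	})
	fmt.Println(dsn, err)
}
```

Rejecting the empty field up front gives a clear configuration error; quoting is a second line of defence for values that contain spaces or quote characters.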

Why:
A single same-named env.variable in any namespace could take down every
db.sql.postgres service non-deterministically, crash-looping with
"password authentication failed for user \"password=\"" — the fingerprint
of an empty resolved field, not a real credential. Boot ordering decided
which service lost the race, so the same config booted fine on some
restarts and failed on others.

What:
Fixes a chain of three independent defects so each layer fails safely and
loudly.

env registry (root cause): env variables were keyed in a global flat
namespace by their short Name; a Name collision rejected the second
registration before storing it by ID, so full-ID lookups failed. Now the
variable is always stored by ID and variablesByName is a first-wins
shortcut; the shortcut delete is ownership-guarded so a departing variable
cannot remove another variable's shortcut.

sql and cdc managers: a configured *_env that failed to resolve silently
left the field blank. resolveEnv now fails fast with NewUnresolvedEnvError
naming the field and variable, distinguishing "not configured" (keep the
static value) from "configured but unresolvable" (hard error).

sql DSN builder: unquoted fields let an empty username produce
"user= password=secret", which lib/pq misparses by absorbing the next
token. buildDSN now refuses empty required fields and quotes all Postgres
keyword/value fields; CDC buildDSNs likewise rejects empty resolved fields
and runs before the Update teardown so a bad DSN aborts non-destructively.
@skhaz skhaz merged commit 8709c7a into main Jun 21, 2026
4 checks passed
@skhaz skhaz deleted the fix/env-name-collision-dsn branch June 21, 2026 00:47