refactor(P4b): inject the run's Config into validators via map_validators#272
Merged
Merged
Conversation
Structural refactor phase P4 (backend#796), first slice. Twenty modules (every validator + database + file_transfer) called setup_logging(config) at IMPORT time — reconfiguring the root logger's handlers globally, 20x on every import: an import-time side effect a library shouldn't have. The entrypoints (cli/run.main and the template scripts) already call setup_logging() at startup, so the per-module calls were redundant. Removes the import-time setup_logging(config) call + its now-unused import from all 20 modules. Each module's `config = Config()` and `logger.setLevel(config.LOG_LEVEL)` are kept (full config-by-construction + the env-var-bridge removal are the next P4 slices). Logging is now configured once, at the app entrypoint. Behaviour-preserving: root logging is still configured by the entrypoint; child loggers inherit. Full unit suite 1078 passed, 96.9% coverage; e2e (real MySQL) 23 passed, 1 xpassed — characterization goldens unchanged. Next: P4b (inject Config into the file validators) and P4c (file_transfer takes paths; delete the cli/run env-var bridge) — the config-by-construction core, now gateable with the local e2e harness. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tors Structural refactor phase P4 (backend#796), config-by-construction slice 1. The path-reading validators (FileType/Image/FilePairing/BIOLabel/XML/Tokenizer read SRC_PATH, Duplicate reads DEST_PATH, TableName reads TABLE_NAME) each read a module-global `config = Config()` that snapshots os.environ. That global is what the cli/run env-var bridge exists to feed; to remove that bridge (P4c) the validators must read the ingestor's explicitly-resolved Config instead. Adds a single injection seam rather than threading `config=` through the ~40 factory construction sites: - BaseValidator gains a class-level `_config = None` + `bind_config(config)`. - map_validators(category, options, config=None) binds the run's Config to every validator it builds. Optional, so direct callers/tests keep the prior module-global fallback. - BaseIngestor.validate_data passes self.database.config (the injected one). - Each path-reading site now reads `(self._config or config).X` — injected when bound, module-global otherwise. DuplicateValidator.dest_path became a lazy property so it resolves AFTER the bind (it read DEST_PATH in __init__). Behaviour-preserving: during P4b the bridge still sets env and the injected Config reads the same env, so both paths resolve identically — pure plumbing that lets P4c flip the source of paths from env to Config overrides. New tests pin the seam (injected Config wins over a divergent env-backed global; omitting it falls back). Full unit suite 1080 passed, 96.9% coverage; e2e (real MySQL) 23 passed, 1 xpassed — characterization goldens unchanged. Next: P4c — file_transfer takes the resolved paths and cli/run builds Config with overrides instead of writing os.environ (delete _set_legacy_env_vars). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Collaborator
Author
|
👋 Heads-up — Code review queue is at 12 / 8 Above the WIP limit. The team convention is to review existing PRs before opening new work. Open PRs currently in Code review (oldest first):
Pull from review before opening new work. (This is a nudge from the kanban WIP check, not a block.) |
This was referenced Jun 16, 2026
divyasinghds
approved these changes
Jun 16, 2026
…t-config-validators
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Structural refactor — phase P4 (config injection), slice 1 of the config-by-construction core. Stacked on #271 (P4a).
The path-reading validators each read a module-global
config = Config()that snapshotsos.environ:SRC_PATHDEST_PATHTABLE_NAMEThat global is precisely what the
cli/runenv-var bridge (_set_legacy_env_vars) exists to feed. To remove that bridge (P4c), the validators must read the ingestor's explicitly-resolved Config instead.Approach — one injection seam, not 40 edits
Rather than thread
config=through the ~40 factory construction sites inmodalities/validators.py, inject at themap_validatorsboundary:BaseValidatorgains a class-level_config = None+bind_config(config).map_validators(category, options, config=None)binds the run's Config to every validator it builds. Optional — direct callers/tests that omit it keep the prior module-global fallback.BaseIngestor.validate_datapassesself.database.config(the injected one).(self._config or config).X— injected when bound, module-global otherwise.DuplicateValidator.dest_pathbecame a lazy property (it readDEST_PATHin__init__, before the bind).Behaviour preservation
During P4b the bridge still sets env and the injected Config reads the same env, so both paths resolve identically — this is pure plumbing that lets P4c flip the source of paths from env → Config overrides. New tests pin the seam: the injected Config wins over a divergent env-backed global, and omitting it falls back.
Next
P4c —
file_transfertakes the resolved paths, andcli/runbuildsConfigwith overrides instead of writingos.environ(delete_set_legacy_env_vars). That's the slice that actually retires the env-var bridge.🤖 Generated with Claude Code