Skip to content

refactor(P4b): inject the run's Config into validators via map_validators#272

Merged
divyasinghds merged 3 commits into
developfrom
refactor/p4b-inject-config-validators
Jun 16, 2026
Merged

refactor(P4b): inject the run's Config into validators via map_validators#272
divyasinghds merged 3 commits into
developfrom
refactor/p4b-inject-config-validators

Conversation

@LukasWodka

Copy link
Copy Markdown
Collaborator

Summary

Structural refactor — phase P4 (config injection), slice 1 of the config-by-construction core. Stacked on #271 (P4a).

The path-reading validators each read a module-global config = Config() that snapshots os.environ:

field validators
SRC_PATH FileType, Image, FilePairing, BIOLabel, XML, Tokenizer
DEST_PATH Duplicate
TABLE_NAME TableName

That global is precisely what the cli/run env-var bridge (_set_legacy_env_vars) exists to feed. To remove that bridge (P4c), the validators must read the ingestor's explicitly-resolved Config instead.

Approach — one injection seam, not 40 edits

Rather than thread config= through the ~40 factory construction sites in modalities/validators.py, inject at the map_validators boundary:

  • BaseValidator gains a class-level _config = None + bind_config(config).
  • map_validators(category, options, config=None) binds the run's Config to every validator it builds. Optional — direct callers/tests that omit it keep the prior module-global fallback.
  • BaseIngestor.validate_data passes self.database.config (the injected one).
  • Each read site is now (self._config or config).X — injected when bound, module-global otherwise.
  • DuplicateValidator.dest_path became a lazy property (it read DEST_PATH in __init__, before the bind).

Behaviour preservation

During P4b the bridge still sets env and the injected Config reads the same env, so both paths resolve identically — this is pure plumbing that lets P4c flip the source of paths from env → Config overrides. New tests pin the seam: the injected Config wins over a divergent env-backed global, and omitting it falls back.

Next

P4cfile_transfer takes the resolved paths, and cli/run builds Config with overrides instead of writing os.environ (delete _set_legacy_env_vars). That's the slice that actually retires the env-var bridge.

🤖 Generated with Claude Code

LukasWodka and others added 2 commits June 16, 2026 08:47
Structural refactor phase P4 (backend#796), first slice. Twenty modules (every
validator + database + file_transfer) called setup_logging(config) at IMPORT
time — reconfiguring the root logger's handlers globally, 20x on every import:
an import-time side effect a library shouldn't have. The entrypoints
(cli/run.main and the template scripts) already call setup_logging() at
startup, so the per-module calls were redundant.

Removes the import-time setup_logging(config) call + its now-unused import from
all 20 modules. Each module's `config = Config()` and
`logger.setLevel(config.LOG_LEVEL)` are kept (full config-by-construction + the
env-var-bridge removal are the next P4 slices). Logging is now configured once,
at the app entrypoint.

Behaviour-preserving: root logging is still configured by the entrypoint; child
loggers inherit. Full unit suite 1078 passed, 96.9% coverage; e2e (real MySQL)
23 passed, 1 xpassed — characterization goldens unchanged.

Next: P4b (inject Config into the file validators) and P4c (file_transfer takes
paths; delete the cli/run env-var bridge) — the config-by-construction core,
now gateable with the local e2e harness.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tors

Structural refactor phase P4 (backend#796), config-by-construction slice 1.
The path-reading validators (FileType/Image/FilePairing/BIOLabel/XML/Tokenizer
read SRC_PATH, Duplicate reads DEST_PATH, TableName reads TABLE_NAME) each read
a module-global `config = Config()` that snapshots os.environ. That global is
what the cli/run env-var bridge exists to feed; to remove that bridge (P4c) the
validators must read the ingestor's explicitly-resolved Config instead.

Adds a single injection seam rather than threading `config=` through the ~40
factory construction sites:
- BaseValidator gains a class-level `_config = None` + `bind_config(config)`.
- map_validators(category, options, config=None) binds the run's Config to
  every validator it builds. Optional, so direct callers/tests keep the prior
  module-global fallback.
- BaseIngestor.validate_data passes self.database.config (the injected one).
- Each path-reading site now reads `(self._config or config).X` — injected
  when bound, module-global otherwise. DuplicateValidator.dest_path became a
  lazy property so it resolves AFTER the bind (it read DEST_PATH in __init__).

Behaviour-preserving: during P4b the bridge still sets env and the injected
Config reads the same env, so both paths resolve identically — pure plumbing
that lets P4c flip the source of paths from env to Config overrides. New tests
pin the seam (injected Config wins over a divergent env-backed global; omitting
it falls back). Full unit suite 1080 passed, 96.9% coverage; e2e (real MySQL)
23 passed, 1 xpassed — characterization goldens unchanged.

Next: P4c — file_transfer takes the resolved paths and cli/run builds Config
with overrides instead of writing os.environ (delete _set_legacy_env_vars).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@LukasWodka

Copy link
Copy Markdown
Collaborator Author

👋 Heads-up — Code review queue is at 12 / 8

Above the WIP limit. The team convention is to review existing PRs before opening new work.

Open PRs currently in Code review (oldest first):

Pull from review before opening new work. (This is a nudge from the kanban WIP check, not a block.)

@divyasinghds divyasinghds merged commit 057763d into develop Jun 16, 2026
3 checks passed
@divyasinghds divyasinghds deleted the refactor/p4b-inject-config-validators branch June 16, 2026 09:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants