Parent epic: tracebloc/client-runtime#116 (WS3). Follow-up to #88 / PR #89, which shipped the tracebloc cluster doctor MVP (6 checks) and deferred these three.
Checks to add
- Node fit: check whether any node can fit a training job's resource requests (the RESOURCE_REQUESTS, GPU_REQUESTS env). If no node can fit a job, surface ✖ "training jobs can't schedule" (the silent "Pending forever, no node" class). Read-only.
- Image pullability: if an image pull secret is configured (tracebloc.useImagePullSecrets), verify that secret exists and is a well-formed kubernetes.io/dockerconfigjson in the namespace, so private-image pulls don't hit ImagePullBackOff. Read-only.
- In-cluster egress probe: the existing Backend egress check probes from the CLI host, which isn't the cluster's egress path (the cluster egresses via the egress-proxy). A real probe needs to run inside the cluster: either port-forward to the egress-proxy's own connectivity probe endpoint (preferred; reuses internal/submit/portforward.go, stays side-effect-light), or exec/spawn a probe pod (side-effecting; breaks doctor's read-only contract). Separate design + PR; do not rush it into the read-only checks PR.
Sequencing
PR 1: node fit + image pullability (read-only, fits doctor's existing client-go pattern).
PR 2: in-cluster egress probe (after confirming the egress-proxy's probe contract in the client repo).
Target branch: develop.
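
For discussion, the decision logic of the two read-only checks in PR 1 might look roughly like the sketch below. It is not the doctor implementation: the type and function names (Resources, anyNodeFits, validatePullSecret) are hypothetical, plain structs stand in for the values the real checks would read through client-go, and how a node's free capacity is derived is left open. The ".dockerconfigjson" data key and top-level "auths" object are the standard layout of a kubernetes.io/dockerconfigjson secret.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Resources is a simplified stand-in for Kubernetes resource quantities
// (hypothetical type): CPU in millicores, memory in bytes, GPUs as a count.
type Resources struct {
	MilliCPU int64
	MemBytes int64
	GPUs     int64
}

// fits reports whether a job request fits into what a node has free.
func fits(free, req Resources) bool {
	return req.MilliCPU <= free.MilliCPU &&
		req.MemBytes <= free.MemBytes &&
		req.GPUs <= free.GPUs
}

// anyNodeFits returns true if at least one node can hold the request.
func anyNodeFits(nodes []Resources, req Resources) bool {
	for _, n := range nodes {
		if fits(n, req) {
			return true
		}
	}
	return false
}

// dockerConfig mirrors the top-level shape of a .dockerconfigjson payload.
type dockerConfig struct {
	Auths map[string]json.RawMessage `json:"auths"`
}

// validatePullSecret checks the secret type and that the payload parses
// as a docker config with at least one registry entry.
func validatePullSecret(secretType string, data map[string][]byte) error {
	if secretType != "kubernetes.io/dockerconfigjson" {
		return fmt.Errorf("unexpected secret type %q", secretType)
	}
	raw, ok := data[".dockerconfigjson"]
	if !ok {
		return errors.New(`missing ".dockerconfigjson" key`)
	}
	var cfg dockerConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return fmt.Errorf("payload is not a valid docker config: %w", err)
	}
	if len(cfg.Auths) == 0 {
		return errors.New(`payload has no "auths" entries`)
	}
	return nil
}

func main() {
	// Example values only; a real check would list nodes and read the
	// job requests from the RESOURCE_REQUESTS / GPU_REQUESTS env.
	nodes := []Resources{
		{MilliCPU: 2000, MemBytes: 4 << 30, GPUs: 0},
		{MilliCPU: 8000, MemBytes: 32 << 30, GPUs: 1},
	}
	req := Resources{MilliCPU: 4000, MemBytes: 16 << 30, GPUs: 1}
	if anyNodeFits(nodes, req) {
		fmt.Println("✔ node fit: at least one node can schedule a training job")
	} else {
		fmt.Println("✖ training jobs can't schedule")
	}

	data := map[string][]byte{
		".dockerconfigjson": []byte(`{"auths":{"registry.example.com":{"auth":"dXNlcjpwYXNz"}}}`),
	}
	if err := validatePullSecret("kubernetes.io/dockerconfigjson", data); err != nil {
		fmt.Println("✖ image pull secret:", err)
	} else {
		fmt.Println("✔ image pull secret is well-formed")
	}
}
```

Keeping both checks as pure functions over already-fetched data would let them be unit-tested without a cluster, with the client-go calls confined to a thin read-only layer around them.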