Reliable cleanup of delivered config and secrets on cells (no orphan leak)#144
Open
scotwells wants to merge 8 commits into
Open
Reliable cleanup of delivered config and secrets on cells (no orphan leak)#144scotwells wants to merge 8 commits into
scotwells wants to merge 8 commits into
Conversation
b4714a8 to
b244bcb
Compare
02f4ac6 to
883e7ac
Compare
b244bcb to
758a87f
Compare
883e7ac to
60844b5
Compare
758a87f to
90721bf
Compare
60844b5 to
c07ab0a
Compare
90721bf to
14299a1
Compare
c07ab0a to
a05ae62
Compare
14299a1 to
7598e21
Compare
a05ae62 to
443cf5d
Compare
7598e21 to
0b5a5c5
Compare
15117eb to
ba309d3
Compare
The referenced-data controller materializes companion copies of the ConfigMaps and Secrets a Workload references into its hub namespace, where Karmada propagates them to the cell so kraftlet can mount them. The compute-manager ClusterRole on the Karmada hub had no core ConfigMap/Secret rule, so every reconcile failed with a forbidden error: companions were never written, the expected-referenced-data annotation was never stamped, and nothing propagated to the cell. ConfigMap/Secret mounts silently did nothing on the edge. Grant the controller full lifecycle access (it owns the companions, including ref-count deletion) to configmaps and secrets in its namespace. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> (cherry picked from commit c2a7794)
…ed before Karmada finishes GC
Part 1 — ordering guard: WorkloadDeploymentFederator.Finalize now checks
whether any companion ConfigMaps or Secrets (referenced-data=true label)
remain in the downstream namespace before calling
cleanupPropagationPolicyIfUnused. The guard only fires when this is the last
WD for its city code (mirroring cleanupPropagationPolicyIfUnused's condition),
so deleting WD-A cannot block on a live WD-B's companion in the shared
namespace. If companions are present Finalize returns an errCompanionsStillPresent
sentinel; Reconcile intercepts it (walking the kerrors.Aggregate the finalizer
framework returns), logs at Info, and sets RequeueAfter — no error-metric
inflation. After companionGuardTimeout (2 min) the guard bypasses itself so a
wedged referenced-data controller cannot permanently block deletion.
Part 2 — authoritative cell-side GC: CompanionGCReconciler watches
WorkloadDeployment events on each cell and deletes companion ConfigMaps/Secrets
whose every referenced-by entry resolves to no live WD on the local cell.
Critical fix: the referenced-by annotation is written by the hub
ReferencedDataController as "projectNamespace/wdName" (e.g.
"default/mount-pristine-default-dfw"), but the cell WD lives in ns-{uid}.
hasLiveReferrer now ignores the namespace in the key and looks up by NAME
ONLY in the companion's own namespace — preventing false "referrer absent"
conclusions that would delete an actively-mounted companion. SetupWithManager
uses WithEngageWithLocalCluster(false) to match all other cell controllers.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit f97a3cb)
…r-wide informer OOM
The CompanionGCReconciler previously registered For(&corev1.ConfigMap{}) which
established a cluster-wide ConfigMap informer with no label-scoped cache on the
cell manager. On a cell cluster this caused controller-runtime to sync and hold
every ConfigMap (and, via lazy cache reads, every Secret) in the cluster, pushing
the cell compute-manager well past its 128Mi limit → OOMKilled / CrashLoopBackOff.
Fix: change the For type to WorkloadDeployment. WorkloadDeployments are already
cached on the cell by the sibling WorkloadDeploymentReconciler, so this adds no
new cluster-wide informer. A WD delete event still enqueues the object's namespace
and fires Reconcile, which is the deletion trigger the GC needs.
All companion reads (ConfigMap/Secret List) inside Reconcile go through
cl.GetAPIReader() (uncached). A one-shot List via the API reader does not
establish a persistent informer, so it does not re-introduce the OOM. WD
liveness checks use the cached client because WDs are already in the cell cache.
Reconcile now sweeps the entire WD namespace (listing companions by label via
the uncached reader) rather than reconciling a single named object, and returns
ctrl.Result{} (event-driven only, no per-WD requeue).
The periodic backstop coverage gap is closed by a companionGCBackstop Runnable
(implements mcmanager.Runnable = manager.Runnable + multicluster.Aware). On
Engage it records each cell cluster; on each ticker interval it lists ALL
companions cluster-wide via the uncached APIReader, collects distinct namespaces,
and sends namespace-keyed GenericEvents into a buffered channel wired via
WatchesRawSource(source.TypedChannel). This covers namespaces whose last WD was
deleted before the controller started, which For(WD) would never enqueue. The
steady-state load is bounded to one cross-namespace List per interval regardless
of WD history, and no persistent CM/Secret informer is ever created.
The federator's companionsStillPresent check (workloaddeployment_federator.go)
reads from FederationClient (the Karmada hub client) and is unaffected by this
change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit 9a064cc)
Remove the cell-side CompanionGCReconciler — deleting a Karmada-owned cell
copy directly causes a permanent delete/recreate thrash because the Work
object immediately re-applies the manifest. Hub-side cleanup is the only
correct layer.
Component 5: Delete companion_gc_controller.go and its test, and remove the
--enable-cell-controllers registration block from cmd/main.go.
Component 3: After deleting a hub companion (ref-count reaches zero), the
downstreamCompanionWriter now also deletes the companion's Karmada
ResourceBinding via hubClient. RB name follows the Karmada binding-controller
convention: "{companionName}-{configmap|secret}". Deleting the RB cascades:
binding-controller removes the Work, execution-controller removes the cell
copy permanently. IgnoreNotFound tolerates Karmada beating us to it.
localCompanionWriter (single-cluster dev mode) is a no-op.
Component 4: New OrphanRBReconciler runs on the Karmada hub federation
manager (alongside InstanceProjector). It watches ResourceBindings and
deletes any that are orphaned companion RBs — name ends with "-configmap" or
"-secret" AND propagationpolicy.karmada.io/name starts with "city-" AND the
hub companion no longer exists. WD RBs (suffix "-workloaddeployment") and
non-city-PP RBs are never touched. A periodic sweep Runnable fires every 5
minutes to catch RBs orphaned before the controller started, which will
automatically clean the existing stranded lab RBs on first deployment.
RBAC: add delete to work.karmada.io/resourcebindings in downstream-rbac
(hub ClusterRole). karmada-io/api work/v1alpha2 types added to global
scheme so the federation client can serialize ResourceBinding objects.
Tests: unit tests for Component 3 (RB teardown fires on ConfigMap/Secret
companion deletion, tolerates NotFound, no-op on localCompanionWriter) and
Component 4 (orphan detection, skip live/terminating companions, tight scope
guards against WD RBs and non-city-PP RBs, name pattern parsing).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit ac25bf7)
OrphanRBReconciler scoped referenced-data companion ResourceBindings by reading propagationpolicy.karmada.io/name from metadata.labels, but the running Karmada version stores that value in metadata.annotations (labels carry only permanent-id UUIDs). The label read always returned "", so the scope predicate rejected every ResourceBinding and the controller never reclaimed an orphaned binding — confirmed live in the lab, where the stranded cm-pristine/secret-pristine copies were never cleaned. Read from annotations in both isInScope and the watch predicate. The unit-test fixtures previously set the PP name as a label (matching the bug, which is why they passed against broken code); they now use the production annotation shape, so they fail against the label read and pass against the fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> (cherry picked from commit e321370)
Add CompanionGCReconciler, a level-triggered backstop for referenced-data
companions stranded by interrupted finalization. When the
ReferencedDataController's WD finalizer is interrupted mid-flight (pod restart,
SIGKILL), a hub companion can be left with a referenced-by annotation pointing
at WDs that no longer exist on the hub. Without this controller, the companion
persists forever (leaking data including Secret bytes), its ResourceBinding keeps
a Work alive on the hub, and Karmada continuously re-creates the cell copy —
exactly the cm-pristine/secret-pristine lab scenario.
Design:
- Watches referenced-data-labeled ConfigMaps and Secrets on the hub. The
federationMgr cache for ConfigMaps and Secrets is restricted to objects
carrying the ReferencedDataLabel via cache.Options.ByObject (cmd/main.go).
This is the OOM guard: predicates filter events, not cache contents; without
the cache-level scope the informer would list-and-watch every ConfigMap/Secret
on the Karmada hub — the same unscoped-informer pattern that OOMKilled the
cell CompanionGCReconciler. The label predicate on the controller is kept as
belt-and-suspenders but is NOT the primary memory guard.
- On each reconcile, parses the referenced-by annotation and checks each
referenced WD by name in the companion's own hub namespace (ns-{project-uid})
via HubClient.Get — no MCManager needed because WDs are federated to the hub.
- If ALL referrers are absent: deletes the companion and its ResourceBinding via
the downstreamCompanionWriter path, driving the full Karmada cascade
(RB → Work → cell copy deleted permanently).
- Conservative safety: terminating WDs count as present; corrupt annotations
and malformed WD keys are handled by skip-not-delete.
- A companionGCPeriodicSweep backstop fires every 5 minutes to catch companions
stranded before the controller started.
- Wired in setupManagementControllers on the federationMgr alongside
OrphanRBReconciler and InstanceProjector.
Unit tests cover: orphaned CM/Secret deleted + RB torn down; live referrer
preserved; terminating referrer counts as present; all-absent multi-referrer
deleted; partial multi-referrer preserved; corrupt annotation preserved;
empty annotation preserved; unlabeled object unaffected; RB-already-gone
tolerated; periodic sweep drives reconciliation.
Federated e2e (test/e2e/referenced-data-delete-cascade/):
- HAPPY-PATH CASCADE: create WD + ConfigMap/Secret → assert companions on hub +
cell + RBs present → delete WD → assert hub companion, RBs, cell copies all
deleted and stay deleted (30 s anti-thrash poll).
- STRANDED COMPANION BACKSTOP: inject a labeled companion with a dead WD key →
assert CompanionGCReconciler reclaims it within the sweep interval.
The e2e test is correct and self-contained but requires the Kind+Karmada
harness (task e2e:up) which is not runnable headless.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit e419b54)
The #37 predicate (wdReferencedDataChangedPredicate) on the WorkloadDeployment For() watch dropped metadata-only Update events. A new WD's first reconcile adds the workload-controller finalizer and returns; that finalizer-add produces an Update with no generation bump and no ReferencedDataReady change, so the predicate filtered it and the WD never got a second reconcile — wedging every new WD at the finalizer stage until a controller restart. Remove the predicate entirely. The equality.Semantic.DeepEqual guard before Status().Update (added in the same #37 change) already prevents the self-trigger loop the predicate was meant to stop, and watching all updates lets new WDs proceed past the finalizer stage immediately. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> (cherry picked from commit b94a40a)
Activates the scheduling-gate + Instance-level Ready=False/SourceNotFound contract for the lab (datum-us-central-1-lab). The flag is set in the cell overlay's ConfigMap patch so it only affects deployments using the cell OCI bundle (lab); management-plane and dev overlays are unaffected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> (cherry picked from commit 2cf963f)
0b5a5c5 to
ada425d
Compare
ba309d3 to
8a65d48
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Value
Cells no longer accumulate orphaned config and secrets. Companion ConfigMaps/Secrets delivered to cells are reliably cleaned up when their WorkloadDeployment goes away or a PropagationPolicy is deleted before Karmada finishes GC — so operators don't leak stale referenced data across cells over time.
This cleanup is isolated into its own PR on purpose: it touches cluster-wide informer scoping and Karmada ResourceBinding handling, which is the riskiest surface in the whole feature and deserves focused review.
What
The federation manager's cache is scoped via
cache.Options.ByObjectwith a companion label selector for both ConfigMap and Secret, andCompanionGCReconciler/OrphanRBReconciler/InstanceProjectorall run on that single scoped manager. This is the load-bearing protection: a label predicate filters events but does not scope the informer cache — an unscopedFor(&ConfigMap{})on the hub would cache every CM/Secret in the cluster and OOM (the same pattern that previously killed the cell-side reconciler). Thecmd/main.gosetup documents this explicitly. Confirm theByObjectscope is intact and that nothing on this manager establishes an unscoped cluster-wide CM/Secret informer.Stack
Stacks on the blocking-reason PR.
go build/vet/test/golangci-lintgreen at tip.🤖 Generated with Claude Code
Also folded in (from #145): enables the
ReferencedDataGatefeature flag on the lab cell overlay.