Skip to content

Reliable cleanup of delivered config and secrets on cells (no orphan leak)#144

Open
scotwells wants to merge 8 commits into
split/refdata-blocking-reasonfrom
split/refdata-hub-gc
Open

Reliable cleanup of delivered config and secrets on cells (no orphan leak)#144
scotwells wants to merge 8 commits into
split/refdata-blocking-reasonfrom
split/refdata-hub-gc

Conversation

@scotwells
Copy link
Copy Markdown
Contributor

@scotwells scotwells commented Jun 4, 2026

Value

Cells no longer accumulate orphaned config and secrets. Companion ConfigMaps/Secrets delivered to cells are reliably cleaned up when their WorkloadDeployment goes away or a PropagationPolicy is deleted before Karmada finishes GC — so operators don't leak stale referenced data across cells over time.

This cleanup is isolated into its own PR on purpose: it touches cluster-wide informer scoping and Karmada ResourceBinding handling, which is the riskiest surface in the whole feature and deserves focused review.

What

  • Grants the referenced-data controller ConfigMap/Secret access on the Karmada hub.
  • Drives companion GC off WorkloadDeployment events and a level-triggered backstop, rather than a cluster-wide informer.
  • Prevents orphaned cell companions when a PropagationPolicy is deleted before Karmada GC; reads the Karmada PP name from ResourceBinding annotations; removes a WD-watch predicate that wedged new WorkloadDeployments.

⚠️ OOM guard (please verify in review)

The federation manager's cache is scoped via cache.Options.ByObject with a companion label selector for both ConfigMap and Secret, and CompanionGCReconciler / OrphanRBReconciler / InstanceProjector all run on that single scoped manager. This is the load-bearing protection: a label predicate filters events but does not scope the informer cache — an unscoped For(&ConfigMap{}) on the hub would cache every CM/Secret in the cluster and OOM (the same pattern that previously killed the cell-side reconciler). The cmd/main.go setup documents this explicitly. Confirm the ByObject scope is intact and that nothing on this manager establishes an unscoped cluster-wide CM/Secret informer.

Stack

Stacks on the blocking-reason PR. go build/vet/test/golangci-lint green at tip.

🤖 Generated with Claude Code


Also folded in (from #145): enables the ReferencedDataGate feature flag on the lab cell overlay.

@scotwells scotwells changed the title fix: hub-side cleanup of referenced-data companions (cache-scoped to avoid OOM) Reliable cleanup of delivered config and secrets on cells (no orphan leak) Jun 4, 2026
@scotwells scotwells force-pushed the split/refdata-blocking-reason branch from b4714a8 to b244bcb Compare June 5, 2026 01:42
@scotwells scotwells force-pushed the split/refdata-hub-gc branch from 02f4ac6 to 883e7ac Compare June 5, 2026 01:42
@scotwells scotwells force-pushed the split/refdata-blocking-reason branch from b244bcb to 758a87f Compare June 5, 2026 15:13
@scotwells scotwells force-pushed the split/refdata-hub-gc branch from 883e7ac to 60844b5 Compare June 5, 2026 15:13
@scotwells scotwells force-pushed the split/refdata-blocking-reason branch from 758a87f to 90721bf Compare June 5, 2026 15:25
@scotwells scotwells force-pushed the split/refdata-hub-gc branch from 60844b5 to c07ab0a Compare June 5, 2026 15:25
@scotwells scotwells force-pushed the split/refdata-blocking-reason branch from 90721bf to 14299a1 Compare June 5, 2026 16:19
@scotwells scotwells force-pushed the split/refdata-hub-gc branch from c07ab0a to a05ae62 Compare June 5, 2026 16:19
@scotwells scotwells force-pushed the split/refdata-blocking-reason branch from 14299a1 to 7598e21 Compare June 5, 2026 16:46
@scotwells scotwells force-pushed the split/refdata-hub-gc branch from a05ae62 to 443cf5d Compare June 5, 2026 16:46
@scotwells scotwells force-pushed the split/refdata-blocking-reason branch from 7598e21 to 0b5a5c5 Compare June 5, 2026 17:49
@scotwells scotwells force-pushed the split/refdata-hub-gc branch from 15117eb to ba309d3 Compare June 5, 2026 17:49
scotwells and others added 8 commits June 5, 2026 13:36
The referenced-data controller materializes companion copies of the
ConfigMaps and Secrets a Workload references into its hub namespace, where
Karmada propagates them to the cell so kraftlet can mount them. The
compute-manager ClusterRole on the Karmada hub had no core ConfigMap/Secret
rule, so every reconcile failed with a forbidden error: companions were
never written, the expected-referenced-data annotation was never stamped,
and nothing propagated to the cell. ConfigMap/Secret mounts silently did
nothing on the edge.

Grant the controller full lifecycle access (it owns the companions,
including ref-count deletion) to configmaps and secrets in its namespace.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit c2a7794)
…ed before Karmada finishes GC

Part 1 — ordering guard: WorkloadDeploymentFederator.Finalize now checks
whether any companion ConfigMaps or Secrets (referenced-data=true label)
remain in the downstream namespace before calling
cleanupPropagationPolicyIfUnused. The guard only fires when this is the last
WD for its city code (mirroring cleanupPropagationPolicyIfUnused's condition),
so deleting WD-A cannot block on a live WD-B's companion in the shared
namespace. If companions are present Finalize returns an errCompanionsStillPresent
sentinel; Reconcile intercepts it (walking the kerrors.Aggregate the finalizer
framework returns), logs at Info, and sets RequeueAfter — no error-metric
inflation. After companionGuardTimeout (2 min) the guard bypasses itself so a
wedged referenced-data controller cannot permanently block deletion.

Part 2 — authoritative cell-side GC: CompanionGCReconciler watches
WorkloadDeployment events on each cell and deletes companion ConfigMaps/Secrets
whose every referenced-by entry resolves to no live WD on the local cell.
Critical fix: the referenced-by annotation is written by the hub
ReferencedDataController as "projectNamespace/wdName" (e.g.
"default/mount-pristine-default-dfw"), but the cell WD lives in ns-{uid}.
hasLiveReferrer now ignores the namespace in the key and looks up by NAME
ONLY in the companion's own namespace — preventing false "referrer absent"
conclusions that would delete an actively-mounted companion. SetupWithManager
uses WithEngageWithLocalCluster(false) to match all other cell controllers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit f97a3cb)
…r-wide informer OOM

The CompanionGCReconciler previously registered For(&corev1.ConfigMap{}) which
established a cluster-wide ConfigMap informer with no label-scoped cache on the
cell manager. On a cell cluster this caused controller-runtime to sync and hold
every ConfigMap (and, via lazy cache reads, every Secret) in the cluster, pushing
the cell compute-manager well past its 128Mi limit → OOMKilled / CrashLoopBackOff.

Fix: change the For type to WorkloadDeployment. WorkloadDeployments are already
cached on the cell by the sibling WorkloadDeploymentReconciler, so this adds no
new cluster-wide informer. A WD delete event still enqueues the object's namespace
and fires Reconcile, which is the deletion trigger the GC needs.

All companion reads (ConfigMap/Secret List) inside Reconcile go through
cl.GetAPIReader() (uncached). A one-shot List via the API reader does not
establish a persistent informer, so it does not re-introduce the OOM. WD
liveness checks use the cached client because WDs are already in the cell cache.

Reconcile now sweeps the entire WD namespace (listing companions by label via
the uncached reader) rather than reconciling a single named object, and returns
ctrl.Result{} (event-driven only, no per-WD requeue).

The periodic backstop coverage gap is closed by a companionGCBackstop Runnable
(implements mcmanager.Runnable = manager.Runnable + multicluster.Aware). On
Engage it records each cell cluster; on each ticker interval it lists ALL
companions cluster-wide via the uncached APIReader, collects distinct namespaces,
and sends namespace-keyed GenericEvents into a buffered channel wired via
WatchesRawSource(source.TypedChannel). This covers namespaces whose last WD was
deleted before the controller started, which For(WD) would never enqueue. The
steady-state load is bounded to one cross-namespace List per interval regardless
of WD history, and no persistent CM/Secret informer is ever created.

The federator's companionsStillPresent check (workloaddeployment_federator.go)
reads from FederationClient (the Karmada hub client) and is unaffected by this
change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit 9a064cc)
Remove the cell-side CompanionGCReconciler — deleting a Karmada-owned cell
copy directly causes a permanent delete/recreate thrash because the Work
object immediately re-applies the manifest. Hub-side cleanup is the only
correct layer.

Component 5: Delete companion_gc_controller.go and its test, and remove the
--enable-cell-controllers registration block from cmd/main.go.

Component 3: After deleting a hub companion (ref-count reaches zero), the
downstreamCompanionWriter now also deletes the companion's Karmada
ResourceBinding via hubClient. RB name follows the Karmada binding-controller
convention: "{companionName}-{configmap|secret}". Deleting the RB cascades:
binding-controller removes the Work, execution-controller removes the cell
copy permanently. IgnoreNotFound tolerates Karmada beating us to it.
localCompanionWriter (single-cluster dev mode) is a no-op.

Component 4: New OrphanRBReconciler runs on the Karmada hub federation
manager (alongside InstanceProjector). It watches ResourceBindings and
deletes any that are orphaned companion RBs — name ends with "-configmap" or
"-secret" AND propagationpolicy.karmada.io/name starts with "city-" AND the
hub companion no longer exists. WD RBs (suffix "-workloaddeployment") and
non-city-PP RBs are never touched. A periodic sweep Runnable fires every 5
minutes to catch RBs orphaned before the controller started, which will
automatically clean the existing stranded lab RBs on first deployment.

RBAC: add delete to work.karmada.io/resourcebindings in downstream-rbac
(hub ClusterRole). karmada-io/api work/v1alpha2 types added to global
scheme so the federation client can serialize ResourceBinding objects.

Tests: unit tests for Component 3 (RB teardown fires on ConfigMap/Secret
companion deletion, tolerates NotFound, no-op on localCompanionWriter) and
Component 4 (orphan detection, skip live/terminating companions, tight scope
guards against WD RBs and non-city-PP RBs, name pattern parsing).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit ac25bf7)
OrphanRBReconciler scoped referenced-data companion ResourceBindings by reading
propagationpolicy.karmada.io/name from metadata.labels, but the running Karmada
version stores that value in metadata.annotations (labels carry only permanent-id
UUIDs). The label read always returned "", so the scope predicate rejected every
ResourceBinding and the controller never reclaimed an orphaned binding — confirmed
live in the lab, where the stranded cm-pristine/secret-pristine copies were never
cleaned.

Read from annotations in both isInScope and the watch predicate. The unit-test
fixtures previously set the PP name as a label (matching the bug, which is why they
passed against broken code); they now use the production annotation shape, so they
fail against the label read and pass against the fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit e321370)
Add CompanionGCReconciler, a level-triggered backstop for referenced-data
companions stranded by interrupted finalization. When the
ReferencedDataController's WD finalizer is interrupted mid-flight (pod restart,
SIGKILL), a hub companion can be left with a referenced-by annotation pointing
at WDs that no longer exist on the hub. Without this controller, the companion
persists forever (leaking data including Secret bytes), its ResourceBinding keeps
a Work alive on the hub, and Karmada continuously re-creates the cell copy —
exactly the cm-pristine/secret-pristine lab scenario.

Design:
- Watches referenced-data-labeled ConfigMaps and Secrets on the hub. The
  federationMgr cache for ConfigMaps and Secrets is restricted to objects
  carrying the ReferencedDataLabel via cache.Options.ByObject (cmd/main.go).
  This is the OOM guard: predicates filter events, not cache contents; without
  the cache-level scope the informer would list-and-watch every ConfigMap/Secret
  on the Karmada hub — the same unscoped-informer pattern that OOMKilled the
  cell CompanionGCReconciler. The label predicate on the controller is kept as
  belt-and-suspenders but is NOT the primary memory guard.
- On each reconcile, parses the referenced-by annotation and checks each
  referenced WD by name in the companion's own hub namespace (ns-{project-uid})
  via HubClient.Get — no MCManager needed because WDs are federated to the hub.
- If ALL referrers are absent: deletes the companion and its ResourceBinding via
  the downstreamCompanionWriter path, driving the full Karmada cascade
  (RB → Work → cell copy deleted permanently).
- Conservative safety: terminating WDs count as present; corrupt annotations
  and malformed WD keys are handled by skip-not-delete.
- A companionGCPeriodicSweep backstop fires every 5 minutes to catch companions
  stranded before the controller started.
- Wired in setupManagementControllers on the federationMgr alongside
  OrphanRBReconciler and InstanceProjector.

Unit tests cover: orphaned CM/Secret deleted + RB torn down; live referrer
preserved; terminating referrer counts as present; all-absent multi-referrer
deleted; partial multi-referrer preserved; corrupt annotation preserved;
empty annotation preserved; unlabeled object unaffected; RB-already-gone
tolerated; periodic sweep drives reconciliation.

Federated e2e (test/e2e/referenced-data-delete-cascade/):
- HAPPY-PATH CASCADE: create WD + ConfigMap/Secret → assert companions on hub +
  cell + RBs present → delete WD → assert hub companion, RBs, cell copies all
  deleted and stay deleted (30 s anti-thrash poll).
- STRANDED COMPANION BACKSTOP: inject a labeled companion with a dead WD key →
  assert CompanionGCReconciler reclaims it within the sweep interval.

The e2e test is correct and self-contained but requires the Kind+Karmada
harness (task e2e:up) which is not runnable headless.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit e419b54)
The #37 predicate (wdReferencedDataChangedPredicate) on the WorkloadDeployment
For() watch dropped metadata-only Update events. A new WD's first reconcile adds
the workload-controller finalizer and returns; that finalizer-add produces an
Update with no generation bump and no ReferencedDataReady change, so the predicate
filtered it and the WD never got a second reconcile — wedging every new WD at the
finalizer stage until a controller restart.

Remove the predicate entirely. The equality.Semantic.DeepEqual guard before
Status().Update (added in the same #37 change) already prevents the self-trigger
loop the predicate was meant to stop, and watching all updates lets new WDs
proceed past the finalizer stage immediately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit b94a40a)
Activates the scheduling-gate + Instance-level Ready=False/SourceNotFound
contract for the lab (datum-us-central-1-lab). The flag is set in the
cell overlay's ConfigMap patch so it only affects deployments using the
cell OCI bundle (lab); management-plane and dev overlays are unaffected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit 2cf963f)
@scotwells scotwells force-pushed the split/refdata-blocking-reason branch from 0b5a5c5 to ada425d Compare June 5, 2026 18:38
@scotwells scotwells force-pushed the split/refdata-hub-gc branch from ba309d3 to 8a65d48 Compare June 5, 2026 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant