Add diagnostic for missing MV targets in Kafka #4
Open
filimonov wants to merge 1 commit into
filimonov added a commit that referenced this pull request on Jun 5, 2026
… + Tier 2 G1/G3) Resolves the B69 attended-review gate. Framing correction (verified): the sweep's current safety net is a generation-BLIND full reachability re-validate scan, so #1/#2 are leak/log-drift today, NOT data-loss — they become data-loss only when Tier 2 (#4) removes that scan. Hence Tier 1 (generation accounting #1/ClickHouse#6, fail-closed sticky session #2, race ClickHouse#5, contracts ClickHouse#7) MUST precede Tier 2 (lock-free GcLogWriter I/O #3 + sealed-tombstone index #4). New lockless-path oracles are the gate. #2 retain-session = sticky-exempt-from-reaping + bounded re-log-retry (not reconciliation). ClickHouse#6 = settled generation in the .meta bundle sidecar. Rest -> backlog. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
filimonov added a commit that referenced this pull request on Jun 5, 2026
…hor) Scan A (candidate discovery, collectSealedTombstoneCandidates) is what #4 replaces with gc/sealed/<shard>. Scan B (the delete gate, markReachableBlobs/identity_reachable_in) is the generation-blind over-protective safety net and SURVIVES this remediation. So #1/#2 stay leak-only today AND after Tier 2; data-loss transition is a FUTURE follow-up that swaps Scan B for the §6.2 sessions+compaction gate. Tier-1-before-Tier-2 re-justified: sound practice (#4 index bookkeeping needs #1's generations), not '#4 removes the net'; the safety-load-bearing coupling is Tier 1+oracles -> the future Scan-B replacement. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
filimonov added a commit that referenced this pull request on Jun 5, 2026
…-B replacement) B70: #4 replaces Scan A (candidate discovery), not Scan B (the markReachableBlobs delete gate). Scan B survives this remediation, so #1/#2/ClickHouse#6 stay leak-only through Tier 2; Tier-1-first is sound practice (gc/sealed bookkeeping needs #1's generations), not '#4 removes the net'. B78 (NEW): replace Scan B with the §6.2 sessions+compaction authoritative gate — the true G3/authority completion and the actual data-loss transition. Gated on B70 Tier 1 + the new lockless-path oracles (the review author's explicit required follow-up). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
filimonov added a commit that referenced this pull request on Jun 5, 2026
… 1-6) 18 tasks across 7 phases, grounded in verbatim current code. Tier 1 first (#1 splitDeltaByShard generations, ClickHouse#6 sidecar drop-keying, #2 fail-closed sticky session + bounded re-log, ClickHouse#5 pin-snapshot race, ClickHouse#7 *Locked rename) then Tier 2 (#3 lock-free GcLogWriter I/O + fold-ins, #4 gc/sealed index). Scan B (markReachableBlobs delete gate) deliberately untouched — its replacement is B78. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
filimonov added a commit that referenced this pull request on Jun 5, 2026
Add the per-shard sealed-tombstone index key namespace (`gc/sealed/<shard>/`) to replace Scan A (full `blobs/`+`parts/` LIST) in the sweep loop.

Key encoding: `<prefix>/gc/sealed/<shard>/<identity>.<generation>.<b|p>` where `identity` is a lowercase hex digest (no `.`), `generation` is decimal, and the type suffix is `b` (blob) or `p` (part). Splitting the basename on `.` yields exactly 3 fields, so the key is unambiguously parseable without escaping.

New symbols in `PoolPaths`:
- `shardForPartId(const PartId &)`: canonical free function; folds the part_id's hex prefix via the same nibble-fold as `shardForHash`. This is the single source of truth; `GcLogWriter::shardForPartId` now delegates here.
- `gcSealedPrefix(prefix, shard)`: LIST prefix for one shard's index.
- `gcSealedKey(prefix, shard, identity, generation, is_blob)`: full entry key.
- `SealedIndexEntry` struct + `parseSealedIndexKey(prefix, key)`: inverse parser; returns `nullopt` on any malformed key (wrong shape, bad type char, non-numeric generation) so stray objects under gc/sealed/ are ignored.

Round-trip test added to `gtest_content_addressed_gc_s4.cpp` (`ContentAddressedSealedIndex.RoundTrip`): blob+part at generations 0 and 5, plus 5 rejection cases (garbage, wrong prefix, missing segment, bad type, non-numeric generation). Suite: 153/153 passed (was 152).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
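The key encoding described in this commit can be sketched as follows. This is a minimal illustration, not the code from the commit: the function and struct names come from the commit message, but the exact signatures, the `SealedIndexEntry` field layout, the decimal rendering of `<shard>`, and the `parseDecimal` helper are all assumptions made for the example.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Field layout is an assumption; the commit only names the struct.
struct SealedIndexEntry
{
    uint32_t shard = 0;
    std::string identity;
    uint64_t generation = 0;
    bool is_blob = false;
};

// Hypothetical helper: accept only a non-empty, all-digit decimal string.
template <typename T>
bool parseDecimal(std::string_view s, T & out)
{
    if (s.empty())
        return false;
    auto [ptr, ec] = std::from_chars(s.data(), s.data() + s.size(), out);
    return ec == std::errc() && ptr == s.data() + s.size();
}

// LIST prefix for one shard (shard rendered as decimal: an assumption).
std::string gcSealedPrefix(std::string_view prefix, uint32_t shard)
{
    return std::string(prefix) + "/gc/sealed/" + std::to_string(shard) + "/";
}

// Full key: <prefix>/gc/sealed/<shard>/<identity>.<generation>.<b|p>
std::string gcSealedKey(std::string_view prefix, uint32_t shard,
                        std::string_view identity, uint64_t generation, bool is_blob)
{
    return gcSealedPrefix(prefix, shard) + std::string(identity) + "."
        + std::to_string(generation) + "." + (is_blob ? "b" : "p");
}

// Inverse parser: nullopt on any malformed key so stray objects are ignored.
std::optional<SealedIndexEntry> parseSealedIndexKey(std::string_view prefix, std::string_view key)
{
    constexpr auto npos = std::string_view::npos;
    const std::string head = std::string(prefix) + "/gc/sealed/";
    if (key.substr(0, head.size()) != head)
        return std::nullopt;
    std::string_view rest = key.substr(head.size());

    const auto slash = rest.find('/');
    if (slash == npos)
        return std::nullopt;
    SealedIndexEntry entry;
    if (!parseDecimal(rest.substr(0, slash), entry.shard))
        return std::nullopt;

    // The basename must split on '.' into exactly three fields.
    std::string_view base = rest.substr(slash + 1);
    const auto d1 = base.find('.');
    const auto d2 = (d1 == npos) ? npos : base.find('.', d1 + 1);
    if (d2 == npos || base.find('.', d2 + 1) != npos || base.find('/') != npos)
        return std::nullopt;

    std::string_view identity = base.substr(0, d1);
    if (identity.empty())
        return std::nullopt;
    for (char c : identity)
        if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
            return std::nullopt; // identity must be lowercase hex

    if (!parseDecimal(base.substr(d1 + 1, d2 - d1 - 1), entry.generation))
        return std::nullopt;

    std::string_view type = base.substr(d2 + 1);
    if (type != "b" && type != "p")
        return std::nullopt;

    entry.identity = std::string(identity);
    entry.is_blob = (type == "b");
    return entry;
}

// Usage: encode one blob entry and parse it back.
inline void demoRoundTrip()
{
    const auto key = gcSealedKey("pool", 3, "ab12", 5, true);
    const auto parsed = parseSealedIndexKey("pool", key);
    assert(parsed && parsed->identity == "ab12" && parsed->generation == 5);
}
```

Because the identity is hex and the generation is decimal, neither field can contain a `.`, which is what lets the parser insist on exactly three fields without any escaping scheme.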
filimonov added a commit that referenced this pull request on Jun 5, 2026
…bucket scan (Scan A, G3)

Seal adds a compact index entry; recover/sweep removes it; the sweep discovers candidates by LISTing only gc/sealed/<shard> (16 small prefixes) instead of the full blobs/+parts/ tree. Does NOT touch Scan B (the markReachableBlobs delete gate); this is a perf/G3 fix only. Oracle 6 proves re-presentation across rounds + index lifecycle.

The generations-per-hash observability tally (ContentAddressedGenerationsObserved / ContentAddressedHashesObserved) moved from the retired full-tree Scan A into the reconciliation full-scan (collectReconciliationCandidates), which still walks the whole tree, so the counters still reflect the true generation population without double-counting.

Two S4 oracles that simulated a GC seal by writing the .tombstone directly were made faithful to the real seal (they now also seed the matching gc/sealed entry via the seal helper / an explicit condCreateIfAbsent), since Scan A no longer re-discovers a tombstone that has no index entry. The grace=0 GcRecheckBefore oracle now sums the deleted_blobs across both rounds (with grace=0 the seal and sweep collapse into round 1; a swept generation is no longer re-presented in round 2). No assertion was weakened.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
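To illustrate the shape of the new candidate discovery, here is a rough sketch that lists only the 16 `gc/sealed/<shard>/` prefixes and never walks `blobs/` or `parts/`. Everything below is hypothetical scaffolding: the object store is modelled as a sorted `std::set` of keys, `listPrefix` and `collectSealedCandidates` are illustrative names, and rendering the shard as a decimal number is an assumption. The real sweep also parses each key and still passes every candidate through Scan B before deleting anything.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Shard count taken from the "16 small prefixes" wording in the commit.
constexpr unsigned kShardCount = 16;

// Stand-in for an object-store LIST: return all keys that start with `prefix`.
std::vector<std::string> listPrefix(const std::set<std::string> & store, const std::string & prefix)
{
    std::vector<std::string> out;
    // In a sorted set, keys sharing a prefix are contiguous from lower_bound(prefix).
    for (auto it = store.lower_bound(prefix);
         it != store.end() && it->compare(0, prefix.size(), prefix) == 0;
         ++it)
        out.push_back(*it);
    return out;
}

// Discover sweep candidates from the per-shard index only.
std::vector<std::string> collectSealedCandidates(const std::set<std::string> & store,
                                                 const std::string & pool_prefix)
{
    std::vector<std::string> candidates;
    for (unsigned shard = 0; shard < kShardCount; ++shard)
    {
        // The trailing '/' keeps shard 1 from matching shards 10 to 15.
        const std::string prefix = pool_prefix + "/gc/sealed/" + std::to_string(shard) + "/";
        for (auto & key : listPrefix(store, prefix))
            candidates.push_back(std::move(key));
    }
    return candidates;
}

// Usage: blobs/ and parts/ objects are never visited.
inline void demoSweepDiscovery()
{
    const std::set<std::string> store = {
        "pool/blobs/ab12",
        "pool/gc/sealed/3/ab12.5.b",
    };
    assert(collectSealedCandidates(store, "pool").size() == 1);
}
```

The cost of discovery now scales with the number of sealed entries rather than with the size of the whole pool, which matches the commit's description of this change as a performance fix rather than a change to the delete gate.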
filimonov added a commit that referenced this pull request on Jun 5, 2026
…); B78 still open Tier 1 (#1 splitDeltaByShard generations, ClickHouse#6 sidecar drop-keying, #2 fail-closed sticky session, ClickHouse#5 pin-snapshot race, ClickHouse#7 *Locked rename) + Tier 2 (#3 lock-free GcLogWriter I/O + fold-ins, #4 gc/sealed index) landed with oracles 1-6 green, 156 ContentAddressed gtests + CA stateless smoke + non-CA regression passing. Scan B (the markReachableBlobs delete gate) untouched — B78 (replace it with the sources+compaction authoritative gate) remains the deferred, data-loss-critical follow-up with its own attended-review gate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
filimonov added a commit that referenced this pull request on Jun 12, 2026
…econcile rebuild; drop Keeper epoch cache)

External review (Codex) found real edge-accounting holes. Fixes:

#1/#2 Stale root positives. The §4.1 orderings (+-before-setRef, removeRef-before--) already bias a crash to over-count (leak), never under-count (loss). The real bug was that they could leak FOREVER because reconcile (zero-weight markers) could not subtract a stale +. Fix: reconcile is now an AUTHORITATIVE REBUILD (§4.5): it recomputes in-degree from real refs/ reachability + the physical LIST, with a high-watermark (snap/<E> authoritative-through-E; discard logs ≤ E), so a stale + recomputes to its true value and dies. Keep logging root edges (so the routine fold needs no refs/ LIST, which answers the "full traversal each round" concern); reconcile is the periodic authoritative truth-maker.

#3 Generation ABA when reclaim lags. Added a durable per-hash floor (floors/<H> = 1+max-condemned-gen); reuse iff g ≥ floor(H) else resurrect to floor. Replaces the bounded recent-condemned window as the reuse authority.

#4 Closed-epoch reappend. Concrete protocol: leader writes a durable seal (gc/sealed/<e>) at close; a writer whose append target is sealed re-syncs and reappends to the open epoch. The fold processes only sealed epochs.

ClickHouse#5 gc/condemned is now a FULL reclaim record (hash, gen, kind, child-edges, fold-epoch); R4 cascade reads children from it (crash-safe successor).

#5b Bounds: per-writer caps ≤3 (tree + 2 children) so multi-child commit is reachable in the model.

Epoch cache: DROPPED from Keeper (v1) per three concurring reviews. The epoch lives only in S3 gc/epoch; writers read it with a short process-memory TTL (lag-only = safe; the seal is the event-invalidation). Removes the fragile "Keeper never ahead of S3" invariant and the ghost-epoch recovery hazard entirely. Keeper now holds ONLY leader election + per-writer leases.

Threaded through layout, writer/GC/recovery/reconcile protocols, invariants, hinges, failure table, decisions (now D1–D6), verification scope, §11 open items, and the formal appendix (floors/sealed variables, floor-based reuse, seal+reappend, authoritative-rebuild Reconcile, updated bounds + scenarios).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
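The floor-based reuse rule from fix #3 can be sketched as below. This is only an in-memory model under stated assumptions: the commit describes a durable `floors/<H>` record, whereas `FloorTable`, `condemn`, `floorOf`, and `chooseGeneration` are hypothetical names invented here, and an absent floor is assumed to mean 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// In-memory stand-in for the durable floors/<H> records (an assumption).
struct FloorTable
{
    std::unordered_map<std::string, uint64_t> floors;

    // Condemning generation `gen` of `hash` raises the floor to
    // 1 + max-condemned-gen; the floor never moves backwards.
    void condemn(const std::string & hash, uint64_t gen)
    {
        uint64_t & floor = floors[hash];
        floor = std::max(floor, gen + 1);
    }

    // A hash that was never condemned is treated as having floor 0.
    uint64_t floorOf(const std::string & hash) const
    {
        const auto it = floors.find(hash);
        return it == floors.end() ? 0 : it->second;
    }

    // Reuse the observed generation iff g >= floor(H); otherwise the
    // observed generation was already condemned, so resurrect at the floor.
    uint64_t chooseGeneration(const std::string & hash, uint64_t observed_gen) const
    {
        const uint64_t floor = floorOf(hash);
        return observed_gen >= floor ? observed_gen : floor;
    }
};

// Usage: a writer that still sees condemned generation 2 is moved to 3.
inline void demoFloorReuse()
{
    FloorTable table;
    table.condemn("h", 2);
    assert(table.chooseGeneration("h", 2) == 3);
}
```

Because the floor is monotonic per hash, a lagging reclaim cannot make a condemned generation number look fresh again, which is the ABA case the commit says the bounded recent-condemned window could not rule out.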
Summary
system.kafka_consumers
Testing
pytest -q tests/integration/test_storage_kafka/test_mv_target_missing.py (fails: ModuleNotFoundError: No module named 'requests')
https://chatgpt.com/codex/tasks/task_b_685b14ca56848323834460f201b10a92