Add diagnostic for missing MV targets in Kafka#4

Open
filimonov wants to merge 1 commit into master from codex/add-error-handling-for-missing-materialized-view-in-kafka

Conversation

@filimonov

Owner

Summary

  • report missing materialized view target tables for Kafka tables
  • expose the error through system.kafka_consumers
  • add integration test covering this scenario

Testing

  • pytest -q tests/integration/test_storage_kafka/test_mv_target_missing.py (fails: ModuleNotFoundError: No module named 'requests')

https://chatgpt.com/codex/tasks/task_b_685b14ca56848323834460f201b10a92

filimonov added a commit that referenced this pull request Jun 5, 2026
… + Tier 2 G1/G3)

Resolves the B69 attended-review gate. Framing correction (verified): the sweep's
current safety net is a generation-BLIND full reachability re-validate scan, so
#1/#2 are leak/log-drift today, NOT data-loss — they become data-loss only when
Tier 2 (#4) removes that scan. Hence Tier 1 (generation accounting #1/#6,
fail-closed sticky session #2, race #5, contracts #7) MUST precede Tier 2 (lock-free
GcLogWriter I/O #3 + sealed-tombstone index #4). New lockless-path oracles are the
gate. #2 retain-session = sticky-exempt-from-reaping + bounded re-log-retry (not
reconciliation). #6 = settled generation in the .meta bundle sidecar. Rest -> backlog.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
filimonov added a commit that referenced this pull request Jun 5, 2026
…hor)

Scan A (candidate discovery, collectSealedTombstoneCandidates) is what #4 replaces
with gc/sealed/<shard>. Scan B (the delete gate, markReachableBlobs/identity_reachable_in)
is the generation-blind over-protective safety net and SURVIVES this remediation.
So #1/#2 stay leak-only today AND after Tier 2; data-loss transition is a FUTURE
follow-up that swaps Scan B for the §6.2 sessions+compaction gate. Tier-1-before-Tier-2
re-justified: sound practice (#4 index bookkeeping needs #1's generations), not
'#4 removes the net'; the safety-load-bearing coupling is Tier 1+oracles -> the
future Scan-B replacement.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
filimonov added a commit that referenced this pull request Jun 5, 2026
…-B replacement)

B70: #4 replaces Scan A (candidate discovery), not Scan B (the markReachableBlobs
delete gate). Scan B survives this remediation, so #1/#2/#6 stay leak-only through
Tier 2; Tier-1-first is sound practice (gc/sealed bookkeeping needs #1's generations),
not '#4 removes the net'.

B78 (NEW): replace Scan B with the §6.2 sessions+compaction authoritative gate — the
true G3/authority completion and the actual data-loss transition. Gated on B70 Tier 1
+ the new lockless-path oracles (the review author's explicit required follow-up).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
filimonov added a commit that referenced this pull request Jun 5, 2026
… 1-6)

18 tasks across 7 phases, grounded in verbatim current code. Tier 1 first (#1 splitDeltaByShard
generations, #6 sidecar drop-keying, #2 fail-closed sticky session + bounded re-log, #5 pin-snapshot
race, #7 *Locked rename) then Tier 2 (#3 lock-free GcLogWriter I/O + fold-ins, #4 gc/sealed index).
Scan B (markReachableBlobs delete gate) deliberately untouched — its replacement is B78.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
filimonov added a commit that referenced this pull request Jun 5, 2026
Add the per-shard sealed-tombstone index key namespace (`gc/sealed/<shard>/`)
to replace Scan A (full `blobs/`+`parts/` LIST) in the sweep loop.

Key encoding: `<prefix>/gc/sealed/<shard>/<identity>.<generation>.<b|p>`
where `identity` is a lowercase hex digest (no `.`), `generation` is decimal,
and the type suffix is `b` (blob) or `p` (part). Splitting the basename on `.`
yields exactly 3 fields — unambiguously parseable without escaping.

New symbols in `PoolPaths`:
- `shardForPartId(const PartId &)` — canonical free function; folds the
  part_id's hex prefix via the same nibble-fold as `shardForHash`. This is the
  single source of truth; `GcLogWriter::shardForPartId` now delegates here.
- `gcSealedPrefix(prefix, shard)` — LIST prefix for one shard's index.
- `gcSealedKey(prefix, shard, identity, generation, is_blob)` — full entry key.
- `SealedIndexEntry` struct + `parseSealedIndexKey(prefix, key)` — inverse
  parser; returns `nullopt` on any malformed key (wrong shape, bad type char,
  non-numeric generation) so stray objects under gc/sealed/ are ignored.
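
The key encoding and its inverse parser described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the struct fields, the decimal rendering of the shard segment, and the `parseDecimal` helper are assumptions; only the `<identity>.<generation>.<b|p>` basename shape and the `nullopt`-on-malformed behaviour come from the message.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical layout of the parsed entry; field names are assumptions.
struct SealedIndexEntry
{
    unsigned shard = 0;
    std::string identity;     // lowercase hex digest, never contains '.'
    uint64_t generation = 0;  // decimal in the key
    bool is_blob = false;     // 'b' -> blob, 'p' -> part
};

// LIST prefix for one shard (shard rendered in decimal: an assumption).
std::string gcSealedPrefix(std::string_view prefix, unsigned shard)
{
    return std::string(prefix) + "/gc/sealed/" + std::to_string(shard) + "/";
}

std::string gcSealedKey(std::string_view prefix, unsigned shard,
                        std::string_view identity, uint64_t generation, bool is_blob)
{
    return gcSealedPrefix(prefix, shard) + std::string(identity) + "."
        + std::to_string(generation) + "." + (is_blob ? "b" : "p");
}

// Helper (assumption): strict decimal parse that must consume the whole field.
template <typename T>
bool parseDecimal(std::string_view text, T & out)
{
    if (text.empty())
        return false;
    const char * end = text.data() + text.size();
    auto [ptr, ec] = std::from_chars(text.data(), end, out);
    return ec == std::errc() && ptr == end;
}

// Inverse of gcSealedKey; any malformed key yields nullopt.
std::optional<SealedIndexEntry> parseSealedIndexKey(std::string_view prefix, std::string_view key)
{
    const std::string root = std::string(prefix) + "/gc/sealed/";
    if (key.substr(0, root.size()) != root)
        return std::nullopt;
    key.remove_prefix(root.size());

    // Expect exactly "<shard>/<basename>".
    const size_t slash = key.find('/');
    if (slash == std::string_view::npos || key.find('/', slash + 1) != std::string_view::npos)
        return std::nullopt;

    SealedIndexEntry entry;
    if (!parseDecimal(key.substr(0, slash), entry.shard))
        return std::nullopt;

    // The basename must split on '.' into exactly three fields.
    const std::string_view base = key.substr(slash + 1);
    const size_t dot1 = base.find('.');
    const size_t dot2 = base.rfind('.');
    if (dot1 == std::string_view::npos || dot1 == dot2 || base.find('.', dot1 + 1) != dot2)
        return std::nullopt;

    const std::string_view identity = base.substr(0, dot1);
    const std::string_view type = base.substr(dot2 + 1);
    if (identity.empty())
        return std::nullopt;
    for (char c : identity)
        if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
            return std::nullopt;
    if (!parseDecimal(base.substr(dot1 + 1, dot2 - dot1 - 1), entry.generation))
        return std::nullopt;
    if (type != "b" && type != "p")
        return std::nullopt;

    entry.identity = std::string(identity);
    entry.is_blob = (type == "b");
    return entry;
}
```

Because the identity is hex and the generation is decimal, neither field can contain a `.`, which is why a plain three-way split is enough and no escaping is needed.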

Round-trip test added to `gtest_content_addressed_gc_s4.cpp`
(`ContentAddressedSealedIndex.RoundTrip`): blob+part at generations 0 and 5,
plus 5 rejection cases (garbage, wrong prefix, missing segment, bad type, non-
numeric generation).

Suite: 153/153 passed (was 152).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
filimonov added a commit that referenced this pull request Jun 5, 2026
…bucket scan (Scan A, G3)

Seal adds a compact index entry; recover/sweep removes it; the sweep discovers candidates by
LISTing only gc/sealed/<shard> (16 small prefixes) instead of the full blobs/+parts/ tree. Does
NOT touch Scan B (the markReachableBlobs delete gate) — perf/G3 fix only. Oracle 6 proves
re-presentation across rounds + index lifecycle.
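
As a rough illustration of the narrowed discovery step, the sketch below models the bucket as a sorted set of keys and collects candidates by listing only the 16 `gc/sealed/<shard>/` prefixes. The `listPrefix` and `discoverSealedCandidates` names, the in-memory bucket, and the decimal shard segment are assumptions made for the example, not the real sweep code.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// "16 small prefixes" comes from the commit message; the constant name is an assumption.
constexpr unsigned kShardCount = 16;

// Stand-in for an object-store LIST: all keys in a sorted set that start with `prefix`.
std::vector<std::string> listPrefix(const std::set<std::string> & bucket, const std::string & prefix)
{
    std::vector<std::string> out;
    for (auto it = bucket.lower_bound(prefix);
         it != bucket.end() && it->compare(0, prefix.size(), prefix) == 0;
         ++it)
        out.push_back(*it);
    return out;
}

// Candidate discovery that never walks blobs/ or parts/.
std::vector<std::string> discoverSealedCandidates(const std::set<std::string> & bucket,
                                                  const std::string & pool_prefix)
{
    std::vector<std::string> candidates;
    for (unsigned shard = 0; shard < kShardCount; ++shard)
    {
        const std::string prefix = pool_prefix + "/gc/sealed/" + std::to_string(shard) + "/";
        const auto keys = listPrefix(bucket, prefix);
        candidates.insert(candidates.end(), keys.begin(), keys.end());
    }
    return candidates;
}
```

The cost of this step then scales with the number of sealed entries rather than with the total number of blobs and parts in the pool.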

The generations-per-hash observability tally (ContentAddressedGenerationsObserved /
ContentAddressedHashesObserved) moved from the retired full-tree Scan A into the reconciliation
full-scan (collectReconciliationCandidates), which still walks the whole tree, so the counters
still reflect the true generation population without double-counting.

Two S4 oracles that simulated a GC seal by writing the .tombstone directly were made faithful to
the real seal (they now also seed the matching gc/sealed entry via the seal helper / an explicit
condCreateIfAbsent), since Scan A no longer re-discovers a tombstone that has no index entry. The
grace=0 GcRecheckBefore oracle now sums the deleted_blobs across both rounds (with grace=0 the
seal and sweep collapse into round 1; a swept generation is no longer re-presented in round 2).
No assertion was weakened.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
filimonov added a commit that referenced this pull request Jun 5, 2026
…); B78 still open

Tier 1 (#1 splitDeltaByShard generations, #6 sidecar drop-keying, #2 fail-closed sticky
session, #5 pin-snapshot race, #7 *Locked rename) + Tier 2 (#3 lock-free GcLogWriter I/O +
fold-ins, #4 gc/sealed index) landed with oracles 1-6 green, 156 ContentAddressed gtests +
CA stateless smoke + non-CA regression passing. Scan B (the markReachableBlobs delete gate)
untouched; B78 (replace it with the sessions+compaction authoritative gate) remains the
deferred, data-loss-critical follow-up with its own attended-review gate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
filimonov added a commit that referenced this pull request Jun 12, 2026
…econcile rebuild; drop Keeper epoch cache)

External review (Codex) found real edge-accounting holes. Fixes:

#1/#2 Stale root positives. The §4.1 orderings (+-before-setRef,
removeRef-before--) already bias a crash to over-count (leak), never
under-count (loss). The real bug was that they could leak FOREVER because
reconcile (zero-weight markers) could not subtract a stale +. Fix: reconcile
is now an AUTHORITATIVE REBUILD (§4.5) — it recomputes in-degree from real
refs/ reachability + the physical LIST, with a high-watermark (snap/<E>
authoritative-through-E; discard logs ≤ E), so a stale + recomputes to its
true value and dies. Keep logging root edges (so the routine fold needs no
refs/ LIST — answers the "full traversal each round" concern); reconcile is
the periodic authoritative truth-maker.
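
To make the "authoritative rebuild" idea concrete, here is a small in-memory sketch: in-degrees are recomputed purely from the live roots and the child edges reachable from them, so a stale `+` for an unreachable hash simply does not appear in the result, and log entries at or below the watermark `E` are dropped. The types, function names, and the graph representation are all assumptions for illustration; the real reconcile works against `refs/`, the physical LIST, and `snap/<E>`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

using Hash = std::string;

// Hypothetical edge-log record: a +1/-1 delta for `hash`, written in `epoch`.
struct LogEntry
{
    uint64_t epoch;
    Hash hash;
    int delta;
};

// Recompute in-degree from scratch: one edge per live root ref, plus one edge
// per child link of every node reachable from a root.
std::map<Hash, int64_t> rebuildInDegree(const std::vector<Hash> & roots,
                                        const std::map<Hash, std::vector<Hash>> & children)
{
    std::map<Hash, int64_t> in_degree;
    std::set<Hash> visited;
    std::vector<Hash> stack;

    for (const auto & root : roots)
    {
        ++in_degree[root];
        if (visited.insert(root).second)
            stack.push_back(root);
    }
    while (!stack.empty())
    {
        const Hash node = stack.back();
        stack.pop_back();
        const auto it = children.find(node);
        if (it == children.end())
            continue;
        for (const auto & child : it->second)
        {
            ++in_degree[child];
            if (visited.insert(child).second)
                stack.push_back(child);
        }
    }
    return in_degree;
}

// The snapshot is authoritative through `watermark`, so older log entries are discarded.
std::vector<LogEntry> discardThroughWatermark(std::vector<LogEntry> logs, uint64_t watermark)
{
    logs.erase(std::remove_if(logs.begin(), logs.end(),
                              [watermark](const LogEntry & e) { return e.epoch <= watermark; }),
               logs.end());
    return logs;
}
```

In this model a hash that was over-counted by a crashed writer but is no longer reachable gets no entry at all, which is the "recomputes to its true value and dies" behaviour the message describes.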

#3 Generation ABA when reclaim lags. Added a durable per-hash floor
(floors/<H> = 1+max-condemned-gen); reuse iff g ≥ floor(H) else resurrect to
floor. Replaces the bounded recent-condemned window as the reuse authority.
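
The floor rule can be sketched in a few lines. The class below is an in-memory stand-in for the durable `floors/<H>` objects (an assumption; the names `GenerationFloors`, `condemn`, and `resolve` are hypothetical), showing `floor = 1 + max condemned generation` and "reuse iff g ≥ floor, else resurrect to floor".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Outcome of a writer presenting generation g for a hash.
struct ReuseDecision
{
    uint64_t generation;  // generation the writer should use
    bool reused;          // true if g itself was still safe to reuse
};

class GenerationFloors
{
public:
    // Condemning generation `gen` raises the floor to at least gen + 1.
    void condemn(const std::string & hash, uint64_t gen)
    {
        uint64_t & floor = floors[hash];
        floor = std::max(floor, gen + 1);
    }

    // A hash with no condemned generation has floor 0.
    uint64_t floorOf(const std::string & hash) const
    {
        const auto it = floors.find(hash);
        return it == floors.end() ? 0 : it->second;
    }

    // Reuse g iff g >= floor(hash); otherwise resurrect at the floor.
    ReuseDecision resolve(const std::string & hash, uint64_t g) const
    {
        const uint64_t floor = floorOf(hash);
        if (g >= floor)
            return {g, true};
        return {floor, false};
    }

private:
    std::unordered_map<std::string, uint64_t> floors;
};
```

Since the floor only ever increases, a generation that was condemned can never be handed out again, regardless of how far reclaim lags behind.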

#4 Closed-epoch reappend. Concrete protocol: leader writes a durable seal
(gc/sealed/<e>) at close; a writer whose append target is sealed re-syncs and
reappends to the open epoch. The fold processes only sealed epochs.
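
A single-threaded model of the seal-and-reappend protocol might look like the sketch below. `EpochStore`, `tryAppend`, and `appendWithReappend` are hypothetical names, and the store is in-memory; in the real system the seal is a durable `gc/sealed/<e>` object and the check-then-append would need whatever conditional-write guarantee the object store provides.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

struct EpochStore
{
    uint64_t open_epoch = 0;
    std::set<uint64_t> sealed;                          // stands in for gc/sealed/<e>
    std::map<uint64_t, std::vector<std::string>> logs;  // per-epoch append logs

    // Leader: seal the current epoch at close, then open the next one.
    void closeEpoch()
    {
        sealed.insert(open_epoch);
        ++open_epoch;
    }

    // Append is refused once the target epoch carries a seal.
    bool tryAppend(uint64_t epoch, const std::string & record)
    {
        if (sealed.count(epoch) != 0)
            return false;
        logs[epoch].push_back(record);
        return true;
    }

    // The fold consumes sealed epochs only; the open epoch is left alone.
    std::vector<std::string> foldSealed() const
    {
        std::vector<std::string> out;
        for (const uint64_t epoch : sealed)
        {
            const auto it = logs.find(epoch);
            if (it != logs.end())
                out.insert(out.end(), it->second.begin(), it->second.end());
        }
        return out;
    }
};

// Writer: try the (possibly stale) cached epoch; on a sealed target, re-sync
// to the open epoch and reappend there. Returns the epoch that took the record.
uint64_t appendWithReappend(EpochStore & store, uint64_t cached_epoch, const std::string & record)
{
    uint64_t target = cached_epoch;
    while (!store.tryAppend(target, record))
        target = store.open_epoch;
    return target;
}
```

A writer holding a stale epoch therefore loses nothing: its record lands in the open epoch and is picked up when that epoch is sealed and folded.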

#5 gc/condemned is now a FULL reclaim record (hash, gen, kind, child-edges,
fold-epoch); R4 cascade reads children from it (crash-safe successor).

#5b Bounds: per-writer caps ≤3 (tree + 2 children) so multi-child commit is
reachable in the model.

Epoch cache: DROPPED from Keeper (v1) per three concurring reviews. The epoch
lives only in S3 gc/epoch; writers read it with a short process-memory TTL
(lag-only = safe; the seal is the event-invalidation). Removes the fragile
"Keeper never ahead of S3" invariant and the ghost-epoch recovery hazard
entirely. Keeper now holds ONLY leader election + per-writer leases.
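
The writer-side epoch read with a short process-memory TTL could be sketched like this. `CachedEpoch` and its interface are assumptions for illustration; the remote read is injected as a callback standing in for a GET of `gc/epoch`, and the clock is passed in explicitly so the behaviour is deterministic.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>

class CachedEpoch
{
public:
    using Clock = std::chrono::steady_clock;

    CachedEpoch(std::function<uint64_t()> read_remote_, Clock::duration ttl_)
        : read_remote(std::move(read_remote_)), ttl(ttl_)
    {
    }

    // Return the cached epoch, refreshing from the remote source once the TTL has expired.
    // A stale value only lags behind the true epoch, never runs ahead of it.
    uint64_t get(Clock::time_point now)
    {
        if (!value.has_value() || now - fetched_at >= ttl)
        {
            value = read_remote();
            fetched_at = now;
        }
        return *value;
    }

    // Event-driven invalidation, e.g. after an append hits a sealed epoch.
    void invalidate() { value.reset(); }

private:
    std::function<uint64_t()> read_remote;
    Clock::duration ttl;
    std::optional<uint64_t> value;
    Clock::time_point fetched_at{};
};
```

A lagging cache is tolerable here because a writer that targets an already-sealed epoch is told so by the seal and re-syncs, at which point calling `invalidate()` forces a fresh read.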

Threaded through layout, writer/GC/recovery/reconcile protocols, invariants,
hinges, failure table, decisions (now D1–D6), verification scope, §11 open
items, and the formal appendix (floors/sealed variables, floor-based reuse,
seal+reappend, authoritative-rebuild Reconcile, updated bounds + scenarios).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>