Oss upstream#602
Draft
Davis-Zhang-Onehouse wants to merge 52 commits into
added 9 commits
April 10, 2026 19:24
Brainstormed plan for porting org.apache.hudi.expression (Predicate
hierarchy) plus the org.apache.hudi.internal.schema.{Type,Types}
subset it depends on, then wiring keyFilterOpt through ReaderContext
and the file-group reader so a key-based In-predicate actually filters
output rows on a v9 MOR + COMMIT_TIME_ORDERING table.
Three-phase landing strategy with manual cross-check against
readerContext_callstack.md as Phase 2 verification.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the previous "trait hierarchy + downcast in create_key_spec"
plan with a kind() accessor approach (Option A from the brainstorm).
Each Predicate / Expression trait gets a kind() -> {Predicate,Expression}Kind<'_>
method returning a borrowed enum view, so create_key_spec is a clean
Rust match instead of Any::downcast_ref scaffolding.
Trait hierarchy stays Java-faithful; kind() is pure inspection sugar.
Documented as deviation #2 in §6, with the Literal { value: LiteralValue }
shape (deviation #1) re-justified by the kind()-driven extraction pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
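A minimal sketch of the kind() pattern described above, assuming a trimmed-down hierarchy with only two predicate types. The struct fields (`values`, `prefixes`) and the `describe` function are hypothetical, not the port's actual API. It shows how a borrowed enum view lets a caller write a plain `match` where `Any::downcast_ref` chains would otherwise be needed, while the trait itself stays Java-shaped.

```rust
/// Borrowed, read-only view of a concrete predicate (two variants only here).
#[derive(Debug)]
pub enum PredicateKind<'a> {
    In(&'a In),
    StringStartsWithAny(&'a StringStartsWithAny),
}

/// The hierarchy stays trait-based; kind() is pure inspection sugar.
pub trait Predicate {
    fn kind(&self) -> PredicateKind<'_>;
}

// Hypothetical simplified predicates: literals stored as plain strings.
#[derive(Debug)]
pub struct In {
    pub values: Vec<String>,
}

#[derive(Debug)]
pub struct StringStartsWithAny {
    pub prefixes: Vec<String>,
}

impl Predicate for In {
    fn kind(&self) -> PredicateKind<'_> {
        PredicateKind::In(self)
    }
}

impl Predicate for StringStartsWithAny {
    fn kind(&self) -> PredicateKind<'_> {
        PredicateKind::StringStartsWithAny(self)
    }
}

/// A caller inspects any `dyn Predicate` with an exhaustive match, no downcasts.
pub fn describe(p: &dyn Predicate) -> String {
    match p.kind() {
        PredicateKind::In(pred) => format!("IN with {} value(s)", pred.values.len()),
        PredicateKind::StringStartsWithAny(pred) => {
            format!("STARTS_WITH_ANY with {} prefix(es)", pred.prefixes.len())
        }
    }
}
```

Because the enum is exhaustive, adding a new variant makes every `match` on it fail to compile until it is handled, which a downcast chain cannot enforce.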
Three-phase plan (1: port Predicate hierarchy + Type/Types; 2: wire key_filter_opt through ReaderContext + reader; 3: e2e smoke test). Each task is a TDD cycle with concrete code, exact file paths, and explicit commit instructions. Spec at docs/superpowers/specs/2026-05-07-keyfilteropt-port-design.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stub modules to be populated in subsequent tasks of the keyFilterOpt port. See docs/superpowers/plans/2026-05-07-keyfilteropt-port-implementation.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Java's ArrayData (org.apache.hudi.expression.ArrayData) is a concrete class implementing StructLike, not a separate interface. Replace the incorrectly-defined trait with a Vec<Box<dyn Any + Send + Sync>>-backed struct that implements StructLike. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
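The ArrayData change can be sketched as a concrete struct over `Vec<Box<dyn Any + Send + Sync>>` that implements `StructLike`. This assumes a `StructLike` trait reduced to `num_fields` plus an untyped `get_any` accessor; those method names and the typed `get` helper are hypothetical and may differ from the port. Typed access goes through `downcast_ref`, so a type mismatch or an out-of-range position yields `None` rather than panicking.

```rust
use std::any::Any;

/// Positional field access (hypothetical reduced form of StructLike).
pub trait StructLike {
    fn num_fields(&self) -> usize;
    fn get_any(&self, pos: usize) -> Option<&(dyn Any + Send + Sync)>;
}

/// Concrete, list-backed row: a struct, not a separate trait.
pub struct ArrayData {
    data: Vec<Box<dyn Any + Send + Sync>>,
}

impl ArrayData {
    pub fn new(data: Vec<Box<dyn Any + Send + Sync>>) -> Self {
        Self { data }
    }

    /// Typed accessor: None if `pos` is out of range or the value is not a `T`.
    pub fn get<T: 'static>(&self, pos: usize) -> Option<&T> {
        self.get_any(pos)?.downcast_ref::<T>()
    }
}

impl StructLike for ArrayData {
    fn num_fields(&self) -> usize {
        self.data.len()
    }

    fn get_any(&self, pos: usize) -> Option<&(dyn Any + Send + Sync)> {
        self.data.get(pos).map(|b| &**b)
    }
}
```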
ExpressionKind variants land in subsequent tasks as concrete types are introduced (Literal, NameReference, BoundReference, Predicate). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PredicateKind variants are added in Task 1.14 once concrete predicate types exist. ExpressionKind now has all four variants and the _Placeholder is removed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…th*,StringContains} + factories Finalizes PredicateKind to 12 variants. All Predicates inner classes from Java's Predicates.java are now ported. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Visitors are ported as declarations for hierarchy completeness. The bind logic is not wired into the reader path (see spec §6, deviation #6). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors Java HoodieMergedLogRecordReader.createKeySpec via Rust kind() pattern matching. Pure addition; not yet wired into scanner. Tests cover In, StringStartsWithAny, non-matching, and KeySpec.matches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
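A sketch of the createKeySpec translation via kind() matching, assuming simplified predicate structs that hold their literal strings directly (in the port, `In` holds expression children that must first be evaluated to literals). `KeySpec::FullKeys` and `KeySpec::Prefixes` are hypothetical names standing in for the full-key and prefix key specs, and `AlwaysTrue` stands in for any predicate that cannot become a key lookup.

```rust
/// Borrowed view of the predicate shapes create_key_spec cares about.
pub enum PredicateKind<'a> {
    In { keys: &'a [String] },
    StringStartsWithAny { prefixes: &'a [String] },
    Other,
}

pub trait Predicate {
    fn kind(&self) -> PredicateKind<'_>;
}

// Hypothetical simplified predicates.
pub struct In(pub Vec<String>);
pub struct StringStartsWithAny(pub Vec<String>);
pub struct AlwaysTrue;

impl Predicate for In {
    fn kind(&self) -> PredicateKind<'_> {
        PredicateKind::In { keys: self.0.as_slice() }
    }
}
impl Predicate for StringStartsWithAny {
    fn kind(&self) -> PredicateKind<'_> {
        PredicateKind::StringStartsWithAny { prefixes: self.0.as_slice() }
    }
}
impl Predicate for AlwaysTrue {
    fn kind(&self) -> PredicateKind<'_> {
        PredicateKind::Other
    }
}

/// Full-key lookup vs. prefix lookup, as consumed by the log scan.
#[derive(Debug, Clone, PartialEq)]
pub enum KeySpec {
    FullKeys(Vec<String>),
    Prefixes(Vec<String>),
}

impl KeySpec {
    /// True if a record with this key should be kept by the scan.
    pub fn matches(&self, record_key: &str) -> bool {
        match self {
            KeySpec::FullKeys(keys) => keys.iter().any(|k| k.as_str() == record_key),
            KeySpec::Prefixes(ps) => ps.iter().any(|p| record_key.starts_with(p.as_str())),
        }
    }
}

/// No filter, or a filter that is not a key lookup, yields None ("scan everything").
pub fn create_key_spec(filter: Option<&dyn Predicate>) -> Option<KeySpec> {
    match filter?.kind() {
        PredicateKind::In { keys } => Some(KeySpec::FullKeys(keys.to_vec())),
        PredicateKind::StringStartsWithAny { prefixes } => {
            Some(KeySpec::Prefixes(prefixes.to_vec()))
        }
        PredicateKind::Other => None,
    }
}
```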
Mirrors Java HoodieReaderContext.keyFilterOpt. Default None on all construction sites including FFI bridge and ReaderContext::empty(). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors Java BaseHoodieLogRecordReader.scanInternal(Option<KeySpec>, boolean). KeyBasedFileGroupRecordBuffer overrides set_key_spec to store the spec and skip non-matching records in process_data_block/process_delete_block. All existing callers pass None; behavior unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
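The buffer-side filtering can be sketched as below, assuming a buffer that maps each record key to an optional payload (`None` marking a delete). The method names follow the commit message; the block and record representations are hypothetical. The trait's default `set_key_spec` ignores the spec, so only the key-based buffer changes behavior, and with no spec set every key passes.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone)]
pub enum KeySpec {
    FullKeys(Vec<String>),
    Prefixes(Vec<String>),
}

impl KeySpec {
    pub fn matches(&self, key: &str) -> bool {
        match self {
            KeySpec::FullKeys(keys) => keys.iter().any(|k| k.as_str() == key),
            KeySpec::Prefixes(ps) => ps.iter().any(|p| key.starts_with(p.as_str())),
        }
    }
}

pub trait FileGroupRecordBuffer {
    /// Default: ignore the spec; buffers that cannot use it keep every record.
    fn set_key_spec(&mut self, _key_spec: Option<KeySpec>) {}
    fn process_data_block(&mut self, records: Vec<(String, String)>);
    fn process_delete_block(&mut self, keys: Vec<String>);
}

/// Record key -> Some(payload) for upserts, None for deletes (hypothetical layout).
#[derive(Debug, Default)]
pub struct KeyBasedFileGroupRecordBuffer {
    key_spec: Option<KeySpec>,
    pub records: HashMap<String, Option<String>>,
}

impl KeyBasedFileGroupRecordBuffer {
    /// With no spec every key is wanted, matching the unfiltered path.
    fn wanted(&self, key: &str) -> bool {
        self.key_spec.as_ref().map_or(true, |spec| spec.matches(key))
    }
}

impl FileGroupRecordBuffer for KeyBasedFileGroupRecordBuffer {
    fn set_key_spec(&mut self, key_spec: Option<KeySpec>) {
        self.key_spec = key_spec;
    }

    fn process_data_block(&mut self, records: Vec<(String, String)>) {
        for (key, payload) in records {
            if self.wanted(&key) {
                // Assuming blocks are processed in commit order, later writes replace earlier ones.
                self.records.insert(key, Some(payload));
            }
        }
    }

    fn process_delete_block(&mut self, keys: Vec<String>) {
        for key in keys {
            if self.wanted(&key) {
                self.records.insert(key, None);
            }
        }
    }
}
```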
Removes the existing TODO. Mirrors Java performScan() lines 95-107. key_filter_opt defaults to None on all current call paths so behavior is unchanged for existing users. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
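A sketch of how `key_filter_opt` could flow from the reader context into `scan_internal`, assuming an enum stand-in (`KeyPredicate`, hypothetical) for the ported predicate hierarchy and a reader that only records the spec it was handed. With a default `ReaderContext` the filter is `None`, so the scan receives `None` and existing behavior is unchanged.

```rust
/// Simplified stand-in for the ported Predicate hierarchy (hypothetical).
#[derive(Debug, Clone)]
pub enum KeyPredicate {
    In(Vec<String>),
    StartsWithAny(Vec<String>),
    Other,
}

#[derive(Debug, Clone, PartialEq)]
pub enum KeySpec {
    FullKeys(Vec<String>),
    Prefixes(Vec<String>),
}

/// Only the field relevant here; Default gives key_filter_opt = None.
#[derive(Debug, Default)]
pub struct ReaderContext {
    pub key_filter_opt: Option<KeyPredicate>,
}

fn create_key_spec(filter: Option<&KeyPredicate>) -> Option<KeySpec> {
    match filter? {
        KeyPredicate::In(keys) => Some(KeySpec::FullKeys(keys.clone())),
        KeyPredicate::StartsWithAny(prefixes) => Some(KeySpec::Prefixes(prefixes.clone())),
        KeyPredicate::Other => None,
    }
}

/// Hypothetical log reader that records the spec passed to scan_internal.
#[derive(Debug, Default)]
pub struct LogRecordReader {
    pub last_scan_spec: Option<Option<KeySpec>>,
}

impl LogRecordReader {
    fn scan_internal(&mut self, key_spec: Option<KeySpec>, _skip_processing_blocks: bool) {
        // A real scan would hand key_spec to the record buffer here.
        self.last_scan_spec = Some(key_spec);
    }

    pub fn perform_scan(&mut self, ctx: &ReaderContext) {
        let key_spec = create_key_spec(ctx.key_filter_opt.as_ref());
        self.scan_internal(key_spec, false);
    }
}
```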
Mirrors Java HoodieAvroReaderContext.getFileRecordIterator (lines 218-228) "fall through to row-level filter" path. Parquet has no native key predicate, so we filter via Arrow filter_record_batch on the _hoodie_record_key column. No-op when key_filter_opt is None. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
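The row-level fallback amounts to building one boolean per row from the `_hoodie_record_key` column and applying that mask to every column. The sketch below does this on a hypothetical in-memory `Batch` of string columns rather than an Arrow `RecordBatch` (the commit uses `filter_record_batch` for the masking step), and returns the batch untouched when no filter is configured.

```rust
/// Hypothetical stand-in for an Arrow RecordBatch: named string columns of equal length.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub columns: Vec<(String, Vec<String>)>,
}

pub const RECORD_KEY_FIELD: &str = "_hoodie_record_key";

/// Keeps rows whose record key satisfies `keep`; no-op when `keep` is None.
pub fn filter_by_record_key(
    batch: Batch,
    keep: Option<&dyn Fn(&str) -> bool>,
) -> Result<Batch, String> {
    let Some(keep) = keep else {
        return Ok(batch);
    };
    let keys = batch
        .columns
        .iter()
        .find(|(name, _)| name == RECORD_KEY_FIELD)
        .map(|(_, col)| col)
        .ok_or_else(|| format!("batch has no {} column", RECORD_KEY_FIELD))?;
    // One boolean per row, playing the role of the BooleanArray an Arrow filter takes.
    let mask: Vec<bool> = keys.iter().map(|k| keep(k.as_str())).collect();
    let columns = batch
        .columns
        .iter()
        .map(|(name, col)| {
            let kept: Vec<String> = col
                .iter()
                .zip(mask.iter())
                .filter_map(|(v, &m)| if m { Some(v.clone()) } else { None })
                .collect();
            (name.clone(), kept)
        })
        .collect();
    Ok(Batch { columns })
}
```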
Maps each Java readerContext interaction in the call-stack doc to its hudi-rs file:line equivalent. Validates Phase 2 implementation parity for the keyFilterOpt path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
read_file_group_with_key_filter is a self-contained variant of the existing read_file_group helper that accepts a key_filter_opt. lookup_record_key and extract_row_with_id_opt are small Arrow accessors used by the filter test in the next task. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reads city=sf partition of v9_mor_8i4u_commit_time with and without an In predicate on _hoodie_record_key. Asserts:
* baseline returns 2 rows
* filtered returns 1 row matching baseline content for id=1
* id=2 (base-only, not in filter) is excluded
End-to-end validation of the keyFilterOpt port: ReaderContext field, log-scan KeySpec, base-file row filter all confirmed working.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three sites in cpp/ build ReaderContext via struct literal:
- cpp/src/lib.rs:372 (FFI bridge)
- cpp/tests/read_record_batch_tests.rs:79 (test helper)
- cpp/tests/read_record_batch_tests.rs:609 (per-test setup)
All initialize key_filter_opt to None. FFI integration is out of scope per spec §2; this commit just keeps the workspace compiling.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
8 new tests across 3 fixtures, AB-pattern validation, isolated log-scan vs base-file filter coverage, no-op regression guard. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
8-test plan in 4 phases (A: helper + refactor; B: tests 2-5 on V9Mor8I4UCommitTime; C: tests 6-7 on V9MorNonpart3Commits with delete-block path; D: test 8 on MorLayoutLogOnly with log-only isolation). Each task is a TDD cycle with concrete code and fixture filenames pre-resolved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds:
- FilterAbResult struct + 4 assertion methods (narrowed/noop/empty/ids_eq)
- ab_read_with_filter shared driver
- 3 fixture locators: sf_file_group, nonpart_3commits_file_group, log_only_file_group
- ids_in_batch helper
Renames fg_reader_with_key_filter_filters_rows → fg_filter_in_log_updated_key, refactored to use the AB helper. Behavior unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
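A possible shape for `FilterAbResult` and its four assertion methods, assuming each read is reduced to the set of `id` values it returned. The field names and the `i64` id type are hypothetical; here "narrowed" is taken to mean the filtered ids form a strict subset of the baseline ids.

```rust
use std::collections::BTreeSet;

/// Ids returned by the same read without (baseline) and with (filtered) the key filter.
#[derive(Debug)]
pub struct FilterAbResult {
    pub baseline_ids: BTreeSet<i64>,
    pub filtered_ids: BTreeSet<i64>,
}

impl FilterAbResult {
    /// The filter dropped at least one row and introduced none.
    pub fn assert_narrowed(&self) {
        assert!(self.filtered_ids.is_subset(&self.baseline_ids), "filter added rows: {:?}", self);
        assert!(self.filtered_ids.len() < self.baseline_ids.len(), "filter dropped nothing: {:?}", self);
    }

    /// The filter kept every baseline row.
    pub fn assert_noop(&self) {
        assert_eq!(self.filtered_ids, self.baseline_ids, "filter changed the result");
    }

    /// No row survived the filter.
    pub fn assert_empty(&self) {
        assert!(self.filtered_ids.is_empty(), "expected no rows, got {:?}", self.filtered_ids);
    }

    /// The surviving ids are exactly `expected`.
    pub fn assert_ids_eq(&self, expected: &[i64]) {
        let expected: BTreeSet<i64> = expected.iter().copied().collect();
        assert_eq!(self.filtered_ids, expected);
    }
}
```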
extract_row_with_id_opt_v9nonpart helper Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reads MorLayoutLogOnly with and without an In filter on _hoodie_record_key. Since the fixture has no base file, ALL output flows through KeyBasedFileGroupRecordBuffer's process_data_block / process_delete_block — this test isolates the log-scan filter. Probe confirmed: _hoodie_record_key is present (String, values "k1"/"k2"), baseline row count = 2 (k3 deleted), so primary path was taken. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…jection contract The previous assertions expected a third 'age' column even though the ReaderContext-supplied schema_handler projects output down to the requested_schema [id, name]. Update the test to verify the projected schema, and add a companion test test_read_record_batch_no_projection_when_requested_equals_data that covers the requested == data case where OutputConverter is None and all columns flow through. Mirrors Java HoodieFileGroupReader constructor lines 119-122 and the next() projection loop at lines 264-265 (hudi-internal).
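The projection contract the updated tests check can be sketched as follows, modelling a schema as an ordered list of column names (hypothetical; the real code works on Arrow schemas). The converter is `None` when the requested schema equals the data schema, so all columns flow through, and otherwise selects the requested columns by position.

```rust
/// Hypothetical schema model: ordered column names.
pub type Schema = Vec<String>;

/// Stand-in for the output converter: data-column positions in requested order.
#[derive(Debug)]
pub struct OutputConverter {
    positions: Vec<usize>,
}

impl OutputConverter {
    /// Ok(None) when requested == data; Err if a requested column is missing.
    pub fn for_schemas(data: &Schema, requested: &Schema) -> Result<Option<Self>, String> {
        if data == requested {
            return Ok(None);
        }
        let positions = requested
            .iter()
            .map(|col| {
                data.iter()
                    .position(|d| d == col)
                    .ok_or_else(|| format!("column {} not in data schema", col))
            })
            .collect::<Result<Vec<usize>, String>>()?;
        Ok(Some(OutputConverter { positions }))
    }

    /// Projects one row (values in data-schema order) to the requested schema.
    pub fn project(&self, row: &[String]) -> Vec<String> {
        self.positions.iter().map(|&i| row[i].clone()).collect()
    }
}
```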
Description
How are the changes test-covered