ZeroDB v2 by qdequele · Pull Request #2 · qdequele/ZeroDB

qdequele · 2025-11-23T13:42:39Z

No description provided.

- Error enum with LMDB-compatible error types
- EnvFlags, DatabaseFlags, PutFlags matching LMDB constants
- Type aliases for PageNo and common types

- Page header, meta page, leaf/branch page structures
- Overflow page support for large values
- MemoryMap abstraction with read-only and read-write modes
- DataFile with batch writes and fdatasync support

- PageAllocator with freelist management
- Sorted freelist for sequential allocation (cache locality)
- PagePool for reusing page buffers
- DirtyPages tracking for write transactions

- Node operations for leaf and branch pages
- Binary search with LMDB-compatible key comparison
- Cursor with stateful traversal (first, last, next, prev)
- Page builder for constructing new pages
- Insert operations with page splitting

- Env with mmap management and transaction handling
- RoTxn/RwTxn with MVCC semantics
- Optimized commit path with batch I/O and single fdatasync
- WRITEMAP mode for direct mmap writes
- Database handle with cursor-based reads and writes

- Export all public modules and types
- Add memmap2, bitflags, page_size dependencies
- Add dev dependencies for benchmarks (criterion, heed, rocksdb)

- Basic environment and transaction tests
- B+tree operations tests
- Cursor traversal tests

- Sequential write benchmarks with fsync
- Random read benchmarks
- Iteration benchmarks
- B+tree search microbenchmarks
- Transaction overhead benchmarks

- LMDB architecture reference document
- Action plan for ZeroDB development
- Heed compatibility wrapper (work in progress)

Comprehensive list of 24 performance optimizations:
- 7 implemented (single fsync, fdatasync, batch I/O, etc.)
- 17 pending (cursor caching, prefetch, SIMD, io_uring, etc.)

Includes benchmarking checklist and platform-specific notes.

Optimizations implemented:
- Cursor page cache with LRU eviction (up to 16 pages)
- search_cached() method for cached tree traversal
- madvise() support for Sequential/Random/WillNeed/DontNeed hints
- Page prefetching via MADV_WILLNEED
- macOS F_FULLFSYNC for guaranteed durability (not just drive cache)
- libc dependency for Unix syscalls

Ensure implemented and pending counts match the documented optimizations.

Add #[cold] and #[inline] hints to optimize branch prediction:
- #[cold] on error constructors in error.rs
- #[inline(always)] on frequently called methods (num_keys, is_leaf, key, value)
- #[inline] on search and parse methods in page_ops, node, and header
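As an illustration of how such hints are typically placed, here is a minimal sketch. The `Error` variants, the `LeafPage` layout, and the `check_slot` helper are hypothetical stand-ins, not the types from this PR.

```rust
// Hypothetical error type; the real enum in error.rs has more variants.
#[derive(Debug, PartialEq)]
pub enum Error {
    NotFound,
    PageFull,
}

impl Error {
    // #[cold] tells the compiler this path is rarely taken, so callers
    // can keep the success path compact and well predicted.
    #[cold]
    #[inline(never)]
    pub fn not_found() -> Self {
        Error::NotFound
    }
}

// Hypothetical page view with a single header field.
pub struct LeafPage {
    pub num_keys: u16,
}

impl LeafPage {
    // Tiny accessor called in hot loops: force inlining.
    #[inline(always)]
    pub fn num_keys(&self) -> usize {
        self.num_keys as usize
    }

    // Larger helper: plain #[inline] leaves the decision to the compiler.
    #[inline]
    pub fn check_slot(&self, idx: usize) -> Result<usize, Error> {
        if idx < self.num_keys() {
            Ok(idx)
        } else {
            Err(Error::not_found())
        }
    }
}
```

These attributes are hints only; whether they help should be confirmed with a benchmark on the target platform.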

Implement hardware prefetch instructions for improved cache utilization:
- prefetch_read<T>() using x86_64 _mm_prefetch and aarch64 inline asm
- prefetch_range() for prefetching cache-line sized chunks
- Integrated into CursorOps::next() to prefetch next node during iteration

Supports x86_64, x86, and aarch64 architectures with no-op fallback.
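A minimal sketch of what the two helpers could look like follows. It covers only x86_64 and treats every other architecture as a no-op (the aarch64 inline-asm path is omitted). The 64-byte cache line and the returned line count are assumptions made for illustration.

```rust
// Assumed cache-line size; the real code may detect or configure this.
const CACHE_LINE: usize = 64;

/// Hint that the cache line holding `*ptr` will be read soon.
#[cfg(target_arch = "x86_64")]
#[inline(always)]
pub fn prefetch_read<T>(ptr: *const T) {
    use std::arch::x86_64::{_mm_prefetch, _MM_HINT_T0};
    // SAFETY: a prefetch is only a hint; it does not dereference `ptr`
    // architecturally and cannot fault.
    unsafe { _mm_prefetch::<_MM_HINT_T0>(ptr as *const i8) };
}

/// No-op fallback for architectures this sketch does not cover.
#[cfg(not(target_arch = "x86_64"))]
#[inline(always)]
pub fn prefetch_read<T>(_ptr: *const T) {}

/// Prefetch the first `len` bytes of `data`, one cache line at a time.
/// Returns the number of lines touched (only to make the sketch testable).
pub fn prefetch_range(data: &[u8], len: usize) -> usize {
    let len = len.min(data.len());
    let mut lines = 0;
    for off in (0..len).step_by(CACHE_LINE) {
        prefetch_read(data[off..].as_ptr());
        lines += 1;
    }
    lines
}
```

A cursor's `next()` could call `prefetch_read` on the following node before returning the current one, so the load overlaps with the caller's work.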

Add infrastructure for deferred freelist loading:
- freelist_loaded flag to track loading state
- is_freelist_loaded() to check if freelist is loaded
- mark_freelist_loaded() for marking as loaded
- needs_freelist_load() for checking if loading is needed

This allows faster environment open by deferring freelist loading until pages are actually needed.
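The flag-based deferral described above might be sketched as follows. The three flag methods use the names from the commit message; the struct layout and the `alloc_with_loader` method (which takes a closure standing in for reading the freelist from disk) are assumptions for illustration.

```rust
/// Simplified allocator that loads its freelist on first use.
pub struct PageAllocator {
    freelist: Vec<u64>,
    freelist_loaded: bool,
}

impl PageAllocator {
    /// Opening is cheap: nothing is read until a page is requested.
    pub fn new() -> Self {
        PageAllocator { freelist: Vec::new(), freelist_loaded: false }
    }

    pub fn is_freelist_loaded(&self) -> bool {
        self.freelist_loaded
    }

    pub fn needs_freelist_load(&self) -> bool {
        !self.freelist_loaded
    }

    pub fn mark_freelist_loaded(&mut self) {
        self.freelist_loaded = true;
    }

    /// Hypothetical entry point: `load` stands in for the disk read and
    /// runs at most once, on the first allocation.
    pub fn alloc_with_loader(&mut self, load: impl FnOnce() -> Vec<u64>) -> Option<u64> {
        if self.needs_freelist_load() {
            self.freelist = load();
            // Sort descending so pop() hands out the lowest page number.
            self.freelist.sort_unstable_by(|a, b| b.cmp(a));
            self.mark_freelist_loaded();
        }
        self.freelist.pop()
    }
}
```

Read-only workloads never call the allocator, so in this sketch they never pay for the load.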

Update OPTIMIZATIONS.md to reflect newly implemented features:
- Branch prediction hints (now implemented)
- CPU prefetch for sequential scans (now implemented)
- Lazy freelist loading (now implemented)

Total: 13 implemented, 11 pending optimizations.

Implement hardware-accelerated key comparison for keys >= 16 bytes:
- x86_64: SSE2 using _mm_loadu_si128 and _mm_cmpeq_epi8
- aarch64: NEON using vld1q_u8 and vceqq_u8
- Fallback to standard comparison for short keys or unsupported archs

Compares 16 bytes at a time, finding first differing byte position using movemask/trailing_ones pattern. Includes comprehensive tests for short keys, long keys, and edge cases.
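The movemask/trailing_ones pattern can be sketched for the SSE2 case as below. The function names `first_diff` and `compare_keys` are illustrative, the NEON path is left out, and non-x86_64 targets fall back to a scalar scan.

```rust
use std::cmp::Ordering;

/// Index of the first byte where `a` and `b` differ, or the shorter length.
#[cfg(target_arch = "x86_64")]
fn first_diff(a: &[u8], b: &[u8]) -> usize {
    use std::arch::x86_64::{__m128i, _mm_cmpeq_epi8, _mm_loadu_si128, _mm_movemask_epi8};
    let n = a.len().min(b.len());
    let mut i = 0;
    while i + 16 <= n {
        // SAFETY: i + 16 <= n keeps both unaligned loads in bounds,
        // and SSE2 is part of the x86_64 baseline.
        let mask = unsafe {
            let va = _mm_loadu_si128(a.as_ptr().add(i) as *const __m128i);
            let vb = _mm_loadu_si128(b.as_ptr().add(i) as *const __m128i);
            // One bit per byte lane: 1 where the bytes are equal.
            _mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)) as u32
        };
        if mask != 0xFFFF {
            // Leading run of equal lanes = offset of the first mismatch.
            return i + mask.trailing_ones() as usize;
        }
        i += 16;
    }
    // Scalar tail for the last < 16 bytes (and for short keys).
    while i < n && a[i] == b[i] {
        i += 1;
    }
    i
}

/// Scalar fallback for other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn first_diff(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b).take_while(|(x, y)| x == y).count()
}

/// Lexicographic byte comparison, shorter key first on a shared prefix.
pub fn compare_keys(a: &[u8], b: &[u8]) -> Ordering {
    let i = first_diff(a, b);
    if i < a.len().min(b.len()) {
        a[i].cmp(&b[i])
    } else {
        a.len().cmp(&b.len())
    }
}
```

The result should always agree with the standard slice ordering (`a.cmp(b)`), which makes that a convenient oracle for property tests.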

Move SIMD key comparison from pending to implemented. Total: 14 implemented, 10 pending optimizations.

Implement bump-pointer arena allocator to reduce heap allocation overhead:
- Allocates from pre-allocated chunks (default 64KB)
- 8-byte alignment for all allocations
- Grows automatically with new chunks when needed
- reset() allows memory reuse without deallocation
- alloc_zeroed() for zero-initialized memory
- alloc_with<T>() for typed allocations with proper alignment

Includes comprehensive tests for basic allocation, growth, and reset.
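For readers unfamiliar with the technique, a safe-Rust sketch of a chunked bump allocator is shown below. It is a simplification under stated assumptions: it aligns offsets within a chunk rather than raw addresses, it omits `alloc_with<T>()`, and the accessor names `chunk_count` and `used_in_current` are invented for the example.

```rust
/// Simplified bump arena backed by a list of byte chunks.
pub struct Arena {
    chunks: Vec<Vec<u8>>,
    chunk_size: usize,
    current: usize, // index of the chunk being bumped
    offset: usize,  // next free byte in the current chunk
}

impl Arena {
    pub fn new(chunk_size: usize) -> Self {
        Arena { chunks: vec![vec![0u8; chunk_size]], chunk_size, current: 0, offset: 0 }
    }

    /// Hand out `size` zeroed bytes, starting at an 8-byte-aligned offset.
    pub fn alloc_zeroed(&mut self, size: usize) -> &mut [u8] {
        let mut start = (self.offset + 7) & !7;
        // Advance to a chunk with enough room, growing when none is left.
        while start + size > self.chunks[self.current].len() {
            self.current += 1;
            start = 0;
            if self.current == self.chunks.len() {
                self.chunks.push(vec![0u8; self.chunk_size.max(size)]);
            }
        }
        self.offset = start + size;
        let buf = &mut self.chunks[self.current][start..start + size];
        buf.fill(0); // chunks may hold stale data after reset()
        buf
    }

    /// Make all memory reusable without freeing any chunk.
    pub fn reset(&mut self) {
        self.current = 0;
        self.offset = 0;
    }

    pub fn chunk_count(&self) -> usize {
        self.chunks.len()
    }

    pub fn used_in_current(&self) -> usize {
        self.offset
    }
}
```

Each allocation is a pointer bump plus a bounds check, and `reset()` is two integer stores, which is where the savings over per-object heap allocation come from.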

- Arena allocator moved from pending to implemented (Memory #9)
- Renumbered pending items 16-24
- Updated summary table: 15 implemented, 9 pending

🤖 Generated with [Claude Code](https://claude.com/claude-code)

The Error type is only used in tests, so mark the import with cfg(test) to avoid the unused import warning.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Added prefetch_range calls when positioning on leaf pages:
- first(): prefetch leaf for forward iteration
- last(): prefetch leaf for backward iteration
- descend_left(): prefetch when moving to next leaf
- descend_right(): prefetch when moving to previous leaf

This warms the CPU cache with the first 2KB of leaf page data to improve iteration performance.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Added is_empty flag to commit_txn to detect transactions with no changes:
- No dirty pages
- No new pages allocated
- No freed pages

This infrastructure is reserved for future optimization when proper WRITE_MAP mode with async flush is implemented. Currently all transactions still persist to maintain durability guarantees.

Also added has_freed_pages() helper to PageAllocator.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Updated current performance status:
- Sequential Writes: ~455ms vs LMDB ~404ms (89%)
- Point Lookup: ~150ns vs LMDB ~130ns (87%)
- Read Transaction: ~14ns vs LMDB ~37ns (264% - faster than LMDB!)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Use get_page_buffer() for the meta page serialization buffer instead of allocating a new Vec each commit. The buffer is returned to the pool after the write completes.

This eliminates one 4KB allocation per transaction commit.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Removed the buf.fill(0) call when returning buffers to the pool. Pages are always fully overwritten before being written to disk:
- Meta pages: MetaPage::write_to() writes the entire page
- Data pages: copied in full via copy_from_slice()

This eliminates a 4KB memset per page returned to the pool, providing ~3.5% write performance improvement.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Updated current performance status:
- Sequential Writes: ~436ms vs LMDB ~372ms (85%)
- Improvement from meta buffer pooling and skip zeroing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Add Heed-compatible typed database API with support for multiple named databases within a single environment. Each named database gets a unique DBI and is tracked in an in-memory registry that manages root page updates.

Add comprehensive Heed-compatible API for drop-in replacement in Meilisearch:

TLS Mode Support:
- WithTls/WithoutTls markers for transaction Send-ability
- read_txn_with_tls() and read_txn_without_tls() methods
- static_read_txn() for 'static lifetime transactions that own the Env
- Generic Env<T: TlsUsage> and RoTxn<'e, T> types

New Types (Meilisearch required):
- DecodeIgnore for skipping value decoding
- U16<O>, U32BE<O>, U64BE<O>, U128<O>, I128<O> endian-aware integers
- BEU16, BEU32, BEU64, BEU128, BEI128 big-endian type aliases
- SerdeJson<T> and SerdeBincode<T> (serde feature)
- MdbError type alias for Error compatibility

Re-exports:
- byteorder crate types (BigEndian, LittleEndian, NativeEndian, etc.)

- Add U8 type codec for 8-bit unsigned integers
- Add BoxedError type alias for codec operations
- Export U8 and BoxedError in public API

Note: RoTxn::commit() already exists in the codebase.

Add ignored tests to verify large value handling:
- test_2gib_value: Attempts to insert a 2GiB value
- test_max_value_size: Finds the maximum working value size

Results show max value size is ~8-16KB due to missing overflow page support. Values exceeding page capacity return PageFull error.

Run with: cargo test test_2gib_value -- --ignored

Add LMDB-compatible overflow page support allowing values larger than ~2KB (for 4KB pages) to be stored in overflow pages.

Changes:
- Add node_max() calculation matching LMDB's me_nodemax formula
- Add overflow_pages() function using LMDB's OVPAGES formula
- Implement write_overflow_pages() to allocate and write overflow data
- Implement read_overflow_pages() to read values from overflow pages
- Update put/get/first/last to detect and handle overflow nodes
- Preserve overflow nodes when copying entries during page splits

This enables storing values up to 2GiB+ (tested and verified).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
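The two formulas can be sketched as follows, based on how they appear in LMDB's mdb.c as best I recall it. The constants are assumptions: a 16-byte page header (LMDB's PAGEHDRSZ), a minimum of 2 keys per page (MDB_MINKEYS), and 2-byte index slots. ZeroDB's actual values may differ.

```rust
// Assumed to match LMDB; not confirmed against the ZeroDB source.
const PAGE_HEADER_SIZE: usize = 16; // LMDB PAGEHDRSZ
const MIN_KEYS: usize = 2;          // LMDB MDB_MINKEYS
const INDEX_SLOT_SIZE: usize = 2;   // size of one node pointer

/// Largest node that may live inside a page, after LMDB's me_nodemax:
/// half the usable page, rounded down to even, minus one pointer slot.
pub fn node_max(page_size: usize) -> usize {
    (((page_size - PAGE_HEADER_SIZE) / MIN_KEYS) & !1) - INDEX_SLOT_SIZE
}

/// Number of contiguous overflow pages for a value, after LMDB's OVPAGES.
/// Only the first overflow page carries a header.
pub fn overflow_pages(value_len: usize, page_size: usize) -> usize {
    (PAGE_HEADER_SIZE - 1 + value_len) / page_size + 1
}
```

Under these assumptions a 4KB page gives a node limit of 2038 bytes, which lines up with the "~2KB" figure in the commit message.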

- Fix unit_cmp error in types.rs test
- Remove unused imports across codebase
- Add #[allow(dead_code)] for code planned for future use
- Add #[allow(clippy::type_complexity)] for complex return types
- Add #[allow(clippy::large_enum_variant)] for transaction enums
- Fix collapsible_if patterns
- Apply cargo fmt formatting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

- Added 'heed/' to .gitignore to exclude the Heed subproject from version control.
- Deleted ACTION_PLAN.md and OPTIMIZATIONS.md as they are no longer needed.
- Removed the Heed subproject reference from the repository.

This cleanup helps streamline the project structure and maintain focus on the core implementation.

- Create comprehensive README.md with usage examples, API docs, and benchmarks
- Add CONTRIBUTING.md with development guidelines and code style
- Fix lib.rs doctests to be runnable (add tempdir, max_dbs, proper syntax)
- Add iteration and safety documentation to lib.rs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

- Add fast path for in-place insertion when page has enough space
  - Directly manipulates page buffer without rebuilding
  - Shifts only node pointers (2 bytes), not node data
  - Avoids O(n) key/value copying to intermediate Vec
- Optimize rebuild path for splits
  - Calculate split requirement upfront
  - Single-pass page building instead of try-then-rebuild
  - Direct iteration over source page nodes
- Add free_space() method to LeafPage
- Add profiling examples (profile_writes, profile_reads, bench_insert)
- Enable debug symbols in release profile for profiling

Performance improvement:
- insert_into_leaf CPU time: -89% (50.6% -> 5.5%)
- insert_into_tree CPU time: -48% (69.9% -> 36.3%)
- Bottleneck shifted from CPU to I/O (fsync)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
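To make the fast path concrete, here is a toy slotted page that inserts in place by shifting only the 2-byte pointer array. The layout (pointers at the front, node data packed from the end, a 4-byte node header holding key and value lengths) is an assumption for illustration and is not the PR's on-disk format. Duplicate keys and splits are not handled.

```rust
/// Toy leaf page: [u16 pointers ... free space ... node data].
pub struct LeafPage {
    buf: Vec<u8>,
    num_keys: usize,
    upper: usize, // start of the node data area
}

impl LeafPage {
    /// `page_size` must be at most 65536 so offsets fit in a u16.
    pub fn new(page_size: usize) -> Self {
        LeafPage { buf: vec![0; page_size], num_keys: 0, upper: page_size }
    }

    /// Bytes between the end of the pointer array and the node data.
    pub fn free_space(&self) -> usize {
        self.upper - self.num_keys * 2
    }

    fn ptr(&self, i: usize) -> usize {
        u16::from_le_bytes([self.buf[2 * i], self.buf[2 * i + 1]]) as usize
    }

    pub fn key(&self, i: usize) -> &[u8] {
        let off = self.ptr(i);
        let klen = u16::from_le_bytes([self.buf[off], self.buf[off + 1]]) as usize;
        &self.buf[off + 4..off + 4 + klen]
    }

    /// In-place insert; returns false when the caller would have to split.
    pub fn insert(&mut self, key: &[u8], value: &[u8]) -> bool {
        let node_size = 4 + key.len() + value.len();
        if node_size + 2 > self.free_space() {
            return false;
        }
        // Binary search for the slot that keeps pointers sorted by key.
        let (mut lo, mut hi) = (0, self.num_keys);
        while lo < hi {
            let mid = (lo + hi) / 2;
            if self.key(mid) < key { lo = mid + 1 } else { hi = mid }
        }
        // Write the node once, at the top of the free area.
        self.upper -= node_size;
        let off = self.upper;
        self.buf[off..off + 2].copy_from_slice(&(key.len() as u16).to_le_bytes());
        self.buf[off + 2..off + 4].copy_from_slice(&(value.len() as u16).to_le_bytes());
        self.buf[off + 4..off + 4 + key.len()].copy_from_slice(key);
        self.buf[off + 4 + key.len()..off + node_size].copy_from_slice(value);
        // Shift only the 2-byte pointers after the slot; node data stays put.
        self.buf.copy_within(2 * lo..2 * self.num_keys, 2 * lo + 2);
        self.buf[2 * lo..2 * lo + 2].copy_from_slice(&(off as u16).to_le_bytes());
        self.num_keys += 1;
        true
    }
}
```

The cost of an insert is one node write plus a `copy_within` over at most `2 * num_keys` bytes, instead of copying every key and value into a rebuilt page.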

- Add thread-local page buffer pool to avoid repeated allocations
- PageBuilder::new_leaf and new_branch now reuse buffers from pool
- Buffers are zeroed before reuse for safety
- Pool limited to 8 buffers per thread to avoid memory bloat
- Export return_page_buffer/return_page_buffers for callers to recycle

This reduces allocation pressure during page splits and rebuilds, though the impact is minimal since insertion is now I/O bound.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
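A thread-local pool along these lines could be sketched as below. `get_page_buffer` and `return_page_buffer` are names mentioned in the commit messages, but their signatures here, the fixed 4096-byte page size, and the `pooled_count` helper are assumptions.

```rust
use std::cell::RefCell;

const PAGE_SIZE: usize = 4096; // assumed page size
const MAX_POOLED: usize = 8;   // cap from the commit message

thread_local! {
    // One pool per thread, so no locking is needed.
    static POOL: RefCell<Vec<Vec<u8>>> = RefCell::new(Vec::new());
}

/// Take a zeroed page buffer from the pool, allocating if it is empty.
pub fn get_page_buffer() -> Vec<u8> {
    match POOL.with(|p| p.borrow_mut().pop()) {
        Some(mut buf) => {
            buf.fill(0); // zero before reuse, as described above
            buf
        }
        None => vec![0u8; PAGE_SIZE],
    }
}

/// Give a buffer back; it is dropped if the pool is full or the size is off.
pub fn return_page_buffer(buf: Vec<u8>) {
    POOL.with(|p| {
        let mut pool = p.borrow_mut();
        if buf.len() == PAGE_SIZE && pool.len() < MAX_POOLED {
            pool.push(buf);
        }
    });
}

/// Number of buffers currently pooled on this thread (for inspection).
pub fn pooled_count() -> usize {
    POOL.with(|p| p.borrow().len())
}
```

Capping the pool bounds the idle memory at 8 pages per thread in this sketch, at the cost of occasional fresh allocations during large splits.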

- Ignore *.svg flamegraph files
- Ignore *.trace profiling traces
- Ignore perf.data files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

qdequele added 30 commits November 23, 2025 14:37

Initial commit for v2: Rust project kickoff

66f66d8

feat: add core error types, flags, and type definitions

9b4d630


feat: add page structures and memory mapping

5140da8


feat: add page allocator and buffer pool

d142d9e


feat: update lib.rs exports and add dependencies

4b50142


test: add integration and correctness tests

b9649e5


bench: add comparison benchmarks vs LMDB and RocksDB

07ecb97


docs: add architecture documentation and heed compatibility layer

120fba8


docs: add optimization tracking document

83029db


docs: update optimization tracking with implemented features

b1bb45c

docs: update optimization tracking summary table

1abce44


docs: update optimization tracking with SIMD key comparison

c2172d8


docs: update optimization tracking for arena allocator

98c724d


fix: move Error import to cfg(test) in cursor module

e4684d6


docs: update performance benchmarks after buffer optimizations

9fb27d1


feat: implement named database support

d9c80b5


qdequele added 10 commits December 1, 2025 00:00

feat: add remaining Heed API compatibility types

60b0dc5


chore: add profiling artifacts to .gitignore

b3347a5


qdequele wants to merge 40 commits into main from v2