Bound fluxnode memory growth: file-backed block index, header pruning, jemalloc (depends on #284) #286
Open
MorningLightMountain713 wants to merge 76 commits into
Vendors the ECVRF-SECP256K1-SHA256-TAI (CFRG VRF draft-05, suite 0xFE) module
from aergo/secp256k1-vrf (MIT) into the bundled libsecp256k1 as an optional
module (--enable-module-vrf, enabled in the root build), and adds src/crypto/
ecvrf.{h,cpp} as the C++ boundary (ECVRF_Prove/ECVRF_Verify over CKey/CPubKey).
This is the cryptographic primitive for the PON VRF leader-election fix that
closes the leader-election grinding vulnerability: block eligibility becomes
y = VRF(operator_sk, epoch_seed) <= target, which the proposer cannot grind.
Verified: builds under Flux's exact secp256k1 flags (--with-bignum=no) and
reproduces the published draft-05 test vector byte-for-byte (prove/verify/
proof_to_hash); cross-checked against Witnet vrf-rs and an independent Python
reference. Constant-time audit of secret paths still pending before activation.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replaces the grindable PON eligibility lottery (GetPONHash over the proposer-
chosen prevBlockHash) with VRF-based eligibility, gated by UPGRADE_PON_VRF:
eligible(C) <=> y <= target, y = VRF(operator_key, epoch_seed)
y is unforgeable (operator secret key) and seeded by a buried block window the
proposer did not author (GetEpochSeed), so a producer can no longer shape its
own block to win the next lottery. Builds on the ECVRF primitive + ecvrf C++
boundary added in the previous commit.
- block.h: PON_VRF_VERSION=101; nodesVrfOutput + nodesVrfProof header fields,
committed under SER_GETHASH (covered by the operator signature).
- consensus/params.h, upgrades.cpp, chainparams.cpp: UPGRADE_PON_VRF
(NO_ACTIVATION_HEIGHT on all networks for now).
- pon-fork: IsPONVRFActive().
- pon.cpp: GetEpochSeed (buried-window accumulator); VRF eligibility in
CheckPONBlockHeader; proof verification (recomputed beta == nodesVrfOutput)
in ContextualCheckPONBlockHeader.
- pon-minter.cpp: compute the VRF proof with the operator key; coordinate via a
self-computable priority (lower y => shorter delay) since other nodes' VRF
outputs are unknowable; set the header fields before signing.
Pre-activation blocks use the legacy GetPONHash path unchanged. fluxd builds.
NOT yet exercised on a regtest/testnet fork; coordination/liveness and the
constant-time audit are pending (see pon-vrf/REVIEW.md).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
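The eligibility rule `y <= target` from the commit above can be sketched as follows. This is only an illustration: HMAC-SHA256 stands in for the ECVRF output (it is keyed and deterministic, but it is not a VRF because it carries no publicly verifiable proof), and the function names and the big-endian reading of `y` are assumptions of this sketch, not taken from pon.cpp.

```python
import hashlib
import hmac


def stand_in_vrf(operator_key: bytes, epoch_seed: bytes) -> bytes:
    """Placeholder for the VRF output (assumption: HMAC-SHA256, NOT a real VRF)."""
    return hmac.new(operator_key, epoch_seed, hashlib.sha256).digest()


def is_eligible(y: bytes, target: int) -> bool:
    """Eligibility rule from the commit message: y <= target.

    Reading y as a big-endian integer is an assumption of this sketch.
    """
    return int.from_bytes(y, "big") <= target


key = b"\x01" * 32
seed = hashlib.sha256(b"buried block window").digest()
y = stand_in_vrf(key, seed)
y_int = int.from_bytes(y, "big")
max_target = (1 << 256) - 1
```

Because `y` depends only on the key and the seed, nothing the proposer puts into its own block changes the draw, which is the property the commit relies on.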
Makes competing same-height VRF blocks resolve deterministically so the network converges instead of forking back and forth:
- CBlockIndexWorkComparator (main.cpp): for PON_VRF blocks at equal work/height, break ties by lowest nodesVrfOutput. The VRF output is un-grindable, so unlike GetPONHash (depends on proposer-chosen nTime, grindable to win ties) an attacker cannot bias which competitor wins. Legacy PON blocks keep the GetPONHash tie-break (mixed-version forks around activation).
- block.h: commit only the VRF output to the block hash; exclude the proof (like the signature): the proof is self-validating against the committed output.
- chain.h / txdb.cpp: store nodesVrfOutput in CBlockIndex + CDiskBlockIndex so the comparator can read it and GetBlockHash() recomputes correctly across restarts.
The minting-delay coordination (previous commit) is now only orphan reduction; convergence/safety rests on this deterministic comparator.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gence
Extracts the PON fork-choice tie-break from the anonymous-namespace comparator in main.cpp into a public, testable function ComparePonForkChoice (pon.cpp). The comparator now delegates to it, so the tests exercise the real deployed logic.
Adds gtests (test_pon.cpp) verifying the convergence guarantee:
- lowest VRF output is preferred (deterministic winner among competitors),
- antisymmetric (swap args -> sign flips: all nodes agree),
- deterministic (same inputs -> same result),
- equal outputs -> undecided (fall back to first-seen).
All 22 PONTest cases pass (flux-gtest).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
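The four properties the gtests check can be modelled with a small comparison function. This is a sketch, not the real ComparePonForkChoice: the name `compare_fork_choice`, the -1/0/1 sign convention, and the big-endian integer ordering of the outputs are all assumptions.

```python
def compare_fork_choice(vrf_a: bytes, vrf_b: bytes) -> int:
    """Return -1 if block A is preferred, 1 if block B is preferred, 0 if undecided.

    Lowest VRF output wins; equal outputs leave the decision to first-seen.
    """
    a = int.from_bytes(vrf_a, "big")
    b = int.from_bytes(vrf_b, "big")
    if a == b:
        return 0
    return -1 if a < b else 1


low = bytes([0x00] * 31 + [0x01])
high = bytes([0xFF] * 32)
```

Since the result depends only on the two committed outputs, every node that sees both competitors reaches the same verdict regardless of arrival order.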
Adds gtests (test_pon.cpp) exercising the VRF block lifecycle with unbypassed crypto:
- VrfBlockHeaderSerializationRoundTrip: PON_VRF header serializes/deserializes intact and the hash is stable.
- VrfOutputCommittedProofExcludedFromHash: changing the proof does not change the block hash (excluded, like the signature) while changing the VRF output does (committed). This pins the design that lets CBlockIndex store only the 32-byte output.
- EcvrfProveVerifyRoundTrip: real ECVRF_Prove -> ECVRF_Verify round trip (the same crypto ContextualCheckPONBlockHeader runs); tampered proof, wrong key, and wrong seed are all rejected; proving is deterministic (RFC 6979).
25 PONTest cases pass (flux-gtest).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
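The "output committed, proof excluded" design can be illustrated with a toy header. The field set, the `ToyHeader` name, and the double-SHA256 over a plain concatenation are assumptions for illustration; the real SER_GETHASH serialization covers more fields.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyHeader:
    prev_hash: bytes
    vrf_output: bytes  # committed: part of the hash preimage
    vrf_proof: bytes   # excluded: never fed to the hash

    def block_hash(self) -> bytes:
        # Only committed fields are hashed; the proof is deliberately left out.
        preimage = self.prev_hash + self.vrf_output
        return hashlib.sha256(hashlib.sha256(preimage).digest()).digest()


base = ToyHeader(prev_hash=b"\x11" * 32, vrf_output=b"\x22" * 32, vrf_proof=b"\x33" * 81)
other_proof = replace(base, vrf_proof=b"\x44" * 81)
other_output = replace(base, vrf_output=b"\x55" * 32)
```

Here `other_proof` hashes identically to `base` while `other_output` does not, which mirrors what VrfOutputCommittedProofExcludedFromHash asserts.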
…roduced
Live regtest testing revealed CreateNewBlock assembled a v100 PON block and ran TestBlockValidity on it BEFORE the minter/generate set the VRF fields, so once PON-VRF is active, block production failed with 'bad-pon-...' (version below PON_VRF_VERSION). The header build + validity check were producing/validating a block that could never pass the VRF eligibility rules.
Fix: in CreateNewBlock, when PON-VRF is active, set nVersion = PON_VRF_VERSION and compute nodesVrfOutput/nodesVrfProof (via the operator key, or a deterministic placeholder when no key is configured, e.g. regtest generate) BEFORE TestBlockValidity. The minter and the regtest generate RPC now rely on this single authoritative path (generate's redundant post-assembly block removed).
Verified on regtest: 'generate' past the PON-VRF activation height produces v101 blocks that pass validation (v100 before activation, v101 after).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…support
Three consensus/relay fixes (found via live testnet block production), all the same
root cause: the committed VRF output must be handled everywhere a block header is
built, hashed, validated, or relayed.
1. Per-slot VRF eligibility (pon.{h,cpp}, pon-minter.cpp, miner.cpp): the VRF input
is now H(epoch_seed || slot) (GetPonVrfMessage) instead of just epoch_seed. Without
the slot, a node's eligibility was constant for an entire epoch (eligible every slot
or none) — no leader rotation, broken liveness. The slot carries only the minor,
already-acknowledged ~10-slot future-time grind; the large prevBlockHash/coinbase
grind remains eliminated. Minter and ContextualCheckPONBlockHeader use it consistently.
2. CheckBlockHeader (main.cpp): for PON-VRF blocks, check the committed VRF output
(nodesVrfOutput) against target, not the legacy GetPONHash. The legacy value is
meaningless for VRF blocks and rejected ~half of valid v101 blocks as 'high-hash'.
3. CCompactBlockHeader (block.h): serialize the VRF output/proof for PON-VRF blocks.
It was omitted, so a peer decoded a v101 compact header with a null VRF output,
recomputed the wrong block hash, and rejected the chain as 'non-continuous
cmpheaders sequence' — breaking header sync between nodes.
Verified on a local testnet: a confirmed fluxnode mints v101 VRF blocks with clean
production (0 high-hash) and a second node syncs the VRF chain (0 non-continuous).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Same root cause as the CheckBlockHeader fix: ReadBlockFromDisk re-validated every PON block against the legacy GetPONHash, so ~half of v101 blocks failed on disk-read with 'Errors in block header' — crashing the node shortly after it minted a VRF block. For PON-VRF blocks, check the committed nodesVrfOutput against target instead. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sixth and final instance of the same root cause: the on-disk block-index verification (LoadBlockIndex) re-checked every PON block against the legacy GetPONHash, so ~half of stored v101 blocks failed on startup with 'Error loading block database', preventing a node from restarting once it had synced/minted VRF blocks. Use the committed nodesVrfOutput for PON-VRF blocks. All header-eligibility check sites now agree: CheckPONBlockHeader, CheckBlockHeader, ReadBlockFromDisk, LoadBlockIndex. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sets the testnet PON_VRF upgrade to a placeholder height (9999999) so the activation
switch is staged in one obvious place. This is NOT a real schedule.
ACTION REQUIRED before tagging a testnet release:
- Replace 9999999 with a concrete testnet height comfortably above the current tip,
giving the fleet time to upgrade first.
Mainnet and regtest remain NO_ACTIVATION_HEIGHT (inert) and are unchanged here.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PON-VRF changes the wire serialization: v101 block headers carry the VRF output (committed) + proof, and the cmpheaders compact-header format carries them too. That must be a distinct protocol version so VRF-capable nodes are distinguishable from prior 170021 (compact-headers) nodes and can be gated at activation.
- PROTOCOL_VERSION: 170021 -> 170022 (VRF-capable nodes advertise this)
- UPGRADE_PON_VRF.nProtocolVersion: 170020 -> 170022 (all networks) so peers below 170022 are rejected once PON_VRF activates, guaranteeing all connected peers speak the VRF wire format.
UPGRADE_PON stays 170020 (unchanged).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
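The version gate described above amounts to a minimum-version check that tightens at activation. A minimal sketch, assuming UPGRADE_PON is already active before PON_VRF; the function names are hypothetical and the real peer-rejection logic lives in fluxd's upgrade machinery:

```python
PON_PROTOCOL_VERSION = 170020      # UPGRADE_PON (unchanged)
PON_VRF_PROTOCOL_VERSION = 170022  # UPGRADE_PON_VRF after this commit


def min_peer_version(pon_vrf_active: bool) -> int:
    """Lowest protocol version a connected peer may advertise."""
    return PON_VRF_PROTOCOL_VERSION if pon_vrf_active else PON_PROTOCOL_VERSION


def accept_peer(peer_version: int, pon_vrf_active: bool) -> bool:
    return peer_version >= min_peer_version(pon_vrf_active)
```

A 170021 (compact-headers) peer is accepted before activation and rejected after it, so no connected peer can be sent a header format it cannot decode.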
…under shared operator keys
The VRF message was H(epoch_seed || slot), keyed by the operator key, but operator keys are shared across an owner's fleet in practice (review finding). With the key alone, N same-key nodes compute the identical VRF output, which:
1. Collapses N lottery draws into one, shrinking the fleet's share of block production N-fold. Minting pays the dev fund (not the minter), so this is a leadership/liveness distortion (block production silently concentrates in uniquely-keyed operators) rather than lost operator revenue.
2. On a win, makes all N nodes eligible with the same VRF-derived priority delay, so they broadcast competing blocks simultaneously (broadcast storm).
3. Voids the lowest-VRF fork-choice tie-break: the outputs are identical, so convergence degrades to first-seen on every such win.
The message is now H(epoch_seed || slot || collateral). The collateral outpoint is the canonical per-node identity and is already committed in the header (nodesCollateral) and already used by the verifier to look up the operator pubkey, so verification needs no new wire data. The outpoint is fixed at node registration, before any future epoch seed exists, so it adds no grinding surface beyond the known key-grinding residual.
Adds gtest VrfMessagePerNodeUnderSharedOperatorKey: distinct collaterals yield distinct messages and independent verifiable outputs under one shared key, and a proof for node A does not verify as node B. 26 PONTest cases pass.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
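The per-node message H(epoch_seed || slot || collateral) can be sketched as below. The choice of SHA-256 for H, the field widths, and the little-endian encodings are assumptions; GetPonVrfMessage may serialize differently.

```python
import hashlib


def vrf_message(epoch_seed: bytes, slot: int, collateral_txid: bytes, collateral_vout: int) -> bytes:
    """Build the VRF input for one node in one slot (encoding is hypothetical)."""
    preimage = (
        epoch_seed
        + slot.to_bytes(8, "little")
        + collateral_txid
        + collateral_vout.to_bytes(4, "little")
    )
    return hashlib.sha256(preimage).digest()


seed = b"\xaa" * 32
node_a = (b"\x01" * 32, 0)  # collateral outpoint of node A
node_b = (b"\x02" * 32, 0)  # collateral outpoint of node B, same operator key
msg_a = vrf_message(seed, 1000, *node_a)
msg_b = vrf_message(seed, 1000, *node_b)
```

Two nodes that share an operator key now sign different messages, so their VRF outputs, priority delays, and tie-break scores are independent.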
LogPONEligibility predicts per-node eligibility with the legacy GetPONHash formula, which is dead once VRF leader election activates — and under VRF other nodes' eligibility cannot be computed at all (each draw needs that node's secret key). Anything it printed post-activation would be actively misleading to operators debugging minting from logs. Log-only change, gated on the same IsPONVRFActive height check as the consensus paths: no behavior change before activation on any network. Also skips a full confirmed-fluxnode-cache iteration per connected tip after activation. 26 PONTest cases pass. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Reverts 85d252e to make way for the incremental PersistToDisk approach with full crash recovery. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…odes
Fluxnode cache dirty entries were accumulated for up to 24 hours and flushed in a single operation holding cs_main. On memory-constrained CUMULUS nodes the fluxnode maps get swapped out, causing thousands of page faults during the flush, stalling RPC for minutes and triggering watchdog kills.
Two changes:
1. Incremental fluxnode persistence: new PersistToDisk() writes dirty entries to LevelDB every 10 blocks using a batched write. Only holds the fluxnode lock, not cs_main. DumpFluxnodeCache is removed from the periodic flush path (kept only for shutdown via fForce).
2. Block index solution pruning: on fluxnodes, clear equihash solutions from PoW block index entries after load. Serialization paths (RPC, REST, P2P getheaders) fall back to reading from block files.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
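The incremental persistence in change 1 can be modelled as a dirty set that is drained in one batch every 10 blocks. This is a sketch with a dict standing in for LevelDB; the class and method names are hypothetical and locking is omitted.

```python
class ToyFluxnodeCache:
    PERSIST_INTERVAL = 10  # blocks, per the commit message

    def __init__(self):
        self.entries = {}
        self.dirty = set()
        self.db = {}  # stand-in for the LevelDB store
        self.batches_written = 0

    def update(self, outpoint, state):
        self.entries[outpoint] = state
        self.dirty.add(outpoint)

    def persist_to_disk(self):
        """Write every dirty entry in a single batch; return how many were written."""
        if not self.dirty:
            return 0
        batch = {key: self.entries[key] for key in self.dirty}
        self.db.update(batch)  # one batched write instead of a 24-hour backlog
        self.batches_written += 1
        self.dirty.clear()
        return len(batch)

    def on_block_connected(self, height):
        if height % self.PERSIST_INTERVAL == 0:
            return self.persist_to_disk()
        return 0


cache = ToyFluxnodeCache()
cache.update("txid_a:0", "CONFIRMED")
```

Each flush touches only the entries changed in the last few blocks, so the working set stays small even when the full maps have been swapped out.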
The block index solution pruning needs fFluxnode to be set before LoadBlockIndexDB runs. Move the assignment from after genesis wait to before LoadBlockIndex in the init sequence. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move nNonce, nSolution, nodesCollateral, and vchBlockSig out of CBlockIndex into a separately-allocated HeaderData struct. On fluxnodes, this allocation is freed for all block index entries after load, saving ~116 bytes per entry across 2.5M blocks (~290 MB). hashMerkleRoot and hashFinalSaplingRoot remain in the core struct as they are accessed during consensus validation and reorgs. All serialization paths (RPC, REST, P2P getheaders) fall back to reading from block files when HeaderData is null. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move hashSproutAnchor, hashFinalSproutRoot, nSproutValue, nChainSproutValue, nSaplingValue, nChainSaplingValue, and nCachedBranchId into CBlockIndex::HeaderData alongside the proof fields. On fluxnodes, HeaderData is freed for blocks deeper than 100, saving ~244 bytes per entry across 2.5M blocks (~610 MB). The 100-block retention depth provides margin beyond the max reorg depth of 40 blocks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Missed access sites for hashFinalSproutRoot, nChainSproutValue, nSproutValue, nChainSaplingValue, nSaplingValue in getblock and getblockchaininfo RPCs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The pruning was inside LoadBlockIndexDB which runs before block rewinding. Rewinding accesses HeaderData fields on blocks that had already been pruned, causing a crash loop. Move to init.cpp after ActivateBestChain completes so all chain operations are done before we free the data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jemalloc is significantly better than glibc malloc at returning freed memory to the OS and avoiding fragmentation. This is critical for the HeaderData split where 2.5M allocations are freed after load — glibc retains the freed pages in its arena, jemalloc returns them. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These fields are only accessed during reorgs (DisconnectBlock) and wallet merkle verification — both limited to recent blocks where HeaderData is retained. Moving them saves 64 bytes per entry across 2.5M blocks (~160 MB) on fluxnodes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
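The savings quoted in the HeaderData commits follow from bytes per entry times the number of block index entries, in decimal megabytes. A quick check, taking the 2.5M entry count from the commit messages:

```python
ENTRIES = 2_500_000  # block index entries, as quoted in the commits


def saved_mb(bytes_per_entry: int, entries: int = ENTRIES) -> float:
    """Memory saved in decimal megabytes (1 MB = 1,000,000 bytes)."""
    return bytes_per_entry * entries / 1_000_000


proof_fields_mb = saved_mb(116)     # nNonce, nSolution, nodesCollateral, vchBlockSig
extended_fields_mb = saved_mb(244)  # after the Sprout/Sapling value fields are moved as well
this_commit_mb = saved_mb(64)       # the fields moved in this commit
```

The three results are 290, 610, and 160 MB, matching the figures in the messages. They should not be summed blindly, since the 244-byte figure appears to be cumulative with the earlier 116 bytes.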
…Data split
Replace individual new/delete of CBlockIndex with a contiguous mmap-backed pool allocator. Benefits:
- Zero per-element malloc overhead across 2.5M+ entries
- Contiguous layout: old blocks at low addresses, recent at high
- MADV_COLD hint tells the kernel which pages are cold after sync
- OS manages working set: pages out old blocks under memory pressure
- No heap fragmentation from millions of small allocations
This reverts the HeaderData struct split (commits 1798652..7ed2691) which added null checks on every access site and caused build/crash issues. The mmap pool achieves the same goal (OS-managed memory for old blocks) without changing the CBlockIndex field layout.
Also:
- Reserve mapBlockIndex capacity before loading to avoid rehashes
- Store block hashes in parallel pool array instead of relying on pointer-into-map-key (safer, pool-controlled memory)
- Graceful fallback to heap allocation if mmap fails
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
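The pool idea (fixed-size slots handed out sequentially from one anonymous mapping, with a fallback signal when the mapping is unavailable or full) can be sketched with Python's `mmap` module. The `SlotPool` class is hypothetical; the MADV_COLD hint and the C++ placement-new details are left out.

```python
import mmap


class SlotPool:
    """Bump allocator over one anonymous mapping; slots are never freed individually."""

    def __init__(self, slot_size: int, capacity: int):
        self.slot_size = slot_size
        self.capacity = capacity
        self.used = 0
        try:
            self.buf = mmap.mmap(-1, slot_size * capacity)  # anonymous, zero-filled
        except (OSError, ValueError):
            self.buf = None  # caller is expected to fall back to heap allocation

    def allocate(self):
        """Return the next slot index, or None when the caller must use the heap."""
        if self.buf is None or self.used == self.capacity:
            return None
        index = self.used
        self.used += 1
        return index

    def write(self, index: int, data: bytes) -> None:
        if len(data) != self.slot_size:
            raise ValueError("data must fill exactly one slot")
        start = index * self.slot_size
        self.buf[start:start + self.slot_size] = data

    def read(self, index: int) -> bytes:
        start = index * self.slot_size
        return self.buf[start:start + self.slot_size]


pool = SlotPool(slot_size=8, capacity=2)
first = pool.allocate()
second = pool.allocate()
pool.write(second, b"12345678")
```

Slots are assigned in load order, so older entries sit at lower offsets and newer ones at higher offsets, which is the layout the commit relies on when hinting cold pages to the kernel.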
The git checkout reverted chain.h but main.cpp retained pHeaderData references from the earlier HeaderData commits. Restore direct field access on CBlockIndex. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Combines the mmap pool allocator with the HeaderData struct separation. The pool provides contiguous allocation with OS-managed paging for old blocks. HeaderData provides deterministic memory reduction by freeing the extended fields for buried blocks on fluxnodes. Together: the pool eliminates malloc overhead and fragmentation, jemalloc returns freed HeaderData pages to the OS, and MADV_COLD hints the kernel about cold pool pages. CBlockIndex shrinks from 424 to ~112 bytes for pruned entries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use AC_CHECK_LIB to probe for jemalloc at configure time. Enabled by default, falls back gracefully with a warning if not found. Follows the same pattern as the existing Proton and ZMQ dependencies. Build with --disable-jemalloc to explicitly skip it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…very
CMainCleanup's destructor called delete on CBlockIndex pointers that were placement-new'd into mmap pool memory, not heap-allocated. This caused SEGV/SIGABRT in jemalloc/glibc during process exit. Fix by using DestroyAll to run destructors on pool entries, then freeing the pool itself.
Add RecoverFluxnodeCache startup check: compares the FluxnodeSyncState marker (written by PersistToDisk) to the chain tip. On mismatch (unclean shutdown, power cut, OOM kill), disconnects the stale blocks and lets ActivateBestChain reconnect them through normal ConnectBlock, rebuilding the fluxnode cache correctly. Cost: a few seconds to replay ≤10 blocks.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the sync state block is on a different fork (crash happened mid-reorg), walk back from the sync state block through pprev to find the common ancestor with the active chain, then disconnect to there. ActivateBestChain reconnects everything through normal ConnectBlock, rebuilding the cache. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…active chain
Fixes M1 (HIGH) from the memory-branch adversarial review; implements the design validated 2026-06-10 (MEM_REVIEW_FINDINGS.md §M1).
The periodic write path commits the fluxnode DB and its sync marker at the in-memory tip every 15-20 minutes, while the coins DB flushes ~24h apart. A crash in that window restarts with chainActive BEHIND the marker. Old recovery treated marker-ahead as success (common ancestor == tip -> nothing to disconnect -> return true), so ActivateBestChain replayed blocks onto a cache that already contained their effects: UPDATE_CONFIRM undo records rebuilt from the already-updated cache overwrote the correct on-disk records (permanent corruption), starts were re-inserted into the tracker, and the confirm path reset nLastPaidHeight (payout-order divergence). The fork case had the same hole: fork-side effects were never rewound.
Recovery now:
- factors the fluxnode portion of DisconnectBlock into DisconnectFluxnodeOnly (undo-record read -> AddBackUndoData -> CheckForUndoExpiredStartTx -> reverse-tx loop -> delegate push, in exactly that order; no coins/chainstate access);
- walks the MARKER's chain from the marker down to the common ancestor with the active chain, undoing each block's fluxnode effects into a fresh local cache flushed per block (per-block Flush is required: setAddToConfirmHeight semantics and AddBackUndoData's already-in-local guard both assume it, mirroring DisconnectTip);
- runs the existing chainstate disconnect loop unchanged (verified convergent on a cache that never applied those blocks);
- persists the repaired state with PersistToDisk(tip, fForce=true) (depends on the M7 fix) and clears the RPC list cache;
- hard-fails (was: return true with a corrupt cache) on missing common ancestor, unreadable block, or a rewind deeper than the 5040-block undo retention window (matching M12), telling the operator to -reindex.
init now aborts BEFORE ActivateBestChain on recovery failure - previously the error string was only checked afterwards, by which point ABC had already replayed onto the corrupt cache and overwritten good undo records. Crash mid-recovery is safe: phase 1 mutates only the in-memory cache (disk marker unchanged until the final atomic batch), so recovery re-runs identically on the next start. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
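The marker-chain walk can be sketched as a planning step that lists which blocks need their fluxnode effects undone and where the walk stops. The `BlockIndex` and `plan_fluxnode_rewind` names are hypothetical, the active chain is modelled as a height-to-entry dict, and the actual undo work (DisconnectFluxnodeOnly) is not modelled.

```python
UNDO_RETENTION_BLOCKS = 5040  # undo retention window named in the commit


class BlockIndex:
    def __init__(self, name, height, prev=None):
        self.name, self.height, self.prev = name, height, prev


def plan_fluxnode_rewind(marker, active_by_height):
    """Walk from the marker via prev until a block on the active chain is reached.

    Returns (blocks_to_undo, common_ancestor), marker first. Raises when recovery
    must hard-fail and the operator has to -reindex.
    """
    to_undo = []
    node = marker
    while node is not None and active_by_height.get(node.height) is not node:
        to_undo.append(node)
        if len(to_undo) > UNDO_RETENTION_BLOCKS:
            raise RuntimeError("rewind deeper than undo retention window")
        node = node.prev
    if node is None:
        raise RuntimeError("no common ancestor with the active chain")
    return to_undo, node


genesis = BlockIndex("g", 0)
a1 = BlockIndex("a1", 1, genesis)
a2 = BlockIndex("a2", 2, a1)
b2 = BlockIndex("b2", 2, a1)   # losing fork
b3 = BlockIndex("b3", 3, b2)
active = {0: genesis, 1: a1, 2: a2}
fork_plan = plan_fluxnode_rewind(b3, active)
ahead_plan = plan_fluxnode_rewind(a2, {0: genesis, 1: a1})
```

The same walk handles both failure shapes from the commit: a marker on a stale fork (`fork_plan`) and a marker ahead of a chainActive that restarted behind it (`ahead_plan`). A marker already on the active chain yields an empty undo list.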
Fixes M3 from the memory-branch adversarial review (MEM_REVIEW_FINDINGS.md): GetDepthInMainChainINTERNAL compared the tx's merkle branch against pHeaderData->hashMerkleRoot, substituting uint256() when the header data had been pruned. CheckMerkleBranch never returns zero for a real tx (nIndex == -1 already early-returns), so the check always failed for txs buried in pruned-header blocks: depth 0, and GetDepthInMainChain turned that into -1 — every confirmed wallet tx reported as CONFLICTED. fMerkleVerified is memory-only and vMerkleBranch is serialized, so this re-fired on every restart of a fluxnode with a non-empty wallet (the wallet is not disabled by -zelnode, only by -prune). Uses the same read-from-disk fallback as the rescan path (wallet.cpp ChainTipAdded caller); the read happens once per wtx per session (fMerkleVerified caches the result). A failed disk read keeps depth 0 rather than verifying against garbage. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…reads Fixes M4 (HIGH) from the memory-branch adversarial review (MEM_REVIEW_FINDINGS.md): serving getheaders (160/message) or cmpheaders (2000/message) from a fluxnode's pruned block index called GetFullBlockHeader per entry, which fell back to ReadBlockFromDisk — deserializing EVERY transaction in each block and re-running the Equihash/PoW or PON proof check — all under cs_main. A single syncing peer could pin cs_main for seconds per message and turn a tiny request into tens-to-hundreds of MB of disk reads (cheap remote amplifier). The compact path was reading full blocks to recover an nSolution that CCompactBlockHeader then omits. A CBlock on disk serializes its CBlockHeader base first, so the new ReadBlockHeaderFromDisk deserializes only the header prefix at nDataPos: no transaction parsing, no proof recheck (the block was fully validated at accept; the reconstructed hash is still verified against the index entry, the same integrity check the full read performed). GetFullBlockHeader and EnsureHeaderDataFromDisk now use it, which also removes the full-block read from DisconnectBlock's pruned-pprev anchor restore and the M2 flush-side guard. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
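The header-prefix read can be illustrated with a toy block file. This sketch assumes a fixed 80-byte header and double-SHA256 purely for illustration (Flux headers are variable-length and real deserialization is more involved); `read_header_prefix` is a hypothetical stand-in for ReadBlockHeaderFromDisk.

```python
import hashlib
import io

TOY_HEADER_LEN = 80  # assumption: fixed-size header, for the sketch only


def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def read_header_prefix(block_file, data_pos: int, expected_hash: bytes):
    """Read only the header bytes at data_pos; never touch the transactions.

    Returns the header bytes, or None on a short read or a hash mismatch
    against the index entry.
    """
    block_file.seek(data_pos)
    header = block_file.read(TOY_HEADER_LEN)
    if len(header) != TOY_HEADER_LEN:
        return None
    if sha256d(header) != expected_hash:
        return None
    return header


header_bytes = bytes(range(80))
blob = b"\x00" * 8 + header_bytes + b"...many serialized transactions..."
toy_file = io.BytesIO(blob)
```

The cost per served header becomes one small read plus one hash, independent of how many transactions the block holds.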
…ives pruning Fixes M5 from the memory-branch adversarial review (MEM_REVIEW_FINDINGS.md): CBlockIndexWorkComparator computed GetPONHash(GetBlockHeader()) per comparison. For entries whose header data was pruned, GetBlockHeader() returns a zeroed nodesCollateral — the distinguishing PON input — so after a restart (load prunes ALL entries before they are inserted into setBlockIndexCandidates), equal-work PON ties could resolve opposite the rest of the network. Worse, ConnectBlock/DisconnectBlock restore header data on entries that may still sit in the candidate set: an in-place comparator-key change violates strict weak ordering and lets erase-by-key silently fail when an equal-chainwork same-height sibling coexists. The PON hash (32 bytes) is now a resident memory-only CBlockIndex member, computed where the header is guaranteed complete: at index load (it was already being computed there for the proof check, before the prune) and at entry creation (AddToBlockIndex, cmpheaders). The comparator reads the cached value, so its key never depends on pHeaderData and never mutates. No disk-format change: CDiskBlockIndex serialization is untouched (round-trip gtest unchanged and green). Note: trial/mem-on-pr284 is immune (resident nodesVrfOutput); this fix is specific to this branch's HeaderData split. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…op stale arena warning
Fixes M8 and M9 from the memory-branch adversarial review (MEM_REVIEW_FINDINGS.md).
M9: the pruned-header disk fallback was hand-rolled in five places with already-divergent behavior (rest.cpp best-effort, getblockheader hex-mode throwing only on one branch, blockheaderToJSON field-by-field with up to two full-block reads per header, wallet assert-style). GetFullBlockHeader is now exported via main.h with one deliberate contract: it fills the header (re-reading the header prefix from disk when pruned or when nSolution was omitted) and returns false on a failed read, leaving the partial in-memory view in place. P2P/REST serving stays best-effort; getblockheader hex mode throws RPC_INTERNAL_ERROR on any failed read; blockheaderToJSON emits empty strings (as before); the wallet sites fail their respective checks. All callers now benefit from the M4 header-prefix read instead of full-block deserialization.
M8: delete the "arena over 90% full, bump POOL_CAPACITY" warning. It fired once on every mainnet fluxnode at ~90% of the FIRST 128MiB chunk and named a constant that no longer exists: the segmented arena grows on demand. Real exhaustion still warns via the heap-fallback path.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Exercises RecoverFluxnodeCache end-to-end by rewriting the sync marker directly in determ_zelnodes (plyvel) between restarts:
- clean restart skips recovery
- stale marker is repaired exactly once (M7: the forced persist must write the marker even with a clean cache, asserted by reading the marker back from leveldb) and the next restart skips recovery
- marker behind the active tip triggers the chainstate disconnect and the node replays back to the same tip
- marker on a stale fork triggers the fluxnode-only rewind along the marker's chain (M1 phase 1) plus the chainstate disconnect, and the node converges to the best tip with the marker re-anchored there
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… completion fails Fixes a regression introduced by the M9 consolidation, caught by the per-fix gtest run (five wallet tests newly failing: FindUnspentSproutNotes, SproutNullifierIsSpent, SaplingNullifierIsSpent, NavigateFromSaplingNullifierToNote, SpentSaplingNoteIsFromMe). The consolidated GetFullBlockHeader returned false whenever any disk read failed — including the nSolution-completion read for a RESIDENT POW header with an empty solution. Callers that only need the header fields (the wallet merkle-depth check, the rescan sapling-anchor lookup) then treated a perfectly valid resident header as missing. Before M9 those sites read the resident fields directly and never touched disk. The gtest wallet fixtures (in-memory blocks, empty nSolution, no block files) hit exactly this. GetFullBlockHeader now reports failure only when the header FIELDS are unavailable (pruned entry and the disk read failed); a resident header whose omitted nSolution could not be completed is returned as success with the solution left empty — matching the pre-M9 per-field behavior of every call site. The two wallet sites go back to using the resident fields directly (no disk access at all when resident), falling back to the cheap ReadBlockHeaderFromDisk only when pruned. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…B snapshots The first version rewrote the sync marker with plyvel, but writes from a modern plyvel/leveldb produce a MANIFEST the daemon's older bundled leveldb silently ignores — fluxd never saw the rewritten marker (reads are compatible, writes are not; verified by writing ff..ff, reading it back with plyvel, and watching the daemon still report "sync state matches chain tip"). The test now uses directory snapshots of determ_zelnodes taken between restarts (only fluxd's own writes), a second never-connected node to supply the unknown-marker DB for the stale-marker scenario, and keeps plyvel strictly read-only for the marker assertions. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Caught by the new crash-recovery regtest: every startup "recovered" cleanly even when the on-disk marker pointed somewhere else entirely. FlushStateToDisk passed fForce=true to PersistToDisk. Now that fForce is honored, that combination writes the sync marker even when the fluxnode cache is clean — and FlushStateToDisk also runs during init BEFORE RecoverFluxnodeCache (RewindBlockIndex ends with an unconditional FLUSH_STATE_ALWAYS). The forced write overwrote the on-disk marker with the current in-memory tip, destroying the marker/chain divergence that tells recovery a crash happened and leaving the stale fluxnode DB state in place unrepaired. The flush path now persists unforced: dirty data still goes out with the marker in the same atomic batch, and a clean cache leaves the marker untouched. The forced write remains where it is the point — recovery's stale-marker repair and post-rewind persist, and the manual RPC flush. Also rewords test comments to drop review-shorthand and historical phrasing. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Follow-up to the previous commit: leaving the flush path permanently unforced meant that on a chain with no fluxnode transactions (regtest, quiet chains) the cache is never dirty, so the sync marker would never be written at all — recovery then has nothing to verify and the marker-tracking guarantees degrade to nothing. A new fFluxnodeCacheRecovered flag is set right after RecoverFluxnodeCache succeeds at init. Flushes before that point persist unforced (they must not overwrite the marker the recovery is about to read); flushes after it force the marker write, keeping the marker at the tip even when no fluxnode data is dirty. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
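The resulting flush policy reduces to a small truth table: the marker is written when the cache is dirty, or when the write is forced, and flushes force it only after recovery has run. A sketch with hypothetical function names:

```python
def persist_writes_marker(cache_dirty: bool, force: bool) -> bool:
    """Model of PersistToDisk: dirty data carries the marker; force writes it regardless."""
    return cache_dirty or force


def flush_writes_marker(cache_dirty: bool, recovered: bool) -> bool:
    """Model of the flush path: force only once fFluxnodeCacheRecovered is set."""
    return persist_writes_marker(cache_dirty, force=recovered)
```

The one combination that leaves the marker untouched is a clean cache before recovery, which is exactly the state recovery needs to read undisturbed.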
…nale Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… plyvel Merely OPENING a leveldb directory with modern plyvel compacts the write-ahead log into snappy-compressed table files. The daemon's bundled leveldb is built without snappy, so on the next start it dies in AppInit with "corrupted compressed block contents" — printed only to stderr, which the test framework swallows, leaving -rpcwait hanging forever. (Diagnosed from the file modes: the poisoned table was 0644, plyvel's umask, among the daemon's 0600 files.) Marker assertions now copy the DB directory to a throwaway path and open the copy. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
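The copy-then-open workaround can be sketched with the standard library alone. The helper name is hypothetical and the plyvel open call is omitted; the point is that whatever the reader does to the files happens to the throwaway copy.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot_for_reading(db_dir: Path) -> Path:
    """Copy a LevelDB directory to a scratch location and return the copy's path.

    Open the returned path with the external reader, never the daemon's own directory.
    """
    scratch = Path(tempfile.mkdtemp(prefix="marker-check-"))
    copy = scratch / db_dir.name
    shutil.copytree(db_dir, copy)
    return copy
```

A test would call `snapshot_for_reading(datadir / "determ_zelnodes")` while the daemon is stopped and run its marker assertions against the result.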
…rges Two isolated regtest nodes mine identical block hashes, so node1's "foreign" tip was actually a block on node0's active chain — recovery correctly treated the marker as merely behind (disconnect/replay) instead of taking the stale-marker path the scenario asserts. Running node1 on -mocktime gives its blocks different timestamps and therefore hashes node0 has never seen. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… fork Deterministic regtest mining strikes again: re-mining at the height of a just-invalidated block reproduces that exact block (same parent, same coinbase, same timestamp), which the daemon rejects as already-invalid. Mock the clock forward for chain B's blocks so they hash differently. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
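Why identical inputs reproduce the same block, and why moving the clock fixes it, can be shown with a toy hash. The three-field preimage and its encoding are assumptions for illustration, not the real header layout.

```python
import hashlib


def toy_block_hash(parent: bytes, coinbase: bytes, timestamp: int) -> bytes:
    """Hash over the only inputs that vary in this sketch."""
    preimage = parent + coinbase + timestamp.to_bytes(4, "little")
    return hashlib.sha256(hashlib.sha256(preimage).digest()).digest()


parent = b"\x10" * 32
coinbase = b"regtest coinbase at height 105"
invalidated = toy_block_hash(parent, coinbase, 1_700_000_000)
re_mined = toy_block_hash(parent, coinbase, 1_700_000_000)
re_mined_mocktime = toy_block_hash(parent, coinbase, 1_700_000_600)
```

`re_mined` equals `invalidated`, so the daemon would see a block it already marked invalid; advancing the mocked clock changes the timestamp and therefore the hash.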
…ives restart Blocks marked invalid by invalidateblock do not survive a restart in the block index: LoadBlockIndexDB skips nCachedBranchId reconstruction for blocks failing IsValid(BLOCK_VALID_CONSENSUS), so RewindBlockIndex deems them insufficiently validated and erases them. The fork scenario then exercised the stale-marker repair path instead of the marker-chain rewind. A real crash-during-reorg leaves the losing fork's blocks fully valid, so reconsiderblock the fork tip after mining the heavier chain — the failure flags clear, no reorg happens (chain B has more work), and the fork block survives the restart as recovery expects. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ardown restart
node1's blocks carry far-future timestamps from its mocked clock; the
final courtesy restart for framework teardown launched it with the real
clock, so startup verification rejected its own chain ("Corrupted block
database detected") and the framework's -rpcwait hung until timeout.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
ComparePonForkChoice must be a pure function of resident index fields: legacy PON entries score by the cached hashPON, VRF entries by the committed nodesVrfOutput. Cover the activation-boundary fork shape (legacy vs VRF at the same height) and prove a pruned entry (pHeaderData == nullptr) compares identically to an unpruned one. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Eight deliberate-path stress tests for the fluxnode memory work, per the PR test plan: P2P header storm walking the full chain with wire-level field validation, regtest rehydration churn (reorg rehydrate + flush-side restore), regtest kill -9 crash matrix across persist windows, and operator runbooks for memory-pressure capping, restart storms, wallet rescans, and a fresh-peer IBD served exclusively by patched nodes. Python pieces are python3/stdlib, uv-managed, ruff and ty clean; regtest harness is self-contained (no python2 qa framework). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
blockindexpool.h unconditionally included sys/mman.h, which the x86_64-w64-mingw32 target does not provide (CI windows build failure). The POSIX includes move into the .cpp; on WIN32 the pool compiles as an inert stub whose Initialize() reports failure, so every caller takes its existing heap-fallback path and the node behaves like a stock build. The pool is only ever engaged on fluxnodes, which are Linux. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Verified live: fluxnode mode on regtest needs txindex=1, fluxbenchd / fluxbench-cli files beside the daemon binary, and UNMANAGED_FLUXBENCHD to skip launching them — the harness now provides all three. The crash matrix accepts every legitimate RecoverFluxnodeCache outcome (a kill before the first persist legitimately boots with no marker). The header storm handles the version-gated server reply formats: legacy protocol gets full 'headers' (solutions on the wire), current protocol gets 'cmpheaders' (PoW solution omitted, explicit block hash, PON field order differs); PON block hashes are recomputed from the GETHASH serialization and chain continuity is asserted via hashPrevBlock. Both regtest tests pass on the build server; both header-storm modes validated against a mainnet fluxnode. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
sys/statvfs.h does not exist on the mingw target. On Windows the check is moot anyway: the pool's Initialize() reports failure there, so the node already takes the heap path regardless of free disk. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- ibd_sync: parse verificationprogress numerically (arrives in scientific notation early in IBD; the textual match declared success at height 1326); pin dedicated rpcport/port so a shared host can't collide
- restart_storm: assert on the LAST RecoverFluxnodeCache line instead of a byte-offset log window (rotation/buffering produced a false failure while all 10 boots were actually clean)
- memory_pressure: mirror the production unit's identity and environment (User/Group, HOME for .zcash-params, UNMANAGED_FLUXBENCHD, MALLOC_ARENA_MAX); wait for the datadir's own daemon to release the leveldb lock before starting the capped instance (scoped by -datadir, NOT pgrep -x fluxd; other instances on the host are not ours); tolerate transient empty MainPID reads while the unit is active

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Serving headers for buried (pruned) entries re-reads them from the block files; under memory pressure a cold 2000-entry compact batch takes whole seconds, and the handler held cs_main for all of it — starving RPC and validation (measured: multi-second getblockcount stalls and >20s peer ping RTTs on a 500 MB MemoryHigh-capped node under a 6-walker header storm). The handler now snapshots the resident header fields and the entries needing disk during a short cs_main hold (pure pointer-chasing), then performs the rehydration reads lock-free. Index entries are never freed at runtime and the read path uses only immutable fields. The snapshot is taken under the lock because reorg paths rehydrate pHeaderData concurrently; a lock-free reader could observe a half-filled allocation. A failed read keeps the resident snapshot (best-effort serving, unchanged). ReadBlockHeaderFromDisk's mismatch log now identifies the block by its immutable hash for the same reason. ProcessGetData has the same shape with full-block reads (worse) — left for a follow-up: its loop interleaves tx/filter serving that genuinely needs the lock. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
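The two-phase shape described in this commit (copy resident fields under the lock, then do the slow reads without it) can be sketched as below. This is a minimal illustration under stated assumptions, not fluxd's handler: `Header`, `IndexEntry`, `Snapshot`, `ServeHeaders`, and the `readFromDisk` callback are hypothetical stand-ins, a plain `std::mutex` stands in for cs_main, and stopping at the first failed read is a policy chosen for the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <optional>
#include <vector>

// Hypothetical stand-ins; none of these are fluxd's real types.
struct Header {
    uint32_t height = 0;
    uint32_t nonce = 0;
};

struct IndexEntry {
    uint32_t height = 0;              // immutable once the entry exists
    std::optional<Header> resident;   // empty once header data has been pruned
};

struct Snapshot {
    uint32_t height;
    std::optional<Header> header;     // copied while the lock was held
};

// Phase 1 copies what is needed under the lock; phase 2 performs the slow
// reads without it. readFromDisk returns false when a read fails.
inline std::vector<Header> ServeHeaders(
    std::mutex& indexLock,
    const std::vector<IndexEntry>& index,
    const std::function<bool(uint32_t, Header&)>& readFromDisk)
{
    std::vector<Snapshot> snap;
    {
        std::lock_guard<std::mutex> guard(indexLock);  // short hold: copies only
        snap.reserve(index.size());
        for (const IndexEntry& e : index)
            snap.push_back({e.height, e.resident});
    }

    std::vector<Header> out;
    for (const Snapshot& s : snap) {                   // lock is released here
        if (s.header) {
            out.push_back(*s.header);
            continue;
        }
        Header h;
        if (!readFromDisk(s.height, h)) break;         // sketch policy: stop at first failure
        out.push_back(h);
    }
    return out;
}
```

Because phase 2 touches only the copied snapshot and the immutable height, a concurrent writer that refills or frees the resident data cannot be observed half-done by the reader.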
…eader data The arena and the HeaderData prune only covered the load-time path: blocks accepted at runtime took plain `new CBlockIndex` (pinned heap) in AddToBlockIndex and the compact-header handler, and their rebuildable header data was never freed after init -- so RssAnon drifted up ~0.5-0.7 MB/day with chain growth (the deferred M10 finding). Allocate runtime-accepted entries from the file-backed arena (reclaimable RssFile) via a shared AddBlockIndexFromHeader helper that mirrors InsertBlockIndex's arena/heap split, and free header data once an active-chain entry ages past a hot window (PruneAgedHeaderData, hooked after UpdateTip in ConnectTip), re-read from disk on demand exactly like the load-time prune -- fluxnode-gated and only for entries whose block data is on disk (header-only entries cannot be rebuilt and must stay resident). Steady-state runtime RssAnon growth is now bounded instead of climbing with the chain. Adds a gtest for the prune decision (ShouldFreeAgedHeaderData) covering the window boundary and the fluxnode / on-disk / resident-data guards. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
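The prune decision named above combines four guards. A minimal sketch of such a predicate follows; the real ShouldFreeAgedHeaderData signature is not shown in this PR text, so the struct, the parameter names, and the exact boundary comparison here are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical inputs to the prune decision.
struct PruneCandidate {
    int height;               // height of the active-chain entry
    bool hasBlockDataOnDisk;  // header can be rebuilt from the block file
    bool hasResidentHeader;   // there is header data left to free
};

// Returns true when the entry's rebuildable header data may be freed.
// Assumed boundary: an entry exactly hotWindow blocks below the tip stays hot.
inline bool ShouldFreeAgedHeaderDataSketch(const PruneCandidate& c,
                                           int tipHeight,
                                           int hotWindow,
                                           bool isFluxnode)
{
    if (!isFluxnode) return false;            // behaviour is fluxnode-gated
    if (!c.hasResidentHeader) return false;   // nothing to free
    if (!c.hasBlockDataOnDisk) return false;  // header-only entries cannot be rebuilt
    return c.height < tipHeight - hotWindow;  // aged past the hot window
}
```

Keeping the decision a pure function of its inputs is what makes the window boundary and each guard testable in isolation.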
DecrementNoteWitnesses only touches a note when its witnessHeight is at or below the decremented height, so the test's assertion that a reorg leaves the witness cache and final anchor untouched holds only when the decrement is strictly below the tip. The test decremented at heights 5 and 50; with MAX_REORG_LENGTH=40 the chain is WITNESS_CACHE_SIZE + 10 = 51 blocks, so 50 is the tip and the decrement pops the live witness. Derive the deeper decrement height from WITNESS_CACHE_SIZE so it stays below the tip regardless of the reorg-length constant. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> (cherry picked from commit f140688)
When the arena cannot map a chunk (disk full, unsupported filesystem), CBlockIndex entries fall back to heap allocation and live outside the pool. The two shutdown paths (UnloadBlockIndex, CMainCleanup) destructed the pool's own entries via DestroyAll but never deleted those heap-fallback entries, leaking them at process exit (and under leak sanitizers). Both paths now delete the non-Contains() entries before tearing the pool down, mirroring the runtime erase path. Also corrects a stale arena comment in the block-index-pool gtest header. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
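The cleanup rule in this commit (delete whatever the pool does not contain, then tear the pool down) can be illustrated with a toy pool. Everything below is a hypothetical stand-in: `Entry`, `PoolSketch`, `Allocate`, and `Shutdown` are not fluxd's classes, and the pool is a plain byte buffer rather than a mapped file.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>
#include <vector>

// Toy entry that counts live instances so a leak is observable.
struct Entry {
    static inline int live = 0;
    int height;
    explicit Entry(int h) : height(h) { ++live; }
    ~Entry() { --live; }
};

// Fixed-capacity pool; TryCreate returns nullptr when full, standing in for
// an arena chunk that could not be mapped.
class PoolSketch {
public:
    explicit PoolSketch(std::size_t capacity) : storage_(capacity * sizeof(Entry)) {}

    Entry* TryCreate(int h) {
        if ((used_ + 1) * sizeof(Entry) > storage_.size()) return nullptr;
        return new (storage_.data() + used_++ * sizeof(Entry)) Entry(h);
    }
    bool Contains(const Entry* p) const {
        const auto* b = reinterpret_cast<const unsigned char*>(p);
        std::less<const unsigned char*> lt;  // total order even for unrelated pointers
        return !lt(b, storage_.data()) && lt(b, storage_.data() + storage_.size());
    }
    void DestroyAll() {
        for (std::size_t i = 0; i < used_; ++i)
            std::launder(reinterpret_cast<Entry*>(storage_.data() + i * sizeof(Entry)))->~Entry();
        used_ = 0;
    }

private:
    std::vector<unsigned char> storage_;
    std::size_t used_ = 0;
};

// Pool first, heap as the fallback.
inline Entry* Allocate(PoolSketch& pool, int h) {
    if (Entry* e = pool.TryCreate(h)) return e;
    return new Entry(h);
}

// Shutdown: delete heap-fallback entries, then destroy the pool's own.
inline void Shutdown(PoolSketch& pool, std::vector<Entry*>& index) {
    for (Entry* e : index)
        if (!pool.Contains(e)) delete e;
    index.clear();
    pool.DestroyAll();
}
```

Skipping the `Contains()` pass leaves the fallback entries alive at exit, which is the leak the commit describes.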
Five issues found reviewing the file-backed block index / fluxnode recovery:
- CBlockIndex copy ctor and operator= dropped nodesVrfOutput. The hand-written copy operations (added to deep-copy pHeaderData) omitted this resident field, which is committed to the v101 block hash. The disk-write path builds CDiskBlockIndex(*pindex) through the copy ctor, so a persisted PON-VRF index entry would serialize a zeroed output, fail its hash check on the next load, and abort startup once PON-VRF activates. The copy ctor now delegates to operator= (one enumerated copy path) and operator= carries nodesVrfOutput.
- rest_headers dereferenced pHeaderData off raw CBlockIndex* after releasing cs_main; PruneAgedHeaderData frees that allocation at runtime, so a /rest/headers request racing a ConnectTip was a use-after-free on -rest fluxnodes. The bulk binary/hex path now snapshots resident headers under cs_main and rehydrates pruned entries from disk lock-free (mirroring the getheaders handler); the JSON path is built under the lock.
- RecoverFluxnodeCache could read an absent fluxnode undo record back as a silently-empty one and under-rewind, because the recovery depth cap and the CleanupOldFluxnodeData cutoff anchor to different heights. Recovery now probes ExistsBlockUndoFluxnodeData and fails closed to -reindex on a missing record (recovery-only flag; the live disconnect path is unchanged).
- getheaders served a zeroed header when the lock-free disk rehydration of a pruned entry failed; it now truncates the batch at that entry.
- RewindBlockIndex leaked the prunable HeaderData of arena-backed entries it removed (their destructor is skipped because arena slots are not freed individually); it now frees HeaderData explicitly.

gtests: nodesVrfOutput survives the CBlockIndex copy and the CDiskBlockIndex round-trip (the coverage gap that let the first issue through), and ExistsBlockUndoFluxnodeData distinguishes an absent record from an empty one.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
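The first fix above, a copy constructor that delegates to operator= so there is a single enumerated copy path, can be sketched as follows. The types are hypothetical: `IndexEntrySketch` is not CBlockIndex, `vrfOutput` merely stands in for nodesVrfOutput, and `headerData` for pHeaderData.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <memory>

// Stand-in for the separately allocated, prunable header data.
struct HeaderDataSketch {
    uint32_t nonce = 0;
};

struct IndexEntrySketch {
    int height = 0;
    std::array<uint8_t, 32> vrfOutput{};           // resident field that must survive a copy
    std::unique_ptr<HeaderDataSketch> headerData;  // deep-copied, may be null when pruned

    IndexEntrySketch() = default;

    // The copy constructor delegates, so every field is listed exactly once.
    IndexEntrySketch(const IndexEntrySketch& other) { *this = other; }

    IndexEntrySketch& operator=(const IndexEntrySketch& other) {
        if (this == &other) return *this;
        height = other.height;
        vrfOutput = other.vrfOutput;
        if (other.headerData)
            headerData = std::make_unique<HeaderDataSketch>(*other.headerData);
        else
            headerData.reset();
        return *this;
    }
};
```

With one copy path, a field added later can only be forgotten in one place, and a single copy round-trip test covers both operations.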
A second adversarial pass plus the existing recovery regtest surfaced that three of the previous fixes needed tightening:
- The missing-undo-record fail-closed (recovery) was too strict. It forced a full -reindex on any rewound block with no undo record, but a block that changed no fluxnode state legitimately writes none. The recovery regtest (and pre-fluxnode-activation / low-activity histories) hit exactly that and failed to start. Scope the fail-closed to blocks at or below the retention cutoff (tip - ONE_WEEK_OF_BLOCK_COUNT), where an absent record could mean 'pruned'; above it an absent record can only mean 'empty', so tolerate it. ONE_WEEK_OF_BLOCK_COUNT moves to fluxnodecachedb.h to be shared with recovery.
- rest_headers' JSON branch still held cs_main across the on-disk header reads (up to 2000 cold reads per request), reintroducing the validation-thread stall the rest of the change set removes. It now snapshots index pointers + resident headers under cs_main, rehydrates pruned entries from disk lock-free, and builds the JSON under the lock from the prefetched headers (a new blockheaderToJSON override that skips the disk read). No disk I/O under the lock.
- getheaders no longer sends an empty headers/cmpheaders batch when the first served entry is unreadable: an empty batch reads as 'end of chain, stop asking' to the peer, dropping this node as a header source. It now sends nothing and lets the peer retry elsewhere.

Verified: full gtest 252/252; the fluxnode_cache_recovery regtest passes all scenarios (it deadlocked on the over-strict fail-closed before this).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
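The scoped fail-closed rule from the first item reduces to a small decision function. The sketch below is an illustration only: the enum and function names are hypothetical, and `retentionBlocks` stands in for ONE_WEEK_OF_BLOCK_COUNT, whose value is not stated in this text.

```cpp
#include <cassert>

enum class UndoProbe { Present, Absent };
enum class RecoveryAction { ApplyUndo, TreatAsEmpty, FailClosed };

// Decides what recovery does with one rewound block's fluxnode undo record.
// At or below the cutoff an absent record might have been pruned, so recovery
// cannot trust it; above the cutoff absence can only mean "no state changed".
inline RecoveryAction ClassifyUndoRecord(UndoProbe probe,
                                         int blockHeight,
                                         int tipHeight,
                                         int retentionBlocks)
{
    if (probe == UndoProbe::Present) return RecoveryAction::ApplyUndo;
    const int cutoff = tipHeight - retentionBlocks;
    return blockHeight <= cutoff ? RecoveryAction::FailClosed
                                 : RecoveryAction::TreatAsEmpty;
}
```

For example, with a tip at 10000 and a retention of 5000 blocks, an absent record at height 5000 fails closed while one at 5001 is tolerated as empty.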
PruneAgedHeaderData only sweeps the active chain, so a block's HeaderData (rehydrated to disconnect it during a reorg) was retained in RAM indefinitely once the block fell off the active chain: a leak that is small per reorg but accumulates without bound over time on fluxnodes. DisconnectTip now frees it when the block leaves the chain; it is rebuilt from disk on demand and re-restored by ConnectTip if the block is reconnected. Surfaced by a second adversarial review pass; pre-existing, not a regression from the other fixes. Verified: full gtest 252/252 and the fluxnode_cache_recovery regtest (which exercises reorg disconnect) both pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PR #284 must merge first. This branch is based on feat/pon-vrf-integration and inherits its protocol bump (170022) and the VRF block-index fields (nodesVrfOutput is kept always-resident; the fork-choice comparator must never depend on prunable data, see "Fork choice" below). It will not apply to master until #284 lands.

Why fluxnodes need this
A Cumulus node has 8 GB of RAM, and that RAM is the product — it's what FluxOS sells to applications. Every megabyte fluxd pins is a megabyte no app can use, and unlike CPU (which idles between blocks), memory is occupied 24/7. On a stock build, fluxd was consuming ~1.9 GB of pinned anonymous memory — roughly ONE QUARTER of a Cumulus node's entire RAM — and the dominant component of that grows linearly with chain height, forever:
- mapBlockIndex holds one CBlockIndex per block (~2.67 M today, +~52 K/yr) in anonymous heap memory for the process lifetime.

This branch makes fluxd's steady-state memory bounded: the chain keeps growing, our resident memory doesn't.
Measured (same height, 62 GB host, uncapped):
[Table: two columns whose headers end in "salad)" and "squidward)"; rows VmRSS, RssAnon (pinned, needs swap), RssFile (reclaimable, no swap). Cell values are missing from this copy.]

Under a 500 MB MemoryHigh cgroup cap with MemorySwapMax=0, this branch ran fully synced with zero swap used; a stock node cannot fit in that envelope. On a Cumulus node this frees ~1.4 GB, about 18% of total RAM, back to apps, and the number stops growing with the chain.

The optimizations, and the reasoning behind them
All memory-optimization behavior is gated on fFluxnode; non-fluxnode builds behave like master.

1. The block index moves into a file-backed arena: why a file
The kernel can only evict a page to wherever its backing store is. Anonymous (heap) pages are backed by swap — no swap, no eviction, pinned forever. Pages backed by a file evict to that file under memory pressure and fault back in on access, no swap needed. So the fix for "the OS can't page this out" is to give the OS something to page it out to.
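A minimal sketch of the idea, assuming a POSIX system: records are placement-constructed inside a MAP_SHARED mapping of a scratch file, so their pages are file-backed rather than anonymous. The class name, slot layout, fixed capacity, and file path are all hypothetical; the branch's arena is segmented and growable, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <new>
#include <string>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical slot layout: a small record plus a 32-byte hash.
struct Slot {
    int height;
    unsigned char hash[32];
};

// Fixed-size arena over one MAP_SHARED file mapping (POSIX only).
class FileArenaSketch {
public:
    FileArenaSketch(const std::string& path, std::size_t slots)
        : path_(path), bytes_(slots * sizeof(Slot))
    {
        fd_ = ::open(path_.c_str(), O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (fd_ < 0) return;
        if (::ftruncate(fd_, static_cast<off_t>(bytes_)) != 0) return;
        void* p = ::mmap(nullptr, bytes_, PROT_READ | PROT_WRITE, MAP_SHARED, fd_, 0);
        if (p != MAP_FAILED) base_ = static_cast<unsigned char*>(p);
    }
    ~FileArenaSketch()
    {
        if (base_) ::munmap(base_, bytes_);
        if (fd_ >= 0) {
            ::close(fd_);
            ::unlink(path_.c_str());  // scratch file: removed on clean teardown
        }
    }
    FileArenaSketch(const FileArenaSketch&) = delete;
    FileArenaSketch& operator=(const FileArenaSketch&) = delete;

    bool Ok() const { return base_ != nullptr; }

    // Placement-constructs the next slot inside the mapping. Returns nullptr
    // when the mapping failed or is full, so a caller can fall back to the heap.
    Slot* Create(int height)
    {
        if (!base_ || (used_ + 1) * sizeof(Slot) > bytes_) return nullptr;
        return new (base_ + used_++ * sizeof(Slot)) Slot{height, {}};
    }

private:
    std::string path_;
    std::size_t bytes_;
    int fd_ = -1;
    unsigned char* base_ = nullptr;
    std::size_t used_ = 0;
};
```

Pointers returned by `Create` stay valid for the lifetime of the mapping, and under memory pressure the kernel can write the pages back to the file and drop them instead of needing swap.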
Concretely:
CBlockIndex skeletons are placement-allocated from a segmented, file-backed arena: a single named scratch file (blockindex.arena in the datadir, mmap MAP_SHARED), each slot one CBlockIndex + its 32-byte hash. Hot entries (near the tip) stay in page cache and are exactly as fast as heap; cold entries (the millions of historic blocks, touched only by rescans/deep RPC) get evicted by the kernel under pressure and cost a page fault when touched. That converts the block index from pinned RssAnon into reclaimable RssFile, which is why the table above shows RssAnon dropping 1855 → 360 MB while RssFile rises.

Design points:
- … ftruncate + fixed mmap windows at increasing offsets; windows never move, so pointers stay valid). File size and VSZ track actual usage: no fixed reservation, no hard ceiling.
- … O_TRUNC at startup, unlink on clean shutdown).
- … madvise MADV_COLD, MADV_DONTNEED fallback; a recent window stays hot), so its pages are reclaimed early rather than only under memory pressure.

2. CBlockIndex is split, and rebuildable data is pruned: why a split

Most of a CBlockIndex is data that exists byte-for-byte in the block files on disk (equihash solution, merkle root, nonce, collateral, block-sig). Keeping ~2.67 M copies of it resident is paying RAM for a second copy of the disk. The struct is split:
- … nCachedBranchId, Sprout anchors, the VRF output, the cached PON hash.
- HeaderData (rebuildable from disk): merkle root, final sapling root, nonce, solution, nodes-collateral, block-sig.

HeaderData is resident only for a recent window of entries near the tip. At load it is freed for the whole historic chain (right after each entry's consistency check, not after ActivateBestChain, cutting the init-time memory transient ~2.1 GB → ~1.4 GB); at runtime each entry's HeaderData is freed once it ages past the window. It is rehydrated from the block files on demand (reorgs into pruned territory, header serving, rescans). Pruning frees the separately-allocated HeaderData (~112 bytes) per entry; the resident skeleton (~312 bytes) stays.

3. jemalloc: why the allocator matters here
The HeaderData prune frees ~2.5 million small allocations right after load. glibc malloc keeps those freed pages in its arena (they stay in RssAnon, invisible savings); jemalloc actually returns freed pages to the OS and fragments far less under this churn. Without it, optimization #2 would shrink our logical usage but not the number the node operator (and the OOM killer) sees.
Mechanics: fluxd links the system libjemalloc.so.2 dynamically (a malloc replacement needs dynamic linking for symbol interposition; the deb package will declare libjemalloc2 as a dependency). Detection is autotools (AC_CHECK_LIB), enabled by default, graceful fallback with a warning if absent, --disable-jemalloc to opt out; Windows/macOS keep their system allocators. Deploy check: ldconfig -p | grep jemalloc.

Fork choice stays resident (interaction with #284)

CBlockIndexWorkComparator tie-breaks same-height PON blocks via ComparePonForkChoice: VRF blocks compare by the committed nodesVrfOutput, legacy PON blocks by a PON hash cached on the index entry at load/creation (hashPON). Both scores are deliberately resident fields: recomputing from GetBlockHeader() would hash a zeroed collateral for pruned entries (wrong tie-break vs the network) and would mutate a comparator key in place for entries already inside setBlockIndexCandidates. No disk-format change.

Crash safety and correctness
The arena/prune machinery adds new failure surfaces a crash can expose — a flushed index entry may be header-pruned, and the fluxnode sync-marker and the coins DB are written on independent cadences, so a crash can leave them at different heights. These are the guarantees the branch holds:
- … DisconnectFluxnodeOnly) to the consistent height and replays to tip; it hard-fails rather than continue on a cache that already contains a block's effects, caps rewind depth at the undo-retention window, and aborts init before ActivateBestChain if recovery fails. An fFluxnodeCacheRecovered gate keeps init-time flushes (including RewindBlockIndex's forced flush) from overwriting the marker before recovery has read it.
- PersistToDisk honours fForce, so a stale marker is repaired rather than silently kept.
- … cs_main: a 2000-entry cmpheaders request no longer triggers 2000 full-block deserializations and Equihash rechecks under the lock.
- … GetFullBlockHeader, over a shared ReadBlockHeaderFromDisk primitive.

Compatibility / risk
- CDiskBlockIndex serialization is byte-identical to #284's (moved fields keep their serialized position/order; nodesVrfOutput's VRF-gated READWRITE preserved verbatim; memory-only fields stay memory-only).
- … HeaderData split (two allocations/block instead of one, ~50–85 MB at current height + a pointer indirection).
- invalidateblock into pruned territory triggers on-demand disk reads (rehydration) rather than crashes.
- … getblockheader (non-verbose) errors rather than returning a stale in-memory header, blockheaderToJSON emits empty strings for the affected fields, and /rest/headers serializes the resident-only view. These differ from #284 only on a missing/unreadable block file.
- … libjemalloc.so.2 (libjemalloc2 package): graceful fallback if absent, but the memory numbers above assume it.

Testing
Done
- flux-gtest full suite (on the #284 base): green, including #284's 7 VRF tests (fork-choice, serialization, ECVRF prove/verify) which exercise the resident-score fork-choice merge.
- … CDiskBlockIndex serialization round-trip (no-reindex guard), prune→restore serializes byte-identically, resident state incl. nCachedBranchId + cached PON hash survives the prune, CBlockIndexPool alloc/exhaustion/Contains/DestroyAll, forced PersistToDisk writes the sync marker on a clean cache, the resident VRF output (nodesVrfOutput) survives CBlockIndex copy/assignment and a v101 CDiskBlockIndex serialize round-trip (the disk-write path the fork-choice comparator depends on), and ExistsBlockUndoFluxnodeData distinguishes an absent fluxnode undo record from an empty one (the recovery missing-record check).
- qa/rpc-tests/fluxnode_cache_recovery.py (green): asserts all four recovery shapes: clean restart skips recovery; a stale marker is repaired exactly once; a marker behind the tip disconnects/replays to the same tip; a marker on a stale fork triggers the fluxnode-only rewind along the marker's chain and converges to the best tip.
- … -fsanitize=thread served /rest/headers from multiple HTTP worker threads over the header-pruned region while the validation thread pruned aged header data on every block connect (PruneAgedHeaderData → FreeHeaderData); TSan reported no data race on the prunable HeaderData, the block index, or the arena, confirming the serving path snapshots header fields under cs_main and rehydrates from disk lock-free, never touching pHeaderData off-lock.
- … RssAnon flat 356–467 MB, VmSwap 0, all at tip, host reboots recovered with "no recovery needed".
- … RssAnon bounded in the ~425–500 MB band, VmSwap 0 (one memory-pressured host evicts cold anon pages to swap, by design), all at tip, no restarts.
- … blockindex.arena, Dirty=0, counted in RssFile, kernel-evictable), VmSwap 0; RssAnon oscillates within a bounded band and fully reclaims after each per-block allocation spike, with no monotonic growth across the soak.
- … getreceivedbyaddress cross-checked to the satoshi against the independent address index (getaddressdeltas, coinbase-adjusted). This exercises the bounded-index historical read path: per-block CBlockIndex lookup + ReadBlockFromDisk + the GetFullBlockHeader nSolution disk fallback; RssAnon stayed bounded, VmSwap 0.
- … SIGKILL) twice with the wallet mid-write; both restarts recovered cleanly: chainstate behind the block files, fluxnode-cache UNDO PREPARE to a consistent height then replay to tip, no reindex and no corruption.
- … qa/stress/): restart storm (13/13 boots clean, "no recovery needed", post-settle RssAnon 318–449 MB); memory-pressure torture (500 MB MemoryHigh + MemorySwapMax=0, VmSwap 0 in every sample, arena pages evicting/faulting as designed); header-serving storm (exposed multi-second cs_main holds in getheaders serving → fixed in this branch: disk rehydration now runs outside cs_main, RPC stays at idle latency, 7–13 ms, while serving 40 k cold headers); rehydration churn and a 12-kill crash matrix on regtest; and mixed fork-choice at the VRF boundary. All green.
- … progress=1.0) into a clean datadir sourcing only from two patched nodes, with no OOM and no crash and dbcache bounded (~89 MiB resident at tip); random-height block-hash cross-checks against a serving node matched throughout, and the serving nodes held at tip with bounded RssAnon.
- … CachedWitnessesCleanIndex case in test_wallet.cpp decremented note witnesses at a hardcoded height that could coincide with the chain tip under Flux's MAX_REORG_LENGTH=40; the decrement point is now derived from WITNESS_CACHE_SIZE so it stays below the tip. No production witness code is touched.

Remaining gate
The fleet soak, stress suite, fresh-peer IBD, and wallet-rescan validations above are complete. The only remaining gate is the #284 merge dependency (shared protocol bump 170022) — this PR must land after it.
Reviewer guidance — scrutinize
- CDiskBlockIndex::SerializationOp field order vs. #284 (must be byte-identical, including the VRF gating).
- pHeaderData-> access: guarded, rehydrated, or pre-prune (see EnsureHeaderDataFromBlock / FromDisk, GetFullBlockHeader).
- fFluxnode gating points (arena create, both prune sites).
- CBlockIndexPool lifetime: Contains()-based pool-vs-heap discrimination in cleanup paths; chunk-map failure → heap fallback; O_TRUNC / unlink cleanup; mmap-window offsets stay valid as the file grows.
- ComparePonForkChoice reads only resident fields (nodesVrfOutput, hashPON).

Deployment notes
- … libjemalloc2 (ldconfig -p | grep jemalloc); the deb should declare it. Without it fluxd runs but RssAnon savings shrink (glibc retains freed pages).
- … dbcache=200 (bounds the anonymous UTXO cache, the dominant remaining RssAnon; a per-tier tunable: lower for Cumulus/Nimbus).
- … blockindex.arena from datadir backups/snapshots; a visible scratch file regenerated from leveldb each run.

🤖 Generated with Claude Code