
Bound fluxnode memory growth: file-backed block index, header pruning, jemalloc (depends on #284) #286

Open
MorningLightMountain713 wants to merge 76 commits into RunOnFlux:master from MorningLightMountain713:mem/bounded-blockindex
MorningLightMountain713 commented Jun 12, 2026

Contributor

⚠️ Merge dependency

PR #284 must merge first. This branch is based on feat/pon-vrf-integration and inherits its protocol bump (170022) and the VRF block-index fields (nodesVrfOutput is kept always-resident — the fork-choice comparator must never depend on prunable data, see "Fork choice" below). It will not apply to master until #284 lands.

Why fluxnodes need this

A Cumulus node has 8 GB of RAM, and that RAM is the product: it is what FluxOS sells to applications. Every megabyte fluxd pins is a megabyte no app can use, and unlike CPU (which idles between blocks), memory is occupied 24/7. On a stock build, fluxd was consuming ~1.9 GB of pinned anonymous memory, roughly one quarter of a Cumulus node's entire RAM, and the dominant component of that grows linearly with chain height, forever:

  • mapBlockIndex holds one CBlockIndex per block (~2.67 M today, +~52 K/yr) in anonymous heap memory for the process lifetime.
  • Anonymous memory can only be reclaimed to swap — useless on the many swapless or swap-starved nodes — so it is effectively pinned.
  • Allocator tuning can't help: the memory is live. Without intervention, the footprint at 5 M blocks will be roughly double today's.

This branch makes fluxd's steady-state memory bounded: the chain keeps growing, our resident memory doesn't.

Measured (same height, 62 GB host, uncapped):

| | stock node (salad) | this branch (squidward) |
|---|---|---|
| VmRSS | 1906 MB | 1075 MB |
| RssAnon (pinned, needs swap) | 1855 MB | 360 MB |
| RssFile (reclaimable, no swap) | 52 MB | 716 MB |
| Block index in… | anonymous heap | file-backed arena |
| Min footprint under no-swap pressure | ~1.9 GB / OOM | ~500 MB (measured) |

Under a 500 MB MemoryHigh cgroup cap with MemorySwapMax=0, this branch ran fully synced with zero swap used; a stock node cannot fit in that envelope. On a Cumulus node this frees ~1.4 GB — about 18% of total RAM — back to apps, and the number stops growing with the chain.
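The counters quoted above (VmRSS, RssAnon, RssFile, VmSwap) are fields that Linux exposes per process in /proc/<pid>/status. How the numbers in the table were actually collected is not stated here; the following is only a minimal sketch of one way to read them, with hypothetical helper names (StatusFieldKb, ReadSelfStatus) that are not part of fluxd.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: extracts the numeric value (in kB) of one field, such as
// "RssAnon", from text in the format of /proc/<pid>/status.
// Returns false if the field is absent or has no numeric value.
bool StatusFieldKb(const std::string& statusText, const std::string& field,
                   std::uint64_t& kbOut)
{
    std::istringstream in(statusText);
    std::string line;
    const std::string prefix = field + ":";
    while (std::getline(in, line)) {
        // Match only lines that begin with "<field>:".
        if (line.compare(0, prefix.size(), prefix) != 0) continue;
        std::istringstream rest(line.substr(prefix.size()));
        std::uint64_t kb = 0;
        if (!(rest >> kb)) return false;
        kbOut = kb;
        return true;
    }
    return false;
}

// Hypothetical helper: reads the status file of the calling process (Linux only).
// Returns an empty string if the file cannot be opened.
std::string ReadSelfStatus()
{
    std::ifstream f("/proc/self/status");
    std::stringstream buf;
    buf << f.rdbuf();
    return buf.str();
}
```

Polling RssAnon and VmSwap this way over a soak is one way to check the "bounded, zero swap" claim on a given node; RssFile is expected to move with page-cache pressure and is not a leak signal on its own.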

The optimizations, and the reasoning behind them

All memory-optimization behavior is gated on fFluxnode; non-fluxnode builds behave like master.

1. The block index moves into a file-backed arena — why a file

The kernel can only evict a page to wherever its backing store is. Anonymous (heap) pages are backed by swap — no swap, no eviction, pinned forever. Pages backed by a file evict to that file under memory pressure and fault back in on access, no swap needed. So the fix for "the OS can't page this out" is to give the OS something to page it out to.

Concretely: CBlockIndex skeletons are placement-allocated from a segmented, file-backed arena — a single named scratch file (blockindex.arena in the datadir, mmap MAP_SHARED), each slot one CBlockIndex + its 32-byte hash. Hot entries (near the tip) stay in page cache and are exactly as fast as heap; cold entries (the millions of historic blocks, touched only by rescans/deep RPC) get evicted by the kernel under pressure and cost a page fault when touched. That converts the block index from pinned RssAnon into reclaimable RssFile — which is why the table above shows RssAnon dropping 1855 → 360 MB while RssFile rises.

Design points:

  • The file grows ~128 MiB at a time (ftruncate + fixed mmap windows at increasing offsets; windows never move, so pointers stay valid). File size and VSZ track actual usage — no fixed reservation, no hard ceiling.
  • Crash-safe by construction: the arena is pure scratch, rebuilt from leveldb every start (O_TRUNC at startup, unlink on clean shutdown).
  • If a chunk can't be mapped (disk full, unsupported FS) → heap fallback, no abort — the node degrades to stock behavior instead of dying.
  • At load the historic chain is proactively hinted cold (madvise MADV_COLD, MADV_DONTNEED fallback; a recent window stays hot), so its pages are reclaimed early rather than only under memory pressure.
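The growth scheme in the design points above (ftruncate, then a new fixed MAP_SHARED window at the next offset, with a heap fallback) can be sketched as below. This is an illustration under stated assumptions, not the PR's CBlockIndexPool: the class name FileArena, its methods, the 16-byte alignment, and the bump-pointer policy are all hypothetical, and the madvise hinting step is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical sketch of a chunked, file-backed bump allocator.
class FileArena
{
public:
    // chunkBytes must be a multiple of the system page size.
    FileArena(const std::string& path, std::size_t chunkBytes)
        : path_(path), chunkBytes_(chunkBytes),
          fd_(::open(path.c_str(), O_RDWR | O_CREAT | O_TRUNC, 0600)) // scratch file
    {
    }

    ~FileArena()
    {
        for (void* w : windows_) ::munmap(w, chunkBytes_);
        for (void* h : heap_) std::free(h);
        if (fd_ >= 0) {
            ::close(fd_);
            ::unlink(path_.c_str()); // pure scratch: nothing to keep
        }
    }

    FileArena(const FileArena&) = delete;
    FileArena& operator=(const FileArena&) = delete;

    // Returns zero-filled storage. Existing windows are never remapped, so
    // pointers handed out earlier stay valid as the file grows.
    void* Alloc(std::size_t bytes)
    {
        bytes = (bytes + 15) & ~static_cast<std::size_t>(15);
        if (bytes > chunkBytes_) return HeapAlloc(bytes);
        if (windows_.empty() || used_ + bytes > chunkBytes_) {
            if (!MapNextChunk()) return HeapAlloc(bytes); // degrade, do not abort
        }
        void* p = static_cast<char*>(windows_.back()) + used_;
        used_ += bytes;
        return p;
    }

    // True if p points into one of the mapped windows (not the heap fallback).
    bool Contains(const void* p) const
    {
        const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
        for (void* w : windows_) {
            const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(w);
            if (a >= base && a < base + chunkBytes_) return true;
        }
        return false;
    }

    std::size_t ChunkCount() const { return windows_.size(); }

private:
    bool MapNextChunk()
    {
        if (fd_ < 0) return false;
        const off_t offset = static_cast<off_t>(windows_.size() * chunkBytes_);
        // Grow the file first, then map only the new region.
        if (::ftruncate(fd_, offset + static_cast<off_t>(chunkBytes_)) != 0) return false;
        void* w = ::mmap(nullptr, chunkBytes_, PROT_READ | PROT_WRITE, MAP_SHARED, fd_, offset);
        if (w == MAP_FAILED) return false;
        windows_.push_back(w);
        used_ = 0;
        return true;
    }

    void* HeapAlloc(std::size_t bytes)
    {
        void* p = std::calloc(1, bytes == 0 ? 16 : bytes);
        if (p) heap_.push_back(p);
        return p;
    }

    std::string path_;
    std::size_t chunkBytes_;
    int fd_;
    std::size_t used_ = 0;
    std::vector<void*> windows_;
    std::vector<void*> heap_;
};
```

Because each window is mapped once at a fixed file offset and never moved, growing the arena costs one ftruncate plus one mmap and leaves every previously returned pointer untouched, which is the property the cleanup paths rely on when they distinguish pool entries from heap entries.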

2. CBlockIndex is split, and rebuildable data is pruned — why a split

Most of a CBlockIndex is data that exists byte-for-byte in the block files on disk (equihash solution, merkle root, nonce, collateral, block-sig). Keeping ~2.67 M copies of it resident is paying RAM for a second copy of the disk. The struct is split:

  • Resident (small, NOT reconstructable from the block; read by init-time consensus checks): chain work/tx/skiplist/status, shielded value pools, nCachedBranchId, Sprout anchors, the VRF output, the cached PON hash.
  • Prunable HeaderData (rebuildable from disk): merkle root, final sapling root, nonce, solution, nodes-collateral, block-sig.

HeaderData is resident only for a recent window of entries near the tip. At load it is freed for the whole historic chain (right after each entry's consistency check, not after ActivateBestChain — cutting the init-time memory transient ~2.1 GB → ~1.4 GB); at runtime each entry's HeaderData is freed once it ages past the window. It is rehydrated from the block files on demand (reorgs into pruned territory, header serving, rescans). Pruning frees the separately-allocated HeaderData (~112 bytes) per entry; the resident skeleton (~312 bytes) stays.
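The prune-then-rehydrate lifecycle described above can be illustrated with a small model. Everything here is a simplified assumption: the IndexEntry and HeaderData structs carry only a couple of representative fields, PruneAged and EnsureHeader are hypothetical names, and the disk read is abstracted as a caller-supplied loader rather than the real block-file reader.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Rebuildable fields (illustrative subset): these exist byte-for-byte on disk.
struct HeaderData {
    std::array<std::uint8_t, 32> merkleRoot{};
    std::vector<std::uint8_t> solution;
};

// Resident skeleton (illustrative subset) plus the separately allocated,
// prunable header data.
struct IndexEntry {
    int height = 0;
    std::uint64_t filePos = 0;          // resident: where to find the block on disk
    std::unique_ptr<HeaderData> header; // null once pruned
};

// Loader abstraction standing in for a header read from the block files.
using HeaderLoader = std::function<bool(const IndexEntry&, HeaderData&)>;

// Frees header data for entries more than `window` blocks below the tip.
// Returns how many entries were pruned.
std::size_t PruneAged(std::vector<IndexEntry>& chain, int tipHeight, int window)
{
    std::size_t freed = 0;
    for (IndexEntry& e : chain) {
        if (e.header && e.height < tipHeight - window) {
            e.header.reset();
            ++freed;
        }
    }
    return freed;
}

// Makes sure header data is present, rehydrating through the loader if it was
// pruned. On a failed read the entry is left pruned and false is returned, so
// callers never see zeroed fields.
bool EnsureHeader(IndexEntry& e, const HeaderLoader& load)
{
    if (e.header) return true;
    std::unique_ptr<HeaderData> fresh(new HeaderData());
    if (!load(e, *fresh)) return false;
    e.header = std::move(fresh);
    return true;
}
```

The design point this models is that a failed rehydration is reported to the caller instead of being papered over with default-initialised data; the window size passed to PruneAged is a parameter of the sketch, not a value taken from the branch.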

3. jemalloc — why the allocator matters here

The HeaderData prune frees ~2.5 million small allocations right after load. glibc malloc keeps those freed pages in its arena (they stay in RssAnon, invisible savings); jemalloc actually returns freed pages to the OS and fragments far less under this churn. Without it, optimization #2 would shrink our logical usage but not the number the node operator (and the OOM killer) sees.

Mechanics: fluxd links the system libjemalloc.so.2 dynamically (a malloc replacement needs dynamic linking for symbol interposition; the deb package will declare libjemalloc2 as a dependency). Detection is autotools (AC_CHECK_LIB), enabled by default, graceful fallback with a warning if absent, --disable-jemalloc to opt out; Windows/macOS keep their system allocators. Deploy check: ldconfig -p | grep jemalloc.

Fork choice stays resident (interaction with #284)

CBlockIndexWorkComparator tie-breaks same-height PON blocks via ComparePonForkChoice: VRF blocks compare by the committed nodesVrfOutput, legacy PON blocks by a PON hash cached on the index entry at load/creation (hashPON). Both scores are deliberately resident fields: recomputing from GetBlockHeader() would hash a zeroed collateral for pruned entries (wrong tie-break vs the network) and would mutate a comparator key in place for entries already inside setBlockIndexCandidates. No disk-format change.
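A comparator that reads only resident scores might look like the sketch below. It is not ComparePonForkChoice itself: the struct, the function name, the byte-wise ordering, the "lower legacy PON hash wins" rule, and the choice to leave mixed VRF/legacy pairs undecided are all assumptions made for illustration. Only "lowest VRF output wins" and "equal outputs are undecided" come from the PR text.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Hash256 = std::array<std::uint8_t, 32>;

// Illustrative stand-in for the resident fork-choice fields of an index entry.
struct ForkChoiceEntry {
    bool isVrf = false;  // true for VRF blocks, false for legacy PON blocks
    Hash256 vrfOutput{}; // resident score for VRF blocks
    Hash256 ponHash{};   // resident cached score for legacy PON blocks
};

// Returns -1 if `a` is preferred, +1 if `b` is preferred, 0 if undecided
// (the caller then falls back to first-seen ordering).
// Assumption: scores compare byte-wise with index 0 most significant.
int CompareTieBreak(const ForkChoiceEntry& a, const ForkChoiceEntry& b)
{
    // Assumption of this sketch: a mixed pair is left undecided.
    if (a.isVrf != b.isVrf) return 0;
    const Hash256& sa = a.isVrf ? a.vrfOutput : a.ponHash;
    const Hash256& sb = b.isVrf ? b.vrfOutput : b.ponHash;
    if (sa < sb) return -1; // lower score wins
    if (sb < sa) return 1;
    return 0;
}
```

Since both scores are plain members of the entry, pruning header data cannot change the result, and the key used to order entries inside a sorted candidate set never mutates after insertion.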

Crash safety and correctness

The arena/prune machinery adds new failure surfaces a crash can expose — a flushed index entry may be header-pruned, and the fluxnode sync-marker and the coins DB are written on independent cadences, so a crash can leave them at different heights. These are the guarantees the branch holds:

  • Crash recovery is exact. The fluxnode cache writes a sync-state marker at its persisted tip while the coins DB flushes on its own (~24 h) cadence, so a crash can leave the two at different heights. On restart the node rewinds fluxnode state along the marker's chain (DisconnectFluxnodeOnly) to the consistent height and replays to tip; it hard-fails rather than continue on a cache that already contains a block's effects, caps rewind depth at the undo-retention window, and aborts init before ActivateBestChain if recovery fails. An fFluxnodeCacheRecovered gate keeps init-time flushes (including RewindBlockIndex's forced flush) from overwriting the marker before recovery has read it.
  • A pruned entry is never persisted as zeroes. The flush path rehydrates header data from disk before serialization and aborts on read failure; prunes skip header-only entries. PersistToDisk honours fForce, so a stale marker is repaired rather than silently kept.
  • Reorgs into pruned territory recover rather than crash. Undo records are written for delegate-only blocks and retained by height across all chains, so fork-side records survive reorgs; pruned header data is rehydrated from the block files on demand.
  • Pruning is invisible to the wallet. Confirmed transactions are not reported as conflicted after a fluxnode restart — the merkle check reads the rehydrated root, not a zeroed one.
  • Header serving is cheap and lock-light. Serving reads only the header prefix from disk (no tx parse, no proof recheck) instead of a full block, and rehydration runs outside cs_main — a 2000-entry cmpheaders request no longer triggers 2000 full-block deserializations and Equihash rechecks under the lock.
  • Consolidated pruned-header reads. The scattered pruned-header fallbacks now resolve through the exported GetFullBlockHeader, over a shared ReadBlockHeaderFromDisk primitive.
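The "lock-light" header serving described in the list above follows a common two-phase pattern: copy the small resident fields while holding the lock, then do the slow disk reads after releasing it. The sketch below shows that pattern in isolation; ServedEntry, HeaderReader, and ServeHeaders are hypothetical names, a plain std::mutex stands in for cs_main, and the disk read is a caller-supplied function rather than ReadBlockHeaderFromDisk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

struct DiskPos {
    int file = -1;
    std::uint32_t offset = 0;
};

// Illustrative resident view of an index entry: enough to locate its header.
struct ServedEntry {
    int height = 0;
    DiskPos pos;
};

struct HeaderBytes {
    int height = 0;
    std::vector<std::uint8_t> raw;
};

// Stand-in for a header-prefix read from the block files.
using HeaderReader = std::function<bool(const DiskPos&, std::vector<std::uint8_t>&)>;

// Serves up to `count` headers starting at index `from`.
// Phase 1 (under the lock): snapshot the resident fields only.
// Phase 2 (lock released): read each header from disk.
// Returns false if any disk read fails; `out` then holds the headers read so far.
bool ServeHeaders(std::mutex& indexLock, const std::vector<ServedEntry>& index,
                  std::size_t from, std::size_t count,
                  const HeaderReader& readHeader, std::vector<HeaderBytes>& out)
{
    std::vector<ServedEntry> snapshot;
    {
        std::lock_guard<std::mutex> guard(indexLock);
        for (std::size_t i = from; i < index.size() && snapshot.size() < count; ++i) {
            snapshot.push_back(index[i]);
        }
    }
    out.clear();
    for (const ServedEntry& e : snapshot) {
        HeaderBytes h;
        h.height = e.height;
        if (!readHeader(e.pos, h.raw)) return false;
        out.push_back(std::move(h));
    }
    return true;
}
```

The lock hold time is then proportional to the number of entries copied, not to disk latency, which is what keeps RPC responsive while a large batch of cold headers is being served.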

Compatibility / risk

  • No consensus change. No reindex. On-disk CDiskBlockIndex serialization is byte-identical to #284's (moved fields keep their serialized position/order; nodesVrfOutput's VRF-gated READWRITE preserved verbatim; memory-only fields stay memory-only).
  • Non-fluxnodes: no optimization; functionally identical to #284, with a small overhead from the HeaderData split (two allocations/block instead of one, ~50–85 MB at current height + a pointer indirection).
  • Init still peaks ~1.4 GB (the arena bounds steady state, not init). A ≤2 GB node should init with headroom, then run capped.
  • Reorg/invalidateblock into pruned territory triggers on-demand disk reads (rehydration) rather than crashes.
  • Pruned + unreadable header fields surface at the header RPC/REST endpoints: when an entry is header-pruned and its block file cannot be re-read, getblockheader (non-verbose) errors rather than returning a stale in-memory header, blockheaderToJSON emits empty strings for the affected fields, and /rest/headers serializes the resident-only view. These differ from #284 only on a missing/unreadable block file.
  • New runtime dependency on Linux: system libjemalloc.so.2 (libjemalloc2 package) — graceful fallback if absent, but the memory numbers above assume it.

Testing

Done

  • flux-gtest full suite (on the #284 base): green, including #284's 7 VRF tests (fork-choice, serialization, ECVRF prove/verify) which exercise the resident-score fork-choice merge.
  • Custom gtests (all green): CDiskBlockIndex serialization round-trip (no-reindex guard), prune→restore serializes byte-identically, resident state incl. nCachedBranchId + cached PON hash survives the prune, CBlockIndexPool alloc/exhaustion/Contains/DestroyAll, forced PersistToDisk writes the sync marker on a clean cache, the resident VRF output (nodesVrfOutput) survives CBlockIndex copy/assignment and a v101 CDiskBlockIndex serialize round-trip (the disk-write path the fork-choice comparator depends on), and ExistsBlockUndoFluxnodeData distinguishes an absent fluxnode undo record from an empty one (the recovery missing-record check).
  • New regtest qa/rpc-tests/fluxnode_cache_recovery.py (green): asserts all four recovery shapes — clean restart skips recovery; a stale marker is repaired exactly once; a marker behind the tip disconnects/replays to the same tip; a marker on a stale fork triggers the fluxnode-only rewind along the marker's chain and converges to the best tip.
  • ThreadSanitizer dynamic race check (header serving): a regtest fluxnode built with -fsanitize=thread served /rest/headers from multiple HTTP worker threads over the header-pruned region while the validation thread pruned aged header data on every block connect (PruneAgedHeaderDataFreeHeaderData); TSan reported no data race on the prunable HeaderData, the block index, or the arena — confirming the serving path snapshots header fields under cs_main and rehydrates from disk lock-free, never touching pHeaderData off-lock.
  • Live validation on mainnet fluxnodes: clean init/sync, value pools intact, ~500 MB under a 500 MB no-swap cgroup cap. A ~22 h soak of the pre-rebase (master-based) head across 6 nodes: RssAnon flat 356–467 MB, VmSwap 0, all at tip, host reboots recovered with "no recovery needed".
  • Multi-day fleet soak on mainnet fluxnodes: RssAnon bounded in the ~425–500 MB band, VmSwap 0 (one memory-pressured host evicts cold anon pages to swap, by design), all at tip, no restarts.
  • Mechanism re-confirmed live (current height): a direct read of the resident page state on a running mainnet fluxnode confirms the central claim in the field — the block index is resident as clean, file-backed page cache (blockindex.arena, Dirty=0, counted in RssFile, kernel-evictable), VmSwap 0; RssAnon oscillates within a bounded band and fully reclaims after each per-block allocation spike, with no monotonic growth across the soak.
  • Wallet rescan over the bounded index (mainnet fluxnode on this head): a watch-only address was rescanned over a bounded window and its getreceivedbyaddress cross-checked to the satoshi against the independent address index (getaddressdeltas, coinbase-adjusted). This exercises the bounded-index historical read path — per-block CBlockIndex lookup + ReadBlockFromDisk + the GetFullBlockHeader nSolution disk fallback; RssAnon stayed bounded, VmSwap 0.
  • Crash recovery under a real kill mid-rescan: the daemon was hard-killed (SIGKILL) twice with the wallet mid-write; both restarts recovered cleanly — chainstate behind the block files, fluxnode-cache UNDO PREPARE to a consistent height then replay to tip, no reindex and no corruption.
  • Stress suite (2026-06-12, qa/stress/): restart storm (13/13 boots clean, "no recovery needed", post-settle RssAnon 318–449 MB); memory-pressure torture (500 MB MemoryHigh + MemorySwapMax=0, VmSwap 0 in every sample, arena pages evicting/faulting as designed); header-serving storm (exposed multi-second cs_main holds in getheaders serving → fixed in this branch: disk rehydration now runs outside cs_main, RPC stays at idle latency, 7–13 ms, while serving 40 k cold headers); rehydration churn and a 12-kill crash matrix on regtest; and mixed fork-choice at the VRF boundary — all green.
  • Fresh-peer IBD served only by patched nodes: a brand-new node synced genesis→tip (progress=1.0) into a clean datadir sourcing only from two patched nodes, with no OOM and no crash and dbcache bounded (~89 MiB resident at tip); random-height block-hash cross-checks against a serving node matched throughout, and the serving nodes held at tip with bounded RssAnon.
  • Also carries one test-only fix unrelated to the memory work: the CachedWitnessesCleanIndex case in test_wallet.cpp decremented note witnesses at a hardcoded height that could coincide with the chain tip under Flux's MAX_REORG_LENGTH=40; the decrement point is now derived from WITNESS_CACHE_SIZE so it stays below the tip. No production witness code is touched.

Remaining gate

The fleet soak, stress suite, fresh-peer IBD, and wallet-rescan validations above are complete. The only remaining gate is the #284 merge dependency (shared protocol bump 170022) — this PR must land after it.

Reviewer guidance — scrutinize

  • CDiskBlockIndex::SerializationOp field order vs. #284 (must be byte-identical, including the VRF gating).
  • Every pHeaderData-> access: guarded, rehydrated, or pre-prune (see EnsureHeaderDataFromBlock/FromDisk, GetFullBlockHeader).
  • The fFluxnode gating points (arena create, both prune sites).
  • CBlockIndexPool lifetime: Contains()-based pool-vs-heap discrimination in cleanup paths; chunk-map failure → heap fallback; O_TRUNC/unlink cleanup; mmap-window offsets stay valid as the file grows.
  • ComparePonForkChoice reads only resident fields (nodesVrfOutput, hashPON).

Deployment notes

  • Linux runtime dependency: libjemalloc2 (ldconfig -p | grep jemalloc); the deb should declare it. Without it fluxd runs but RssAnon savings shrink (glibc retains freed pages).
  • Pair with dbcache=200 (bounds the anonymous UTXO cache, the dominant remaining RssAnon; a per-tier tunable — lower for Cumulus/Nimbus).
  • Exclude blockindex.arena from datadir backups/snapshots — a visible scratch file regenerated from leveldb each run.
  • Rollback = restore the previous binary; no on-disk migration in either direction.

🤖 Generated with Claude Code

blondfrogs and others added 30 commits June 8, 2026 10:49
Vendors the ECVRF-SECP256K1-SHA256-TAI (CFRG VRF draft-05, suite 0xFE) module
from aergo/secp256k1-vrf (MIT) into the bundled libsecp256k1 as an optional
module (--enable-module-vrf, enabled in the root build), and adds src/crypto/
ecvrf.{h,cpp} as the C++ boundary (ECVRF_Prove/ECVRF_Verify over CKey/CPubKey).

This is the cryptographic primitive for the PON VRF leader-election fix that
closes the leader-election grinding vulnerability: block eligibility becomes
y = VRF(operator_sk, epoch_seed) <= target, which the proposer cannot grind.

Verified: builds under Flux's exact secp256k1 flags (--with-bignum=no) and
reproduces the published draft-05 test vector byte-for-byte (prove/verify/
proof_to_hash); cross-checked against Witnet vrf-rs and an independent Python
reference. Constant-time audit of secret paths still pending before activation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replaces the grindable PON eligibility lottery (GetPONHash over the proposer-
chosen prevBlockHash) with VRF-based eligibility, gated by UPGRADE_PON_VRF:

    eligible(C)  <=>  y <= target,  y = VRF(operator_key, epoch_seed)

y is unforgeable (operator secret key) and seeded by a buried block window the
proposer did not author (GetEpochSeed), so a producer can no longer shape its
own block to win the next lottery. Builds on the ECVRF primitive + ecvrf C++
boundary added in the previous commit.

- block.h: PON_VRF_VERSION=101; nodesVrfOutput + nodesVrfProof header fields,
  committed under SER_GETHASH (covered by the operator signature).
- consensus/params.h, upgrades.cpp, chainparams.cpp: UPGRADE_PON_VRF
  (NO_ACTIVATION_HEIGHT on all networks for now).
- pon-fork: IsPONVRFActive().
- pon.cpp: GetEpochSeed (buried-window accumulator); VRF eligibility in
  CheckPONBlockHeader; proof verification (recomputed beta == nodesVrfOutput)
  in ContextualCheckPONBlockHeader.
- pon-minter.cpp: compute the VRF proof with the operator key; coordinate via a
  self-computable priority (lower y => shorter delay) since other nodes' VRF
  outputs are unknowable; set the header fields before signing.

Pre-activation blocks use the legacy GetPONHash path unchanged. fluxd builds.
NOT yet exercised on a regtest/testnet fork; coordination/liveness and the
constant-time audit are pending (see pon-vrf/REVIEW.md).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Makes competing same-height VRF blocks resolve deterministically so the network
converges instead of forking back and forth:

- CBlockIndexWorkComparator (main.cpp): for PON_VRF blocks at equal work/height,
  break ties by lowest nodesVrfOutput. The VRF output is un-grindable, so unlike
  GetPONHash (depends on proposer-chosen nTime, grindable to win ties) an attacker
  cannot bias which competitor wins. Legacy PON blocks keep the GetPONHash
  tie-break (mixed-version forks around activation).
- block.h: commit only the VRF output to the block hash; exclude the proof (like
  the signature) — the proof is self-validating against the committed output.
- chain.h / txdb.cpp: store nodesVrfOutput in CBlockIndex + CDiskBlockIndex so the
  comparator can read it and GetBlockHash() recomputes correctly across restarts.

The minting-delay coordination (previous commit) is now only orphan reduction;
convergence/safety rests on this deterministic comparator.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gence

Extracts the PON fork-choice tie-break from the anonymous-namespace comparator in
main.cpp into a public, testable function ComparePonForkChoice (pon.cpp). The
comparator now delegates to it, so the tests exercise the real deployed logic.

Adds gtests (test_pon.cpp) verifying the convergence guarantee:
- lowest VRF output is preferred (deterministic winner among competitors),
- antisymmetric (swap args -> sign flips: all nodes agree),
- deterministic (same inputs -> same result),
- equal outputs -> undecided (fall back to first-seen).

All 22 PONTest cases pass (flux-gtest).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds gtests (test_pon.cpp) exercising the VRF block lifecycle with unbypassed crypto:
- VrfBlockHeaderSerializationRoundTrip: PON_VRF header serializes/deserializes intact
  and the hash is stable.
- VrfOutputCommittedProofExcludedFromHash: changing the proof does not change the block
  hash (excluded, like the signature) while changing the VRF output does (committed) —
  pins the design that lets CBlockIndex store only the 32-byte output.
- EcvrfProveVerifyRoundTrip: real ECVRF_Prove -> ECVRF_Verify round trip (the same crypto
  ContextualCheckPONBlockHeader runs); tampered proof, wrong key, and wrong seed are all
  rejected; proving is deterministic (RFC 6979).

25 PONTest cases pass (flux-gtest).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…roduced

Live regtest testing revealed CreateNewBlock assembled a v100 PON block and ran
TestBlockValidity on it BEFORE the minter/generate set the VRF fields — so once
PON-VRF is active, block production failed with 'bad-pon-...' (version below
PON_VRF_VERSION). The header build + validity check were producing/validating a
block that could never pass the VRF eligibility rules.

Fix: in CreateNewBlock, when PON-VRF is active, set nVersion = PON_VRF_VERSION and
compute nodesVrfOutput/nodesVrfProof (via the operator key, or a deterministic
placeholder when no key is configured, e.g. regtest generate) BEFORE
TestBlockValidity. The minter and the regtest generate RPC now rely on this single
authoritative path (generate's redundant post-assembly block removed).

Verified on regtest: 'generate' past the PON-VRF activation height produces v101
blocks that pass validation (v100 before activation, v101 after).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…support

Three consensus/relay fixes (found via live testnet block production), all the same
root cause: the committed VRF output must be handled everywhere a block header is
built, hashed, validated, or relayed.

1. Per-slot VRF eligibility (pon.{h,cpp}, pon-minter.cpp, miner.cpp): the VRF input
   is now H(epoch_seed || slot) (GetPonVrfMessage) instead of just epoch_seed. Without
   the slot, a node's eligibility was constant for an entire epoch (eligible every slot
   or none) — no leader rotation, broken liveness. The slot carries only the minor,
   already-acknowledged ~10-slot future-time grind; the large prevBlockHash/coinbase
   grind remains eliminated. Minter and ContextualCheckPONBlockHeader use it consistently.

2. CheckBlockHeader (main.cpp): for PON-VRF blocks, check the committed VRF output
   (nodesVrfOutput) against target, not the legacy GetPONHash. The legacy value is
   meaningless for VRF blocks and rejected ~half of valid v101 blocks as 'high-hash'.

3. CCompactBlockHeader (block.h): serialize the VRF output/proof for PON-VRF blocks.
   It was omitted, so a peer decoded a v101 compact header with a null VRF output,
   recomputed the wrong block hash, and rejected the chain as 'non-continuous
   cmpheaders sequence' — breaking header sync between nodes.

Verified on a local testnet: a confirmed fluxnode mints v101 VRF blocks with clean
production (0 high-hash) and a second node syncs the VRF chain (0 non-continuous).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Same root cause as the CheckBlockHeader fix: ReadBlockFromDisk re-validated every PON
block against the legacy GetPONHash, so ~half of v101 blocks failed on disk-read with
'Errors in block header' — crashing the node shortly after it minted a VRF block. For
PON-VRF blocks, check the committed nodesVrfOutput against target instead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sixth and final instance of the same root cause: the on-disk block-index verification
(LoadBlockIndex) re-checked every PON block against the legacy GetPONHash, so ~half of
stored v101 blocks failed on startup with 'Error loading block database', preventing a
node from restarting once it had synced/minted VRF blocks. Use the committed
nodesVrfOutput for PON-VRF blocks. All header-eligibility check sites now agree:
CheckPONBlockHeader, CheckBlockHeader, ReadBlockFromDisk, LoadBlockIndex.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sets the testnet PON_VRF upgrade to a placeholder height (9999999) so the activation
switch is staged in one obvious place. This is NOT a real schedule.

ACTION REQUIRED before tagging a testnet release:
  - Replace 9999999 with a concrete testnet height comfortably above the current tip,
    giving the fleet time to upgrade first.
Mainnet and regtest remain NO_ACTIVATION_HEIGHT (inert) and are unchanged here.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PON-VRF changes the wire serialization: v101 block headers carry the VRF output
(committed) + proof, and the cmpheaders compact-header format carries them too. That
must be a distinct protocol version so VRF-capable nodes are distinguishable from
prior 170021 (compact-headers) nodes and can be gated at activation.

- PROTOCOL_VERSION: 170021 -> 170022 (VRF-capable nodes advertise this)
- UPGRADE_PON_VRF.nProtocolVersion: 170020 -> 170022 (all networks) so peers below
  170022 are rejected once PON_VRF activates, guaranteeing all connected peers speak
  the VRF wire format. UPGRADE_PON stays 170020 (unchanged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…under shared operator keys

The VRF message was H(epoch_seed || slot), keyed by the operator key — but
operator keys are shared across an owner's fleet in practice (review finding).
With the key alone, N same-key nodes compute the identical VRF output, which:

1. Collapses N lottery draws into one, shrinking the fleet's share of block
   production N-fold. Minting pays the dev fund (not the minter), so this is
   a leadership/liveness distortion — block production silently concentrates
   in uniquely-keyed operators — rather than lost operator revenue.
2. On a win, makes all N nodes eligible with the same VRF-derived priority
   delay, so they broadcast competing blocks simultaneously (broadcast storm).
3. Voids the lowest-VRF fork-choice tie-break — the outputs are identical, so
   convergence degrades to first-seen on every such win.

The message is now H(epoch_seed || slot || collateral). The collateral outpoint
is the canonical per-node identity and is already committed in the header
(nodesCollateral) and already used by the verifier to look up the operator
pubkey, so verification needs no new wire data. The outpoint is fixed at node
registration — before any future epoch seed exists — so it adds no grinding
surface beyond the known key-grinding residual.

Adds gtest VrfMessagePerNodeUnderSharedOperatorKey: distinct collaterals yield
distinct messages and independent verifiable outputs under one shared key, and
a proof for node A does not verify as node B. 26 PONTest cases pass.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
LogPONEligibility predicts per-node eligibility with the legacy GetPONHash
formula, which is dead once VRF leader election activates — and under VRF
other nodes' eligibility cannot be computed at all (each draw needs that
node's secret key). Anything it printed post-activation would be actively
misleading to operators debugging minting from logs.

Log-only change, gated on the same IsPONVRFActive height check as the
consensus paths: no behavior change before activation on any network. Also
skips a full confirmed-fluxnode-cache iteration per connected tip after
activation. 26 PONTest cases pass.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Reverts 85d252e to make way for the incremental PersistToDisk
approach with full crash recovery.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…odes

Fluxnode cache dirty entries were accumulated for up to 24 hours and
flushed in a single operation holding cs_main. On memory-constrained
CUMULUS nodes the fluxnode maps get swapped out, causing thousands of
page faults during the flush — stalling RPC for minutes and triggering
watchdog kills.

Two changes:

1. Incremental fluxnode persistence: new PersistToDisk() writes dirty
   entries to LevelDB every 10 blocks using a batched write. Only holds
   the fluxnode lock, not cs_main. DumpFluxnodeCache is removed from the
   periodic flush path (kept only for shutdown via fForce).

2. Block index solution pruning: on fluxnodes, clear equihash solutions
   from PoW block index entries after load. Serialization paths (RPC,
   REST, P2P getheaders) fall back to reading from block files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The block index solution pruning needs fFluxnode to be set before
LoadBlockIndexDB runs. Move the assignment from after genesis wait
to before LoadBlockIndex in the init sequence.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move nNonce, nSolution, nodesCollateral, and vchBlockSig out of
CBlockIndex into a separately-allocated HeaderData struct. On fluxnodes,
this allocation is freed for all block index entries after load, saving
~116 bytes per entry across 2.5M blocks (~290 MB).

hashMerkleRoot and hashFinalSaplingRoot remain in the core struct as
they are accessed during consensus validation and reorgs.

All serialization paths (RPC, REST, P2P getheaders) fall back to
reading from block files when HeaderData is null.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move hashSproutAnchor, hashFinalSproutRoot, nSproutValue,
nChainSproutValue, nSaplingValue, nChainSaplingValue, and
nCachedBranchId into CBlockIndex::HeaderData alongside the proof
fields. On fluxnodes, HeaderData is freed for blocks deeper than 100,
saving ~244 bytes per entry across 2.5M blocks (~610 MB).

The 100-block retention depth provides margin beyond the max reorg
depth of 40 blocks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Missed access sites for hashFinalSproutRoot, nChainSproutValue,
nSproutValue, nChainSaplingValue, nSaplingValue in getblock and
getblockchaininfo RPCs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The pruning was inside LoadBlockIndexDB which runs before block
rewinding. Rewinding accesses HeaderData fields on blocks that had
already been pruned, causing a crash loop.

Move to init.cpp after ActivateBestChain completes so all chain
operations are done before we free the data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jemalloc is significantly better than glibc malloc at returning freed
memory to the OS and avoiding fragmentation. This is critical for the
HeaderData split where 2.5M allocations are freed after load — glibc
retains the freed pages in its arena, jemalloc returns them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These fields are only accessed during reorgs (DisconnectBlock) and
wallet merkle verification — both limited to recent blocks where
HeaderData is retained. Moving them saves 64 bytes per entry across
2.5M blocks (~160 MB) on fluxnodes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Data split

Replace individual new/delete of CBlockIndex with a contiguous mmap-backed
pool allocator. Benefits:
- Zero per-element malloc overhead across 2.5M+ entries
- Contiguous layout: old blocks at low addresses, recent at high
- MADV_COLD hint tells the kernel which pages are cold after sync
- OS manages working set: pages out old blocks under memory pressure
- No heap fragmentation from millions of small allocations

This reverts the HeaderData struct split (commits 1798652..7ed2691) which
added null checks on every access site and caused build/crash issues.
The mmap pool achieves the same goal (OS-managed memory for old blocks)
without changing the CBlockIndex field layout.

Also:
- Reserve mapBlockIndex capacity before loading to avoid rehashes
- Store block hashes in parallel pool array instead of relying on
  pointer-into-map-key (safer, pool-controlled memory)
- Graceful fallback to heap allocation if mmap fails
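
A rough sketch of the idea, not the pool in this branch: MmapPool,
Entry, NewEntry and DestroyEntry are hypothetical names, the pool is a
single anonymous mapping with a bump pointer, and MADV_COLD is only
used where the platform headers define it (Linux 5.4+).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <sys/mman.h>

// Hypothetical bump allocator over one anonymous mapping (POSIX only).
class MmapPool {
public:
    explicit MmapPool(std::size_t bytes) : size_(bytes) {
        void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        base_ = (p == MAP_FAILED) ? nullptr : static_cast<char*>(p);
    }
    ~MmapPool() { if (base_) munmap(base_, size_); }
    MmapPool(const MmapPool&) = delete;
    MmapPool& operator=(const MmapPool&) = delete;

    // Returns pool memory, or nullptr when the mapping failed or is full,
    // so the caller can fall back to the heap.
    void* Allocate(std::size_t n, std::size_t align) {
        assert(align != 0);
        if (!base_) return nullptr;
        const std::size_t start = (used_ + align - 1) / align * align;
        if (start > size_ || n > size_ - start) return nullptr;
        used_ = start + n;
        return base_ + start;
    }

    bool Contains(const void* p) const {
        const auto addr = reinterpret_cast<std::uintptr_t>(p);
        const auto lo = reinterpret_cast<std::uintptr_t>(base_);
        return base_ != nullptr && addr >= lo && addr < lo + size_;
    }

    // Advise the kernel that the used region is cold. Where MADV_COLD is
    // not defined this is a no-op that reports false.
    bool MarkCold() {
#ifdef MADV_COLD
        return base_ != nullptr && madvise(base_, used_, MADV_COLD) == 0;
#else
        return false;
#endif
    }

private:
    char* base_ = nullptr;
    std::size_t size_ = 0;
    std::size_t used_ = 0;
};

struct Entry {
    int nHeight;
    explicit Entry(int h) : nHeight(h) {}
};

Entry* NewEntry(MmapPool& pool, int height) {
    if (void* mem = pool.Allocate(sizeof(Entry), alignof(Entry)))
        return new (mem) Entry(height);  // placement new into the pool
    return new Entry(height);            // heap fallback
}

void DestroyEntry(MmapPool& pool, Entry* e) {
    if (pool.Contains(e)) e->~Entry();   // pool slot: run the destructor only
    else delete e;                       // heap fallback: ordinary delete
}
```

Pool-backed objects must never be passed to delete; Contains() is what
lets teardown tell the two kinds of entry apart.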

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The git checkout reverted chain.h but main.cpp retained pHeaderData
references from the earlier HeaderData commits. Restore direct field
access on CBlockIndex.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Combines the mmap pool allocator with the HeaderData struct separation.
The pool provides contiguous allocation with OS-managed paging for old
blocks. HeaderData provides deterministic memory reduction by freeing
the extended fields for buried blocks on fluxnodes.

Together: the pool eliminates malloc overhead and fragmentation, jemalloc
returns freed HeaderData pages to the OS, and MADV_COLD hints the kernel
about cold pool pages. CBlockIndex shrinks from 424 to ~112 bytes for
pruned entries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use AC_CHECK_LIB to probe for jemalloc at configure time. Enabled by
default, falls back gracefully with a warning if not found. Follows
the same pattern as the existing Proton and ZMQ dependencies.

Build with --disable-jemalloc to explicitly skip it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…very

CMainCleanup's destructor called delete on CBlockIndex pointers that
were placement-new'd into mmap pool memory, not heap-allocated. This
caused SEGV/SIGABRT in jemalloc/glibc during process exit. Fix by
using DestroyAll to run destructors on pool entries, then freeing the
pool itself.

Add RecoverFluxnodeCache startup check: compares the FluxnodeSyncState
marker (written by PersistToDisk) to the chain tip. On mismatch (unclean
shutdown, power cut, OOM kill), disconnects the stale blocks and lets
ActivateBestChain reconnect them through normal ConnectBlock, rebuilding
the fluxnode cache correctly. Cost: a few seconds to replay ≤10 blocks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the sync state block is on a different fork (crash happened mid-reorg),
walk back from the sync state block through pprev to find the common
ancestor with the active chain, then disconnect to there. ActivateBestChain
reconnects everything through normal ConnectBlock, rebuilding the cache.
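
The pprev walk can be sketched as below. Node and FindCommonAncestor
are hypothetical names; the real code may test membership in the active
chain directly rather than walking both branches.

```cpp
#include <cassert>

// Hypothetical minimal index node: a height and a parent link.
struct Node {
    int nHeight;
    const Node* pprev;
};

// Walk both branches back through pprev until they meet.
// Returns nullptr when the two blocks share no ancestor.
const Node* FindCommonAncestor(const Node* a, const Node* b) {
    while (a && b && a != b) {
        if (a->nHeight > b->nHeight) {
            a = a->pprev;            // a is deeper in its branch: step a
        } else if (b->nHeight > a->nHeight) {
            b = b->pprev;            // b is deeper in its branch: step b
        } else {
            a = a->pprev;            // same height, different blocks:
            b = b->pprev;            // step both
        }
    }
    return (a == b) ? a : nullptr;
}
```

When the marker sits on the active chain the result is the marker
itself; on a fork it is the block where the two branches diverged.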

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MorningLightMountain713 and others added 25 commits June 12, 2026 07:47
…active chain

Fixes M1 (HIGH) from the memory-branch adversarial review; implements
the design validated 2026-06-10 (MEM_REVIEW_FINDINGS.md §M1).

The periodic write path commits the fluxnode DB and its sync marker at
the in-memory tip every 15-20 minutes, while the coins DB flushes ~24h
apart. A crash in that window restarts with chainActive BEHIND the
marker. Old recovery treated marker-ahead as success (common ancestor
== tip -> nothing to disconnect -> return true), so ActivateBestChain
replayed blocks onto a cache that already contained their effects:
UPDATE_CONFIRM undo records rebuilt from the already-updated cache
overwrote the correct on-disk records (permanent corruption), starts
were re-inserted into the tracker, and the confirm path reset
nLastPaidHeight (payout-order divergence). The fork case had the same
hole: fork-side effects were never rewound.

Recovery now:
- factors the fluxnode portion of DisconnectBlock into
  DisconnectFluxnodeOnly (undo-record read -> AddBackUndoData ->
  CheckForUndoExpiredStartTx -> reverse-tx loop -> delegate push, in
  exactly that order; no coins/chainstate access);
- walks the MARKER's chain from the marker down to the common ancestor
  with the active chain, undoing each block's fluxnode effects into a
  fresh local cache flushed per block (per-block Flush is required:
  setAddToConfirmHeight semantics and AddBackUndoData's already-in-
  local guard both assume it, mirroring DisconnectTip);
- runs the existing chainstate disconnect loop unchanged (verified
  convergent on a cache that never applied those blocks);
- persists the repaired state with PersistToDisk(tip, fForce=true)
  (depends on the M7 fix) and clears the RPC list cache;
- hard-fails (was: return true with a corrupt cache) on missing common
  ancestor, unreadable block, or a rewind deeper than the 5040-block
  undo retention window (matching M12), telling the operator to
  -reindex. init now aborts BEFORE ActivateBestChain on recovery
  failure - previously the error string was only checked afterwards,
  by which point ABC had already replayed onto the corrupt cache and
  overwritten good undo records.

Crash mid-recovery is safe: phase 1 mutates only the in-memory cache
(disk marker unchanged until the final atomic batch), so recovery
re-runs identically on the next start.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Fixes M3 from the memory-branch adversarial review
(MEM_REVIEW_FINDINGS.md): GetDepthInMainChainINTERNAL compared the
tx's merkle branch against pHeaderData->hashMerkleRoot, substituting
uint256() when the header data had been pruned. CheckMerkleBranch never
returns zero for a real tx (nIndex == -1 already early-returns), so the
check always failed for txs buried in pruned-header blocks: depth 0,
and GetDepthInMainChain turned that into -1 — every confirmed wallet tx
reported as CONFLICTED. fMerkleVerified is memory-only and
vMerkleBranch is serialized, so this re-fired on every restart of a
fluxnode with a non-empty wallet (the wallet is not disabled by
-zelnode, only by -prune).

Uses the same read-from-disk fallback as the rescan path
(wallet.cpp ChainTipAdded caller); the read happens once per wtx per
session (fMerkleVerified caches the result). A failed disk read keeps
depth 0 rather than verifying against garbage.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…reads

Fixes M4 (HIGH) from the memory-branch adversarial review
(MEM_REVIEW_FINDINGS.md): serving getheaders (160/message) or
cmpheaders (2000/message) from a fluxnode's pruned block index called
GetFullBlockHeader per entry, which fell back to ReadBlockFromDisk —
deserializing EVERY transaction in each block and re-running the
Equihash/PoW or PON proof check — all under cs_main. A single syncing
peer could pin cs_main for seconds per message and turn a tiny request
into tens-to-hundreds of MB of disk reads (cheap remote amplifier).
The compact path was reading full blocks to recover an nSolution that
CCompactBlockHeader then omits.

A CBlock on disk serializes its CBlockHeader base first, so the new
ReadBlockHeaderFromDisk deserializes only the header prefix at
nDataPos: no transaction parsing, no proof recheck (the block was fully
validated at accept; the reconstructed hash is still verified against
the index entry, the same integrity check the full read performed).

GetFullBlockHeader and EnsureHeaderDataFromDisk now use it, which also
removes the full-block read from DisconnectBlock's pruned-pprev anchor
restore and the M2 flush-side guard.
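
A toy illustration of the prefix read, assuming a fixed-size header and
a stand-in checksum: ToyHeader, ToyHash and ReadHeaderPrefix are
hypothetical, and the real header is variable-length and verified with
the real block hash.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

// Toy fixed-size header; the real header carries more fields.
struct ToyHeader {
    uint32_t nVersion;
    uint32_t nNonce;
};

// Stand-in for the block hash: FNV-1a over the header bytes.
uint64_t ToyHash(const ToyHeader& h) {
    unsigned char buf[sizeof h];
    std::memcpy(buf, &h, sizeof h);
    uint64_t x = 14695981039346656037ULL;
    for (unsigned char c : buf) {
        x ^= c;
        x *= 1099511628211ULL;
    }
    return x;
}

// Read only the header prefix stored at dataPos and verify it against the
// hash recorded in the index. Transaction bytes that follow are never read.
bool ReadHeaderPrefix(std::istream& file, std::streamoff dataPos,
                      uint64_t expectedHash, ToyHeader& out) {
    file.clear();
    file.seekg(dataPos);
    ToyHeader h;
    if (!file.read(reinterpret_cast<char*>(&h), sizeof h)) return false;
    if (ToyHash(h) != expectedHash) return false;  // integrity check
    out = h;
    return true;
}
```

The cost is one seek and one small read per header, independent of how
many transactions the block contains.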

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ives pruning

Fixes M5 from the memory-branch adversarial review
(MEM_REVIEW_FINDINGS.md): CBlockIndexWorkComparator computed
GetPONHash(GetBlockHeader()) per comparison. For entries whose header
data was pruned, GetBlockHeader() returns a zeroed nodesCollateral —
the distinguishing PON input — so after a restart (load prunes ALL
entries before they are inserted into setBlockIndexCandidates),
equal-work PON ties could resolve opposite the rest of the network.
Worse, ConnectBlock/DisconnectBlock restore header data on entries that
may still sit in the candidate set: an in-place comparator-key change
violates strict weak ordering and lets erase-by-key silently fail when
an equal-chainwork same-height sibling coexists.

The PON hash (32 bytes) is now a resident memory-only CBlockIndex
member, computed where the header is guaranteed complete: at index load
(it was already being computed there for the proof check, before the
prune) and at entry creation (AddToBlockIndex, cmpheaders). The
comparator reads the cached value, so its key never depends on
pHeaderData and never mutates. No disk-format change: CDiskBlockIndex
serialization is untouched (round-trip gtest unchanged and green).

Note: trial/mem-on-pr284 is immune (resident nodesVrfOutput); this fix
is specific to this branch's HeaderData split.
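
Why the cached key matters can be shown with a small ordered set. This
is a sketch with hypothetical types: cachedTieBreak stands in for the
cached PON hash, and the direction of the tie-break is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <set>

struct Entry {
    uint64_t nChainWork;
    uint64_t cachedTieBreak;           // resident, set once at creation or load
    std::unique_ptr<int> pHeaderData;  // prunable; the comparator never reads it
};

// Orders by work, then by the cached tie-break, then by address.
// Every input is immutable while the entry sits in the set.
struct WorkComparator {
    bool operator()(const Entry* a, const Entry* b) const {
        if (a->nChainWork != b->nChainWork) return a->nChainWork < b->nChainWork;
        if (a->cachedTieBreak != b->cachedTieBreak)
            return a->cachedTieBreak < b->cachedTieBreak;
        return std::less<const Entry*>{}(a, b);
    }
};
```

Pruning or restoring pHeaderData on an entry already in the set leaves
its position valid, so erase-by-key keeps working.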

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…op stale arena warning

Fixes M8 and M9 from the memory-branch adversarial review
(MEM_REVIEW_FINDINGS.md).

M9: the pruned-header disk fallback was hand-rolled in five places with
already-divergent behavior (rest.cpp best-effort, getblockheader
hex-mode throwing only on one branch, blockheaderToJSON field-by-field
with up to two full-block reads per header, wallet assert-style).
GetFullBlockHeader is now exported via main.h with one deliberate
contract: it fills the header (re-reading the header prefix from disk
when pruned or when nSolution was omitted) and returns false on a
failed read, leaving the partial in-memory view in place. P2P/REST
serving stays best-effort; getblockheader hex mode throws
RPC_INTERNAL_ERROR on any failed read; blockheaderToJSON emits empty
strings (as before); the wallet sites fail their respective checks.
All callers now benefit from the M4 header-prefix read instead of
full-block deserialization.

M8: delete the "arena over 90% full, bump POOL_CAPACITY" warning. It
fired once on every mainnet fluxnode at ~90% of the FIRST 128MiB chunk
and named a constant that no longer exists — the segmented arena grows
on demand. Real exhaustion still warns via the heap-fallback path.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Exercises RecoverFluxnodeCache end-to-end by rewriting the sync marker
directly in determ_zelnodes (plyvel) between restarts:

- clean restart skips recovery
- stale marker is repaired exactly once (M7: the forced persist must
  write the marker even with a clean cache — asserted by reading the
  marker back from leveldb) and the next restart skips recovery
- marker behind the active tip triggers the chainstate disconnect and
  the node replays back to the same tip
- marker on a stale fork triggers the fluxnode-only rewind along the
  marker's chain (M1 phase 1) plus the chainstate disconnect, and the
  node converges to the best tip with the marker re-anchored there

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… completion fails

Fixes a regression introduced by the M9 consolidation, caught by the
per-fix gtest run (five wallet tests newly failing: FindUnspentSproutNotes,
SproutNullifierIsSpent, SaplingNullifierIsSpent,
NavigateFromSaplingNullifierToNote, SpentSaplingNoteIsFromMe).

The consolidated GetFullBlockHeader returned false whenever any disk
read failed — including the nSolution-completion read for a RESIDENT
POW header with an empty solution. Callers that only need the header
fields (the wallet merkle-depth check, the rescan sapling-anchor
lookup) then treated a perfectly valid resident header as missing.
Before M9 those sites read the resident fields directly and never
touched disk. The gtest wallet fixtures (in-memory blocks, empty
nSolution, no block files) hit exactly this.

GetFullBlockHeader now reports failure only when the header FIELDS are
unavailable (pruned entry and the disk read failed); a resident header
whose omitted nSolution could not be completed is returned as success
with the solution left empty — matching the pre-M9 per-field behavior
of every call site. The two wallet sites go back to using the resident
fields directly (no disk access at all when resident), falling back to
the cheap ReadBlockHeaderFromDisk only when pruned.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…B snapshots

The first version rewrote the sync marker with plyvel, but writes from a
modern plyvel/leveldb produce a MANIFEST the daemon's older bundled
leveldb silently ignores — fluxd never saw the rewritten marker (reads
are compatible, writes are not; verified by writing ff..ff, reading it
back with plyvel, and watching the daemon still report "sync state
matches chain tip"). The test now uses directory snapshots of
determ_zelnodes taken between restarts (only fluxd's own writes), a
second never-connected node to supply the unknown-marker DB for the
stale-marker scenario, and keeps plyvel strictly read-only for the
marker assertions.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Caught by the new crash-recovery regtest: every startup "recovered"
cleanly even when the on-disk marker pointed somewhere else entirely.

FlushStateToDisk passed fForce=true to PersistToDisk. Now that fForce
is honored, that combination writes the sync marker even when the
fluxnode cache is clean — and FlushStateToDisk also runs during init
BEFORE RecoverFluxnodeCache (RewindBlockIndex ends with an
unconditional FLUSH_STATE_ALWAYS). The forced write overwrote the
on-disk marker with the current in-memory tip, destroying the
marker/chain divergence that tells recovery a crash happened and
leaving the stale fluxnode DB state in place unrepaired.

The flush path now persists unforced: dirty data still goes out with
the marker in the same atomic batch, and a clean cache leaves the
marker untouched. The forced write remains where it is the point —
recovery's stale-marker repair and post-rewind persist, and the manual
RPC flush.

Also rewords test comments to drop review-shorthand and historical
phrasing.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Follow-up to the previous commit: leaving the flush path permanently
unforced meant that on a chain with no fluxnode transactions (regtest,
quiet chains) the cache is never dirty, so the sync marker would never
be written at all — recovery then has nothing to verify and the
marker-tracking guarantees degrade to nothing.

A new fFluxnodeCacheRecovered flag is set right after
RecoverFluxnodeCache succeeds at init. Flushes before that point
persist unforced (they must not overwrite the marker the recovery is
about to read); flushes after it force the marker write, keeping the
marker at the tip even when no fluxnode data is dirty.
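
The gating can be sketched as follows. MarkerDb, PersistSketch and
FlushSketch are hypothetical; the real PersistToDisk writes the marker
in an atomic batch together with the dirty data.

```cpp
#include <cassert>

struct MarkerDb {
    int markerHeight = -1;  // -1: no marker written yet
};

// Hypothetical persist step: dirty data always goes out with the marker;
// a clean cache writes the marker only when forced.
bool PersistSketch(MarkerDb& db, int tipHeight, bool cacheDirty, bool fForce) {
    if (!cacheDirty && !fForce) return false;
    db.markerHeight = tipHeight;
    return true;
}

// Flush path: unforced until recovery has read the marker, forced after.
bool FlushSketch(MarkerDb& db, int tipHeight, bool cacheDirty,
                 bool fFluxnodeCacheRecovered) {
    return PersistSketch(db, tipHeight, cacheDirty, fFluxnodeCacheRecovered);
}
```

Before recovery a clean flush leaves the on-disk marker alone; after
recovery the same flush moves it to the tip.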

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…nale

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… plyvel

Merely OPENING a leveldb directory with modern plyvel compacts the
write-ahead log into snappy-compressed table files. The daemon's bundled
leveldb is built without snappy, so on the next start it dies in
AppInit with "corrupted compressed block contents" — printed only to
stderr, which the test framework swallows, leaving -rpcwait hanging
forever. (Diagnosed from the file modes: the poisoned table was 0644,
plyvel's umask, among the daemon's 0600 files.)

Marker assertions now copy the DB directory to a throwaway path and
open the copy.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…rges

Two isolated regtest nodes mine identical block hashes, so node1's
"foreign" tip was actually a block on node0's active chain — recovery
correctly treated the marker as merely behind (disconnect/replay)
instead of taking the stale-marker path the scenario asserts. Running
node1 on -mocktime gives its blocks different timestamps and therefore
hashes node0 has never seen.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… fork

Deterministic regtest mining strikes again: re-mining at the height of
a just-invalidated block reproduces that exact block (same parent, same
coinbase, same timestamp), which the daemon rejects as already-invalid.
Mock the clock forward for chain B's blocks so they hash differently.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ives restart

Blocks marked invalid by invalidateblock do not survive a restart in
the block index: LoadBlockIndexDB skips nCachedBranchId reconstruction
for blocks failing IsValid(BLOCK_VALID_CONSENSUS), so RewindBlockIndex
deems them insufficiently validated and erases them. The fork scenario
then exercised the stale-marker repair path instead of the marker-chain
rewind. A real crash-during-reorg leaves the losing fork's blocks fully
valid, so reconsiderblock the fork tip after mining the heavier chain —
the failure flags clear, no reorg happens (chain B has more work), and
the fork block survives the restart as recovery expects.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ardown restart

node1's blocks carry far-future timestamps from its mocked clock; the
final courtesy restart for framework teardown launched it with the real
clock, so startup verification rejected its own chain ("Corrupted block
database detected") and the framework's -rpcwait hung until timeout.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
ComparePonForkChoice must be a pure function of resident index fields:
legacy PON entries score by the cached hashPON, VRF entries by the
committed nodesVrfOutput. Cover the activation-boundary fork shape
(legacy vs VRF at the same height) and prove a pruned entry
(pHeaderData == nullptr) compares identically to an unpruned one.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Eight deliberate-path stress tests for the fluxnode memory work, per the
PR test plan: P2P header storm walking the full chain with wire-level
field validation, regtest rehydration churn (reorg rehydrate +
flush-side restore), regtest kill -9 crash matrix across persist
windows, and operator runbooks for memory-pressure capping, restart
storms, wallet rescans, and a fresh-peer IBD served exclusively by
patched nodes. Python pieces are python3/stdlib, uv-managed, ruff and
ty clean; regtest harness is self-contained (no python2 qa framework).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
blockindexpool.h unconditionally included sys/mman.h, which the
x86_64-w64-mingw32 target does not provide (CI windows build failure).
The POSIX includes move into the .cpp; on WIN32 the pool compiles as an
inert stub whose Initialize() reports failure, so every caller takes
its existing heap-fallback path and the node behaves like a stock
build. The pool is only ever engaged on fluxnodes, which are Linux.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Verified live: fluxnode mode on regtest needs txindex=1, fluxbenchd /
fluxbench-cli files beside the daemon binary, and UNMANAGED_FLUXBENCHD
to skip launching them — the harness now provides all three. The crash
matrix accepts every legitimate RecoverFluxnodeCache outcome (a kill
before the first persist legitimately boots with no marker). The header
storm handles the version-gated server reply formats: legacy protocol
gets full 'headers' (solutions on the wire), current protocol gets
'cmpheaders' (PoW solution omitted, explicit block hash, PON field
order differs); PON block hashes are recomputed from the GETHASH
serialization and chain continuity is asserted via hashPrevBlock.

Both regtest tests pass on the build server; both header-storm modes
validated against a mainnet fluxnode.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
sys/statvfs.h does not exist on the mingw target. On Windows the check
is moot anyway: the pool's Initialize() reports failure there, so the
node already takes the heap path regardless of free disk.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- ibd_sync: parse verificationprogress numerically (arrives in scientific
  notation early in IBD; the textual match declared success at height 1326);
  pin dedicated rpcport/port so a shared host can't collide
- restart_storm: assert on the LAST RecoverFluxnodeCache line instead of a
  byte-offset log window (rotation/buffering produced a false failure while
  all 10 boots were actually clean)
- memory_pressure: mirror the production unit's identity and environment
  (User/Group, HOME for .zcash-params, UNMANAGED_FLUXBENCHD, MALLOC_ARENA_MAX);
  wait for the datadir's own daemon to release the leveldb lock before
  starting the capped instance (scoped by -datadir, NOT pgrep -x fluxd —
  other instances on the host are not ours); tolerate transient empty
  MainPID reads while the unit is active

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Serving headers for buried (pruned) entries re-reads them from the block
files; under memory pressure a cold 2000-entry compact batch takes whole
seconds, and the handler held cs_main for all of it — starving RPC and
validation (measured: multi-second getblockcount stalls and >20s peer
ping RTTs on a 500 MB MemoryHigh-capped node under a 6-walker header
storm).

The handler now snapshots the resident header fields and the entries
needing disk during a short cs_main hold (pure pointer-chasing), then
performs the rehydration reads lock-free. Index entries are never freed
at runtime and the read path uses only immutable fields. The snapshot is
taken under the lock because reorg paths rehydrate pHeaderData
concurrently; a lock-free reader could observe a half-filled allocation.
A failed read keeps the resident snapshot (best-effort serving,
unchanged). ReadBlockHeaderFromDisk's mismatch log now identifies the
block by its immutable hash for the same reason.
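
The snapshot-then-read shape, as a sketch with hypothetical names:
g_indexLock stands in for cs_main, and Header, Entry, DiskReader and
ServeHeaders are simplified illustrations rather than the real handler.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <optional>
#include <vector>

struct Header { int nHeight = 0; int nNonce = 0; };
struct Entry {
    int nHeight = 0;                      // immutable, always resident
    std::unique_ptr<Header> pHeaderData;  // may be null (pruned)
};

std::mutex g_indexLock;  // stand-in for cs_main

using DiskReader = std::function<std::optional<Header>(int height)>;

std::vector<Header> ServeHeaders(const std::vector<const Entry*>& range,
                                 const DiskReader& readFromDisk) {
    assert(static_cast<bool>(readFromDisk));
    struct Snap { Header view; bool needsDisk; };
    std::vector<Snap> snaps;
    snaps.reserve(range.size());
    {
        // Short critical section: copy resident data, note what is missing.
        std::lock_guard<std::mutex> lock(g_indexLock);
        for (const Entry* e : range) {
            Snap s;
            s.view.nHeight = e->nHeight;
            s.needsDisk = (e->pHeaderData == nullptr);
            if (!s.needsDisk) s.view = *e->pHeaderData;
            snaps.push_back(s);
        }
    }
    // Slow part: runs without the lock and touches only the copies.
    std::vector<Header> out;
    out.reserve(snaps.size());
    for (Snap& s : snaps) {
        if (s.needsDisk) {
            if (std::optional<Header> h = readFromDisk(s.view.nHeight)) s.view = *h;
            // On failure the partial resident view is served (best effort).
        }
        out.push_back(s.view);
    }
    return out;
}
```

The lock is held only while copying, so a cold batch of disk reads no
longer blocks other threads waiting on it.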

ProcessGetData has the same shape with full-block reads (worse) — left
for a follow-up: its loop interleaves tx/filter serving that genuinely
needs the lock.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…eader data

The arena and the HeaderData prune only covered the load-time path: blocks
accepted at runtime took plain `new CBlockIndex` (pinned heap) in
AddToBlockIndex and the compact-header handler, and their rebuildable header
data was never freed after init -- so RssAnon drifted up ~0.5-0.7 MB/day with
chain growth (the deferred M10 finding).

Allocate runtime-accepted entries from the file-backed arena (reclaimable
RssFile) via a shared AddBlockIndexFromHeader helper that mirrors
InsertBlockIndex's arena/heap split, and free header data once an active-chain
entry ages past a hot window (PruneAgedHeaderData, hooked after UpdateTip in
ConnectTip), re-read from disk on demand exactly like the load-time prune --
fluxnode-gated and only for entries whose block data is on disk (header-only
entries cannot be rebuilt and must stay resident). Steady-state runtime RssAnon
growth is now bounded instead of climbing with the chain.

Adds a gtest for the prune decision (ShouldFreeAgedHeaderData) covering the
window boundary and the fluxnode / on-disk / resident-data guards.
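
The prune decision reduces to a pure predicate. The sketch below uses
hypothetical names and assumes a strict boundary (an entry exactly
hotWindow blocks deep is kept); the real ShouldFreeAgedHeaderData may
draw the boundary differently.

```cpp
#include <cassert>

// Hypothetical flattened view of the inputs to the decision.
struct AgedEntry {
    int nHeight;
    bool hasBlockDataOnDisk;  // header can be rebuilt from the block file
    bool hasHeaderData;       // something is resident to free
};

// True when the entry's header data may be freed.
// Assumption: entries strictly deeper than hotWindow are eligible.
bool ShouldFreeSketch(const AgedEntry& e, int tipHeight, int hotWindow,
                      bool isFluxnode) {
    if (!isFluxnode) return false;            // stock nodes keep everything
    if (!e.hasHeaderData) return false;       // already pruned
    if (!e.hasBlockDataOnDisk) return false;  // header-only: cannot rebuild
    return tipHeight - e.nHeight > hotWindow;
}
```

Keeping the decision free of side effects is what makes it testable
without a running node.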

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
DecrementNoteWitnesses only touches a note when its witnessHeight is at or
below the decremented height, so the test's assertion that a reorg leaves the
witness cache and final anchor untouched holds only when the decrement is
strictly below the tip. The test decremented at heights 5 and 50; with
MAX_REORG_LENGTH=40 the chain is WITNESS_CACHE_SIZE + 10 = 51 blocks, so 50 is
the tip and the decrement pops the live witness. Derive the deeper decrement
height from WITNESS_CACHE_SIZE so it stays below the tip regardless of the
reorg-length constant.
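
The arithmetic can be spelled out as compile-time checks. This sketch
assumes WITNESS_CACHE_SIZE is MAX_REORG_LENGTH + 1 (implied by "+ 10 =
51" above) and heights counted from 0; the k-prefixed constants and
WitnessTouched are hypothetical names.

```cpp
#include <cassert>

constexpr int kMaxReorgLength = 40;                     // MAX_REORG_LENGTH
constexpr int kWitnessCacheSize = kMaxReorgLength + 1;  // 41 (assumed relation)
constexpr int kChainLength = kWitnessCacheSize + 10;    // 51 blocks
constexpr int kTipHeight = kChainLength - 1;            // 50, heights from 0

// A note is touched when its witnessHeight is at or below the
// decremented height, as described above.
constexpr bool WitnessTouched(int witnessHeight, int decrementHeight) {
    return witnessHeight <= decrementHeight;
}

static_assert(!WitnessTouched(kTipHeight, 5), "shallow decrement is safe");
static_assert(WitnessTouched(kTipHeight, 50), "decrement at the tip pops the witness");
static_assert(!WitnessTouched(kTipHeight, kTipHeight - 1),
              "any height strictly below the tip is safe");
```

Any decrement height expressed relative to the chain length stays valid
if the reorg-length constant changes again.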

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit f140688)
@MorningLightMountain713 MorningLightMountain713 marked this pull request as ready for review June 17, 2026 10:33
MorningLightMountain713 and others added 4 commits June 17, 2026 12:02
When the arena cannot map a chunk (disk full, unsupported filesystem),
CBlockIndex entries fall back to heap allocation and live outside the pool.
The two shutdown paths (UnloadBlockIndex, CMainCleanup) destructed the
pool's own entries via DestroyAll but never deleted those heap-fallback
entries, leaking them at process exit (and under leak sanitizers). Both
paths now delete the non-Contains() entries before tearing the pool down,
mirroring the runtime erase path. Also corrects a stale arena comment in
the block-index-pool gtest header.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Five issues found reviewing the file-backed block index / fluxnode recovery:

- CBlockIndex copy ctor and operator= dropped nodesVrfOutput. The hand-written
  copy operations (added to deep-copy pHeaderData) omitted this resident field,
  which is committed to the v101 block hash. The disk-write path builds
  CDiskBlockIndex(*pindex) through the copy ctor, so a persisted PON-VRF index
  entry would serialize a zeroed output, fail its hash check on the next load,
  and abort startup once PON-VRF activates. The copy ctor now delegates to
  operator= (one enumerated copy path) and operator= carries nodesVrfOutput.

- rest_headers dereferenced pHeaderData off raw CBlockIndex* after releasing
  cs_main; PruneAgedHeaderData frees that allocation at runtime, so a
  /rest/headers request racing a ConnectTip was a use-after-free on -rest
  fluxnodes. The bulk binary/hex path now snapshots resident headers under
  cs_main and rehydrates pruned entries from disk lock-free (mirroring the
  getheaders handler); the JSON path is built under the lock.

- RecoverFluxnodeCache could read an absent fluxnode undo record back as a
  silently-empty one and under-rewind, because the recovery depth cap and the
  CleanupOldFluxnodeData cutoff anchor to different heights. Recovery now probes
  ExistsBlockUndoFluxnodeData and fails closed to -reindex on a missing record
  (recovery-only flag; the live disconnect path is unchanged).

- getheaders served a zeroed header when the lock-free disk rehydration of a
  pruned entry failed; it now truncates the batch at that entry.

- RewindBlockIndex leaked the prunable HeaderData of arena-backed entries it
  removed (their destructor is skipped because arena slots are not freed
  individually); it now frees HeaderData explicitly.

gtests: nodesVrfOutput survives the CBlockIndex copy and the CDiskBlockIndex
round-trip (the coverage gap that let the first issue through), and
ExistsBlockUndoFluxnodeData distinguishes an absent record from an empty one.
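
The single-copy-path pattern from the first item, sketched on a
hypothetical three-field struct (the real CBlockIndex has many more
members):

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <memory>

struct HeaderData { uint32_t nNonce = 0; };

struct IndexEntry {
    int nHeight = 0;
    std::array<uint8_t, 32> nodesVrfOutput{};  // resident field
    std::unique_ptr<HeaderData> pHeaderData;   // owned, deep-copied

    IndexEntry() = default;

    // The copy constructor delegates, so there is one list of fields to
    // keep in sync instead of two.
    IndexEntry(const IndexEntry& other) { *this = other; }

    IndexEntry& operator=(const IndexEntry& other) {
        if (this == &other) return *this;
        nHeight = other.nHeight;
        nodesVrfOutput = other.nodesVrfOutput;  // the field that was dropped
        if (other.pHeaderData)
            pHeaderData = std::make_unique<HeaderData>(*other.pHeaderData);
        else
            pHeaderData.reset();
        return *this;
    }
};
```

A hand-written copy operation silently skips any member it does not
name, which is why a round-trip test per field is worth having.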

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A second adversarial pass plus the existing recovery regtest surfaced that
three of the previous fixes needed tightening:

- The missing-undo-record fail-closed (recovery) was too strict. It forced a
  full -reindex on any rewound block with no undo record, but a block that
  changed no fluxnode state legitimately writes none. The recovery regtest
  (and pre-fluxnode-activation / low-activity histories) hit exactly that and
  failed to start. Scope the fail-closed to blocks at or below the retention
  cutoff (tip - ONE_WEEK_OF_BLOCK_COUNT), where an absent record could mean
  'pruned'; above it an absent record can only mean 'empty', so tolerate it.
  ONE_WEEK_OF_BLOCK_COUNT moves to fluxnodecachedb.h to be shared with recovery.

- rest_headers' JSON branch still held cs_main across the on-disk header reads
  (up to 2000 cold reads per request), reintroducing the validation-thread
  stall the rest of the change set removes. It now snapshots index pointers +
  resident headers under cs_main, rehydrates pruned entries from disk lock-free,
  and builds the JSON under the lock from the prefetched headers (a new
  blockheaderToJSON overload that skips the disk read). No disk I/O under the lock.

- getheaders no longer sends an empty headers/cmpheaders batch when the first
  served entry is unreadable: an empty batch reads as 'end of chain, stop
  asking' to the peer, dropping this node as a header source. It now sends
  nothing and lets the peer retry elsewhere.

Verified: full gtest 252/252; the fluxnode_cache_recovery regtest passes all
scenarios (it deadlocked on the over-strict fail-closed before this).
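
The scoped fail-closed rule from the first item, as a sketch:
UndoVerdict and ClassifyMissingUndo are hypothetical names, and the
retention length is passed in rather than assumed to equal
ONE_WEEK_OF_BLOCK_COUNT.

```cpp
#include <cassert>

enum class UndoVerdict { Ok, TolerateEmpty, FailReindex };

// Decide what an undo-record lookup means during recovery.
// At or below the cutoff an absent record may have been pruned, so fail
// closed; above it an absent record can only mean the block changed nothing.
UndoVerdict ClassifyMissingUndo(bool recordExists, int blockHeight,
                                int tipHeight, int retentionBlocks) {
    if (recordExists) return UndoVerdict::Ok;
    const int cutoff = tipHeight - retentionBlocks;
    return blockHeight <= cutoff ? UndoVerdict::FailReindex
                                 : UndoVerdict::TolerateEmpty;
}
```

On a short regtest chain the cutoff is negative, so every absent record
is tolerated and the node starts normally.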

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PruneAgedHeaderData only sweeps the active chain, so a block's HeaderData
(rehydrated to disconnect it during a reorg) was retained in RAM indefinitely
once the block fell off the active chain: a leak bounded per reorg but
accumulating over time on fluxnodes. DisconnectTip now frees it when the block leaves the chain;
it is rebuilt from disk on demand and re-restored by ConnectTip if the block
is reconnected.

Surfaced by a second adversarial review pass; pre-existing, not a regression
from the other fixes. Verified: full gtest 252/252 and the fluxnode_cache_recovery
regtest (which exercises reorg disconnect) both pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>