Feat/pon vrf integration by blondfrogs · Pull Request #284 · RunOnFlux/fluxd

blondfrogs · 2026-06-09T18:46:20Z

PON-VRF: VRF leader election (closes the leader-election grinding vulnerability)

Summary

Replaces Proof-of-Node's grindable leader election with a verifiable random function (VRF) election that reuses the existing fluxnode operator key. Leadership is now an un-grindable, verifiable draw, and ties converge deterministically. Gated behind a new UPGRADE_PON_VRF network upgrade, so it ships inert until an activation height is scheduled.

Problem

PON leader eligibility was H(collateral, prevBlockHash, slot) ≤ target. Because that hash depends on block content the proposer controls, a proposer can grind its inputs to bias who is eligible to lead — there is no cooldown and no consensus rank check. This lets a sufficiently resourced node disproportionately influence leader selection.

Solution

- VRF eligibility: y = VRF(operator_key, H(epoch_seed ‖ slot)); eligible iff y ≤ target.
  - epoch_seed is derived from a buried block window (beyond the reorg horizon) that the current proposer could not have authored → not grindable.
  - The slot is mixed in so eligibility is a fresh draw each slot (leader rotation / liveness).
- Reuses the existing secp256k1 operator key, so operators need no new key material.
- The VRF output is committed to the block hash (so it is signed and immutable); the proof is carried but excluded from the hash (self-validating against the output).
- Deterministic fork choice: at equal work/height, the lowest VRF output wins, so the network converges.
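
The eligibility rule "y ≤ target" can be illustrated with a minimal, standalone sketch. It assumes both values are 32-byte big-endian arrays; `Bytes32` and `IsVrfEligible` are hypothetical names for illustration only, and fluxd uses its own integer types rather than this representation.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Assumption for this sketch: a 256-bit value stored big-endian
// (most significant byte first). Not the type fluxd actually uses.
using Bytes32 = std::array<std::uint8_t, 32>;

// Hypothetical helper: eligible iff y <= target as 256-bit integers.
// For equal-length big-endian arrays, lexicographic byte order matches
// numeric order, so "target < y" is a plain lexicographic comparison.
inline bool IsVrfEligible(const Bytes32& y, const Bytes32& target) {
    const bool targetBelowY = std::lexicographical_compare(
        target.begin(), target.end(), y.begin(), y.end());
    return !targetBelowY;
}
```

A lower target makes fewer nodes eligible per slot, which is the lever the testing notes identify as the dominant orphan driver.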

What's in this PR (by area)

Crypto: ECVRF (ECVRF-SECP256K1-SHA256-TAI) via a vendored secp256k1-vrf module + src/crypto/ecvrf.{h,cpp}.
Block format: PON_VRF_VERSION = 101; nodesVrfOutput (committed) + nodesVrfProof (excluded from hash).
Consensus: UPGRADE_PON_VRF network upgrade (branch id 0x76b809bb); per-slot VRF eligibility; contextual VRF-proof verification; lowest-VRF fork-choice tie-break.
Production / validation / relay — every site that builds, hashes, validates, relays, or reads a block header now handles the committed VRF output: CreateNewBlock, CheckBlockHeader, ReadBlockFromDisk, LoadBlockIndex, and the cmpheaders compact-header serialization.
Networking: PROTOCOL_VERSION and UPGRADE_PON_VRF.nProtocolVersion bumped to 170022 so VRF-capable peers are distinguishable and peers below 170022 are rejected once the upgrade activates.

Testing

25 gtests pass — ECVRF prove/verify round-trip, output-committed/proof-excluded serialization, and fork-choice convergence.
Live local testnet validation — confirmed fluxnodes minting v101 VRF blocks accepted under the real (unbypassed) verifier; multi-node header sync and concurrent minting validated up to ~100 nodes; orphan-rate measurements confirm the difficulty target (eligible-nodes-per-slot) is the dominant orphan lever.

Activation & deployment notes

Ships inert: PON_VRF is NO_ACTIVATION_HEIGHT on mainnet and regtest. Testnet is set to a PLACEHOLDER (9999999) — replace with a scheduled testnet height (above the current tip) before tagging.
Protocol-version gating ensures a clean upgrade window: nothing changes until activation, after which sub-170022 peers are dropped.

Follow-ups (not blocking testnet)

Validate on the live testnet that the difficulty retarget keeps eligible-nodes-per-slot small at scale (the orphan driver).
Optional networking optimization: priority-aware "announce-first" relay to further cut orphans at large N.
Constant-time / side-channel audit of the VRF module before mainnet.

Commits (in order)

Crypto foundation

ca735752a — Add ECVRF-secp256k1 module (vendored from aergo) + ecvrf C++ boundary

Consensus core

474a792ff — PON: VRF leader election (consensus) — closes leader-election grinding
d418724fe — PON-VRF: deterministic fork choice via lowest-VRF tie-break

Tests

5c393c205 — PON-VRF: extract ComparePonForkChoice + gtests for fork-choice convergence
51a941b8c — PON-VRF: gtests for block serialization + real ECVRF prove/verify path

Completeness fixes (found via live testnet block production)

382c2e0e3 — populate VRF fields in CreateNewBlock so VRF blocks can be produced
8bcd4f1f9 — per-slot eligibility, CheckBlockHeader + compact-header VRF support
52f49d786 — ReadBlockFromDisk must check the VRF output, not GetPONHash
494486f27 — LoadBlockIndex must check the VRF output, not GetPONHash

Deployment

3792a6e81 — PLACEHOLDER testnet activation height — SET BEFORE TAGGING
73fda1b54 — bump protocol version to 170022 for VRF wire format

Vendors the ECVRF-SECP256K1-SHA256-TAI (CFRG VRF draft-05, suite 0xFE) module from aergo/secp256k1-vrf (MIT) into the bundled libsecp256k1 as an optional module (--enable-module-vrf, enabled in the root build), and adds src/crypto/ecvrf.{h,cpp} as the C++ boundary (ECVRF_Prove/ECVRF_Verify over CKey/CPubKey).

This is the cryptographic primitive for the PON VRF leader-election fix that closes the leader-election grinding vulnerability: block eligibility becomes y = VRF(operator_sk, epoch_seed) <= target, which the proposer cannot grind.

Verified: builds under Flux's exact secp256k1 flags (--with-bignum=no) and reproduces the published draft-05 test vector byte-for-byte (prove/verify/proof_to_hash); cross-checked against Witnet vrf-rs and an independent Python reference. Constant-time audit of secret paths still pending before activation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replaces the grindable PON eligibility lottery (GetPONHash over the proposer-chosen prevBlockHash) with VRF-based eligibility, gated by UPGRADE_PON_VRF:

eligible(C) <=> y <= target, y = VRF(operator_key, epoch_seed)

y is unforgeable (operator secret key) and seeded by a buried block window the proposer did not author (GetEpochSeed), so a producer can no longer shape its own block to win the next lottery. Builds on the ECVRF primitive + ecvrf C++ boundary added in the previous commit.

- block.h: PON_VRF_VERSION=101; nodesVrfOutput + nodesVrfProof header fields, committed under SER_GETHASH (covered by the operator signature).
- consensus/params.h, upgrades.cpp, chainparams.cpp: UPGRADE_PON_VRF (NO_ACTIVATION_HEIGHT on all networks for now).
- pon-fork: IsPONVRFActive().
- pon.cpp: GetEpochSeed (buried-window accumulator); VRF eligibility in CheckPONBlockHeader; proof verification (recomputed beta == nodesVrfOutput) in ContextualCheckPONBlockHeader.
- pon-minter.cpp: compute the VRF proof with the operator key; coordinate via a self-computable priority (lower y => shorter delay) since other nodes' VRF outputs are unknowable; set the header fields before signing.

Pre-activation blocks use the legacy GetPONHash path unchanged.

fluxd builds. NOT yet exercised on a regtest/testnet fork; coordination/liveness and the constant-time audit are pending (see pon-vrf/REVIEW.md).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Makes competing same-height VRF blocks resolve deterministically so the network converges instead of forking back and forth:

- CBlockIndexWorkComparator (main.cpp): for PON_VRF blocks at equal work/height, break ties by lowest nodesVrfOutput. The VRF output is un-grindable, so unlike GetPONHash (depends on proposer-chosen nTime, grindable to win ties) an attacker cannot bias which competitor wins. Legacy PON blocks keep the GetPONHash tie-break (mixed-version forks around activation).
- block.h: commit only the VRF output to the block hash; exclude the proof (like the signature), since the proof is self-validating against the committed output.
- chain.h / txdb.cpp: store nodesVrfOutput in CBlockIndex + CDiskBlockIndex so the comparator can read it and GetBlockHash() recomputes correctly across restarts.

The minting-delay coordination (previous commit) is now only orphan reduction; convergence/safety rests on this deterministic comparator.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…gence

Extracts the PON fork-choice tie-break from the anonymous-namespace comparator in main.cpp into a public, testable function ComparePonForkChoice (pon.cpp). The comparator now delegates to it, so the tests exercise the real deployed logic.

Adds gtests (test_pon.cpp) verifying the convergence guarantee:

- lowest VRF output is preferred (deterministic winner among competitors),
- antisymmetric (swap args -> sign flips: all nodes agree),
- deterministic (same inputs -> same result),
- equal outputs -> undecided (fall back to first-seen).

All 22 PONTest cases pass (flux-gtest).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
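
The properties listed in that commit message (lowest output preferred, antisymmetric, equal outputs undecided) can be sketched in isolation. This is an illustrative stand-in, not the real ComparePonForkChoice: `TipSketch` and `CompareVrfTieBreak` are hypothetical names, the output is assumed to be a 32-byte big-endian array, and the sign convention (positive means the second argument is preferred) follows the gtest quoted later in this thread.

```cpp
#include <array>
#include <cstdint>

// Assumption: VRF output held as a big-endian 32-byte array.
using Bytes32 = std::array<std::uint8_t, 32>;

// Hypothetical stand-in for the fields of a block index entry that
// matter to the tie-break; work and height are assumed already equal.
struct TipSketch {
    Bytes32 vrfOutput{};
};

// Returns <0 if a is preferred, >0 if b is preferred, 0 if undecided
// (the caller would then fall back to first-seen).
inline int CompareVrfTieBreak(const TipSketch& a, const TipSketch& b) {
    // std::array's operator< is lexicographic, which equals numeric
    // order for equal-length big-endian values.
    if (a.vrfOutput < b.vrfOutput) return -1;
    if (b.vrfOutput < a.vrfOutput) return 1;
    return 0;
}
```

Because the result depends only on the two committed outputs, every node that sees both competitors reaches the same verdict regardless of arrival order.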

Adds gtests (test_pon.cpp) exercising the VRF block lifecycle with unbypassed crypto:

- VrfBlockHeaderSerializationRoundTrip: PON_VRF header serializes/deserializes intact and the hash is stable.
- VrfOutputCommittedProofExcludedFromHash: changing the proof does not change the block hash (excluded, like the signature) while changing the VRF output does (committed); this pins the design that lets CBlockIndex store only the 32-byte output.
- EcvrfProveVerifyRoundTrip: real ECVRF_Prove -> ECVRF_Verify round trip (the same crypto ContextualCheckPONBlockHeader runs); tampered proof, wrong key, and wrong seed are all rejected; proving is deterministic (RFC 6979).

25 PONTest cases pass (flux-gtest).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…roduced

Live regtest testing revealed CreateNewBlock assembled a v100 PON block and ran TestBlockValidity on it BEFORE the minter/generate set the VRF fields, so once PON-VRF is active, block production failed with 'bad-pon-...' (version below PON_VRF_VERSION). The header build + validity check were producing/validating a block that could never pass the VRF eligibility rules.

Fix: in CreateNewBlock, when PON-VRF is active, set nVersion = PON_VRF_VERSION and compute nodesVrfOutput/nodesVrfProof (via the operator key, or a deterministic placeholder when no key is configured, e.g. regtest generate) BEFORE TestBlockValidity. The minter and the regtest generate RPC now rely on this single authoritative path (generate's redundant post-assembly block removed).

Verified on regtest: 'generate' past the PON-VRF activation height produces v101 blocks that pass validation (v100 before activation, v101 after).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…support

Three consensus/relay fixes (found via live testnet block production), all the same root cause: the committed VRF output must be handled everywhere a block header is built, hashed, validated, or relayed.

1. Per-slot VRF eligibility (pon.{h,cpp}, pon-minter.cpp, miner.cpp): the VRF input is now H(epoch_seed || slot) (GetPonVrfMessage) instead of just epoch_seed. Without the slot, a node's eligibility was constant for an entire epoch (eligible every slot or none): no leader rotation, broken liveness. The slot carries only the minor, already-acknowledged ~10-slot future-time grind; the large prevBlockHash/coinbase grind remains eliminated. Minter and ContextualCheckPONBlockHeader use it consistently.
2. CheckBlockHeader (main.cpp): for PON-VRF blocks, check the committed VRF output (nodesVrfOutput) against target, not the legacy GetPONHash. The legacy value is meaningless for VRF blocks and rejected ~half of valid v101 blocks as 'high-hash'.
3. CCompactBlockHeader (block.h): serialize the VRF output/proof for PON-VRF blocks. It was omitted, so a peer decoded a v101 compact header with a null VRF output, recomputed the wrong block hash, and rejected the chain as 'non-continuous cmpheaders sequence', breaking header sync between nodes.

Verified on a local testnet: a confirmed fluxnode mints v101 VRF blocks with clean production (0 high-hash) and a second node syncs the VRF chain (0 non-continuous).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Same root cause as the CheckBlockHeader fix: ReadBlockFromDisk re-validated every PON block against the legacy GetPONHash, so ~half of v101 blocks failed on disk-read with 'Errors in block header' — crashing the node shortly after it minted a VRF block. For PON-VRF blocks, check the committed nodesVrfOutput against target instead. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Sixth and final instance of the same root cause: the on-disk block-index verification (LoadBlockIndex) re-checked every PON block against the legacy GetPONHash, so ~half of stored v101 blocks failed on startup with 'Error loading block database', preventing a node from restarting once it had synced/minted VRF blocks. Use the committed nodesVrfOutput for PON-VRF blocks. All header-eligibility check sites now agree: CheckPONBlockHeader, CheckBlockHeader, ReadBlockFromDisk, LoadBlockIndex. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Sets the testnet PON_VRF upgrade to a placeholder height (9999999) so the activation switch is staged in one obvious place. This is NOT a real schedule.

ACTION REQUIRED before tagging a testnet release:

- Replace 9999999 with a concrete testnet height comfortably above the current tip, giving the fleet time to upgrade first.

Mainnet and regtest remain NO_ACTIVATION_HEIGHT (inert) and are unchanged here.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

PON-VRF changes the wire serialization: v101 block headers carry the VRF output (committed) + proof, and the cmpheaders compact-header format carries them too. That must be a distinct protocol version so VRF-capable nodes are distinguishable from prior 170021 (compact-headers) nodes and can be gated at activation.

- PROTOCOL_VERSION: 170021 -> 170022 (VRF-capable nodes advertise this)
- UPGRADE_PON_VRF.nProtocolVersion: 170020 -> 170022 (all networks) so peers below 170022 are rejected once PON_VRF activates, guaranteeing all connected peers speak the VRF wire format.

UPGRADE_PON stays 170020 (unchanged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
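
The gating behaviour described in that commit (nothing changes before activation; peers below 170022 are rejected afterwards) might look roughly like the following. This is a sketch under stated assumptions: `ShouldRejectPeer` and `IsUpgradeActive` are hypothetical names, the sentinel value -1 for "no activation height" is assumed, and the "height >= activation height" rule is a simplification of however fluxd actually evaluates upgrade state.

```cpp
// Value taken from the commit message above.
constexpr int kVrfProtocolVersion = 170022;
// Assumption: sentinel meaning "upgrade never activates".
constexpr int kNoActivationHeight = -1;

// Hypothetical helper: the upgrade is active once the chain reaches
// the scheduled height, and never if no height is scheduled.
inline bool IsUpgradeActive(int height, int activationHeight) {
    return activationHeight != kNoActivationHeight &&
           height >= activationHeight;
}

// Hypothetical peer gate: older peers stay connected until activation,
// then are rejected because they cannot parse the VRF wire format.
inline bool ShouldRejectPeer(int peerVersion, int tipHeight,
                             int activationHeight) {
    return IsUpgradeActive(tipHeight, activationHeight) &&
           peerVersion < kVrfProtocolVersion;
}
```

With the activation height left unscheduled, the gate never fires, which matches the "ships inert" deployment note.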

MorningLightMountain713 · 2026-06-10T13:55:47Z

Correct me if I'm wrong here, but in practice, the privkey is per owner, not per node. So if I have 100 nodes, and they win the lottery (which only happens 1/100th of the time relative to 100 nodes with separate keys) you will get a block broadcast storm?

What about this:

y = VRF(operator_key, H(epoch_seed ‖ slot ‖ collateral))

blondfrogs · 2026-06-11T16:10:23Z

> Correct me if I'm wrong here, but in practice, the privkey is per owner, not per node. So if I have 100 nodes, and they win the lottery (which only happens 1/100th of the time relative to 100 nodes with separate keys) you will get a block broadcast storm?
>
> What about this:
>
> y = VRF(operator_key, H(epoch_seed ‖ slot ‖ collateral))

You are absolutely right. This is exactly why we review code. Shipping an update now.

…under shared operator keys

The VRF message was H(epoch_seed || slot), keyed by the operator key, but operator keys are shared across an owner's fleet in practice (review finding). With the key alone, N same-key nodes compute the identical VRF output, which:

1. Collapses N lottery draws into one, shrinking the fleet's share of block production N-fold. Minting pays the dev fund (not the minter), so this is a leadership/liveness distortion (block production silently concentrates in uniquely-keyed operators) rather than lost operator revenue.
2. On a win, makes all N nodes eligible with the same VRF-derived priority delay, so they broadcast competing blocks simultaneously (broadcast storm).
3. Voids the lowest-VRF fork-choice tie-break: the outputs are identical, so convergence degrades to first-seen on every such win.

The message is now H(epoch_seed || slot || collateral). The collateral outpoint is the canonical per-node identity and is already committed in the header (nodesCollateral) and already used by the verifier to look up the operator pubkey, so verification needs no new wire data. The outpoint is fixed at node registration (before any future epoch seed exists), so it adds no grinding surface beyond the known key-grinding residual.

Adds gtest VrfMessagePerNodeUnderSharedOperatorKey: distinct collaterals yield distinct messages and independent verifiable outputs under one shared key, and a proof for node A does not verify as node B.

26 PONTest cases pass.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
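
To make the per-node message concrete, here is a minimal sketch of assembling the preimage epoch_seed ‖ slot ‖ collateral before hashing. The byte layout is an assumption for illustration (32-bit little-endian slot and output index, 32-byte txid), `OutPointSketch` and `BuildVrfPreimage` are hypothetical names, and the hash H itself is left out because the C++ standard library has no SHA-256.

```cpp
#include <array>
#include <cstdint>
#include <vector>

using Bytes32 = std::array<std::uint8_t, 32>;

// Hypothetical stand-in for a collateral outpoint (txid + output index).
struct OutPointSketch {
    Bytes32 txid{};
    std::uint32_t index = 0;
};

// Append a 32-bit value in little-endian order (assumed encoding).
inline void AppendLE32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    }
}

// Concatenate epoch_seed || slot || collateral. The result would then
// be hashed and passed to the VRF as its message (hashing omitted).
inline std::vector<std::uint8_t> BuildVrfPreimage(
    const Bytes32& epochSeed, std::uint32_t slot,
    const OutPointSketch& collateral) {
    std::vector<std::uint8_t> out;
    out.reserve(32 + 4 + 32 + 4);
    out.insert(out.end(), epochSeed.begin(), epochSeed.end());
    AppendLE32(out, slot);
    out.insert(out.end(), collateral.txid.begin(), collateral.txid.end());
    AppendLE32(out, collateral.index);
    return out;
}
```

Two nodes that share an operator key but hold different collaterals get different preimages, so their VRF draws are independent, which is the property the new gtest checks.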

LogPONEligibility predicts per-node eligibility with the legacy GetPONHash formula, which is dead once VRF leader election activates — and under VRF other nodes' eligibility cannot be computed at all (each draw needs that node's secret key). Anything it printed post-activation would be actively misleading to operators debugging minting from logs. Log-only change, gated on the same IsPONVRFActive height check as the consensus paths: no behavior change before activation on any network. Also skips a full confirmed-fluxnode-cache iteration per connected tip after activation. 26 PONTest cases pass. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

MorningLightMountain713 · 2026-06-13T08:24:55Z

PON-VRF — two consensus-level issues at `3acfecf9` (each confirmed with a gtest)

Reviewing this PR I hit two consensus-level issues. I wrote a gtest for each and ran it against this head — both reproduce and confirm the issue. They drop straight into src/gtest/test_pon.cpp (they reuse the existing MakeVrfHeader() / MakeVrfIndex() helpers):

make -C src flux-gtest -j$(nproc)
./src/flux-gtest --gtest_filter='PONTest.Blocker*'

🔴 Issue 1 — VRF proof dropped on the `CBlockIndex` round-trip → header relay breaks post-activation

getheaders rebuilds each header from CBlockIndex via GetBlockHeader(). The index stores nodesVrfOutput (committed to the hash) but not nodesVrfProof (81 bytes, excluded from the hash — see CBlockIndex(const CBlockHeader&) and GetBlockHeader() in chain.h). So every relayed v101 header carries an empty proof; the receiver's CheckPONBlockHeader hard-fails nodesVrfProof.size() != 81 → state.DoS(100) → Misbehaving(100). The honest sender is banned and headers-first sync stalls after activation. Tip-following uses full blocks (which carry the proof), which is why already-synced nodes don't surface it.

Fix: persist nodesVrfProof in CBlockIndex/CDiskBlockIndex and copy it in GetBlockHeader(), mirroring vchBlockSig (stored for exactly this reason). ~81 bytes per index entry.

// Goes RED on this head (proof comes back size 0); goes green once the proof is
// persisted in CBlockIndex/CDiskBlockIndex and copied in GetBlockHeader().
TEST_F(PONTest, Blocker1_VrfProofDroppedByIndexBreaksHeaderRelay) {
    CBlockHeader original = MakeVrfHeader();            // carries a full 81-byte proof
    ASSERT_EQ(original.nodesVrfProof.size(), 81u);

    CBlockIndex index(original);                        // what a relaying node stores
    CBlockHeader relayed = index.GetBlockHeader();      // what getheaders reconstructs and sends

    // Output survives (stored + committed to the hash) -> the loss is invisible to hash checks.
    EXPECT_EQ(relayed.nodesVrfOutput, original.nodesVrfOutput);

    // ...but the 81-byte proof is gone -> a relayed header is rejected on the size check.
    EXPECT_EQ(relayed.nodesVrfProof.size(), 81u)
        << "CBlockIndex drops nodesVrfProof -> relayed v101 headers fail CheckPONBlockHeader "
           "size()!=81 -> Misbehaving(100), header sync stalls.";
}

Observed:

[ RUN      ] PONTest.Blocker1_VrfProofDroppedByIndexBreaksHeaderRelay
      Expected: relayed.nodesVrfProof.size()  Which is: 0
      To be equal to: 81u                     Which is: 81
[  FAILED  ] PONTest.Blocker1_VrfProofDroppedByIndexBreaksHeaderRelay

The failure (proof size 0, not 81) is the confirmation: the proof does not survive the index round-trip.
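
The fix proposed for Issue 1 can be sketched outside the fluxd tree as follows. `HeaderSketch` and `IndexSketch` are hypothetical, stripped-down stand-ins for CBlockHeader and CBlockIndex that keep only the two VRF fields; the point is simply that the index stores the proof and copies it back when rebuilding a header.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical, reduced header: only the VRF fields are modelled.
struct HeaderSketch {
    std::array<std::uint8_t, 32> nodesVrfOutput{};
    std::vector<std::uint8_t> nodesVrfProof;  // 81 bytes on a v101 header
};

// Hypothetical, reduced index entry with the proof persisted.
struct IndexSketch {
    std::array<std::uint8_t, 32> nodesVrfOutput{};
    std::vector<std::uint8_t> nodesVrfProof;  // the proposed extra field

    explicit IndexSketch(const HeaderSketch& h)
        : nodesVrfOutput(h.nodesVrfOutput),
          nodesVrfProof(h.nodesVrfProof) {}

    // Rebuild the header a relaying node would send; the proof is
    // copied back so the receiver's size check can pass.
    HeaderSketch GetBlockHeader() const {
        HeaderSketch h;
        h.nodesVrfOutput = nodesVrfOutput;
        h.nodesVrfProof = nodesVrfProof;
        return h;
    }
};
```

Dropping the `nodesVrfProof` member from `IndexSketch` reproduces the reported behaviour: the output survives the round-trip but the proof comes back empty.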

🔴 Issue 2 — pre-activation v101 block with `nodesVrfOutput = 0` steals fork choice

ComparePonForkChoice branches on the block version (>= PON_VRF_VERSION), not on the activation height, and trusts nodesVrfOutput. Nothing rejects a v101 block before UPGRADE_PON_VRF, and pre-activation the output is never validated. Lowest output wins and 0 is the minimum — so a legacy-eligible node can mint a block stamped nVersion = 101 / nodesVrfOutput = 0 and beat every honest v100 block in a same-height tie. No VRF key or proof required. This is live the moment the code ships, independent of any activation height.

Fix: reject nVersion >= PON_VRF_VERSION when !IsPONVRFActive(nHeight) (mirror of the version-floor check in the VRF branch of CheckPONBlockHeader). Belt-and-braces: also gate the VRF tie-break in ComparePonForkChoice on activation height. The same version-not-height asymmetry exists at the other read sites (CheckBlockHeader, ReadBlockFromDisk, LoadBlockIndexGuts).

// PASSES on this head -> the attacker wins, confirming the hole. After the fix this
// scenario can't arise; this would become an acceptance-rejection test at that point.
TEST_F(PONTest, Blocker2_PreActivationV101StealsForkChoice) {
    CBlockIndex honest;
    honest.nVersion        = CBlockHeader::PON_VERSION;          // 100 (legacy)
    honest.nHeight         = 100;
    honest.nTime           = 1700000000;
    honest.nBits           = 0x1d00ffff;
    honest.nodesCollateral = COutPoint(uint256S("0xfeedface"), 0);
    ASSERT_NE(GetPONHash(honest.GetBlockHeader()), uint256());   // real, non-zero legacy score

    // v101 block, same height, output forced to the minimum. Pre-activation nothing
    // rejects it and nothing validates the output.
    CBlockIndex attacker = MakeVrfIndex(100, uint256() /* 0x000...0 */);

    EXPECT_GT(ComparePonForkChoice(&honest, &attacker), 0)      // attacker (b) preferred
        << "v101 + nodesVrfOutput=0 steals the fork-choice tie from an honest v100 block "
           "pre-activation (comparator gated on version, not height).";
    EXPECT_LT(ComparePonForkChoice(&attacker, &honest), 0);     // wins regardless of order
}

Observed: [ OK ] PONTest.Blocker2_PreActivationV101StealsForkChoice — the v101/output=0 block wins the tie in both argument orders, confirming the issue.
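
The version/height gate suggested for Issue 2 could take roughly this shape. It is a sketch, not the actual patch: `CheckHeaderVersion` and `HeaderVersionCheck` are hypothetical names, the -1 sentinel for an unscheduled upgrade is assumed, and activation is simplified to "height >= activation height".

```cpp
constexpr int kPonVrfVersion = 101;       // PON_VRF_VERSION from the PR
constexpr int kNoActivationHeight = -1;   // assumption: "never active"

// Hypothetical activation test: an unscheduled upgrade is never active.
inline bool IsVrfActive(int height, int activationHeight) {
    return activationHeight != kNoActivationHeight &&
           height >= activationHeight;
}

enum class HeaderVersionCheck {
    Ok,
    VrfBeforeActivation,    // v101+ header seen before the upgrade
    LegacyAfterActivation   // pre-v101 header seen after the upgrade
};

// Gate on height first, then require the version to match that regime.
inline HeaderVersionCheck CheckHeaderVersion(int nVersion, int height,
                                             int activationHeight) {
    const bool active = IsVrfActive(height, activationHeight);
    if (!active && nVersion >= kPonVrfVersion) {
        return HeaderVersionCheck::VrfBeforeActivation;
    }
    if (active && nVersion < kPonVrfVersion) {
        return HeaderVersionCheck::LegacyAfterActivation;
    }
    return HeaderVersionCheck::Ok;
}
```

With such a gate in place, a v101 header carrying a zero output never reaches the fork-choice comparator before activation, so the tie-break cannot be stolen that way.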

Suggested fix order

Issue 2 — exploitable on mainnet immediately on ship.
Issue 1 — breaks IBD post-activation (fix before any activation height).

Both tests are confirmation tests against the current head: Issue 1's flips to green once the proof is persisted, and Issue 2's would become an acceptance-rejection test once the version/height gate is added.

blondfrogs and others added 11 commits June 8, 2026 10:49

blondfrogs and others added 2 commits June 11, 2026 11:00

blondfrogs force-pushed the feat/pon-vrf-integration branch from 73222da to 3acfecf on June 11, 2026 17:18

MorningLightMountain713 mentioned this pull request Jun 12, 2026: Bound fluxnode memory growth: file-backed block index, header pruning, jemalloc (depends on #284) #286 (Open)

MorningLightMountain713 mentioned this pull request Jun 15, 2026: Tor v3 hidden-service support (BIP155) with fluxnode authentication (depends on #284) #287 (Open)

MorningLightMountain713 mentioned this pull request Jul 2, 2026: Treat NO_ACTIVATION_HEIGHT as never active in IsPONActive #288 (Open)

blondfrogs wants to merge 13 commits into RunOnFlux:master from blondfrogs:feat/pon-vrf-integration
