Feat/pon vrf integration #284

Open
blondfrogs wants to merge 13 commits into RunOnFlux:master from blondfrogs:feat/pon-vrf-integration

@blondfrogs

Member

PON-VRF: VRF leader election (closes the leader-election grinding vulnerability)

Summary

Replaces Proof-of-Node's grindable leader election with a verifiable random function (VRF) election that reuses the existing fluxnode operator key. Leadership is now an un-grindable, verifiable draw, and ties converge deterministically. Gated behind a new UPGRADE_PON_VRF network upgrade, so it ships inert until an activation height is scheduled.

Problem

PON leader eligibility was H(collateral, prevBlockHash, slot) ≤ target. Because that hash depends on block content the proposer controls, a proposer can grind its inputs to bias who is eligible to lead — there is no cooldown and no consensus rank check. This lets a sufficiently resourced node disproportionately influence leader selection.

Solution

  • VRF eligibility: y = VRF(operator_key, H(epoch_seed ‖ slot)); eligible iff y ≤ target.
    • epoch_seed is derived from a buried block window (beyond the reorg horizon) that the current proposer could not have authored → not grindable.
    • The slot is mixed in so eligibility is a fresh draw each slot (leader rotation / liveness).
  • Reuses the existing secp256k1 operator key — no new key material for operators.
  • The VRF output is committed to the block hash (so it is signed and immutable); the proof is carried but excluded from the hash (self-validating against the output).
  • Deterministic fork choice: at equal work/height, the lowest VRF output wins, so the network converges.
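
The eligibility rule above (eligible iff y ≤ target) is a 256-bit integer comparison. A minimal sketch, assuming both values are held as 32-byte big-endian arrays; `U256` and `IsVrfEligible` are hypothetical names for illustration, not the types or functions used in this PR:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 256-bit value, most significant byte first.
// This is an assumption of the sketch, not the codebase's uint256 layout.
using U256 = std::array<std::uint8_t, 32>;

// Returns true iff y <= target when both are read as big-endian integers.
bool IsVrfEligible(const U256& y, const U256& target) {
    for (std::size_t i = 0; i < y.size(); ++i) {
        // The first differing byte decides the ordering.
        if (y[i] != target[i]) return y[i] < target[i];
    }
    return true;  // all bytes equal: y == target still counts as eligible
}
```

Because y comes from the VRF rather than from a hash over proposer-chosen fields, the proposer has no input left to vary in order to push y under the target.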

What's in this PR (by area)

  • Crypto: ECVRF (ECVRF-SECP256K1-SHA256-TAI) via a vendored secp256k1-vrf module + src/crypto/ecvrf.{h,cpp}.
  • Block format: PON_VRF_VERSION = 101; nodesVrfOutput (committed) + nodesVrfProof (excluded from hash).
  • Consensus: UPGRADE_PON_VRF network upgrade (branch id 0x76b809bb); per-slot VRF eligibility; contextual VRF-proof verification; lowest-VRF fork-choice tie-break.
  • Production / validation / relay — every site that builds, hashes, validates, relays, or reads a block header now handles the committed VRF output: CreateNewBlock, CheckBlockHeader, ReadBlockFromDisk, LoadBlockIndex, and the cmpheaders compact-header serialization.
  • Networking: PROTOCOL_VERSION and UPGRADE_PON_VRF.nProtocolVersion bumped to 170022 so VRF-capable peers are distinguishable and peers below 170022 are rejected once the upgrade activates.

Testing

  • 25 gtests pass — ECVRF prove/verify round-trip, output-committed/proof-excluded serialization, and fork-choice convergence.
  • Live local testnet validation — confirmed fluxnodes minting v101 VRF blocks accepted under the real (unbypassed) verifier; multi-node header sync and concurrent minting validated up to ~100 nodes; orphan-rate measurements confirm the difficulty target (eligible-nodes-per-slot) is the dominant orphan lever.

Activation & deployment notes

  • Ships inert: PON_VRF is NO_ACTIVATION_HEIGHT on mainnet and regtest. Testnet is set to a PLACEHOLDER (9999999); replace it with a scheduled testnet height (above the current tip) before tagging.
  • Protocol-version gating ensures a clean upgrade window: nothing changes until activation, after which sub-170022 peers are dropped.
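
The gating described in these notes can be sketched as two small predicates. This is an illustration only: the constant 170022 comes from this PR, while the `-1` sentinel and the function names `IsUpgradeActive` and `ShouldRejectPeer` are assumptions made for the sketch.

```cpp
// Protocol version advertised by VRF-capable nodes (from the PR description).
constexpr int kVrfProtocolVersion = 170022;
// Hypothetical sentinel meaning "no activation scheduled" (assumed value).
constexpr int kNoActivationHeight = -1;

// An upgrade is active only if it has a scheduled height and the chain reached it.
bool IsUpgradeActive(int height, int activationHeight) {
    return activationHeight != kNoActivationHeight && height >= activationHeight;
}

// Before activation nothing changes; after it, peers below 170022 are dropped.
bool ShouldRejectPeer(int peerVersion, int height, int activationHeight) {
    return IsUpgradeActive(height, activationHeight) &&
           peerVersion < kVrfProtocolVersion;
}
```

With the placeholder height 9999999, a 170021 peer is still accepted at height 9999998 and rejected from 9999999 onward; with no activation height it is never rejected.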

Follow-ups (not blocking testnet)

  • Validate on the live testnet that the difficulty retarget keeps eligible-nodes-per-slot small at scale (the orphan driver).
  • Optional networking optimization: priority-aware "announce-first" relay to further cut orphans at large N.
  • Constant-time / side-channel audit of the VRF module before mainnet.

Commits (in order)

Crypto foundation

  • ca735752a — Add ECVRF-secp256k1 module (vendored from aergo) + ecvrf C++ boundary

Consensus core

  • 474a792ff — PON: VRF leader election (consensus) — closes leader-election grinding
  • d418724fe — PON-VRF: deterministic fork choice via lowest-VRF tie-break

Tests

  • 5c393c205 — PON-VRF: extract ComparePonForkChoice + gtests for fork-choice convergence
  • 51a941b8c — PON-VRF: gtests for block serialization + real ECVRF prove/verify path

Completeness fixes (found via live testnet block production)

  • 382c2e0e3 — populate VRF fields in CreateNewBlock so VRF blocks can be produced
  • 8bcd4f1f9 — per-slot eligibility, CheckBlockHeader + compact-header VRF support
  • 52f49d786 — ReadBlockFromDisk must check the VRF output, not GetPONHash
  • 494486f27 — LoadBlockIndex must check the VRF output, not GetPONHash

Deployment

  • 3792a6e81 — PLACEHOLDER testnet activation height — SET BEFORE TAGGING
  • 73fda1b54 — bump protocol version to 170022 for VRF wire format

blondfrogs and others added 11 commits June 8, 2026 10:49
Vendors the ECVRF-SECP256K1-SHA256-TAI (CFRG VRF draft-05, suite 0xFE) module
from aergo/secp256k1-vrf (MIT) into the bundled libsecp256k1 as an optional
module (--enable-module-vrf, enabled in the root build), and adds src/crypto/
ecvrf.{h,cpp} as the C++ boundary (ECVRF_Prove/ECVRF_Verify over CKey/CPubKey).

This is the cryptographic primitive for the PON VRF leader-election fix that
closes the leader-election grinding vulnerability: block eligibility becomes
y = VRF(operator_sk, epoch_seed) <= target, which the proposer cannot grind.

Verified: builds under Flux's exact secp256k1 flags (--with-bignum=no) and
reproduces the published draft-05 test vector byte-for-byte (prove/verify/
proof_to_hash); cross-checked against Witnet vrf-rs and an independent Python
reference. Constant-time audit of secret paths still pending before activation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replaces the grindable PON eligibility lottery (GetPONHash over the proposer-
chosen prevBlockHash) with VRF-based eligibility, gated by UPGRADE_PON_VRF:

    eligible(C)  <=>  y <= target,  y = VRF(operator_key, epoch_seed)

y is unforgeable (operator secret key) and seeded by a buried block window the
proposer did not author (GetEpochSeed), so a producer can no longer shape its
own block to win the next lottery. Builds on the ECVRF primitive + ecvrf C++
boundary added in the previous commit.

- block.h: PON_VRF_VERSION=101; nodesVrfOutput + nodesVrfProof header fields,
  committed under SER_GETHASH (covered by the operator signature).
- consensus/params.h, upgrades.cpp, chainparams.cpp: UPGRADE_PON_VRF
  (NO_ACTIVATION_HEIGHT on all networks for now).
- pon-fork: IsPONVRFActive().
- pon.cpp: GetEpochSeed (buried-window accumulator); VRF eligibility in
  CheckPONBlockHeader; proof verification (recomputed beta == nodesVrfOutput)
  in ContextualCheckPONBlockHeader.
- pon-minter.cpp: compute the VRF proof with the operator key; coordinate via a
  self-computable priority (lower y => shorter delay) since other nodes' VRF
  outputs are unknowable; set the header fields before signing.

Pre-activation blocks use the legacy GetPONHash path unchanged. fluxd builds.
NOT yet exercised on a regtest/testnet fork; coordination/liveness and the
constant-time audit are pending (see pon-vrf/REVIEW.md).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Makes competing same-height VRF blocks resolve deterministically so the network
converges instead of forking back and forth:

- CBlockIndexWorkComparator (main.cpp): for PON_VRF blocks at equal work/height,
  break ties by lowest nodesVrfOutput. The VRF output is un-grindable, so unlike
  GetPONHash (depends on proposer-chosen nTime, grindable to win ties) an attacker
  cannot bias which competitor wins. Legacy PON blocks keep the GetPONHash
  tie-break (mixed-version forks around activation).
- block.h: commit only the VRF output to the block hash; exclude the proof (like
  the signature) — the proof is self-validating against the committed output.
- chain.h / txdb.cpp: store nodesVrfOutput in CBlockIndex + CDiskBlockIndex so the
  comparator can read it and GetBlockHash() recomputes correctly across restarts.

The minting-delay coordination (previous commit) is now only orphan reduction;
convergence/safety rests on this deterministic comparator.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
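
A minimal sketch of the lowest-output tie-break described in this commit message. `TipCandidate` and `CompareLowestVrf` are hypothetical names, the output is assumed to be stored big-endian, and the sign convention (negative means the first argument is preferred) is an assumption of the sketch rather than a statement about the real comparator:

```cpp
#include <array>
#include <cstdint>

// Hypothetical 256-bit value, most significant byte first (assumption).
using U256 = std::array<std::uint8_t, 32>;

// Hypothetical stand-in for a competing block at equal work and height.
struct TipCandidate {
    U256 vrfOutput;
};

// Returns <0 if a is preferred, >0 if b is preferred, 0 if undecided.
// std::array's operator< is lexicographic, which matches integer order
// for big-endian byte strings of equal length.
int CompareLowestVrf(const TipCandidate& a, const TipCandidate& b) {
    if (a.vrfOutput < b.vrfOutput) return -1;
    if (b.vrfOutput < a.vrfOutput) return 1;
    return 0;  // identical outputs: the caller falls back to first-seen
}
```

Swapping the arguments flips the sign, so every node that sees the same two candidates picks the same winner regardless of arrival order.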
…gence

Extracts the PON fork-choice tie-break from the anonymous-namespace comparator in
main.cpp into a public, testable function ComparePonForkChoice (pon.cpp). The
comparator now delegates to it, so the tests exercise the real deployed logic.

Adds gtests (test_pon.cpp) verifying the convergence guarantee:
- lowest VRF output is preferred (deterministic winner among competitors),
- antisymmetric (swap args -> sign flips: all nodes agree),
- deterministic (same inputs -> same result),
- equal outputs -> undecided (fall back to first-seen).

All 22 PONTest cases pass (flux-gtest).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds gtests (test_pon.cpp) exercising the VRF block lifecycle with unbypassed crypto:
- VrfBlockHeaderSerializationRoundTrip: PON_VRF header serializes/deserializes intact
  and the hash is stable.
- VrfOutputCommittedProofExcludedFromHash: changing the proof does not change the block
  hash (excluded, like the signature) while changing the VRF output does (committed) —
  pins the design that lets CBlockIndex store only the 32-byte output.
- EcvrfProveVerifyRoundTrip: real ECVRF_Prove -> ECVRF_Verify round trip (the same crypto
  ContextualCheckPONBlockHeader runs); tampered proof, wrong key, and wrong seed are all
  rejected; proving is deterministic (RFC 6979).

25 PONTest cases pass (flux-gtest).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
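
The committed-output / excluded-proof property pinned by these tests can be illustrated with a toy model. The sketch below uses 64-bit FNV-1a purely as a stand-in hash (it is not cryptographic and is not what the block hash uses); `ToyHeader` and `ToyHeaderHash` are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy header: only the fields needed to show what the hash covers.
struct ToyHeader {
    std::int32_t version = 101;
    std::vector<std::uint8_t> vrfOutput;  // committed to the hash
    std::vector<std::uint8_t> vrfProof;   // carried, but excluded from the hash
};

// 64-bit FNV-1a step over a byte range (non-cryptographic stand-in).
std::uint64_t Fnv1a(std::uint64_t h, const std::uint8_t* data, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 1099511628211ULL;
    }
    return h;
}

// Hashes the version and the VRF output; deliberately never reads vrfProof.
std::uint64_t ToyHeaderHash(const ToyHeader& h) {
    std::uint64_t acc = 14695981039346656037ULL;  // FNV offset basis
    const std::uint32_t v = static_cast<std::uint32_t>(h.version);
    for (int i = 0; i < 4; ++i) {
        const std::uint8_t b = static_cast<std::uint8_t>(v >> (8 * i));
        acc = Fnv1a(acc, &b, 1);
    }
    return Fnv1a(acc, h.vrfOutput.data(), h.vrfOutput.size());
}
```

Changing `vrfProof` leaves `ToyHeaderHash` unchanged while changing `vrfOutput` alters it, which is why an index that stores only the output can still recompute the block hash.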
…roduced

Live regtest testing revealed CreateNewBlock assembled a v100 PON block and ran
TestBlockValidity on it BEFORE the minter/generate set the VRF fields — so once
PON-VRF is active, block production failed with 'bad-pon-...' (version below
PON_VRF_VERSION). The header build + validity check were producing/validating a
block that could never pass the VRF eligibility rules.

Fix: in CreateNewBlock, when PON-VRF is active, set nVersion = PON_VRF_VERSION and
compute nodesVrfOutput/nodesVrfProof (via the operator key, or a deterministic
placeholder when no key is configured, e.g. regtest generate) BEFORE
TestBlockValidity. The minter and the regtest generate RPC now rely on this single
authoritative path (generate's redundant post-assembly block removed).

Verified on regtest: 'generate' past the PON-VRF activation height produces v101
blocks that pass validation (v100 before activation, v101 after).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…support

Three consensus/relay fixes (found via live testnet block production), all the same
root cause: the committed VRF output must be handled everywhere a block header is
built, hashed, validated, or relayed.

1. Per-slot VRF eligibility (pon.{h,cpp}, pon-minter.cpp, miner.cpp): the VRF input
   is now H(epoch_seed || slot) (GetPonVrfMessage) instead of just epoch_seed. Without
   the slot, a node's eligibility was constant for an entire epoch (eligible every slot
   or none) — no leader rotation, broken liveness. The slot carries only the minor,
   already-acknowledged ~10-slot future-time grind; the large prevBlockHash/coinbase
   grind remains eliminated. Minter and ContextualCheckPONBlockHeader use it consistently.

2. CheckBlockHeader (main.cpp): for PON-VRF blocks, check the committed VRF output
   (nodesVrfOutput) against target, not the legacy GetPONHash. The legacy value is
   meaningless for VRF blocks and rejected ~half of valid v101 blocks as 'high-hash'.

3. CCompactBlockHeader (block.h): serialize the VRF output/proof for PON-VRF blocks.
   It was omitted, so a peer decoded a v101 compact header with a null VRF output,
   recomputed the wrong block hash, and rejected the chain as 'non-continuous
   cmpheaders sequence' — breaking header sync between nodes.

Verified on a local testnet: a confirmed fluxnode mints v101 VRF blocks with clean
production (0 high-hash) and a second node syncs the VRF chain (0 non-continuous).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Same root cause as the CheckBlockHeader fix: ReadBlockFromDisk re-validated every PON
block against the legacy GetPONHash, so ~half of v101 blocks failed on disk-read with
'Errors in block header' — crashing the node shortly after it minted a VRF block. For
PON-VRF blocks, check the committed nodesVrfOutput against target instead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sixth and final instance of the same root cause: the on-disk block-index verification
(LoadBlockIndex) re-checked every PON block against the legacy GetPONHash, so ~half of
stored v101 blocks failed on startup with 'Error loading block database', preventing a
node from restarting once it had synced/minted VRF blocks. Use the committed
nodesVrfOutput for PON-VRF blocks. All header-eligibility check sites now agree:
CheckPONBlockHeader, CheckBlockHeader, ReadBlockFromDisk, LoadBlockIndex.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sets the testnet PON_VRF upgrade to a placeholder height (9999999) so the activation
switch is staged in one obvious place. This is NOT a real schedule.

ACTION REQUIRED before tagging a testnet release:
  - Replace 9999999 with a concrete testnet height comfortably above the current tip,
    giving the fleet time to upgrade first.
Mainnet and regtest remain NO_ACTIVATION_HEIGHT (inert) and are unchanged here.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PON-VRF changes the wire serialization: v101 block headers carry the VRF output
(committed) + proof, and the cmpheaders compact-header format carries them too. That
must be a distinct protocol version so VRF-capable nodes are distinguishable from
prior 170021 (compact-headers) nodes and can be gated at activation.

- PROTOCOL_VERSION: 170021 -> 170022 (VRF-capable nodes advertise this)
- UPGRADE_PON_VRF.nProtocolVersion: 170020 -> 170022 (all networks) so peers below
  170022 are rejected once PON_VRF activates, guaranteeing all connected peers speak
  the VRF wire format. UPGRADE_PON stays 170020 (unchanged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@MorningLightMountain713

Contributor

Correct me if I'm wrong here, but in practice, the privkey is per owner, not per node. So if I have 100 nodes, and they win the lottery (which only happens 1/100th the time relative to 100 nodes with separate keys) you will get a block broadcast storm?

What about this:

y = VRF(operator_key, H(epoch_seed ‖ slot ‖ collateral))

@blondfrogs

Member Author

Correct me if I'm wrong here, but in practice, the privkey is per owner, not per node. So if I have 100 nodes, and they win the lottery (which only happens 1/100th the time relative to 100 nodes with separate keys) you will get a block broadcast storm?

What about this:

y = VRF(operator_key, H(epoch_seed ‖ slot ‖ collateral))

You are absolutely right. This is exactly why we review code. Shipping an update now.

blondfrogs and others added 2 commits June 11, 2026 11:00
…under shared operator keys

The VRF message was H(epoch_seed || slot), keyed by the operator key — but
operator keys are shared across an owner's fleet in practice (review finding).
With the key alone, N same-key nodes compute the identical VRF output, which:

1. Collapses N lottery draws into one, shrinking the fleet's share of block
   production N-fold. Minting pays the dev fund (not the minter), so this is
   a leadership/liveness distortion — block production silently concentrates
   in uniquely-keyed operators — rather than lost operator revenue.
2. On a win, makes all N nodes eligible with the same VRF-derived priority
   delay, so they broadcast competing blocks simultaneously (broadcast storm).
3. Voids the lowest-VRF fork-choice tie-break — the outputs are identical, so
   convergence degrades to first-seen on every such win.

The message is now H(epoch_seed || slot || collateral). The collateral outpoint
is the canonical per-node identity and is already committed in the header
(nodesCollateral) and already used by the verifier to look up the operator
pubkey, so verification needs no new wire data. The outpoint is fixed at node
registration — before any future epoch seed exists — so it adds no grinding
surface beyond the known key-grinding residual.

Adds gtest VrfMessagePerNodeUnderSharedOperatorKey: distinct collaterals yield
distinct messages and independent verifiable outputs under one shared key, and
a proof for node A does not verify as node B. 26 PONTest cases pass.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
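
To make the per-node message concrete, here is a sketch of the preimage epoch_seed ‖ slot ‖ collateral before hashing. The field widths and little-endian integer encoding are assumptions for illustration, and `Outpoint`, `AppendLE32`, and `VrfMessagePreimage` are hypothetical names; the hash step itself is omitted:

```cpp
#include <array>
#include <cstdint>
#include <vector>

using Bytes32 = std::array<std::uint8_t, 32>;

// Hypothetical stand-in for a collateral outpoint (txid + output index).
struct Outpoint {
    Bytes32 txid;
    std::uint32_t index;
};

// Appends a 32-bit integer in little-endian order (assumed encoding).
void AppendLE32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    }
}

// Builds epoch_seed || slot || collateral; the result would then be hashed.
std::vector<std::uint8_t> VrfMessagePreimage(const Bytes32& epochSeed,
                                             std::uint32_t slot,
                                             const Outpoint& collateral) {
    std::vector<std::uint8_t> out;
    out.reserve(32 + 4 + 32 + 4);
    out.insert(out.end(), epochSeed.begin(), epochSeed.end());
    AppendLE32(out, slot);
    out.insert(out.end(), collateral.txid.begin(), collateral.txid.end());
    AppendLE32(out, collateral.index);
    return out;
}
```

Two nodes sharing one operator key but holding different collateral outpoints now feed different messages into the VRF, so their draws are independent even though the key is the same.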
LogPONEligibility predicts per-node eligibility with the legacy GetPONHash
formula, which is dead once VRF leader election activates — and under VRF
other nodes' eligibility cannot be computed at all (each draw needs that
node's secret key). Anything it printed post-activation would be actively
misleading to operators debugging minting from logs.

Log-only change, gated on the same IsPONVRFActive height check as the
consensus paths: no behavior change before activation on any network. Also
skips a full confirmed-fluxnode-cache iteration per connected tip after
activation. 26 PONTest cases pass.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@MorningLightMountain713

Contributor

PON-VRF — two consensus-level issues at 3acfecf9 (each confirmed with a gtest)

Reviewing this PR I hit two consensus-level issues. I wrote a gtest for each and ran it against this head — both reproduce and confirm the issue. They drop straight into src/gtest/test_pon.cpp (they reuse the existing MakeVrfHeader() / MakeVrfIndex() helpers):

make -C src flux-gtest -j$(nproc)
./src/flux-gtest --gtest_filter='PONTest.Blocker*'

🔴 Issue 1 — VRF proof dropped on the CBlockIndex round-trip → header relay breaks post-activation

getheaders rebuilds each header from CBlockIndex via GetBlockHeader(). The index stores nodesVrfOutput (committed to the hash) but not nodesVrfProof (81 bytes, excluded from the hash; see CBlockIndex(const CBlockHeader&) and GetBlockHeader() in chain.h). So every relayed v101 header carries an empty proof; the receiver's CheckPONBlockHeader hard-fails on nodesVrfProof.size() != 81 → state.DoS(100) → Misbehaving(100). The honest sender is banned and headers-first sync stalls after activation. Tip-following uses full blocks (which carry the proof), which is why already-synced nodes don't surface it.

Fix: persist nodesVrfProof in CBlockIndex/CDiskBlockIndex and copy it in GetBlockHeader(), mirroring vchBlockSig (stored for exactly this reason). ~81 bytes per index entry.

// Goes RED on this head (proof comes back size 0); goes green once the proof is
// persisted in CBlockIndex/CDiskBlockIndex and copied in GetBlockHeader().
TEST_F(PONTest, Blocker1_VrfProofDroppedByIndexBreaksHeaderRelay) {
    CBlockHeader original = MakeVrfHeader();            // carries a full 81-byte proof
    ASSERT_EQ(original.nodesVrfProof.size(), 81u);

    CBlockIndex index(original);                        // what a relaying node stores
    CBlockHeader relayed = index.GetBlockHeader();      // what getheaders reconstructs and sends

    // Output survives (stored + committed to the hash) -> the loss is invisible to hash checks.
    EXPECT_EQ(relayed.nodesVrfOutput, original.nodesVrfOutput);

    // ...but the 81-byte proof is gone -> a relayed header is rejected on the size check.
    EXPECT_EQ(relayed.nodesVrfProof.size(), 81u)
        << "CBlockIndex drops nodesVrfProof -> relayed v101 headers fail CheckPONBlockHeader "
           "size()!=81 -> Misbehaving(100), header sync stalls.";
}

Observed:

[ RUN      ] PONTest.Blocker1_VrfProofDroppedByIndexBreaksHeaderRelay
      Expected: relayed.nodesVrfProof.size()  Which is: 0
      To be equal to: 81u                     Which is: 81
[  FAILED  ] PONTest.Blocker1_VrfProofDroppedByIndexBreaksHeaderRelay

The failure (proof size 0, not 81) is the confirmation: the proof does not survive the index round-trip.
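
The shape of the suggested fix can be sketched with toy types. `HeaderSketch` and `IndexSketch` are hypothetical stand-ins, not CBlockHeader or CBlockIndex; the point is only that the index keeps the proof next to the output and copies both back when a header is rebuilt:

```cpp
#include <cstdint>
#include <vector>

// Toy header carrying only the two VRF fields.
struct HeaderSketch {
    std::vector<std::uint8_t> vrfOutput;  // 32 bytes, committed to the hash
    std::vector<std::uint8_t> vrfProof;   // 81 bytes, excluded from the hash
};

// Toy index entry that persists the proof as well as the output.
struct IndexSketch {
    std::vector<std::uint8_t> vrfOutput;
    std::vector<std::uint8_t> vrfProof;

    explicit IndexSketch(const HeaderSketch& h)
        : vrfOutput(h.vrfOutput), vrfProof(h.vrfProof) {}

    // Rebuilds a relayable header; both fields are restored.
    HeaderSketch GetHeader() const {
        HeaderSketch h;
        h.vrfOutput = vrfOutput;
        h.vrfProof = vrfProof;
        return h;
    }
};
```

In this model a header rebuilt from the index keeps its 81-byte proof, so a size check on the receiving side would pass.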

🔴 Issue 2 — pre-activation v101 block with nodesVrfOutput = 0 steals fork choice

ComparePonForkChoice branches on the block version (>= PON_VRF_VERSION), not on the activation height, and trusts nodesVrfOutput. Nothing rejects a v101 block before UPGRADE_PON_VRF, and pre-activation the output is never validated. Lowest output wins and 0 is the minimum — so a legacy-eligible node can mint a block stamped nVersion = 101 / nodesVrfOutput = 0 and beat every honest v100 block in a same-height tie. No VRF key or proof required. This is live the moment the code ships, independent of any activation height.

Fix: reject nVersion >= PON_VRF_VERSION when !IsPONVRFActive(nHeight) (mirror of the version-floor check in the VRF branch of CheckPONBlockHeader). Belt-and-braces: also gate the VRF tie-break in ComparePonForkChoice on activation height. The same version-not-height asymmetry exists at the other read sites (CheckBlockHeader, ReadBlockFromDisk, LoadBlockIndexGuts).

// PASSES on this head -> the attacker wins, confirming the hole. After the fix this
// scenario can't arise; this would become an acceptance-rejection test at that point.
TEST_F(PONTest, Blocker2_PreActivationV101StealsForkChoice) {
    CBlockIndex honest;
    honest.nVersion        = CBlockHeader::PON_VERSION;          // 100 (legacy)
    honest.nHeight         = 100;
    honest.nTime           = 1700000000;
    honest.nBits           = 0x1d00ffff;
    honest.nodesCollateral = COutPoint(uint256S("0xfeedface"), 0);
    ASSERT_NE(GetPONHash(honest.GetBlockHeader()), uint256());   // real, non-zero legacy score

    // v101 block, same height, output forced to the minimum. Pre-activation nothing
    // rejects it and nothing validates the output.
    CBlockIndex attacker = MakeVrfIndex(100, uint256() /* 0x000...0 */);

    EXPECT_GT(ComparePonForkChoice(&honest, &attacker), 0)      // attacker (b) preferred
        << "v101 + nodesVrfOutput=0 steals the fork-choice tie from an honest v100 block "
           "pre-activation (comparator gated on version, not height).";
    EXPECT_LT(ComparePonForkChoice(&attacker, &honest), 0);     // wins regardless of order
}

Observed: [ OK ] PONTest.Blocker2_PreActivationV101StealsForkChoice — the v101/output=0 block wins the tie in both argument orders, confirming the issue.
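
A sketch of the height gate proposed for Issue 2, under stated assumptions: the version constant 101 comes from this PR, while `IsVrfActiveAt`, `PassesVrfVersionGate`, and the `-1` "no activation" sentinel are hypothetical and only illustrate the rule:

```cpp
// PON_VRF_VERSION from the PR description.
constexpr int kPonVrfVersion = 101;
// Hypothetical sentinel meaning "no activation scheduled" (assumed value).
constexpr int kNoVrfActivation = -1;

// Hypothetical activation check keyed on height, not on block version.
bool IsVrfActiveAt(int height, int activationHeight) {
    return activationHeight != kNoVrfActivation && height >= activationHeight;
}

// Pre-activation: reject any header claiming the VRF version.
// Post-activation: enforce the version floor.
bool PassesVrfVersionGate(int version, int height, int activationHeight) {
    if (!IsVrfActiveAt(height, activationHeight)) {
        return version < kPonVrfVersion;
    }
    return version >= kPonVrfVersion;
}
```

With such a gate in place, a v101 header below the activation height is rejected before it can reach the fork-choice comparator, so its unvalidated output is never compared against honest v100 blocks.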

Suggested fix order

  1. Issue 2 — exploitable on mainnet immediately on ship.
  2. Issue 1 — breaks IBD post-activation (fix before any activation height).

Both tests are confirmation tests against the current head: Issue 1's flips to green once the proof is persisted, and Issue 2's would become an acceptance-rejection test once the version/height gate is added.
