Skip to content

Found-008: LockNotifyHandler::notify_waiters() drops lock events arriving in wait_for_proof's check/await gap (concurrent asset-lock builds stall on FinalityTimeout) #3641

@lklimek

Description

@lklimek

User Story

As a Platform application developer using rs-platform-wallet to fund concurrent asset-lock-based identity operations (e.g. registering multiple identities in parallel, or topping up several identities at once), I need AssetLockManager::wait_for_proof to reliably wake when an IS-lock or ChainLock event arrives at LockNotifyHandler, so that concurrent asset-lock builds from the same wallet complete without stalling on FinalityTimeout.

Today, any caller driving N ≥ 2 concurrent asset-lock builds against a single wallet hits a deterministic missed-wakeup race that turns a 5-second proof wait into a 5-minute FinalityTimeout.

Current Behavior

Failing e2e pin at packages/rs-platform-wallet/tests/e2e/cases/al_001_concurrent_asset_lock_builds.rs:299 (AL-001 in tests/e2e/TEST_SPEC.md):

  • 2 concurrent top_up_identity_with_funding tasks against the same Core-funded wallet
  • Both tasks acquire UTXOs cleanly (the historical coin-selection race is closed by PR fix(platform-wallet): auto_select_inputs honors Σ inputs == Σ outputs #3554 + fix: case-insensitive .dash + atomic state on broadcast failure #3585's OutpointReservations)
  • Both tasks broadcast their asset-lock transactions
  • Task 1's wait_for_proof enters its check-then-await loop, observes no proof yet, drops the wallet-manager read lock, and proceeds to tokio::select! { _ = self.lock_notify.notified() => ..., _ = sleep(remaining) => ... }
  • The IS-lock event for task 1's transaction arrives at LockNotifyHandler::on_sync_event between the state check and the registration of the notified() future
  • Notify::notify_waiters() runs against an empty waker list and discards the event — it does not store a permit
  • Task 1's notified() future then registers for an event that already fired
  • Task 1 sleeps until FinalityTimeout (no second IS-lock event for the same transaction)

Failure fingerprint FinalityTimeout(<txid>) is identical across v48, v49, v50, v51 — no run-to-run drift. Reproduces deterministically under 14-thread test parallelism.

Reproduction

On branch feat/rs-platform-wallet-e2e (PR #3549):

cargo test -p platform-wallet --test e2e \
  -- --include-ignored --test-threads=14 --nocapture \
  al_001_concurrent_asset_lock_builds

Requires PLATFORM_WALLET_E2E_BANK_CORE_GATE and the Wave-E Core-funded test wallet harness. Expected: panic at al_001_concurrent_asset_lock_builds.rs:299 with task 1 panicked: FinalityTimeout(<txid>).

Root Cause

Bug site: packages/rs-platform-wallet/src/wallet/asset_lock/lock_notify_handler.rs:30

impl EventHandler for LockNotifyHandler {
    fn on_sync_event(&self, event: &dash_spv::sync::SyncEvent) {
        if matches!(event, ...InstantLockReceived... | ...ChainLockReceived...) {
            self.notify.notify_waiters();   // ← drops events for late-registering waiters
        }
    }
}

Affected caller: packages/rs-platform-wallet/src/wallet/asset_lock/sync/proof.rs:367-418 (wait_for_proof). The structural problem in the check-then-await loop:

  1. loop { (line 367)
  2. Read state under wallet_manager.read().await lock (lines 371-380)
  3. Match record.context for InstantSend / InChainLockedBlock (lines 384-403)
  4. Drop the read lock; compute remaining deadline (line 406)
  5. tokio::select! — register self.lock_notify.notified() future (line 413)

If an IS-lock or ChainLock event fires between step 3 (state check finished, no proof yet) and step 5 (future registered with Notify), notify_waiters() runs through the empty waker list and the wakeup is permanently lost. There is no second event for the same transaction.

This is the canonical mis-use of tokio::sync::Notify:

  • Notify::notify_waiters() — wakes only currently-registered waiters; does NOT store a permit
  • Notify::notify_one() — stores a pending permit when no waiter is registered; the next notified().await claims it immediately

The same race surface appears at proof.rs:324 (the chainlock variant) — same notified().await => continue pattern, same bug.

Affected Versions

  • v3.1-dev HEAD (today) — confirmed
  • All prior versions since LockNotifyHandler was introduced (search returned 0 issues / 0 PRs touching notify_waiters or LockNotifyHandler)

Cross-References

Severity

HIGH — asset-lock proof flow is on the critical path of every identity registration and every identity top-up. A single missed wakeup converts a 5-second proof wait into a 5-minute hang. Any production caller (Dash Mobile, third-party apps) that drives concurrent identity operations from the same wallet hits this.

Notes

The fix is well-understood (the spec discusses Option A: notify_waiters()notify_one() with permit storage; Option B: replace Notify with tokio::sync::broadcast). Per repo conventions for issue filing, no fix proposal in this issue body — leaving the implementation choice to the implementer.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions