This document captures the threat model the package is designed to resist, the assumptions it places on the integrator, and the review history that has shaped the current implementation. If you are wiring bittensor-auth into a production service, skim the Integrator requirements section first.
Email team@oroagents.com with a clear description of the issue, a reproduction, and an impact assessment. Please do not open a public GitHub issue for exploitable findings. We aim to acknowledge within two business days and to coordinate a fix and disclosure timeline with you before publication.
The package provides two authentication primitives with distinct threat profiles:
-
Per-request signing — every authenticated request carries
X-Hotkey,X-Timestamp,X-Nonce,X-Signatureheaders. The signature is an SR25519 signature over{hotkey}:{timestamp}:{nonce}(default) produced by the hotkey's keypair. The server verifies the signature, the timestamp's skew, and the nonce's uniqueness under the hotkey. -
Challenge / session — the client requests a challenge, signs it once, and exchanges the signed challenge for a short-lived Bearer token. Subsequent requests present only the Bearer token.
Threats the package is designed to resist:
| Threat | How it's resisted |
|---|---|
| Signature forgery | SR25519 verification via bittensor.Keypair.verify; no homegrown crypto. |
| Replay of a signed request | Nonce registered under (hotkey, nonce) with TTL ≥ 2 × timestamp_skew_seconds (the full two-sided skew window); atomic set_if_not_exists on the cache backend. A smaller TTL is refused at NonceTracker construction. |
| Replay outside the skew window | validate_timestamp rejects anything outside ±timestamp_skew_seconds. |
| Cross-hotkey challenge reuse | The challenge is stored server-side indexed by its nonce; the /session endpoint rejects if the claimed hotkey doesn't match the stored challenge's owner. |
| One-time-use challenges | Atomic getdel against the cache — two concurrent /session calls for the same challenge produce exactly one success. |
| Cross-role ban evasion via stolen session | When a hotkey is banned, sessions for that hotkey are bulk-revoked via atomic smembers_and_delete. See the Residual race note below. |
| Stale session after deregistration / permit change | When recheck_registration_on_session=True (default), Bearer-token auth re-queries the metagraph cache each request; deregistered hotkeys are rejected immediately. |
| Stale metagraph during chain partition | MetagraphCache refuses to serve snapshots older than metagraph_max_age_seconds (default 4× the refresh interval). Queries fail closed under prolonged chain outages. |
| Malformed input DoS | Hotkey format, timestamp, nonce length, and signature hex are all bounded before reaching SR25519 verify. Oversized nonces are rejected before they hit the cache. |
Threats not handled by the package — you must layer defenses:
| Threat | Where it lives |
|---|---|
| Request-body tampering by a TLS-internal adversary | See What the signature binds to below. The default signing message covers only (hotkey, timestamp, nonce), so an attacker who can observe and modify a signed request in flight (compromised TLS-terminating proxy, logging sidecar, traffic-replay tool) can swap the body and the signature still verifies. Mitigate by (a) ensuring TLS terminates at your application layer — not at an untrusted proxy — and/or (b) applying application-level HMAC/signing on sensitive endpoints until the opt-in request binding lands (tracked for 0.2.0). |
| Cross-endpoint replay within the skew window | A captured signed request can be replayed to a different URL/method within the same skew window, since the default signing message is not bound to the target endpoint. Same mitigations as above. |
Unauthenticated flood of /challenge |
Rate limit at the framework / ingress (see Integrator requirements). The endpoint only format-validates the claimed hotkey, so a low-rate-limit is required to prevent cache pressure. |
| TLS / transport security | httpx defaults are used. Adopters running public endpoints must front the service with TLS and leave verify_ssl=True. |
| Key custody | The package never touches secret keys — it delegates signing to bittensor.Keypair. Clients are responsible for secure storage. |
The default MessageBuilder constructs the signing message as:
{hotkey}:{timestamp}:{nonce}
The signature therefore proves only that the keypair authorized some request at that timestamp with that nonce. It does not cover:
- the HTTP method (GET vs POST)
- the URL path or query string
- the request body
- any other request headers
Why this matters. Against an internet attacker over TLS, none of the above is observable, so this is not a concern in the default threat model. But certain deployment topologies place a TLS-internal party on the request path:
- A corporate TLS-intercepting proxy
- A logging / APM sidecar that receives decrypted traffic
- A traffic-replay tool used for load testing or debugging
- A compromised ingress or L7 load balancer
An attacker who holds one of those positions can:
- Tamper with the body of an in-flight request. The signature still verifies because it wasn't over the body.
- Replay a captured signed request to a different endpoint.
GET /fooheaders reused againstPOST /admin/delete-allwithin thetimestamp_skew_secondswindow (nonce permitting). - Swap query-string parameters.
?limit=10→?limit=100000.
What we are doing about it. Opt-in request binding — signing the method, path, and a hash of the body alongside the existing fields — is on the 0.2.0 roadmap. It is not a one-line change: it touches the client signing transport (streaming-body handling, canonical path), the server FastAPI middleware (body buffering, re-hashing, X-Body-SHA256 verification), and the MessageBuilder API shape. Subtle interactions with reverse proxies that rewrite paths also need a deliberate design. We would rather ship it once, correctly, than leave a partial mitigation in place.
What you can do in the meantime.
- Make sure TLS terminates at the same process (or one you fully trust) as your application. A TLS-terminating LB controlled by your org, running the same ops team's code, is a reasonable trust boundary; a vendor middlebox that receives decrypted traffic is not.
- For endpoints where the body carries high-stakes authorization data (admin actions, money movement, irreversible mutations), consider an additional application-level HMAC over
(method, path, canonical body)until the opt-in binding lands. The HMAC key can ride alongside the Bittensor signature. - Keep
timestamp_skew_secondsas small as your clocks tolerate (default 60s is usually enough). The cross-endpoint replay window is exactlytimestamp_skew_secondswide — shrinking it proportionally shrinks the attack surface.
SessionStore.revoke_all_sessions(hotkey) uses the atomic smembers_and_delete primitive to list and drop every session in one step, so the classic check-then-delete race is closed. A truly concurrent create_session(hotkey, ...) issued after smembers_and_delete returns will land in a fresh index key and survive revocation — there is no way to prevent this at the cache layer without a distributed lock.
Recommended defense in depth: pair revocation with a per-request ban check so a survived session is caught on its next use. This is what BittensorAuth(..., ban_checker=…) provides. Adopters who cannot tolerate the window (e.g., legal compliance requirements) should take an additional lock around ban + revocation at the application layer.
Running bittensor-auth safely in production requires the following configuration:
-
Redis backend, not
InMemoryCache.InMemoryCacheis process-local (sessions/nonces don't cross workers) and has no background sweeper. Use it only for tests and single-process development. -
Rate-limit
/challenge. The endpoint accepts any SS58-valid hotkey and stores one cache entry per request. Put a rate limiter (ingress-level or framework-level) in front of it. A sensible default is a per-IP budget on the order of 10–30 requests per minute and a per-hotkey budget of a handful per minute. -
timestamp_skew_secondssmall. 60 seconds is the default and is usually enough for real-world clock drift. Increasing it widens the replay window for captured signatures; don't push it beyond what your client population needs. -
metagraph_max_age_secondsset and monitored. The default (4 × metagraph_refresh_interval) tolerates a few consecutive sync failures but alarms if the chain endpoint is wholly down. ExposeMetagraphCache.last_synced_atin your monitoring so you can alarm on staleness before requests begin failing closed. -
Pair
revoke_all_sessionswithban_checker. As noted in Residual race, a concurrent session creation can survive revocation. Aban_checkeron the per-request path closes the window on next use. -
Don't downgrade the default
recheck_*_on_sessionflags without understanding the trade-off. The defaults cost one metagraph-cache lookup per authenticated request (in-memory, no chain round-trip). Disabling them freezes role and registration at session creation for up tosession_ttl_seconds.
As of v0.1.0 + security hardening, the following code paths have been reviewed end-to-end:
core.py— SR25519 verify, hotkey SS58 decodesigning.py— timestamp/skew validation, message buildersnonce.py— replay prevention, TTL vs skew invariantsession.py— challenge/session lifecycle,getdel,smembers_and_deletecache.py— in-memory + Redis backends, atomic primitivesmetagraph.py— background sync, staleness floor, O(1) hotkey indexfastapi/router.py— challenge/session/logout endpointsfastapi/dependencies.py— per-request and session-based FastAPI dependenciesclient.py—SigningTransport,generate_auth_headers,default_is_public_endpoint
Out of scope for the package's review (handled by the adopter):
- TLS / transport security
- Rate limiting / WAF
- Key custody on the client side
- Any authorization decision past the role returned by
role_resolver
v0.1.2 — security hardening
Replay / session model:
NonceTrackernow enforcesttl_seconds >= 2 × skew_seconds(the full two-sided skew window). The previousttl ≥ skewbound left a one-sided replay gap for future-dated signatures after the nonce evicted; the defaultttl_secondsis now 120s to match the default 60s skew.SessionStore.get_sessionrejects sessions withcreated_at <= revoked_after(inclusive), closing a same-second race where a session created in the same wall-clock second asrevoke_all_sessionscould survive the barrier.SessionStore.revoke_all_sessionsstamps a per-hotkeysession_revoked_afterepoch before sweeping the index;get_sessionrejects any session whosecreated_atpredates the stamp. Catches tokens that slip the SMEMBERS atomic step (e.g. sadd-ed concurrently with revocation).NonceTracker.registerrejects nonces containing:(the cache-key delimiter) so pathological values can't alias(hotkey, nonce)pairs to the same Redis key. NewAuthErrorCode.NONCE_INVALID_CHARS(400).default_is_public_endpointuses an exact-match whitelist instead ofendswith("/health")(which previously leaked on paths like/admin/x/health).- Bearer-token parsing is case-insensitive and whitespace-tolerant.
Cache / atomicity:
CacheBackend.sadd_with_ttlatomic primitive;SessionStore.create_sessionuses it so a crash between SADD and EXPIRE cannot leave the per-hotkey index without a TTL (previously a potential unbounded-memory path in Redis).- Added
CacheBackend.smembers_and_deleteatomic primitive;SessionStore.revoke_all_sessionsno longer has the classic SMEMBERS-then-DEL race. SessionStore.get_sessionreturnsNoneon malformed JSON instead of raising.
Metagraph / integrator:
- Added
recheck_registration_on_sessionandrecheck_ban_on_sessionconfig flags (defaultTrue): Bearer-token auth now re-resolves role and rechecks registration on every request. - Added
MetagraphCache.last_synced_atandseconds_since_last_sync()for integrator monitoring;metagraph_max_age_secondsfails closed when snapshots age out. - Hotkey-to-uid lookup is O(1) via a cached index swapped atomically with the metagraph snapshot.
Operator ergonomics:
BittensorAuthConfig.collapse_auth_error_codes(defaultFalse) — opt-in opaqueUNAUTHORIZEDon every 401 response, for deployments that want to close a mild error-code enumeration side channel.- Pinned
bittensor-walletto>=2.1.0,<3.0.0. Note:bittensor-walletis the only direct dependency; the fullbittensorSDK is not a direct dependency and is pulled transitively only when adopters construct aMetagraphCache. Adopters usingMetagraphCacheshould pinbittensorthemselves to controlKeypair.verifystability.
Documentation:
- This
SECURITY.mdnow documents the request-body / cross-endpoint replay gap under What the signature binds to (the signature covers(hotkey, timestamp, nonce)only — opt-in request binding is on the 0.2.0 roadmap).
v0.1.0 — initial release.