
cold: unified handle architecture (supersedes #57)#58

Merged
prestwich merged 19 commits into main from
prestwich/cold-unified-architecture
May 4, 2026

Conversation

@prestwich
Member

Summary

Collapse cold storage from channels+dispatcher+writer-task into a single ColdStorage<B: ColdStorageBackend> handle over Arc<Inner<B>>. All concurrency primitives (read_sem(64), write_sem(1), stream_sem(8), TaskTracker, CancellationToken) live on Inner; all backend work spawns into the shared tracker; spawned tasks hold permits for the real duration of the backend call.

Supersedes #57. Tracks Linear ENG-2198.

Fixes three PR #57 review issues:

  1. Semaphore acquisition was scattered across three places — now all in the handle.
  2. StreamLogs setup held the outer read permit across unbounded awaits — streams now acquire only stream_sem; setup reads are isolated from the read pool by design (a stream requesting "latest" should observe latest at setup).
  3. Dispatcher-side timeout did not cancel backend work, so permits were released early — timeouts are now mandatory in the ColdStorageBackend trait and enforced at the backend layer, so permits honestly reflect in-flight work.

Shape:

  • ColdStorage<B> is clone-cheap (one Arc refcount bump); replaces ColdStorageHandle + ColdStorageReadHandle + ColdStorageTask.
  • ColdStorageWrite now takes &self across all backends (Mem / MDBX / SQL updated in lockstep).
  • Reads acquire a read permit in the handle and spawn; cache-hit fast path returns in the caller's task without permit/spawn.
  • Writes acquire write_sem then the drain barrier (read_sem.acquire_many_owned(64)); destructive writes invalidate the cache inside the spawned body while holding the drain.
  • Streams acquire only stream_sem; producer runs in the shared tracker.
  • Hidden shutdown coordinator task watches the cancel token and closes all three semaphores + the tracker; handle calls then fail fast with TaskTerminated.

Backend timeouts (mandatory in trait contract):

  • MDBX: tokio::task::spawn_blocking for reads, tokio::task::block_in_place for writes, in-body deadline check between per-block iterations. Post-commit WARN on overrun (advisory — MDBX commits are uninterruptible).
  • SQL: SET LOCAL statement_timeout = <ms> at the start of every Postgres transaction; SQLite skips (no equivalent).
  • Both backends expose with_read_timeout / with_write_timeout builders on backend and connector. Defaults 500ms / 2s.
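On the Postgres side, the per-transaction statement reduces to string formatting; a hedged helper sketch (the function name is mine, only the `SET LOCAL` form comes from the PR):

```rust
use std::time::Duration;

// SET LOCAL scopes the timeout to the current transaction, so it resets
// automatically on COMMIT/ROLLBACK. Clamp to i64 to survive absurd Durations.
fn statement_timeout_sql(d: Duration) -> String {
    let ms = i64::try_from(d.as_millis()).unwrap_or(i64::MAX);
    format!("SET LOCAL statement_timeout = {ms}")
}
```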

Observability:

  • New crates/cold/src/metrics.rs following the ajj pattern.
  • Gauges: cold.reads_in_flight, cold.writes_in_flight, cold.streams_active.
  • Histograms: cold.op_duration_us, cold.permit_wait_us, cold.stream_lifetime_ms.
  • Counter: cold.op_errors_total.
  • #[tracing::instrument] on every handle method with stable op field; spans propagate into spawned tasks via .in_current_span().
  • Alerting is external (Prometheus on gauges); no in-process watchdog.

Error variants removed: Timeout, Backpressure, Cancelled. Backend timeouts now map to Backend. TaskTerminated remains and is returned from every handle method when the semaphore is closed.

Rollout: 10 commits, one per phase, each building + testing clean in isolation:

  1. `b539c8f` refactor(cold): ColdStorageWrite takes &self; all backends updated in lockstep
  2. `31e1c13` refactor(cold): unify handle around Arc; remove channels and dispatcher
  3. `b0d3063` fix(cold): stream permit acquired in handle; streams do not hold a read permit
  4. `9881f03` feat(cold): drain barrier moves to handle write path
  5. `9426d88` feat(cold): shutdown coordinator closes semaphores on cancel
  6. `902aab6` refactor(cold-mdbx): spawn_blocking reads, block_in_place writes, in-body iterator deadline
  7. `727c054` feat(cold-sql): mandatory statement_timeout; read_timeout and write_timeout builders
  8. `978e2af` feat(cold): metrics and tracing spans across all operations
  9. `3339c87` docs(cold): trait impl guide documents mandatory timeouts
  10. `cb06741` test(cold): concurrency suite covers new architecture

Spec: `docs/superpowers/specs/2026-04-20-cold-unified-architecture-design.md` (local — `docs/` is gitignored).

Test plan

  • `cargo +nightly fmt -- --check`
  • `cargo clippy --workspace --all-targets --all-features -- -D warnings`
  • `cargo clippy --workspace --all-targets --no-default-features -- -D warnings`
  • `RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps`
  • `cargo t --workspace` — all suites green, including:
    • `tests/handle_shape.rs` (clone-cheap handle)
    • `tests/stream_isolation.rs` (stream setup not blocked by saturated reads)
    • `tests/drain_barrier.rs` (write waits for reads to drain)
    • `tests/shutdown.rs` (acquire fails fast after cancel)
    • `tests/concurrency.rs` (7 scenarios: 256-way reads, fairness, cancel during backpressure, cancel during drain, stream permit cancel, cache-through-truncate)
    • `crates/cold-mdbx/tests/timeout.rs` (iterator deadline trips; point lookups bypass)
    • `crates/cold-sql/tests/statement_timeout.rs` (gated on Postgres; skip locally without DATABASE_URL)
  • Reviewer validation path: cut a patch release, bump `init4tech/node-components`, rebuild `signet-sidecar:latest`, redeploy to dev mainnet, confirm no backpressure-induced crashes and no permit-wait regression on the new metrics.

🤖 Generated with Claude Code

prestwich and others added 10 commits April 21, 2026 12:33
…body iterator deadline

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a `metrics` module under `crates/cold/src/metrics.rs` with const
metric names, help strings, a `LazyLock` describe block, and
`pub(crate)` helper functions for recording:

- `cold.reads_in_flight`, `cold.writes_in_flight`, `cold.streams_active`
  (gauges)
- `cold.op_duration_us` (histogram, labeled by op)
- `cold.permit_wait_us` (histogram, labeled by sem: read/write/drain/stream)
- `cold.op_errors_total` (counter, labeled by op and error kind)
- `cold.stream_lifetime_ms` (histogram)

Wires the helpers into every `ColdStorage<B>` handle method:
`spawn_read` and `spawn_write` time permit acquisition, bump in-flight,
measure op duration, record errors, and dec in-flight after the backend
call. Cache hits in `get_header`/`get_transaction`/`get_receipt` record
op duration only (no permit wait, no in-flight). `stream_logs`
instruments stream permit wait and records stream lifetime + gauge in
the spawned producer.

Adds `ColdStorageError::kind()` for the error metric label.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@prestwich
Member Author

prestwich commented Apr 21, 2026

[Claude Code]

Code review

Went deep on the concurrency model. One real regression worth addressing, plus a resource-lifetime nit.

Found 2 issues:

  1. Fast-path cache reads bypass the drain barrier. Withdrawn on review. Cache values are populated only from successful backend reads and are block-number-keyed; a fast-path read during a concurrent truncate returns a value that was real at some point, which is equivalent to serializing the read before the truncate. Caller has no happens-before with the concurrent writer, so this is linearizable. The drain barrier's purpose is to protect the backend from read-during-commit interference, not to globally serialize cache lookups.

  2. Shutdown coordinator keeps Inner alive indefinitely if cancel never fires. ColdStorage::new spawns an untracked tokio::spawn that moves Arc<Inner<B>> and awaits cancel.cancelled(). If a caller drops all ColdStorage clones without firing the token, the backend (MDBX env, Postgres pool, file handles) is pinned until process exit. Fixed in 25b4b6c. Coordinator now holds Weak<Inner>; Inner carries a DropGuard on a child cancel token so shutdown fires on either user-side cancel or Inner drop, and the coordinator exits cleanly in both cases.

Separately, CI was red on signet-cold-mdbx::mdbx_backend_conformance: map_err(ColdStorageError::backend) was unconditionally wrapping MdbxColdError::TooManyLogs as Backend(Box<_>), shadowing the From<MdbxColdError> impl that already translates it. Fixed in 09c2981 by routing all spawn_blocking result conversions through ::from.

🤖 Generated with Claude Code


prestwich and others added 2 commits April 22, 2026 09:07
The coordinator task previously moved Arc<Inner<B>> into its body and
awaited the user's cancel token. If callers dropped all ColdStorage
clones without firing cancel, Inner (and the backend's file/DB handles)
stayed pinned until process exit.

Switch the coordinator to Weak<Inner>, and put a DropGuard on Inner that
fires a child cancel token. shutdown now fires on either user-side
cancel OR Inner drop; in the drop case upgrade() returns None and the
coordinator exits without pinning anything.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ColdStorageError::backend unconditionally wraps as Backend(Box<_>),
which hid MdbxColdError::TooManyLogs behind the generic backend variant
and broke the conformance suite's max_logs assertion. The
From<MdbxColdError> for ColdStorageError impl already translates
TooManyLogs correctly and wraps the rest. Route all spawn_blocking
result conversions through ::from so the translation runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Evalir Evalir requested review from Evalir, Fraser999 and rswanson April 24, 2026 09:31
Member

@Evalir Evalir left a comment

[Claude Code] Deep-read focused on invariants, panics, and perf regressions. Below are the actionable items I found, ordered roughly by severity.


1. block_in_place in MDBX writes panics on a current_thread runtime

crates/cold-mdbx/src/backend.rs:882,897,912,929 — every MDBX write (append_block, append_blocks, truncate_above, drain_above) now wraps its inner call in tokio::task::block_in_place. That API panics on a single-threaded runtime. Pre-PR, this work ran in the dispatcher task and never hit that requirement.

Reads in the same file correctly use spawn_blocking — it is only the writes. The new tests added for this path all pin #[tokio::test(flavor = "multi_thread", worker_threads = 2)], which hides the issue in CI. Any consumer of MdbxColdBackend on a current_thread runtime will panic on the first write.

Fix: either use spawn_blocking like the read path does, or loudly document the multi-threaded runtime requirement on MdbxColdBackend / ColdStorage::new.

2. Cache-hit readers bypass the drain barrier → return just-truncated blocks mid-write

crates/cold/src/handle.rs:243-248, 294-300, 367-373 — get_header, get_transaction, get_receipt check self.inner.cache before acquiring a read_sem permit. spawn_write invalidates the cache after the backend mutation inside the same spawned task:

self.spawn_write("truncate_above", move |inner| async move {
    let result = inner.backend.truncate_above(block).await;  // tens of ms for MDBX
    if result.is_ok() {
        inner.cache.lock().await.invalidate_above(block);
    }
    result
})

Ordering: drain barrier acquired → backend commits the truncate → cache invalidated. Between the backend commit and the cache invalidation, a cache-hit reader can still return a header/tx/receipt for a block the backend has already deleted. The cache_consistent_through_truncate test only checks the post-commit state — it does not exercise this window.

Fix: invalidate the cache before calling backend.truncate_above / backend.drain_above. The drain barrier already guarantees no spawn_read task is racing to re-populate, so pre-invalidation closes the window.

3. stream_logs setup holds stream_sem across an un-timeouted backend call

crates/cold/src/handle.rs:519-544 — the permit is acquired, then self.inner.backend.get_latest_block().await? runs before the task is spawned. The backend call has no timeout:

  • SQL: begin_read — pool.begin() can park up to sqlx's acquire_timeout (default 30 s) if the pool is saturated.
  • MDBX: get_latest_block is a point lookup which skips the deadline check entirely (see issue 5).

If 8 callers land here with a slow / hung backend, stream_sem is fully held and no new streams can start. The permit only exists to cap in-flight producer tasks; holding it across setup I/O is exactly the bug the PR #57 review (item 2) was meant to avoid.

Fix: either wrap the setup I/O in tokio::time::timeout(effective_deadline, …), or acquire the stream_sem permit after resolving to.

4. MDBX iterator timeouts are only checked between blocks, never within

crates/cold-mdbx/src/backend.rs:658, 483-487, 517-519, and the same pattern in produce_log_stream_blocking:125-131. get_logs_inner / collect_signet_events_in_range / collect_zenith_headers_in_range / the streaming producer check Instant::now() > deadline at the top of each block iteration — never inside the per-receipt or per-log loops. A single block with thousands of matching logs runs far past the 500 ms default before the next check.

In the streaming path it is worse: the deadline-error branch at block boundaries is the only place the deadline ever trips, and the inner blocking_send to a slow receiver can stall indefinitely per log.

Fix: add a bounded periodic check in the inner loops (every N logs/receipts), at minimum inside produce_log_stream_blocking's inner log loop so a slow consumer plus a big block cannot stall a spawn_blocking worker past the deadline.
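A bounded periodic check of the suggested kind might look like this (the constant and names are illustrative, not the crate's):

```rust
use std::time::Instant;

// Check the deadline every CHECK_EVERY items instead of once per block:
// bounds overrun to roughly CHECK_EVERY items of work while keeping
// Instant::now() off the per-item hot path.
const CHECK_EVERY: usize = 256;

fn scan_with_deadline(items: usize, deadline: Instant) -> Result<usize, &'static str> {
    let mut matched = 0;
    for i in 0..items {
        if i % CHECK_EVERY == 0 && Instant::now() > deadline {
            return Err("deadline exceeded mid-block");
        }
        matched += 1; // stand-in for per-log filter + send
    }
    Ok(matched)
}
```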

5. MDBX point lookups have no timeout enforcement at all

crates/cold-mdbx/src/backend.rs:700-704, 726-730, 750-754, 795-799, 861-875 — get_header, get_transaction, get_receipt, get_zenith_header, get_latest_block all spawn_blocking their inner call with no deadline and no timeout wrapper. The trait doc exempts point lookups, but MDBX point reads can still block on disk I/O for arbitrary durations (cold page fetch, adjacent writer doing a page split, etc.).

A stuck point lookup holds a spawn_blocking worker and a read_sem permit. Enough stuck lookups ⇒ read_sem exhausted, and the handle cannot fail fast because nothing resolves.

Fix: wrap each spawn_blocking in tokio::time::timeout(self.read_timeout, …). The blocking thread still runs to completion (unavoidable), but the handle releases its permit promptly and returns an error to the caller.

6. In-flight gauges leak on panic

crates/cold/src/handle.rs:154-168, 211-226, 550-560 — inc_in_flight runs before the spawned body, dec_in_flight runs at the end. If anything between them panics (f(inner).await, the backend call, record_op_duration, the cache .lock()), the decrement is skipped and cold.reads_in_flight / cold.writes_in_flight / cold.streams_active drifts up permanently.

Semaphore permits are Drop-based and do release correctly; only the gauges leak. These are the gauges the PR pitches Prometheus alerts against, so drift poisons the signal.

Fix: wrap the in-flight bookkeeping in a small Drop guard struct.
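A minimal sketch of such a guard, using a plain atomic in place of the real metrics helpers:

```rust
use std::sync::atomic::{AtomicI64, Ordering};

// The decrement lives in Drop, so it runs even if the spawned body panics
// and unwinds. A real version would call the crate's metrics helpers
// instead of touching an atomic directly.
struct InFlightGuard<'a> {
    gauge: &'a AtomicI64,
}

impl<'a> InFlightGuard<'a> {
    fn new(gauge: &'a AtomicI64) -> Self {
        gauge.fetch_add(1, Ordering::Relaxed);
        Self { gauge }
    }
}

impl Drop for InFlightGuard<'_> {
    fn drop(&mut self) {
        self.gauge.fetch_sub(1, Ordering::Relaxed);
    }
}
```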

7. Write-timeout WARN fires on error paths

crates/cold-mdbx/src/backend.rs:878-940 — pattern:

let result = tokio::task::block_in_place(|| …);
let elapsed = start.elapsed();
if elapsed > threshold {
    tracing::warn!(?elapsed, "mdbx … exceeded advisory write timeout");
}
Ok(result?)

If the write fails (disk full, corruption) after > 2 s, the caller sees ColdStorageError::Backend(…) and a misleading advisory-write-timeout WARN. Noisy for alerting, since the summary pitches this WARN as an SLO signal.

Fix: only warn on Ok.


Notes — not bugs, but worth an eye before merge

  • tokio::sync::Mutex<ColdCache> (handle.rs:56) — the cache operations hold the lock only across a synchronous LRU op. A std::sync::Mutex / parking_lot::Mutex would avoid the async-runtime yield on every cache access. Minor perf.
  • Removed error variants (Timeout, Backpressure, Cancelled) collapse into Backend for MDBX timeouts — downstream callers that used to match on Timeout now have to string-match. Worth confirming nothing does.
  • MemColdBackend silently violates the new "Timeouts (mandatory)" contract. The trait doc says exempt backends "may document their exemption explicitly" — MemColdBackend's type doc does not.

@prestwich
Member Author

wrt #2, this is intentional. returning slightly stale data > blocking

prestwich and others added 2 commits April 24, 2026 11:11
- stream_logs resolves `to` (and get_latest_block fallback) before
  acquiring stream_sem, so a stuck backend no longer pins all 8 permits
  across setup I/O.
- In-flight gauges are now maintained by an InFlightGuard RAII wrapper
  so the decrement survives a panic in the spawned body; previously a
  panic left cold.reads_in_flight / writes_in_flight / streams_active
  drifting up and poisoning the Prometheus alert signal.
- Promote timeout to a first-class ColdStorageError::DeadlineExceeded
  variant. MDBX Timeout now routes through it (not Backend), and
  downstream callers can match without downcasting. Fixes stale
  Backpressure references in the cold and storage READMEs and the
  signet-storage skill doc.
- ColdCache switches from tokio::sync::Mutex to parking_lot::Mutex.
  The cache only ever holds the lock across synchronous LRU ops, so
  the async mutex's yield-on-lock was pure overhead.
- MemColdBackend now explicitly documents its exemption from the
  trait's mandatory-timeout contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Writes (append_block, append_blocks, truncate_above, drain_above)
  now use tokio::task::spawn_blocking instead of block_in_place.
  block_in_place panics on a current_thread runtime, so any consumer
  wiring MdbxColdBackend into a single-threaded Tokio would hit the
  first write and crash. Added writes_work_on_current_thread_runtime
  regression test.
- Overrun WARN fires only on successful writes. A failed write that
  took > 2 s already surfaces Backend(...) to the caller; a spurious
  advisory-write-timeout WARN on the error path would poison any SLO
  alert built on that signal.
- Iterator reads gained inner-loop deadline checks: per-receipt in
  get_logs_inner, per-event in collect_signet_events_in_range,
  per-receipt + per-log in produce_log_stream_blocking. A block with
  many matching logs (or a slow stream consumer) can no longer run
  unbounded past the configured deadline.
- MdbxColdError::Timeout now maps to ColdStorageError::DeadlineExceeded
  (new variant) instead of Backend. Updated the existing timeout test
  to match on the variant directly.
- Documented the point-lookup timeout exemption: MDBX page I/O on
  cold pages can stall arbitrarily, and the handle does not wrap point
  lookups in a tokio::time::timeout, so a stuck lookup ties up a
  spawn_blocking worker AND a read_sem permit. Callers that need
  fail-fast behavior should wrap at the call site.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@prestwich
Member Author

[Claude Code]

Thanks @Evalir — solid review. Pushed two commits addressing everything except #2 (already-considered; see below).

| # | Finding | Action | Commit |
|---|---------|--------|--------|
| 1 | block_in_place panics on current_thread runtime | Writes switched to spawn_blocking; added writes_work_on_current_thread_runtime regression test | 7c9d50a |
| 2 | Cache-hit readers bypass drain barrier | Ignored. Identical to the finding I withdrew in the prior comment — cache values are block-number-keyed and populated only from successful backend reads, so a fast-path hit during a concurrent truncate linearizes before the truncate. The reader has no happens-before with the writer. Pre-invalidation would be cheap hardening but not a correctness fix. | — |
| 3 | stream_sem held across un-timeouted setup I/O | stream_logs now resolves to (incl. get_latest_block fallback) before acquiring the permit | 4f67ca7 |
| 4 | Iterator deadlines only checked between blocks | Added per-receipt check in get_logs_inner, per-event in collect_signet_events_in_range, per-receipt + per-log in produce_log_stream_blocking | 7c9d50a |
| 5 | Point lookups have no timeout | Documented. Exemption is intentional (MDBX native latency profile), but the type-level docs + with_read_timeout docs now explicitly call out the spawn_blocking worker / read_sem permit implications so callers know to wrap at the call site when fail-fast matters. | 7c9d50a |
| 6 | In-flight gauges leak on panic | Introduced InFlightGuard RAII wrapper; spawn_read / spawn_write / stream_logs all use it, so the decrement runs even on panic | 4f67ca7 |
| 7 | Advisory-write-timeout WARN fires on error paths | Overrun WARN now only fires on Ok; factored out to a small helper | 7c9d50a |

| Note | Action | Commit |
|------|--------|--------|
| N1 — tokio::sync::Mutex for cache is overkill | Switched to parking_lot::Mutex — no await is ever held across the guard, so the async mutex was pure overhead | 4f67ca7 |
| N2 — Removed Timeout/Backpressure/Cancelled collapse into Backend | Promoted to a first-class ColdStorageError::DeadlineExceeded(Duration) variant. MDBX Timeout now routes through it. Audited downstream — no runtime code matched on the removed variants; fixed stale Backpressure references in the cold and storage READMEs and the signet-storage skill doc. | 4f67ca7 |
| N3 — MemColdBackend silently violates the "Timeouts (mandatory)" contract | Added an explicit # Timeout exemption section to the type docs, noting it's test-only and MUST NOT be used for SLO/prod | 4f67ca7 |

Verified against the pre-push hook suite: cargo +nightly fmt --check, workspace clippy (--all-features and --no-default-features, -D warnings), RUSTDOCFLAGS="-D warnings" cargo doc, and the cold / cold-mdbx / storage test suites all green.

@Evalir Evalir self-requested a review April 27, 2026 08:47
Comment thread crates/cold-mdbx/src/backend.rs Outdated
for result in iter {
// Per-receipt deadline check bounds iteration cost across
// blocks with many receipts.
if std::time::Instant::now() > deadline {
Contributor

nit: std::time::Instant could be just Instant (same thing in a few places in this file).

Comment thread crates/cold-mdbx/src/backend.rs Outdated
Comment on lines +949 to +951
let threshold = self.write_timeout;
let this = self.clone();
let start = Instant::now();
Contributor

[Claude Code]

Capturing start here means the SLO measurement only covers spawn_blocking queueing + the MDBX commit. It excludes the handle-side wait on write_sem and the drain barrier, which is precisely the failure shape that wedged production at #56 (slow readers blocking writes). A 3 s drain followed by a 100 ms commit measures 100 ms here, well under the 2 s threshold, and the WARN never fires.

Same applies to append_blocks, truncate_above, and drain_above.

Suggest hoisting the SLO timing into ColdStorage::spawn_write so it captures start before write_sem.acquire_owned, then reading the configured threshold via a new ColdStorageBackend::write_timeout(&self) -> Option<Duration> accessor (defaulting to None so MemColdBackend opts out cleanly). The backend method then becomes a thin spawn_blocking wrapper with no timing logic, and warn_on_overrun can be deleted.

Trade-off worth calling out in the WARN message or commit notes: an e2e measurement will surface reader-side contention as write SLO violations. That is arguably exactly the signal you want given #56, but it is a deliberate change in what the alert means.

Comment on lines +29 to +32
assert!(
msg.contains("canceling") || msg.contains("57014") || msg.contains("timeout"),
"expected statement_timeout error, got: {err}"
);
Contributor

[Claude Code]

String-matching here hides that PG statement_timeout (SQLSTATE 57014) reaches the caller as Backend(...), not DeadlineExceeded. MDBX timeouts route to DeadlineExceeded (crates/cold-mdbx/src/error.rs:33); PG doesn't, because From<SqlColdError> for ColdStorageError (crates/cold-sql/src/error.rs:15-19) blindly wraps every sqlx error as Backend. Result: callers can't match on timeout without string-matching, and cold.op_errors_total{error=backend} collapses "query too slow" with "backend down".

Suggest: add SqlColdError::Timeout(Duration), detect 57014 at the SQL boundary (carrying the configured read_timeout/write_timeout), map to DeadlineExceeded(d). Then this assertion becomes assert!(matches!(err, SqlColdError::Timeout(_))) and a future refactor that drops timeout detection actually fails the test.
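A simplified sketch of the suggested boundary mapping (real code would match on sqlx::Error::Database and its .code(); these enums are stand-ins for the crate's types):

```rust
use std::time::Duration;

// Simplified stand-ins for the crate's error types.
#[derive(Debug, PartialEq)]
enum SqlColdError {
    Timeout(Duration),
    Other(String),
}

#[derive(Debug, PartialEq)]
enum ColdStorageError {
    DeadlineExceeded(Duration),
    Backend(String),
}

// SQLSTATE 57014 = query_canceled, which Postgres raises when
// statement_timeout expires. Everything else stays a generic backend error.
fn classify(sqlstate: Option<&str>, msg: &str, configured: Duration) -> SqlColdError {
    match sqlstate {
        Some("57014") => SqlColdError::Timeout(configured),
        _ => SqlColdError::Other(msg.to_owned()),
    }
}

impl From<SqlColdError> for ColdStorageError {
    fn from(e: SqlColdError) -> Self {
        match e {
            SqlColdError::Timeout(d) => ColdStorageError::DeadlineExceeded(d),
            SqlColdError::Other(m) => ColdStorageError::Backend(m),
        }
    }
}
```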

) -> Result<sqlx::Transaction<'a, sqlx::Any>, SqlColdError> {
if self.is_postgres {
let ms = i64::try_from(d.as_millis()).unwrap_or(i64::MAX);
sqlx::query(&format!("SET LOCAL statement_timeout = {ms}")).execute(&mut *tx).await?;
Contributor

[Claude Code]

In PG, SET LOCAL statement_timeout = 0 means "no timeout". The builders with_read_timeout / with_write_timeout (line 154, 165 here, plus the SqlConnector mirrors) accept any Duration including Duration::ZERO, so a caller passing zero (or computing one from a config that defaults to zero) silently disables the trait-level "Timeouts (mandatory)" contract at crates/cold/src/traits.rs:267-296.

Suggest either rejecting zero / sub-millisecond values at the builder (runtime check on first use, or non-const builder returning Result), or documenting the disable behaviour explicitly on with_read_timeout / with_write_timeout. Either way the trait-level "MUST enforce" wording should be reconciled with what the SQL backend actually allows.
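One possible builder-side guard, sketched with a Result rather than a panic (names are mine):

```rust
use std::time::Duration;

// Postgres interprets statement_timeout = 0 as "no timeout", and sub-ms
// values truncate to 0 ms — both silently disable the mandatory-timeout
// contract. Reject them up front at the builder.
fn validate_timeout(d: Duration) -> Result<Duration, String> {
    if d < Duration::from_millis(1) {
        Err(format!(
            "timeout {d:?} truncates to 0 ms, which disables statement_timeout in Postgres"
        ))
    } else {
        Ok(d)
    }
}
```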

Comment on lines +719 to +721
if Instant::now() > deadline {
return Err(MdbxColdError::Timeout(read_timeout));
}
Contributor

[Claude Code]

This per-receipt check doesn't bound the inner per-log loop. A single receipt with thousands of matching logs (large mints, batched events) iterates here unchecked. The streaming path (produce_log_stream_blocking:159-194) checks per-log; the non-streaming path inconsistently doesn't.

Comment thread crates/storage/src/unified.rs Outdated
///
/// - [`Hot`]: Hot storage read or unwind failed.
/// - [`Cold`]: Hot storage unwound but cold truncate dispatch failed.
/// - [`Cold`]: Hot storage unwound but cold drain failed.
Contributor

[Claude Code]

This claim is wrong as written. The implementation at line 212 below uses self.cold.drain_above(block).await.unwrap_or_default(), which silently turns every cold error (TaskTerminated, Backend, DeadlineExceeded, etc.) into Vec::new(). StorageError::Cold is never returned from this method.

Pre-existing behaviour, but the doc rewrite in this PR added the claim, so the gap between docs and impl is new. Suggest either:

  • Restore real error propagation (return Cold on at least non-TaskTerminated failures).
  • Keep the silent-swallow behaviour and update the doc + the "best-effort - failure = normal cold lag" comment to admit it. Currently that comment conflates "cold returned an error" with "cold has no data above" - two different states that operators need to distinguish.

Comment thread crates/cold/src/handle.rs Outdated
.in_current_span(),
)
.await
.map_err(|_| ColdStorageError::TaskTerminated)?
Contributor

[Claude Code]

This JoinError only fires on panic in normal operation - tracked tasks aren't dropped by the runtime, so the only way to get one here is the spawned body panicking inside f(inner).await (i.e. backend code). Mapping it to TaskTerminated tells the caller "shutdown", which on-call will read as "graceful exit", not "backend panicked". The original error is also swallowed silently - no log of the panic message or location.

Same pattern at line 231 (spawn_write).

Suggest at minimum logging the JoinError at ERROR before mapping. Optionally introduce a ColdStorageError::BackendPanic variant so panics are distinguishable from semaphore-close shutdown - the two failure modes have very different ops responses (panic = bug to chase, TaskTerminated = expected shutdown signal).

Comment thread crates/cold/src/handle.rs Outdated
// in-flight write.
let to = match filter.get_to_block() {
Some(to) => to,
None => match self.inner.backend.get_latest_block().await? {
Contributor

[Claude Code]

This setup get_latest_block runs before any permit acquisition (deliberate, per the comment above) AND without caller-side timeout protection. With MDBX, point lookups have no in-body deadline so it can stall on disk I/O indefinitely. With PG, pool.begin() inside begin_read can park for acquire_timeout (default 30 s) on a saturated pool. Either way, any number of concurrent stream_logs callers can issue these backend reads at once, with no cap at all.

tests/stream_isolation.rs:36-43 uses a bounded to_block, so this code path isn't regression-tested. With a GatedBackend that gates point lookups, you could pin a setup-stall scenario.

Suggest one of:

  • Wrap this call in tokio::time::timeout(self.inner.setup_timeout, ...) so it can fail fast (with the standard caveat that the future drops but backend work continues - the same trade-off the rest of the design accepts).
  • Move stream_sem.acquire_owned() to before this call. That re-introduces the original concern (a stuck setup pins stream permits), but bounds the call by stream_sem rather than leaving it completely uncapped.

Comment thread crates/cold/src/traits.rs Outdated
Contributor

[Claude Code]

Stale doc: the signatures below all take &self, and the backend is no longer "exclusively owned by the task runner" - the handle wraps it in Arc<Inner> and shares it across spawned tasks.

prestwich and others added 4 commits April 30, 2026 13:07
The PR-#58 doc rewrite advertised a `Cold` error path, but the impl
collapses every cold error into `Vec::new()`. Update the doc + comment
to admit silent-swallow behaviour. ENG-2210 tracks the propagation
decision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SQLSTATE 57014 (query_canceled, emitted on `statement_timeout` expiry)
was wrapped as `Sqlx(...)` and surfaced to the handle as `Backend(...)`,
breaking symmetry with the MDBX backend (which routes its `Timeout` to
`ColdStorageError::DeadlineExceeded`). The metric
`cold.op_errors_total{error="backend"}` therefore conflated "query too
slow" with "backend down".

`From<sqlx::Error> for SqlColdError` now detects 57014 and produces a
dedicated `Timeout` variant; `From<SqlColdError> for ColdStorageError`
maps it to `DeadlineExceeded`. The configured deadline is not threaded
to this conversion boundary so the surfaced duration is `ZERO`;
threading the real value is a separate refactor (left for a follow-up
once the call sites are confirmed to need it).

The `pg_statement_timeout` test is rewritten to match on the typed
variant rather than a substring of the error message — a future
refactor that drops 57014 detection now fails the test instead of
silently passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two accessors to the `ColdStorageBackend` trait:

    fn read_timeout(&self) -> Option<Duration> { None }
    fn write_timeout(&self) -> Option<Duration> { None }

Wired through `MdbxColdBackend`, `SqlColdBackend`, and `EitherCold`.
`MemColdBackend` returns `None` (already-documented test exemption).

Two behaviour changes use these:

1. The advisory write-SLO WARN moves from the MDBX backend
   (`warn_on_overrun` per-method) to `ColdStorage::spawn_write`. Timing
   is now captured before `write_sem` acquisition, so the elapsed
   value covers the queue wait, the read drain, and the commit
   end-to-end. The failure shape that wedged production at #56 — slow
   readers gating writes — now surfaces as a write-SLO violation
   rather than as a sub-threshold backend timing.

2. `stream_logs`'s setup `get_latest_block` is wrapped in
   `tokio::time::timeout(backend.read_timeout(), ...)`. Without this,
   a stuck point lookup (cold MDBX page) or a saturated PG pool
   parking on `acquire_timeout` could pin N concurrent setup callers
   indefinitely with no permit cap. The setup read still bypasses
   `read_sem` and the drain barrier by design.

Also drops the now-unused `tracing` dep from `signet-cold-mdbx` and
updates the type docs to point at the handle's new WARN path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Builders for `read_timeout` / `write_timeout` on both the MDBX and SQL
backends and connectors now panic on zero. Postgres treats
`statement_timeout = 0` as "no timeout", so a caller passing
`Duration::ZERO` (or computing one from a config that defaults to
zero) would silently disable the trait-level mandatory-timeout
contract. MDBX accepts the same assert for symmetry — zero there is a
useless config rather than a silent disable, but the trait says
non-zero and the assert keeps the surface honest.

`spawn_read` / `spawn_write` now log spawned-task `JoinError`s before
mapping to `TaskTerminated`. A backend panic was previously
indistinguishable from graceful shutdown for the on-call: panics fire
ERROR with the panic message, cancellations fire DEBUG. The error
variant still collapses to `TaskTerminated` per design.

`MdbxColdBackend::get_logs_inner` now checks the deadline inside the
inner per-log loop, mirroring the streaming path. Previously a single
receipt with thousands of matching logs would iterate unchecked past
the configured `read_timeout`. The two `std::time::Instant::now()`
sites in `produce_log_stream_blocking` are also folded into the
already-imported `Instant`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@prestwich
Member Author

[Claude Code]

Thanks @Fraser999 — pushed four commits addressing all nine inline comments. Summary:

| # | Finding | Action | Commit |
|---|---------|--------|--------|
| 1 | `std::time::Instant::now()` where `Instant` is already in scope | Folded both sites in `produce_log_stream_blocking`. | e6d44e4 |
| 2 | Write-SLO start captured inside `spawn_blocking`, missing `write_sem` + drain wait | Hoisted timing into `ColdStorage::spawn_write`; added `ColdStorageBackend::write_timeout()` accessor; deleted `MdbxColdBackend::warn_on_overrun`. WARN now measures end-to-end (queue + drain + commit), so reader contention surfaces as a write-SLO violation, an explicit signal for the #56 failure shape. | f5fdaae |
| 3 | PG SQLSTATE 57014 reaches the caller as `Backend(...)`, not `DeadlineExceeded`; test was string-matching | Added `SqlColdError::Timeout`; `From<sqlx::Error>` now detects 57014; `From<SqlColdError> for ColdStorageError` maps it to `DeadlineExceeded`. Test now matches on the typed variant. The configured deadline is not threaded to this conversion boundary, so the surfaced `Duration` is `ZERO` (variant correctness > duration precision; threading the real value is a separate refactor). | d3e3210 |
| 4 | `with_read_timeout(Duration::ZERO)` silently disables PG `statement_timeout` | Builders are now non-const and `assert!` non-zero on both MDBX and SQL backends + connectors. Failure is loud at boot rather than silent at runtime. | e6d44e4 |
| 5 | `get_logs_inner` per-receipt deadline check doesn't bound the inner per-log loop | Added a per-log `Instant::now() > deadline` check inside the receipt loop, mirroring the streaming path. | e6d44e4 |
| 6 | `unified::drain_above` doc claims a `Cold` error path; impl is `unwrap_or_default` | Doc + comment rewritten to admit the silent-swallow behaviour. Propagation/deletion decision tracked in ENG-2210 under ENG-2198. | 0b965bf |
| 7 | `JoinError` mapped to `TaskTerminated` swallows backend panics | `spawn_read` / `spawn_write` now log the `JoinError` before mapping: panics fire ERROR with the panic message, cancellations fire DEBUG. The variant still collapses to `TaskTerminated` per design. | e6d44e4 |
| 8 | `stream_logs` setup `get_latest_block` runs without permit cap or timeout | Wrapped in `tokio::time::timeout(backend.read_timeout(), ...)` with a 500 ms fallback for backends that don't advertise a timeout. Standard caveat applies (future drops on timeout; backend work continues). | f5fdaae |
| 9 | Stale trait doc on `ColdStorageWrite` ("exclusively owned by the task runner") | Rewritten to describe `&self` shared-via-`Arc` ownership. | f5fdaae |

Pre-push hook clean: cargo +nightly fmt --check, workspace clippy --all-features and --no-default-features (-D warnings), RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps, and the cold / cold-mdbx / cold-sql / storage test suites all green.

Pins the new setup-timeout behaviour against regression. The test
parks `GatedBackend::get_latest_block` indefinitely and asserts that
`stream_logs` (with no `to_block` on the filter, forcing the
"resolve to=latest" path) returns `DeadlineExceeded` within the
configured 50 ms `read_timeout` rather than hanging.

Adds `GatedBackend::with_read_timeout` so tests can advertise a
custom read timeout to the handle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

@Fraser999 left a comment

LGTM now. Would be nice to ticket threading the timeout value through to the actual error, since hard-coding ZERO isn't ideal, but this would be an easy standalone follow-up :)

@prestwich prestwich merged commit 783cdb2 into main May 4, 2026
7 checks passed
@prestwich prestwich deleted the prestwich/cold-unified-architecture branch May 4, 2026 15:58