
cold: unified handle architecture (supersedes #57)#58

Merged
prestwich merged 19 commits into main from
prestwich/cold-unified-architecture
May 4, 2026

Conversation

@prestwich
Member

Summary

Collapse cold storage from channels+dispatcher+writer-task into a single ColdStorage<B: ColdStorageBackend> handle over Arc<Inner<B>>. All concurrency primitives (read_sem(64), write_sem(1), stream_sem(8), TaskTracker, CancellationToken) live on Inner; all backend work spawns into the shared tracker; spawned tasks hold permits for the real duration of the backend call.

Supersedes #57. Tracks Linear ENG-2198.

Fixes three PR #57 review issues:

  1. Semaphore acquisition was scattered across three places — now all in the handle.
  2. StreamLogs setup held the outer read permit across unbounded awaits — streams now acquire only stream_sem; setup reads are isolated from the read pool by design (a stream requesting "latest" should observe latest at setup).
  3. Dispatcher-side timeout did not cancel backend work, so permits were released early — timeouts are now mandatory in the ColdStorageBackend trait and enforced at the backend layer, so permits honestly reflect in-flight work.

Shape:

  • ColdStorage<B> is clone-cheap (one Arc refcount bump); replaces ColdStorageHandle + ColdStorageReadHandle + ColdStorageTask.
  • ColdStorageWrite now takes &self across all backends (Mem / MDBX / SQL updated in lockstep).
  • Reads acquire a read permit in the handle and spawn; cache-hit fast path returns in the caller's task without permit/spawn.
  • Writes acquire write_sem then the drain barrier (read_sem.acquire_many_owned(64)); destructive writes invalidate the cache inside the spawned body while holding the drain.
  • Streams acquire only stream_sem; producer runs in the shared tracker.
  • Hidden shutdown coordinator task watches the cancel token and closes all three semaphores + the tracker; handle calls then fail fast with TaskTerminated.

Backend timeouts (mandatory in trait contract):

  • MDBX: tokio::task::spawn_blocking for reads, tokio::task::block_in_place for writes, in-body deadline check between per-block iterations. Post-commit WARN on overrun (advisory — MDBX commits are uninterruptible).
  • SQL: SET LOCAL statement_timeout = <ms> at the start of every Postgres transaction; SQLite skips (no equivalent).
  • Both backends expose with_read_timeout / with_write_timeout builders on backend and connector. Defaults 500ms / 2s.
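On the Postgres side, the per-transaction statement reduces to string formatting; a hedged helper sketch (the function name is mine, only the `SET LOCAL` form comes from the PR):

```rust
use std::time::Duration;

// SET LOCAL scopes the timeout to the current transaction, so it resets
// automatically on COMMIT/ROLLBACK. Clamp to i64 to survive absurd Durations.
fn statement_timeout_sql(d: Duration) -> String {
    let ms = i64::try_from(d.as_millis()).unwrap_or(i64::MAX);
    format!("SET LOCAL statement_timeout = {ms}")
}
```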

Observability:

  • New crates/cold/src/metrics.rs following the ajj pattern.
  • Gauges: cold.reads_in_flight, cold.writes_in_flight, cold.streams_active.
  • Histograms: cold.op_duration_us, cold.permit_wait_us, cold.stream_lifetime_ms.
  • Counter: cold.op_errors_total.
  • #[tracing::instrument] on every handle method with stable op field; spans propagate into spawned tasks via .in_current_span().
  • Alerting is external (Prometheus on gauges); no in-process watchdog.

Error variants removed: Timeout, Backpressure, Cancelled. Backend timeouts now map to Backend. TaskTerminated remains and is returned from every handle method when the semaphore is closed.

Rollout: 10 commits, one per phase, each building + testing clean in isolation:

  1. `b539c8f` refactor(cold): ColdStorageWrite takes &self; all backends updated in lockstep
  2. `31e1c13` refactor(cold): unify handle around Arc; remove channels and dispatcher
  3. `b0d3063` fix(cold): stream permit acquired in handle; streams do not hold a read permit
  4. `9881f03` feat(cold): drain barrier moves to handle write path
  5. `9426d88` feat(cold): shutdown coordinator closes semaphores on cancel
  6. `902aab6` refactor(cold-mdbx): spawn_blocking reads, block_in_place writes, in-body iterator deadline
  7. `727c054` feat(cold-sql): mandatory statement_timeout; read_timeout and write_timeout builders
  8. `978e2af` feat(cold): metrics and tracing spans across all operations
  9. `3339c87` docs(cold): trait impl guide documents mandatory timeouts
  10. `cb06741` test(cold): concurrency suite covers new architecture

Spec: `docs/superpowers/specs/2026-04-20-cold-unified-architecture-design.md` (local — `docs/` is gitignored).

Test plan

  • `cargo +nightly fmt -- --check`
  • `cargo clippy --workspace --all-targets --all-features -- -D warnings`
  • `cargo clippy --workspace --all-targets --no-default-features -- -D warnings`
  • `RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps`
  • `cargo t --workspace` — all suites green, including:
    • `tests/handle_shape.rs` (clone-cheap handle)
    • `tests/stream_isolation.rs` (stream setup not blocked by saturated reads)
    • `tests/drain_barrier.rs` (write waits for reads to drain)
    • `tests/shutdown.rs` (acquire fails fast after cancel)
    • `tests/concurrency.rs` (7 scenarios: 256-way reads, fairness, cancel during backpressure, cancel during drain, stream permit cancel, cache-through-truncate)
    • `crates/cold-mdbx/tests/timeout.rs` (iterator deadline trips; point lookups bypass)
    • `crates/cold-sql/tests/statement_timeout.rs` (gated on Postgres; skip locally without DATABASE_URL)
  • Reviewer validation path: cut a patch release, bump `init4tech/node-components`, rebuild `signet-sidecar:latest`, redeploy to dev mainnet, confirm no backpressure-induced crashes and no permit-wait regression on the new metrics.

🤖 Generated with Claude Code

prestwich and others added 10 commits April 21, 2026 12:33
…body iterator deadline

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a `metrics` module under `crates/cold/src/metrics.rs` with const
metric names, help strings, a `LazyLock` describe block, and
`pub(crate)` helper functions for recording:

- `cold.reads_in_flight`, `cold.writes_in_flight`, `cold.streams_active`
  (gauges)
- `cold.op_duration_us` (histogram, labeled by op)
- `cold.permit_wait_us` (histogram, labeled by sem: read/write/drain/stream)
- `cold.op_errors_total` (counter, labeled by op and error kind)
- `cold.stream_lifetime_ms` (histogram)

Wires the helpers into every `ColdStorage<B>` handle method:
`spawn_read` and `spawn_write` time permit acquisition, bump in-flight,
measure op duration, record errors, and dec in-flight after the backend
call. Cache hits in `get_header`/`get_transaction`/`get_receipt` record
op duration only (no permit wait, no in-flight). `stream_logs`
instruments stream permit wait and records stream lifetime + gauge in
the spawned producer.

Adds `ColdStorageError::kind()` for the error metric label.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@prestwich
Member Author

prestwich commented Apr 21, 2026

[Claude Code]

Code review

Went deep on the concurrency model. One real regression worth addressing, plus a resource-lifetime nit.

Found 2 issues:

  1. Fast-path cache reads bypass the drain barrier. Withdrawn on review. Cache values are populated only from successful backend reads and are block-number-keyed; a fast-path read during a concurrent truncate returns a value that was real at some point, which is equivalent to serializing the read before the truncate. Caller has no happens-before with the concurrent writer, so this is linearizable. The drain barrier's purpose is to protect the backend from read-during-commit interference, not to globally serialize cache lookups.

  2. Shutdown coordinator keeps Inner alive indefinitely if cancel never fires. ColdStorage::new spawns an untracked tokio::spawn that moves Arc<Inner<B>> and awaits cancel.cancelled(). If a caller drops all ColdStorage clones without firing the token, the backend (MDBX env, Postgres pool, file handles) is pinned until process exit. Fixed in 25b4b6c. Coordinator now holds Weak<Inner>; Inner carries a DropGuard on a child cancel token so shutdown fires on either user-side cancel or Inner drop, and the coordinator exits cleanly in both cases.

Separately, CI was red on signet-cold-mdbx::mdbx_backend_conformance: map_err(ColdStorageError::backend) was unconditionally wrapping MdbxColdError::TooManyLogs as Backend(Box<_>), shadowing the From<MdbxColdError> impl that already translates it. Fixed in 09c2981 by routing all spawn_blocking result conversions through ::from.

🤖 Generated with Claude Code


prestwich and others added 2 commits April 22, 2026 09:07
The coordinator task previously moved Arc<Inner<B>> into its body and
awaited the user's cancel token. If callers dropped all ColdStorage
clones without firing cancel, Inner (and the backend's file/DB handles)
stayed pinned until process exit.

Switch the coordinator to Weak<Inner>, and put a DropGuard on Inner that
fires a child cancel token. shutdown now fires on either user-side
cancel OR Inner drop; in the drop case upgrade() returns None and the
coordinator exits without pinning anything.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ColdStorageError::backend unconditionally wraps as Backend(Box<_>),
which hid MdbxColdError::TooManyLogs behind the generic backend variant
and broke the conformance suite's max_logs assertion. The
From<MdbxColdError> for ColdStorageError impl already translates
TooManyLogs correctly and wraps the rest. Route all spawn_blocking
result conversions through ::from so the translation runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Evalir Evalir requested review from Evalir, Fraser999 and rswanson April 24, 2026 09:31
Member

@Evalir Evalir left a comment

[Claude Code] Deep-read focused on invariants, panics, and perf regressions. Below are the actionable items I found, ordered roughly by severity.


1. block_in_place in MDBX writes panics on a current_thread runtime

crates/cold-mdbx/src/backend.rs:882,897,912,929 — every MDBX write (append_block, append_blocks, truncate_above, drain_above) now wraps its inner call in tokio::task::block_in_place. That API panics on a single-threaded runtime. Pre-PR, this work ran in the dispatcher task and never hit that requirement.

Reads in the same file correctly use spawn_blocking — it is only the writes. The new tests added for this path all pin #[tokio::test(flavor = "multi_thread", worker_threads = 2)], which hides the issue in CI. Any consumer of MdbxColdBackend on a current_thread runtime will panic on the first write.

Fix: either use spawn_blocking like the read path does, or loudly document the multi-threaded runtime requirement on MdbxColdBackend / ColdStorage::new.

2. Cache-hit readers bypass the drain barrier → return just-truncated blocks mid-write

crates/cold/src/handle.rs:243-248, 294-300, 367-373 — get_header, get_transaction, get_receipt check self.inner.cache before acquiring a read_sem permit. spawn_write invalidates the cache after the backend mutation inside the same spawned task:

self.spawn_write("truncate_above", move |inner| async move {
    let result = inner.backend.truncate_above(block).await;  // tens of ms for MDBX
    if result.is_ok() {
        inner.cache.lock().await.invalidate_above(block);
    }
    result
})

Ordering: drain barrier acquired → backend commits the truncate → cache invalidated. Between the backend commit and the cache invalidation, a cache-hit reader can still return a header/tx/receipt for a block the backend has already deleted. The cache_consistent_through_truncate test only checks the post-commit state — it does not exercise this window.

Fix: invalidate the cache before calling backend.truncate_above / backend.drain_above. The drain barrier already guarantees no spawn_read task is racing to re-populate, so pre-invalidation closes the window.

3. stream_logs setup holds stream_sem across an un-timeouted backend call

crates/cold/src/handle.rs:519-544 — the permit is acquired, then self.inner.backend.get_latest_block().await? runs before the task is spawned. The backend call has no timeout:

  • SQL: begin_read — pool.begin() can park up to sqlx's acquire_timeout (default 30 s) if the pool is saturated.
  • MDBX: get_latest_block is a point lookup which skips the deadline check entirely (see issue 5).

If 8 callers land here with a slow / hung backend, stream_sem is fully held and no new streams can start. The permit only exists to cap in-flight producer tasks; holding it across setup I/O is exactly the bug the PR #57 review (item 2) was meant to avoid.

Fix: either wrap the setup I/O in tokio::time::timeout(effective_deadline, …), or acquire the stream_sem permit after resolving to.

4. MDBX iterator timeouts are only checked between blocks, never within

crates/cold-mdbx/src/backend.rs:658, 483-487, 517-519, and the same pattern in produce_log_stream_blocking:125-131. get_logs_inner / collect_signet_events_in_range / collect_zenith_headers_in_range / the streaming producer check Instant::now() > deadline at the top of each block iteration — never inside the per-receipt or per-log loops. A single block with thousands of matching logs runs far past the 500 ms default before the next check.

In the streaming path it is worse: the deadline-error branch at block boundaries is the only place the deadline ever trips, and the inner blocking_send to a slow receiver can stall indefinitely per log.

Fix: add a bounded periodic check in the inner loops (every N logs/receipts), at minimum inside produce_log_stream_blocking's inner log loop so a slow consumer plus a big block cannot stall a spawn_blocking worker past the deadline.
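A bounded periodic check of the suggested kind might look like this (the constant and names are illustrative, not the crate's):

```rust
use std::time::Instant;

// Check the deadline every CHECK_EVERY items instead of once per block:
// bounds overrun to roughly CHECK_EVERY items of work while keeping
// Instant::now() off the per-item hot path.
const CHECK_EVERY: usize = 256;

fn scan_with_deadline(items: usize, deadline: Instant) -> Result<usize, &'static str> {
    let mut matched = 0;
    for i in 0..items {
        if i % CHECK_EVERY == 0 && Instant::now() > deadline {
            return Err("deadline exceeded mid-block");
        }
        matched += 1; // stand-in for per-log filter + send
    }
    Ok(matched)
}
```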

5. MDBX point lookups have no timeout enforcement at all

crates/cold-mdbx/src/backend.rs:700-704, 726-730, 750-754, 795-799, 861-875 — get_header, get_transaction, get_receipt, get_zenith_header, get_latest_block all spawn_blocking their inner call with no deadline and no timeout wrapper. The trait doc exempts point lookups, but MDBX point reads can still block on disk I/O for arbitrary durations (cold page fetch, adjacent writer doing a page split, etc.).

A stuck point lookup holds a spawn_blocking worker and a read_sem permit. Enough stuck lookups ⇒ read_sem exhausted, and the handle cannot fail fast because nothing resolves.

Fix: wrap each spawn_blocking in tokio::time::timeout(self.read_timeout, …). The blocking thread still runs to completion (unavoidable), but the handle releases its permit promptly and returns an error to the caller.

6. In-flight gauges leak on panic

crates/cold/src/handle.rs:154-168, 211-226, 550-560 — inc_in_flight runs before the spawned body, dec_in_flight runs at the end. If anything between them panics (f(inner).await, the backend call, record_op_duration, the cache .lock()), the decrement is skipped and cold.reads_in_flight / cold.writes_in_flight / cold.streams_active drifts up permanently.

Semaphore permits are Drop-based and do release correctly; only the gauges leak. These are the gauges the PR pitches Prometheus alerts against, so drift poisons the signal.

Fix: wrap the in-flight bookkeeping in a small Drop guard struct.
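A minimal sketch of such a guard, using a plain atomic in place of the real metrics helpers:

```rust
use std::sync::atomic::{AtomicI64, Ordering};

// The decrement lives in Drop, so it runs even if the spawned body panics
// and unwinds. A real version would call the crate's metrics helpers
// instead of touching an atomic directly.
struct InFlightGuard<'a> {
    gauge: &'a AtomicI64,
}

impl<'a> InFlightGuard<'a> {
    fn new(gauge: &'a AtomicI64) -> Self {
        gauge.fetch_add(1, Ordering::Relaxed);
        Self { gauge }
    }
}

impl Drop for InFlightGuard<'_> {
    fn drop(&mut self) {
        self.gauge.fetch_sub(1, Ordering::Relaxed);
    }
}
```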

7. Write-timeout WARN fires on error paths

crates/cold-mdbx/src/backend.rs:878-940 — pattern:

let result = tokio::task::block_in_place(|| …);
let elapsed = start.elapsed();
if elapsed > threshold {
    tracing::warn!(?elapsed, "mdbx … exceeded advisory write timeout");
}
Ok(result?)

If the write fails (disk full, corruption) after > 2 s, the caller sees ColdStorageError::Backend(…) and a misleading advisory-write-timeout WARN. Noisy for alerting, since the summary pitches this WARN as an SLO signal.

Fix: only warn on Ok.


Notes — not bugs, but worth an eye before merge

  • tokio::sync::Mutex<ColdCache> (handle.rs:56) — the cache operations hold the lock only across a synchronous LRU op. A std::sync::Mutex / parking_lot::Mutex would avoid the async-runtime yield on every cache access. Minor perf.
  • Removed error variants (Timeout, Backpressure, Cancelled) collapse into Backend for MDBX timeouts — downstream callers that used to match on Timeout now have to string-match. Worth confirming nothing does.
  • MemColdBackend silently violates the new "Timeouts (mandatory)" contract. The trait doc says exempt backends "may document their exemption explicitly" — MemColdBackend's type doc does not.

@prestwich
Member Author

wrt #2, this is intentional. returning slightly stale data > blocking

prestwich and others added 2 commits April 24, 2026 11:11
- stream_logs resolves `to` (and get_latest_block fallback) before
  acquiring stream_sem, so a stuck backend no longer pins all 8 permits
  across setup I/O.
- In-flight gauges are now maintained by an InFlightGuard RAII wrapper
  so the decrement survives a panic in the spawned body; previously a
  panic left cold.reads_in_flight / writes_in_flight / streams_active
  drifting up and poisoning the Prometheus alert signal.
- Promote timeout to a first-class ColdStorageError::DeadlineExceeded
  variant. MDBX Timeout now routes through it (not Backend), and
  downstream callers can match without downcasting. Fixes stale
  Backpressure references in the cold and storage READMEs and the
  signet-storage skill doc.
- ColdCache switches from tokio::sync::Mutex to parking_lot::Mutex.
  The cache only ever holds the lock across synchronous LRU ops, so
  the async mutex's yield-on-lock was pure overhead.
- MemColdBackend now explicitly documents its exemption from the
  trait's mandatory-timeout contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Writes (append_block, append_blocks, truncate_above, drain_above)
  now use tokio::task::spawn_blocking instead of block_in_place.
  block_in_place panics on a current_thread runtime, so any consumer
  wiring MdbxColdBackend into a single-threaded Tokio would hit the
  first write and crash. Added writes_work_on_current_thread_runtime
  regression test.
- Overrun WARN fires only on successful writes. A failed write that
  took > 2 s already surfaces Backend(...) to the caller; a spurious
  advisory-write-timeout WARN on the error path would poison any SLO
  alert built on that signal.
- Iterator reads gained inner-loop deadline checks: per-receipt in
  get_logs_inner, per-event in collect_signet_events_in_range,
  per-receipt + per-log in produce_log_stream_blocking. A block with
  many matching logs (or a slow stream consumer) can no longer run
  unbounded past the configured deadline.
- MdbxColdError::Timeout now maps to ColdStorageError::DeadlineExceeded
  (new variant) instead of Backend. Updated the existing timeout test
  to match on the variant directly.
- Documented the point-lookup timeout exemption: MDBX page I/O on
  cold pages can stall arbitrarily, and the handle does not wrap point
  lookups in a tokio::time::timeout, so a stuck lookup ties up a
  spawn_blocking worker AND a read_sem permit. Callers that need
  fail-fast behavior should wrap at the call site.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@prestwich
Member Author

[Claude Code]

Thanks @Evalir — solid review. Pushed two commits addressing everything except #2 (already-considered; see below).

| # | Finding | Action | Commit |
|---|---------|--------|--------|
| 1 | block_in_place panics on current_thread runtime | Writes switched to spawn_blocking; added writes_work_on_current_thread_runtime regression test | 7c9d50a |
| 2 | Cache-hit readers bypass drain barrier | Ignored. Identical to the finding I withdrew in the prior comment — cache values are block-number-keyed and populated only from successful backend reads, so a fast-path hit during a concurrent truncate linearizes before the truncate. The reader has no happens-before with the writer. Pre-invalidation would be cheap hardening but not a correctness fix. | — |
| 3 | stream_sem held across un-timeouted setup I/O | stream_logs now resolves to (incl. get_latest_block fallback) before acquiring the permit | 4f67ca7 |
| 4 | Iterator deadlines only checked between blocks | Added per-receipt check in get_logs_inner, per-event in collect_signet_events_in_range, per-receipt + per-log in produce_log_stream_blocking | 7c9d50a |
| 5 | Point lookups have no timeout | Documented. Exemption is intentional (MDBX native latency profile), but the type-level docs + with_read_timeout docs now explicitly call out the spawn_blocking worker / read_sem permit implications so callers know to wrap at the call site when fail-fast matters. | 7c9d50a |
| 6 | In-flight gauges leak on panic | Introduced InFlightGuard RAII wrapper; spawn_read / spawn_write / stream_logs all use it, so the decrement runs even on panic | 4f67ca7 |
| 7 | Advisory-write-timeout WARN fires on error paths | Overrun WARN now only fires on Ok; factored out to a small helper | 7c9d50a |

| Note | Action | Commit |
|------|--------|--------|
| N1 — tokio::sync::Mutex for cache is overkill | Switched to parking_lot::Mutex — no await is ever held across the guard, so the async mutex was pure overhead | 4f67ca7 |
| N2 — Removed Timeout/Backpressure/Cancelled collapse into Backend | Promoted to a first-class ColdStorageError::DeadlineExceeded(Duration) variant. MDBX Timeout now routes through it. Audited downstream — no runtime code matched on the removed variants; fixed stale Backpressure references in the cold and storage READMEs and the signet-storage skill doc. | 4f67ca7 |
| N3 — MemColdBackend silently violates the "Timeouts (mandatory)" contract | Added an explicit # Timeout exemption section to the type docs, noting it's test-only and MUST NOT be used for SLO/prod | 4f67ca7 |

Verified against the pre-push hook suite: cargo +nightly fmt --check, workspace clippy (--all-features and --no-default-features, -D warnings), RUSTDOCFLAGS="-D warnings" cargo doc, and the cold / cold-mdbx / storage test suites all green.

@Evalir Evalir self-requested a review April 27, 2026 08:47
Comment thread crates/cold-mdbx/src/backend.rs Outdated
for result in iter {
// Per-receipt deadline check bounds iteration cost across
// blocks with many receipts.
if std::time::Instant::now() > deadline {
Contributor

nit: std::time::Instant could be just Instant (same thing in a few places in this file).

Comment thread crates/cold-mdbx/src/backend.rs Outdated
Comment on lines +949 to +951
let threshold = self.write_timeout;
let this = self.clone();
let start = Instant::now();
Contributor

[Claude Code]

Capturing start here means the SLO measurement only covers spawn_blocking queueing + the MDBX commit. It excludes the handle-side wait on write_sem and the drain barrier, which is precisely the failure shape that wedged production at #56 (slow readers blocking writes). A 3 s drain followed by a 100 ms commit measures 100 ms here, well under the 2 s threshold, and the WARN never fires.

Same applies to append_blocks, truncate_above, and drain_above.

Suggest hoisting the SLO timing into ColdStorage::spawn_write so it captures start before write_sem.acquire_owned, then reading the configured threshold via a new ColdStorageBackend::write_timeout(&self) -> Option<Duration> accessor (defaulting to None so MemColdBackend opts out cleanly). The backend method then becomes a thin spawn_blocking wrapper with no timing logic, and warn_on_overrun can be deleted.

Trade-off worth calling out in the WARN message or commit notes: an e2e measurement will surface reader-side contention as write SLO violations. That is arguably exactly the signal you want given #56, but it is a deliberate change in what the alert means.

Comment on lines +29 to +32
assert!(
msg.contains("canceling") || msg.contains("57014") || msg.contains("timeout"),
"expected statement_timeout error, got: {err}"
);
Contributor

[Claude Code]

String-matching here hides that PG statement_timeout (SQLSTATE 57014) reaches the caller as Backend(...), not DeadlineExceeded. MDBX timeouts route to DeadlineExceeded (crates/cold-mdbx/src/error.rs:33); PG doesn't, because From<SqlColdError> for ColdStorageError (crates/cold-sql/src/error.rs:15-19) blindly wraps every sqlx error as Backend. Result: callers can't match on timeout without string-matching, and cold.op_errors_total{error=backend} collapses "query too slow" with "backend down".

Suggest: add SqlColdError::Timeout(Duration), detect 57014 at the SQL boundary (carrying the configured read_timeout/write_timeout), map to DeadlineExceeded(d). Then this assertion becomes assert!(matches!(err, SqlColdError::Timeout(_))) and a future refactor that drops timeout detection actually fails the test.
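A simplified sketch of the suggested boundary mapping (real code would match on sqlx::Error::Database and its .code(); these enums are stand-ins for the crate's types):

```rust
use std::time::Duration;

// Simplified stand-ins for the crate's error types.
#[derive(Debug, PartialEq)]
enum SqlColdError {
    Timeout(Duration),
    Other(String),
}

#[derive(Debug, PartialEq)]
enum ColdStorageError {
    DeadlineExceeded(Duration),
    Backend(String),
}

// SQLSTATE 57014 = query_canceled, which Postgres raises when
// statement_timeout expires. Everything else stays a generic backend error.
fn classify(sqlstate: Option<&str>, msg: &str, configured: Duration) -> SqlColdError {
    match sqlstate {
        Some("57014") => SqlColdError::Timeout(configured),
        _ => SqlColdError::Other(msg.to_owned()),
    }
}

impl From<SqlColdError> for ColdStorageError {
    fn from(e: SqlColdError) -> Self {
        match e {
            SqlColdError::Timeout(d) => ColdStorageError::DeadlineExceeded(d),
            SqlColdError::Other(m) => ColdStorageError::Backend(m),
        }
    }
}
```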

) -> Result<sqlx::Transaction<'a, sqlx::Any>, SqlColdError> {
if self.is_postgres {
let ms = i64::try_from(d.as_millis()).unwrap_or(i64::MAX);
sqlx::query(&format!("SET LOCAL statement_timeout = {ms}")).execute(&mut *tx).await?;
Contributor

[Claude Code]

In PG, SET LOCAL statement_timeout = 0 means "no timeout". The builders with_read_timeout / with_write_timeout (line 154, 165 here, plus the SqlConnector mirrors) accept any Duration including Duration::ZERO, so a caller passing zero (or computing one from a config that defaults to zero) silently disables the trait-level "Timeouts (mandatory)" contract at crates/cold/src/traits.rs:267-296.

Suggest either rejecting zero / sub-millisecond values at the builder (runtime check on first use, or non-const builder returning Result), or documenting the disable behaviour explicitly on with_read_timeout / with_write_timeout. Either way the trait-level "MUST enforce" wording should be reconciled with what the SQL backend actually allows.
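One possible builder-side guard, sketched with a Result rather than a panic (names are mine):

```rust
use std::time::Duration;

// Postgres interprets statement_timeout = 0 as "no timeout", and sub-ms
// values truncate to 0 ms — both silently disable the mandatory-timeout
// contract. Reject them up front at the builder.
fn validate_timeout(d: Duration) -> Result<Duration, String> {
    if d < Duration::from_millis(1) {
        Err(format!(
            "timeout {d:?} truncates to 0 ms, which disables statement_timeout in Postgres"
        ))
    } else {
        Ok(d)
    }
}
```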

Comment on lines +719 to +721
if Instant::now() > deadline {
return Err(MdbxColdError::Timeout(read_timeout));
}
Contributor

[Claude Code]

This per-receipt check doesn't bound the inner per-log loop. A single receipt with thousands of matching logs (large mints, batched events) iterates here unchecked. The streaming path (produce_log_stream_blocking:159-194) checks per-log; the non-streaming path inconsistently doesn't.

Comment thread crates/storage/src/unified.rs Outdated
///
/// - [`Hot`]: Hot storage read or unwind failed.
/// - [`Cold`]: Hot storage unwound but cold truncate dispatch failed.
/// - [`Cold`]: Hot storage unwound but cold drain failed.
Contributor

[Claude Code]

This claim is wrong as written. The implementation at line 212 below uses self.cold.drain_above(block).await.unwrap_or_default(), which silently turns every cold error (TaskTerminated, Backend, DeadlineExceeded, etc.) into Vec::new(). StorageError::Cold is never returned from this method.

Pre-existing behaviour, but the doc rewrite in this PR added the claim, so the gap between docs and impl is new. Suggest either:

  • Restore real error propagation (return Cold on at least non-TaskTerminated failures).
  • Keep the silent-swallow behaviour and update the doc + the "best-effort - failure = normal cold lag" comment to admit it. Currently that comment conflates "cold returned an error" with "cold has no data above" - two different states that operators need to distinguish.

Comment thread crates/cold/src/handle.rs Outdated
.in_current_span(),
)
.await
.map_err(|_| ColdStorageError::TaskTerminated)?
Contributor

[Claude Code]

This JoinError only fires on panic in normal operation - tracked tasks aren't dropped by the runtime, so the only way to get one here is the spawned body panicking inside f(inner).await (i.e. backend code). Mapping it to TaskTerminated tells the caller "shutdown", which on-call will read as "graceful exit", not "backend panicked". The original error is also swallowed silently - no log of the panic message or location.

Same pattern at line 231 (spawn_write).

Suggest at minimum logging the JoinError at ERROR before mapping. Optionally introduce a ColdStorageError::BackendPanic variant so panics are distinguishable from semaphore-close shutdown - the two failure modes have very different ops responses (panic = bug to chase, TaskTerminated = expected shutdown signal).

Comment thread crates/cold/src/handle.rs Outdated
// in-flight write.
let to = match filter.get_to_block() {
Some(to) => to,
None => match self.inner.backend.get_latest_block().await? {
Contributor

[Claude Code]

This setup get_latest_block runs before any permit acquisition (deliberate, per the comment above) AND without caller-side timeout protection. With MDBX, point lookups have no in-body deadline so it can stall on disk I/O indefinitely. With PG, pool.begin() inside begin_read can park for acquire_timeout (default 30 s) on a saturated pool. Either way, any number of concurrent stream_logs callers can issue these backend reads at once, with no cap at all.

tests/stream_isolation.rs:36-43 uses a bounded to_block, so this code path isn't regression-tested. With a GatedBackend that gates point lookups, you could pin a setup-stall scenario.

Suggest one of:

  • Wrap this call in tokio::time::timeout(self.inner.setup_timeout, ...) so it can fail fast (with the standard caveat that the future drops but backend work continues - the same trade-off the rest of the design accepts).
  • Move stream_sem.acquire_owned() to before this call. That re-introduces the original concern (a stuck setup pins stream permits), but bounds the call by stream_sem rather than leaving it completely uncapped.

Comment thread crates/cold/src/traits.rs Outdated
Contributor

[Claude Code]

Stale doc: the signatures below all take &self, and the backend is no longer "exclusively owned by the task runner" - the handle wraps it in Arc<Inner> and shares it across spawned tasks.

prestwich and others added 4 commits April 30, 2026 13:07
The PR-#58 doc rewrite advertised a `Cold` error path, but the impl
collapses every cold error into `Vec::new()`. Update the doc + comment
to admit silent-swallow behaviour. ENG-2210 tracks the propagation
decision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SQLSTATE 57014 (query_canceled, emitted on `statement_timeout` expiry)
was wrapped as `Sqlx(...)` and surfaced to the handle as `Backend(...)`,
breaking symmetry with the MDBX backend (which routes its `Timeout` to
`ColdStorageError::DeadlineExceeded`). The metric
`cold.op_errors_total{error="backend"}` therefore conflated "query too
slow" with "backend down".

`From<sqlx::Error> for SqlColdError` now detects 57014 and produces a
dedicated `Timeout` variant; `From<SqlColdError> for ColdStorageError`
maps it to `DeadlineExceeded`. The configured deadline is not threaded
to this conversion boundary so the surfaced duration is `ZERO`;
threading the real value is a separate refactor (left for a follow-up
once the call sites are confirmed to need it).

The `pg_statement_timeout` test is rewritten to match on the typed
variant rather than a substring of the error message — a future
refactor that drops 57014 detection now fails the test instead of
silently passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two accessors to the `ColdStorageBackend` trait:

    fn read_timeout(&self) -> Option<Duration> { None }
    fn write_timeout(&self) -> Option<Duration> { None }

Wired through `MdbxColdBackend`, `SqlColdBackend`, and `EitherCold`.
`MemColdBackend` returns `None` (already-documented test exemption).

Two behaviour changes use these:

1. The advisory write-SLO WARN moves from the MDBX backend
   (`warn_on_overrun` per-method) to `ColdStorage::spawn_write`. Timing
   is now captured before `write_sem` acquisition, so the elapsed
   value covers the queue wait, the read drain, and the commit
   end-to-end. The failure shape that wedged production at #56 — slow
   readers gating writes — now surfaces as a write-SLO violation
   rather than as a sub-threshold backend timing.

2. `stream_logs`'s setup `get_latest_block` is wrapped in
   `tokio::time::timeout(backend.read_timeout(), ...)`. Without this,
   a stuck point lookup (cold MDBX page) or a saturated PG pool
   parking on `acquire_timeout` could pin N concurrent setup callers
   indefinitely with no permit cap. The setup read still bypasses
   `read_sem` and the drain barrier by design.

Also drops the now-unused `tracing` dep from `signet-cold-mdbx` and
updates the type docs to point at the handle's new WARN path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Builders for `read_timeout` / `write_timeout` on both the MDBX and SQL
backends and connectors now panic on zero. Postgres treats
`statement_timeout = 0` as "no timeout", so a caller passing
`Duration::ZERO` (or computing one from a config that defaults to
zero) would silently disable the trait-level mandatory-timeout
contract. MDBX accepts the same assert for symmetry — zero there is a
useless config rather than a silent disable, but the trait says
non-zero and the assert keeps the surface honest.

`spawn_read` / `spawn_write` now log spawned-task `JoinError`s before
mapping to `TaskTerminated`. A backend panic was previously
indistinguishable from graceful shutdown for the on-call: panics fire
ERROR with the panic message, cancellations fire DEBUG. The error
variant still collapses to `TaskTerminated` per design.

`MdbxColdBackend::get_logs_inner` now checks the deadline inside the
inner per-log loop, mirroring the streaming path. Previously a single
receipt with thousands of matching logs would iterate unchecked past
the configured `read_timeout`. The two `std::time::Instant::now()`
sites in `produce_log_stream_blocking` are also folded into the
already-imported `Instant`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@prestwich
Member Author

[Claude Code]

Thanks @Fraser999 — pushed four commits addressing all nine inline comments. Summary:

| # | Finding | Action | Commit |
|---|---------|--------|--------|
| 1 | `std::time::Instant::now()` where `Instant` is already in scope | Folded both sites in `produce_log_stream_blocking`. | e6d44e4 |
| 2 | Write-SLO start captured inside `spawn_blocking`, missing `write_sem` + drain wait | Hoisted timing into `ColdStorage::spawn_write`; added `ColdStorageBackend::write_timeout()` accessor; deleted `MdbxColdBackend::warn_on_overrun`. WARN now measures end-to-end (queue + drain + commit), so reader contention surfaces as a write-SLO violation, an explicit signal for the #56 failure shape. | f5fdaae |
| 3 | PG SQLSTATE 57014 reaches the caller as `Backend(...)`, not `DeadlineExceeded`; test was string-matching | Added `SqlColdError::Timeout`; `From<sqlx::Error>` now detects 57014; `From<SqlColdError> for ColdStorageError` maps it to `DeadlineExceeded`. Test now matches on the typed variant. The configured deadline is not threaded to this conversion boundary, so the surfaced `Duration` is `ZERO` (variant correctness > duration precision; threading the real value is a separate refactor). | d3e3210 |
| 4 | `with_read_timeout(Duration::ZERO)` silently disables PG `statement_timeout` | Builders are now non-const and `assert!` non-zero on both MDBX and SQL backends + connectors. Failure is loud at boot rather than silent at runtime. | e6d44e4 |
| 5 | `get_logs_inner` per-receipt deadline check doesn't bound the inner per-log loop | Added a per-log `Instant::now() > deadline` check inside the receipt loop, mirroring the streaming path. | e6d44e4 |
| 6 | `unified::drain_above` doc claims a `Cold` error path; impl is `unwrap_or_default` | Doc + comment rewritten to admit the silent-swallow behaviour. Propagation/deletion decision tracked in ENG-2210 under ENG-2198. | 0b965bf |
| 7 | `JoinError` mapped to `TaskTerminated` swallows backend panics | `spawn_read` / `spawn_write` now log the `JoinError` before mapping: panics fire ERROR with the panic message, cancellations fire DEBUG. The variant still collapses to `TaskTerminated` per design. | e6d44e4 |
| 8 | `stream_logs` setup `get_latest_block` runs without permit cap or timeout | Wrapped in `tokio::time::timeout(backend.read_timeout(), ...)` with a 500 ms fallback for backends that don't advertise a timeout. Standard caveat applies (future drops on timeout; backend work continues). | f5fdaae |
| 9 | Stale trait doc on `ColdStorageWrite` ("exclusively owned by the task runner") | Rewritten to describe `&self` shared-via-`Arc` ownership. | f5fdaae |

Pre-push hook clean: cargo +nightly fmt --check, workspace clippy --all-features and --no-default-features (-D warnings), RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps, and the cold / cold-mdbx / cold-sql / storage test suites all green.

Pins the new setup-timeout behaviour against regression. The test
parks `GatedBackend::get_latest_block` indefinitely and asserts that
`stream_logs` (with no `to_block` on the filter, forcing the
"resolve to=latest" path) returns `DeadlineExceeded` within the
configured 50 ms `read_timeout` rather than hanging.

Adds `GatedBackend::with_read_timeout` so tests can advertise a
custom read timeout to the handle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

@Fraser999 left a comment

LGTM now. Would be nice to ticket threading the timeout value through to the actual error, since hard-coding ZERO isn't ideal, but this would be an easy standalone follow-up :)

@prestwich prestwich merged commit 783cdb2 into main May 4, 2026
7 checks passed
@prestwich prestwich deleted the prestwich/cold-unified-architecture branch May 4, 2026 15:58