refactor: improve the deadline module#423
Conversation
15a9863 to
02b53d2
Compare
|
|
||
| /// Wall-clock start of the given slot: `genesis_time + slot * | ||
| /// slot_duration`. | ||
| fn slot_start(&self, slot: SlotNumber) -> Result<DateTime<Utc>> { |
There was a problem hiding this comment.
would be good to have a unit test for it
|
|
||
| /// Network-delay margin added to every deadline: `slot_duration / | ||
| /// MARGIN_FACTOR`. | ||
| fn margin(&self) -> Result<Millis> { |
8b1312b to
82815f3
Compare
| label = %self.label, | ||
| duty = %curr_duty, | ||
| "Deadliner output channel full" | ||
| ); |
There was a problem hiding this comment.
Maybe return some small sleep time here to retry in a while?
There was a problem hiding this comment.
We left it like this on purpose to match the reference behavior, but I agree that can be the better choice here.
|
@claude review the changes |
|
Claude finished @therustmonk's task in 3m 32s —— View job Code Review: PR #423 — refactor: improve the deadline module
SummaryNice refactor overall. Splitting the 743-line A few correctness and design points below. Correctness1. Sentinel duty can leak to the output channel after ~10 years of idle — When fn handle_expired(&mut self) -> Option<Duration> {
if self.duties.is_empty() {
// Sentinel timer fired with no real duty; just reschedule.
return Some(self.remaining_duration());
}
// ...rest unchanged
}2.
3. mskrzypkows's comment on The reviewer suggests returning a small sleep time to retry instead of dropping the duty. The current behavior — log + drop + advance — matches 4. Not a bug, just wasteful: we already know the new duty has the smallest deadline, so an O(n) scan re-confirms what we already know. Could be: self.duties.insert(duty.clone());
self.curr_duty = duty;
self.curr_deadline = deadline;
Some(self.remaining_duration())Skip if you want to keep the code path uniform — the HashSet is small in practice. Design / API5. The thing that
The doc comment on 6.
7. Module-level doc and struct-level doc say the same thing nearly verbatim. Drop one. Tests8. Real-time tests with sub-second deadlines are notorious on loaded CI. The 5s timeout for ~1s expirations is generous, but consider 9. The test does 10. Voluntary-exits test path is misleading — The Other
VerdictSolid refactor, well-tested, faithful to Charon. The points above are mostly polish; the only real correctness item is #1 (sentinel-duty leak after 10y idle) and the #2 buffer-size deviation from Charon, which the golden rule on functional equivalence asks us to justify or revert. |
| use super::{Result, millis::Millis, to_chrono_duration}; | ||
|
|
||
| /// Fraction of slot duration to use as a margin for network delays. | ||
| const MARGIN_FACTOR: i32 = 12; |
There was a problem hiding this comment.
This can be i64 from the begin, no casting later
|
|
||
| /// Duty-type-specific offset from slot start. | ||
| fn duty_duration(&self, duty_type: &DutyType) -> Result<Millis> { | ||
| let secs = Millis::from(self.slot_duration); |
There was a problem hiding this comment.
msecs is better name i think, it's Millis
|
@therustmonk Could you check #430? It makes use of the deadliner and shows some challenges with the existing API that could be taken care of in this PR. |
emlautarom1
left a comment
There was a problem hiding this comment.
This refactor looks promising but it does not improve usages in a meaningful way.
I've left a comment on how I would address it based on the needs of modules like aggsigdb (#430), but I suggest to also explore how we can make existing modules better to use (ex. parsigdb).
Lastly, regarding the internals, some time ago I explored a different approach that uses tokio's scheduling for most of the work: https://github.com/NethermindEth/pluto/tree/emlautarom1/deadline_alt . Maybe something is worth salvaging from that experiment, but the main goal should now be to make it easier for callers to use this module.
| pub trait Deadliner: Send + Sync { | ||
| /// Adds a duty for deadline scheduling. | ||
| /// | ||
| /// Returns `true` if the duty was added for future deadline scheduling. | ||
| /// This method is idempotent and returns `true` if the duty was previously | ||
| /// added and still awaits deadline scheduling. | ||
| /// | ||
| /// Returns `false` if: | ||
| /// - The duty has already expired and cannot be scheduled | ||
| /// - The calculator reports the duty has no deadline (`Ok(None)`) | ||
| /// - The calculator failed to compute the deadline (`Err(_)`) | ||
| async fn add(&self, duty: Duty) -> bool; | ||
|
|
||
| /// Returns the channel for receiving deadlined duties. | ||
| /// | ||
| /// This method may only be called once and returns `None` on subsequent | ||
| /// calls. The returned channel should only be used by a single task. | ||
| fn c(&self) -> Option<mpsc::Receiver<Duty>>; | ||
| } |
There was a problem hiding this comment.
The Deadliner interface is hard to use as is:
- It's not intended to be cloned, so
Arc<dyn Deadliner>is unnecessary, that is, aDeadlinerinstance is never shared by multiple "services" (ex.aggsigdb), instead it's owned. - In a service, there might be multiple places where
Dutyare added - The channel
cis used only once by a service and it's always expected to beSome.
If you squint a bit you can see how a Deadliner is a mpsc channel, where there are multiple places where a Duty can be registered, while there is a single place where the deadlined duties are read. Also, consider that a Deadliner behaves like an Actor: https://rust-lang-nursery.github.io/rust-cookbook/concurrency/actor.html
I roughly suggest:
- Make the construction of a deadliner return
(handle, c), wherecis the currentDeadliner.c(), a background task is listening for messages, andhandleallows submittingDutyto the background task (throughmpscchannel, thus we support multiple handles by cloning) - The parameters to the constructor remain the same (ct, label and calculator), they're good.
- Submitting through a handle waits for a response as we're doing now (through
oneshot), but the response is more explicit, separating the "successfully scheduled, "already expired", "no deadline", "failed to compute deadline". - The background task emits values on
cwhen the deadline is reached.
There was a problem hiding this comment.
@emlautarom1 Yes, I thought about what we can do here, and overall I came to similar conclusions, fully agree with you.
The only thing is the (handle, c) pair, I think it's a good design. However, in that case we would also need to pass c around, OR switch to a stricter c() that returns Result, but an explicit c looks better to me.
Here's what I was thinking. Maybe we should merge the refactoring first, since it's the most compatible with the current changes, but already gives us flexibility for future improvements.
Then I can make a follow-up PR for the (handle, c) pair, since it looks like everyone will need to rebase for that change anyway, and it will be easier if the PR is small.
So basically:
- Smoothly migrate to this PR, since it's well compatible with the current implementation.
- Add a series of PRs (or one PR) that introduce more radical changes.
What do you think? Or do you still think it's better to include the API changes here as well?
There was a problem hiding this comment.
Passing the explicit c around is better than the c(), too many issues on callers otherwise.
No issues on having it as a stacked PRs (as long as we actually look into it and does not get forgotten 🫤)
There was a problem hiding this comment.
Okay, I’ll open a separate (follow-up) PR based on this one, but with the updates, so you can use them and the team can rebase their branches on top of the changes from this PR.
There was a problem hiding this comment.
The next PR with API changes (in progress): #434
| pub trait DeadlineCalculator: Send + Sync + 'static { | ||
| /// Computes the deadline for the given duty. See trait docs for return | ||
| /// semantics. | ||
| fn deadline(&self, duty: &Duty) -> Result<Option<DateTime<Utc>>>; | ||
| } |
There was a problem hiding this comment.
Looks like this interface is the important one for testing. I would explore if we can make Deadliner a concrete type and control the logic through the calculator instead. If we can do that we can simplify quite the usages (no need to Box<dyn Deadliner>, etc.)
e5108e6 to
69cd6e3
Compare
emlautarom1
left a comment
There was a problem hiding this comment.
Internal changes LGTM, waiting for a follow-up PR to address this module's API.
|
@emlautarom1 The next PR with API changes: #434 |
No description provided.