Skip to content

feat: build parent message chain in References header with cycle protection#277

Open
davidehu-69 wants to merge 2 commits into
sashiko-dev:mainfrom
davidehu-69:pr/references-header
Open

feat: build parent message chain in References header with cycle protection#277
davidehu-69 wants to merge 2 commits into
sashiko-dev:mainfrom
davidehu-69:pr/references-header

Conversation

@davidehu-69

Copy link
Copy Markdown

Outgoing Sashiko review replies were only containing the immediate parent Message-ID in the References and In-Reply-To headers. For deep email threads, this broke thread threading in several mail user agents (MUAs), causing review messages to appear outside the main thread or fragmented.

Solution

This PR implements full parent message chain retrieval to populate the References header, restoring RFC 5322 compliant threading while adding safety limits to prevent database traversal loops.

  • Database Lookup (src/db.rs):
    • Implemented get_message_references_chain to recursively walk up the parent in_reply_to chain.
    • Added cycle detection (via a visited set) and a max depth cutoff of 100 to prevent infinite loops from malformed or cyclic mail headers.
  • Review Handler (src/reviewer.rs):
    • Retrieved the parent references chain and formatted it as a space-separated string of bracketless Message-IDs, passing the immediate parent as in_reply_to.
  • Mail Worker (src/worker/email.rs):
    • Split the outbox references string, wrapped each Message-ID in angle brackets (<>), and passed the sequence to the Lettre email builder.
  • Tests (src/db.rs):
    • Added regression and unit tests covering standard chain building, multi-message cyclic loop prevention, and self-referencing loop prevention.

References

Alternative Designs & Future Optimizations (Not Implemented)

To optimize performance, we evaluated two alternative architectures but decided against them for the following reasons:

  1. Recursive SQL Common Table Expressions (CTEs):
    • Concept: Retrieve the entire chain in a single database roundtrip using a recursive SQLite query.
    • Trade-off: Offloading cycle detection to a recursive CTE increases SQL complexity significantly. More importantly, recursive CTE syntax and cycle features can vary or fail on older/varying SQLite/libsql versions in the deployment environment. We opted for explicit Rust traversal to maximize portability and readability.
  2. Pre-computed Materialized Paths (Store References at Ingestion):
    • Concept: Calculate and store the full references chain directly in a column on the messages table at ingestion time, reducing read-time lookup to $O(1)$.
    • Trade-off: If emails are ingested out-of-order (common in distributed mailing list fetching), backfilling the path links would require complex write-time stitching logic. Given that thread depths should be typically small ($&lt;20$ messages) and SQLite queries are extremely fast, the runtime overhead of traversing $N$ indexed queries in Rust is negligible compared to the added write-time complexity.

@TheSven73

Copy link
Copy Markdown
Contributor

Davide, thanks a lot for your contribution !

Have you considered simpler solutions? For example - when a email client replies, it takes the References: header from the parent email, copies it exactly, and appends the parent's Message-ID to the very end of the list. That ensures RFC 5322 compliance.

In sashiko this could mean:

  • extend the database schema to store references in the message table, and
  • when inserting into the email_outbox table, set references to parent_message_references + msg_id (instead of simply msg_id as happens today)

Perhaps it would be useful to compare tradeoffs between solutions?

@davidehu-69 davidehu-69 force-pushed the pr/references-header branch 2 times, most recently from 50113d2 to f426670 Compare June 15, 2026 16:19
@davidehu-69

Copy link
Copy Markdown
Author

Hi Sven,

Thanks for the feedback! I was initially hesitant to introduce a database schema change, but your suggestion is indeed much cleaner, faster, and more robust.

I have updated the PR to adopt this approach:

  • Schema Migration: Added a references_hdr column to the messages table. Sashiko should handle the migration on startup.
  • O(1) Resolution: Replies now read the parent's references_hdr directly and append the parent's Message-ID. This runs in $O (1)$ query time and completely eliminates the recursive traversal/cycle-protection code, making the logic much simpler and cleaner.
  • Safety & Fallback: Added unit tests covering the NULL/None record path to ensure the migration is safe for historical messages (which fallback to parent-only references, matching the old behavior).

Let me know what you think.

@TheSven73

Copy link
Copy Markdown
Contributor

Thanks ! I believe the new, simplified commit has issues with handling of angle brackets, but I suggest we wait until @rgushchin has had a chance to weigh in.

@rgushchin

Copy link
Copy Markdown
Member

Thank you! I like the new simpler design better. The commit message lacks the description and the SOB though and also the linter is not entirely happy. Please, make sure to include the description on how the change was tested (including the db schema migration). Thanks!

Comment thread src/reviewer.rs
let findings_count = findings.map(|f| f.len()).unwrap_or(0);

let msg_id = patch_message_id;
let msg_id_clean = msg_id.trim_matches(|c| c == '<' || c == '>');

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this commit had angle bracket issues, because I got confused by msg_id / msg_id_clean .

msg_id is passed directly to get_message_details_by_msgid which leads to a db query, suggesting that the id is already clean. But then further down the function, it's explicitly cleaned.

This is not a problem of this PR, so let's disregard.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking! Yes, the raw vs. clean Message-ID handling is a bit fragmented across the code. I will create an issue to track this and fix it in later PR. What do you think?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's very minor, and entirely up to @rgushchin to decide if it's even worth creating an issue for. I suspect there are much more urgent and important things to address.

Comment thread src/worker/email.rs Outdated
.map(|part| format!("<{}>", part))
.collect();
let refs_str = refs.join(" ");
builder = builder.header(lettre::message::header::References::from(refs_str));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: could be

builder = builder.references(refs_str);

?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the lettre docs: MessageBuilder has built-in helpers for to, from, and subject, but it doesn't expose a .references() method directly. It has to be set as a typed header via .header(lettre::message::header::References::new(...)) or .header(...) as done here, otherwise it won't compile.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using references() compiles for me, and is also present in the lettre docs . Not sure what is different for you?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My apologies. You are right. I think I checked something else. I will fix it.
Thanks for also pointing directly to correct docs.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment thread src/schema.sql
cc_recipients TEXT,
git_blob_hash TEXT,
mailing_list TEXT,
references_hdr TEXT,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: why references_hdr and not references ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, REFERENCES (case-insensitive) is a reserved keyword in SQL (used for foreign key constraints). To avoid syntax conflicts in the database queries, I had to find a different name; references_hdr (or alternatively refs) seemed like the clearest way to represent it. Let me know if you had a better name in mind.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use "references" which escapes the reserved keyword issue. But this is only a nit, it has little importance, follow your own preference.

@davidehu-69 davidehu-69 force-pushed the pr/references-header branch from f426670 to 079dc67 Compare June 16, 2026 10:49
@davidehu-69

Copy link
Copy Markdown
Author

Hi Sven and Roman,

Thanks again for checking this PR and helping me out with my first contribution on Sashiko.

I have updated the PR branch:

  • Squashed the cargo formatting changes directly into the main commit.
  • Addressed the linter warnings in src/worker/tools.rs by adding #[allow(clippy::collapsible_match)] to the helper function in a separate, second commit. This silences the warnings while preserving consistent block statement style across all match arms. This linter warning does not come from my changes, so I kept it as a separate commit.
  • Concerning testing, migration happens on startup and I introduced a test for backward compatibility.
  • Added commit description and SOB lines to both commits.
  • Ran the PR checks locally via make check-pr, and all tests, formatting, clippy, and DCO checks are now passing.

Outgoing Sashiko review replies were only containing the immediate parent
Message-ID in the References and In-Reply-To headers. For deep email
threads, this broke threading in several mail user agents (MUAs), causing
review messages to appear fragmented or outside the main thread.

This change implements an O(1) references lookup header by adding a
`references_hdr` column to the `messages` table. When creating a reply,
the parent message's references are retrieved and its Message-ID is
appended, avoiding recursive database queries or in-memory graph traversal.

Testing:
- Verified DB schema migration on startup.
- Added unit tests in `src/db.rs` to verify that standard references chains
  are built correctly.
- Added a fallback unit test verifying that historical messages (where
  `references_hdr` is NULL/None) resolve safely to parent-only references,
  matching the original behavior.
- Validated with manual runs verifying the generated SMTP outbox headers
  contain the correct space-separated parent references sequence.

Signed-off-by: Davide Hu <davidehu@google.com>
…tch arm consistency

Clippy warns about collapsible matches in single-if match arms inside
normalize_tool_args. However, other arms in the same match block (such as
git_show and git_grep) require multiple separate if statements and cannot
be collapsed.

Rewriting the matched patterns with guard clauses to satisfy clippy was
evaluated but avoided, as it would render the code less consistent and less
readable. Adding the #[allow(clippy::collapsible_match)] attribute preserves
stylistic consistency across all match arms.

Signed-off-by: Davide Hu <davidehu@google.com>
@davidehu-69 davidehu-69 force-pushed the pr/references-header branch from 079dc67 to e7fb09f Compare June 16, 2026 12:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants