Skip to content

Specialize SQLGraph.from_other for SQLite-to-SQLite (closes #285)#286

Merged
JoOkuma merged 5 commits into
royerlab:mainfrom
JoOkuma:fix/sql-graph-from-other-285
May 18, 2026
Merged

Specialize SQLGraph.from_other for SQLite-to-SQLite (closes #285)#286
JoOkuma merged 5 commits into
royerlab:mainfrom
JoOkuma:fix/sql-graph-from-other-285

Conversation

@JoOkuma
Copy link
Copy Markdown
Member

@JoOkuma JoOkuma commented May 1, 2026

Closes #285.

BaseGraph.from_other ends with other.overlaps(), which on a SQL-rooted GraphView issues a single IN (...) containing every selected node id and overflows SQLite's variable limit.

This PR adds a SQL-level fast path on SQLGraph.from_other: when both ends are on-disk SQLite, the source's schema and rows are dumped straight into the destination via ATTACH DATABASE + replayed CREATE TABLE/INDEX DDL + INSERT INTO ... SELECT, then the destination is opened from the populated file. Filtered selections (GraphView) are joined against a temp table to sidestep the bound-parameter limit. Other configurations fall back to the generic path.

Separately, SQLGraph.overlaps(node_ids) is now chunked via the existing _chunked_sa_read helper so direct callers don't hit the same limit.

Tests cover the fast path, the issue reproducer (filter → subgraph → from_other with > 999 nodes + overlaps), full-copy id preservation, and fallbacks for :memory: and non-SQL sources.

JoOkuma added 3 commits May 1, 2026 09:59
)

The generic BaseGraph.from_other materializes the source graph in
Python and then calls other.overlaps(), which on a SQLGraph-backed
GraphView issues a single 'IN (...)' clause containing every selected
node id. With non-trivial selections this hits SQLite's
SQLITE_MAX_VARIABLE_NUMBER and raises 'too many SQL variables'.

Per @JoOkuma's suggestion in royerlab#285, add a specialized fast path that
bypasses the BaseGraph copy entirely when both sides are SQLite:

* Open the destination SQLGraph so it materializes its schema.
* Mirror any extra columns and metadata.
* Dispose the dst engine, ATTACH the dst database to the source's
  connection, and dump each table via INSERT INTO ... SELECT FROM.
* For GraphView sources, materialize the selected node ids in a temp
  table (chunked inserts) and JOIN against it instead of using a
  giant IN (...) clause.
* Re-open the dst engine and refresh derived caches.

For correctness in other code paths (summary, NN solver, direct
callers), also chunk SQLGraph.overlaps(node_ids) so it never
overflows the bound-parameter limit.

New tests cover:
* The ATTACH fast path is taken (BaseGraph.from_other not called).
* The exact issue royerlab#285 scenario: filter -> subgraph -> from_other
  with a selection > 999 nodes plus overlaps.
* Node ids are preserved on full copies.
* Fallback to the generic path for ':memory:' destinations and for
  non-SQL sources.
* SQLGraph.overlaps with > 999 node ids returns the right pairs.
Instead of instantiating the destination upfront and ALTER-ing it into
shape, replay the source's own CREATE TABLE/INDEX DDL from sqlite_master
against the attached destination, copy rows (including the Metadata
table verbatim), then open the destination via cls(**kwargs). The
constructor's normal reflection path rebuilds the in-memory state, so
the manual engine dispose/reopen, cache invalidation, and
_update_max_id_per_time call are no longer needed.

Also tighten the eligibility gate in from_other: the destination file
must start empty (DDL replay would clash with an existing schema), and
overwrite=True is excluded (the dst constructor would drop the tables
we just populated).
Replace the hand-rolled chunk loop with the existing _chunked_sa_read
helper. Only the source side is constrained per chunk via IN(...); the
target side is filtered in Polars afterwards, which avoids the
quadratic bound-parameter blow-up of having two IN clauses per chunk
and lets us use the full _sql_chunk_size() budget.

Also drop the seen-set deduplication, which was a behaviour change vs.
the pre-fix code (the original query.all() preserved duplicates if any
existed in the Overlap table).
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 1, 2026

Codecov Report

❌ Patch coverage is 90.16393% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.75%. Comparing base (6227efc) to head (9113e3d).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/tracksdata/graph/_sql_graph.py 90.16% 3 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #286      +/-   ##
==========================================
+ Coverage   87.70%   87.75%   +0.04%     
==========================================
  Files          57       57              
  Lines        4879     4947      +68     
  Branches      858      872      +14     
==========================================
+ Hits         4279     4341      +62     
- Misses        378      381       +3     
- Partials      222      225       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@yfukai yfukai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry for the late response. I only have minor comments, and this looks good to me!

dst._engine.dispose()


def test_sql_from_other_full_copy_preserves_node_ids(tmp_path: Path) -> None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing something, but this looks like it has already been tested in _assert_graphs_equivalent in test_sql_from_other_uses_attach_dump_for_sqlite?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — you're right that the structural bits overlapped. I folded the node-id check into _assert_graphs_equivalent (it holds in every case since both backends support custom indices) and dropped the standalone test. Thanks!

dst._engine.dispose()


def test_sql_from_other_subgraph_view_with_overlaps(tmp_path: Path) -> None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may make sense to explicitly test this scenario for all combinations of SQL(disk), SQL(memory), and RustworkX.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — parametrized over SQL(disk), SQL(memory), and RustWorkX as the source backend (destination stays as on-disk SQLGraph since that's what SQLGraph.from_other produces). Good suggestion, thanks!

…bgraph test

- Move node_ids preservation assertion into _assert_graphs_equivalent and
  drop the now-redundant test_sql_from_other_full_copy_preserves_node_ids.
- Parametrize test_sql_from_other_subgraph_view_with_overlaps over
  SQL(disk), SQL(memory), and RustWorkX source backends.
@yfukai
Copy link
Copy Markdown
Contributor

yfukai commented May 18, 2026

Looks great, I don't have additional comments, thanks @JoOkuma!

@yfukai
Copy link
Copy Markdown
Contributor

yfukai commented May 18, 2026

I was thinking maybe I can rebuild #281 on this PR but I need a week or so

@JoOkuma
Copy link
Copy Markdown
Member Author

JoOkuma commented May 18, 2026

@yfukai, thanks. Sounds good, let me know when you think #281 is ready for a new review.

@JoOkuma JoOkuma merged commit a4f7607 into royerlab:main May 18, 2026
7 checks passed
@JoOkuma JoOkuma deleted the fix/sql-graph-from-other-285 branch May 18, 2026 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SQLGraph.from_other fails with OperationalError

3 participants