
Add sync_tables helper for multi-server base_schemas sync#6

Open
lecriste wants to merge 4 commits into main from leo_sync_tables_script

Conversation

@lecriste
Contributor

This PR introduces base_schemas/scripts/sync.py with a sync_tables() function that copies rows of base_schemas tables between two DataJoint servers using dj.Instance (DataJoint 2.2+). Callers pass two connection-config dicts and a list of table classes; the helper builds the instances and uses FreeTable(full_table_name) to resolve tables on each side. Supports per-table restrictions for incremental syncs, wraps each insert in a target-side transaction, and is idempotent via skip_duplicates=True.
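The call shape and the idempotent per-table loop can be sketched as follows. This is a stand-in, not the PR's code: real `sync_tables()` builds `dj.Instance` objects from the two config dicts and resolves tables with `FreeTable(full_table_name)`, which needs live servers, so here tables are plain lists of row dicts and `skip_duplicates=True` is emulated by keying rows on their contents. `sync_tables_sketch` and its argument names are illustrative.

```python
def sync_tables_sketch(src_tables, tgt_tables, restrictions=None):
    """Emulate the sync loop: copy rows per table, skipping duplicates.

    src_tables / tgt_tables: dicts mapping table name -> list of row dicts
    (stand-ins for DataJoint tables on the source and target servers).
    restrictions: optional dict of table name -> row predicate, standing in
    for the per-table restrictions used for incremental syncs.
    Returns a dict of table name -> number of rows inserted.
    """
    restrictions = restrictions or {}
    inserted = {}
    for name, rows in src_tables.items():
        keep = restrictions.get(name, lambda row: True)
        tgt = tgt_tables.setdefault(name, [])
        # Hashable snapshot of existing target rows, for duplicate detection
        existing = {tuple(sorted(r.items())) for r in tgt}
        count = 0
        for row in rows:
            if not keep(row):                 # per-table restriction
                continue
            key = tuple(sorted(row.items()))
            if key not in existing:           # skip_duplicates=True behaviour
                tgt.append(dict(row))
                existing.add(key)
                count += 1
        inserted[name] = count
    return inserted
```

Running the same sync twice inserts nothing the second time, which is the idempotency property the PR claims.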

Changes

  • base_schemas/scripts/sync.py — new helper module.
  • tests/test_sync.py — 6 unit tests (mocked, no live MySQL), all passing.
  • pyproject.toml — bumped datajoint>=2.2.

Test plan

  • pytest tests/test_sync.py — 6/6 pass
  • End-to-end sync between two live MySQL servers (validated in SCENE_MouseAR integration)

@lecriste lecriste self-assigned this Apr 21, 2026
@lecriste lecriste added the enhancement New feature or request label Apr 21, 2026
@lecriste lecriste requested a review from arturoptophys April 21, 2026 13:22
@lecriste
Contributor Author

lecriste commented Apr 21, 2026

@arturoptophys, which base_schemas tables should we sync from (server3)?
The list would be passed via this tables argument.

Then populate_base.py is run, and this sync script is called again with the source and target servers swapped, in order to update the base_schemas tables on server3.

@arturoptophys
Collaborator

@lecriste
one-way sync S1 -> S2: all Lookup tables, plus Mouse, Sacrificed, Breed // those will only be added via the mice sheet or dj-mathis.

one-way S2 -> S1: MouseScoreSheet_WaterRestriction, MouseScoreSheet, Session, SessionScoreSheet. Here order is important, so SessionScoreSheet needs to be last. // there is no need to sync S1 entries into the mouse_ar pipeline.

both ways: Surgery

Comment thread on base_schemas/scripts/sync.py (Outdated)

rows = src_table.fetch(as_dict=True)
before = len(tgt_table)
with tgt_table.connection.transaction:
Collaborator


I think a single insert is wrapped in a transaction anyway. The question is: should there be a try-except to continue with the other tables if one errors, or is this intended to fail completely if something goes wrong?

Contributor Author


OK, removed: 67187e6
The with ... .transaction: would be useful in case DataJoint internally splits the call into multiple statements (e.g. master + part tables, or batch-splitting on large payloads), but it's not the case for the flat tables of base_schemas.

Contributor Author


In the current behaviour (propagate and abort), tables are processed in FK order: if Mouse fails, continuing on to Session would hit FK errors anyway.
Do you think it's better to fail hard early on any table with dependents?
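One way to make the trade-off discussed here concrete is a driver loop that, on failure, skips only the failed table's known dependents instead of aborting everything, while keeping a fail-fast switch for the current propagate-and-abort behaviour. This is a hypothetical sketch, not part of the PR: `sync_with_policy`, the `dependents` map, and the zero-arg callables standing in for per-table syncs are all illustrative.

```python
def sync_with_policy(table_syncs, dependents=None, fail_fast=False):
    """Run each table sync in order, with a configurable failure policy.

    table_syncs: ordered mapping of table name -> zero-arg callable that
    performs that table's sync (stand-in for the real per-table sync).
    dependents: mapping of table name -> tables to skip if it fails,
    so FK-dependent tables are skipped explicitly instead of erroring.
    fail_fast=True reproduces the current propagate-and-abort behaviour.
    Returns (failed table names, skipped table names).
    """
    dependents = dependents or {}
    failed, skipped = [], set()
    for name, do_sync in table_syncs.items():
        if name in skipped:
            continue                  # parent already failed; FK insert would fail too
        try:
            do_sync()
        except Exception:
            if fail_fast:
                raise                 # abort the whole run, as the PR currently does
            failed.append(name)       # record the failure and keep going
            skipped.update(dependents.get(name, ()))
    return failed, sorted(skipped)
```

With `dependents={"Mouse": ["Session"]}`, a failing Mouse sync skips Session but still lets unrelated tables such as Breed proceed.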

@lecriste
Contributor Author

lecriste commented Apr 23, 2026

> @lecriste one-way sync S1 -> S2 All Lookup tables and Mouse, Sacrificed, Breed // Those will only be added via mice sheet or dj-mathis.
>
> one-way S2-S1 -> MouseScoreSheet_WaterRestriction, MouseScoreSheet, Session, SessionScoreSheet. Here order is important so SessionScoreSheet need to be last. // Here there is no need to sync S1 entries into mouse_ar pipeline.
>
> both-ways -> Surgery

Thanks @arturoptophys! So the steps will be:

  1. sync from S1 to S2, in no particular order:
    [Mouse, Sacrificed, Breed, Strain, Surgery, SurgeryType, MouseLicensingGeneva, MouseScoreSheet_BodyCondition, MouseScoreSheet_GeneralAssay, MouseScoreSheet_HousingAssesment, Experimenter, Anesthesia, Rig, OptogeneticsRegion, OptogeneticsTiming, OptogeneticsVariant, Optogenetics, Task]
  2. run populate_base.py
  3. sync from S2 to S1, in this order:
    [Surgery, MouseScoreSheet_WaterRestriction, MouseScoreSheet, Session, SessionScoreSheet]

Can you confirm?
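For reference, the three-step plan above can be written out as data for an orchestration script to iterate over. The table names are copied from this thread; the `plan()` generator and the "S1"/"S2" labels are illustrative, not part of this PR.

```python
# Step 1: S1 -> S2, order-insensitive.
S1_TO_S2 = [
    "Mouse", "Sacrificed", "Breed", "Strain", "Surgery", "SurgeryType",
    "MouseLicensingGeneva", "MouseScoreSheet_BodyCondition",
    "MouseScoreSheet_GeneralAssay", "MouseScoreSheet_HousingAssesment",
    "Experimenter", "Anesthesia", "Rig", "OptogeneticsRegion",
    "OptogeneticsTiming", "OptogeneticsVariant", "Optogenetics", "Task",
]

# Step 3: S2 -> S1, order matters; SessionScoreSheet must be last.
S2_TO_S1 = [
    "Surgery", "MouseScoreSheet_WaterRestriction", "MouseScoreSheet",
    "Session", "SessionScoreSheet",
]

def plan():
    """Yield (source, target, table names) tuples in execution order.
    populate_base.py runs between the two passes (step 2)."""
    yield ("S1", "S2", S1_TO_S2)
    yield ("S2", "S1", S2_TO_S1)
```

Note that Surgery appears in both passes, matching the "both-ways" requirement above.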
