
Make the v3 topic modeling workflow backend-agnostic and add tomotopy support #226

Open
sldyns wants to merge 1 commit into aertslab:pycistopic_v3 from sldyns:pycistopic_v3


sldyns commented Apr 10, 2026

Summary

This PR refactors the current pycistopic_v3 topic-modeling workflow so that v3 artifacts are backend-agnostic instead of being tied to Mallet-specific readers and filenames, and adds a tomotopy LDA backend that writes the same artifact bundle.
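To make the idea of backend-agnostic artifacts concrete, here is a minimal sketch of a class that owns every output path of one trained model. Readers such as create_anndata would then never need to know which backend produced the files. The class name TopicModelFilenames is taken from this PR, but its fields, the `topic_model_<n>` prefix, and the file suffixes below are hypothetical placeholders, not the layout the PR actually implements.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TopicModelFilenames:
    """Hypothetical path layout shared by every backend for one model."""

    output_dir: Path
    n_topics: int

    def _path(self, kind: str, ext: str) -> Path:
        # Every file of one model shares the (hypothetical) prefix topic_model_<n_topics>.
        return Path(self.output_dir) / f"topic_model_{self.n_topics}.{kind}.{ext}"

    @property
    def parameters(self) -> Path:
        # Records which backend wrote the bundle, plus its hyperparameters.
        return self._path("parameters", "json")

    @property
    def cell_topic(self) -> Path:
        return self._path("cell_topic", "parquet")

    @property
    def topic_region(self) -> Path:
        return self._path("topic_region", "parquet")

    def all_paths(self) -> list[Path]:
        # The complete bundle that every backend is expected to write.
        return [self.parameters, self.cell_topic, self.topic_region]
```

Because the layout lives in one place, a completeness check such as `all(p.exists() for p in TopicModelFilenames(out_dir, 50).all_paths())` works the same for Mallet and tomotopy outputs.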

What changed

  • add a shared topic-model abstraction in topic_models.py
  • add TopicModelFilenames to centralize the v3 artifact layout
  • add backend resolution from the saved parameters file via load_topic_model_backend
  • add LDATomotopy to train directly from the binary accessibility matrix and write the standard v3 outputs
  • refactor LDAMallet to implement the same backend interface and emit the same artifact bundle
  • update the topic_modeling CLI to use a shared run entry point with --backend {mallet,tomotopy}
  • keep corpus creation as a Mallet-specific step, while allowing tomotopy to run directly from the matrix plus barcode/region inputs
  • update create_anndata, model-stat calculation, plotting, and topic binarization to load outputs through the backend abstraction instead of assuming Mallet-only artifacts
  • update log-likelihood/stat handling so backend-specific hyperparameters are interpreted correctly, including learned alpha values from tomotopy
  • add tomotopy>=0.14.0 as a dependency
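The shared backend interface and the resolution step from the saved parameters file could fit together roughly as follows. Only the function name load_topic_model_backend comes from this PR. The TopicModelBackend base class, the `train` signature, the BACKENDS registry, the JSON parameters format with a `"backend"` key, and the fallback to Mallet for bundles without that key are all assumptions made for illustration. The two `train` methods are placeholders.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path


class TopicModelBackend(ABC):
    """Hypothetical shared interface: every backend emits the same bundle."""

    name: str

    @abstractmethod
    def train(self, n_topics: int, output_dir: Path) -> None:
        """Train one model and write the standard artifact files."""


class MalletBackend(TopicModelBackend):
    name = "mallet"

    def train(self, n_topics: int, output_dir: Path) -> None:
        raise NotImplementedError("placeholder for the Mallet wrapper")


class TomotopyBackend(TopicModelBackend):
    name = "tomotopy"

    def train(self, n_topics: int, output_dir: Path) -> None:
        raise NotImplementedError("placeholder for tomotopy LDA training")


# Hypothetical registry: backend name stored in the parameters file -> class.
BACKENDS: dict[str, type[TopicModelBackend]] = {
    cls.name: cls for cls in (MalletBackend, TomotopyBackend)
}


def load_topic_model_backend(parameters_file: Path) -> type[TopicModelBackend]:
    """Resolve the backend that produced a bundle from its parameters file."""
    with open(parameters_file) as fh:
        params = json.load(fh)
    # Assumption: bundles without a "backend" key were written by Mallet.
    backend_name = params.get("backend", "mallet")
    try:
        return BACKENDS[backend_name]
    except KeyError:
        raise ValueError(f"Unknown topic model backend: {backend_name!r}") from None
```

Keying the registry by the same strings the CLI accepts means a new backend needs one class and one registry entry, and downstream readers (create_anndata, model stats, plotting, binarization) pick it up through the same lookup.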

ghuls (Member) commented May 5, 2026

This pull request will need to be split into multiple commits so that it can be reviewed properly, and it should refrain from rewriting whole functions (e.g. to use args.variable instead of the original variable approach).

At first glance it seems like calling run_topic_modeling with the mallet backend will also run the tomotopy topic modeling afterwards.
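One way to address the concern that the mallet backend also triggers tomotopy training is to dispatch through a single lookup table, so that exactly one runner executes and nothing can fall through to the other backend. Unpacking `args` only in `main` also keeps library functions on plain parameters, which matches the args.variable point above. This is a hypothetical sketch, not the PR's code: `run_mallet`, `run_tomotopy`, the `--n-topics` and `--output-dir` flags, and this `run_topic_modeling` signature are illustrative, and the runners return strings only so the dispatch is easy to check.

```python
import argparse
from typing import Callable, Optional


def run_mallet(n_topics: int, output_dir: str) -> str:
    # Placeholder: would call the Mallet wrapper with explicit arguments.
    return f"mallet:{n_topics}:{output_dir}"


def run_tomotopy(n_topics: int, output_dir: str) -> str:
    # Placeholder: would train a tomotopy LDA model with explicit arguments.
    return f"tomotopy:{n_topics}:{output_dir}"


# Hypothetical table: each backend name maps to exactly one runner.
RUNNERS: dict[str, Callable[[int, str], str]] = {
    "mallet": run_mallet,
    "tomotopy": run_tomotopy,
}


def run_topic_modeling(backend: str, n_topics: int, output_dir: str) -> str:
    """Run only the selected backend; there is no fall-through path."""
    try:
        runner = RUNNERS[backend]
    except KeyError:
        raise ValueError(f"Unknown backend: {backend!r}") from None
    return runner(n_topics, output_dir)


def main(argv: Optional[list[str]] = None) -> str:
    parser = argparse.ArgumentParser(prog="topic_modeling")
    parser.add_argument("--backend", choices=sorted(RUNNERS), required=True)
    parser.add_argument("--n-topics", type=int, required=True)
    parser.add_argument("--output-dir", required=True)
    args = parser.parse_args(argv)
    # args.* is read only here; library functions receive plain values.
    return run_topic_modeling(args.backend, args.n_topics, args.output_dir)
```

For example, `main(["--backend", "mallet", "--n-topics", "5", "--output-dir", "out"])` invokes only `run_mallet`.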

