Add masked language modeling models #69
Merged
shujaatTracebloc merged 5 commits into … on May 18, 2026
Three MLM model definitions for biomedical graph pretraining:
- simple_mlm.py: ~30M-param transformer encoder for smoke testing
- netmedgpt_style_scratch.py: ~110M-param, BERT-scale, trained from scratch
- netmedgpt_style_warmstart.py: BERT-base warm-started from HuggingFace

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
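For reviewers unfamiliar with the setup, here is a minimal sketch of what an encoder-only MLM of the simple_mlm.py kind might look like. It assumes learned positional embeddings and PyTorch's built-in nn.TransformerEncoder; the class name TinyMaskedLM and every size below are hypothetical and much smaller than ~30M params. The point is the output contract: one logit per vocabulary token at every position, i.e. shape (batch, seq_len, vocab_size).

```python
import torch
import torch.nn as nn


class TinyMaskedLM(nn.Module):
    """Hypothetical encoder-only MLM; sizes are illustrative, not the PR's."""

    def __init__(self, vocab_size: int, d_model: int = 64, n_heads: int = 4,
                 n_layers: int = 2, max_len: int = 128):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)  # learned absolute positions
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.norm = nn.LayerNorm(d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)  # untied head for simplicity

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor = None):
        seq_len = input_ids.size(1)
        positions = torch.arange(seq_len, device=input_ids.device)
        x = self.token_emb(input_ids) + self.pos_emb(positions)
        # nn.TransformerEncoder expects True where a position should be IGNORED,
        # so a HuggingFace-style mask (1 = keep, 0 = pad) is inverted here.
        pad_mask = None if attention_mask is None else attention_mask == 0
        x = self.encoder(x, src_key_padding_mask=pad_mask)
        return self.lm_head(self.norm(x))  # (batch, seq_len, vocab_size)


# Usage: random token ids in, per-position vocabulary logits out.
model = TinyMaskedLM(vocab_size=1000)
logits = model(torch.randint(0, 1000, (2, 16)))
```

The real models presumably differ in depth, width and head tying; only the input/output shapes are meant to carry over.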
👋 Heads-up: the Code review queue is at 18 / 8, above the WIP limit. The team convention is to review existing PRs before opening new work. Open PRs currently in Code review (oldest first):
Pull from review before opening new work. (This is a nudge from the kanban WIP check, not a block.)
…nguage-modeling-models
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
NetMedGPTWarmStart is a factory function, not an nn.Module subclass, so it must declare main_method per the metadata contract.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
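The metadata contract itself is not shown in this PR, so the following is only a hypothetical sketch of why a factory needs main_method: a loader cannot discover a plain function the way it can discover an nn.Module subclass, so the metadata has to name it. MODEL_METADATA, its keys, build_warmstart_model and resolve_model are all illustrative names, and the factory is an offline stub rather than a real warm start.

```python
import torch.nn as nn

# Hypothetical metadata shape; the real contract's field names are not shown here.
MODEL_METADATA = {
    "category": "masked_language_modeling",
    "framework": "pytorch",
    "main_method": "build_warmstart_model",  # callable that returns the model
}


def build_warmstart_model(vocab_size: int = 1000, hidden: int = 32) -> nn.Module:
    """Hypothetical factory: it *returns* a module instead of being one."""
    # A real warm-start factory would load pretrained weights here; this stub
    # builds a tiny module so the resolution logic can run offline.
    return nn.Sequential(nn.Embedding(vocab_size, hidden), nn.Linear(hidden, vocab_size))


def resolve_model(namespace: dict) -> nn.Module:
    """Sketch of a loader that follows main_method when no class is exported."""
    meta = namespace.get("MODEL_METADATA", {})
    method_name = meta.get("main_method")
    if method_name is None:
        raise ValueError("no nn.Module subclass found and no main_method declared")
    model = namespace[method_name]()
    if not isinstance(model, nn.Module):
        raise TypeError(f"{method_name}() returned {type(model).__name__}, not nn.Module")
    return model


# Usage: resolve the model from this module's own globals.
model = resolve_model(globals())
```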
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 6576d4b.
- Tie lm_decoder weights to word_embeddings (BERT-style) to match the warmstart model architecture and get the correct ~110M param count
- Initialize nn.MultiheadAttention in_proj_weight and out_proj params, which are stored as raw Parameters, not nn.Linear submodules

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
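A minimal sketch of both fixes on a toy model, assuming BERT-style normal(0, 0.02) initialization; TiedHeadEncoder and _init_weights are hypothetical names. Tying makes the decoder reuse the (vocab_size, d_model) embedding matrix, and Module.parameters() yields a shared Parameter only once, which is why tying lowers the reported count. In current PyTorch, in_proj_weight is a bare Parameter (when q/k/v dims match), so a Linear-only init branch never reaches it; out_proj is a subclass of nn.Linear, so an isinstance(nn.Linear) branch already covers it.

```python
import torch
import torch.nn as nn


class TiedHeadEncoder(nn.Module):
    """Hypothetical toy model: embeddings, one attention block, tied LM decoder."""

    def __init__(self, vocab_size: int = 100, d_model: int = 32, n_heads: int = 4):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.lm_decoder = nn.Linear(d_model, vocab_size)
        # BERT-style tying: decoder weight (vocab, d_model) IS the embedding matrix.
        self.lm_decoder.weight = self.word_embeddings.weight
        self.apply(self._init_weights)

    @staticmethod
    def _init_weights(module: nn.Module, std: float = 0.02) -> None:
        if isinstance(module, nn.MultiheadAttention):
            # Bare Parameters on the attention module itself, not a Linear child.
            nn.init.normal_(module.in_proj_weight, mean=0.0, std=std)
            if module.in_proj_bias is not None:
                nn.init.zeros_(module.in_proj_bias)
        elif isinstance(module, nn.Linear):
            # Also reaches attn.out_proj, which subclasses nn.Linear.
            nn.init.normal_(module.weight, mean=0.0, std=std)
            if module.bias is not None:
                nn.init.zeros_(module.bias)
        elif isinstance(module, nn.Embedding):
            nn.init.normal_(module.weight, mean=0.0, std=std)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        x = self.word_embeddings(input_ids)
        x, _ = self.attn(x, x, x, need_weights=False)
        return self.lm_decoder(x)  # (batch, seq_len, vocab_size)


model = TiedHeadEncoder()
# Shared weights are yielded once, so the tied matrix is not double-counted.
n_params = sum(p.numel() for p in model.parameters())
```

With vocab_size=100 and d_model=32 this counts 3,200 (embeddings) + 4,224 (attention) + 100 (decoder bias) = 7,524; an untied decoder would add another 3,200.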
divyasinghds approved these changes on May 18, 2026

Summary
- Adds a masked_language_modeling/pytorch/ directory with three model definitions for biomedical graph MLM pretraining
- Updates test_model_contract.py to include masked_language_modeling in the known categories

Test plan
- pytest tests/test_model_contract.py passes with the new category and all three model files
- … (batch, seq_len, vocab_size)

🤖 Generated with Claude Code
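The contents of test_model_contract.py are not shown here, so this is only a hypothetical sketch of how a category allow-list can decide which model files get checked, assuming the <category>/pytorch/*.py layout named in the summary. KNOWN_CATEGORIES and discover_model_files are illustrative names; in this sketch a directory whose name is not on the list is silently skipped, which is why a new category has to be registered before its files are covered.

```python
from pathlib import Path

# Hypothetical allow-list; the real set in test_model_contract.py is not shown.
KNOWN_CATEGORIES = {"masked_language_modeling"}


def discover_model_files(root: Path) -> list[Path]:
    """Return model files laid out as <category>/pytorch/*.py under root."""
    files: list[Path] = []
    for category_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        if category_dir.name not in KNOWN_CATEGORIES:
            continue  # unregistered category: its files are never checked
        pytorch_dir = category_dir / "pytorch"
        if pytorch_dir.is_dir():
            files.extend(
                sorted(f for f in pytorch_dir.glob("*.py") if f.name != "__init__.py")
            )
    return files
```

A contract test would then import each returned file and run its metadata and shape checks.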
Note
Medium Risk
Adds new PyTorch MLM model entrypoints, including one that imports HuggingFace transformers, which may break model-contract imports if that dependency isn't present in the relevant CI/runtime environments.

Overview
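One common way to reduce that risk is to defer the transformers import into the factory, so the model file itself imports cleanly and only calling the factory requires the package. A minimal sketch, assuming that pattern fits the contract; optional_import, require and build_warmstart are hypothetical helpers, and the "bert-base-uncased" checkpoint is an illustrative assumption, not necessarily what NetMedGPTWarmStart loads.

```python
import importlib
import importlib.util


def optional_import(module_name: str):
    """Return the imported top-level module, or None if it is not installed."""
    if importlib.util.find_spec(module_name) is None:
        return None
    return importlib.import_module(module_name)


def require(module_name: str, feature: str):
    """Import module_name or raise an ImportError that names the feature."""
    mod = optional_import(module_name)
    if mod is None:
        raise ImportError(
            f"{feature} needs the '{module_name}' package; install it to use this model"
        )
    return mod


def build_warmstart(checkpoint: str = "bert-base-uncased"):
    # Deferred import: defining this function never touches transformers,
    # so importing the model file succeeds in environments without it.
    transformers = require("transformers", "warm-start MLM")
    return transformers.BertForMaskedLM.from_pretrained(checkpoint)
```

The trade-off is that a missing dependency surfaces when the model is built rather than at import time, so CI that only imports the files will not catch it.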
Introduces a new masked_language_modeling task in the model zoo with three PyTorch model definitions: a small transformer encoder (SimpleMaskedLM), a BERT-base-scale from-scratch encoder with a tied LM head (NetMedGPTScratch), and a warm-started HuggingFace BERT loader that resizes token embeddings (NetMedGPTWarmStart).

Updates the model contract test to recognize masked_language_modeling as a valid category, so these new model files are included in the import/metadata checks.

Reviewed by Cursor Bugbot for commit 084cb95.
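In transformers, the warm-start loader's resizing step is normally done with PreTrainedModel.resize_token_embeddings. As a library-free illustration of the idea only, here is a plain-PyTorch sketch that copies the pretrained rows for ids both vocabularies share and initializes the rest. The normal(0, 0.02) init for new rows is an assumption (recent transformers versions may initialize new rows differently), and resize_embedding and resize_tied_decoder are hypothetical helpers. With a tied head, the decoder must be re-tied to the resized matrix and its bias resized too.

```python
import torch
import torch.nn as nn


def resize_embedding(old: nn.Embedding, new_num_tokens: int, std: float = 0.02) -> nn.Embedding:
    """Return a new Embedding keeping the first min(old, new) pretrained rows."""
    old_num, dim = old.weight.shape
    new = nn.Embedding(new_num_tokens, dim)
    with torch.no_grad():
        nn.init.normal_(new.weight, mean=0.0, std=std)  # rows for added tokens
        n = min(old_num, new_num_tokens)
        new.weight[:n] = old.weight[:n]  # keep pretrained vectors for shared ids
    return new


def resize_tied_decoder(decoder: nn.Linear, new_embedding: nn.Embedding) -> nn.Linear:
    """Rebuild a biased LM decoder tied to new_embedding, carrying over old biases."""
    new_num, dim = new_embedding.weight.shape
    new_dec = nn.Linear(dim, new_num)
    new_dec.weight = new_embedding.weight  # re-tie to the resized matrix
    with torch.no_grad():
        new_dec.bias.zero_()
        n = min(decoder.out_features, new_num)
        new_dec.bias[:n] = decoder.bias[:n]
    return new_dec
```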