MultiModal-Project

Team project for INF 385T: Deep Learning and Multimodal Systems

Overview

This project develops a multimodal classification system that predicts the presence of Pneumonia and Pneumothorax from chest X-ray images paired with their radiology reports (Findings and Impression sections). Labels come from CheXbert annotations, where 1 indicates the condition is present (positive) and 0 indicates it is absent (negative). Uncertain labels (-1) are excluded from training.
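The label filtering described above can be sketched in plain Python. This is a minimal illustration, not the project's preprocessing code. It rests on three assumptions. First, each record is a dict whose label fields are named Pneumonia and Pneumothorax (hypothetical names, so check the dataset's actual columns). Second, a missing label arrives as None or NaN. Third, only the six label combinations listed in this README are kept. NaN is represented as None inside the tuples because NaN never compares equal to itself, so it cannot be matched reliably in a set.

```python
import math
from typing import Optional, Tuple

# Hypothetical column names; the real dataset fields may be named differently.
CONDITIONS = ("Pneumonia", "Pneumothorax")

# The six (Pneumonia, Pneumothorax) combinations used in the project;
# None stands in for NaN (no label for that condition).
ALLOWED_COMBINATIONS = {(0, 0), (1, 0), (0, 1), (1, 1), (1, None), (None, 1)}


def _is_missing(value) -> bool:
    """True for None or a float NaN, i.e. no CheXbert label for this condition."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def label_combination(record: dict) -> Optional[Tuple]:
    """Return the record's label pair, or None if the record should be dropped.

    A record is dropped if either condition is uncertain (-1) or if its
    label pair is not one of the six allowed combinations.
    """
    pair = []
    for name in CONDITIONS:
        value = record.get(name)
        if _is_missing(value):
            pair.append(None)
        elif value == -1:
            return None  # uncertain label: excluded from training
        else:
            pair.append(int(value))
    combo = tuple(pair)
    return combo if combo in ALLOWED_COMBINATIONS else None


def filter_records(records):
    """Keep usable records, each paired with its label combination."""
    kept = []
    for record in records:
        combo = label_combination(record)
        if combo is not None:
            kept.append((record, combo))
    return kept
```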

We experiment with six (Pneumonia, Pneumothorax) label combinations: (0,0), (1,0), (0,1), (1,1), (1,NaN), and (NaN,1), where NaN marks a condition with no CheXbert label. Our multimodal fusion model combines a ResNet50 image encoder with a BERT text encoder and is compared against image-only and text-only baselines.
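The fusion design above can be illustrated with a small PyTorch sketch. The README does not specify how the two encoders are combined, so this assumes simple late fusion. A ResNet50 image embedding (2048-d after global average pooling) and a BERT-base [CLS] embedding (768-d) are concatenated and passed to an MLP that outputs one logit per condition. The (1,NaN) and (NaN,1) combinations leave one label missing, so the sketch also includes a masked binary cross-entropy that ignores NaN targets. ConcatFusionHead, masked_bce_loss, and the hidden size of 512 are illustrative choices, not taken from the project notebooks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

IMAGE_DIM = 2048  # ResNet50 feature size after global average pooling
TEXT_DIM = 768    # BERT-base hidden size ([CLS] token embedding)
NUM_LABELS = 2    # one logit each for (Pneumonia, Pneumothorax)


class ConcatFusionHead(nn.Module):
    """Hypothetical late-fusion head: concatenate embeddings, then classify."""

    def __init__(self, image_dim=IMAGE_DIM, text_dim=TEXT_DIM,
                 hidden_dim=512, num_labels=NUM_LABELS, dropout=0.1):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(image_dim + text_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_labels),
        )

    def forward(self, image_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([image_emb, text_emb], dim=-1)  # (batch, image_dim + text_dim)
        return self.classifier(fused)                     # (batch, num_labels) logits


def masked_bce_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy averaged over labelled entries only (NaN targets ignored)."""
    mask = ~torch.isnan(targets)
    if not mask.any():
        return logits.sum() * 0.0  # no labels in batch: zero loss, graph kept intact
    safe_targets = torch.nan_to_num(targets, nan=0.0)  # placeholder values, masked out below
    per_element = F.binary_cross_entropy_with_logits(logits, safe_targets, reduction="none")
    return (per_element * mask).sum() / mask.sum()
```

In practice, the embeddings could come from torchvision's resnet50 with its fc layer replaced by nn.Identity(), and from the last_hidden_state[:, 0] output of transformers' BertModel. Concatenation is the simplest fusion choice, which also makes it an easy point of comparison with the single-modality baselines.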

Repository Structure

MultiModal-Project/
├── Cross-Modal_Alignment_MIMIC-CXR.ipynb       # Main notebook: full pipeline
├── huggingface data/
│   └── mimic_cxr_with_chexbert_labels.ipynb    # How CheXbert labels were added to MIMIC-CXR
└── models/
    ├── image_only_baseline_HF.ipynb             # Image-only baseline
    └── text_only_baseline_HF.ipynb              # Text-only baseline

Dataset

The dataset used in this project is publicly available on Hugging Face:

cchitse/mimic-cxr-with-chexbert-labels

It combines MIMIC-CXR chest X-ray images with structured CheXbert labels. No access token is required to download it.
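Loading the dataset inside a notebook takes a single call to the Hugging Face datasets library. The sketch below makes two assumptions. It uses a split named "train" (check the dataset card for the splits that actually exist), and it enables streaming so records are fetched lazily instead of downloading every image up front. The name load_mimic_cxr is a hypothetical helper, not a function from the project.

```python
DATASET_ID = "cchitse/mimic-cxr-with-chexbert-labels"


def load_mimic_cxr(split: str = "train", streaming: bool = True):
    """Load the project dataset from the Hugging Face Hub (no token needed).

    The default split name "train" is an assumption. With streaming=True the
    result is an IterableDataset that downloads records on demand.
    """
    # Imported here so this helper can be defined even where datasets is absent.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split, streaming=streaming)
```

For example, next(iter(load_mimic_cxr())) returns the first record. Its keys show the real column names, including the image field and the CheXbert label fields.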

The notebook huggingface data/mimic_cxr_with_chexbert_labels.ipynb documents how we constructed this dataset by appending CheXbert labels to the original MIMIC-CXR records.

How to Run

All notebooks are designed to run on Google Colab; no local setup is required.

1. Open the desired notebook in Google Colab.
2. Run all cells sequentially. The dataset is downloaded directly from Hugging Face at runtime, so no manual data download or Google Drive mounting is needed.
3. Start with Cross-Modal_Alignment_MIMIC-CXR.ipynb for the complete analysis, which covers data preprocessing, model training, evaluation, and Grad-CAM visualization. The baseline notebooks under models/ were used for early-stage testing and can be run independently.
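Grad-CAM, used in the main notebook for visualization, weights a convolutional layer's activation maps by the spatially averaged gradients of one class logit and keeps only the positive part. Below is a minimal PyTorch sketch that captures activations and gradients with forward and backward hooks. It assumes a single-input model that maps an image batch to per-condition logits, such as the image-only baseline. The function name grad_cam and its arguments are hypothetical and do not come from the project notebooks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def grad_cam(model: nn.Module, target_layer: nn.Module,
             image: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Return an (H, W) Grad-CAM heatmap scaled to [0, 1] for one image.

    image: tensor of shape (1, C, H, W); class_idx: index of the logit to explain.
    Note: this switches the model to eval mode (frozen BatchNorm/Dropout).
    """
    activations, gradients = [], []
    fwd = target_layer.register_forward_hook(
        lambda module, args, out: activations.append(out))
    bwd = target_layer.register_full_backward_hook(
        lambda module, grad_in, grad_out: gradients.append(grad_out[0]))
    try:
        model.eval()
        model.zero_grad()
        logits = model(image)
        logits[0, class_idx].backward()  # gradients of the chosen class score
    finally:
        fwd.remove()
        bwd.remove()

    acts = activations[0].detach()   # (1, K, h, w) feature maps
    grads = gradients[0].detach()    # (1, K, h, w) d(score)/d(feature maps)
    weights = grads.mean(dim=(2, 3), keepdim=True)            # one weight per channel
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))   # (1, 1, h, w)
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    cam = cam - cam.min()
    return (cam / (cam.max() + 1e-8)).squeeze()
```

For a torchvision ResNet50, target_layer would typically be model.layer4[-1], the last convolutional block. The returned map can then be overlaid on the X-ray with matplotlib or OpenCV. Applying the sketch to the fusion model would require holding the text input fixed, for instance with a small wrapper module that takes only the image.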

Dependencies

The following packages are required and can be installed within Colab:

torch, torchvision
transformers
datasets (Hugging Face)
scikit-learn
matplotlib, numpy, opencv-python
