Apply computer vision algorithms to isolate lesions from the ISIC 2019 challenge dataset, then classify them using machine learning. The dataset comprises 25,331 dermoscopic images divided into nine diagnostic categories, of which only two are used:
- Melanoma – 4,522 images.
- Melanocytic Nevus – 12,875 images.
This folder contains the main code of the project. Inside there are four files:
- __init__.py: empty file that initializes the Python package, so that the modules can be imported from outside.
- cvTools.py: module containing a couple of functions to quickly show images with OpenCV.
- extractor.py: the core of the project. The script processes the images contained in custom folders (see file_trimmer.py). It contains both the segmentation and the feature extraction pipelines.
- file_trimmer.py: script used to remove unused images from the dataset. It also separates the images based on their known classification, even though this is not necessary for the feature extraction or for the classification algorithm.
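The class selection that file_trimmer.py is described as performing can be illustrated with a minimal sketch. This is not the project's actual script: the column names (`image`, `MEL`, `NV`), the one-hot label layout, and the helper `select_images` are assumptions made for illustration, and the sample rows are made up.

```python
import csv
import io

# Assumed layout: an "image" id column plus one-hot class columns.
# "MEL" and "NV" are assumed labels for Melanoma and Melanocytic Nevus.
KEPT_CLASSES = ("MEL", "NV")


def select_images(csv_file, kept_classes=KEPT_CLASSES):
    """Map each kept class to the list of image ids labelled with it."""
    selected = {name: [] for name in kept_classes}
    for row in csv.DictReader(csv_file):
        for name in kept_classes:
            if float(row[name]) == 1.0:
                selected[name].append(row["image"])
                break  # one-hot labels: at most one class per image
    return selected


# Made-up sample standing in for the real ground-truth file.
sample = io.StringIO(
    "image,MEL,NV,BCC\n"
    "img_a,1.0,0.0,0.0\n"
    "img_b,0.0,1.0,0.0\n"
    "img_c,0.0,0.0,1.0\n"
)
selected = select_images(sample)
print(selected)
```

Images belonging to any other class (here `img_c`) are simply skipped, which mirrors the idea of trimming the dataset down to the two classes in use.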
- classification.ipynb: since the classification algorithm is a simple application of the RandomForestClassifier class of scikit-learn, I decided not to write a script for the task, but to use a notebook so that the cells can be rerun easily while changing hyperparameter values.
- segmentation.ipynb: this notebook contains testing code and is not meant to be shown. Nonetheless, I have included it for completeness.
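For orientation, a typical use of RandomForestClassifier looks roughly like the sketch below. It is not the content of classification.ipynb: the data is synthetic (generated with `make_classification`) because the columns of features.csv are not described here, and the split ratio and hyperparameter values are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic two-class data standing in for the extracted feature table.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=4, random_state=0
)

# Hold out a quarter of the samples, keeping the class ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Placeholder hyperparameters; these are the values one would tune in a notebook.
clf = RandomForestClassifier(n_estimators=100, max_depth=None, random_state=0)
clf.fit(X_train, y_train)

predictions = clf.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"accuracy: {accuracy:.3f}")
```

With the real data, `X` would presumably be the feature columns of features.csv and `y` the Melanoma / Melanocytic Nevus labels from the ground-truth file.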
This folder contains:
- features.csv: output of extractor.py. It contains a table of features for each image.
- ISIC_2019_GroundTruth.csv: the list of used images with their classification (the whole Melanoma and Melanocytic Nevus classes from the original dataset).
cv.yml contains the conda environment used throughout the project. The main packages required to run the code are listed below.
- python=3.13
- numpy
- matplotlib
- opencv=4.x
- pandas
- tqdm
- scikit-learn