Skip to content

EXDEICIDA/retina-ml

Repository files navigation

MIT License Python scikit-learn Pandas Google Colab Kaggle

Retina ML

This project builds a complete machine learning pipeline to predict DNA methylation status based on genomic features. The goal is to identify epigenetic patterns that can serve as early detection biomarkers for rare genetic vision disorders, such as Retinitis Pigmentosa.

Results

After evaluating multiple algorithms, a simple linear model outperformed complex tree-based models, indicating that the feature boundaries in this specific dataset are highly linear.

  • Winning Model: Logistic Regression
  • Accuracy: 97.00%

Dataset

The data is sourced from the Kaggle dataset: DNA Methylation Data - Epigenetic Biomarkers. It contains 1,000 samples with four key genomic features.

Feature Name Type Description
cpg_density Genomic Feature The density or concentration of CpG sites in the specific genomic region.
genomic_location Genomic Feature The normalized physical position of the site within the chromosome.
regulatory_score Genomic Feature A metric indicating the potential regulatory activity (like promoters or enhancers) of the region.
conservation_score Genomic Feature A score representing how well this genomic sequence has been preserved across evolution.
methylation_status Target Variable The binary classification target (0 = Unmethylated, 1 = Methylated).

About

Complete machine learning model to predict DNA methylation status based on genomic features.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors