barbavegeta/README.md

Hi, I’m Salvatore

Data Science & Bioinformatics
London, United Kingdom


About Me

Biomedical scientist with 7+ years in clinical and genomic laboratories, now focused on bioinformatics and data-driven analysis of sequencing data.

I specialise in building NGS pipelines and analysing their output, combining wet-lab expertise with computational workflows to extract biologically meaningful insights.

  • Clinical experience: UCLH, CooperGenomics (NGS, ATMPs, reporting)
  • Strong in Python, R, SQL, and Bash for genomic data workflows
  • Experience with RNA-seq, variant calling, and pipeline development

Tech Stack

Languages: Python · R · SQL · Bash
Bioinformatics: HISAT2 · STAR · SAMtools · BEDtools · bcftools · DESeq2
Data Viz: Tableau · ggplot2 · seaborn · matplotlib
Cloud: AWS · Google Cloud · GKE (Kubernetes)


Featured Projects

| Project | Description | Tools |
| --- | --- | --- |
| M. tuberculosis WGS Variant Analysis Workflow | Galaxy-based workflow for QC, trimming, alignment, coverage assessment, variant calling, annotation, and IGV-supported review of resistance-associated loci in Mycobacterium tuberculosis | Galaxy · BWA-MEM2 · Picard · SAMtools · mosdepth · bcftools · SnpEff · SnpSift · MultiQC · IGV |
| RNA-seq Pipeline | Containerised RNA-seq workflow built with Nextflow and Docker, covering QC, trimming, alignment, quantification, MultiQC reporting, and differential expression analysis | Nextflow · Docker · FastQC · Cutadapt · STAR · featureCounts · MultiQC · DESeq2 |
| Genomic Data Science | End-to-end RNA-seq & variant analysis using HISAT2, StringTie, and DESeq2 | Python · R · Bash · Bioconductor |
| Salifort Motors | Predictive modelling to understand drivers of employee turnover and inform retention strategy | XGBoost · NumPy · SciPy · scikit-learn · Pandas · Statsmodels |
| TikTok Project | Exploratory analysis of engagement metrics to uncover content trends and optimisation levers | Matplotlib · Seaborn · Plotly · SciPy |
| Bellabeat Case Study | Fitbit data analysis and Tableau dashboard | R · dplyr · Tableau · SQL |
| AWS Solution Architecture | Cloud deployment diagrams & IaC design | AWS · ECS · S3 · Aurora |
| Fiber Business Intelligence Capstone | Data integration and visualization for business insights | BigQuery · Tableau · SQL |
| Portfolio Website | Personal website showcasing bioinformatics and data projects | HTML · CSS · JS |

Education & Certifications

MSc Bioinformatics - Atlantic Technological University (Remote) - 2025-Present
MSc Cell & Gene Therapy - University College London - 2021-2023
BSc Biomedical Science - University of Catania - 2014-2017

Machine Learning & AI Certifications:
IBM: Machine Learning · AI Engineering

Data, Analytics, and Cloud Certifications:
Google: Data Analytics · Advanced Data Analytics · IT Automation with Python · Project Management · Business Intelligence
Google Cloud: Architecting with Google Kubernetes Engine
Amazon Web Services (AWS): Cloud Practitioner Essentials · Cloud Solutions Architect

Bioinformatics Certifications:
Johns Hopkins University: Genomic Data Science Specialization
Wellcome: Bioinformatics for Biologists: An Introduction to Linux, Bash Scripting, and R; Analysing and Interpreting Genomics Datasets

Programming & Data Science Courses:
freeCodeCamp: Data Analysis with Python; Relational Databases; Scientific Computing with Python & Databases
DE<code>LIFE: Genomes, Networks & Pathways; Data Science & Machine Learning with Python


Connect with Me

Pinned

  1. mtb-wgs-galaxy-workflow Public

    Galaxy-based WGS variant analysis workflow for Mycobacterium tuberculosis, covering QC, alignment, coverage assessment, variant calling, annotation, and IGV-supported interpretation of resistance-a…

  2. RNA-seq_Nextflow_Pipeline_with_Docker Public

    Minimal end-to-end RNA-seq pipeline: FASTQ -> FastQC -> cutadapt -> STAR -> sorted BAM -> featureCounts -> MultiQC -> DESeq2

    HTML 5

  3. Genomic_Data_Science_Specialization Public

    Personal solutions, scripts, and command logs from the Johns Hopkins Genomic Data Science Specialization (alignment, RNA-seq, variant calling, and database querying).

    Jupyter Notebook 1

  4. Google_Advanced_Data_Analytics-Salifort_Motors Public

    HR analytics capstone for the Google Advanced Data Analytics certificate, building classification models to understand and predict employee attrition at Salifort Motors.

    HTML

  5. Google_Advanced_Data_Analytics-TikTok_Project Public

    Google Advanced Data Analytics project analysing TikTok engagement data to uncover content patterns and factors linked to higher user interaction.

    Jupyter Notebook

  6. Google_Data_Analytics-Bellabeat_Project Public

    Google Data Analytics capstone using Fitbit data for Bellabeat to explore activity, sleep, and heart-rate patterns and build Tableau dashboards for actionable insights.

    R