Code for: CRISPR-engineered deletion of POGZ alters transcription factor binding at promoters of genes involved in synaptic signaling

This repository contains the analysis code associated with the publication:

[Authors]. [Paper Title]. [Journal], [Year]. DOI: [DOI]

Overview

This study investigates the transcriptional and chromatin accessibility consequences of heterozygous POGZ loss-of-function in human iPSC-derived neurons (iN) and neural stem cells (NSC). Experiments were performed in two independent iPSC genetic backgrounds (GM8330 and MGH) to identify robust, reproducible effects.

The analyses include:

Bulk RNA-seq differential expression and co-expression analysis
ATAC-seq peak calling and differential chromatin accessibility analysis
Transcription factor footprinting and differential TF binding analysis

Repository structure

code/
├── RNA-seq/
│   ├── rnaseq_analysis_iN_final.Rmd           # Main RNA-seq analysis notebook
│   ├── co-expression_analysis_allsamples.R    # WGCNA co-expression analysis
│   ├── co-expression_data.R                   # Preprocessing for co-expression
│   └── RNAseq.analysis/R/                     # Supporting R functions
│       ├── rnaseq_analysis_functions.R        # DESeq2 wrappers, volcano plots, PCA
│       ├── rnaseq_qc_functions.R              # QC helper functions
│       ├── co_expression_functions.R          # WGCNA helper functions
│       ├── enrichment.R                       # Pathway enrichment functions
│       └── read_pathway_db.R                  # Load pathway databases
└── ATAC-seq/
    ├── ATACSeq/                               # ATAC-seq processing pipeline (Python)
    │   └── bin/
    │       ├── callpeaks.py                   # Peak calling (MACS2)
    │       ├── poolpeaks.py                   # Merge peaks across samples
    │       ├── callfootprints.py              # TF footprinting (TOBIAS ATACorrect + ScoreBigwig)
    │       └── calltfbindings.py              # Differential TF binding (TOBIAS BINDetect)
    ├── Differential_accessibility/
    │   ├── diff_peaks_v2.R                    # DiffBind differential accessibility script
    │   └── differential_accessibility.Rmd     # Differential accessibility analysis notebook
    └── Transcription_factor_footprint/
        ├── 1. merge_ATACpeaks.sh              # Merge peak files per cell type / background
        ├── 2. get_bamfile_list.sh             # Collect BAM file lists per group
        ├── 3. call_TF_footprints.sh           # Run TOBIAS ATACorrect + ScoreBigwig
        ├── 4. iN_merge_bigwigs.sh             # Merge per-sample bigWigs per group
        ├── 5. call_TF_bindings.sh             # Run TOBIAS BINDetect (DEL vs WT)
        └── tf_binding.Rmd                     # TF binding analysis and visualization notebook

Dependencies

R (≥ 4.0)

| Package | Purpose |
|---|---|
| DESeq2 | Differential expression analysis |
| sva | Surrogate variable analysis (batch correction) |
| WGCNA | Weighted gene co-expression network analysis |
| DiffBind | Differential chromatin accessibility |
| GenomicRanges | Genomic interval operations |
| biomaRt | Gene annotation retrieval |
| ggplot2, ggrepel, ggpubr | Visualization |
| pheatmap | Heatmaps |
| VennDiagram | Overlap visualization |
| dplyr, stringr, data.table | Data manipulation |

Python (≥ 3.7) and command-line tools

| Tool | Purpose |
|---|---|
| samtools | BAM processing |
| bedtools | Genomic interval operations |
| MACS2 | Peak calling |
| TOBIAS | TF footprinting and differential binding |
| deepTools | BigWig generation |

Analysis workflows

Note: The code provided here covers the iN heterozygous deletion analysis as an example. The same workflows were applied to NSC and compound heterozygous samples in the paper.

RNA-seq

Preprocessing and QC: Gene counts and library-size-normalized CPM are computed from a gene-level count matrix derived from STAR alignments. Genes are kept only if they reach CPM ≥ 0.5 in at least one group; lower-expressed genes are removed.
Differential expression: DESeq2 is run separately for the GM8330 and MGH backgrounds. Surrogate variables are estimated with SVA and included as covariates to account for unwanted technical variation. Results are saved as CSV tables with log₂ fold changes and adjusted p-values.
Cross-background replication: Genes that are differentially expressed at FDR < 0.1 in both backgrounds, with the same direction of change, form the consensus POGZ DEG set.
Pathway enrichment: The consensus DEG set is tested for enrichment in curated gene sets (GO, KEGG, and other pathway databases) using hypergeometric tests.
Co-expression analysis: WGCNA is run on SVA-corrected, log₂-normalized counts from all iN samples. Module–trait correlations are computed against genotype and other metadata covariates.
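
The cross-background replication rule can be illustrated with a short Python sketch. This is not the notebook's code, which works on DESeq2 result tables in R. The input format (a mapping from gene to log₂ fold change and adjusted p-value), the function name `consensus_degs`, and the toy values are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical input format: gene -> (log2 fold change, adjusted p-value).
Results = Dict[str, Tuple[float, float]]


def consensus_degs(bg1: Results, bg2: Results, fdr: float = 0.1) -> Dict[str, str]:
    """Return genes significant in both backgrounds with the same direction of change."""
    consensus = {}
    for gene in sorted(bg1.keys() & bg2.keys()):
        lfc1, padj1 = bg1[gene]
        lfc2, padj2 = bg2[gene]
        significant = padj1 < fdr and padj2 < fdr  # NaN p-values compare False and are dropped
        same_direction = lfc1 * lfc2 > 0
        if significant and same_direction:
            consensus[gene] = "up" if lfc1 > 0 else "down"
    return consensus


# Toy values for illustration only; these are not results from the study.
gm8330 = {"GENE_A": (1.2, 0.01), "GENE_B": (-0.8, 0.04), "GENE_C": (0.5, 0.30)}
mgh = {"GENE_A": (0.9, 0.03), "GENE_B": (0.6, 0.02), "GENE_C": (0.7, 0.01)}
print(consensus_degs(gm8330, mgh))  # {'GENE_A': 'up'}
```

In the toy data, GENE_B is dropped because its direction differs between backgrounds, and GENE_C is dropped because it is significant in only one of them.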

The main differential expression notebook is code/RNA-seq/rnaseq_analysis_iN_final.Rmd. Co-expression analysis is in code/RNA-seq/co-expression_analysis_allsamples.R.
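
For the pathway enrichment step, a one-sided hypergeometric test can be sketched in a few lines of Python. This is a minimal sketch, not the implementation in enrichment.R: the function name and the choice of background universe (for example, all expressed genes) are assumptions, and multiple-testing correction across gene sets is not shown.

```python
from math import comb  # math.comb requires Python 3.8 or later


def hypergeom_enrichment_p(universe: int, pathway: int, degs: int, overlap: int) -> float:
    """P(X >= overlap) when `degs` genes are drawn from `universe` genes,
    of which `pathway` belong to the gene set being tested."""
    if not (0 <= overlap <= min(pathway, degs) and max(pathway, degs) <= universe):
        raise ValueError("inconsistent set sizes")
    total = comb(universe, degs)
    upper = min(pathway, degs)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(pathway, k) * comb(universe - pathway, degs - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers for illustration only: 10 background genes, 4 in the pathway,
# 3 DEGs, 2 of which fall in the pathway.
print(round(hypergeom_enrichment_p(10, 4, 3, 2), 4))  # 0.3333
```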

ATAC-seq

Step 1 — Peak calling and IDR

python code/ATAC-seq/ATACSeq/bin/callpeaks.py --bam <sample.bam> --outdir <peaks_dir>

Peaks are called with MACS2. The irreproducible discovery rate (IDR) is then used to define conservative and optimal peak sets across replicates.

Step 2 — Differential accessibility (DiffBind)

Run diff_peaks_v2.R with a DiffBind sample sheet:

Rscript code/ATAC-seq/Differential_accessibility/diff_peaks_v2.R \
  <metadata.csv> <background> <tissue> <output_dir> <use_overlapped_peaks> [is_compound_het]

Arguments: (1) DiffBind sample sheet CSV, (2) genetic background (e.g. GM or MGH), (3) cell type (e.g. iN or NSC), (4) output directory, (5) whether to use overlapped peaks across sample groups (TRUE/FALSE), (6) whether samples are compound heterozygous — optional, defaults to FALSE.
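
When the script is run for several backgrounds or cell types, it can help to assemble the positional arguments programmatically. The sketch below only builds the command line described above. The helper names and the example file names are hypothetical, and it assumes the script accepts TRUE/FALSE literally as written in the argument list.

```python
from typing import List, Optional

SCRIPT = "code/ATAC-seq/Differential_accessibility/diff_peaks_v2.R"


def r_bool(flag: bool) -> str:
    """Render a Python bool as the TRUE/FALSE literal the script expects."""
    return "TRUE" if flag else "FALSE"


def diff_peaks_command(
    sample_sheet: str,
    background: str,
    tissue: str,
    output_dir: str,
    use_overlapped_peaks: bool,
    is_compound_het: Optional[bool] = None,
) -> List[str]:
    """Build the Rscript call; the sixth argument is omitted unless given."""
    cmd = ["Rscript", SCRIPT, sample_sheet, background, tissue, output_dir,
           r_bool(use_overlapped_peaks)]
    if is_compound_het is not None:
        cmd.append(r_bool(is_compound_het))
    return cmd


# Hypothetical file and directory names, for illustration only.
cmd = diff_peaks_command("metadata.csv", "GM", "iN", "results/GM_iN", True)
print(" ".join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```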

Differentially accessible regions (DARs) are identified at FDR < 0.05. The full analysis, including annotation and visualization, is in code/ATAC-seq/Differential_accessibility/differential_accessibility.Rmd.

Step 3 — TF footprinting and differential binding (TOBIAS)

Run the numbered shell scripts in order within code/ATAC-seq/Transcription_factor_footprint/:

# 1. Merge IDR peaks per cell type and background
bash "code/ATAC-seq/Transcription_factor_footprint/1. merge_ATACpeaks.sh"

# 2. Collect per-group BAM file lists
bash "code/ATAC-seq/Transcription_factor_footprint/2. get_bamfile_list.sh"

# 3. Compute ATAC-seq bias correction and footprint scores (TOBIAS ATACorrect + ScoreBigwig)
bash "code/ATAC-seq/Transcription_factor_footprint/3. call_TF_footprints.sh"

# 4. Merge per-sample bigWigs per group
bash "code/ATAC-seq/Transcription_factor_footprint/4. iN_merge_bigwigs.sh"

# 5. Run differential TF binding analysis (TOBIAS BINDetect, DEL vs WT)
bash "code/ATAC-seq/Transcription_factor_footprint/5. call_TF_bindings.sh"

TF motifs are from the JASPAR vertebrates non-redundant collection, supplemented with custom POGZ motifs. Downstream statistical analysis and visualization are in code/ATAC-seq/Transcription_factor_footprint/tf_binding.Rmd.
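
The five numbered scripts in this step can also be driven from a small Python wrapper that runs them in numeric order and stops at the first failure. This is a sketch under assumptions: the helper names are hypothetical, and it assumes the scripts are launched from the repository root, as in the commands above, and need no extra arguments.

```python
import subprocess
from pathlib import Path
from typing import List

FOOTPRINT_DIR = Path("code/ATAC-seq/Transcription_factor_footprint")


def numbered_scripts(directory: Path) -> List[Path]:
    """Return scripts named '<n>. name.sh', sorted by their numeric prefix."""
    scripts = []
    for path in directory.glob("*.sh"):
        prefix = path.name.split(".", 1)[0]
        if prefix.isdigit():  # skip shell scripts without a step number
            scripts.append((int(prefix), path))
    return [path for _, path in sorted(scripts)]


def run_all(directory: Path = FOOTPRINT_DIR) -> None:
    """Run each step in order; check=True raises on the first failing script."""
    for script in numbered_scripts(directory):
        print(f"Running {script.name}")
        subprocess.run(["bash", str(script)], check=True)


if __name__ == "__main__":
    # Dry run: list the steps that would be executed, without running them.
    for script in numbered_scripts(FOOTPRINT_DIR):
        print(script)
```

Sorting on the integer prefix rather than the file name keeps the order correct if a step 10 or later is ever added.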

Data availability

Raw sequencing data and processed files are deposited at GEO under accession [GEO accession].

Citation

If you use this code, please cite:

[Authors]. [Paper Title]. [Journal], [Year]. DOI: [DOI]
