feat: StringTie novel transcript discovery + hybrid GTF #182

Draft
pinin4fjords wants to merge 4 commits into feat/162-ribotish-qc-canonical-doc from feat/164-stringtie-novel

@pinin4fjords (Member)

Summary

Adds reference-guided novel-transcript assembly via StringTie (preferring RNA-seq BAMs, falling back to Ribo-seq with tightened defaults) or a user-supplied annotation via --novel_gtf. Either source flows through a common filtering chain (gffcompare class-code filter, optional strand-aware rRNA/repeat blacklist, concat with the canonical backbone) and produces a hybrid annotation at <outdir>/stringtie/hybrid_reference.gtf.
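The class-code step in that chain can be pictured with a minimal Python sketch. It is not the subworkflow's code: the function name `filter_by_class_code` is hypothetical, and it assumes gffcompare-annotated GTF input in which the `class_code` attribute sits on `transcript` records, so exon records are kept or dropped by their `transcript_id`.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')
CLASS_CODE = re.compile(r'class_code "([^"]+)"')


def filter_by_class_code(gtf_lines, keep_codes=("u",)):
    """Keep every record of transcripts whose class code is in keep_codes.

    Hypothetical helper for illustration only. Assumes class_code is
    present on 'transcript' records of a gffcompare-annotated GTF.
    """
    # Drop comments and blank lines, then split into tab-separated fields.
    records = [
        line.rstrip("\n").split("\t")
        for line in gtf_lines
        if line.strip() and not line.startswith("#")
    ]
    records = [fields for fields in records if len(fields) >= 9]

    # Pass 1: collect the IDs of transcripts with an accepted class code.
    kept_ids = set()
    for fields in records:
        if fields[2] != "transcript":
            continue
        tid = TRANSCRIPT_ID.search(fields[8])
        code = CLASS_CODE.search(fields[8])
        if tid and code and code.group(1) in keep_codes:
            kept_ids.add(tid.group(1))

    # Pass 2: emit transcript and exon records belonging to those IDs.
    kept = []
    for fields in records:
        tid = TRANSCRIPT_ID.search(fields[8])
        if tid and tid.group(1) in kept_ids:
            kept.append("\t".join(fields))
    return kept
```

Passing `keep_codes=("u", "x")` would model the opt-in antisense setting; the default mirrors `--stringtie_class_codes 'u'`.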

This is the foundation for #165 / #171 / extended ORF discovery. With the default --skip_stringtie true and no --novel_gtf, the new hybrid_gtf workflow channel equals ch_canonical_gtf from #161, so downstream wiring is uniform whether novel discovery is on or off.
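The source selection described in this summary can be sketched as a small decision function. This is an illustrative Python model, not the pipeline's Nextflow logic: `pick_novel_source` and its return labels are hypothetical, and the precedence of `--novel_gtf` over StringTie when both are set is an assumption that the PR text does not state.

```python
def pick_novel_source(skip_stringtie=True, novel_gtf=None, has_rnaseq_bams=False):
    """Return a label for where novel transcripts would come from.

    Hypothetical model of the behaviour described in the PR summary.
    """
    if novel_gtf:
        # ASSUMPTION: a user-supplied GTF wins over StringTie assembly.
        return "user_gtf"
    if skip_stringtie:
        # Defaults: hybrid_gtf is simply the canonical GTF channel.
        return "canonical_only"
    # StringTie enabled: prefer RNA-seq BAMs, else fall back to Ribo-seq.
    return "stringtie_rnaseq" if has_rnaseq_bams else "stringtie_riboseq_fallback"


if __name__ == "__main__":
    print(pick_novel_source())                      # defaults
    print(pick_novel_source(skip_stringtie=False))  # no RNA-seq BAMs available
```

Under the defaults the function returns `canonical_only`, which corresponds to `hybrid_gtf` carrying the same content as `ch_canonical_gtf`.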

Changes

  • New local subworkflow NOVEL_TRANSCRIPT_DISCOVERY chaining bam_stringtie_merge (per-sample StringTie + merge) and the upstream gtf_hybridmerge_gffcompare subworkflow (gffcompare + class-code filter + optional blacklist intersect + concat with canonical).
  • Install upstream modules: stringtie/stringtie, stringtie/merge, gffcompare, bedtools/intersect.
  • Install upstream subworkflows: bam_stringtie_merge, gtf_hybridmerge_gffcompare (PR modules#11729, "feat(subworkflows): gtf_hybridmerge_gffcompare", merged).
  • New params: --skip_stringtie (default true), --novel_gtf, --stringtie_class_codes (default 'u', intergenic only), --rrna_blacklist, --extra_stringtie_args, --extra_stringtie_merge_args, --stringtie_ribo_fallback_args (default '-m 100 -c 5 -j 3 -f 0.05 -g 100' to tighten the Ribo-seq fallback).
  • The riboseq workflow now emits a hybrid_gtf channel.
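For readers unfamiliar with the flags in the `--stringtie_ribo_fallback_args` default, the sketch below splits the string into flag/value pairs and annotates each one. The helper `parse_flag_pairs` is hypothetical, and the flag descriptions are paraphrased from the StringTie help text, so they should be checked against the installed StringTie version.

```python
import shlex

# Meanings paraphrased from the StringTie usage text (verify for your version).
FLAG_MEANING = {
    "-m": "minimum assembled transcript length",
    "-c": "minimum read coverage for predicted transcripts",
    "-j": "minimum junction coverage",
    "-f": "minimum isoform fraction",
    "-g": "maximum gap between read mappings in a bundle",
}


def parse_flag_pairs(arg_string):
    """Split '-m 100 -c 5' style arguments into a {flag: number} dict.

    Hypothetical helper; it only handles flags that take one numeric value.
    """
    tokens = shlex.split(arg_string)
    if len(tokens) % 2 != 0:
        raise ValueError("expected flag/value pairs")
    pairs = {}
    for flag, value in zip(tokens[0::2], tokens[1::2]):
        if not flag.startswith("-"):
            raise ValueError(f"not a flag: {flag}")
        pairs[flag] = float(value) if "." in value else int(value)
    return pairs


if __name__ == "__main__":
    fallback = parse_flag_pairs("-m 100 -c 5 -j 3 -f 0.05 -g 100")
    for flag, value in fallback.items():
        print(f"{flag} {value}: {FLAG_MEANING.get(flag, 'unknown flag')}")
```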

Caveats

  • Ribo-seq StringTie defaults are first-pass empirical, not literature-validated. The pipeline warns when the fallback is active.
  • Default --stringtie_class_codes = u keeps only unambiguously intergenic transcripts. Users with stranded libraries can opt into 'u,x' to recover antisense transcripts; this is unsafe for non-stranded protocols.
  • --rrna_blacklist is silently skipped when not supplied.
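The strand-aware blacklist and its skip-when-absent behaviour can be illustrated in plain Python. This is a stand-in for the idea only, not the subworkflow's bedtools call: `drop_blacklisted` and the tuple layout are hypothetical, and coordinates are assumed to be 1-based and inclusive, as in GTF.

```python
def drop_blacklisted(transcripts, blacklist=None):
    """Remove transcripts overlapping a blacklist interval on the same strand.

    transcripts: iterable of (chrom, start, end, strand, transcript_id)
    blacklist:   iterable of (chrom, start, end, strand), or None
    Hypothetical helper; coordinates are 1-based inclusive.
    """
    transcripts = list(transcripts)
    if blacklist is None:
        # Mirrors the caveat above: no blacklist means no filtering, no error.
        return transcripts

    blacklist = list(blacklist)
    kept = []
    for chrom, start, end, strand, tid in transcripts:
        hit = any(
            chrom == b_chrom
            and strand == b_strand            # strand-aware: strands must match
            and start <= b_end and b_start <= end
            for b_chrom, b_start, b_end, b_strand in blacklist
        )
        if not hit:
            kept.append((chrom, start, end, strand, tid))
    return kept
```

A transcript on the opposite strand of an rRNA locus survives this filter, which is the point of making the intersect strand-aware.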

Stacked PR notes

Sixth in the stack splitting #174. Targets #181 (feat/162-ribotish-qc-canonical-doc).

Closes #164

🤖 Generated with Claude Code

Add reference-guided novel-transcript assembly from RNA-seq BAMs
(or Ribo-seq fallback with tightened defaults), or accept a
user-supplied novel GTF via --novel_gtf. The two sources feed a
common chain: gffcompare class-code filter + optional strand-aware
rRNA/repeat blacklist + concat with the canonical backbone, via
the upstream gtf_hybridmerge_gffcompare subworkflow.

The hybrid GTF is published at <outdir>/stringtie/hybrid_reference.gtf
and exposed on the `hybrid_gtf` workflow channel. With default
--skip_stringtie true and no --novel_gtf, the channel equals
ch_canonical_gtf so downstream wiring is uniform.

Wiring of the hybrid GTF into ORF callers lives in follow-on issue
#165 (--extended_orf_analysis).

New params:
  --skip_stringtie (default true)
  --novel_gtf
  --stringtie_class_codes (default 'u', intergenic only)
  --rrna_blacklist
  --extra_stringtie_args
  --extra_stringtie_merge_args
  --stringtie_ribo_fallback_args (default '-m 100 -c 5 -j 3 -f 0.05 -g 100')

Modules and subworkflows added:
  stringtie/stringtie, stringtie/merge, gffcompare,
  bedtools/intersect, bam_stringtie_merge,
  gtf_hybridmerge_gffcompare

@nf-core-bot (Member)

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.5.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.
