The alignment workflow starts with the fastq data, performs QC, aligns reads to the genome, removes duplicate reads, and creates genomic coverage files.
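A minimal sketch of the deduplication, alignment-rate, and coverage steps, assuming reads are already aligned and held as simple records. The `Read` record and all function names are hypothetical illustrations, not part of this pipeline. Duplicates are defined here as reads that share chromosome, 5' position, and strand. This is a simplified single-end rule; dedicated duplicate-marking tools also consider mate positions in paired-end data.

```python
from collections import namedtuple

# Hypothetical aligned-read record; interval is half-open [start, end).
Read = namedtuple("Read", "name chrom start end strand")


def five_prime(read):
    """5' end of a read: start on '+', last aligned base on '-'."""
    return read.start if read.strand == "+" else read.end - 1


def remove_duplicates(reads):
    """Keep the first read at each (chrom, 5' position, strand).

    Simplified single-end rule (an assumption); real tools also use mate
    positions and base qualities to pick which copy to keep.
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, five_prime(read), read.strand)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def percent_aligned(n_aligned, n_total):
    """Percentage of reads that aligned (0.0 when there are no reads)."""
    return 100.0 * n_aligned / n_total if n_total else 0.0


def coverage_runs(reads, chrom):
    """Read depth on one chromosome as (start, end, depth) runs,
    the kind of data a bedGraph or bigWig coverage track holds."""
    events = {}
    for r in reads:
        if r.chrom == chrom:
            events[r.start] = events.get(r.start, 0) + 1
            events[r.end] = events.get(r.end, 0) - 1
    runs, depth, prev = [], 0, None
    for pos in sorted(events):
        # Emit the run between the previous breakpoint and this one.
        if prev is not None and depth > 0:
            runs.append((prev, pos, depth))
        depth += events[pos]
        prev = pos
    return runs


reads = [
    Read("r1", "chr1", 100, 150, "+"),
    Read("r2", "chr1", 100, 160, "+"),  # same 5' end and strand as r1
    Read("r3", "chr1", 110, 150, "-"),  # 5' end is 149 on the '-' strand
]
dedup = remove_duplicates(reads)
print([r.name for r in dedup])       # ['r1', 'r3']
print(coverage_runs(dedup, "chr1"))  # [(100, 110, 1), (110, 150, 2)]
print(percent_aligned(45, 50))       # 90.0
```

Here the coverage is computed with a sweep over start/end breakpoints rather than a per-base array, which keeps memory proportional to the number of reads instead of the chromosome length.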
Outputs:
- Base quality reports for the fastq data
- All QC results
- Sample.genome.bam [.bai]: aligned bam files
- Sample.genome.dedup.bam [.bai]: aligned bam files with duplicate reads removed
- A figure showing the percentage of reads that aligned
- The coverage information in bigWig format, which can be loaded in the UCSC Genome Browser, IGV, or other genome browsers
Peak calling workflow
The peak calling workflow uses bam files from the alignment workflow to find peaks in a sample alone, or differential peaks compared to a control. It annotates the peaks to show overlap with promoters, closest genes, etc., filters out peaks overlapping Satellite repeat regions (likely noise), and then makes lists of Promoter, Genebody, or Intergenic targets. Finally, it finds known and novel motifs in the peaks.
Outputs:
- MACS peaks in bed format, which can be loaded in IGV or other browsers
- Peaks with annotation added, showing overlap with promoters, overlap with gene bodies, or the closest genes
- Peaks remaining after removing those that overlap Satellite regions, as such peaks are likely to be noise
- Lists of target genes (Promoter, Genebody, or Intergenic)
- Motifs found in the peaks. There are 2 types of results: a) a de novo search for over-represented n-mers, which are then matched to known motifs (homerMotifs.html), and b) a search of the peaks for known motifs (knownMotifs.html).
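The annotation, Satellite filtering, and target-list steps can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's actual annotation logic: the `Peak`/`Gene`/`Region` records and function names are hypothetical, the promoter window (TSS +/- 1 kb), the Promoter-over-Genebody priority, and "closest gene = nearest TSS to the peak center" are all assumed choices. The linear scans are for clarity; real annotation tools use interval indexes.

```python
from collections import namedtuple

# Hypothetical records; all intervals are half-open [start, end).
Peak = namedtuple("Peak", "name chrom start end")
Gene = namedtuple("Gene", "name chrom start end strand")
Region = namedtuple("Region", "chrom start end")

PROMOTER_FLANK = 1000  # assumption: promoter = TSS +/- 1 kb


def overlaps(chrom_a, start_a, end_a, chrom_b, start_b, end_b):
    """True if two half-open intervals on the same chromosome overlap."""
    return chrom_a == chrom_b and start_a < end_b and start_b < end_a


def tss(gene):
    """Transcription start site: first base on '+', last base on '-'."""
    return gene.start if gene.strand == "+" else gene.end - 1


def drop_satellite_peaks(peaks, satellites):
    """Remove peaks that overlap any Satellite repeat region."""
    return [
        p for p in peaks
        if not any(overlaps(p.chrom, p.start, p.end, s.chrom, s.start, s.end)
                   for s in satellites)
    ]


def classify(peak, genes):
    """Return (category, gene name) for one peak.

    Assumed priority: Promoter, then Genebody, else Intergenic with the
    gene whose TSS is closest to the peak center.
    """
    same_chrom = [g for g in genes if g.chrom == peak.chrom]
    for g in same_chrom:
        t = tss(g)
        if overlaps(peak.chrom, peak.start, peak.end,
                    g.chrom, t - PROMOTER_FLANK, t + PROMOTER_FLANK + 1):
            return "Promoter", g.name
    for g in same_chrom:
        if overlaps(peak.chrom, peak.start, peak.end, g.chrom, g.start, g.end):
            return "Genebody", g.name
    if not same_chrom:
        return "Intergenic", None
    center = (peak.start + peak.end) // 2
    closest = min(same_chrom, key=lambda g: abs(center - tss(g)))
    return "Intergenic", closest.name


def target_lists(peaks, genes, satellites):
    """Filter out Satellite peaks, then collect target genes per category."""
    lists = {"Promoter": set(), "Genebody": set(), "Intergenic": set()}
    for peak in drop_satellite_peaks(peaks, satellites):
        category, gene = classify(peak, genes)
        if gene is not None:
            lists[category].add(gene)
    return lists


genes = [Gene("GENE_A", "chr1", 5000, 9000, "+"),
         Gene("GENE_B", "chr1", 20000, 30000, "-")]
satellites = [Region("chr1", 50100, 50300)]
peaks = [Peak("p1", "chr1", 4800, 5200),    # near GENE_A's TSS
         Peak("p2", "chr1", 7000, 7400),    # inside GENE_A's body
         Peak("p3", "chr1", 18000, 18400),  # between the two genes
         Peak("p4", "chr1", 50000, 50400)]  # overlaps a Satellite region
print(target_lists(peaks, genes, satellites))
# {'Promoter': {'GENE_A'}, 'Genebody': {'GENE_A'}, 'Intergenic': {'GENE_B'}}
```

Filtering Satellite peaks before classification means noisy peaks never reach the target lists; the same gene can still appear in more than one list if different peaks hit it in different categories.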