Alignment workflow

The alignment workflow starts with the fastq data. It performs QC, aligns reads to the genome, removes duplicate reads and creates genomic coverage files.


Sample.trimmed.per-base-quality.png
Per-base quality plot for the trimmed fastq data
Sample.trimmed_fastqc.zip
Archive with all QC results for the trimmed fastq data
Sample.genome.bam [.bai]
Aligned bam files
Sample.genome.dedup.bam [.bai]
Aligned bam files with duplicate reads removed
Sample.genome.alignment-summary.png
A figure showing the percentage of reads that aligned
Sample.genome.dedup.norm-1M.bigwig
Genomic coverage in bigwig format. This file may be loaded in the UCSC Genome Browser, IGV, etc.
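
As a quick completeness check, the file names listed above can be generated for a sample and compared against an output directory. This is only a sketch: it assumes that "Sample" and "genome" in the names above are placeholders, and that "[.bai]" means an index file named by appending ".bai" to the bam file name. The function names are hypothetical and not part of the workflow.

```python
from pathlib import Path

# Suffixes copied from the alignment workflow file list.
# ASSUMPTION: "{genome}" stands in for the "genome" part of the names, and
# the "[.bai]" index is written as "<name>.bam.bai".
ALIGNMENT_SUFFIXES = [
    "trimmed.per-base-quality.png",
    "trimmed_fastqc.zip",
    "{genome}.bam",
    "{genome}.bam.bai",
    "{genome}.dedup.bam",
    "{genome}.dedup.bam.bai",
    "{genome}.alignment-summary.png",
    "{genome}.dedup.norm-1M.bigwig",
]


def expected_alignment_outputs(sample, genome="genome"):
    """Return the output file names expected for one sample (hypothetical helper)."""
    return [f"{sample}." + suffix.format(genome=genome) for suffix in ALIGNMENT_SUFFIXES]


def missing_alignment_outputs(directory, sample, genome="genome"):
    """Return the expected file names that are not present in `directory`."""
    base = Path(directory)
    return [
        name
        for name in expected_alignment_outputs(sample, genome)
        if not (base / name).exists()
    ]
```

For example, missing_alignment_outputs("results", "Sample") returns an empty list when every file listed above is present in a (hypothetical) "results" directory.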


Peak calling workflow

The peak calling workflow uses bam files from the alignment workflow to find peaks in a sample alone, or differential peaks compared to a control. It annotates the peaks to show overlap with promoters, closest genes, etc., filters out peaks that overlap with Satellite repeat regions (likely noise), and then makes lists of Promoter, Genebody or Intergenic targets. Finally, it finds known and novel motifs in the peaks.
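
The Satellite filtering step can be illustrated with a small interval-overlap sketch. This is not the workflow's own implementation; it assumes peaks and Satellite regions are available as (chrom, start, end) tuples with BED-style half-open coordinates, and the function names are hypothetical.

```python
from collections import defaultdict


def intervals_overlap(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def filter_satellite_peaks(peaks, satellites):
    """Keep only peaks that do not overlap any Satellite region.

    ASSUMPTION: both inputs are iterables of tuples whose first three fields
    are (chrom, start, end); extra fields on a peak are carried through.
    """
    # Group Satellite regions by chromosome so each peak is compared only
    # against regions on its own chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end in satellites:
        by_chrom[chrom].append((start, end))

    kept = []
    for peak in peaks:
        chrom, start, end = peak[:3]
        hits_satellite = any(
            intervals_overlap(start, end, s_start, s_end)
            for s_start, s_end in by_chrom.get(chrom, [])
        )
        if not hits_satellite:
            kept.append(peak)
    return kept


# Made-up coordinates, for illustration only.
example_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
example_satellites = [("chr1", 150, 300), ("chr2", 200, 250)]
filtered = filter_satellite_peaks(example_peaks, example_satellites)
```

The linear scan per chromosome keeps the sketch short; for large repeat annotations a sorted or indexed lookup would be a more practical choice.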


Sample_1_vs_Sample_2.genome.hg19.macs_peaks.bed
MACS peaks in bed format. This file may be loaded in IGV or other browsers.
Sample_1_vs_Sample_2.genome.macs_peaks.annotated.xls
Peaks with annotation information added, showing overlap with promoters, overlap with gene bodies, or the closest genes.
Sample_1_vs_Sample_2.genome.macs_peaks.filtered.xls
Annotated peaks after removing those that overlap with Satellite regions, as such peaks are likely to be noise.
Sample_1_vs_Sample_2.genome.macs_peaks.filtered.targets.xls
Lists of target genes (Promoter, Genebody or Intergenic targets)
Sample_1_vs_Sample_2.genome.macs_peaks.motifs.zip
Motifs found in the peaks. There are 2 types of results: a) a de-novo search for over-represented n-mers, which are then matched to known motifs (homerMotifs.html) and b) a search for known motifs in the peaks (knownMotifs.html).
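
To locate the two reports inside the motifs archive without unpacking it, something along these lines may help. It is a sketch that assumes only the two report names given above; the internal folder layout of the zip is not documented here, so the reports are matched by base name wherever they sit. The function name is hypothetical.

```python
import zipfile

# Report names taken from the description of the motifs archive.
MOTIF_REPORTS = ("homerMotifs.html", "knownMotifs.html")


def find_motif_reports(zip_source):
    """Map each report name to the archive members with that base name.

    `zip_source` may be a path or a binary file-like object.
    ASSUMPTION: the reports may sit in any subfolder of the archive.
    """
    with zipfile.ZipFile(zip_source) as archive:
        members = archive.namelist()
    return {
        report: [m for m in members if m.split("/")[-1] == report]
        for report in MOTIF_REPORTS
    }
```

An empty list for either report suggests that the corresponding search produced no output in that archive.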