Peak calling workflow and Report page overview
ChIP-Seq peak calling with MACS2 starts from the de-duplicated BAM file produced by our alignment and QC pipeline. The workflow calls peaks, annotates them with genes based on proximity, and reports summary statistics for the samples used in the peak-calling pipeline. An embedded genome browser lets you view all datasets together.
Results
Peaks Distribution
-Summary plots of peak locations relative to genes
Peaks table
-peaks called in your data with their underlying statistics and annotation of nearby genes
(detailed descriptions of the information in this table below)
Correlation and Scatterplots
-pairwise correlation heatmap and scatterplot of samples used in the analysis
Genome Browser
-an interactive, embedded IGV browser session displaying the normalized signal bigwig tracks of all samples used in the analysis
*multiple genomic loci can be visualized at once by separating them with spaces: chr8:128,740,266-128,761,729 chr10:3,818,235-3,836,806 chr19:16,433,128-16,438,505
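If you build such a locus string programmatically, it can help to validate it first. The following is a minimal sketch, assuming the browser accepts the chrom:start-end form shown above; `parse_loci` and `LOCUS_PATTERN` are hypothetical helper names and not part of the report page.

```python
import re

# Matches e.g. "chr8:128,740,266-128,761,729"; commas in coordinates are optional.
LOCUS_PATTERN = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def parse_loci(text):
    """Split a space-separated locus string into (chrom, start, end) tuples."""
    loci = []
    for token in text.split():
        match = LOCUS_PATTERN.match(token)
        if match is None:
            raise ValueError(f"not a valid locus: {token!r}")
        # Remove thousands separators before converting to integers.
        start = int(match.group("start").replace(",", ""))
        end = int(match.group("end").replace(",", ""))
        if start > end:
            raise ValueError(f"start is after end in locus: {token!r}")
        loci.append((match.group("chrom"), start, end))
    return loci


loci = parse_loci(
    "chr8:128,740,266-128,761,729 chr10:3,818,235-3,836,806 chr19:16,433,128-16,438,505"
)
for chrom, start, end in loci:
    # Print each locus together with its span in base pairs.
    print(chrom, start, end, end - start)
```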
FRiP table
-Fraction of Reads in Peaks (FRiP) quantifies the enrichment of biological signal (reads) in called peaks
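As a rough illustration, FRiP is the number of reads falling inside called peaks divided by the total number of reads. The sketch below assumes the denominator is the total number of reads in the de-duplicated BAM file; the function name `frip` and the example counts are made up for illustration and are not taken from the workflow.

```python
def frip(reads_in_peaks, total_reads):
    """Return the fraction of reads that overlap called peaks."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= reads_in_peaks <= total_reads:
        raise ValueError("reads_in_peaks must be between 0 and total_reads")
    return reads_in_peaks / total_reads


# Hypothetical counts, for illustration only.
score = frip(reads_in_peaks=1_200_000, total_reads=20_000_000)
print(f"FRiP = {score:.3f}")
```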
Enrichment
-Gene set enrichment analysis results of genes associated with peaks
(detailed descriptions of the information in this table below)
Motifs
-motifs found to be enriched in peaks and the transcription factors known to bind to the motif
*further details on output tables below
Peaks table
-peaks called in your data with their underlying statistics and annotation of nearby genes
Chrom, Start, End, Length - genomic coordinates and size of the peak
Pileup - raw count signal at the peak summit
Pval - peak significance, reported as -10 * log10(p-value)
Qval - FDR-corrected peak significance, reported as -10 * log10(q-value)
Fold - calculation of signal relative to background
Annotation - location of the peak in relation to the nearest (or overlapping) gene and its features (e.g., TSS, exons)
Accession - gene ID of nearest gene(s)
Symbol - gene symbol/name of nearest gene(s)
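To compare the Pval and Qval columns against a conventional threshold such as 0.05, the scores can be converted back to plain probabilities. This is a minimal sketch, assuming the -10 * log10 scale described above; `score_to_pvalue` and `pvalue_to_score` are hypothetical helper names.

```python
import math


def score_to_pvalue(score):
    """Convert a -10 * log10(p) score back to a p-value (or q-value)."""
    if score < 0:
        raise ValueError("score must be non-negative")
    return 10 ** (-score / 10)


def pvalue_to_score(pvalue):
    """Convert a p-value (or q-value) to the -10 * log10 scale."""
    if not 0 < pvalue <= 1:
        raise ValueError("pvalue must be in the interval (0, 1]")
    return -10 * math.log10(pvalue)


# Larger scores correspond to smaller, more significant p-values.
print(score_to_pvalue(50))
print(pvalue_to_score(0.001))
```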
Enrichment
-Gene set enrichment analysis results of genes associated with peaks
Database - source of gene sets (or genomic regions/elements) that peaks are tested against
GO Biological Process - (Gene Ontology) gene sets associated with specific biological processes
GO Molecular Function - (Gene Ontology) gene sets associated with specific molecular functions
GO Cellular Component - (Gene Ontology) gene sets associated with specific cellular components
WikiPathways - gene sets associated with various cellular pathways and functions
Chromosome - distinct regions of chromosomes identified based on banding patterns
EMBL-EBI-Pfam - protein family database identified by multiple sequence alignment (now in InterPro)
Gene3D - protein domain database, classification at the level of superfamily
Interactions - gene sets known to interact with a specific protein, available through the NIH
InterPro - database of protein families and their function
miRNA - miRNA targets database from miRDB
MSigDB - gene sets associated with a variety of molecular and cellular functions
PRINTS - protein family "fingerprints" database based on domain conservation (now part of InterPro)
Prosite - database of protein families, domains, and functional sites
SMART - protein domain database
TermID - a unique identifier of this gene set from its database
Term - name of the gene set
Enrichment logP - log-scaled significance of enrichment (p-value) for the gene set
Genes in Term - total number of genes in this gene set
Target Genes in Term - number of genes shared between your dataset and this gene set
Total Target Genes - total number of genes in your dataset
Total Genes in Database - total number of genes in the gene set database
Fraction of Targets in Term - genes shared between your dataset and this gene set as a percent of the total number of genes in your dataset
Targets as Fraction of Genes - genes shared between your dataset and this gene set as a percent of the number of genes in the gene set
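The two fraction columns follow directly from the count columns above. The sketch below shows the arithmetic only; `term_fractions` is a hypothetical helper and the counts are invented for illustration. It returns plain fractions, so multiply by 100 if you want percentages.

```python
def term_fractions(target_genes_in_term, total_target_genes, genes_in_term):
    """Return (Fraction of Targets in Term, Targets as Fraction of Genes)."""
    if total_target_genes <= 0 or genes_in_term <= 0:
        raise ValueError("gene totals must be positive")
    # Shared genes relative to all genes in your dataset.
    fraction_of_targets = target_genes_in_term / total_target_genes
    # Shared genes relative to all genes in the gene set.
    fraction_of_genes = target_genes_in_term / genes_in_term
    return fraction_of_targets, fraction_of_genes


# Hypothetical row: 25 shared genes, 500 genes in the dataset, 200 genes in the term.
fraction_of_targets, fraction_of_genes = term_fractions(25, 500, 200)
print(fraction_of_targets, fraction_of_genes)
```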
Output files
Under the "Info" tab, intermediate files produced by the workflow are available for download:
MACS2
macs2/<ANALYSIS_NAME>.<genome>.macs2_peaks.xls
-raw peak calls from MACS2
Annotation
annotate/macs2/<ANALYSIS_NAME>.<genome>.macs2_peaks.annotated.xls
-raw peaks annotated with nearby genes
annotate/macs2/<ANALYSIS_NAME>.<genome>.macs2_peaks.filtered.xls
-remaining peaks after removal of those in blacklisted regions
annotate/macs2/<ANALYSIS_NAME>.<genome>.macs2_peaks.homer_anno_raw.xls
-peaks and their annotations from the HOMER tool
ChIPseeker Docker
chip_seeker/chip_seeker/Rplots.pdf
-summary pie chart of peak annotations displaying their proximity to specific gene elements
chip_seeker/chip_seeker/covplot.png
-display of peaks along each chromosome
chip_seeker/chip_seeker/plotannopie.png
-summary pie chart of peak annotations displaying their proximity to specific gene elements
chip_seeker/chip_seeker/upsetplot.png
-upset plot of peak annotations displaying their proximity to specific gene elements
chip_seeker/chip_seeker/upsetvennpie.png
-plot combining both the pie chart and upset plot
Enrichment
enrichment/macs2/<ANALYSIS_NAME>.<genome>.macs2_peaks.homer_anno_raw_go_summary.xls
-gene ontology enrichment results from the HOMER tool
HOMER
homer_view/all_motif_data.json
-json file used to display motif results in the report page
homer_view/individual_motif_data.json
-json file used to display motif results in the report page
homer_view/macs2/<ANALYSIS_NAME>.<genome>.macs2_peaks.filtered.motifs.zip
-compressed folder of full HOMER tool results
BigwigSummary
multiBigwigSummary/readCounts.npz
-normalized counts in peaks for all samples used in the analysis in npz format
multiBigwigSummary/readCounts.tab
-normalized counts in peaks for all samples used in the analysis in tab-separated format
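The tab-separated counts file can be loaded with the Python standard library for further analysis. This is a sketch under an assumption: that the file has one header line of the form #'chr' 'start' 'end' followed by one quoted column per sample, which is the layout deepTools typically writes. Check the header of your own readCounts.tab before relying on it. `read_counts_table` and the sample names are hypothetical.

```python
import csv
import io


def read_counts_table(handle):
    """Parse a counts table into (sample_names, rows).

    Each row is (chrom, start, end, {sample_name: value}).
    """
    reader = csv.reader(handle, delimiter="\t")
    # Assumed header layout: #'chr'  'start'  'end'  'sampleA'  'sampleB' ...
    header = [name.lstrip("#").strip("'") for name in next(reader)]
    samples = header[3:]
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        values = [float(value) for value in fields[3:]]
        rows.append((chrom, start, end, dict(zip(samples, values))))
    return samples, rows


# Small in-memory example standing in for multiBigwigSummary/readCounts.tab.
example = io.StringIO(
    "#'chr'\t'start'\t'end'\t'sampleA'\t'sampleB'\n"
    "chr8\t128740266\t128761729\t12.5\t10.0\n"
    "chr10\t3818235\t3836806\t3.0\t4.5\n"
)
samples, rows = read_counts_table(example)
print(samples)
print(rows[0])
```

For a real file, pass an open file handle instead of the in-memory example, for instance `open("multiBigwigSummary/readCounts.tab", newline="")`.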
PlotCorrelation
plotCorrelation_heatmap/outFileCorMatrix.tab
-correlation matrix of samples used in analysis
plotCorrelation_heatmap/heatmap.plotCorrelation.png
-correlation heatmap
PlotCorrelation
plotCorrelation_scatterplot/scatterplot.plotCorrelation.png
-pairwise correlation plots
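To give a sense of what the correlation matrix contains, the sketch below computes a pairwise correlation matrix from per-peak counts. It assumes Pearson correlation; this page does not state which method the workflow uses, so treat the choice as an assumption. The function names and counts are hypothetical.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_x * var_y)


def correlation_matrix(counts):
    """Return {sample: {sample: r}} for every pair of samples."""
    names = list(counts)
    return {a: {b: pearson(counts[a], counts[b]) for b in names} for a in names}


# Hypothetical normalized counts in four peaks for three samples.
counts = {
    "sampleA": [1.0, 2.0, 3.0, 4.0],
    "sampleB": [2.0, 4.0, 6.0, 8.0],
    "sampleC": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(counts)
for name, row in matrix.items():
    print(name, [round(row[other], 3) for other in counts])
```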