DNA-Seq/WGS/WES Alignment and QC

Modified on Sat, 2 Sep, 2023 at 5:37 AM

Alignment workflow and Report page overview

For each sample, the alignment workflow starts from FASTQ data, performs QC and trimming, aligns reads to the genome using BWA, removes duplicate reads, and creates genomic coverage files. Summary statistics and plots allow a quick assessment of data quality, and an embedded genome browser provides a direct look at the data.

Results

Under the "Report" tab, a series of interactive and downloadable plots allows for in-depth data exploration:

Quality Scores

-average quality scores at each base position, for the raw sequencing reads and after trimming

Number of Reads

-a Sankey plot summarizing the trimming and alignment steps

Coverage

-a summary plot of read coverage throughout the genome

Genome browser

-an interactive, embedded IGV browser session displaying the alignments

*multiple genomic loci can be visualized at once by separating them with spaces:

chr8:128,740,266-128,761,729 chr10:3,818,235-3,836,806 chr19:16,433,128-16,438,505
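
The locus string above can also be built or checked programmatically before pasting it into the browser. The following is a minimal Python sketch, assuming each locus has the form chrom:start-end with optional thousands separators, as in the example. The function names parse_loci and format_loci are hypothetical helpers, not part of the platform or of IGV.

```python
import re

# One locus such as "chr8:128,740,266-128,761,729" (commas are optional).
LOCUS_RE = re.compile(r"^([^:\s]+):([\d,]+)-([\d,]+)$")


def parse_loci(text):
    """Split a space-separated loci string into (chrom, start, end) tuples."""
    loci = []
    for token in text.split():
        match = LOCUS_RE.match(token)
        if match is None:
            raise ValueError(f"not a valid locus: {token!r}")
        chrom, start, end = match.groups()
        # Drop the thousands separators before converting to integers.
        start_pos = int(start.replace(",", ""))
        end_pos = int(end.replace(",", ""))
        if start_pos > end_pos:
            raise ValueError(f"start is after end in locus: {token!r}")
        loci.append((chrom, start_pos, end_pos))
    return loci


def format_loci(loci):
    """Join (chrom, start, end) tuples back into a space-separated string."""
    return " ".join(f"{chrom}:{start:,}-{end:,}" for chrom, start, end in loci)
```

Calling parse_loci on the example string yields three tuples, one per locus, and format_loci turns such a list back into a string that can be entered in the browser's locus box.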

Output Files

Under the "Info" tab, intermediate files produced by the pipeline are available for viewing or download:

QC, Trim
trim/<SAMPLE_NAME>.trim.report.html

-a detailed report of raw sequencing quality, base content, estimates of PCR duplication level, insert size distribution, adapter content, and k-mer overrepresentation

BWA
bwa/<SAMPLE_NAME>.<genome>.bam.bai

-index file of the raw alignment BAM file

bwa/<SAMPLE_NAME>.<genome>.bam

-compressed BAM file of the raw alignments

Summary
summary/<SAMPLE_NAME>.<genome>.alignment-summary.png

-summary plot of read alignment

Remove duplicates
dedup/<SAMPLE_NAME>.<genome>.dedup.bam.bai

-index file of the de-duplicated alignment BAM file

dedup/duplicate_reads.stats

-a text file containing the statistics on PCR duplication level

dedup/<SAMPLE_NAME>.<genome>.dedup.bam

-compressed BAM file of the alignments after removing PCR duplicates ("de-duplicated")

Coverage
coverage/<SAMPLE_NAME>.<genome>.dedup.coverage-summary.xls

-summary statistics of genome-wide coverage
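
When working with downloaded results, it can help to generate the expected file names from the naming pattern listed above. The following is a minimal Python sketch, assuming the files are kept in the same folder layout after download. The function names, the dictionary keys, and the example values "sampleA" and "hg38" are hypothetical and only illustrate the pattern.

```python
from pathlib import Path


def expected_outputs(sample, genome):
    """Return the relative output paths for one sample, keyed by a short label."""
    prefix = f"{sample}.{genome}"
    return {
        "trim_report": f"trim/{sample}.trim.report.html",
        "raw_bam": f"bwa/{prefix}.bam",
        "raw_bam_index": f"bwa/{prefix}.bam.bai",
        "alignment_summary_plot": f"summary/{prefix}.alignment-summary.png",
        "dedup_bam": f"dedup/{prefix}.dedup.bam",
        "dedup_bam_index": f"dedup/{prefix}.dedup.bam.bai",
        # This file name does not include the sample or genome name.
        "duplicate_stats": "dedup/duplicate_reads.stats",
        "coverage_summary": f"coverage/{prefix}.dedup.coverage-summary.xls",
    }


def missing_outputs(root, sample, genome):
    """List the labels of expected files that are not present under root."""
    root = Path(root)
    return [
        label
        for label, rel_path in expected_outputs(sample, genome).items()
        if not (root / rel_path).is_file()
    ]
```

For example, expected_outputs("sampleA", "hg38")["dedup_bam"] gives dedup/sampleA.hg38.dedup.bam, and missing_outputs reports which of the eight files have not yet been downloaded into a given folder.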