Workflow and Report page overview

The analysis of each sample starts from FASTQ data. The pipeline performs QC, extracts cell barcodes and UMIs, aligns reads to the genome using STAR, and generates a raw gene-by-cell count table. This table is then used to create a Seurat object in R, followed by filtering, normalization, dimensionality reduction using PCA, clustering at resolutions from 0.1 to 2.0, UMAP and t-SNE plot generation, and identification of cluster marker genes. Summary statistics and plots are displayed for a quick assessment of data quality. Interactive UMAP, t-SNE, and PCA plots allow real-time data exploration at both the cluster and gene level, along with a summary table and heatmap of marker gene expression levels. After this analysis completes, our downstream single-cell RNA-seq integration pipeline can be run to combine and compare multiple single-cell RNA-seq samples.
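The normalization method is not specified here. Seurat's default `NormalizeData` method, LogNormalize, divides each cell's counts by that cell's total count, multiplies by a scale factor (10,000 by default), and applies a natural-log `log1p` transform. The minimal sketch below reproduces that formula in plain Python on a toy list-of-lists matrix, assuming the pipeline keeps this default; the function name `log_normalize` is hypothetical.

```python
import math

def log_normalize(counts, scale_factor=10_000):
    """Log-normalize a gene-by-cell count matrix (LogNormalize-style sketch).

    counts: list of rows, one per gene, each holding raw counts per cell.
    Each count is divided by its cell's total, multiplied by scale_factor,
    and transformed with the natural log of (1 + x).
    """
    n_cells = len(counts[0])
    # Total counts per cell (column sums).
    totals = [sum(row[j] for row in counts) for j in range(n_cells)]
    return [
        [math.log1p(row[j] / totals[j] * scale_factor) if totals[j] else 0.0
         for j in range(n_cells)]
        for row in counts
    ]
```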


Under the "Report" tab, a series of interactive and downloadable plots allows in-depth data exploration:

Quality Scores

-average quality score at each base position of the raw sequencing reads
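Per-base quality is derived from the quality lines of the FASTQ records. A minimal sketch, assuming Phred+33 encoding (standard for current Illumina data) and allowing reads of different lengths; `mean_quality_per_position` is a hypothetical helper, not part of the pipeline.

```python
def mean_quality_per_position(quality_strings, offset=33):
    """Average Phred quality at each base position across reads.

    quality_strings: FASTQ quality lines (the 4th line of each record).
    offset: ASCII offset of the encoding (33 for Phred+33).
    Positions are averaged only over the reads long enough to cover them.
    """
    sums, counts = [], []
    for qual in quality_strings:
        for pos, ch in enumerate(qual):
            score = ord(ch) - offset  # decode one ASCII character to a Phred score
            if pos == len(sums):      # first read reaching this position
                sums.append(0)
                counts.append(0)
            sums[pos] += score
            counts[pos] += 1
    return [s / n for s, n in zip(sums, counts)]
```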

Number of reads

-a Sankey plot summarizing the read trimming, alignment, and filtering steps
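A Sankey plot of this kind is drawn from (source, target, count) links between consecutive processing stages. The sketch below derives those links from the number of reads remaining after each step and routes lost reads to a separate node; the stage names, counts, and `failed <stage>` node labels are hypothetical.

```python
def sankey_links(stages):
    """Build (source, target, reads) links for a read-flow Sankey diagram.

    stages: ordered list of (stage_name, reads_remaining); the first entry
    is the input read count. Reads lost at a step go to a "failed <stage>" node.
    """
    links = []
    for (prev_name, prev_n), (name, n) in zip(stages, stages[1:]):
        links.append((prev_name, name, n))                    # reads passing the step
        if prev_n > n:
            links.append((prev_name, f"failed {name}", prev_n - n))  # reads dropped
    return links
```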


-a summary plot of where the reads aligned relative to gene annotations ("mRNA" represents reads mapping to exons)
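The read categories in this plot come from comparing alignment positions against the gene annotation. A simplified sketch, assuming each read is reduced to a single coordinate on one chromosome and using hypothetical "intronic" and "intergenic" labels for non-exonic reads (only "mRNA" is named in the plot description):

```python
from collections import Counter

def classify_read(pos, genes):
    """Classify one read position against a toy annotation.

    genes: list of dicts with 'start' and 'end' (half-open gene span) and
    'exons' (list of half-open (start, end) intervals).
    """
    in_gene = False
    for gene in genes:
        if gene["start"] <= pos < gene["end"]:
            in_gene = True
            if any(s <= pos < e for s, e in gene["exons"]):
                return "mRNA"  # exonic read
    return "intronic" if in_gene else "intergenic"

def summarize_reads(positions, genes):
    """Count reads per category, a toy version of the mapping summary."""
    return Counter(classify_read(p, genes) for p in positions)
```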


-summary statistics at the cell and gene/feature level, both before and after filtering
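Cell-level statistics usually include the total counts per cell and the number of genes detected per cell (Seurat's `nCount_RNA` and `nFeature_RNA` metadata columns). A minimal sketch of computing both from a gene-by-cell matrix and filtering on a hypothetical minimum-gene cutoff; the pipeline's actual filtering criteria are not stated here.

```python
def cell_stats(counts):
    """Per-cell (total counts, detected genes) from a gene-by-cell matrix.

    counts: list of rows, one per gene, each holding counts per cell.
    """
    n_cells = len(counts[0])
    stats = []
    for j in range(n_cells):
        column = [row[j] for row in counts]
        stats.append((sum(column), sum(1 for c in column if c > 0)))
    return stats

def filter_cells(counts, min_features=2):
    """Keep cells with at least min_features detected genes (hypothetical cutoff).

    Returns the filtered matrix and the indices of the kept cells.
    """
    keep = [j for j, (_, n_feat) in enumerate(cell_stats(counts)) if n_feat >= min_features]
    return [[row[j] for j in keep] for row in counts], keep
```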


-2D cell projections from the t-SNE, UMAP, and PCA algorithms at different clustering resolutions. These allow in-depth exploration of how many clusters best describe the data, along with gene expression levels within those clusters

-a summary violin and dot plot showing the distribution of expression of the chosen gene at the chosen resolution

Cluster table

-table of marker genes for each cluster at the chosen resolution, identified using differential gene expression analysis comparing each cell cluster to all other cells
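The one-vs-rest comparison can be sketched as follows. Seurat's `FindAllMarkers` uses a Wilcoxon rank-sum test by default; this sketch omits the statistical test and computes only a simple log2 fold change of mean expression between each cluster and all remaining cells, with a pseudocount of 1 (an assumption, not Seurat's exact fold-change formula). `one_vs_rest_log2fc` is a hypothetical name.

```python
import math

def one_vs_rest_log2fc(expr, labels, pseudocount=1.0):
    """Log2 fold change of mean expression, each cluster vs. all other cells.

    expr: dict mapping gene -> list of expression values, one per cell.
    labels: cluster label for each cell, in the same order.
    Returns {cluster: {gene: log2FC}}.
    """
    result = {}
    for cluster in sorted(set(labels)):
        in_idx = [i for i, c in enumerate(labels) if c == cluster]
        out_idx = [i for i, c in enumerate(labels) if c != cluster]
        result[cluster] = {}
        for gene, values in expr.items():
            mean_in = sum(values[i] for i in in_idx) / len(in_idx)
            mean_out = sum(values[i] for i in out_idx) / len(out_idx) if out_idx else 0.0
            result[cluster][gene] = math.log2((mean_in + pseudocount) / (mean_out + pseudocount))
    return result
```

Ranking each cluster's genes by this value (together with a p-value from a proper test) gives a marker table like the one shown in this tab.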

GO Table

-enrichment results of gene ontology terms for each cluster at the chosen resolution

Pathway Table

-enrichment results of biological pathways for each cluster at the chosen resolution
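The enrichment method behind these tables is not named here. One common way to score a GO term or pathway against a cluster's marker genes is a one-sided hypergeometric (over-representation) test; the sketch below shows that test purely as an illustration, not as the pipeline's method, and the helper names are hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """One-sided P(X >= k) for X ~ Hypergeometric(N genes, K in term, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def enrichment_pvalue(markers, term_genes, universe):
    """Over-representation p-value of a term among a cluster's marker genes."""
    universe = set(universe)
    markers = set(markers) & universe        # marker genes drawn from the universe
    term_genes = set(term_genes) & universe  # universe genes annotated to the term
    overlap = len(markers & term_genes)
    return hypergeom_sf(overlap, len(universe), len(term_genes), len(markers))
```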


-heatmap of marker genes identified using differential gene expression analysis for each cluster

Output Files

Under the "Info" tab, intermediate files produced by the pipeline are available for viewing or download:


-cell barcode whitelist used in the analysis

QC, Trim
qc/<SAMPLE_NAME>-detailed report of raw sequencing quality, base content, estimated PCR duplication levels, insert size distribution, adapter content, and k-mer overrepresentation

Align (STAR)
star/<SAMPLE_NAME>.<genome>.bam-compressed BAM file of the alignments

star/<SAMPLE_NAME>.<genome>.bam.bai-index file of the BAM alignments file

Expr count
featurecounts/<SAMPLE_NAME>.<genome>.counts_gene.txt-gene-level count matrix

featurecounts/<SAMPLE_NAME>.<genome>.counts_transcript.txt-transcript-level count matrix


-gene-by-cell count matrix with both gene name/symbol and gene ID present in tab-separated format


-gene-by-cell count matrix with gene ID in tab-separated format

-gene-by-cell count matrix in comma-separated format
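The column layout of these matrices is not documented here. The sketch below assumes a header row of cell barcodes preceded by one or more gene-identifier columns, and parses either the tab- or comma-separated variant with Python's `csv` module; `read_count_matrix` and its `n_id_cols` parameter are hypothetical.

```python
import csv
import io

def read_count_matrix(text, n_id_cols=1, delimiter="\t"):
    """Parse a gene-by-cell count matrix into (gene_ids, barcodes, counts).

    Assumed layout: the header's first n_id_cols fields label the gene
    columns and the remaining fields are cell barcodes; each data row holds
    n_id_cols gene identifiers followed by one integer count per cell.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    barcodes = rows[0][n_id_cols:]
    gene_ids, counts = [], []
    for row in rows[1:]:
        if not row:
            continue  # skip blank lines
        gene_ids.append(tuple(row[:n_id_cols]) if n_id_cols > 1 else row[0])
        counts.append([int(x) for x in row[n_id_cols:]])
    return gene_ids, barcodes, counts
```

For a real file, pass the result of `open(path).read()` as `text`.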


-table of the statistical significance of genes in each principal component


-table of Seurat object metadata for each cell after filtering, normalization, and clustering


-table of Seurat object metadata for each cell before any data transformations


-principal component analysis (PCA) results in a comma-separated table
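The PCA table holds each cell's coordinates on the principal components. As an illustration only (Seurat's `RunPCA` uses a truncated SVD on scaled data of variable genes, and the exact inputs here are not stated), the sketch below computes PCA scores and explained-variance ratios with NumPy's full SVD after centering each gene; `pca_scores` is a hypothetical helper.

```python
import numpy as np

def pca_scores(data, n_components=2):
    """Project cells onto their top principal components.

    data: array-like of shape (n_cells, n_genes), e.g. scaled expression.
    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    X = np.asarray(data, dtype=float)
    X = X - X.mean(axis=0)  # center each gene (column)
    # Full SVD: X = U @ diag(S) @ Vt; rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # same as X @ Vt[:n_components].T
    variance = S ** 2
    return scores, variance[:n_components] / variance.sum()
```

The sign of each component is arbitrary, so scores from different tools can differ by a factor of -1 per component.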


-Seurat R data object containing raw and normalized data and metadata


-table of cell coordinates in t-SNE space


-table of cell coordinates in UMAP space



-results of marker gene identification and heatmaps at resolutions from 0.1 to 2.0

GSEA GO & Pathway Analysis

-gene set enrichment of GO terms for each cluster at each resolution in tab-separated (.tsv) files


-gene set enrichment of biological pathways for each cluster at each resolution in tab-separated (.tsv) files