Single cell RNA-seq

Modified on Tue, 2 Jan, 2024 at 1:05 PM

Workflow and Report page overview

The analysis of each sample starts with FASTQ data. The pipeline performs QC, extracts cell barcodes and UMIs, aligns reads to the genome using STAR, and generates a raw gene-by-cell count table. This table is then used to create a Seurat object in R. The object is filtered, normalized, reduced in dimensionality using PCA, and clustered at resolutions from 0.1 to 2.0. UMAP and t-SNE plots are then generated, and marker genes are identified for each cluster. Summary statistics and plots are displayed for a quick assessment of data quality. Interactive UMAP, t-SNE, and PCA plots allow real-time data exploration at both the cluster and gene level, along with a summary table and a heatmap of marker gene expression. After this analysis completes, you can run our downstream single-cell RNA-seq integration pipeline to combine and compare multiple single-cell RNA-seq samples.
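
As a rough illustration of the normalization step, the sketch below applies the formula used by Seurat's default LogNormalize method. Each count is divided by the cell's total count, multiplied by a scale factor (10,000 by default in Seurat), and transformed as ln(1 + x). This page does not state which normalization method or scale factor the pipeline uses, so both are assumptions here. The function name log_normalize is hypothetical.

```python
import math

def log_normalize(counts_by_cell, scale_factor=10_000):
    """Log-normalize a {cell: {gene: count}} mapping.

    Each value becomes ln(1 + count / total_counts_in_cell * scale_factor),
    mirroring the formula of Seurat's LogNormalize method (assumed here).
    """
    normalized = {}
    for cell, gene_counts in counts_by_cell.items():
        total = sum(gene_counts.values())
        if total == 0:
            # A cell with no counts cannot be scaled, so all its values stay zero.
            normalized[cell] = {gene: 0.0 for gene in gene_counts}
            continue
        normalized[cell] = {
            gene: math.log1p(count / total * scale_factor)
            for gene, count in gene_counts.items()
        }
    return normalized

# Toy data: two hypothetical cells and two hypothetical genes.
raw = {"cell_1": {"GeneA": 3, "GeneB": 1}, "cell_2": {"GeneA": 0, "GeneB": 5}}
norm = log_normalize(raw)
print(norm["cell_1"]["GeneA"])
```

Dividing by each cell's total corrects for differences in sequencing depth between cells before their expression values are compared.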

Results

Under the "Report" tab, a series of interactive and downloadable plots allows in-depth data exploration:

Quality Scores

-average quality score at each base position of the raw sequencing reads
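
A per-base average like this one is conventionally computed by decoding each FASTQ quality character as a Phred score (the character's ASCII code minus 33 for Phred+33 encoding) and then averaging across reads at each position. The minimal sketch below assumes Phred+33 encoding and plain four-line FASTQ records. The function name is hypothetical and is not part of the pipeline.

```python
from io import StringIO

def mean_quality_per_position(fastq_handle, offset=33):
    """Return the mean Phred score at each read position (Phred+33 assumed)."""
    totals, counts = [], []
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        fastq_handle.readline()  # sequence line (unused here)
        fastq_handle.readline()  # '+' separator line
        quals = fastq_handle.readline().rstrip("\n")
        for pos, char in enumerate(quals):
            if pos == len(totals):
                # First read long enough to reach this position.
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]

# Two toy reads: 'I' decodes to Q40 and '+' decodes to Q10.
example = StringIO("@r1\nACGT\n+\nIIII\n@r2\nACG\n+\n+++\n")
print(mean_quality_per_position(example))
```

Reads of unequal length, which are common after trimming, are handled by counting reads separately at each position.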

Number of reads

-a Sankey plot summarizing how many reads pass through the trimming, alignment, and filtering steps

Metrics

-a summary plot of how the reads have aligned relative to gene annotations
("mRNA" represents reads mapping to exons)

Summary

-summary statistics at the cell- and gene/feature- level both before and after filtering
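
Cell-level statistics of this kind are usually built from two per-cell totals: the number of counts (UMIs) and the number of detected genes. Seurat stores these as nCount_RNA and nFeature_RNA. The sketch below computes both from a toy gene-by-cell table and applies a minimum-genes filter. The threshold of 2 genes is purely illustrative, because this page does not state the pipeline's actual filtering criteria.

```python
def cell_stats(matrix, cells):
    """matrix: {gene: [count per cell]}; cells: list of cell barcodes."""
    stats = {}
    for i, cell in enumerate(cells):
        column = [row[i] for row in matrix.values()]
        stats[cell] = {
            "n_counts": sum(column),  # total UMIs in the cell
            "n_features": sum(1 for c in column if c > 0),  # genes detected
        }
    return stats

def filter_cells(stats, min_features):
    """Keep cells that detect at least `min_features` genes (hypothetical threshold)."""
    return [cell for cell, s in stats.items() if s["n_features"] >= min_features]

cells = ["AAAC", "AAAG", "AAAT"]
matrix = {"GeneA": [5, 0, 1], "GeneB": [2, 0, 0], "GeneC": [0, 1, 3]}
stats = cell_stats(matrix, cells)
kept = filter_cells(stats, min_features=2)
print(f"cells before: {len(cells)}, after: {len(kept)}")
```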

Scatterplot

-cell projections in 2D space produced by the t-SNE, UMAP, and PCA algorithms at different clustering resolutions. These plots help you explore the preferred number of clusters and the gene expression levels within those clusters

-also included are a summary violin plot and dot plot showing the distribution of expression for the chosen gene at the chosen resolution

Cluster table

-table of marker genes for each cluster at the chosen resolution, identified using differential gene expression analysis comparing each cell cluster to all other cells
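
The one-versus-rest comparison compares each gene's expression in a cluster's cells with its expression in all remaining cells. In Seurat, this comparison is commonly run with FindAllMarkers, whose default test is the Wilcoxon rank-sum test. This page does not say which test the pipeline uses. The sketch below computes only a log2 fold change of mean expression with a pseudocount of 1, which shows the structure of the comparison but is not a statistical test.

```python
import math

def one_vs_rest_log2fc(expr, clusters, target):
    """Log2 fold change of mean expression: cells in `target` vs all other cells.

    expr: {gene: [value per cell]}; clusters: cluster label per cell.
    A pseudocount of 1 avoids division by zero.
    """
    in_idx = [i for i, c in enumerate(clusters) if c == target]
    out_idx = [i for i, c in enumerate(clusters) if c != target]
    result = {}
    for gene, values in expr.items():
        mean_in = sum(values[i] for i in in_idx) / len(in_idx)
        mean_out = sum(values[i] for i in out_idx) / len(out_idx)
        result[gene] = math.log2((mean_in + 1) / (mean_out + 1))
    # Rank genes from most to least enriched in the target cluster.
    return dict(sorted(result.items(), key=lambda kv: kv[1], reverse=True))

expr = {"GeneA": [7, 7, 0, 0], "GeneB": [1, 1, 1, 1], "GeneC": [0, 0, 3, 3]}
clusters = ["0", "0", "1", "1"]
print(one_vs_rest_log2fc(expr, clusters, target="0"))
```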

GO Table

-enrichment results of gene ontology terms for each cluster at the chosen resolution

Pathway Table

-enrichment results of biological pathways for each cluster at the chosen resolution
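
This page does not specify the enrichment method. One common way to score a GO term or pathway for a cluster is an over-representation test. It asks how likely it is to see at least the observed overlap between the cluster's marker genes and the term's gene set by chance, using the hypergeometric distribution. The sketch below implements that test with the standard library. It is a swapped-in illustration and not necessarily the method behind these tables. The output files are labeled GSEA, which may instead refer to a rank-based method.

```python
from math import comb

def hypergeom_pvalue(k, n_markers, n_term, n_universe):
    """P(X >= k) for the overlap X between a marker list and a gene set.

    k: marker genes that belong to the term
    n_markers: number of marker genes for the cluster
    n_term: number of genes annotated to the term
    n_universe: total number of genes considered
    """
    total = comb(n_universe, n_markers)
    upper = min(n_markers, n_term)
    tail = sum(
        comb(n_term, x) * comb(n_universe - n_term, n_markers - x)
        for x in range(k, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 5 of 50 markers fall in a 100-gene term, 20,000 genes total.
print(f"{hypergeom_pvalue(5, 50, 100, 20_000):.2e}")
```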

Heatmap

-heatmap of marker genes identified using differential gene expression analysis for each cluster

Output Files

Under the "Info" tab, intermediate files produced by the pipeline are available for viewing or download:

Alevin

extract/out1/alevin/whitelist.txt

-cell barcode whitelist used in the analysis

QC, Trim

qc/<SAMPLE_NAME>.trim.report.html

-a detailed report of raw sequencing quality, base content, estimates of PCR duplication level, insert size distribution, adapter content, and kmer overrepresentation
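
The Alevin whitelist.txt file lists the cell barcodes retained as cells. The minimal sketch below checks which observed barcodes appear in it and assumes the file holds one barcode per line, which is an assumption about the layout. The function names and example barcodes are hypothetical.

```python
from io import StringIO

def load_whitelist(handle):
    """Read one barcode per line into a set, ignoring blank lines."""
    return {line.strip() for line in handle if line.strip()}

def split_by_whitelist(barcodes, whitelist):
    """Partition observed barcodes into whitelisted and non-whitelisted lists."""
    accepted = [b for b in barcodes if b in whitelist]
    rejected = [b for b in barcodes if b not in whitelist]
    return accepted, rejected

# In practice, replace StringIO with open("extract/out1/alevin/whitelist.txt").
whitelist = load_whitelist(StringIO("AAACCTGAGAAACCAT\nAAACCTGAGAAACCGC\n"))
accepted, rejected = split_by_whitelist(["AAACCTGAGAAACCAT", "TTTTTTTTTTTTTTTT"], whitelist)
print(accepted, rejected)
```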

Align (STAR)

star/<SAMPLE_NAME>.<genome>.bam

-compressed BAM file of the alignments

star/<SAMPLE_NAME>.<genome>.bam.bai

-index file of the BAM alignments file

Expr count

featurecounts/<SAMPLE_NAME>.<genome>.counts_gene.txt

-gene-level count matrix

featurecounts/<SAMPLE_NAME>.<genome>.counts_transcript.txt

-transcript-level count matrix
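
featureCounts normally writes a tab-separated table that starts with a "#" comment line recording the command. The comment is followed by a header with Geneid, Chr, Start, End, Strand, and Length columns, then one count column per input file. Assuming these files keep that standard layout, which this page does not confirm, the identifiers and counts can be read as shown below. The gene IDs in the toy input are made up.

```python
import csv
from io import StringIO

def read_featurecounts(handle):
    """Return {gene_id: count} from a featureCounts table with one count column."""
    rows = (line for line in handle if not line.startswith("#"))  # skip comment line
    reader = csv.reader(rows, delimiter="\t")
    header = next(reader)
    id_col = header.index("Geneid")
    counts = {}
    for row in reader:
        counts[row[id_col]] = int(row[-1])  # last column holds the counts
    return counts

example = StringIO(
    "# Program:featureCounts\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam\n"
    "GENE1\tchr1\t100\t500\t+\t401\t12\n"
    "GENE2\tchr1\t900\t1500\t-\t601\t0\n"
)
print(read_featurecounts(example))
```
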

Matrix

matrix/<SAMPLE_NAME>.<genome>.counts.tsv

-gene-by-cell count matrix, with both gene name/symbol and gene ID, in tab-separated format

matrix/<SAMPLE_NAME>.<genome>.counts_id.tsv

-gene-by-cell count matrix, with gene ID only, in tab-separated format
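
The sketch below loads a tab-separated gene-by-cell matrix such as counts_id.tsv into Python. It assumes the first row holds cell barcodes and the leading column or columns hold gene identifiers, which are assumptions about the header layout. For counts.tsv, n_label_cols would be 2 if gene name and gene ID occupy separate columns. The function name is hypothetical.

```python
import csv
from io import StringIO

def read_count_matrix(handle, n_label_cols=1):
    """Parse a tab-separated gene-by-cell matrix.

    Returns (cell_barcodes, {gene_key: [counts per cell]}). The first
    `n_label_cols` columns are treated as gene identifiers and joined with "|".
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    cells = header[n_label_cols:]
    matrix = {}
    for row in reader:
        key = "|".join(row[:n_label_cols])
        matrix[key] = [int(float(v)) for v in row[n_label_cols:]]  # accepts "3" or "3.0"
    return cells, matrix

# In practice: open("matrix/<SAMPLE_NAME>.<genome>.counts_id.tsv")
cells, matrix = read_count_matrix(StringIO("gene_id\tAAAC\tAAAG\nID_1\t3\t0\nID_2\t1\t4\n"))
totals = [sum(col) for col in zip(*matrix.values())]  # total counts per cell
print(cells, totals)
```
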

Seurat

seurat/features.csv

-gene-by-cell count matrix in comma-separated format

seurat/jackstraw.overall.pvalues.csv

-table of JackStraw p-values indicating the statistical significance of each principal component

seurat/metadata.csv

-table of Seurat object metadata for each cell after filtering, normalization, and clustering

seurat/metadata-unfiltered.csv

-table of Seurat object metadata for each cell prior to any data transformations

seurat/pcdata.csv

-resulting principal component data in a comma-separated table

seurat/seurat_object.rds

-Seurat object saved as an R data file, containing raw and normalized data and metadata

seurat/tsne.csv

-table of cell coordinates in t-SNE space

seurat/umap.csv

-table of cell coordinates in UMAP space
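
Several of the Seurat tables share cell barcodes, so they can be combined outside the report. The sketch below makes two comparisons. It compares metadata-unfiltered.csv with metadata.csv to find the cells removed by filtering, and it joins umap.csv to the cluster labels in metadata.csv to compute each cluster's mean position. The sketch assumes that each file has a header row and that the first column holds cell barcodes. The column names UMAP_1, UMAP_2, and seurat_clusters follow common Seurat naming but are assumptions about these particular files. The toy tables are made up.

```python
import csv
from collections import defaultdict
from io import StringIO

def read_table(handle):
    """Map each row's first-column value (assumed cell barcode) to the row as a dict."""
    reader = csv.DictReader(handle)
    key = reader.fieldnames[0]
    return {row[key]: row for row in reader}

def removed_cells(unfiltered, filtered):
    """Barcodes present before filtering but absent afterwards."""
    return set(unfiltered) - set(filtered)

def cluster_centroids(coords, metadata, x="UMAP_1", y="UMAP_2", label="seurat_clusters"):
    """Mean 2D position of each cluster, using cells present in both tables."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for cell, row in coords.items():
        if cell not in metadata:
            continue
        acc = sums[metadata[cell][label]]
        acc[0] += float(row[x])
        acc[1] += float(row[y])
        acc[2] += 1
    return {c: (sx / n, sy / n) for c, (sx, sy, n) in sums.items()}

# Toy stand-ins for seurat/metadata-unfiltered.csv, seurat/metadata.csv and seurat/umap.csv.
unfiltered = read_table(StringIO("cell,nCount_RNA\nAAAC,1500\nAAAG,90\nAAAT,1200\nAACA,900\n"))
meta = read_table(StringIO("cell,nCount_RNA,seurat_clusters\nAAAC,1500,0\nAAAT,1200,1\nAACA,900,0\n"))
umap = read_table(StringIO("cell,UMAP_1,UMAP_2\nAAAC,1.0,2.0\nAAAT,-1.0,0.0\nAACA,3.0,4.0\n"))
print(removed_cells(unfiltered, meta))
print(cluster_centroids(umap, meta))
```

The same join works for tsne.csv by passing that file's coordinate column names as x and y.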

TopList

toplist/toplist/toplist/toplist.zip

-results of marker gene identification and heatmaps at resolutions from 0.1-2.0

GSEA GO & Pathway Analysis

gsea/gsea/<SAMPLE_NAME>.GO.zip

-gene set enrichment of GO terms for each cluster at each resolution in tab-separated (.tsv) files

gsea/gsea/<SAMPLE_NAME>.pathway.zip

-gene set enrichment of biological pathways for each cluster at each resolution in tab-separated (.tsv) files
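
Each GSEA archive bundles several tab-separated files. The sketch below reads every .tsv member of such an archive using only the standard library. It makes no assumptions about the member file names or column headers inside the zip, because this page does not describe them. The function name and the in-memory example archive are hypothetical.

```python
import csv
import io
import zipfile

def read_tsv_members(zip_path):
    """Return {member_name: list of row dicts} for every .tsv file in the archive."""
    tables = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".tsv"):
                continue  # skip non-table members
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8")
                tables[name] = list(csv.DictReader(text, delimiter="\t"))
    return tables

# Build a small in-memory archive for illustration.
# In practice: read_tsv_members("gsea/gsea/<SAMPLE_NAME>.GO.zip")
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("cluster_0.tsv", "term\tpvalue\nTERM_A\t0.01\n")
    archive.writestr("README.txt", "not a table")
buffer.seek(0)
tables = read_tsv_members(buffer)
print(tables)
```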