Workflow and Report page overview
The analysis of each sample starts with FASTQ data, performs QC, extracts barcodes and UMIs, aligns reads to the genome using STAR, and generates a raw gene-by-cell count table. This table is then used to create a Seurat object in R, followed by filtering, normalization, dimension reduction using PCA, clustering at resolutions from 0.1 to 2.0, UMAP and t-SNE plot generation, and identification of cluster marker genes. Summary statistics and plots are displayed for a quick assessment of data quality. Interactive UMAP, t-SNE, and PCA plots allow for real-time data exploration at both the cluster and gene level, along with a summary table and heatmap of marker gene expression levels. After this analysis completes, our downstream single-cell RNA-seq integration pipeline can be run to combine and compare multiple single-cell RNA-seq samples.
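To make the filtering and normalization steps more concrete, here is a minimal Python sketch on a toy count table. It is not the pipeline's code (the pipeline performs these steps with Seurat in R). The `min_genes` threshold, the scale factor of 10,000, and the log(1 + x) transform are assumptions chosen for illustration, and the gene names and barcodes are made up.

```python
import math

def filter_cells(counts, min_genes=2):
    """Return the cells that express at least `min_genes` genes.

    `counts` maps gene -> {cell barcode -> raw count}.
    `min_genes` is a hypothetical threshold, not the pipeline's setting.
    """
    cells = set()
    for per_cell in counts.values():
        cells.update(per_cell)
    kept = []
    for cell in sorted(cells):
        n_detected = sum(1 for per_cell in counts.values() if per_cell.get(cell, 0) > 0)
        if n_detected >= min_genes:
            kept.append(cell)
    return kept

def log_normalize(counts, cells, scale=10_000):
    """Scale each cell to `scale` total counts, then apply log(1 + x)."""
    totals = {c: sum(per_cell.get(c, 0) for per_cell in counts.values()) for c in cells}
    normalized = {}
    for gene, per_cell in counts.items():
        normalized[gene] = {}
        for c in cells:
            # Guard against empty cells so the division is always defined.
            fraction = per_cell.get(c, 0) / totals[c] if totals[c] else 0.0
            normalized[gene][c] = math.log1p(fraction * scale)
    return normalized

# Toy raw gene-by-cell counts (illustrative values only).
raw = {
    "GeneA": {"AAAC": 5, "AAAG": 0, "AAAT": 1},
    "GeneB": {"AAAC": 3, "AAAG": 0, "AAAT": 0},
    "GeneC": {"AAAC": 2, "AAAG": 4, "AAAT": 1},
}
kept = filter_cells(raw, min_genes=2)   # "AAAG" expresses only one gene and is dropped
normalized = log_normalize(raw, kept)
```

Filtering before normalization means that low-quality cells do not take part in the per-cell scaling.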
Results
Under the "Report" tab, a series of interactive and downloadable plots allow for in-depth data exploration:
Quality Scores
-average quality scores at each base of the raw sequencing reads
Number of reads
-a Sankey plot summarizing read trimming, alignment, and filtering steps
Metrics
-a summary plot of how the reads have aligned relative to gene annotations
("mRNA" represents reads mapping to exons)
Summary
-summary statistics at the cell and gene/feature level, both before and after filtering
Scatterplot
-cell projections in 2D space resulting from the t-SNE, UMAP, and PCA algorithms, colored by cluster at different resolutions. This allows for in-depth exploration of the preferred number of clusters along with gene expression levels within those clusters
-also included is a summary violin + dot-plot showing the distribution of expression at the chosen gene and resolution
Cluster table
-table of marker genes for each cluster at the chosen resolution, identified using differential gene expression analysis comparing each cell cluster to all other cells
GO Table
-enrichment results of gene ontology terms for each cluster at the chosen resolution
Pathway Table
-enrichment results of biological pathways for each cluster at the chosen resolution
Heatmap
-heatmap of marker genes identified using differential gene expression analysis for each cluster
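The "each cell cluster versus all other cells" comparison behind the cluster table and heatmap can be illustrated with a small Python sketch. This is a simplified stand-in: it ranks genes by log2 fold change of mean expression only and performs no statistical test, whereas the article does not state which differential expression test the pipeline uses. The function name, the pseudocount, and the toy data are all assumptions made for illustration.

```python
import math

def one_vs_rest_log2fc(expression, clusters, target, pseudocount=1.0):
    """Rank genes by log2 fold change of mean expression: `target` cluster vs. all other cells.

    expression: gene -> list of per-cell expression values
    clusters:   list of cluster labels, one per cell, in the same cell order
    """
    inside = [i for i, c in enumerate(clusters) if c == target]
    outside = [i for i, c in enumerate(clusters) if c != target]
    if not inside or not outside:
        raise ValueError("need at least one cell inside and outside the target cluster")
    scores = {}
    for gene, values in expression.items():
        mean_in = sum(values[i] for i in inside) / len(inside)
        mean_out = sum(values[i] for i in outside) / len(outside)
        # The pseudocount avoids division by zero for genes absent from one group.
        scores[gene] = math.log2((mean_in + pseudocount) / (mean_out + pseudocount))
    # Highest fold change first: the strongest marker candidates for `target`.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data: four cells in two clusters (illustrative values only).
clusters = ["0", "0", "1", "1"]
expression = {
    "GeneA": [7, 7, 1, 1],
    "GeneB": [1, 1, 1, 1],
    "GeneC": [0, 0, 3, 3],
}
ranked = one_vs_rest_log2fc(expression, clusters, target="0")
```

Repeating the call once per cluster label gives one ranked gene list per cluster, which mirrors the shape of a per-cluster marker table.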
Output Files
Under the "Info" tab, intermediate files produced by the pipeline are available for viewing or download:
Alevin
extract/out1/alevin/whitelist.txt
-cell barcode whitelist used in the analysis
QC, Trim
qc/<SAMPLE_NAME>.trim.report.html
-a detailed report of raw sequencing quality, base content, estimates of PCR duplication level, insert size distribution, adapter content, and k-mer overrepresentation
Align (STAR)
star/<SAMPLE_NAME>.<genome>.bam
-compressed BAM file of the alignments
star/<SAMPLE_NAME>.<genome>.bam.bai
-index file for the BAM alignments file
Expr count
featurecounts/<SAMPLE_NAME>.<genome>.counts_gene.txt
-gene-level count matrix
featurecounts/<SAMPLE_NAME>.<genome>.counts_transcript.txt
-transcript-level count matrix
Matrix
matrix/<SAMPLE_NAME>.<genome>.counts.tsv
-gene-by-cell count matrix with both gene name/symbol and gene ID present in tab-separated format
matrix/<SAMPLE_NAME>.<genome>.counts_id.tsv
-gene-by-cell count matrix with gene ID in tab-separated format
Seurat
seurat/features.csv
-gene-by-cell count matrix in comma-separated format
seurat/jackstraw.overall.pvalues.csv
-table of the statistical significance of genes in each principal component
seurat/metadata.csv
-table of Seurat object metadata for each cell after filtering, normalization, and clustering
seurat/metadata-unfiltered.csv
-table of Seurat object metadata for each cell prior to any data transformations
seurat/pcdata.csv
-resulting principal component data in a comma-separated table
seurat/seurat_object.rds
-Seurat R data object containing raw and normalized data, and metadata
seurat/tsne.csv
-table of cell coordinates in t-SNE space
seurat/umap.csv
-table of cell coordinates in UMAP space
TopList
toplist/toplist/toplist/toplist.zip
-results of marker gene identification and heatmaps at resolutions from 0.1-2.0
GSEA GO & Pathway Analysis
gsea/gsea/<SAMPLE_NAME>.GO.zip
-gene set enrichment of GO terms for each cluster at each resolution in tab-separated (.tsv) files
gsea/gsea/<SAMPLE_NAME>.pathway.zip
-gene set enrichment of biological pathways for each cluster at each resolution in tab-separated (.tsv) files
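For working with the downloaded files outside the report, the following Python sketch reads a tab-separated gene-by-cell count matrix and totals the counts per cell. The column layout is an assumption, since the article does not describe it: one header row with cell barcodes, and `n_label_columns` leading columns that hold gene labels. The function names and example data are hypothetical.

```python
import csv
import io

def read_count_matrix(handle, n_label_columns=1):
    """Read a tab-separated gene-by-cell matrix.

    Assumed layout (not confirmed by the article): a header row with cell
    barcodes, then one row per gene whose first `n_label_columns` fields are
    gene labels. The first label is used as the row key.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    barcodes = header[n_label_columns:]
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        matrix[row[0]] = [float(v) for v in row[n_label_columns:]]
    return barcodes, matrix

def counts_per_cell(barcodes, matrix):
    """Sum counts over all genes for each cell barcode."""
    totals = [0.0] * len(barcodes)
    for values in matrix.values():
        for i, value in enumerate(values):
            totals[i] += value
    return dict(zip(barcodes, totals))

# In-memory stand-in for a counts file (illustrative values only).
example = "gene_id\tAAAC\tAAAG\nGENE_1\t5\t0\nGENE_2\t3\t4\n"
barcodes, matrix = read_count_matrix(io.StringIO(example))
totals = counts_per_cell(barcodes, matrix)
```

With a real download, the same functions could be given an open file handle instead of the `io.StringIO` object. If the file carries both a gene ID and a gene name column, `n_label_columns=2` would be the corresponding setting under the layout assumed here.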