Input Files

Upload one table where the first column contains feature IDs and all remaining columns are samples.

For RNA-seq count workflows, use raw non-negative integer counts. Do not upload TPM, CPM, FPKM, or log values and then select count-based methods.
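A quick pre-upload check for this rule might look like the sketch below. It is illustrative only: the function name and the list-of-rows layout are assumptions, not part of the app.

```python
import math


def is_raw_count_matrix(rows):
    """Return True if every value is a finite, non-negative whole number.

    `rows` is a list of per-feature lists, one numeric value per sample
    (a hypothetical layout chosen for this example).
    """
    for row in rows:
        for value in row:
            # NaN, infinity, negatives, and fractional values all fail the check.
            if not math.isfinite(value) or value < 0 or value != int(value):
                return False
    return True
```

A matrix of TPM, CPM, or log values will almost always fail this check because it contains fractional or negative entries.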

Plasmidsaurus files contain gene annotation columns plus paired sample columns ending in `_cpm` and `_count`. Choose the Plasmidsaurus upload format to let the app convert those columns into a standard feature-by-sample matrix.
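The column split described above can be pictured with a small sketch. It assumes only the `_cpm` and `_count` suffixes named in this help text; the function name and the example header are hypothetical, and the app's actual conversion may differ.

```python
def split_paired_columns(header):
    """Group column names into CPM columns, count columns, and annotation columns.

    Returns two dicts mapping sample name -> original column name, plus the
    list of remaining (annotation) columns in their original order.
    """
    cpm, count, annotation = {}, {}, []
    for name in header:
        if name.endswith("_cpm"):
            cpm[name[: -len("_cpm")]] = name
        elif name.endswith("_count"):
            count[name[: -len("_count")]] = name
        else:
            annotation.append(name)
    return cpm, count, annotation


# Hypothetical header for illustration only.
header = ["gene_id", "gene_symbol", "S1_cpm", "S1_count", "S2_cpm", "S2_count"]
cpm_cols, count_cols, annotation_cols = split_paired_columns(header)
```

Each sample then contributes one column to the CPM matrix and one to the count matrix, with the annotation columns kept aside for feature labels.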

Use CPM columns as normalized expression. Count columns may contain fractional estimated counts, so do not assume they are valid raw integer counts for DESeq2 unless your file truly contains integer counts.

When gene symbols are selected, blank symbols fall back to Ensembl gene IDs so unnamed rows do not break the upload.

Gene symbols in Plasmidsaurus files can be duplicated. Automatic duplicate handling averages duplicated symbols for CPM columns and sums duplicated symbols for count columns.

Core raw-count files can contain sample names in the first row and Ensembl gene IDs in an unlabeled first column of each data row. Choose the Core-generated raw-count upload format to repair that missing first-column header automatically.

The app preserves Ensembl IDs internally for uniqueness, count validation, DESeq2, and export. For display, mouse or human gene symbols can be shown when the matching annotation package is available.

If no metadata file is uploaded, the app creates a one-column sample metadata template from the sample names.

For the metadata file, upload one table where the first column contains sample IDs. These IDs must match the expression matrix column names exactly.
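Exact matching means that case differences and stray spaces count as mismatches. A minimal sketch of such a check, with a hypothetical function name and example IDs:

```python
def check_sample_ids(matrix_columns, metadata_ids):
    """Compare sample IDs by exact string equality.

    Returns (missing_from_metadata, not_in_matrix), each in input order.
    """
    metadata_set = set(metadata_ids)
    matrix_set = set(matrix_columns)
    missing_from_metadata = [s for s in matrix_columns if s not in metadata_set]
    not_in_matrix = [s for s in metadata_ids if s not in matrix_set]
    return missing_from_metadata, not_in_matrix


# "s2" differs by case and "S3 " has a trailing space, so neither matches.
missing, extra = check_sample_ids(["S1", "S2", "S3"], ["S1", "s2", "S3 "])
```

Both lists should be empty before the metadata is used for grouping or annotation.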

Additional columns can describe condition, batch, donor, sex, timepoint, treatment, tissue, or other annotations.

Automatic mode keeps strict validation for standard uploads and uses provider-aware defaults for Plasmidsaurus gene-symbol uploads.

Strict mode stops when feature IDs are duplicated. If duplicates are expected, choose whether to keep the first row, sum duplicate rows, or average duplicate rows.

For raw RNA-seq counts, summing duplicated gene IDs is usually more appropriate than averaging because counts are additive.
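The duplicate-handling choices can be sketched as follows. The policy names and the function are assumptions made for this example; they are not the app's internal identifiers.

```python
def resolve_duplicates(ids, rows, policy="strict"):
    """Collapse rows that share a feature ID.

    policy: "strict" raises on any duplicate, "first" keeps the first row,
    "sum" adds duplicate rows, "mean" averages them (names are hypothetical).
    """
    if policy not in ("strict", "first", "sum", "mean"):
        raise ValueError(f"unknown policy: {policy}")
    grouped = {}
    for feature_id, row in zip(ids, rows):
        grouped.setdefault(feature_id, []).append(row)
    if policy == "strict":
        duplicated = [k for k, group in grouped.items() if len(group) > 1]
        if duplicated:
            raise ValueError(f"duplicated feature IDs: {duplicated}")
    result = {}
    for feature_id, group in grouped.items():
        if policy in ("strict", "first"):
            result[feature_id] = list(group[0])
        elif policy == "sum":
            # Counts are additive, so duplicates are summed per sample.
            result[feature_id] = [sum(col) for col in zip(*group)]
        else:
            result[feature_id] = [sum(col) / len(group) for col in zip(*group)]
    return result
```

With IDs `["A", "B", "A"]` and rows `[[1, 2], [5, 5], [3, 4]]`, the "sum" policy gives `[4, 6]` for A, while "mean" gives `[2.0, 3.0]`.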

Normalization

Selecting raw integer counts as the input type enables DESeq2, edgeR CPM, and edgeR filtering. Normalized expression and already log-transformed matrices keep count-based methods disabled.

This protects users from accidentally treating TPM, CPM, or log expression as raw counts.

DESeq2 variance-stabilizing transformation estimates size factors from raw counts and transforms the data for exploratory heatmaps, clustering, sample distances, and PCA.
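To give a feel for the size factor step, the sketch below implements the median-of-ratios idea that DESeq2's default size factor estimate is generally described as using. It is a simplified illustration, not DESeq2 itself, and it does not reproduce the variance-stabilizing transformation. The function name and input layout are assumptions.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix.

    `counts` is a list of per-gene lists, one count per sample.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        # Genes with a zero in any sample have no finite log geometric mean.
        if any(value <= 0 for value in row):
            continue
        log_geo_mean = sum(math.log(value) for value in row) / n_samples
        for j, value in enumerate(row):
            log_ratios[j].append(math.log(value) - log_geo_mean)
    # Raises statistics.StatisticsError if no gene is nonzero in all samples.
    return [math.exp(median(ratios)) for ratios in log_ratios]
```

If one sample has exactly twice the counts of another for every gene, its size factor comes out twice as large, so dividing by the size factors puts the two samples on the same scale.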

This app uses a blind VST for visualization and QC. It is not differential expression testing, and because the transformation is blind it does not use the study design formula.

For Plasmidsaurus estimated count columns, the app also offers an experimental VST option that rounds fractional estimates first. This requires an explicit checkbox because rounding changes the uploaded values.

edgeR calculates normalization factors that adjust for library size and composition effects, then reports counts per million on the normalized scale.

The log2 CPM option uses edgeR's log CPM calculation with a prior count to reduce instability from very small counts.
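The arithmetic behind CPM and log2 CPM can be sketched as below. This follows the commonly documented behavior of edgeR's `cpm()` (effective library size equals library size times normalization factor, and the prior count is scaled by library size), but it is an approximation written for illustration. The normalization factors are taken as an input rather than computed, and the function name is hypothetical.

```python
import math


def cpm(counts, lib_sizes, norm_factors=None, log=False, prior_count=2.0):
    """Counts per million for a genes x samples matrix (list of per-gene lists)."""
    n = len(lib_sizes)
    if norm_factors is None:
        norm_factors = [1.0] * n  # assumption: no composition adjustment supplied
    effective = [size * factor for size, factor in zip(lib_sizes, norm_factors)]
    if not log:
        return [[v / e * 1e6 for v, e in zip(row, effective)] for row in counts]
    # Scale the prior count so larger libraries receive a proportionally larger prior.
    mean_effective = sum(effective) / n
    prior = [prior_count * e / mean_effective for e in effective]
    return [
        [math.log2((v + p) / (e + 2 * p) * 1e6) for v, p, e in zip(row, prior, effective)]
        for row in counts
    ]
```

Because the prior count is added before taking the logarithm, a count of zero still produces a finite log2 CPM value.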

The simple log2(x + 1) transform compresses large values and can be useful for quick visualization. It is not a substitute for DESeq2 or edgeR normalization when raw RNA-seq counts are available.

Filtering and Gene Selection

edgeR filterByExpr removes genes with too little expression to be useful for downstream RNA-seq exploration.

Choose a filtering group that reflects the biological comparison of interest, such as condition. If no group is chosen, edgeR filters as if all samples belong to a single group.

Manual filtering keeps genes that reach the selected CPM threshold in at least the selected number of samples.

This is useful when users want simple, transparent filtering rules or when edgeR filtering is too strict for a small pilot dataset.
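The manual rule is simple enough to write out directly. In this sketch the function name and arguments are hypothetical, and "reach the threshold" is read as greater than or equal to it.

```python
def keep_by_cpm(cpm_rows, min_cpm, min_samples):
    """Return one True/False flag per gene.

    A gene is kept when at least `min_samples` samples have CPM >= `min_cpm`.
    """
    return [
        sum(1 for value in row if value >= min_cpm) >= min_samples
        for row in cpm_rows
    ]


# Three genes, four samples; keep genes with CPM >= 1 in at least 2 samples.
flags = keep_by_cpm([[0.2, 0.5, 1.0, 3.0], [0.0, 0.1, 0.0, 5.0], [9.0, 8.0, 7.0, 6.0]], 1.0, 2)
```

Here the first and third genes are kept, and the second is dropped because only one sample reaches the threshold.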

Users can pick genes, paste a gene list, or upload a gene-list file. If no gene list is provided, the app keeps the top genes with the largest across-sample change after any low-count filtering.

For non-negative expression values, ranking uses log2(x + 1) before measuring the sample-to-sample range. This is for display only; it is not differential expression analysis.
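As a rough sketch of this ranking, the example below scores each feature by the range of its log2(x + 1) values across samples and keeps the highest-scoring ones. The function name is hypothetical, and the app may break ties or handle edge cases differently.

```python
import math


def top_range_features(ids, rows, top_n):
    """Rank features by the across-sample range of log2(x + 1) and keep the top ones."""
    scored = []
    for feature_id, row in zip(ids, rows):
        logged = [math.log2(value + 1) for value in row]
        scored.append((max(logged) - min(logged), feature_id))
    # Python's sort is stable, so tied features stay in their input order.
    scored.sort(key=lambda item: -item[0])
    return [feature_id for _, feature_id in scored[:top_n]]


selected = top_range_features(["a", "b", "c"], [[0, 1], [7, 7], [0, 15]], 2)
```

Feature "c" spans log2(16) - log2(1) = 4 and ranks first, "a" spans 1, and the constant feature "b" spans 0 and is left out.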

Heatmaps and QC

The heatmap shows the selected normalized or uploaded values using the chosen color palette, clustering, labels, and metadata annotations.

Z-score scaling centers and scales each feature across samples so patterns are easier to compare. Z-scores show relative within-gene patterns, not absolute expression abundance.
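Per-feature centering and scaling can be sketched as below. Two details are assumptions: the sample standard deviation (n - 1 denominator) is used, and a feature with no variation is returned as all zeros. The app may treat either case differently.

```python
from statistics import mean, stdev


def row_zscores(row):
    """Center one feature on its mean and scale by its sample standard deviation."""
    center = mean(row)
    spread = stdev(row)  # requires at least two samples
    if spread == 0:
        # A constant feature carries no pattern, so report zeros instead of dividing by zero.
        return [0.0] * len(row)
    return [(value - center) / spread for value in row]
```

For example, the values 1, 2, 3 and the values 100, 200, 300 both become -1, 0, 1, which is why z-scores show pattern rather than abundance.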

The group comparison view subtracts the baseline group mean from each comparison group mean using the displayed matrix values.

This is descriptive only. It is not DESeq2 or edgeR differential expression, does not estimate dispersion, and does not provide p-values or adjusted p-values.
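The subtraction itself is plain arithmetic on group means, as in this sketch for a single feature. The function name and group labels are hypothetical.

```python
def group_mean_differences(row, groups, baseline):
    """Return comparison-group mean minus baseline-group mean for one feature.

    `row` holds the displayed values, `groups` the group label of each sample.
    """
    by_group = {}
    for value, group in zip(row, groups):
        by_group.setdefault(group, []).append(value)
    baseline_mean = sum(by_group[baseline]) / len(by_group[baseline])
    return {
        group: sum(values) / len(values) - baseline_mean
        for group, values in by_group.items()
        if group != baseline
    }


diffs = group_mean_differences(
    [1, 3, 5, 7, 10, 12],
    ["ctl", "ctl", "trtA", "trtA", "trtB", "trtB"],
    baseline="ctl",
)
```

The result carries no measure of variability, which is why it cannot stand in for a modeled differential expression result.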

DESeq2 is used for raw non-negative integer counts with biological groups, replicates, and an explicit design.

limma-voom is available for count-like inputs, including fractional estimated counts, and models the mean-variance trend with precision weights.

Plasmidsaurus fractional count columns should be treated as estimated count-like values. DESeq2 rounding is available only as an explicit opt-in and is labeled in the output.

The GSEA tab runs preranked gene set enrichment analysis from the current modeled differential-expression result.

Choose the organism and gene identifier type that match the DE feature IDs. Hallmark, KEGG, Reactome, C5 ontology, GO Biological Process, and C7 immunologic-signature gene sets are loaded from MSigDB through msigdbr, and fgsea computes normalized enrichment scores and adjusted p-values.

The Differential Expression tab contains modeled volcano plots from DESeq2 or limma-voom.

The Sample Contrast tab is descriptive. It compares two selected samples and does not compute p-values or model replicates.

For large uploads, the sample contrast plot can show the strongest-changing genes while the table retains every gene. This keeps hover labels usable in browsers without WebGL.

The library size and detected-feature QC plots are shown only for raw-count input. Library size is the total count per sample, and detected features are rows with counts greater than zero.
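Both quantities are simple column summaries of the count matrix. A minimal sketch, with a hypothetical function name:

```python
def library_qc(counts):
    """Return (library_sizes, detected_features) for a genes x samples matrix.

    `counts` is a list of per-gene lists, one count per sample.
    """
    columns = list(zip(*counts))  # one tuple of counts per sample
    library_sizes = [sum(column) for column in columns]
    detected = [sum(1 for value in column if value > 0) for column in columns]
    return library_sizes, detected


sizes, detected = library_qc([[0, 5], [3, 0], [2, 1]])
```

In this example the two samples have library sizes 5 and 6, and each has two detected features.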

PCA and correlation help identify sample relationships, batch effects, outliers, or other strong structure.

For non-count expression uploads, these QC views use log2(x + 1)-scaled top-changing features so raw abundance does not dominate the plots.

These views are exploratory and can reflect batch, library quality, tissue composition, or other covariates.

The workflow JSON records the selected settings so the same heatmap choices can be reviewed or reproduced later.

The generated methods text includes a suggested acknowledgment that credits the Russell Jones Lab and Brandon Oswald, PhD, and states that the app was developed for use at Van Andel Institute.

If app-derived figures, results, or workflow settings are used in a manuscript, thesis, poster, or presentation, report the software in the Methods section and keep the credit language unless journal policy requires edited wording.

Data Check

Expression Preview
Metadata Preview
Selected Data Summary

Selected Genes
Editable Metadata
Click cells to edit. Use Add column for condition, batch, treatment, donor, or other sample annotations, then apply the edits.
Active Metadata Preview
Download Current Heatmap
Heatmap PDF with methods
Heatmap SVG

Download Workflow
Methods Text
Generated from the current workflow settings. Review and edit for journal style before submission. The export includes a suggested acknowledgment and manuscript reminder.
Methods and versions TXT
Close Lab Heatmap Workflow

Use this when you are finished. It stops the local Shiny server for this app.