Lab Heatmap Workflow

Input Files

Expression matrix

Upload one table where the first column contains feature IDs and all remaining columns are samples.

For RNA-seq count workflows, use raw non-negative integer counts. Do not upload TPM, CPM, FPKM, or log-transformed values and then select count-based methods such as DESeq2 or edgeR, because those methods assume raw counts.
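You can check the integer-count requirement before any count-based method runs. Below is a minimal Python sketch. The app itself is built on R packages such as DESeq2 and edgeR, and this is not its validation code. The function name `looks_like_raw_counts` is hypothetical.

```python
import math


def looks_like_raw_counts(matrix):
    """Check that every value is finite, non-negative, and a whole number.

    `matrix` is a list of rows (features) of numeric sample values, with the
    feature-ID column already removed. Returns (ok, reason).
    """
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if not math.isfinite(value):
                return False, f"row {i}, column {j}: non-finite value {value!r}"
            if value < 0:
                return False, f"row {i}, column {j}: negative value {value!r}"
            if value != math.floor(value):
                return False, f"row {i}, column {j}: fractional value {value!r}"
    return True, "all values are non-negative whole numbers"


print(looks_like_raw_counts([[0, 12, 7], [3, 0, 150]]))  # integer counts
print(looks_like_raw_counts([[0.0, 12.4, 7.0]]))         # CPM/TPM or estimated counts
print(looks_like_raw_counts([[-1.2, 3.1, 0.5]]))         # log values can be negative
```

Passing this check is necessary but not sufficient. A TPM matrix that happens to be rounded to whole numbers would also pass, so the input data type choice still matters.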

Plasmidsaurus expression matrix

Plasmidsaurus files contain gene annotation columns plus paired sample columns ending in `_cpm` and `_count`. Choose the Plasmidsaurus upload format to let the app convert those columns into a standard feature-by-sample matrix.

Use CPM columns as normalized expression. Count columns may contain fractional estimated counts, so do not assume they are valid raw integer counts for DESeq2 unless your file truly contains integer counts.

When gene symbols are selected, blank symbols fall back to Ensembl gene IDs so unnamed rows do not break the upload.

Gene symbols in Plasmidsaurus files can be duplicated. Automatic duplicate handling averages duplicated symbols for CPM columns and sums duplicated symbols for count columns.
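The Plasmidsaurus conversion can be sketched in Python as follows. This illustrates the steps under stated assumptions and is not the app's code. The annotation column names `gene_id` and `gene_name` are hypothetical placeholders, and values are assumed to arrive as strings from a parsed table. Only the `_cpm` / `_count` suffixes, the blank-symbol fallback, and the average/sum rule come from the description above.

```python
from collections import defaultdict


def plasmidsaurus_to_matrix(rows, value_suffix="_cpm",
                            id_col="gene_id", symbol_col="gene_name",
                            use_symbols=True):
    """Convert Plasmidsaurus-style rows (list of dicts) to {feature: {sample: value}}.

    value_suffix: "_cpm" (duplicate symbols averaged) or "_count" (summed).
    id_col / symbol_col: HYPOTHETICAL annotation column names; check your file.
    """
    if not rows:
        return {}
    # Keep only the paired sample columns for the chosen value type.
    sample_cols = [c for c in rows[0] if c.endswith(value_suffix)]
    samples = [c[: -len(value_suffix)] for c in sample_cols]

    grouped = defaultdict(list)  # feature label -> list of per-sample value lists
    for row in rows:
        label = row[id_col]
        if use_symbols:
            symbol = (row.get(symbol_col) or "").strip()
            label = symbol or row[id_col]  # blank symbol falls back to the Ensembl ID
        grouped[label].append([float(row[c]) for c in sample_cols])

    # CPM is a normalized scale (average duplicates); counts are additive (sum them).
    if value_suffix == "_count":
        combine = sum
    else:
        combine = lambda xs: sum(xs) / len(xs)

    matrix = {}
    for label, value_rows in grouped.items():
        per_sample = zip(*value_rows)  # transpose: one tuple of values per sample
        matrix[label] = {s: combine(vals) for s, vals in zip(samples, per_sample)}
    return matrix
```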

Core-generated raw counts

Core raw-count files can contain sample names in the first row and Ensembl gene IDs in an unlabeled first column of each data row. Choose the Core-generated raw-count upload format to repair that missing first-column header automatically.
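Here is a minimal Python sketch of the header repair. It assumes a tab-delimited file whose header row either lists only sample names (one field fewer than each data row) or starts with an empty cell. The placeholder column name `feature_id` is hypothetical.

```python
import csv
import io


def repair_core_header(text, first_header="feature_id", delimiter="\t"):
    """Give the unlabeled first (Ensembl ID) column of a Core raw-count table a name.

    Returns (header, data_rows). The delimiter and placeholder name are assumptions.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    if data and len(header) == len(data[0]) - 1:
        # Header lists only sample names: prepend a name for the ID column.
        header = [first_header] + header
    elif header and header[0].strip() == "":
        # Header has an empty first cell: fill it in.
        header = [first_header] + header[1:]
    return header, data
```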

The app preserves Ensembl IDs internally for uniqueness, count validation, DESeq2, and export. For display, mouse or human gene symbols can be shown when the matching annotation package is available.

If no metadata file is uploaded, the app creates a one-column sample metadata template from the sample names.

Sample metadata

Upload one table where the first column contains sample IDs. These IDs must match the expression matrix column names exactly.
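Matching is exact, so differences in case or stray whitespace count as mismatches. The Python sketch below shows one way to run such a check. The function name is hypothetical and this is not the app's implementation.

```python
def check_sample_ids(matrix_columns, metadata_ids):
    """Compare expression-matrix sample columns with metadata sample IDs (exact match).

    metadata_ids must be a list so duplicated IDs can be counted.
    """
    matrix_set, meta_set = set(matrix_columns), set(metadata_ids)
    return {
        "missing_from_metadata": sorted(matrix_set - meta_set),
        "missing_from_matrix": sorted(meta_set - matrix_set),
        "duplicated_metadata_ids": sorted({i for i in metadata_ids if metadata_ids.count(i) > 1}),
    }
```

For example, a matrix column `S2` and a metadata ID `s2` are reported as two separate mismatches.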

Additional columns can describe condition, batch, donor, sex, timepoint, treatment, tissue, or other annotations.

Duplicate feature IDs

Automatic mode keeps strict validation for standard uploads and uses provider-aware defaults for Plasmidsaurus gene-symbol uploads.

Strict mode stops when feature IDs are duplicated. If duplicates are expected, choose whether to keep the first row, sum duplicate rows, or average duplicate rows.

For raw RNA-seq counts, summing duplicated gene IDs is usually more appropriate than averaging because counts are additive.
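The short Python sketch below illustrates the duplicate-handling choices. The mode names mirror the options described above. The function name and the `(feature_id, values)` data layout are hypothetical.

```python
def collapse_duplicates(rows, mode="strict"):
    """Collapse rows that share a feature ID.

    rows: list of (feature_id, [values per sample]) in file order.
    mode: "strict" (raise on duplicates), "first", "sum", or "mean".
    """
    groups = {}  # dicts keep insertion order, so output follows first appearance
    for feature_id, values in rows:
        groups.setdefault(feature_id, []).append(values)

    duplicated = [f for f, vs in groups.items() if len(vs) > 1]
    if mode == "strict" and duplicated:
        raise ValueError(f"duplicated feature IDs: {', '.join(duplicated)}")

    collapsed = []
    for feature_id, value_rows in groups.items():
        if mode in ("strict", "first"):
            merged = value_rows[0]
        elif mode == "sum":
            merged = [sum(col) for col in zip(*value_rows)]
        elif mode == "mean":
            merged = [sum(col) / len(col) for col in zip(*value_rows)]
        else:
            raise ValueError(f"unknown mode: {mode}")
        collapsed.append((feature_id, merged))
    return collapsed
```

With two raw-count rows for the same gene, `"sum"` keeps the total reads assigned to that gene. `"mean"` halves it, which is why summing usually fits counts better.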

Normalization

Input data type

Selecting raw integer counts enables DESeq2, edgeR CPM, and edgeR filtering. Selecting normalized expression or an already log-transformed matrix keeps these count-based methods disabled.

This protects users from accidentally treating TPM, CPM, or log expression as raw counts.

DESeq2 VST

DESeq2 variance-stabilizing transformation estimates size factors from raw counts and transforms the data for exploratory heatmaps, clustering, sample distances, and PCA.
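To illustrate the size-factor step only, DESeq2's default median-of-ratios estimator can be sketched in Python. For each sample, it takes the median over genes of the log ratio between that sample's count and the gene's geometric mean across samples, skipping genes with a zero in any sample. This re-implementation is for explanation. It does not reproduce the variance-stabilizing transformation itself (dispersion-trend fitting and the transform).

```python
import math
import statistics


def median_of_ratios_size_factors(counts):
    """counts: list of genes, each a list of raw counts per sample."""
    n_samples = len(counts[0])
    # Log geometric mean per gene; genes with any zero are skipped
    # (their log geometric mean would be -inf).
    log_geo_means = []
    for gene in counts:
        if all(c > 0 for c in gene):
            log_geo_means.append(sum(math.log(c) for c in gene) / n_samples)
        else:
            log_geo_means.append(None)

    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(gene[j]) - lg
                      for gene, lg in zip(counts, log_geo_means) if lg is not None]
        # Raises statistics.StatisticsError if every gene has a zero somewhere.
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors
```

In this scheme, a sample sequenced exactly twice as deeply as another gets a size factor twice as large. Dividing by size factors puts the two samples on a comparable scale before the transform.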

This app uses a blind VST for visualization and QC. It is not differential expression testing, and because the transformation is blind to sample groups, it does not use or correct for a study design formula.

For Plasmidsaurus estimated count columns, the app also offers an experimental VST option that rounds fractional estimates first. This requires an explicit checkbox because rounding changes the uploaded values.

edgeR TMM-normalized CPM

edgeR calculates TMM normalization factors that adjust for RNA composition effects, combines them with library sizes into effective library sizes, and reports counts per million on that normalized scale.

The log2 CPM option uses edgeR's log CPM calculation with a prior count to reduce instability from very small counts.
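The two CPM scales can be sketched in Python. The TMM normalization factors are taken as given; computing them is the job of edgeR's `calcNormFactors`. The log branch follows our understanding of how edgeR's `cpm(..., log = TRUE)` applies the prior count: the prior (default 2) is scaled by each sample's relative library size, and twice the scaled prior is added to the library size. Treat this as a sketch, not a replacement for edgeR.

```python
import math


def normalized_cpm(counts, norm_factors, log=False, prior_count=2.0):
    """counts: genes x samples of raw counts; norm_factors: per-sample TMM factors (given)."""
    lib_sizes = [sum(col) for col in zip(*counts)]
    # Effective library size = raw library size x normalization factor.
    eff_lib = [lib * f for lib, f in zip(lib_sizes, norm_factors)]
    if not log:
        return [[c / eff_lib[j] * 1e6 for j, c in enumerate(gene)] for gene in counts]

    # Scale the prior by relative library size, then add twice the scaled
    # prior to each library size before dividing.
    mean_lib = sum(eff_lib) / len(eff_lib)
    scaled_prior = [prior_count * lib / mean_lib for lib in eff_lib]
    adj_lib = [lib + 2 * p for lib, p in zip(eff_lib, scaled_prior)]
    return [[math.log2((c + scaled_prior[j]) / adj_lib[j] * 1e6)
             for j, c in enumerate(gene)] for gene in counts]
```

Scaling the prior by library size means that two samples with proportional counts get identical log CPM values, while zeros map to a finite value instead of negative infinity.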

log2(x + 1)

This simple transform compresses large values and can be useful for quick visualization. It is not a substitute for DESeq2 or edgeR normalization when raw RNA-seq counts are available.

Filtering and Gene Selection

edgeR filterByExpr

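edgeR filterByExpr removes genes whose counts are too low to be useful for downstream RNA-seq exploration. By default, a gene is kept only if it reaches a CPM cutoff (derived from a minimum count of 10 at the median library size) in roughly as many samples as the smallest group, and has a total count of at least 15 across all samples.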

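Choose a filtering group that reflects the biological comparison of interest, such as condition. If no group is chosen, filterByExpr assumes all samples belong to one group, which typically requires a gene to pass the cutoff in most or all samples and can remove genes expressed in only one condition.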
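Below is a simplified Python sketch of the filterByExpr rule. It assumes the function's default arguments (`min.count = 10`, `min.total.count = 15`, `large.n = 10`, `min.prop = 0.7`) and library sizes equal to raw column sums. It omits the design-matrix path and normalization factors, so use edgeR itself for real filtering.

```python
import statistics


def filter_by_expr_sketch(counts, group=None, min_count=10, min_total_count=15,
                          large_n=10, min_prop=0.7):
    """Return a keep/drop flag per gene (counts: genes x samples of raw counts)."""
    n_samples = len(counts[0])
    lib_sizes = [sum(col) for col in zip(*counts)]

    # Minimum number of samples in which a gene must pass the CPM cutoff.
    if group is None:
        min_size = n_samples  # no group: all samples treated as one group
    else:
        min_size = min(group.count(g) for g in set(group))
    if min_size > large_n:
        min_size = large_n + (min_size - large_n) * min_prop

    # CPM equivalent of min_count reads at the median library size.
    cpm_cutoff = min_count / statistics.median(lib_sizes) * 1e6
    tol = 1e-14
    keep = []
    for gene in counts:
        cpm = [c / lib * 1e6 for c, lib in zip(gene, lib_sizes)]
        n_above = sum(v >= cpm_cutoff for v in cpm)
        keep.append(n_above >= min_size - tol and sum(gene) >= min_total_count - tol)
    return keep
```

With the groups A, A, B, B, a gene expressed only in group A needs to pass in 2 samples and is kept. Without a group, it must pass in all 4 samples and is dropped.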

Manual CPM threshold

Manual filtering keeps genes that reach the selected CPM threshold in at least the selected number of samples.
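The manual rule is simple enough to state directly in code. In this Python sketch, `min_cpm` and `min_samples` stand in for the Minimum CPM and Minimum samples settings, and their default values are arbitrary. CPM is computed from raw column totals without TMM factors, which is an assumption about the app.

```python
def manual_cpm_filter(counts, min_cpm=1.0, min_samples=2):
    """Keep genes with CPM >= min_cpm in at least min_samples samples."""
    lib_sizes = [sum(col) for col in zip(*counts)]
    keep = []
    for gene in counts:
        n_pass = sum(c / lib * 1e6 >= min_cpm for c, lib in zip(gene, lib_sizes))
        keep.append(n_pass >= min_samples)
    return keep
```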

This is useful when users want simple, transparent filtering rules or when edgeR filtering is too strict for a small pilot dataset.

Specific genes and top changing genes

Users can pick genes, paste a gene list, or upload a gene-list file. If no gene list is provided, the app keeps the top genes with the largest across-sample change after any low-count filtering.

For non-negative expression values, ranking uses log2(x + 1) before measuring the sample-to-sample range. This is for display only; it is not differential expression analysis.
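Here is a Python sketch of this ranking. The default for `n` is arbitrary. Applying log2(x + 1) only when the whole matrix is non-negative is our reading of the rule.

```python
import math


def top_changing(matrix, n=50):
    """matrix: {gene: [values per sample]}. Return the n genes with the largest range."""
    all_non_negative = all(v >= 0 for values in matrix.values() for v in values)

    def spread(values):
        if all_non_negative:
            values = [math.log2(v + 1) for v in values]  # display-scale ranking only
        return max(values) - min(values)

    return sorted(matrix, key=lambda gene: spread(matrix[gene]), reverse=True)[:n]
```

Once values are logged, a gene going from 1 to 15 outranks a gene going from 1000 to 2000, even though the second has the larger raw range.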

Heatmaps and QC

Expression heatmap

Shows the selected normalized or uploaded values using the chosen color palette, clustering, labels, and metadata annotations.

Z-score heatmap

Centers and scales each feature across samples so patterns are easier to compare. Z-scores show relative within-gene patterns, not absolute expression abundance.
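Per-feature z-scoring can be sketched in Python. Two choices here are assumptions about the app, not documented behavior: using the sample standard deviation (n - 1 denominator, as R's `scale()` does) and setting constant rows to zero.

```python
import statistics


def row_zscores(matrix):
    """matrix: {feature: [values per sample]} with at least two samples."""
    z = {}
    for feature, values in matrix.items():
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
        # A constant row has no variation to scale, so it is set to 0 here.
        z[feature] = [(v - mean) / sd if sd > 0 else 0.0 for v in values]
    return z
```

Because each row is scaled separately, values of 1, 2, 3 and values of 10, 20, 30 give the same z-scores. This is why z-scores show relative patterns rather than abundance.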

Mean difference of displayed values

Subtracts the baseline group mean from each comparison group mean using the displayed matrix values.

This is descriptive only. It is not DESeq2 or edgeR differential expression, does not estimate dispersion, and does not provide p-values or adjusted p-values.
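For a single feature, the mean difference is just a subtraction of group means. In this Python sketch, the group labels and baseline name are hypothetical inputs taken from the metadata.

```python
import statistics


def mean_difference(values, groups, baseline):
    """values: one feature's displayed values per sample; groups: group label per sample."""
    base = statistics.mean(v for v, g in zip(values, groups) if g == baseline)
    out = {}
    for level in dict.fromkeys(groups):  # unique labels in first-seen order
        if level == baseline:
            continue
        level_mean = statistics.mean(v for v, g in zip(values, groups) if g == level)
        out[level] = level_mean - base
    return out
```

No variance, dispersion, or p-value is computed, which matches the descriptive-only scope of this view.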

Differential expression

DESeq2 is used for raw non-negative integer counts with biological groups, replicates, and an explicit design.

limma-voom is available for count-like inputs, including fractional estimated counts, and models the mean-variance trend with precision weights.

Plasmidsaurus fractional count columns should be treated as estimated count-like values. DESeq2 rounding is available only as an explicit opt-in and is labeled in the output.

GSEA

The GSEA tab runs preranked gene set enrichment analysis from the current modeled differential-expression result.

Choose the organism and gene identifier type that match the DE feature IDs. Hallmark, KEGG, Reactome, C5 ontology, GO Biological Process, and C7 immunologic-signature gene sets are loaded from MSigDB through msigdbr, and fgsea computes normalized enrichment scores and adjusted p-values.
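fgsea judges enrichment by comparing a running-sum enrichment score against permutations. The score itself can be sketched in Python as the classic weighted running sum, using weight 1, which we believe matches fgsea's default `gseaParam`. The sketch omits significance testing, normalization into NES, and multiple-testing adjustment. It also does not specify the ranking statistic, which is whatever the app derives from the DE result and is not stated in this document.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """ranked: list of (gene, statistic) sorted from most positive to most negative."""
    hits = [gene in gene_set for gene, _ in ranked]
    n_hit = sum(hits)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list but not cover all of it")

    # Hits step up in proportion to |statistic|^weight; misses step down equally.
    hit_norm = sum(abs(stat) ** weight for (_, stat), is_hit in zip(ranked, hits) if is_hit)
    running, best = 0.0, 0.0
    for (_, stat), is_hit in zip(ranked, hits):
        if is_hit:
            running += abs(stat) ** weight / hit_norm
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # enrichment score = largest signed deviation from zero
    return best
```

A set concentrated at the top of the ranking scores near +1, and a set concentrated at the bottom scores near -1.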

Volcano and sample contrast plots

The Differential Expression tab contains modeled volcano plots from DESeq2 or limma-voom.

The Sample Contrast tab is descriptive. It compares two selected samples and does not compute p-values or model replicates.

For large uploads, the sample contrast plot can show the strongest-changing genes while the table retains every gene. This keeps hover labels usable in browsers without WebGL.

Library size and detected features

These QC plots are shown only for raw-count input. Library size is the total count per sample, and detected features is the number of rows with a count greater than zero in that sample.
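Both QC quantities are simple per-sample summaries. Here is a Python sketch with a hypothetical function name:

```python
def library_qc(counts, sample_names):
    """counts: genes x samples of raw counts. Returns one summary dict per sample."""
    qc = []
    for name, column in zip(sample_names, zip(*counts)):
        qc.append({
            "sample": name,
            "library_size": sum(column),                          # total counts
            "detected_features": sum(1 for c in column if c > 0),  # rows with count > 0
        })
    return qc
```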

PCA and sample correlation

PCA and correlation help identify sample relationships, batch effects, outliers, or other strong structure.

For non-count expression uploads, these QC views use log2(x + 1)-transformed values of the top-changing features so that a few highly abundant features do not dominate the plots.
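Here is a Python sketch of a sample-correlation view built on log2(x + 1)-transformed top-changing features. Pearson correlation and the `top_n` default are assumptions; the document does not state which correlation method or feature count the app uses.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def sample_correlation(matrix, top_n=500):
    """matrix: genes x samples of non-negative values. Returns a samples x samples matrix."""
    logged = [[math.log2(v + 1) for v in gene] for gene in matrix]
    # Keep the features with the largest range on the log scale.
    logged.sort(key=lambda gene: max(gene) - min(gene), reverse=True)
    top = logged[:top_n]
    samples = list(zip(*top))  # one tuple of feature values per sample
    return [[pearson(a, b) for b in samples] for a in samples]
```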

These views are exploratory and can reflect batch, library quality, tissue composition, or other covariates.

Workflow export

The workflow JSON records the selected settings so the same heatmap choices can be reviewed or reproduced later.

Acknowledgment and manuscript reporting

The generated methods text includes a suggested acknowledgment that credits the Russell Jones Lab and Brandon Oswald, PhD, and states that the app was developed for use at Van Andel Institute.

If app-derived figures, results, or workflow settings are used in a manuscript, thesis, poster, or presentation, report the software in the Methods section and keep the credit language unless journal policy requires edited wording.

Data Check

Expression Preview

Metadata Preview

Upload format

Plasmidsaurus values

Feature names

Use CPM columns for default Plasmidsaurus heatmaps. If you choose estimated count columns, use the explicit rounded DESeq2 VST option for exploratory heatmaps or limma-voom for differential expression.

Core feature labels

Core raw-count files can have an unlabeled first Ensembl-ID column. The app preserves Ensembl IDs internally and can display gene symbols when annotation packages are available.

Expression matrix


Sample metadata


Duplicate feature IDs

Expression matrix: first column is feature IDs, remaining columns are samples. Metadata: first column is sample IDs that exactly match the expression matrix column names.

Selected Data Summary

Selected Genes

DESeq2 VST here is a blind transformation for exploratory visualization and QC, not design-aware correction or differential expression.

Low-count filtering

Minimum CPM

Minimum samples

Use selected sample order

Top changing genes when no gene list is selected

Pick genes

Paste genes

Gene list file


Editable Metadata

Click cells to edit. Use Add column for condition, batch, treatment, donor, or other sample annotations, then apply the edits.

Active Metadata Preview

Download Figures

Width in inches

Height in inches

Expression PDF, Z-score PDF, Mean difference PDF, Expression SVG, Z-score SVG, Mean difference SVG, DE volcano PDF, DE volcano SVG, Expression PNG

Download Data

Metadata TSV, Expression matrix CSV, Z-score matrix CSV, Mean difference matrix CSV, QC CSV, DE results CSV, GSEA results CSV

Download Workflow

Workflow JSON

Methods Text

Generated from the current workflow settings. Review and edit for journal style before submission. The export includes a suggested acknowledgment and manuscript reminder.

Methods and versions TXT

Close Lab Heatmap Workflow

Use this when you are finished. It stops the local Shiny server for this app.