Skip to contents

This function performs hierarchical clustering on microbiome data based on grouping variables and strata variables in sample metadata and generates stacked heatmaps using the “pheatmap” package. It can also save the resulting heatmap as a PDF file.

Usage

generate_taxa_change_heatmap_long(
  data.obj,
  subject.var,
  time.var,
  t0.level = NULL,
  ts.levels = NULL,
  group.var = NULL,
  strata.var = NULL,
  feature.level,
  feature.change.func = "relative change",
  feature.dat.type = c("count", "proportion", "other"),
  features.plot = NULL,
  top.k.plot = NULL,
  top.k.func = NULL,
  prev.filter = 0.01,
  abund.filter = 0.01,
  base.size = 10,
  palette = NULL,
  cluster.rows = NULL,
  cluster.cols = NULL,
  pdf = TRUE,
  file.ann = NULL,
  pdf.wid = 11,
  pdf.hei = 8.5,
  ...
)

Arguments

data.obj

A list object in a format specific to MicrobiomeStat, which can include components such as feature.tab (matrix), feature.ann (matrix), meta.dat (data.frame), tree, and feature.agg.list (list).

subject.var

A character string specifying the subject variable in the metadata.

time.var

A character string specifying the time variable in the metadata.

t0.level

Character or numeric, baseline time point for longitudinal analysis, e.g. "week_0" or 0. Required.

ts.levels

Character vector, names of follow-up time points, e.g. c("week_4", "week_8"). Required.

group.var

A character string specifying the grouping variable in the metadata. Default is NULL.

strata.var

(Optional) A character string specifying the stratification variable in the metadata. Default is NULL.

feature.level

A character string defining the taxonomic level to analyze ('Phylum', 'Family', or 'Genus').

feature.change.func

A function or character string specifying how to calculate the change from baseline value. This allows flexible options: - If a function is provided, it will be applied to each row to calculate change. The function should take 2 arguments: value at timepoint t and value at baseline t0. - If a character string is provided, following options are supported: - 'relative change': (value_t - value_t0) / (value_t + value_t0) - 'absolute change': value_t - value_t0 - 'log fold change': log2(value_t + 1e-5) - log2(value_t0 + 1e-5) - Default is 'relative change'.

If none of the above options are matched, an error will be thrown indicating the acceptable options or prompting the user to provide a custom function.

feature.dat.type

The type of the feature data, which determines how the data is handled in downstream analyses. Should be one of: - "count": Raw count data, will be normalized by the function. - "proportion": Data that has already been normalized to proportions/percentages. - "other": Custom abundance data that has unknown scaling. No normalization applied. The choice affects preprocessing steps as well as plot axis labels. Default is "count", which assumes raw OTU table input.

features.plot

A character vector specifying which feature IDs (e.g. OTU IDs) to plot. Default is NULL, in which case features will be selected based on `top.k.plot` and `top.k.func`.

top.k.plot

A numeric value specifying the number of top taxa to be plotted if features.plot is NULL. If NULL (default), all taxa will be plotted.

top.k.func

A function to compute the top k taxa if features.plot is NULL. If NULL (default), the mean function will be used.

prev.filter

Numeric value specifying the minimum prevalence threshold for filtering taxa before analysis. Taxa with prevalence below this value will be removed. Prevalence is calculated as the proportion of samples where the taxon is present. Default 0 removes no taxa by prevalence filtering.

abund.filter

Numeric value specifying the minimum abundance threshold for filtering taxa before analysis. Taxa with mean abundance below this value will be removed. Abundance refers to counts or proportions depending on feature.dat.type. Default 0 removes no taxa by abundance filtering.

base.size

Base font size for the generated plots.

palette

Specifies the color palette to be used for annotating groups and strata in the heatmap. The parameter can be provided in several ways: - As a character string denoting a predefined palette name. Available predefined palettes include 'npg', 'aaas', 'nejm', 'lancet', 'jama', 'jco', and 'ucscgb', sourced from the `mStat_get_palette` function. - As a vector of color codes in a format accepted by ggplot2 (e.g., hexadecimal color codes). If `palette` is NULL or an unrecognized string, a default color palette will be used. The function assigns colors from this palette to the unique levels of `group.var` and, if provided, `strata.var`. When both `group.var` and `strata.var` are present, `group.var` levels are colored using the beginning of the palette, while `strata.var` levels are colored using the reversed palette, ensuring a distinct color representation for each. If only `group.var` is provided, its levels are assigned colors from the palette sequentially. If neither `group.var` nor `strata.var` is provided, no annotation colors are applied.

cluster.rows

A logical variable indicating if rows should be clustered. Default is TRUE.

cluster.cols

A logical variable indicating if columns should be clustered. Default is FALSE.

pdf

If TRUE, save the plot as a PDF file (default: TRUE)

file.ann

(Optional) A character string specifying a file annotation to include in the generated PDF file's name.

pdf.wid

Width of the PDF plots.

pdf.hei

Height of the PDF plots.

...

Additional arguments passed to pheatmap.

Value

A list of ggplot heatmap objects, one for each taxonomic level.

An object of class pheatmap, the generated heatmap plot

Details

This parameter is used to compute the change columns from baseline (specified by t0.level) for each taxon. The change values are calculated for each timepoint and appended as new columns in the data frame before plotting heatmap. This allows flexibly customizing how change is quantified.

This function generates a separate heatmap for each taxonomic level specified, with rows clustered and layers arranged by groups over timepoints. It automatically rarefies raw count data using Rarefy-TSS normalization in MicrobiomeStat. Annotation columns are generated and ordered properly for visually stacking the layers. Colormaps are also generated for group and strata variables.

See also

pheatmap for heatmap, mStat_normalize_data for data normalization.

pheatmap

Examples

if (FALSE) { # \dontrun{
library(pheatmap)
data(ecam.obj)

generate_taxa_change_heatmap_long(
  data.obj = ecam.obj,
  subject.var = "studyid",
  time.var = "month",
  t0.level = unique(ecam.obj$meta.dat$month)[1],
  ts.levels = unique(ecam.obj$meta.dat$month)[-1],
  group.var = "antiexposedall",
  strata.var = "diet",
  feature.level = c("Family","Class"),
  feature.change.func = "log fold change",
  feature.dat.type = "proportion",
  features.plot = NULL,
  top.k.plot = 10,
  top.k.func = "sd",
  palette = NULL,
  prev.filter = 0.01,
  abund.filter = 0.01,
  pdf = TRUE,
  file.ann = NULL
)

generate_taxa_change_heatmap_long(
  data.obj = ecam.obj,
  subject.var = "studyid",
  time.var = "month",
  t0.level = unique(ecam.obj$meta.dat$month)[1],
  ts.levels = unique(ecam.obj$meta.dat$month)[-1],
  group.var = "antiexposedall",
  strata.var = "diet",
  feature.level = c("Family","Class"),
  feature.change.func = "log fold change",
  feature.dat.type = "proportion",
  features.plot = NULL,
  cluster.rows = FALSE,
  top.k.plot = 10,
  top.k.func = "sd",
  palette = NULL,
  prev.filter = 0.01,
  abund.filter = 0.01,
  pdf = TRUE,
  file.ann = NULL
)

data(subset_T2D.obj)
generate_taxa_change_heatmap_long(
  data.obj = subset_T2D.obj,
  subject.var = "subject_id",
  time.var = "visit_number",
  t0.level = unique(subset_T2D.obj$meta.dat$visit_number)[1],
  ts.levels = unique(subset_T2D.obj$meta.dat$visit_number)[-1],
  group.var = "subject_gender",
  strata.var = "subject_race",
  feature.level = c("Phylum"),
  feature.change.func = "log fold change",
  feature.dat.type = "count",
  features.plot = NULL,
  top.k.plot = 50,
  top.k.func = "mean",
  prev.filter = 0.01,
  abund.filter = 0.01,
  pdf = TRUE,
  file.ann = NULL,
  pdf.wid = 11,
  pdf.hei = 8.5
)

generate_taxa_change_heatmap_long(
  data.obj = subset_T2D.obj,
  subject.var = "subject_id",
  time.var = "visit_number",
  t0.level = unique(subset_T2D.obj$meta.dat$visit_number)[1],
  ts.levels = unique(subset_T2D.obj$meta.dat$visit_number)[-1],
  group.var = "subject_gender",
  strata.var = "subject_race",
  feature.level = c("Phylum"),
  feature.change.func = "log fold change",
  feature.dat.type = "count",
  features.plot = NULL,
  cluster.rows = FALSE,
  top.k.plot = 50,
  top.k.func = "mean",
  prev.filter = 0.01,
  abund.filter = 0.01,
  pdf = TRUE,
  file.ann = NULL,
  pdf.wid = 11,
  pdf.hei = 8.5
)
} # }