Generate Taxonomic Heatmap Pair
Source:R/generate_taxa_heatmap_pair.R
This function performs hierarchical clustering on microbiome data based on grouping variables and strata variables in sample metadata and generates stacked heatmaps using the “pheatmap” package. It can also save the resulting heatmap as a PDF file.
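As a rough illustration of what hierarchical clustering does to the row order of a heatmap, the sketch below clusters a small made-up abundance matrix with base R. It uses Euclidean distance and complete linkage, which are the defaults of `pheatmap::pheatmap()`; whether this function overrides those defaults is not stated here, so treat the settings as an assumption. The matrix `abund` and its values are hypothetical.

```r
# Toy abundance matrix: 4 taxa (rows) x 4 samples (columns); values are invented.
abund <- matrix(
  c(10, 12, 11, 13,
    50, 48, 52, 49,
     9, 11, 10, 12,
    51, 47, 50, 53),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("taxon", 1:4), paste0("S", 1:4))
)

# Cluster rows using Euclidean distance and complete linkage
# (assumed settings, matching the pheatmap defaults).
row_clust <- hclust(dist(abund, method = "euclidean"), method = "complete")

# Reorder rows by the dendrogram so that similar taxa sit next to each other.
abund_ordered <- abund[row_clust$order, ]

# Cut the tree into two clusters to see which taxa group together.
row_groups <- cutree(row_clust, k = 2)
print(row_groups)
```

In this toy matrix the two low-abundance taxa and the two high-abundance taxa fall into separate clusters, which is the kind of block structure a clustered heatmap makes visible.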
Usage
generate_taxa_heatmap_pair(
data.obj,
subject.var,
time.var,
group.var = NULL,
strata.var = NULL,
feature.level = NULL,
feature.dat.type = c("count", "proportion", "other"),
features.plot = NULL,
top.k.plot = NULL,
top.k.func = NULL,
prev.filter = 0.01,
abund.filter = 0.01,
base.size = 10,
palette = NULL,
cluster.rows = NULL,
cluster.cols = NULL,
pdf = TRUE,
file.ann = NULL,
pdf.wid = 11,
pdf.hei = 8.5,
...
)
Arguments
- data.obj
A list object in a format specific to MicrobiomeStat, which can include components such as feature.tab (matrix), feature.ann (matrix), meta.dat (data.frame), tree, and feature.agg.list (list). The data.obj can be converted from other formats using several functions from the MicrobiomeStat package, including: 'mStat_convert_DGEList_to_data_obj', 'mStat_convert_DESeqDataSet_to_data_obj', 'mStat_convert_phyloseq_to_data_obj', 'mStat_convert_SummarizedExperiment_to_data_obj', 'mStat_import_qiime2_as_data_obj', 'mStat_import_mothur_as_data_obj', 'mStat_import_dada2_as_data_obj', and 'mStat_import_biom_as_data_obj'. Alternatively, users can construct their own data.obj. Note that not all components of data.obj may be required for all functions in the MicrobiomeStat package.
- subject.var
A character string specifying the subject variable in the metadata.
- time.var
A character string specifying the time variable in the metadata.
- group.var
A character string specifying the grouping variable in the metadata. Default is NULL.
- strata.var
A character string specifying the stratification variable in the metadata. Default is NULL.
- feature.level
The column name(s) in the feature annotation matrix (feature.ann) of data.obj to use for summarization and plotting. This can be a taxonomic level such as "Phylum" or "Genus", or any other annotation column such as "OTU_ID". Should be a character vector specifying one or more column names in feature.ann. When multiple columns are provided, data will be plotted separately for each column. Default is NULL, which falls back to all columns in feature.ann if `features.plot` is also NULL.
- feature.dat.type
The type of the feature data, which determines how the data is handled in downstream analyses. Should be one of:
  - "count": Raw count data, will be normalized by the function.
  - "proportion": Data that has already been normalized to proportions/percentages.
  - "other": Custom abundance data that has unknown scaling. No normalization applied.
The choice affects preprocessing steps as well as plot axis labels. Default is "count", which assumes raw OTU table input.
- features.plot
A character vector specifying which feature IDs (e.g. OTU IDs) to plot. Default is NULL, in which case features will be selected based on `top.k.plot` and `top.k.func`.
- top.k.plot
Integer specifying number of top k features to plot, when `features.plot` is NULL. Default is NULL, in which case all features passing filters will be plotted.
- top.k.func
Function to use for selecting top k features, when `features.plot` is NULL. Options include inbuilt functions like "mean", "sd", or a custom function. Default is NULL, in which case features will be selected by abundance.
- prev.filter
Numeric value specifying the minimum prevalence threshold for filtering taxa before analysis. Taxa with prevalence below this value will be removed. Prevalence is calculated as the proportion of samples where the taxon is present. The default in the function signature is 0.01; a value of 0 removes no taxa by prevalence filtering.
- abund.filter
Numeric value specifying the minimum abundance threshold for filtering taxa before analysis. Taxa with mean abundance below this value will be removed. Abundance refers to counts or proportions depending on `feature.dat.type`. The default in the function signature is 0.01; a value of 0 removes no taxa by abundance filtering.
- base.size
Base font size for the generated plots.
- palette
The color palette to be used for annotating the plots. This parameter can be specified in several ways:
  - As a character string representing a predefined palette name. Available predefined palettes include 'npg', 'aaas', 'nejm', 'lancet', 'jama', 'jco', and 'ucscgb'.
  - As a vector of color codes in a format accepted by ggplot2 (e.g., hexadecimal color codes).
The function uses `mStat_get_palette` to retrieve or generate the color palette. If `palette` is NULL or an unrecognized string, a default color palette will be used. The colors are applied to the specified grouping variables (`group.var`, `strata.var`) in the heatmap, ensuring each level of these variables is associated with a unique color. If both `group.var` and `strata.var` are specified, the function assigns colors to `group.var` from the start of the palette and to `strata.var` from the end, ensuring distinct color representations for each annotation layer.
- cluster.rows
A logical value indicating whether rows should be clustered. The signature default is NULL, in which case rows are clustered (effective default TRUE).
- cluster.cols
A logical value indicating whether columns should be clustered. The signature default is NULL, in which case columns are not clustered (effective default FALSE).
- pdf
A logical value. If TRUE (default), saves the plot as a PDF file. If FALSE, the plot will be displayed interactively without creating a PDF.
- file.ann
(Optional) A character string specifying a file annotation to include in the generated PDF file's name.
- pdf.wid
Width of the PDF plots.
- pdf.hei
Height of the PDF plots.
- ...
Additional parameters to be passed to the pheatmap::pheatmap() function from the “pheatmap” package.
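The descriptions of `data.obj`, `prev.filter`, `abund.filter`, `top.k.plot` and `top.k.func` can be made concrete with a small base R sketch. It builds a hypothetical `toy.obj` with the three components named above (feature.tab, feature.ann, meta.dat), then applies a prevalence filter, a mean-abundance filter on the proportion scale, and a top-k selection. This is only an illustration of the documented definitions; the package's internal implementation (including the order of normalization and filtering) may differ, and all values and thresholds here are invented.

```r
# Hypothetical toy data.obj; all values are invented for illustration.
feature.tab <- matrix(
  c(120,  80, 100,  90,
      0,   0,   5,   0,
     30,  40,   0,  35,
      1,   0,   0,   0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("OTU", 1:4), paste0("S", 1:4))
)
feature.ann <- matrix(
  c("Firmicutes", "Bacteroidetes", "Firmicutes", "Proteobacteria"),
  ncol = 1,
  dimnames = list(rownames(feature.tab), "Phylum")
)
meta.dat <- data.frame(
  subject = c("A", "A", "B", "B"),   # two subjects, each sampled twice
  time    = c("1", "2", "1", "2"),
  row.names = colnames(feature.tab)
)
toy.obj <- list(feature.tab = feature.tab,
                feature.ann = feature.ann,
                meta.dat    = meta.dat)

# Convert counts to per-sample proportions (one possible normalization).
prop <- sweep(toy.obj$feature.tab, 2, colSums(toy.obj$feature.tab), "/")

# Prevalence: proportion of samples in which a feature is present (> 0).
prevalence <- rowMeans(toy.obj$feature.tab > 0)
# Mean abundance across samples, on the proportion scale.
mean_abund <- rowMeans(prop)

# Illustrative thresholds, chosen for this toy example only.
prev.filter  <- 0.3
abund.filter <- 0.01
keep <- prevalence >= prev.filter & mean_abund >= abund.filter
filtered <- prop[keep, , drop = FALSE]

# Top-k selection: rank the remaining features by a summary function.
top.k.plot <- 1
top.k.func <- mean
scores <- apply(filtered, 1, top.k.func)
n_top <- min(top.k.plot, length(scores))
top_features <- names(sort(scores, decreasing = TRUE))[seq_len(n_top)]
print(top_features)
```

Swapping `top.k.func` for `sd` would rank features by variability across samples instead of average abundance.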
Examples
if (FALSE) { # \dontrun{
# Load required libraries and example data
library(MicrobiomeStat)
library(pheatmap)
data(peerj32.obj)
generate_taxa_heatmap_pair(
data.obj = peerj32.obj,
subject.var = "subject",
time.var = "time",
group.var = "group",
strata.var = NULL,
feature.level = c("Phylum","Family","Genus"),
feature.dat.type = "count",
features.plot = NULL,
top.k.plot = NULL,
top.k.func = NULL,
prev.filter = 0.01,
abund.filter = 0.001,
cluster.rows = NULL,
cluster.cols = NULL,
base.size = 12,
palette = NULL,
pdf = TRUE,
file.ann = NULL,
pdf.wid = 11,
pdf.hei = 8.5
)
# Restrict a second dataset to two visits, then plot with both group.var and strata.var
data(subset_T2D.obj)
subset_T2D.obj2 <- mStat_subset_data(subset_T2D.obj,
condition = "visit_number %in% c(' 1', ' 2')")
generate_taxa_heatmap_pair(
data.obj = subset_T2D.obj2,
subject.var = "subject_id",
time.var = "visit_number",
group.var = "subject_race",
strata.var = "subject_gender",
feature.level = c("Phylum","Family","Genus"),
feature.dat.type = "count",
features.plot = NULL,
top.k.plot = NULL,
top.k.func = NULL,
prev.filter = 0.01,
abund.filter = 0.001,
cluster.rows = NULL,
cluster.cols = NULL,
base.size = 12,
palette = NULL,
pdf = TRUE,
file.ann = NULL,
pdf.wid = 11,
pdf.hei = 8.5
)
data("subset_pairs.obj")
generate_taxa_heatmap_pair(
data.obj = subset_pairs.obj,
subject.var = "MouseID",
time.var = "Antibiotic",
group.var = "Sex",
strata.var = NULL,
feature.level = c("Phylum","Family","Genus"),
feature.dat.type = "count",
features.plot = NULL,
top.k.plot = NULL,
top.k.func = NULL,
prev.filter = 0.01,
abund.filter = 0.001,
cluster.rows = NULL,
cluster.cols = NULL,
base.size = 12,
palette = NULL,
pdf = TRUE,
file.ann = NULL,
pdf.wid = 11,
pdf.hei = 8.5
)
# Same call as the previous one, but with row clustering turned off
generate_taxa_heatmap_pair(
data.obj = subset_pairs.obj,
subject.var = "MouseID",
time.var = "Antibiotic",
group.var = "Sex",
strata.var = NULL,
feature.level = c("Phylum","Family","Genus"),
feature.dat.type = "count",
features.plot = NULL,
top.k.plot = NULL,
top.k.func = NULL,
prev.filter = 0.01,
abund.filter = 0.001,
cluster.rows = FALSE,
cluster.cols = NULL,
base.size = 12,
palette = NULL,
pdf = TRUE,
file.ann = NULL,
pdf.wid = 11,
pdf.hei = 8.5
)
} # }
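The color assignment described for `palette` (colors for `group.var` taken from the start of the palette, colors for `strata.var` from the end) can be sketched in base R as follows. The palette vector, the level names, and the exact order in which colors are taken from the tail are assumptions made for illustration; the function itself obtains its palette through `mStat_get_palette`, which is not called here.

```r
# A hypothetical palette of hex colours (any valid colour codes would do).
pal <- c("#E64B35", "#4DBBD5", "#00A087", "#3C5488", "#F39B7F", "#8491B4")

group_levels  <- c("Control", "Treatment")  # illustrative group.var levels
strata_levels <- c("Female", "Male")        # illustrative strata.var levels

# group.var takes colours from the start of the palette ...
group_cols <- setNames(head(pal, length(group_levels)), group_levels)
# ... and strata.var takes colours from the end.
strata_cols <- setNames(tail(pal, length(strata_levels)), strata_levels)

# A named list of named colour vectors is the shape that
# pheatmap::pheatmap() accepts for its annotation_colors argument.
annotation_colors <- list(group = group_cols, strata = strata_cols)
print(annotation_colors)

# The two annotation layers only stay distinct if the palette is long
# enough for both sets of levels.
stopifnot(length(intersect(group_cols, strata_cols)) == 0)
```

With a palette shorter than the combined number of levels, the head and tail would overlap and the two annotation layers would share colors.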