Generate Taxa Boxplots for Single Time Point — generate_taxa_boxplot_single

Creates boxplots showing taxa abundance distributions at a single time point. Supports grouping, stratification, and various transformations.

Usage

generate_taxa_boxplot_single(
  data.obj,
  time.var = NULL,
  t.level = NULL,
  group.var = NULL,
  strata.var = NULL,
  feature.level,
  feature.dat.type = c("count", "proportion", "other"),
  features.plot = NULL,
  top.k.plot = NULL,
  top.k.func = NULL,
  transform = c("sqrt", "identity", "log"),
  prev.filter = 0.01,
  abund.filter = 0.01,
  base.size = 16,
  theme.choice = "bw",
  custom.theme = NULL,
  palette = NULL,
  point.alpha = 0.6,
  pdf = TRUE,
  file.ann = NULL,
  pdf.wid = 11,
  pdf.hei = 8.5,
  ...
)

Arguments

data.obj

A MicrobiomeStat data object, which is a list containing at minimum the following components:

feature.tab: A matrix of feature abundances (taxa/genes as rows, samples as columns)
meta.dat: A data frame of sample metadata (samples as rows)

Optional components include:

feature.ann: A matrix/data frame of feature annotations (e.g., taxonomy)
tree: A phylogenetic tree object (class "phylo")
feature.agg.list: Pre-aggregated feature tables by taxonomy

Data objects can be created using converters like mStat_convert_phyloseq_to_data_obj or importers like mStat_import_qiime2_as_data_obj.
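As a rough illustration of the structure described above, the following sketch assembles a minimal data object by hand in base R. The taxa, sample names, metadata columns, and the name toy.obj are invented for illustration; real objects are normally produced by the converters and importers named above.

```r
# Toy feature table: taxa as rows, samples as columns (values are invented)
feature.tab <- matrix(
  c(10, 0, 5,
     3, 7, 0,
     0, 2, 8),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxon1", "taxon2", "taxon3"), c("s1", "s2", "s3"))
)

# Sample metadata: one row per sample, row names matching the table's columns
meta.dat <- data.frame(
  group = c("A", "B", "A"),
  time = c("1", "1", "2"),
  row.names = c("s1", "s2", "s3")
)

# Optional annotation: one row per feature, one column per taxonomic level
feature.ann <- matrix(
  c("Firmicutes", "Bacteroidetes", "Firmicutes"),
  ncol = 1,
  dimnames = list(rownames(feature.tab), "Phylum")
)

# Hypothetical hand-built object with the two required components plus feature.ann
toy.obj <- list(
  feature.tab = feature.tab,
  meta.dat = meta.dat,
  feature.ann = feature.ann
)

# Sanity check: samples must line up between the table and the metadata
stopifnot(identical(colnames(toy.obj$feature.tab), rownames(toy.obj$meta.dat)))
```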

time.var

Character string specifying the column name in meta.dat containing the time variable. Required for longitudinal and paired analyses; in this function it is used together with t.level to select the time point to plot. Supports character/factor labels (e.g., "baseline", "week4") and numeric values. Some trend/volatility methods require numeric or coercible-to-numeric time values.

t.level

Character string specifying the time level to subset data to. Default NULL uses all data.
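The effect of time.var and t.level can be pictured as a simple sample subset. The sketch below uses base R on invented data and is only an approximation of the idea; it is not taken from the package's internals.

```r
# Invented metadata and feature table for four samples
meta.dat <- data.frame(
  time = c("1", "2", "1", "2"),
  group = c("A", "A", "B", "B"),
  row.names = c("s1", "s2", "s3", "s4")
)
feature.tab <- matrix(
  1:8, nrow = 2,
  dimnames = list(c("taxon1", "taxon2"), rownames(meta.dat))
)

time.var <- "time"
t.level <- "1"

# Keep only the samples whose time value equals the requested level
keep <- rownames(meta.dat)[meta.dat[[time.var]] == t.level]
meta.sub <- meta.dat[keep, , drop = FALSE]
tab.sub <- feature.tab[, keep, drop = FALSE]
```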

group.var

Character string specifying the column name in meta.dat containing the grouping variable (e.g., treatment, condition, phenotype). Used for between-group comparisons.

strata.var

Character string specifying the column name in meta.dat for stratification. When provided, analyses and visualizations will be performed separately within each stratum (e.g., by site, batch, or sex).

feature.level

Character vector specifying the taxonomic or annotation level(s) for analysis. Should match column names in feature.ann, such as "Phylum", "Family", "Genus", etc. Use "original" to analyze at the original feature level without aggregation.

feature.dat.type

Character string specifying the data type of feature.tab. One of:

"count": Raw count data (will be normalized)
"proportion": Relative abundance data (should sum to 1 per sample)
"other": Pre-transformed data (no transformation applied)
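To make the difference between "count" and "proportion" concrete, the sketch below converts a toy count table to per-sample proportions. Total-sum scaling is assumed here purely for illustration; the normalization the package actually applies to count data may differ.

```r
# Invented counts: taxa as rows, samples as columns
counts <- matrix(
  c(10, 0, 10,
    30, 10, 0),
  nrow = 3,
  dimnames = list(c("taxon1", "taxon2", "taxon3"), c("s1", "s2"))
)

# Assumption: total-sum scaling, i.e. divide each column by its sample total
props <- sweep(counts, 2, colSums(counts), "/")

# Each sample (column) of a proportion table should sum to 1
col.totals <- colSums(props)
```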

features.plot

Character vector of specific feature IDs to plot. If NULL, features are selected based on top.k.plot and top.k.func.

top.k.plot

Integer specifying the number of top features to plot. Used only when features.plot is NULL.

top.k.func

Function used to rank features when selecting the top features (e.g., "mean", "sd").
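One way to picture how top.k.plot and top.k.func work together is to score each feature with a summary function and keep the highest-scoring ones. The helper select_top_k below is hypothetical and written in base R on invented data; the package's own selection logic may differ in detail.

```r
# Invented proportions: taxa as rows, samples as columns
props <- matrix(
  c(0.5, 0.3, 0.2,
    0.6, 0.1, 0.3,
    0.4, 0.4, 0.2),
  nrow = 3,
  dimnames = list(c("taxonA", "taxonB", "taxonC"), c("s1", "s2", "s3"))
)

# Hypothetical helper: score each feature (row) with `func`, return the top k names
select_top_k <- function(tab, k, func = "mean") {
  scores <- apply(tab, 1, match.fun(func))
  ranked <- names(sort(scores, decreasing = TRUE))
  ranked[seq_len(min(k, length(ranked)))]
}

top.by.mean <- select_top_k(props, k = 2, func = "mean")
top.by.sd <- select_top_k(props, k = 2, func = "sd")
```

With these toy values, ranking by "mean" favors the most abundant taxa, while ranking by "sd" favors the most variable ones, so the two selections can differ in order or membership.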

transform

Transformation to apply to the abundances before plotting: "identity", "sqrt", or "log". Default is "sqrt" (the first value listed in Usage).
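The three options can be sketched in base R as follows. The helper apply_transform is hypothetical, and two of its choices are assumptions made only for illustration: zeros are replaced by half the smallest non-zero value before taking logs, and the natural logarithm is used. The package may handle zeros and the log base differently.

```r
# Hypothetical helper illustrating the three transform options
apply_transform <- function(x, transform = c("sqrt", "identity", "log")) {
  # match.arg picks the first choice ("sqrt") when the argument is not supplied
  transform <- match.arg(transform)
  switch(
    transform,
    identity = x,
    sqrt = sqrt(x),
    log = {
      # Assumption: replace zeros by half the smallest non-zero value
      x[x == 0] <- min(x[x > 0]) / 2
      log(x)
    }
  )
}

x <- c(0, 0.04, 0.16, 0.8)

x.default <- apply_transform(x)          # square root, the default
x.identity <- apply_transform(x, "identity")
x.log <- apply_transform(x, "log")
```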

prev.filter

Numeric value between 0 and 1. Features with prevalence (proportion of non-zero samples) below this threshold will be excluded from analysis. Default is 0.01; set to 0 to disable filtering.

abund.filter

Numeric value. Features with mean abundance below this threshold will be excluded from analysis. Default is 0.01; set to 0 to disable filtering.
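The two filters can be illustrated with a small base R sketch on invented proportions. It follows the definitions given above (prevalence as the fraction of samples with a non-zero value, abundance as the per-feature mean), but it is not taken from the package's code, and the thresholds are chosen only to make the example informative.

```r
# Invented proportions: taxa as rows, samples as columns
props <- matrix(
  c(0.60, 0.50, 0.70, 0.600,
    0.39, 0.50, 0.30, 0.398,
    0.01, 0.00, 0.00, 0.000,
    0.00, 0.00, 0.00, 0.002),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("taxonA", "taxonB", "taxonC", "taxonD"),
    c("s1", "s2", "s3", "s4")
  )
)

prev.filter <- 0.2     # illustrative threshold
abund.filter <- 0.001  # illustrative threshold

# Prevalence: fraction of samples in which the feature is non-zero
prevalence <- rowMeans(props > 0)
# Mean abundance across samples
mean.abund <- rowMeans(props)

# Features below either threshold are excluded
keep <- prevalence >= prev.filter & mean.abund >= abund.filter
filtered <- props[keep, , drop = FALSE]
```

Here taxonC passes both thresholds (prevalence 0.25, mean 0.0025), while taxonD has the same prevalence but is dropped because its mean abundance (0.0005) falls below abund.filter.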

base.size

Numeric value specifying the base font size for plot text elements. Default is 16.

theme.choice

Character string specifying the ggplot2 theme to use. Options include:

"bw": Black and white theme (theme_bw)
"classic": Classic theme (theme_classic)
"gray": Gray theme (theme_gray)
"light": Light theme (theme_light)
"dark": Dark theme (theme_dark)
"minimal": Minimal theme (theme_minimal)
"void": Void theme (theme_void)
"prism": GraphPad Prism-like theme

A custom ggplot2 theme object can also be supplied via custom.theme.

custom.theme

A custom ggplot2 theme object to override theme.choice. Should be created using ggplot2::theme() or a complete theme function.

palette

Character vector of colors or a named palette for the plot. If NULL, uses default MicrobiomeStat color scheme. Can be:

A vector of color codes (e.g., c("#E41A1C", "#377EB8"))
A palette name recognized by the plotting function

point.alpha

Numeric value (0-1) for jitter point transparency. Default 0.6.

pdf

Logical. If TRUE, saves the plot(s) to PDF file(s) in the current working directory. Default is TRUE.

file.ann

Character string for additional annotation to append to output filenames. Useful for distinguishing multiple outputs.

pdf.wid

Numeric value specifying the width of PDF output in inches. Default is 11.

pdf.hei

Numeric value specifying the height of PDF output in inches. Default is 8.5.

...

Additional arguments passed to underlying functions.

Value

A list of ggplot objects, one for each taxonomic level given in feature.level.

Examples

if (FALSE) { # \dontrun{
# Load the ECAM example data set
data(ecam.obj)
# Three randomly chosen phyla at month 1, grouped by diet
generate_taxa_boxplot_single(
  data.obj = ecam.obj,
  time.var = "month",
  t.level = "1",
  group.var = "diet",
  strata.var = NULL,
  feature.level = c("Phylum"),
  features.plot = sample(unique(ecam.obj$feature.ann[,"Phylum"]),3),
  feature.dat.type = "proportion",
  transform = "log",
  prev.filter = 0,
  abund.filter = 0,
  base.size = 12,
  theme.choice = "classic",
  custom.theme = NULL,
  palette = NULL,
  point.alpha = 0.6,
  pdf = TRUE,
  file.ann = NULL,
  pdf.wid = 11,
  pdf.hei = 8.5
)
# Same setup, additionally stratified by antiexposedall
generate_taxa_boxplot_single(
  data.obj = ecam.obj,
  time.var = "month",
  t.level = "1",
  group.var = "diet",
  strata.var = "antiexposedall",
  feature.level = c("Phylum"),
  features.plot = sample(unique(ecam.obj$feature.ann[,"Phylum"]),3),
  feature.dat.type = "proportion",
  transform = "log",
  prev.filter = 0,
  abund.filter = 0,
  base.size = 12,
  theme.choice = "classic",
  custom.theme = NULL,
  palette = NULL,
  point.alpha = 0.6,
  pdf = TRUE,
  file.ann = NULL,
  pdf.wid = 11,
  pdf.hei = 8.5
)
# No grouping variable; three taxonomic levels requested in one call
generate_taxa_boxplot_single(
  data.obj = ecam.obj,
  time.var = "month",
  t.level = "1",
  group.var = NULL,
  strata.var = NULL,
  feature.level = c("Order", "Phylum", "Genus"),
  features.plot = NULL,
  feature.dat.type = "proportion",
  transform = "log",
  prev.filter = 0,
  abund.filter = 0,
  base.size = 12,
  theme.choice = "classic",
  custom.theme = NULL,
  palette = NULL,
  point.alpha = 0.6,
  pdf = TRUE,
  file.ann = NULL,
  pdf.wid = 11,
  pdf.hei = 8.5
)
# Load the peerj32 example data set (count data); families at time 1, grouped by group
data(peerj32.obj)
generate_taxa_boxplot_single(
  data.obj = peerj32.obj,
  time.var = "time",
  t.level = "1",
  group.var = "group",
  strata.var = NULL,
  feature.level = c("Family"),
  feature.dat.type = "count",
  features.plot = NULL,
  top.k.plot = NULL,
  top.k.func = NULL,
  transform = "log",
  prev.filter = 0.1,
  abund.filter = 0.0001,
  base.size = 12,
  theme.choice = "bw",
  custom.theme = NULL,
  palette = NULL,
  pdf = TRUE,
  file.ann = NULL,
  pdf.wid = 11,
  pdf.hei = 8.5
)
# Same setup, additionally stratified by sex (peerj32.obj is already loaded)
generate_taxa_boxplot_single(
  data.obj = peerj32.obj,
  time.var = "time",
  t.level = "1",
  group.var = "group",
  strata.var = "sex",
  feature.level = c("Family"),
  feature.dat.type = "count",
  features.plot = NULL,
  top.k.plot = NULL,
  top.k.func = NULL,
  transform = "log",
  prev.filter = 0.1,
  abund.filter = 0.0001,
  base.size = 12,
  theme.choice = "bw",
  custom.theme = NULL,
  palette = NULL,
  pdf = TRUE,
  file.ann = NULL,
  pdf.wid = 11,
  pdf.hei = 8.5
)
} # }