Generate a report for microbial ecology analysis of paired data — mStat_generate_report

This function generates a comprehensive report for microbial ecology analysis, including changes in alpha diversity, beta diversity, and taxonomic features between paired data. The function is specifically designed for analysis of paired data.

Usage

mStat_generate_report_pair(
  data.obj,
  group.var = NULL,
  strata.var = NULL,
  test.adj.vars = NULL,
  vis.adj.vars = NULL,
  subject.var,
  time.var,
  change.base,
  alpha.obj = NULL,
  alpha.name = c("shannon", "observed_species"),
  alpha.change.func = "log fold change",
  depth = NULL,
  dist.obj = NULL,
  dist.name = c("BC", "Jaccard"),
  pc.obj = NULL,
  prev.filter = 0.1,
  abund.filter = 1e-04,
  bar.area.feature.no = 30,
  heatmap.feature.no = 30,
  dotplot.feature.no = 30,
  vis.feature.level,
  test.feature.level,
  feature.dat.type = c("count", "proportion", "other"),
  feature.analysis.rarafy = TRUE,
  feature.change.func = "relative change",
  feature.mt.method = c("fdr"),
  feature.sig.level = 0.1,
  feature.box.axis.transform = c("sqrt"),
  base.size = 16,
  theme.choice = "bw",
  custom.theme = NULL,
  palette = NULL,
  pdf = TRUE,
  file.ann = NULL,
  pdf.wid = 11,
  pdf.hei = 8.5,
  output.file
)

Arguments

data.obj

A list object in a format specific to MicrobiomeStat, which can include components such as feature.tab (matrix), feature.ann (matrix), meta.dat (data.frame), tree, and feature.agg.list (list). The data.obj can be converted from other formats using several functions from the MicrobiomeStat package, including: 'mStat_convert_DGEList_to_data_obj', 'mStat_convert_DESeqDataSet_to_data_obj', 'mStat_convert_phyloseq_to_data_obj', 'mStat_convert_SummarizedExperiment_to_data_obj', 'mStat_import_qiime2_as_data_obj', 'mStat_import_mothur_as_data_obj', 'mStat_import_dada2_as_data_obj', and 'mStat_import_biom_as_data_obj'. Alternatively, users can construct their own data.obj. Note that not all components of data.obj may be required for all functions in the MicrobiomeStat package.

group.var

Variable name used for grouping samples.

strata.var

Character, column name in metadata containing stratification variable, e.g "sex". Optional.

test.adj.vars

Character vector, names of columns in the metadata containing covariates to be adjusted for in statistical tests and models, such as linear mixed effects models for longitudinal data analysis. This allows the user to account for the effects of additional variables in assessing the effects of primary variables of interest such as time and groups. Default is NULL, which indicates no covariates are adjusted for in statistical testing.

vis.adj.vars

Character vector, names of columns in the metadata containing covariates to visualize in plots, in addition to the primary variables of interest such as groups. For example, if sex is provided in vis.adj.vars, plots will display facets or colors for different sex groups. This allows visualization of effects across multiple covariates. Default is NULL, which indicates only the primary variables of interest will be visualized without additional covariates.

subject.var

Variable name used for subject identification.

time.var

Character, column name in metadata containing time variable, e.g. "week". Required.

change.base

The base level for calculating changes in paired data.

alpha.obj

An optional list containing pre-calculated alpha diversity indices. If NULL (default), alpha diversity indices will be calculated using mStat_calculate_alpha_diversity function from MicrobiomeStat package.

alpha.name

Names of alpha diversity indices to include in the analysis.

alpha.change.func

Function or method for calculating change in alpha diversity between two timepoints. This allows flexible options to quantify change:

- If a function is provided: The function will be applied to compare alpha diversity at timepoint t vs baseline t0. The function should take two arguments representing the alpha diversity values at t and t0. For instance, a custom function to calculate the percentage change might look like:


      percentage_change <- function(t, t0) {
        return ((t - t0) / t0) * 100
      }

You can then pass this function as the value for `alpha.change.func`.

- If a string is provided, the following options are supported: - 'log fold change': Calculates the log2 fold change of alpha diversity at t compared to t0. - 'absolute change': Calculates the absolute difference in alpha diversity at t compared to t0. - Any other value: A warning will be given that the provided method is not recognized, and the default method ('absolute change') will be used.

- Default behavior (if no recognized string or function is provided) is to compute the absolute difference between t and t0.

depth

An integer specifying the sequencing depth for the "Rarefy" and "Rarefy-TSS" methods. If NULL, no rarefaction is performed.

dist.obj

Distance matrix between samples, usually calculated using mStat_calculate_beta_diversity function. If NULL, beta diversity will be automatically computed from data.obj using mStat_calculate_beta_diversity.

dist.name

A character vector specifying which beta diversity indices to calculate. Supported indices are "BC" (Bray-Curtis), "Jaccard", "UniFrac" (unweighted UniFrac), "GUniFrac" (generalized UniFrac), "WUniFrac" (weighted UniFrac), and "JS" (Jensen-Shannon divergence). If a name is provided but the corresponding object does not exist within dist.obj, it will be computed internally. If the specific index is not supported, an error message will be returned. Default is c('BC', 'Jaccard').

pc.obj

A list containing the results of dimension reduction/Principal Component Analysis. This should be the output from functions like mStat_calculate_PC, containing the PC coordinates and other metadata. If NULL (default), dimension reduction will be automatically performed using metric multidimensional scaling (MDS) via mStat_calculate_PC. The pc.obj list structure should contain:

points: A matrix with samples as rows and PCs as columns containing the coordinates.
eig: Eigenvalues for each PC dimension.
vectors: Loadings vectors for features onto each PC.
Other metadata: like method, dist.name, etc.

See mStat_calculate_PC function for details on output format.

prev.filter

Numeric value specifying the minimum prevalence threshold for filtering taxa before analysis. Taxa with prevalence below this value will be removed. Prevalence is calculated as the proportion of samples where the taxon is present. Default 0 removes no taxa by prevalence filtering.

abund.filter

Numeric value specifying the minimum abundance threshold for filtering taxa before analysis. Taxa with mean abundance below this value will be removed. Abundance refers to counts or proportions depending on feature.dat.type. Default 0 removes no taxa by abundance filtering.

bar.area.feature.no

A numeric value indicating the number of top abundant features to retain in both barplot and areaplot. Features with average relative abundance ranked below this number will be grouped into 'Other'. Default 40.

heatmap.feature.no

A numeric value indicating the number of top abundant features to retain in the heatmap. Features with average relative abundance ranked below this number will be grouped into 'Other'. Default 40.

dotplot.feature.no

A numeric value indicating the number of top abundant features to retain in the dotplot. Features with average relative abundance ranked below this number will be grouped into 'Other'. Default 40.

vis.feature.level

The column name in the feature annotation matrix (feature.ann) of data.obj to use for visualization and plotting. This can be the taxonomic level like "Phylum", or any other annotation columns like "Genus" or "OTU_ID". Should be a character vector specifying one or more column names in feature.ann. Multiple columns can be provided, and data will be plotted separately for each column. Default is NULL, which defaults to all columns in feature.ann if `features.plot` is also NULL.

test.feature.level

The column name in the feature annotation matrix (feature.ann) of data.obj to use for testing or analytical purposes. This can be the taxonomic level like "Phylum", or any other annotation columns like "Genus" or "OTU_ID". Should be a character vector specifying one or more column names in feature.ann. Multiple columns can be provided, and data will be analyzed separately for each column. Default is NULL, which defaults to all columns in feature.ann if `features.plot` is also NULL.

feature.dat.type

The type of the feature data, which determines how the data is handled in downstream analyses. Should be one of: - "count": Raw count data, will be normalized by the function. - "proportion": Data that has already been normalized to proportions/percentages. - "other": Custom abundance data that has unknown scaling. No normalization applied. The choice affects preprocessing steps as well as plot axis labels. Default is "count", which assumes raw OTU table input.

feature.analysis.rarafy

Logical, indicating whether to rarefy the data at the feature-level for analysis. If TRUE, the feature data will be rarefied before analysis. Default is TRUE.

feature.change.func

A function or character string specifying how to calculate the change from baseline value. This allows flexible options: - If a function is provided, it will be applied to each row to calculate change. The function should take 2 arguments: value at timepoint t and value at baseline t0. - If a character string is provided, following options are supported: - 'relative change': (value_t - value_t0) / (value_t + value_t0) - 'absolute change': value_t - value_t0 - 'log fold change': log2(value_t + 1e-5) - log2(value_t0 + 1e-5) - Default is 'relative change'.

If none of the above options are matched, an error will be thrown indicating the acceptable options or prompting the user to provide a custom function.

feature.mt.method

Character, multiple testing method for features, "fdr" or "none", default is "fdr".

feature.sig.level

Numeric, significance level cutoff for highlighting features, default is 0.1.

feature.box.axis.transform

A string indicating the transformation to apply to the y-axis of the feature's boxplot visualization before plotting. Options are: - "identity": No transformation (default). - "sqrt": Square root transformation. - "log": Logarithmic transformation. Zeros are replaced with half of the minimum non-zero value for each taxon before log transformation.

base.size

Base font size for the generated plots.

theme.choice

Plot theme choice. Specifies the visual style of the plot. Can be one of the following pre-defined themes: - "prism": Utilizes the ggprism::theme_prism() function from the ggprism package, offering a polished and visually appealing style. - "classic": Applies theme_classic() from ggplot2, providing a clean and traditional look with minimal styling. - "gray": Uses theme_gray() from ggplot2, which offers a simple and modern look with a light gray background. - "bw": Employs theme_bw() from ggplot2, creating a classic black and white plot, ideal for formal publications and situations where color is best minimized. - "light": Implements theme_light() from ggplot2, featuring a light theme with subtle grey lines and axes, suitable for a fresh, modern look. - "dark": Uses theme_dark() from ggplot2, offering a dark background, ideal for presentations or situations where a high-contrast theme is desired. - "minimal": Applies theme_minimal() from ggplot2, providing a minimalist theme with the least amount of background annotations and colors. - "void": Employs theme_void() from ggplot2, creating a blank canvas with no axes, gridlines, or background, ideal for custom, creative plots. Each theme option adjusts various elements like background color, grid lines, and font styles to match the specified aesthetic. Default is "bw", offering a universally compatible black and white theme suitable for a wide range of applications.

custom.theme

A custom ggplot theme provided as a ggplot2 theme object. This allows users to override the default theme and provide their own theme for plotting. Custom themes are useful for creating publication-ready figures with specific formatting requirements.

To use a custom theme, create a theme object with ggplot2::theme(), including any desired customizations. Common customizations for publication-ready figures might include adjusting text size for readability, altering line sizes for clarity, and repositioning or formatting the legend. For example:

“`r my_theme <- ggplot2::theme( axis.title = ggplot2::element_text(size=14, face="bold"), # Bold axis titles with larger font axis.text = ggplot2::element_text(size=12), # Slightly larger axis text legend.position = "top", # Move legend to the top legend.background = ggplot2::element_rect(fill="lightgray"), # Light gray background for legend panel.background = ggplot2::element_rect(fill="white", colour="black"), # White panel background with black border panel.grid.major = ggplot2::element_line(colour = "grey90"), # Lighter color for major grid lines panel.grid.minor = ggplot2::element_blank(), # Remove minor grid lines plot.title = ggplot2::element_text(size=16, hjust=0.5) # Centered plot title with larger font ) “`

Then pass `my_theme` to `custom.theme`. If `custom.theme` is NULL (the default), the theme is determined by `theme.choice`. This flexibility allows for both easy theme selection for general use and detailed customization for specific presentation or publication needs.

palette

An optional parameter specifying the color palette to be used for the plot. It can be either a character string specifying the name of a predefined palette or a vector of color codes in a format accepted by ggplot2 (e.g., hexadecimal color codes). Available predefined palettes include 'npg', 'aaas', 'nejm', 'lancet', 'jama', 'jco', and 'ucscgb', inspired by various scientific publications and the `ggsci` package. If `palette` is not provided or an unrecognized palette name is given, a default color palette will be used. Ensure the number of colors in the palette is at least as large as the number of groups being plotted.

pdf

Logical indicating whether to save plots as PDF files (default: TRUE).

file.ann

Annotation text for the PDF file names.

pdf.wid

Width of the PDF plots.

pdf.hei

Height of the PDF plots.

output.file

Output file name for the report.

Value

A report file containing the microbial ecology analysis results for paired data.

Examples

if (FALSE) { # \dontrun{
data(peerj32.obj)
mStat_generate_report_pair(
  data.obj = peerj32.obj,
  dist.obj = NULL,
  alpha.obj = NULL,
  group.var = "group",
  test.adj.vars = NULL,
  vis.adj.vars = NULL,
  subject.var = "subject",
  time.var = "time",
  alpha.name = c("shannon", "observed_species"),
  dist.name = c("BC",'Jaccard'),
  change.base = "1",
  feature.change.func = "relative change",
  strata.var = NULL,
  vis.feature.level = c("Phylum","Family","Genus"),
  test.feature.level = c("Genus"),
  feature.dat.type = "count",
  feature.mt.method = "none",
  feature.sig.level = 0.1,
  theme.choice = "bw",
  base.size = 18,
  output.file = "/Users/apple/MicrobiomeStat/report.pdf"
)

data(peerj32.obj)
mStat_generate_report_pair(
  data.obj = peerj32.obj,
  group.var = "group",
  test.adj.vars = NULL,
  vis.adj.vars = NULL,
  strata.var = "sex",
  subject.var = "subject",
  time.var = "time",
  change.base = "1",
  alpha.obj = NULL,
  alpha.name = c("shannon", "observed_species"),
  dist.obj = NULL,
  dist.name = c("BC",'Jaccard'),
  feature.change.func = "relative change",
  vis.feature.level = c("Genus"),
  test.feature.level = c("Genus"),
  bar.area.feature.no = 30,
  heatmap.feature.no = 30,
  dotplot.feature.no = 20,
  feature.dat.type = "count",
  feature.mt.method = "none",
  feature.sig.level = 0.1,
  theme.choice = "bw",
  base.size = 18,
  output.file = "/Users/apple/MicrobiomeStat/report.pdf")

data(subset_pairs.obj)
mStat_generate_report_pair(
  data.obj = subset_pairs.obj,
  group.var = "Sex",
  test.adj.vars = NULL,
  vis.adj.vars = NULL,
  strata.var = NULL,
  subject.var = "MouseID",
  time.var = "Antibiotic",
  change.base = "Baseline",
  alpha.obj = NULL,
  alpha.name = c("shannon", "observed_species"),
  dist.obj = NULL,
  dist.name = c("BC",'Jaccard'),
  feature.change.func = "relative change",
  vis.feature.level = c("Genus", "Family"),
  test.feature.level = c("Genus", "Family"),
  bar.area.feature.no = 30,
  heatmap.feature.no = 30,
  dotplot.feature.no = 20,
  feature.dat.type = "count",
  feature.mt.method = "none",
  feature.sig.level = 0.1,
  theme.choice = "bw",
  base.size = 18,
  output.file = "/Users/apple/MicrobiomeStat/report.pdf"
)

} # }