Visualize Beta Diversity Changes Over Time
Source: R/generate_beta_pc_change_boxplot_pair.R, generate_beta_pc_change_boxplot_pair.Rd
This function generates a series of boxplots visualizing changes in beta diversity PCoA/PCA coordinates over time. It is designed for longitudinal microbiome composition data collected at multiple time points. The primary output is a set of boxplots showing the distribution of coordinate changes between two user-specified time points; these changes can optionally be stratified by groups. Violin plots are overlaid to show the density of each distribution.
Usage
generate_beta_pc_change_boxplot_pair(
data.obj = NULL,
dist.obj = NULL,
pc.obj = NULL,
pc.ind = c(1, 2),
subject.var,
time.var,
group.var = NULL,
strata.var = NULL,
adj.vars = NULL,
change.base = NULL,
change.func = "absolute change",
dist.name = c("BC", "Jaccard"),
base.size = 16,
theme.choice = "bw",
custom.theme = NULL,
palette = NULL,
pdf = TRUE,
file.ann = NULL,
pdf.wid = 11,
pdf.hei = 8.5,
...
)
Arguments
- data.obj
A list object in a format specific to MicrobiomeStat, which can include components such as feature.tab (matrix), feature.ann (matrix), meta.dat (data.frame), tree, and feature.agg.list (list). The data.obj can be converted from other formats using several functions from the MicrobiomeStat package, including: 'mStat_convert_DGEList_to_data_obj', 'mStat_convert_DESeqDataSet_to_data_obj', 'mStat_convert_phyloseq_to_data_obj', 'mStat_convert_SummarizedExperiment_to_data_obj', 'mStat_import_qiime2_as_data_obj', 'mStat_import_mothur_as_data_obj', 'mStat_import_dada2_as_data_obj', and 'mStat_import_biom_as_data_obj'. Alternatively, users can construct their own data.obj. Note that not all components of data.obj may be required for all functions in the MicrobiomeStat package.
- dist.obj
Distance matrix between samples, usually calculated using the mStat_calculate_beta_diversity function. If NULL, beta diversity will be automatically computed from data.obj using mStat_calculate_beta_diversity.
- pc.obj
A list containing the results of dimension reduction/Principal Component Analysis. This should be the output from functions like mStat_calculate_PC, containing the PC coordinates and other metadata. If NULL (default), dimension reduction will be automatically performed using metric multidimensional scaling (MDS) via mStat_calculate_PC. The pc.obj list structure should contain:
  - points: A matrix with samples as rows and PCs as columns containing the coordinates.
  - eig: Eigenvalues for each PC dimension.
  - vectors: Loadings vectors for features onto each PC.
  - Other metadata, such as method, dist.name, etc.
See the mStat_calculate_PC function for details on the output format.
- pc.ind
A numeric vector specifying which principal coordinate (PC) axes to use for plotting. This refers to the PC axes from the dimension reduction method specified in pc.obj or calculated by default. For example, c(1,2) will generate plots for PC1 and PC2. Default is c(1,2) to plot the first two PCs.
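To make the expected pc.obj layout concrete, here is a base-R sketch that builds a list with the same points and eig components from classical (metric) MDS via stats::cmdscale, then selects the axes named by pc.ind. It is an illustration under stated assumptions, not mStat_calculate_PC: the toy count table, the hand-rolled bray_curtis helper (computed on raw counts), and the PC1/PC2 column names are hypothetical, and the vectors and metadata components are omitted.

```r
# Toy count table: 4 samples x 3 taxa (hypothetical values)
counts <- matrix(
  c(10,  0,  0,
     0, 10,  0,
     0,  0, 10,
     5,  5,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("sample", 1:4), paste0("taxon", 1:3))
)

# Hand-rolled Bray-Curtis dissimilarity on raw counts
# (illustrative helper, not the MicrobiomeStat implementation)
bray_curtis <- function(m) {
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(m[i, ] - m[j, ])) / sum(m[i, ] + m[j, ])
    }
  }
  as.dist(d)
}

# Classical MDS keeping two axes; eig = TRUE also returns all eigenvalues
mds <- stats::cmdscale(bray_curtis(counts), k = 2, eig = TRUE)

# pc.obj-like list: samples as rows, PCs as columns
pc_like <- list(points = mds$points, eig = mds$eig)
colnames(pc_like$points) <- paste0("PC", seq_len(ncol(pc_like$points)))

# pc.ind = c(1, 2) picks these columns of the coordinate matrix
pc.ind <- c(1, 2)
pc_like$points[, pc.ind, drop = FALSE]
```

Because Bray-Curtis is not a Euclidean distance, some trailing eigenvalues in pc_like$eig can be negative; only axes with positive eigenvalues carry usable coordinates.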
- subject.var
Character string specifying the column name in metadata containing unique subject IDs. Required to connect samples from the same subject across time points.
- time.var
Character string specifying the column name in metadata containing time values for each sample. Required to identify the pair of time points between which changes are calculated.
- group.var
Character string specifying the column in metadata containing grouping categories. Used for stratification in plots. Optional, can be NULL.
- strata.var
Character string specifying the column in metadata containing stratification categories. Used for nested faceting in plots. Optional, can be NULL.
- adj.vars
Character vector specifying columns in metadata containing covariates to adjust for in distance matrix calculation. Optional, can be NULL.
- change.base
The baseline value of the time variable, used as the reference for calculating changes. Changes are calculated from this baseline time point to the other, later time point(s). It should be specified when time.var contains multiple time points; if NULL, the first time point is used as change.base automatically.
- change.func
A function or string specifying how to calculate changes between time points. If a function is provided, it should take two arguments "value_time_2" and "value_time_1" representing the PC values at the two time points. The function should return the change value. If a string, currently only "absolute change" is supported, which calculates simple differences between time points. More options could be added in the future as needed. Default is "absolute change".
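Under the interface described above, a custom change.func is an R function of value_time_2 and value_time_1. The sketch below uses a hypothetical name, magnitude_change, and returns the unsigned size of the shift along a PC axis. Whether the package calls the function on whole vectors or element by element is not stated here, so it is written with vectorized arithmetic that works either way.

```r
# Hypothetical custom change function following the documented
# two-argument interface (value_time_2, value_time_1)
magnitude_change <- function(value_time_2, value_time_1) {
  # Size of the shift along the PC axis, ignoring its direction
  abs(value_time_2 - value_time_1)
}

# Toy PC1 coordinates for three subjects at the two time points
pc1_t1 <- c(-0.20, 0.05, 0.10)
pc1_t2 <- c(0.10, -0.15, 0.10)
magnitude_change(pc1_t2, pc1_t1)
```

It would then be supplied as change.func = magnitude_change in place of the default "absolute change".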
- dist.name
A character vector specifying which beta diversity indices to calculate. Supported indices are "BC" (Bray-Curtis), "Jaccard", "UniFrac" (unweighted UniFrac), "GUniFrac" (generalized UniFrac), "WUniFrac" (weighted UniFrac), and "JS" (Jensen-Shannon divergence). If a name is provided but the corresponding object does not exist within dist.obj, it will be computed internally. If the specific index is not supported, an error message will be returned. Default is c('BC', 'Jaccard').
- base.size
A numeric value for the base size of the plot. Default is 16.
- theme.choice
Plot theme choice. Specifies the visual style of the plot. Can be one of the following pre-defined themes:
  - "prism": Uses the ggprism::theme_prism() function from the ggprism package, offering a polished and visually appealing style.
  - "classic": Applies theme_classic() from ggplot2, providing a clean and traditional look with minimal styling.
  - "gray": Uses theme_gray() from ggplot2, which offers a simple and modern look with a light gray background.
  - "bw": Employs theme_bw() from ggplot2, creating a classic black and white plot, ideal for formal publications and situations where color is best minimized.
  - "light": Implements theme_light() from ggplot2, featuring a light theme with subtle grey lines and axes, suitable for a fresh, modern look.
  - "dark": Uses theme_dark() from ggplot2, offering a dark background, ideal for presentations or situations where a high-contrast theme is desired.
  - "minimal": Applies theme_minimal() from ggplot2, providing a minimalist theme with the least amount of background annotations and colors.
  - "void": Employs theme_void() from ggplot2, creating a blank canvas with no axes, gridlines, or background, ideal for custom, creative plots.
Each theme option adjusts elements such as background color, grid lines, and font styles to match the specified aesthetic. Default is "bw", a black and white theme suitable for a wide range of applications.
- custom.theme
A custom ggplot theme provided as a ggplot2 theme object. This allows users to override the default theme and provide their own theme for plotting. Custom themes are useful for creating publication-ready figures with specific formatting requirements.
To use a custom theme, create a theme object with ggplot2::theme(), including any desired customizations. Common customizations for publication-ready figures might include adjusting text size for readability, altering line sizes for clarity, and repositioning or formatting the legend. For example:
```r
my_theme <- ggplot2::theme(
  axis.title = ggplot2::element_text(size = 14, face = "bold"),    # Bold axis titles with larger font
  axis.text = ggplot2::element_text(size = 12),                    # Slightly larger axis text
  legend.position = "top",                                         # Move legend to the top
  legend.background = ggplot2::element_rect(fill = "lightgray"),   # Light gray background for legend
  panel.background = ggplot2::element_rect(fill = "white", colour = "black"), # White panel background with black border
  panel.grid.major = ggplot2::element_line(colour = "grey90"),     # Lighter color for major grid lines
  panel.grid.minor = ggplot2::element_blank(),                     # Remove minor grid lines
  plot.title = ggplot2::element_text(size = 16, hjust = 0.5)       # Centered plot title with larger font
)
```
Then pass `my_theme` to `custom.theme`. If `custom.theme` is NULL (the default), the theme is determined by `theme.choice`. This flexibility allows for both easy theme selection for general use and detailed customization for specific presentation or publication needs.
- palette
An optional parameter specifying the color palette to be used for the plot. It can be either a character string specifying the name of a predefined palette or a vector of color codes in a format accepted by ggplot2 (e.g., hexadecimal color codes). Available predefined palettes include 'npg', 'aaas', 'nejm', 'lancet', 'jama', 'jco', and 'ucscgb', inspired by various scientific publications and the `ggsci` package. If `palette` is not provided or an unrecognized palette name is given, a default color palette will be used. Ensure the number of colors in the palette is at least as large as the number of groups being plotted.
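When palette is given as a vector of color codes, the requirement above (at least one color per group) can be checked before plotting. This is a minimal sketch: the toy metadata data frame stands in for the group.var column of data.obj$meta.dat, and the hex codes are arbitrary example colors, not a predefined palette.

```r
# Toy metadata standing in for the grouping column of data.obj$meta.dat
# (hypothetical values)
meta <- data.frame(group = c("A", "B", "C", "A", "B", "C"))

# Custom palette given as hexadecimal color codes (arbitrary example colors)
my_palette <- c("#E64B35", "#4DBBD5", "#00A087", "#3C5488")

# col2rgb() throws an error if any entry is not a valid R color
invisible(grDevices::col2rgb(my_palette))

# The palette must supply at least one color per group level
n_groups <- length(unique(meta$group))
stopifnot(length(my_palette) >= n_groups)
```

The vector would then be passed as palette = my_palette; extra colors beyond the number of groups are simply unused.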
- pdf
A logical value indicating whether to save the plot as a PDF. Default is TRUE.
- file.ann
A string for additional annotation to the file name. Default is NULL.
- pdf.wid
A numeric value specifying the width of the PDF. Default is 11.
- pdf.hei
A numeric value specifying the height of the PDF. Default is 8.5.
- ...
Additional arguments to be passed to the function.
Value
A named list of ggplot objects, with one element per combination of pc.ind and dist.name. Each element contains the plot for that PC index and distance metric.
Details
This function generates boxplots visualizing changes in beta diversity PCoA coordinates over time. It is designed for longitudinal microbiome data with multiple time points. The primary output is a boxplot showing the distribution of changes in PCoA coordinates between two user-specified time points, optionally stratified by groups. Violin plots are also overlaid to show density.
The function offers flexibility to control:
- Distance metrics used (via the dist.name argument)
- PCoA axes to plot (via the pc.ind argument)
- Subject variable for pairing time points (subject.var)
- Time points to compare (time.var and change.base)
- Stratification variable(s) (group.var and strata.var)
- Calculation of change between time points (change.func argument)
- Plot aesthetics such as theme, colors, and file saving
For large datasets, the data are subset to the two time points of interest before plotting, and jitter is added to handle overlapping points; both steps help keep the plots readable. The returned plot list allows downstream iteration over multiple PC axes and/or distance metrics. Individual plots can be accessed by name, e.g. plotlist$BC_PC1 for the PC1 coordinates based on Bray-Curtis dissimilarity.
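The core change calculation described above (subset to the two time points, pair samples by subject, take the difference) can be sketched in base R on a toy long-format table of PC1 coordinates. This is not the package's internal code: the column names, the toy values, and the reading of "absolute change" as the later value minus the baseline value (value_time_2 - value_time_1) are assumptions for illustration.

```r
# Toy long-format PC1 coordinates (hypothetical data, not MicrobiomeStat output)
pc_df <- data.frame(
  subject = c("s1", "s1", "s2", "s2", "s3", "s3", "s4"),
  time    = c("1", "2", "1", "2", "1", "2", "3"),
  group   = c("A", "A", "A", "A", "B", "B", "B"),
  PC1     = c(-0.20, 0.10, 0.05, -0.15, 0.10, 0.10, 0.30)
)

change.base <- "1"  # baseline time point
later.time  <- "2"  # the other time point of the pair

# Keep only the two time points of interest (drops the time-3 row)
pc_df <- pc_df[pc_df$time %in% c(change.base, later.time), ]

# Split into baseline and follow-up rows, then pair them by subject
t1 <- pc_df[pc_df$time == change.base, c("subject", "group", "PC1")]
t2 <- pc_df[pc_df$time == later.time, c("subject", "PC1")]
paired <- merge(t1, t2, by = "subject", suffixes = c("_time_1", "_time_2"))

# Assumed "absolute change": later value minus baseline value
paired$PC1_change <- paired$PC1_time_2 - paired$PC1_time_1

# Per-group medians, the centre line each group's boxplot would show
tapply(paired$PC1_change, paired$group, median)
```

Subjects observed at only one of the two time points (s4 here) drop out at the merge step, since no change can be computed for them.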
Examples
if (FALSE) { # \dontrun{
library(MicrobiomeStat)
library(vegan)
library(ggh4x)
data(peerj32.obj)
generate_beta_pc_change_boxplot_pair(
data.obj = peerj32.obj,
dist.obj = NULL,
pc.obj = NULL,
pc.ind = c(1, 2),
subject.var = "subject",
time.var = "time",
group.var = "group",
strata.var = "sex",
change.base = "1",
change.func = "absolute change",
dist.name = c('BC'),
base.size = 20,
theme.choice = "bw",
custom.theme = NULL,
palette = NULL,
pdf = TRUE,
file.ann = NULL,
pdf.wid = 11,
pdf.hei = 8.5
)
data(subset_pairs.obj)
generate_beta_pc_change_boxplot_pair(
data.obj = subset_pairs.obj,
dist.obj = NULL,
pc.obj = NULL,
pc.ind = c(1, 2),
subject.var = "MouseID",
time.var = "Antibiotic",
group.var = "Sex",
strata.var = NULL,
change.base = "Baseline",
change.func = "absolute change",
dist.name = c('BC'),
base.size = 20,
theme.choice = "bw",
custom.theme = NULL,
palette = NULL,
pdf = TRUE,
file.ann = NULL,
pdf.wid = 11,
pdf.hei = 8.5
)
} # }