Longitudinal Taxa Abundance Volatility Test: generate_taxa_volatility_test_long

This function calculates the volatility of taxon abundances within each subject of a longitudinal microbiome study and tests whether abundance volatility is associated with a grouping variable.

Usage

generate_taxa_volatility_test_long(
  data.obj,
  time.var,
  subject.var,
  group.var,
  adj.vars = NULL,
  prev.filter = 0,
  abund.filter = 0,
  feature.level,
  feature.dat.type = c("count", "proportion", "other"),
  transform = "CLR",
  ...
)

Arguments

data.obj: A list object in a format specific to MicrobiomeStat, which can include components such as feature.tab (matrix), feature.ann (matrix), meta.dat (data.frame), tree, and feature.agg.list (list). The data.obj can be converted from other formats using several functions from the MicrobiomeStat package, including: 'mStat_convert_DGEList_to_data_obj', 'mStat_convert_DESeqDataSet_to_data_obj', 'mStat_convert_phyloseq_to_data_obj', 'mStat_convert_SummarizedExperiment_to_data_obj', 'mStat_import_qiime2_as_data_obj', 'mStat_import_mothur_as_data_obj', 'mStat_import_dada2_as_data_obj', and 'mStat_import_biom_as_data_obj'. Alternatively, users can construct their own data.obj. Note that not all components of data.obj may be required for all functions in the MicrobiomeStat package.
time.var: Character string specifying the column name in metadata containing the numeric time variable. Should contain ordered time points for each subject. Required to calculate volatility over time.
subject.var: Character string specifying the column name in metadata containing unique subject IDs. Required to calculate volatility within subjects over time.
group.var: Character string specifying the column name in metadata containing grouping categories. Volatility will be compared between groups using linear models. Required.
adj.vars: Character vector specifying column names in metadata containing covariates to adjust for in linear models. Optional, can be NULL.
prev.filter: Numeric value specifying the minimum prevalence threshold for filtering taxa before analysis. Taxa with prevalence below this value will be removed. Prevalence is calculated as the proportion of samples in which the taxon is present (has a nonzero abundance).
abund.filter: Numeric value specifying the minimum abundance threshold for filtering taxa before analysis. Taxa with mean abundance below this value will be removed. Abundance refers to counts or proportions depending on feature.dat.type.
feature.level: Character vector specifying taxonomic level(s) to aggregate abundance data to before volatility calculation, e.g. c("Phylum", "Genus"). The special value "original" can also be provided, which will use the original taxon identifiers.
feature.dat.type: Character string specifying the data type of the abundance data: one of "count", "proportion", or "other". Determines how the data are transformed (see Details) and should match the units of the abundance data that are aggregated to feature.level.
transform: Character string specifying transformation method. If "CLR", count and proportion data will be CLR transformed before volatility calculation. Default "CLR".
...: Additional arguments passed to other methods.
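The prevalence and abundance filters described above can be sketched in base R as follows. This is a minimal illustration, not MicrobiomeStat's implementation: the helper filter_taxa_sketch is hypothetical, and it assumes that feature.tab is a numeric matrix with taxa in rows and samples in columns, that "present" means a nonzero value, and that taxa exactly at a threshold are kept.

```r
# Hypothetical helper (not part of MicrobiomeStat): prevalence and
# mean-abundance filtering. Assumes taxa in rows, samples in columns.
filter_taxa_sketch <- function(feature.tab, prev.filter = 0, abund.filter = 0) {
  # Prevalence: proportion of samples in which each taxon is nonzero
  prevalence <- rowMeans(feature.tab > 0)
  # Mean abundance of each taxon across all samples
  mean.abund <- rowMeans(feature.tab)
  # Keep taxa at or above both thresholds; drop = FALSE keeps a matrix
  keep <- prevalence >= prev.filter & mean.abund >= abund.filter
  feature.tab[keep, , drop = FALSE]
}

# Toy proportion table: 3 taxa x 4 samples (each column sums to 1)
tab <- matrix(c(0.50, 0.60, 0.40, 0.55,
                0.00, 0.00, 0.00, 0.05,
                0.50, 0.40, 0.60, 0.40),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("TaxonA", "TaxonB", "TaxonC"), paste0("S", 1:4)))

filtered <- filter_taxa_sketch(tab, prev.filter = 0.5, abund.filter = 0.0001)
rownames(filtered)
```

Here TaxonB is nonzero in only 1 of 4 samples (prevalence 0.25), so it is dropped at prev.filter = 0.5; with the default thresholds of 0, every taxon is kept.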

Value

A list of test results. Each result is a tidy data frame containing coefficients, standard errors, test statistics, and p-values from the linear models and, where applicable, from the ANOVA tests.

Details

For each taxon, volatility is calculated within each subject as the mean absolute difference in (transformed) abundance between consecutive time points, normalized by the time difference: mean(|abundance(t+1) - abundance(t)| / (time(t+1) - time(t)))
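The volatility formula can be illustrated with a small base R sketch. The helper volatility_sketch is hypothetical and operates on one taxon's (already transformed) abundance series for a single subject: it sorts the observations by time and averages the absolute change per unit time across consecutive pairs of time points.

```r
# Hypothetical helper (not part of MicrobiomeStat): volatility of one taxon
# within one subject. `abundance` and `time` are numeric vectors of equal length.
volatility_sketch <- function(abundance, time) {
  ord <- order(time)                  # put the time points in order
  abundance <- abundance[ord]
  time <- time[ord]
  # |abundance(t+1) - abundance(t)| / (time(t+1) - time(t)), averaged over pairs
  mean(abs(diff(abundance)) / diff(time))
}

# Toy long-format data: one taxon, two subjects
df <- data.frame(subject = c("s1", "s1", "s1", "s2", "s2"),
                 time    = c(0, 1, 3, 0, 2),
                 abund   = c(2, 3, 1, 5, 1))

# One volatility value per subject
vol <- sapply(split(df, df$subject),
              function(d) volatility_sketch(d$abund, d$time))
vol
```

Subject s1 changes by 1 over 1 time unit and by 2 over 2 time units, giving a volatility of 1; subject s2 changes by 4 over 2 time units, giving 2. A subject observed at a single time point yields NaN in this sketch; how the function itself handles such subjects is not described on this page.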

The abundance data are transformed before volatility is calculated. By default (transform = "CLR"), count and proportion data are CLR-transformed; data of type "other" are used without transformation.

For count data, a pseudocount of 0.5 is added before the CLR transform. For proportion data, zeros are replaced with half of the minimum positive value before the CLR transform.
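The default transformation rules can be sketched as follows. The helpers clr_columns and transform_sketch are hypothetical. They assume taxa in rows and samples in columns, so the CLR is applied per sample, and for proportion data they take "the minimum positive value" over the whole table; that last choice is an assumption, since the text does not say whether the minimum is computed per taxon, per sample, or globally.

```r
# Hypothetical helper: CLR per sample (column) = log(x) minus the column mean of log(x)
clr_columns <- function(x) {
  logx <- log(x)
  sweep(logx, 2, colMeans(logx), "-")
}

# Hypothetical helper mirroring the default CLR rules described above.
# Assumes taxa in rows and samples in columns.
transform_sketch <- function(feature.tab,
                             feature.dat.type = c("count", "proportion", "other")) {
  feature.tab <- as.matrix(feature.tab)
  feature.dat.type <- match.arg(feature.dat.type)
  if (feature.dat.type == "other") {
    return(feature.tab)                       # no transformation
  }
  if (feature.dat.type == "count") {
    feature.tab <- feature.tab + 0.5          # pseudocount of 0.5
  } else {
    # Assumption: minimum positive value taken over the whole table
    min.pos <- min(feature.tab[feature.tab > 0])
    feature.tab[feature.tab == 0] <- min.pos / 2
  }
  clr_columns(feature.tab)
}

counts <- matrix(c(0, 10, 5, 3), nrow = 2,
                 dimnames = list(c("TaxonA", "TaxonB"), c("S1", "S2")))
transform_sketch(counts, "count")
```

Each column of the result sums to zero, which is a quick sanity check that the CLR was taken within each sample.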

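For each taxon, it then calculates volatility within each subject and tests for association with the grouping variable using a linear model of volatility on the grouping variable and any covariates in adj.vars. If the grouping variable has more than two levels, an ANOVA is also performed to test the overall group effect.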
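The group test can be sketched with base R's lm and anova on subject-level volatility values. The data below are simulated, the column names volatility, group, and delivery are illustrative, and the exact model specification and ANOVA type used by MicrobiomeStat are assumptions here. Covariates are entered before the group so that the sequential (type I) ANOVA row for group is adjusted for them.

```r
# Simulated subject-level data: one volatility value per subject (illustrative only)
set.seed(1)
vol.df <- data.frame(
  volatility = c(rnorm(5, 1.0, 0.2), rnorm(5, 1.5, 0.2), rnorm(5, 1.2, 0.2)),
  group      = factor(rep(c("A", "B", "C"), each = 5)),
  delivery   = factor(rep(c("Vaginal", "Cesarean"), length.out = 15))
)

# Linear model: covariate first, then the grouping variable
fit <- lm(volatility ~ delivery + group, data = vol.df)

# Estimates, standard errors, t statistics, and p-values for the group terms
coef.tab <- summary(fit)$coefficients
group.coefs <- coef.tab[grepl("^group", rownames(coef.tab)), , drop = FALSE]
group.coefs

# With more than two group levels, an ANOVA tests the overall group effect
if (nlevels(vol.df$group) > 2) {
  group.anova <- anova(fit)["group", ]
  print(group.anova)
}
```

The coefficient rows compare each group with the reference level (A), while the ANOVA row gives a single test of whether volatility differs across all three groups.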

Examples

if (FALSE) { # \dontrun{
library(MicrobiomeStat)

# Example 1: count data at a single taxonomic level
data("subset_T2D.obj")
test.list <- generate_taxa_volatility_test_long(
  data.obj = subset_T2D.obj,
  time.var = "visit_number",
  subject.var = "subject_id",
  group.var = "subject_race",
  adj.vars = "sample_body_site",
  prev.filter = 0.1,
  abund.filter = 0.0001,
  feature.level = c("Genus"),
  feature.dat.type = "count",
  transform = "CLR"
)
# Volcano plots of the test results
plot.list <- generate_taxa_volatility_volcano_long(
  data.obj = subset_T2D.obj,
  group.var = "subject_race",
  test.list = test.list,
  feature.sig.level = 0.1,
  feature.mt.method = "none"
)

# Example 2: proportion data at several taxonomic levels
data("ecam.obj")
test.list <- generate_taxa_volatility_test_long(
  data.obj = ecam.obj,
  time.var = "month_num",
  subject.var = "subject.id",
  group.var = "antiexposedall",
  adj.vars = "delivery",
  prev.filter = 0.1,
  abund.filter = 0.0001,
  feature.level = c("Order", "Family", "Genus"),
  feature.dat.type = "proportion",
  transform = "CLR"
)
plot.list <- generate_taxa_volatility_volcano_long(
  data.obj = ecam.obj,
  group.var = "antiexposedall",
  test.list = test.list,
  feature.sig.level = 0.2,
  feature.mt.method = "none"
)
} # }