Longitudinal Taxa Association Test Generation — generate_taxa_association_test_long

This function tests for associations between taxa abundances and a grouping variable in longitudinal microbiome data. It identifies how taxa abundances differ between experimental or observational groups over time.

Usage

generate_taxa_association_test_long(
  data.obj,
  subject.var,
  group.var = NULL,
  adj.vars = NULL,
  prev.filter = 0,
  abund.filter = 0,
  feature.level,
  feature.dat.type = c("count", "proportion"),
  ...
)

Arguments

data.obj: A list object in a format specific to MicrobiomeStat, which can include components such as feature.tab (matrix), feature.ann (matrix), meta.dat (data.frame), tree, and feature.agg.list (list). The data.obj can be converted from other formats using several functions from the MicrobiomeStat package, including: 'mStat_convert_DGEList_to_data_obj', 'mStat_convert_DESeqDataSet_to_data_obj', 'mStat_convert_phyloseq_to_data_obj', 'mStat_convert_SummarizedExperiment_to_data_obj', 'mStat_import_qiime2_as_data_obj', 'mStat_import_mothur_as_data_obj', 'mStat_import_dada2_as_data_obj', and 'mStat_import_biom_as_data_obj'. Alternatively, users can construct their own data.obj. Note that not all components of data.obj may be required for all functions in the MicrobiomeStat package.
subject.var: A character string giving the column name in the metadata that identifies each subject. Repeated samples from the same subject share this identifier.
group.var: A character string specifying the grouping variable column in the metadata. This variable differentiates between different experimental or observational groups.
adj.vars: A vector of character strings. Each string should denote a column name in the metadata that will serve as a covariate in the analysis. These variables might account for potential confounding influences. Default is NULL.
prev.filter: Numeric value specifying the minimum prevalence threshold for filtering taxa before analysis. Taxa with prevalence below this value are removed. Prevalence is calculated as the proportion of samples in which the taxon is present. The default, 0, applies no prevalence filtering.
abund.filter: Numeric value specifying the minimum abundance threshold for filtering taxa before analysis. Taxa with mean abundance below this value are removed. Abundance refers to counts or proportions depending on feature.dat.type. The default, 0, applies no abundance filtering.
feature.level: A character string indicating the taxonomic resolution for analysis (e.g., "Phylum", "Class"). This choice will determine the granularity of the analysis.
feature.dat.type: A character string, either "count" or "proportion", indicating the nature of the data in the `data.obj`. This helps the function to determine if normalization is required. Default is "count".
...: Additional arguments for specialized requirements. These are currently placeholders and are not used.
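
To make the data.obj layout and the two filters concrete, here is a minimal base-R sketch. The object toy.obj and all of its values are hypothetical, and the filtering shown is only an illustration of the definitions above (prevalence as the proportion of samples with a nonzero value, mean abundance computed on proportions, which is an assumption of this sketch). The function's internal implementation may differ.

```r
# Hypothetical toy data: 4 taxa (rows) x 6 samples (columns) of raw counts.
feature.tab <- matrix(
  c(10,  0,  5,  0,  8,  0,
     0,  0,  0,  1,  0,  0,
    50, 40, 60, 55, 45, 70,
     2,  3,  0,  4,  1,  2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("OTU", 1:4), paste0("S", 1:6))
)

# Taxonomic annotation: one row per taxon, one column per rank.
feature.ann <- matrix(
  c("Firmicutes", "Firmicutes", "Bacteroidetes", "Proteobacteria"),
  ncol = 1, dimnames = list(rownames(feature.tab), "Phylum")
)

# Sample metadata: three subjects, two samples each (column names are made up).
meta.dat <- data.frame(
  subject = rep(c("A", "B", "C"), each = 2),
  group   = rep(c("ctrl", "trt", "ctrl"), each = 2),
  row.names = colnames(feature.tab)
)

# A hand-built list with the three core components described for data.obj.
toy.obj <- list(feature.tab = feature.tab,
                feature.ann = feature.ann,
                meta.dat    = meta.dat)

# Aggregate counts to the rank named by feature.level (here "Phylum").
phylum.tab <- rowsum(toy.obj$feature.tab,
                     group = toy.obj$feature.ann[, "Phylum"])

# Prevalence: proportion of samples in which each taxon is present.
prevalence <- rowMeans(phylum.tab > 0)

# Mean abundance, computed on per-sample proportions (assumption of this sketch).
prop.tab   <- sweep(phylum.tab, 2, colSums(phylum.tab), "/")
mean.abund <- rowMeans(prop.tab)

# Keep taxa that meet both thresholds.
prev.filter  <- 0.7
abund.filter <- 0.001
keep <- prevalence >= prev.filter & mean.abund >= abund.filter
filtered.tab <- phylum.tab[keep, , drop = FALSE]
```

In this toy table Firmicutes is present in 4 of 6 samples, so a prevalence threshold of 0.7 removes it while the other two phyla are kept.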

Value

A list of data frames, one for each taxonomic level specified in `feature.level`. Each data frame contains the key statistics derived from the linear model, including taxa changes, p-values, and other metrics.

Details

Based on whether group.var and adj.vars are NULL, the formula tests:

- When group.var is NULL and adj.vars is NOT NULL:
  - Tests adj.vars main effects only.
  - Adjusted for adj.vars but not group.var.

- When group.var is NOT NULL and adj.vars is NOT NULL:
  - Tests adj.vars and group.var main effects.
  - Adjusted for adj.vars.

- When group.var is NOT NULL and adj.vars is NULL:
  - Tests group.var main effects only.
  - Unadjusted analysis.

- When both group.var and adj.vars are NULL:
  - Tests the intercept only.
  - Unadjusted analysis.

The formula is assembled from whichever of group.var and adj.vars are not NULL. Between-subject variability is accounted for through random effects.
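
The four cases above can be sketched as a small helper that assembles a model formula string. This is an illustration only: build_formula_sketch is a hypothetical name, and the lme4-style random intercept term `(1 | subject)` is an assumption, since the text states only that subject variability enters through random effects.

```r
# Hypothetical helper: assemble a formula string from the non-NULL terms.
# The random-intercept syntax "(1 | subject)" is an assumption of this sketch.
build_formula_sketch <- function(subject.var, group.var = NULL, adj.vars = NULL) {
  # c() silently drops NULL, so only the supplied terms remain.
  fixed <- c(adj.vars, group.var)
  # With no fixed-effect terms, fall back to an intercept-only model.
  if (length(fixed) == 0) fixed <- "1"
  random <- sprintf("(1 | %s)", subject.var)
  paste("~", paste(c(fixed, random), collapse = " + "))
}

# Both NULL: intercept only.
f0 <- build_formula_sketch("studyid")
# f0 is "~ 1 + (1 | studyid)"

# group.var only: unadjusted group effect.
f1 <- build_formula_sketch("studyid", group.var = "delivery")
# f1 is "~ delivery + (1 | studyid)"

# adj.vars and group.var: group effect adjusted for the covariate.
f2 <- build_formula_sketch("studyid", group.var = "delivery", adj.vars = "diet")
# f2 is "~ diet + delivery + (1 | studyid)"

# The strings parse as ordinary R formulas.
f2.formula <- as.formula(f2)
```

Building the string first and converting it with as.formula keeps the NULL handling in one place; the column names "studyid", "delivery", and "diet" are used here purely as placeholders.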

Examples

if (FALSE) { # \dontrun{
# Example 1: Generate taxa association tests and volcano plots for the ecam dataset
data("ecam.obj")
test.list <- generate_taxa_association_test_long(
  data.obj = ecam.obj,
  subject.var = "studyid",
  group.var = "delivery",
  feature.level = c("Phylum", "Class"),
  feature.dat.type = "count"
)

volcano_plots_ecam <- generate_taxa_volcano_single(
  data.obj = ecam.obj,
  group.var = "delivery",
  test.list = test.list,
  feature.sig.level = 0.1,
  feature.mt.method = "fdr"
)


# Example 2: Generate taxa association tests and volcano plots for a subset of the T2D dataset
data("subset_T2D.obj")
test.list_T2D <- generate_taxa_association_test_long(
  data.obj = subset_T2D.obj,
  subject.var = "subject_id",
  feature.level = "Genus",
  group.var = "subject_race",
  feature.dat.type = "count",
  prev.filter = 0.1,
  abund.filter = 0.001
)

volcano_plots_T2D <- generate_taxa_volcano_single(
  data.obj = subset_T2D.obj,
  group.var = "subject_race",
  test.list = test.list_T2D,
  feature.sig.level = 0.1,
  feature.mt.method = "none"
)
} # }