
`generate_taxa_test_pair` performs taxon-level testing for microbiome studies. It takes a MicrobiomeStat data object as input, filters taxa by prevalence and abundance thresholds, aggregates taxon abundances by sample, and applies the linda method to fit linear mixed-effects models. These models identify taxa that change significantly across groups over time while adjusting for the specified covariates.

Usage

generate_taxa_test_pair(
  data.obj,
  subject.var,
  time.var = NULL,
  change.base,
  group.var,
  adj.vars,
  feature.level,
  prev.filter = 0,
  abund.filter = 0,
  feature.dat.type = c("count", "proportion", "other"),
  ...
)

Arguments

data.obj

A list object in a format specific to MicrobiomeStat, which can include components such as feature.tab (matrix), feature.ann (matrix), meta.dat (data.frame), tree, and feature.agg.list (list). The data.obj can be converted from other formats using several functions from the MicrobiomeStat package, including: 'mStat_convert_DGEList_to_data_obj', 'mStat_convert_DESeqDataSet_to_data_obj', 'mStat_convert_phyloseq_to_data_obj', 'mStat_convert_SummarizedExperiment_to_data_obj', 'mStat_import_qiime2_as_data_obj', 'mStat_import_mothur_as_data_obj', 'mStat_import_dada2_as_data_obj', and 'mStat_import_biom_as_data_obj'. Alternatively, users can construct their own data.obj. Note that not all components of data.obj may be required for all functions in the MicrobiomeStat package.
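For users who construct their own data.obj, the following is a minimal sketch in base R. It assumes that feature.tab holds features in rows and samples in columns, and that the row names of meta.dat match the column names of feature.tab; the toy values and the column names `subject`, `time`, and `group` are illustrative only.

```r
# Toy abundance table: 3 features (rows) x 4 samples (columns).
# Layout (features in rows) is an assumption of this sketch.
feature.tab <- matrix(
  c(10, 0, 5, 3,
     0, 2, 0, 1,
     7, 8, 9, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("OTU1", "OTU2", "OTU3"), c("S1", "S2", "S3", "S4"))
)

# Annotation matrix: one row per feature, one column per annotation level.
feature.ann <- matrix(
  c("Firmicutes",    "Lactobacillus",
    "Firmicutes",    "Lactobacillus",
    "Bacteroidetes", "Bacteroides"),
  nrow = 3, byrow = TRUE,
  dimnames = list(rownames(feature.tab), c("Phylum", "Genus"))
)

# Sample metadata: row names match the sample names in feature.tab.
meta.dat <- data.frame(
  subject = c("A", "A", "B", "B"),
  time    = c("1", "2", "1", "2"),
  group   = c("ctrl", "ctrl", "trt", "trt"),
  row.names = colnames(feature.tab)
)

# Assemble the list; tree and feature.agg.list are omitted here.
data.obj <- list(
  feature.tab = feature.tab,
  feature.ann = feature.ann,
  meta.dat    = meta.dat
)
```

Checking that sample and feature names line up across the three components before calling any analysis function is a cheap way to catch ordering mistakes.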

subject.var

A string that specifies the name of the subject variable column in the metadata.

time.var

A string that specifies the name of the time variable column in the metadata. Defaults to NULL.

change.base

A value indicating the base (reference) level of the time variable. The specified level is used as the reference category in the model. If NULL, the first level of the factor is used.
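The effect of choosing a reference level can be illustrated with base R's `relevel`. This is a sketch of the general idea only; whether the function uses `relevel` internally is an assumption, and `time.fac` is a made-up variable.

```r
# A time factor with two levels; by default the first level is the reference.
time.fac <- factor(c("1", "2", "1", "2"))
default.ref <- levels(time.fac)[1]

# Make "2" the reference category instead.
time.releveled <- relevel(time.fac, ref = "2")
new.ref <- levels(time.releveled)[1]
```

Model coefficients for a factor are expressed relative to its first level, so changing the reference changes which time point the estimated changes are compared against.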

group.var

A string that specifies the name of the grouping variable column in the metadata for linear modelling.

adj.vars

A vector of strings that specify the names of additional variables to be used as covariates in the analysis.

feature.level

One or more column names in the feature annotation matrix (feature.ann) of data.obj at which features are summarized and tested. Each can be a taxonomic rank such as "Phylum" or "Genus", or any other annotation column such as "OTU_ID". Supply a character vector; when several columns are given, each is analyzed separately. The usage above lists no default for this argument.
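Summarizing to an annotation level amounts to summing the rows of the feature table that share the same label. A minimal sketch with base R's `rowsum` follows; the toy table and the `genus` lookup vector are illustrative, and the package's own aggregation routine may differ.

```r
# Toy abundance table: features in rows, samples in columns.
feature.tab <- matrix(
  c(10, 0, 5, 3,
     0, 2, 0, 1,
     7, 8, 9, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("OTU1", "OTU2", "OTU3"), c("S1", "S2", "S3", "S4"))
)

# Genus label for each feature (stands in for feature.ann[, "Genus"]).
genus <- c(OTU1 = "Lactobacillus", OTU2 = "Lactobacillus", OTU3 = "Bacteroides")

# Sum counts of all features that share a genus; rows come back sorted by label.
genus.tab <- rowsum(feature.tab, group = genus[rownames(feature.tab)])
```

Per-sample totals are unchanged by this aggregation, since every feature contributes to exactly one group.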

prev.filter

Numeric value specifying the minimum prevalence threshold for filtering taxa before analysis. Taxa with prevalence below this value will be removed. Prevalence is calculated as the proportion of samples where the taxon is present. Default 0 removes no taxa by prevalence filtering.
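The prevalence calculation described above can be sketched in base R as follows. The sketch assumes that "present" means a value greater than zero and that features sit in rows; the threshold of 0.6 is chosen only to make the toy example remove something.

```r
# Toy abundance table: features in rows, samples in columns.
feature.tab <- matrix(
  c(10, 0, 5, 3,
     0, 2, 0, 1,
     7, 8, 9, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("OTU1", "OTU2", "OTU3"), c("S1", "S2", "S3", "S4"))
)

# Prevalence: proportion of samples in which each feature is present (> 0).
prevalence <- rowMeans(feature.tab > 0)

# Remove features whose prevalence is below the threshold.
prev.filter <- 0.6
kept <- feature.tab[prevalence >= prev.filter, , drop = FALSE]
```

Here OTU2 is present in only two of four samples (prevalence 0.5), so it is the one feature dropped.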

abund.filter

Numeric value specifying the minimum abundance threshold for filtering taxa before analysis. Taxa with mean abundance below this value will be removed. Abundance refers to counts or proportions depending on feature.dat.type. Default 0 removes no taxa by abundance filtering.
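A mean-abundance filter can be sketched along the same lines. This example assumes that, for count data, the threshold is compared against per-sample proportions; that scale is an assumption of the sketch and is not stated by the text. The threshold of 0.1 is illustrative.

```r
# Toy count table: features in rows, samples in columns.
feature.tab <- matrix(
  c(10, 0, 5, 3,
     0, 2, 0, 1,
     7, 8, 9, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("OTU1", "OTU2", "OTU3"), c("S1", "S2", "S3", "S4"))
)

# Convert counts to per-sample proportions (each column sums to 1).
prop.tab <- sweep(feature.tab, 2, colSums(feature.tab), "/")

# Mean abundance of each feature across samples.
mean.abund <- rowMeans(prop.tab)

# Remove features whose mean abundance is below the threshold.
abund.filter <- 0.1
kept <- feature.tab[mean.abund >= abund.filter, , drop = FALSE]
```

With these toy values OTU2 has a mean proportion of 0.075 and is removed, while the other two features are kept.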

feature.dat.type

The type of the feature data, which determines how the data is handled in downstream analyses. Should be one of:

- "count": Raw count data, will be normalized by the function.
- "proportion": Data that has already been normalized to proportions/percentages.
- "other": Custom abundance data that has unknown scaling. No normalization applied.

The choice affects preprocessing steps as well as plot axis labels. Default is "count", which assumes raw OTU table input.
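In R, a default written as a vector of choices, as in the usage above, conventionally resolves to its first element through `match.arg`. The sketch below shows that idiom; `resolve_type` is a hypothetical helper, and whether the package resolves the argument this way is an assumption.

```r
# Hypothetical helper showing how a choice-vector default resolves.
resolve_type <- function(feature.dat.type = c("count", "proportion", "other")) {
  # Returns the first choice when the argument is missing,
  # and raises an error for values outside the allowed set.
  match.arg(feature.dat.type)
}

default.type  <- resolve_type()
explicit.type <- resolve_type("proportion")

# An unsupported value is rejected rather than silently accepted.
bad.type <- try(resolve_type("relative"), silent = TRUE)
```

Passing the value explicitly, as the examples on this page do, avoids any dependence on how the default is resolved.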

...

Additional parameters to be passed to the linda function.

Value

A named list containing data frames summarizing taxon test results for each taxonomic level.

Details

Each list element corresponds to a taxonomic level specified in `feature.level`. The data frame contains columns for taxon name, log2 fold change, p-values, adjusted p-values, mean abundance, mean prevalence, and the output element from `linda` where the taxon was found significant.
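A returned list of this shape might be inspected as sketched below. The `test.list` here is a hand-made mock, not real output, and the column names `taxon`, `log2FoldChange`, `p.value`, and `adj.p.value` are hypothetical placeholders; check `names()` on the actual result for the real column names.

```r
# Mock result: one data frame per feature level (values are made up).
test.list <- list(
  Genus = data.frame(
    taxon          = c("Bacteroides", "Lactobacillus", "Prevotella"),
    log2FoldChange = c(1.2, -0.4, 2.1),
    p.value        = c(0.01, 0.40, 0.03),
    adj.p.value    = c(0.03, 0.40, 0.045),
    stringsAsFactors = FALSE
  )
)

# Pull the table for one level and keep taxa below an adjusted p-value cutoff.
res <- test.list[["Genus"]]
sig <- res[res$adj.p.value < 0.05, , drop = FALSE]

# Order the significant taxa from smallest to largest adjusted p-value.
sig <- sig[order(sig$adj.p.value), , drop = FALSE]
```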

Examples

if (FALSE) { # \dontrun{
data(peerj32.obj)
test.list <- generate_taxa_test_pair(
  data.obj = peerj32.obj,
  subject.var = "subject",
  time.var = "time",
  group.var = "group",
  adj.vars = c("sex"),
  feature.level = c("Genus"),
  prev.filter = 0.1,
  abund.filter = 0.0001,
  feature.dat.type = "count"
)
# Visualize the test results as volcano plots
plot.list <- generate_taxa_volcano_single(
  data.obj = peerj32.obj,
  group.var = "group",
  test.list = test.list,
  feature.sig.level = 0.1,
  feature.mt.method = "none"
)

data("subset_pairs.obj")
test.list <- generate_taxa_test_pair(
  data.obj = subset_pairs.obj,
  subject.var = "MouseID",
  time.var = "Antibiotic",
  group.var = "Sex",
  adj.vars = NULL,
  feature.level = c("Genus"),
  prev.filter = 0.1,
  abund.filter = 0.0001,
  feature.dat.type = "count"
)
# Visualize the test results as volcano plots
plot.list <- generate_taxa_volcano_single(
  data.obj = subset_pairs.obj,
  group.var = "Sex",
  test.list = test.list,
  feature.sig.level = 0.1,
  feature.mt.method = "none"
)
} # }