Compute taxa changes and analyze differential abundance — generate_taxa_change_test_pair

This function calculates taxa abundance changes between two time points and performs differential abundance analysis between groups using linear models or ANOVA.

Usage

generate_taxa_change_test_pair(
  data.obj,
  subject.var,
  time.var = NULL,
  group.var = NULL,
  adj.vars = NULL,
  change.base,
  feature.change.func = "relative change",
  feature.level,
  prev.filter = 0.1,
  abund.filter = 1e-04,
  feature.dat.type = c("count", "proportion", "other"),
  winsor.qt = 0.97
)

Arguments

data.obj

A list object in a format specific to MicrobiomeStat, which can include components such as feature.tab (matrix), feature.ann (matrix), meta.dat (data.frame), tree, and feature.agg.list (list). The data.obj can be converted from other formats using several functions from the MicrobiomeStat package, including: 'mStat_convert_DGEList_to_data_obj', 'mStat_convert_DESeqDataSet_to_data_obj', 'mStat_convert_phyloseq_to_data_obj', 'mStat_convert_SummarizedExperiment_to_data_obj', 'mStat_import_qiime2_as_data_obj', 'mStat_import_mothur_as_data_obj', 'mStat_import_dada2_as_data_obj', and 'mStat_import_biom_as_data_obj'. Alternatively, users can construct their own data.obj. Note that not all components of data.obj may be required for all functions in the MicrobiomeStat package.
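For users who construct their own data.obj, the following is a minimal sketch of the list structure described above. It assumes that feature.tab has features in rows and samples in columns, that the row names of feature.ann match the row names of feature.tab, and that the row names of meta.dat match the column names of feature.tab. All taxa, sample names, and values are made up for illustration.

```r
# Toy feature table: 3 features (rows) x 4 samples (columns).
# Layout (features in rows) is an assumption; values are invented.
feature.tab <- matrix(
  c(10,  0, 5, 3,
    20, 15, 0, 8,
     0,  2, 7, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("OTU1", "OTU2", "OTU3"), c("S1", "S2", "S3", "S4"))
)

# Feature annotation: one row per feature, one column per annotation level.
feature.ann <- matrix(
  c("Firmicutes",    "Lactobacillus",
    "Bacteroidetes", "Bacteroides",
    "Firmicutes",    "Clostridium"),
  nrow = 3, byrow = TRUE,
  dimnames = list(rownames(feature.tab), c("Phylum", "Genus"))
)

# Sample metadata: one row per sample, keyed by sample name.
meta.dat <- data.frame(
  subject = c("A", "A", "B", "B"),
  time    = c("1", "2", "1", "2"),
  group   = c("ctrl", "ctrl", "trt", "trt"),
  row.names = colnames(feature.tab)
)

# Assemble the list; tree and feature.agg.list are omitted in this sketch.
data.obj <- list(
  feature.tab = feature.tab,
  feature.ann = feature.ann,
  meta.dat    = meta.dat
)
```

A real analysis needs many more subjects than this toy object provides; the sketch only shows how the components fit together.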

subject.var

The name of the subject variable column in the metadata.

time.var

The name of the time variable column in the metadata (optional).

group.var

The name of the grouping variable column for linear modeling in the metadata.

adj.vars

Names of additional variables to be used as covariates in the analysis.

change.base

The value of time.var to treat as the baseline time point when computing changes in taxa. If NULL, the first unique value in the time.var column is used.

feature.change.func

Specifies the method or function used to compute the change between two time points. Options include:

- "absolute change": Computes the absolute difference between the values at the two time points (`value_time_2` and `value_time_1`).

- "log fold change": Computes the log2 fold change between the two time points. For zero values, imputation is performed using half of the minimum nonzero value for each feature level at the respective time point before taking the logarithm.

- "relative change" (default): Computes the relative change as `(value_time_2 - value_time_1) / (value_time_2 + value_time_1)`. If both time points have a value of 0, the change is defined as 0.

- A custom function: If a user-defined function is provided, it should take two numeric vectors as input corresponding to the values at the two time points (`value_time_1` and `value_time_2`) and return a numeric vector of the computed change. This custom function will be applied directly to calculate the difference.
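The built-in options can be illustrated with a short base R sketch. This is not the package's internal code. It assumes that each vector holds one feature's values across subjects at a single time point, that "absolute change" is the signed difference `value_time_2 - value_time_1`, and that zero imputation is applied per vector. The helper name `impute_half_min` is hypothetical.

```r
# One feature measured in four subjects at two time points (made-up values).
value_time_1 <- c(0, 2, 4, 0)
value_time_2 <- c(0, 6, 1, 3)

# "absolute change": difference between the two time points
# (signed difference is an assumption about the wording).
absolute_change <- value_time_2 - value_time_1

# "relative change": difference over sum, defined as 0 when both values are 0.
sum_values <- value_time_2 + value_time_1
relative_change <- ifelse(
  sum_values == 0,
  0,
  (value_time_2 - value_time_1) / sum_values
)

# Hypothetical helper: replace zeros with half of the minimum nonzero value.
impute_half_min <- function(x) {
  nonzero <- x[x > 0]
  if (length(nonzero) == 0) {
    return(x)  # nothing to impute from; left unchanged in this sketch
  }
  x[x == 0] <- min(nonzero) / 2
  x
}

# "log fold change": log2 ratio after imputing zeros at each time point.
log_fold_change <- log2(impute_half_min(value_time_2)) -
  log2(impute_half_min(value_time_1))
```

Note that the first subject has a zero at both time points: its relative change is 0 by definition, while its log fold change is nonzero because the two time points are imputed with different values.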

feature.level

The column name(s) in the feature annotation matrix (feature.ann) of data.obj at which features are summarized before testing. This can be a taxonomic rank such as "Phylum" or "Genus", or any other annotation column such as "OTU_ID". Should be a character vector specifying one or more column names in feature.ann. When multiple columns are provided, each level is analyzed separately and returned as its own element of the result list. This argument has no default in the Usage signature and must be supplied.

prev.filter

Numeric value specifying the minimum prevalence threshold for filtering taxa before analysis. Taxa with prevalence below this value will be removed. Prevalence is calculated as the proportion of samples where the taxon is present.

abund.filter

Numeric value specifying the minimum abundance threshold for filtering taxa before analysis. Taxa with mean abundance below this value will be removed. Abundance refers to counts or proportions depending on feature.dat.type.
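As a rough illustration of how prev.filter and abund.filter act together, the sketch below applies both thresholds to a toy table of proportions. It assumes taxa in rows and samples in columns, and it assumes that taxa at or above each threshold are kept; the exact comparison used by the package is not stated here. The values are made up.

```r
# Toy abundance table: taxa in rows, samples in columns (invented values).
prop.tab <- matrix(
  c(0.50, 0.40, 0.60, 0.55,
    0.50, 0.60, 0.40, 0.45,
    0.00, 0.00, 0.00, 0.00005),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("TaxonA", "TaxonB", "TaxonC"), paste0("S", 1:4))
)

prev.filter  <- 0.1
abund.filter <- 1e-4

# Prevalence: proportion of samples in which the taxon is present.
prevalence <- rowMeans(prop.tab > 0)

# Mean abundance across samples.
mean_abund <- rowMeans(prop.tab)

# Keep taxa that pass both thresholds (">=" is an assumption).
keep <- prevalence >= prev.filter & mean_abund >= abund.filter
filtered.tab <- prop.tab[keep, , drop = FALSE]
```

Here TaxonC passes the prevalence filter (present in 1 of 4 samples) but is removed by the abundance filter because its mean abundance is below 1e-04.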

feature.dat.type

The type of the feature data, which determines how the data is handled in downstream analyses. Should be one of:

- "count": Raw count data, will be normalized by the function.

- "proportion": Data that has already been normalized to proportions/percentages.

- "other": Custom abundance data that has unknown scaling. No normalization applied.

The choice affects preprocessing steps as well as plot axis labels. Default is "count", which assumes raw OTU table input.

winsor.qt

A numeric value between 0 and 1, specifying the quantile for winsorization (default: 0.97). Winsorization is a data preprocessing method used to limit extreme values or outliers in the data. The `winsor.qt` parameter determines the upper and lower quantiles for winsorization. For example, if `winsor.qt` is set to 0.97, the lower quantile will be (1 - 0.97) / 2 = 0.015, and the upper quantile will be 1 - (1 - 0.97) / 2 = 0.985. Values below the lower quantile will be replaced with the lower quantile, and values above the upper quantile will be replaced with the upper quantile. This helps to reduce the impact of extreme values or outliers on subsequent analyses.
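The quantile arithmetic described above can be sketched in base R as follows. This is an illustration of two-sided winsorization as the text describes it, not the package's internal implementation. The function name `winsorize` is hypothetical, and the default quantile type of `stats::quantile` is assumed.

```r
# Hypothetical helper: two-sided winsorization at the quantiles implied by winsor.qt.
winsorize <- function(x, winsor.qt = 0.97) {
  lower_p <- (1 - winsor.qt) / 2   # e.g. 0.015 for winsor.qt = 0.97
  upper_p <- 1 - lower_p           # e.g. 0.985 for winsor.qt = 0.97
  bounds <- quantile(x, probs = c(lower_p, upper_p), na.rm = TRUE, names = FALSE)
  # Clamp values below the lower bound up, and values above the upper bound down.
  pmin(pmax(x, bounds[1]), bounds[2])
}

# A vector with one extreme value at the top.
x <- c(1:99, 1000)
x_w <- winsorize(x, winsor.qt = 0.97)

range(x)    # original range includes the outlier
range(x_w)  # both tails are pulled in to the quantile bounds
```

Values between the two bounds are left unchanged, so the median of the example vector is the same before and after winsorization.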

Value

A named list in which each element corresponds to one feature level and contains a data frame with the calculated taxa changes, their corresponding p-values, and other statistics from the linear model.

Examples

if (FALSE) { # \dontrun{
data(peerj32.obj)
generate_taxa_change_test_pair(
  data.obj = peerj32.obj,
  subject.var = "subject",
  time.var = "time",
  group.var = "group",
  adj.vars = "sex",
  change.base = "1",
  feature.change.func = "log fold change",
  feature.level = c("Genus"),
  prev.filter = 0.1,
  abund.filter = 1e-4,
  feature.dat.type = "count"
)

data(subset_pairs.obj)
generate_taxa_change_test_pair(
  data.obj = subset_pairs.obj,
  subject.var = "MouseID",
  time.var = "Antibiotic",
  group.var = "Sex",
  adj.vars = NULL,
  change.base = "Baseline",
  feature.change.func = "log fold change",
  feature.level = c("Genus"),
  prev.filter = 0.1,
  abund.filter = 1e-4,
  feature.dat.type = "count"
)
} # }