Conduct Differential Abundance Testing Using LinDA Method in MicrobiomeStat Package
Source: R/generate_taxa_test_single.R
This function performs differential abundance analysis on a data set using LinDA. It filters taxa by prevalence and abundance, aggregates them to the requested feature level, and applies the LinDA method. It then returns a report of significant taxa with the relevant statistics.
Usage
generate_taxa_test_single(
  data.obj,
  time.var = NULL,
  t.level = NULL,
  group.var,
  adj.vars = NULL,
  prev.filter = 0,
  abund.filter = 0,
  feature.level,
  feature.dat.type = c("count", "proportion", "other"),
  ...
)
Arguments
- data.obj
A list object in a format specific to MicrobiomeStat, which can include components such as feature.tab (matrix), feature.ann (matrix), meta.dat (data.frame), tree, and feature.agg.list (list). The data.obj can be converted from other formats using several functions from the MicrobiomeStat package, including: 'mStat_convert_DGEList_to_data_obj', 'mStat_convert_DESeqDataSet_to_data_obj', 'mStat_convert_phyloseq_to_data_obj', 'mStat_convert_SummarizedExperiment_to_data_obj', 'mStat_import_qiime2_as_data_obj', 'mStat_import_mothur_as_data_obj', 'mStat_import_dada2_as_data_obj', and 'mStat_import_biom_as_data_obj'. Alternatively, users can construct their own data.obj. Note that not all components of data.obj may be required for all functions in the MicrobiomeStat package.
- time.var
Character string specifying the column name in the metadata that contains the time variable. If provided, it is used to subset the data to a single time point. The default, NULL, does not subset.
- t.level
Character string specifying the time level/value to subset data to, if a time variable is provided. Default NULL does not subset data.
- group.var
Character string specifying the column name in metadata containing grouping categories. This will be used as the predictor in differential abundance testing.
- adj.vars
Character vector specifying column names in metadata containing covariates. These will be used for adjustment in differential abundance testing.
- prev.filter
Numeric value specifying the minimum prevalence threshold for filtering taxa before analysis. Taxa with prevalence below this value will be removed. Prevalence is calculated as the proportion of samples where the taxon is present.
- abund.filter
Numeric value specifying the minimum abundance threshold for filtering taxa before analysis. Taxa with mean abundance below this value will be removed. Abundance refers to counts or proportions depending on feature.dat.type.
- feature.level
Character vector specifying one or more column names in the feature annotation matrix (feature.ann) of data.obj to use for aggregation, for example a taxonomic rank such as "Phylum" or "Genus", or another annotation column such as "OTU_ID". When several columns are given, each is analyzed separately. Use "original" to analyze the features without aggregation. The usage above shows no default for this argument, so a value must be supplied.
- feature.dat.type
The type of the feature data, which determines how the data are handled in downstream analyses. Should be one of: "count" (raw count data, which the function normalizes), "proportion" (data already normalized to proportions or percentages), or "other" (custom abundance data with unknown scaling; no normalization is applied). The choice affects preprocessing steps as well as plot axis labels. Default is "count", which assumes raw OTU table input.
- ...
Additional arguments to be passed to the linda function.
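The prev.filter and abund.filter arguments can be illustrated with a minimal base R sketch. It is not the package's internal code: the toy table, the threshold values, and the choice to compute mean abundance on proportions are assumptions made for illustration.

```r
# Toy count table: rows are taxa, columns are samples (values are made up).
counts <- matrix(
  c(10,  0,  5,  0,
     0,  0,  1,  0,
    50, 40, 60, 30),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxonA", "taxonB", "taxonC"), paste0("S", 1:4))
)

# Convert counts to per-sample proportions (assumption: abundance is
# evaluated on proportions here).
props <- sweep(counts, 2, colSums(counts), "/")

# Prevalence: proportion of samples in which each taxon is present.
prevalence <- rowMeans(counts > 0)

# Mean abundance: average proportion of each taxon across samples.
mean_abund <- rowMeans(props)

# Hypothetical thresholds, playing the role of prev.filter and abund.filter.
prev_filter  <- 0.3
abund_filter <- 0.01

# Keep taxa that are not below either threshold.
keep <- prevalence >= prev_filter & mean_abund >= abund_filter
filtered <- counts[keep, , drop = FALSE]

# taxonB is dropped: it is present in only 1 of 4 samples.
rownames(filtered)
```

With both thresholds left at their default of 0, no taxon would be removed by this kind of filter.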
Value
A list of tibble(s) containing information about significant taxa, including R.Squared, F.Statistic, Estimate, P.Value, Adjusted.P.Value, Mean.Proportion, Mean.Prevalence, SD.Abundance and SD.Prevalence.
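As a rough illustration of how a result table with P.Value and Adjusted.P.Value columns might be screened, the sketch below builds a stand-in table by hand. The table itself, the Taxon column name, the Benjamini-Hochberg adjustment, and the 0.1 cutoff are all assumptions for illustration; the actual tables returned by the function may be shaped differently.

```r
# Hypothetical stand-in for one result table; all values are made up.
mock_tbl <- data.frame(
  Taxon   = c("taxonA", "taxonB", "taxonC"),  # hypothetical column name
  P.Value = c(0.001, 0.04, 0.60),
  stringsAsFactors = FALSE
)

# Adjust p-values for multiple testing (Benjamini-Hochberg is assumed here).
mock_tbl$Adjusted.P.Value <- p.adjust(mock_tbl$P.Value, method = "BH")

# Keep rows whose adjusted p-value falls below a chosen significance level.
sig_level <- 0.1
sig <- mock_tbl[mock_tbl$Adjusted.P.Value < sig_level, ]
sig
```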
Details
This function facilitates differential abundance analysis utilizing the LinDA method:
1. If a time variable and level are provided, the data are subset accordingly.
2. Extracts the OTU table and sample metadata.
3. If the feature data type is "count", it normalizes the data using the "TSS" (total sum scaling) transformation.
4. If the feature level is not "original", it aggregates the OTU table to the taxonomic levels specified by feature.level.
5. Executes the LinDA method on the aggregated or original table considering the grouping and adjustment variables.
6. Extracts significant taxa's statistics into results tables, which include coefficients, standard errors, p-values, adjusted p-values, average abundances, and prevalence.
7. Returns a list of result tables where each element corresponds to a particular taxonomic level.
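Steps 3 and 4 above can be sketched in base R as follows. This is an illustrative sketch, not the package's implementation: the toy counts and the phylum labels are made up, and summing features within each label is an assumed form of aggregation.

```r
# Toy OTU count table: rows are OTUs, columns are samples (made-up values).
counts <- matrix(
  c(10, 20,
    30, 20,
    60, 60),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("otu1", "otu2", "otu3"), c("S1", "S2"))
)

# Hypothetical annotation giving the phylum of each OTU.
phylum <- c(otu1 = "Firmicutes", otu2 = "Firmicutes", otu3 = "Bacteroidetes")

# Total sum scaling: divide each sample (column) by its total count,
# so that every column sums to 1.
tss <- sweep(counts, 2, colSums(counts), "/")

# Aggregate to the phylum level by summing the rows that share a label.
agg <- rowsum(tss, group = phylum[rownames(tss)])
agg
```

Because the aggregation sums proportions within each sample, every column of the aggregated table still sums to 1.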
In essence, the function streamlines preprocessing, executes LinDA-based differential abundance testing, and assembles tables with pertinent results for significant taxa. It also supports adjusting for covariates and allows taxonomic aggregation at diverse levels for customized analyses.
Examples
if (FALSE) { # \dontrun{
# Load the example data object
data(peerj32.obj)

# Test for differential abundance between groups at time level "2",
# adjusting for sex, at three taxonomic levels
test.list <- generate_taxa_test_single(
  data.obj = peerj32.obj,
  time.var = "time",
  t.level = "2",
  group.var = "group",
  adj.vars = "sex",
  feature.dat.type = "count",
  feature.level = c("Phylum", "Genus", "Family"),
  prev.filter = 0.1,
  abund.filter = 0.0001
)

# Pass the test results to the volcano plot function
plot.list <- generate_taxa_volcano_single(
  data.obj = peerj32.obj,
  group.var = "group",
  test.list = test.list,
  feature.sig.level = 0.1,
  feature.mt.method = "none"
)
} # }