Filter a Microbiome Data Matrix by Prevalence and Average Abundance
Source: R/mStat_filter.R, mStat_filter.Rd
This function filters taxa in a microbiome matrix based on specified prevalence and average abundance thresholds.
Arguments
- x
A matrix of microbial abundance data, with taxa in rows and samples in columns.
- prev.filter
Numeric, the minimum prevalence threshold for a taxon to be retained. Prevalence is calculated as the proportion of samples where the taxon is present.
- abund.filter
Numeric, the minimum average abundance threshold for a taxon to be retained.
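To make the two statistics concrete, here is a minimal base R sketch of how they could be computed for a small matrix. It assumes a taxon counts as present in a sample when its abundance is greater than zero, which this page does not state explicitly. The object names are illustrative only.

```r
# Toy matrix: taxa in rows, samples in columns (illustrative values)
x <- matrix(c(0, 0, 8,
              3, 2, 9,
              4, 7, 10),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("taxa1", "taxa2", "taxa3"),
                            c("sample1", "sample2", "sample3")))

# Prevalence: proportion of samples in which each taxon is present
# (assumption: "present" means abundance > 0)
prevalence <- rowMeans(x > 0)

# Average abundance: mean across all samples, zeros included
avg_abundance <- rowMeans(x)

print(data.frame(prevalence, avg_abundance))
```

Here taxa1 is present in one of three samples, so its prevalence is 1/3 and its average abundance is 8/3. A prev.filter of 0.5 would therefore exclude it.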
Details
The function first converts the input data into a long format and then groups by taxa. It computes both the average abundance and prevalence for each taxon. Subsequently, it filters out taxa that do not meet the provided prevalence and average abundance thresholds.
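The steps described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package source: `filter_sketch` is a hypothetical name, presence is taken to mean abundance greater than zero, both thresholds are treated as inclusive (`>=`), and the input is assumed to have row names.

```r
# Hypothetical helper illustrating the described steps (not mStat_filter itself)
filter_sketch <- function(x, prev.filter, abund.filter) {
  # Long format: one row per (taxon, sample) pair, columns Var1, Var2, Freq
  x_long <- as.data.frame(as.table(as.matrix(x)))

  # Group by taxon (Var1) and summarise each group
  avg_abundance <- tapply(x_long$Freq, x_long$Var1, mean)
  prevalence <- tapply(x_long$Freq > 0, x_long$Var1, mean)  # assumes present = > 0

  # Keep taxa meeting both thresholds (assumption: inclusive comparison)
  pass <- as.vector(prevalence >= prev.filter & avg_abundance >= abund.filter)
  keep <- names(prevalence)[pass]

  # drop = FALSE keeps the matrix shape even when one taxon remains
  x[keep, , drop = FALSE]
}

data_matrix <- matrix(c(0, 3, 4, 0, 2, 7, 8, 9, 10), ncol = 3,
                      dimnames = list(c("taxa1", "taxa2", "taxa3"),
                                      c("sample1", "sample2", "sample3")))
res <- filter_sketch(data_matrix, prev.filter = 0.5, abund.filter = 5)
print(res)
```

With these thresholds only taxa3 remains: taxa1 fails on prevalence (1/3) and taxa2 fails on average abundance (14/3, about 4.67).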
Examples
# Example with simulated data
if (FALSE) { # \dontrun{
# matrix() fills column by column, so taxa1 = (0, 0, 8), taxa2 = (3, 2, 9), taxa3 = (4, 7, 10)
data_matrix <- matrix(c(0, 3, 4, 0, 2, 7, 8, 9, 10), ncol = 3)
colnames(data_matrix) <- c("sample1", "sample2", "sample3")
rownames(data_matrix) <- c("taxa1", "taxa2", "taxa3")
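# Keep taxa with prevalence of at least 0.5 and average abundance of at least 5
filtered_data_simulated <- mStat_filter(data_matrix, prev.filter = 0.5, abund.filter = 5)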
print(filtered_data_simulated)
} # }
# Example with real dataset: peerj32.obj
if (FALSE) { # \dontrun{
data(peerj32.obj)
data_matrix_real <- peerj32.obj$feature.tab
# Assuming the matrix contains counts with taxa in rows and samples in columns
filtered_data_real <- mStat_filter(data_matrix_real, prev.filter = 0.5, abund.filter = 5)
print(filtered_data_real)
} # }