ggpicrust2 is a comprehensive package that integrates pathway name/description annotations, ten of the most advanced differential abundance (DA) methods, and visualization of DA results. It offers a comprehensive solution for analyzing and interpreting the results of PICRUSt2 functional prediction in a seamless and intuitive way. Whether you are a researcher, data scientist, or bioinformatician, ggpicrust2 can help you better understand the underlying biological processes and mechanisms at play in your PICRUSt2 output data. So if you are interested in exploring the output data of PICRUSt2, ggpicrust2 is the tool you need.
Citation
If you use ggpicrust2 in your research, please cite the following paper:
Chen Yang, Aaron Burberry, Xuan Cao, Jiahao Mai, Liangliang Zhang. (2023). ggpicrust2: an R package for PICRUSt2 predicted functional profile analysis and visualization. arXiv preprint arXiv:2303.10388.
BibTeX entry: @misc{yang2023ggpicrust2, title={ggpicrust2: an R package for PICRUSt2 predicted functional profile analysis and visualization}, author={Chen Yang and Aaron Burberry and Jiahao Mai and Xuan Cao and Liangliang Zhang}, year={2023}, eprint={2303.10388}, archivePrefix={arXiv}, primaryClass={stat.AP} }
ResearchGate preprint link: Click here
Installation
You can install the stable version of ggpicrust2 from CRAN with:
install.packages("ggpicrust2")
To install the latest development version of ggpicrust2 from GitHub, you can use:
# Install the devtools package if not already installed
# install.packages("devtools")
# Install ggpicrust2 from GitHub
devtools::install_github("cafferychen777/ggpicrust2")
Workflow
The easiest way to analyze the PICRUSt2 output is using ggpicrust2() function. The entire pipeline can be run with ggpicrust2() function.
ggpicrust2() integrates ko abundance to kegg pathway abundance conversion, annotation of pathway, differential abundance (DA) analysis, DA results visualization. When you have trouble running ggpicrust2(), you can debug it by running a separate function, which will greatly increase the speed of your analysis and visualization
ggpicrust2()
#If you want to analysis kegg pathway abundance instead of ko within the pathway. You should turn ko_to_kegg to TRUE.
#The kegg pathway typically have the more explainable description.
library(readr)
library(ggpicrust2)
library(tibble)
library(tidyverse)
library(ggprism)
library(patchwork)
<-
metadata read_delim(
"~/Microbiome/C9orf72/Code And Data/new_metadata.txt",
delim = "\t",
escape_double = FALSE,
trim_ws = TRUE
)<- "Enviroment"
group <-
daa_results_list ggpicrust2(
file = "/Users/apple/Microbiome/C9orf72/Code And Data/picrust2_out/KO_metagenome_out/pred_metagenome_unstrat.tsv/pred_metagenome_unstrat.tsv",
metadata = metadata,
group = "Enviroment",
pathway = "KO",
daa_method = "LinDA",
p_values_bar = TRUE,
p.adjust = "BH",
ko_to_kegg = TRUE,
order = "pathway_class"
select = NULL,
reference = NULL # If your metadata[,group] has more than two levels, please specify a reference.
)
#If you want to analysis the EC. MetaCyc. KO without conversions. You should turn ko_to_kegg to FALSE.
library(readr)
library(ggpicrust2)
library(tibble)
library(tidyverse)
library(ggprism)
library(patchwork)
<-
metadata read_delim(
"~/Microbiome/C9orf72/Code And Data/new_metadata.txt",
delim = "\t",
escape_double = FALSE,
trim_ws = TRUE
)<- "Enviroment"
group <-
daa_results_list ggpicrust2(
file = "//Users/apple/Microbiome/C9orf72/Code And Data/picrust2_out/EC_metagenome_out/pred_metagenome_unstrat.tsv/pred_metagenome_unstrat.tsv",
metadata = metadata,
group = "Enviroment",
pathway = "EC",
daa_method = "LinDA",
p_values_bar = TRUE,
order = "group",
ko_to_kegg = FALSE,
x_lab = "description",
p.adjust = "BH",
select = NULL,
reference = NULL
)
If an error occurs with ggpicrust2, please use the following workflow.
#If you want to analysis kegg pathway abundance instead of ko within the pathway. You should turn ko_to_kegg to TRUE.
#The kegg pathway typically have the more explainable description.
#metadata should be tibble.
library(readr)
library(ggpicrust2)
library(tibble)
library(tidyverse)
library(ggprism)
library(patchwork)
metadata <-
read_delim(
"~/Microbiome/C9orf72/Code And Data/new_metadata.txt",
delim = "\t",
escape_double = FALSE,
trim_ws = TRUE
)
kegg_abundance <-
ko2kegg_abundance(
"/Users/apple/Downloads/pred_metagenome_unstrat.tsv/pred_metagenome_unstrat.tsv"
)
group <- "Enviroment"
daa_results_df <-
pathway_daa(
abundance = kegg_abundance,
metadata = metadata,
group = group,
daa_method = "ALDEx2",
select = NULL,
reference = NULL
)
#if you are using LinDA, limme voom and Maaslin2, please specify a reference just like followings code.
# daa_results_df <- pathway_daa(abundance = kegg_abundance, metadata = metadata, group = group, daa_method = "LinDA", select = NULL, reference = "Harvard BRI")
daa_sub_method_results_df <-
daa_results_df[daa_results_df$method == "ALDEx2_Kruskal-Wallace test", ]
daa_annotated_sub_method_results_df <-
pathway_annotation(pathway = "KO",
daa_results_df = daa_sub_method_results_df,
ko_to_kegg = TRUE)
Group <-
metadata$Enviroment # column which you are interested in metadata
# select parameter format in pathway_error() is c("ko00562", "ko00440", "ko04111", "ko05412", "ko00310", "ko04146", "ko00600", "ko04142", "ko00604", "ko04260", "ko04110", "ko04976", "ko05222", "ko05416", "ko00380", "ko05322", "ko00625", "ko00624", "ko00626", "ko00621")
daa_results_list <-
pathway_errorbar(
abundance = kegg_abundance,
daa_results_df = daa_annotated_sub_method_results_df,
Group = Group,
p_values_threshold = 0.05,
order = "pathway_class",
select = NULL,
ko_to_kegg = TRUE,
p_value_bar = TRUE,
colors = NULL,
x_lab = "pathway_name"
)
# x_lab can be set "feature" which represents kegg pathway id.
# If you want to analysis the EC. MetaCyc. KO without conversions. You should turn ko_to_kegg to FALSE.
library(readr)
library(ggpicrust2)
library(tibble)
library(tidyverse)
library(ggprism)
library(patchwork)
metadata <-
read_delim(
"~/Microbiome/C9orf72/Code And Data/new_metadata.txt",
delim = "\t",
escape_double = FALSE,
trim_ws = TRUE
)
#ko_abundance should be data.frame instead of tibble
#ID should be in the first column
ko_abundance <-
read.delim(
"/Users/apple/Downloads/pred_metagenome_unstrat.tsv/pred_metagenome_unstrat.tsv"
)
#sometimes there are function, pathway or something else. Change function. when needed
rownames(ko_abundance) <- ko_abundance$function.
ko_abundance <- ko_abundance[, -1]
group <- "Enviroment"
daa_results_df <-
pathway_daa(
abundance = ko_abundance,
metadata = metadata,
group = group,
daa_method = "ALDEx2",
select = NULL,
reference = NULL
)
#if you are using LinDA, limme voom and Maaslin2, please specify a reference just like followings code.
# daa_results_df <- pathway_daa(abundance = kegg_abundance, metadata = metadata, group = group, daa_method = "LinDA", select = NULL, reference = "Harvard BRI")
daa_sub_method_results_df <-
daa_results_df[daa_results_df$method == "ALDEx2_Kruskal-Wallace test", ]
daa_annotated_sub_method_results_df <-
pathway_annotation(pathway = "KO",
daa_results_df = daa_sub_method_results_df,
ko_to_kegg = FALSE)
Group <-
metadata$Enviroment # column which you are interested in metadata
# select parameter format in pathway_error() is c("K00001", "K00002", "K00003", "K00004")
daa_results_list <-
pathway_errorbar(
abundance = ko_abundance,
daa_results_df = daa_annotated_sub_method_results_df,
Group = Group,
p_values_threshold = 0.05,
order = "pathway_class",
select = NULL,
ko_to_kegg = FALSE,
p_value_bar = TRUE,
colors = NULL,
x_lab = "description"
)
function details
ko2kegg_abundance()
KEGG Orthology(KO) is a classification system developed by the Kyoto Encyclopedia of Genes and Genomes (KEGG) data-base(Kanehisa et al., 2022). It uses a hierarchical structure to classify enzymes based on the reactions they catalyze. To better understand pathways’ role in different groups and classify the pathways, the KO abundance table needs to be converted to KEGG pathway abundance. But PICRUSt2 removes the function from PICRUSt. ko2kegg_abundance() can help convert the table.
# Sample usage of the ko2kegg_abundance function
# Assume that the KO abundance table is stored in a file named "ko_abundance.tsv"
ko_abundance_file <- "ko_abundance.tsv"
kegg_abundance <- ko2kegg_abundance(ko_abundance_file)
# The resulting kegg_abundance data frame can now be used for further analysis and visualization.
pathway_daa()
Differential abundance(DA) analysis plays a major role in PICRUSt2 downstream analysis. pathway_daa() integrates almost all DA methods applicable to the predicted functional profile which there excludes ANCOM and ANCOMBC. It includes ALDEx2(Fernandes et al., 2013), DEseq2(Love et al., 2014), Maaslin2(Mallick et al., 2021), LinDA(Zhou et al., 2022), edgeR(Robinson et al., 2010) , limma voom(Ritchie et al., 2015), metagenomeSeq(Paulson et al., 2013), lefser(Segata et al., 2011).
#the abundance table is better to be data.frame rather than tibble
#you can use ko2_kegg_abundance output
abundance <- ko2kegg_abundance(ko_abundance_file)
metadata <-
read_delim(
"~/Microbiome/C9orf72/Code And Data/new_metadata.txt",
delim = "\t",
escape_double = FALSE,
trim_ws = TRUE
)
group <- "Enviroment"
#daa_method default is "ALDex2"
daa_results_df <- pathway_daa(abundance = abundance,
metadata = metadata,
group = group ,
daa_method = "limma voom",
select = NULL,
p.adjust = "BH",
reference = NULL)
#If you group levels >3 and want to use the LinDA, limme voom and Maaslin2, you should give a reference.
metadata <-
read_delim(
"~/Microbiome/C9orf72/Code And Data/new_metadata.txt",
delim = "\t",
escape_double = FALSE,
trim_ws = TRUE
)
group <- "Group"
daa_results_df <- pathway_daa(abundance = abundance,
metadata = metadata,
group = group ,
daa_method = "limma voom",
select = NULL,
p.adjust = "BH",
reference = "Harvard BRI")
pathway_annotation()
If you are in China and you are using kegg pathway annotation, Please make sure your internet can break through the firewall.
daa_results_df <- pathway_annotation(pathway = "KO", daa_results_df = daa_results_df, ko_to_kegg = TRUE)
pathway_errorbar()
pathway_errorbar(abundance = abundance,
daa_results_df = daa_results_df,
Group = metadata$Enviroment,
ko_to_kegg = TRUE,
p_values_threshold = 0.05,
order = "pathway_class",
select = NULL,
p_value_bar = TRUE,
colors = NULL,
x_lab = NULL)
pathway_heatmap()
In this section, we will demonstrate how to create a pathway heatmap using the pathway_heatmap
function in the ggpicrust2 package. This function visualizes the relative abundance of pathways in different samples.
First, we will create some example functional pathway abundance data and metadata:
# Create example functional pathway abundance data
abundance_example <- matrix(rnorm(30), nrow = 10, ncol = 3)
rownames(abundance_example) <- paste0("Sample", 1:10)
colnames(abundance_example) <- c("PathwayA", "PathwayB", "PathwayC")
# Create example metadata
# Please change your sample id's column name to sample_name
metadata_example <- data.frame(sample_name = rownames(abundance_example),
group = factor(rep(c("Control", "Treatment"), each = 5)))
Next, we will use the pathway_heatmap function to create a heatmap:
# Create a heatmap
heatmap_plot <- ggpicrust2::pathway_heatmap(t(abundance_example), metadata_example, "group")
Finally, we will display the resulting heatmap:
print(heatmap_plot)
pathway_pca()
In this section, we will demonstrate how to perform Principal Component Analysis (PCA) on functional pathway abundance data and create visualizations of the PCA results using the pathway_pca
function in the ggpicrust2 package.
First, we will create some example functional pathway abundance data and metadata:
# Create example functional pathway abundance data
abundance_example <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10))
# Create example metadata
metadata_example <- tibble::tibble(sample_id = 1:10,
group = factor(rep(c("Control", "Treatment"), each = 5)))
Next, we will use the pathway_pca function to perform PCA and create visualizations:
# Perform PCA and create visualizations
pca_plot <- ggpicrust2::pathway_pca(t(abundance_example), metadata_example, "group")
Finally, we will display the resulting PCA plot:
print(pca_plot)
FAQ
Issue 1: pathway_errorbar error
When using pathway_errorbar
with the following parameters:
pathway_errorbar(abundance = abundance,
daa_results_df = daa_results_df,
Group = metadata$Enviroment,
ko_to_kegg = TRUE,
p_values_threshold = 0.05,
order = "pathway_class",
select = NULL,
p_value_bar = TRUE,
colors = NULL,
x_lab = "pathway_name")
You may encounter an error:
in `ggplot_add()`:
Error ! Can't add `e2` to a <ggplot> object.
Run `rlang::last_trace()` to see where the error occurred.
Make sure you have the patchwork
package loaded:
Issue 2: guide_train.prism_offset_minor error
You may encounter an error with guide_train.prism_offset_minor
:
in guide_train.prism_offset_minor(guide, panel_params[[aesthetic]]) :
Error No minor breaks exist, guide_prism_offset_minor needs minor breaks to work
Ensure that the ggprism
package is loaded:
Issue 3: SSL certificate problem
When encountering the following error:
: [rest.kegg.jp] SSL certificate problem: certificate has expired SSL peer certificate or SSH remote key was not OK
If you are in China, make sure your computer network can bypass the firewall.
Issue 4: Bad Request (HTTP 400)
When encountering the following error:
in .getUrl(url, .flatFileParser) : Bad Request (HTTP 400). Error
Please restart R session.
Issue 5: Error in grid.Call(C_textBounds, as.graphicsAnnot(xlabel),xx, xy, :
When encountering the following error:
in grid.Call(C_textBounds, as.graphicsAnnot(xlabel),x$x, x$y, : Error
Please having some required fonts installed. You can refer to this thread.
Author’s other projects
MicrobiomeGallery, a web-based platform for sharing microbiome data visualization code and datasets. (It’s still under construction)