This is my (atb) first time making any changes to this document. My goals are:
my current understanding of the experiment is that RNA was collected from white adipose tissue from strains of mice which were iron deficient (by diet) and strains with an adipose specific knockout of TfR1. In addition, some mice were treated with a mimic of long-term cold stress.
Thus the categories of note:
In the set of figures I see that the original colors were:
IAD-Saline: #808080 IAD-CL: #8a4412 IDD-Saline: #aed8e4 IDD-CL: #6592ef
I am using mm38_100.
mm_annot <- load_biomart_annotations(species = "mmusculus", year = "2023", month = "07")## Using mart: ENSEMBL_MART_ENSEMBL from host: jul2023.archive.ensembl.org.
## Successfully connected to the mmusculus_gene_ensembl database.
## Finished downloading ensembl gene annotations.
## Finished downloading ensembl structure annotations.
## symbol columns is null, pattern matching 'symbol'.
## Including symbols, there are 57550 vs the 149547 gene annotations.
## Not dropping haplotype chromosome annotations, set drop_haplotypes = TRUE if this is bad.
## Saving annotations to mmusculus_biomart_annotations.rda.
## Finished save().
mm_annot <- mm_annot[["annotation"]]
mm_annot[["txid"]] <- paste0(mm_annot[["ensembl_transcript_id"]], ".", mm_annot[["version"]])
rownames(mm_annot) <- make.names(mm_annot[["ensembl_gene_id"]], unique = TRUE)
tx_gene_map <- mm_annot[, c("txid", "ensembl_gene_id")]color_choices <- list(
"IAD-Saline" = "#808080",
"IAD-CL" = "#8a4412",
"IDD-Saline" = "#aed8e4",
"IDD-CL" = "#6592ef")So, I now have a table of mouse annotations.
The sample sheet is called ‘Experimental_design_Kim_v1.0.xlsx’ and has a column to point to the names of the count tables to load. Here I combine the metadata, count data, and annotations.
hisat_annot <- mm_annot
##rownames(hisat_annot) <- paste0("gene:", rownames(hisat_annot))
mm38_hisat <- create_expt("sample_sheets/Experimental_design_Kim_v1.0.xlsx",
gene_info = hisat_annot) %>%
set_expt_colors(color_choices) %>%
sanitize_expt_pData(columns = "treatment", spaces = TRUE)## Reading the sample metadata.
## Did not find the column: sampleid.
## Setting the ID column to the first column.
## Checking the state of the condition column.
## Checking the state of the batch column.
## Checking the condition factor.
## The sample definitions comprises: 16 rows(samples) and 23 columns(metadata fields).
## Matched 25583 annotations and counts.
## Bringing together the count matrix and gene information.
## Some annotations were lost in merging, setting them to 'undefined'.
## Saving the expressionset to 'expt.rda'.
## The final expressionset has 25760 features and 16 samples.
plot_legend(mm38_hisat)## The colors used in the expressionset are: #aed8e4, #6592ef, #808080, #8a4412.
plot_libsize(mm38_hisat)## Library sizes of 16 samples,
## ranging from 30,859,887 to 60,727,032.
plot_nonzero(mm38_hisat)## The following samples have less than 16744 genes.
## [1] "5214_S1" "3740_S3" "3750_S4" "3772_S5" "3774_S6" "3775_S7"
## [7] "3766_S9" "3767_S10" "3742_S12" "3747_S13" "3741b_S14"
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## A non-zero genes plot of 16 samples.
## These samples have an average 48.73 CPM coverage and 16621 genes observed, ranging from 16058 to
## 17079.
First we will filter out the genes which have low counts across all samples. Then I will normalize without background correction (it was determined there is no need for adjusting for background noise after evaluating the SVA correction).
mm38_filt <- normalize_expt(mm38_hisat, filter = TRUE)## Removing 13318 low-count genes (12442 remaining).
mm38_norm <- normalize_expt(mm38_filt, convert = "cpm", norm = "quant", transform = "log2")## transform_counts: Found 2 values equal to 0, adding 1 to the matrix.
mm38_nb <- normalize_expt(mm38_hisat, filter = TRUE, convert = "cpm",
transform = "log2", batch = "svaseq")## Removing 13318 low-count genes (12442 remaining).
## Setting 128 low elements to zero.
## transform_counts: Found 128 values equal to 0, adding 1 to the matrix.
pca_norm <- plot_pca(mm38_norm, plot_labels = FALSE)
pca_norm## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by IDD-Saline, IDD-CL, IAD-Saline, IAD-CL
## Shapes are defined by 1, 2.
pca_sva <- plot_pca(mm38_nb)
pca_sva## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by IDD-Saline, IDD-CL, IAD-Saline, IAD-CL
## Shapes are defined by 1, 2.
plot_meta_sankey(mm38_hisat, factors = c("diet", "treatment", "batchnumber"))## A sankey plot describing the metadata of 16 samples,
## including 14 out of 0 nodes and traversing metadata factors:
## .
How many samples do we have currently for each group?
table(pData(mm38_hisat)[,c( "diet", "treatment")])## treatment
## diet cl salinevehicle
## IAD 4 4
## IDD 5 3
In the conditions column, we have IDD-Saline, IDD-CL, IAD-Saline, and IAD-CL. For the differential expression analysis, we will perform the following contrasts:
Normal Pairwise Contrasts:
Contrasts after adjustment:
I will conduct these contrasts below and provide a volcano plot (both interactive and static), as well as a searchable table of the significant DE results for each contrast.
mm_de_normal <- all_pairwise(mm38_filt, do_ebseq = FALSE,
model_batch = "svaseq", parallel = FALSE)##
## IDD-Saline IDD-CL IAD-Saline IAD-CL
## 3 5 4 4
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
mm_de_normal## A pairwise differential expression with results from: basic, deseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 15 comparisons.
keepers <- list("IDDCL_vs_IADCL" = c("IDDCL", "IADCL"),
"IDDSaline_vs_IADSaline" = c("IDDSaline", "IADSaline"),
"IDDSaline_vs_IDDCL" = c("IDDSaline", "IDDCL"),
"IADSaline_vs_IADCL" = c("IADSaline", "IADCL"))
mm_de_tables <- combine_de_tables(mm_de_normal, excel="excel/DE_20221003.xlsx",
keepers = keepers, label_column = "mgi_symbol")
mm_de_tables## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 IDDCL_vs_IADCL 132 115 142 117
## 2 IDDSaline_vs_IADSaline 48 55 53 58
## 3 IDDSaline_vs_IDDCL 4 27 5 40
## 4 IADSaline_vs_IADCL 12 5 12 7
## limma_sigup limma_sigdown
## 1 84 71
## 2 15 7
## 3 0 0
## 4 1 0
## Plot describing unique/shared genes in a differential expression table.
mm_de_sig <- extract_significant_genes(mm_de_tables,
excel = "excel/DE_sig_20221003.xlsx",
according_to = "deseq")
mm_de_sig## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## IDDCL_vs_IADCL 132 115
## IDDSaline_vs_IADSaline 48 55
## IDDSaline_vs_IDDCL 4 27
## IADSaline_vs_IADCL 12 5
res_tbls <- c()
for (t in seq_along(keepers)) {
tbl <- names(keepers)[t]
deseq_tbl <- mm_de_normal$deseq$all_tables[[tbl]]
res_tbls[[tbl]] <- merge(deseq_tbl, hisat_annot, by = "row.names")
res_tbls[[tbl]] <- set_sig_limma(res_tbls[[tbl]],
factors = c(paste(gsub("\\_.*","",tbl), "\nEnriched"),
paste(sub('.*\\_', '', tbl), "\nEnriched")))
res_tbls[[paste0(tbl, "volc_plotly")]] <- volc_plot(res_tbls[[tbl]],
title = paste("Volcano Plot:", tbl),
type = "plotly",
tbl = "limma",
gene_name_col = "external_gene_name")
res_tbls[[paste0(tbl, "volc_ggplot")]] <- volc_plot(res_tbls[[tbl]],
title = paste("Volcano Plot:", tbl),
tbl = "limma")
}mm_de_tables[["plots"]][["IDDCL_vs_IADCL"]][["deseq_vol_plots"]]pp(file = "pdf/iddcl_vs_iadcl_volcano.pdf")## Warning in pp(file = "pdf/iddcl_vs_iadcl_volcano.pdf"): The directory: pdf does
## not exist, will attempt to create it.
mm_de_tables[["plots"]][["IDDCL_vs_IADCL"]][["deseq_vol_plots"]]
dev.off()## png
## 2
mm_de_tables[["plots"]][["IDDSaline_vs_IADSaline"]][["deseq_vol_plots"]]pp(file = "pdf/iddsaline_vs_iadsaline_volcano.pdf")
mm_de_tables[["plots"]][["IDDSaline_vs_IADSaline"]][["deseq_vol_plots"]]
dev.off()## png
## 2
mm_de_tables[["plots"]][["IDDSaline_vs_IDDCL"]][["deseq_vol_plots"]]pp(file = "pdf/iddsaline_vs_iddcl_volcano.pdf")
mm_de_tables[["plots"]][["IDDSAline_vs_IDDCL"]][["deseq_vol_plots"]]## NULL
dev.off()## png
## 2
mm_de_tables[["plots"]][["IADSaline_vs_IADCL"]][["deseq_vol_plots"]]## Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
pp(file = "pdf/iadsaline_vs_iadcl_volcano.pdf")
mm_de_tables[["plots"]][["IADSaline_vs_IADCL"]][["deseq_vol_plots"]]
dev.off()## png
## 2
Volcano Plot
res_tbls$IDDSaline_vs_IDDCLvolc_plotlyTable of Significant DE results
This table is significant results with adjusted p-value < 0.05 and log2FC greater than 0.5.
res_tbls$IDDSaline_vs_IDDCL %>%
filter(abs(logFC) > 0.5 & adj.P.Val < 0.05) %>%
arrange(desc(logFC)) %>%
select("ensembl_gene_id", "mgi_symbol", "description", "logFC", "adj.P.Val", "Significance") %>%
DT::datatable(rownames = FALSE)GSEA
IDD_genes <- res_tbls$IDDSaline_vs_IDDCL %>%
filter(abs(logFC) >= 0.5 & adj.P.Val <= 0.05) %>%
arrange(desc(abs(logFC)))
IDD_gost <- gost(query = IDD_genes$ensembl_gene_id,
organism = "mmusculus", ordered_query = TRUE,
evcodes = TRUE)#IDD Saline Enriched
IDD_gost$result %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
filter(source == "GO:BP" | source == "GO:MF" | source == "GO:CC") %>%
datatable(rownames = FALSE, caption = "GO: IDD Significant GSEA")IDD_gost$result %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
filter(source == "REAC") %>%
datatable(rownames = FALSE, caption = "REAC: IDD Significant GSEA")IDD_gost$result %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
filter(source == "KEGG") %>%
datatable(rownames = FALSE, caption = "KEGG: IDD Significant GSEA")##### IDD Enriched
gostplot(IDD_gost, capped = FALSE, interactive = TRUE)Volcano Plot
res_tbls$IADSaline_vs_IADCLvolc_plotlyTable of Significant DE results
res_tbls$IADSaline_vs_IADCL %>%
filter(abs(logFC) > 0.5 & adj.P.Val < 0.05) %>%
arrange(desc(logFC)) %>%
select("ensembl_gene_id", "mgi_symbol", "description", "logFC", "adj.P.Val", "Significance") %>%
DT::datatable(rownames = FALSE)GSEA
IAD_genes <- res_tbls$IADSaline_vs_IADCL %>%
filter(abs(logFC) >= 0.5 & adj.P.Val <= 0.05) %>%
arrange(desc(abs(logFC)))
IAD_gost <- gost(query = IAD_genes$ensembl_gene_id,
organism = "mmusculus", ordered_query = TRUE,
evcodes = TRUE)#IAD Saline Enriched
IAD_gost$result %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
filter(source == "GO:BP" | source == "GO:MF" | source == "GO:CC") %>%
datatable(rownames = FALSE, caption = "GO: IAD Significant GSEA")##### IAD Enriched
gostplot(IAD_gost, capped = FALSE, interactive = TRUE)Volcano Plot
res_tbls$IDDCL_vs_IADCLvolc_plotlyTable of Significant DE results
res_tbls$IDDCL_vs_IADCL %>%
filter(abs(logFC) > 0.5 & adj.P.Val < 0.05) %>%
arrange(desc(logFC)) %>%
select("ensembl_gene_id", "mgi_symbol", "description", "logFC", "adj.P.Val", "Significance") %>%
DT::datatable(rownames = FALSE)GSEA
CL_genes <- res_tbls$IDDCL_vs_IADCL %>%
filter(abs(logFC) >= 0.5 & adj.P.Val <= 0.05) %>%
arrange(desc(abs(logFC)))
CL_gost <- gost(query = CL_genes$ensembl_gene_id,
organism = "mmusculus", ordered_query = TRUE,
evcodes = TRUE)#CL Saline Enriched
CL_gost$result %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
filter(source == "GO:BP" | source == "GO:MF" | source == "GO:CC") %>%
datatable(rownames = FALSE, caption = "GO: CL Significant GSEA")CL_gost$result %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
filter(source == "REAC") %>%
datatable(rownames = FALSE, caption = "REAC: CL Significant GSEA")CL_gost$result %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
filter(source == "KEGG") %>%
datatable(rownames = FALSE, caption = "KEGG: CL Significant GSEA")##### CL Enriched
gostplot(CL_gost, capped = FALSE, interactive = TRUE)Volcano Plot
res_tbls$IDDSaline_vs_IADSalinevolc_plotlyTable of Significant DE results
res_tbls$IDDSaline_vs_IADSaline %>%
filter(abs(logFC) > 1 & adj.P.Val < 0.05) %>%
arrange(desc(logFC)) %>%
select("ensembl_gene_id", "mgi_symbol", "description", "logFC", "adj.P.Val", "Significance") %>%
DT::datatable(rownames = FALSE)GSEA
Saline_genes <- res_tbls$IDDSaline_vs_IADSaline %>%
filter(abs(logFC) >= 0.5 & adj.P.Val <= 0.05) %>%
arrange(desc(abs(logFC)))
Saline_gost <- gost(query = Saline_genes$ensembl_gene_id,
organism = "mmusculus", ordered_query = TRUE,
evcodes = TRUE)#CL Saline Enriched
Saline_gost$result %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
filter(source == "GO:BP" | source == "GO:MF" | source == "GO:CC") %>%
datatable(rownames = FALSE, caption = "GO: Saline Significant GSEA")Saline_gost$result %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
filter(source == "REAC") %>%
datatable(rownames = FALSE, caption = "REAC: Saline Significant GSEA")Saline_gost$result %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
filter(source == "KEGG") %>%
datatable(rownames = FALSE, caption = "KEGG: Saline Significant GSEA")##### CL Enriched
gostplot(Saline_gost, capped = FALSE, interactive = TRUE)countdata <- exprs(mm38_filt)
coldata <- colData(mm38_filt)
ddsMat <- DESeqDataSetFromMatrix(countData = countdata,
colData = coldata,
design = ~ diet + treatment)## converting counts to integer mode
dds <- DESeq(ddsMat)## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
res <- results(dds)
res <- merge(as.data.frame(res), hisat_annot, by = 0)
res <- res[!is.na(res$padj),]
res <- set_sig(res, factors = c("Saline \n Enriched", "CL \n Enriched"))
volc_plot(res_tbl = res, type = "plotly", title = "Saline versus CL after adjusting for Diet")res %>%
filter(padj < 0.05 & abs(log2FoldChange) >= 1) %>%
select("ensembl_gene_id", "mgi_symbol", "description", "log2FoldChange", "padj", "Significance") %>%
arrange(desc(log2FoldChange)) %>%
datatable(rownames = FALSE)all_pairwise() can do this too!
mm_diet_treatment <- set_expt_conditions(mm38_filt, fact = "diet") %>%
set_expt_batches(fact = "treatment")## The numbers of samples by condition are:
##
## IAD IDD
## 8 8
## The number of samples by batch are:
##
## cl salinevehicle
## 9 7
mm_diet_treatment_de <- all_pairwise(mm_diet_treatment, model_batch = TRUE)##
## IAD IDD
## 8 8
##
## cl salinevehicle
## 9 7
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
mm_diet_treatment_de## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: batch in model/limma.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## IDD_vs_IAD
## basic_vs_deseq 0.8709
## basic_vs_dream 0.9748
## basic_vs_ebseq 0.9416
## basic_vs_edger 0.8938
## basic_vs_limma 0.9907
## basic_vs_noiseq 0.9741
## deseq_vs_dream 0.8856
## deseq_vs_ebseq 0.8923
## deseq_vs_edger 0.9966
## deseq_vs_limma 0.8627
## deseq_vs_noiseq 0.9118
## dream_vs_ebseq 0.9644
## dream_vs_edger 0.9083
## dream_vs_limma 0.9835
## dream_vs_noiseq 0.9819
## ebseq_vs_edger 0.9153
## ebseq_vs_limma 0.9492
## ebseq_vs_noiseq 0.9646
## edger_vs_limma 0.8861
## edger_vs_noiseq 0.9317
## limma_vs_noiseq 0.9668
mm_diet_treatment_table <- combine_de_tables(mm_diet_treatment_de, excel = "excel/DE_diet_treatment-table.xlsx",
label_column = "mgi_symbol")
mm_diet_treatment_table## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup
## 1 IDD_vs_IAD 74 70 77 76 75
## limma_sigdown
## 1 48
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
mm_diet_treatment_sig <- extract_significant_genes(mm_diet_treatment_table, excel = "excel/DE_diet_treatment-sig.xlsx")
mm_diet_treatment_sig## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## IDD_vs_IAD 75 48 77 76 74 70 60
## ebseq_down basic_up basic_down
## IDD_vs_IAD 36 779 0
GSEA
adj_genes <- res %>%
filter(padj < 0.05 & abs(log2FoldChange) >= 1) %>%
arrange(desc(abs(log2FoldChange)))
adj_gost <- gost(query = adj_genes$ensembl_gene_id,
organism = "mmusculus", ordered_query = TRUE,
evcodes = TRUE)adj_gost$result %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
filter(source == "GO:BP" | source == "GO:MF" | source == "GO:CC") %>%
datatable(rownames = FALSE, caption = "GO: Adjusted for Diet Significant GSEA")adj_gost$result %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
filter(source == "REAC") %>%
datatable(rownames = FALSE, caption = "REAC: Adjusted for Diet Significant GSEA")adj_gost$result %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
filter(source == "KEGG") %>%
datatable(rownames = FALSE, caption = "KEGG: Adjusted for Diet Significant GSEA")##### CL Enriched
gostplot(adj_gost, capped = FALSE, interactive = TRUE)library(openxlsx)
res_df <- res %>%
filter(padj < 0.05 & abs(log2FoldChange) >= 0.5) %>%
arrange(desc(log2FoldChange)) %>%
select(-Row.names, -lfcSE, -stat, -pvalue, -ensembl_gene_id, -version, -transcript_version, -mgi_symbol)
openxlsx::write.xlsx(res_df, file = "excel/adjdiet_DE.xlsx")countdata <- exprs(mm38_filt)
coldata <- colData(mm38_filt)
ddsMat <- DESeqDataSetFromMatrix(countData = countdata,
colData = coldata,
design = ~ treatment + diet)## converting counts to integer mode
dds <- DESeq(ddsMat)## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
res <- results(dds)
res <- merge(as.data.frame(res), hisat_annot, by = 0)
res <- res[!is.na(res$padj),]
res <- set_sig(res, factors = c("IDD \n Enriched", "IAD \n Enriched"))
volc_plot(res_tbl = res, type = "plotly", title = "IDD versus IAD after adjusting for Treatment")res %>%
filter(padj < 0.05 & abs(log2FoldChange) >= 1) %>%
select("ensembl_gene_id", "mgi_symbol", "description", "log2FoldChange", "padj", "Significance") %>%
arrange(desc(log2FoldChange)) %>%
datatable(rownames = FALSE)mm_treatment_diet <- set_expt_conditions(mm38_filt, fact = "treatment") %>%
set_expt_batches(fact = "diet")## The numbers of samples by condition are:
##
## cl salinevehicle
## 9 7
## The number of samples by batch are:
##
## IAD IDD
## 8 8
mm_treatment_diet_de <- all_pairwise(mm_treatment_diet, model_batch = TRUE)##
## cl salinevehicle
## 9 7
##
## IAD IDD
## 8 8
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
mm_treatment_diet_de## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: batch in model/limma.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## slnvhcl_v_
## basic_vs_deseq 0.8454
## basic_vs_dream 0.9393
## basic_vs_ebseq 0.8910
## basic_vs_edger 0.8614
## basic_vs_limma 0.9733
## basic_vs_noiseq 0.9469
## deseq_vs_dream 0.9152
## deseq_vs_ebseq 0.9611
## deseq_vs_edger 0.9974
## deseq_vs_limma 0.8734
## deseq_vs_noiseq 0.8917
## dream_vs_ebseq 0.9492
## dream_vs_edger 0.9304
## dream_vs_limma 0.9609
## dream_vs_noiseq 0.9742
## ebseq_vs_edger 0.9709
## ebseq_vs_limma 0.9025
## ebseq_vs_noiseq 0.9400
## edger_vs_limma 0.8871
## edger_vs_noiseq 0.9086
## limma_vs_noiseq 0.9272
mm_treatment_diet_table <- combine_de_tables(mm_treatment_diet_de, excel = "excel/DE_treatment_diet-table.xlsx",
label_column = "mgi_symbol")
mm_treatment_diet_table## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 salinevehicle_vs_cl 70 27 80 27
## limma_sigup limma_sigdown
## 1 1 26
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
mm_treatment_diet_sig <- extract_significant_genes(mm_treatment_diet_table, excel = "excel/DE_treatment_diet-sig.xlsx")
mm_treatment_diet_sig## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down
## salinevehicle_vs_cl 1 26 80 27 70 27
## ebseq_up ebseq_down basic_up basic_down
## salinevehicle_vs_cl 5 11 0 0
GSEA
adj_genes <- res %>%
filter(padj < 0.05 & abs(log2FoldChange) >= 1) %>%
arrange(desc(abs(log2FoldChange)))
adj_gost <- gost(query = adj_genes$ensembl_gene_id,
organism = "mmusculus",
ordered_query = TRUE,
evcodes = TRUE)adj_gost$result %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
filter(source == "GO:BP" | source == "GO:MF" | source == "GO:CC") %>%
datatable(rownames = FALSE, caption = "GO: Diet after Adjusted for Treatment Significant GSEA")adj_gost$result %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
filter(source == "REAC") %>%
datatable(rownames = FALSE, caption = "REAC: Diet after Adjusted for Treatment Significant GSEA")adj_gost$result %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
filter(source == "KEGG") %>%
datatable(rownames = FALSE, caption = "KEGG: Diet after Adjusted for Treatment Significant GSEA")##### CL Enriched
gostplot(adj_gost, capped = FALSE, interactive = TRUE)gseaplot_GO_diet_adjtreat <- adj_gost$result %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
filter(source == "GO:BP" | source == "GO:MF" | source == "GO:CC") %>%
head(n = 10) %>%
ggplot(aes(x = recall, y = reorder(term_name, recall), fill = p_value)) +
geom_bar(stat = "identity")+
scale_fill_continuous(low = "blue", high = "red") +
theme_bw()+
ylab("") +
xlab("GSEA Score")
gseaplot_REAC_diet_adjtreat <- adj_gost$result %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
filter(source == "REAC") %>%
head(n = 10) %>%
ggplot(aes(x = recall, y = reorder(term_name, recall), fill = p_value)) +
geom_bar(stat = "identity")+
scale_fill_continuous(low = "blue", high = "red") +
theme_bw()+
ylab("") +
xlab("GSEA Score")
gseaplot_GO_diet_adjtreatgseaplot_REAC_diet_adjtreatlibrary(openxlsx)
res_df <- res %>%
filter(padj < 0.05 & abs(log2FoldChange) >= .5) %>%
arrange(desc(log2FoldChange)) %>%
select(-Row.names, -lfcSE, -stat, -pvalue, -ensembl_gene_id, -version, -transcript_version, -hgnc_symbol)
openxlsx::write.xlsx(res_df, file = "excel/adjtreatment_DE.xlsx")I am going to use the three data structures I created in order to do the various GO/reactome/etc searches.
mm_de_gp <- all_gprofiler(mm_de_sig, species = "mmusculus")## There are only, 4 returning null.
## There are only, 5 returning null.
iddcl_vs_iadcl_up_gp <- write_gprofiler_data(
mm_de_gp[["IDDCL_vs_IADCL_up"]],
excel = "excel/iddcl_vs_iadcl_up_gp.xlsx")
iddcl_vs_iadcl_down_gp <- write_gprofiler_data(
mm_de_gp[["IDDCL_vs_IADCL_down"]],
excel = "excel/iddcl_vs_iadcl_down_gp.xlsx")
iddsaline_vs_iadsaline_up_gp <- write_gprofiler_data(
mm_de_gp[["IDDSaline_vs_IADSaline_up"]],
excel = "excel/iddsaline_vs_iadsaline_up_gp.xlsx")## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
iddsaline_vs_iadsaline_down_gp <- write_gprofiler_data(
mm_de_gp[["IDDSaline_vs_IADSaline_down"]],
excel = "excel/iddsaline_vs_iadsaline_up_gp.xlsx")## Deleting the file excel/iddsaline_vs_iadsaline_up_gp.xlsx before writing the tables.
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?`geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?`geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?`geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
iddsaline_vs_iddcl_up_gp <- write_gprofiler_data(
mm_de_gp[["IDDSaline_vs_IDDCL_up"]],
excel = "excel/iddsaline_vs_iddcl_up_gp.xlsx")## Warning: Workbook does not contain any worksheets. A worksheet will be added.
iddsaline_vs_iddcl_down_gp <- write_gprofiler_data(
mm_de_gp[["IDDSaline_vs_IDDCL_down"]],
excel = "excel/iddsaline_vs_iddcl_down_gp.xlsx")## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
The excel files created above have some pictures in them, but we can make some here as well…
Najib like the tree plots the best, let us try that first.
iddcl_iadcl_up_tree <- enrichplot::pairwise_termsim(
mm_de_gp[["IDDCL_vs_IADCL_up"]][["BP_enrich"]])
sm(enrichplot::treeplot(iddcl_iadcl_up_tree))sm(enrichplot::dotplot(iddcl_iadcl_up_tree))iddsaline_iadsaline_up_tree <- enrichplot::pairwise_termsim(
mm_de_gp[["IDDSaline_vs_IADSaline_up"]][["BP_enrich"]])
sm(enrichplot::treeplot(iddsaline_iadsaline_up_tree))sm(enrichplot::dotplot(iddsaline_iadsaline_up_tree))iddsaline_iddcl_up_tree <- enrichplot::pairwise_termsim(
mm_de_gp[["IDDSaline_vs_IDDCL_up"]][["BP_enrich"]])## Error: unable to find an inherited method for function 'pairwise_termsim' for signature 'x = "NULL"'
sm(enrichplot::treeplot(iddcl_iadcl_up_tree))sm(enrichplot::dotplot(iddcl_iadcl_up_tree))mm_diet_treatment_gp <- all_gprofiler(mm_diet_treatment_sig, species = "mmusculus")
mm_diet_treatment_up_gp <- write_gprofiler_data(
mm_diet_treatment_gp[["IDD_vs_IAD_up"]],
excel = "excel/mm_diet_treatment_idd_vs_iad_up_gp.xlsx")
mm_diet_treatment_down_gp <- write_gprofiler_data(
mm_diet_treatment_gp[["IDD_vs_IAD_down"]],
excel = "excel/mm_diet_treatment_idd_vs_iad_down_gp.xlsx")And some plots!
idd_iad_up_tree <- enrichplot::pairwise_termsim(
mm_diet_treatment_gp[["IDD_vs_IAD_up"]][["BP_enrich"]])
sm(enrichplot::treeplot(idd_iad_up_tree))sm(enrichplot::dotplot(idd_iad_up_tree))idd_iad_down_tree <- enrichplot::pairwise_termsim(
mm_diet_treatment_gp[["IDD_vs_IAD_down"]][["BP_enrich"]])
sm(enrichplot::treeplot(idd_iad_down_tree))sm(enrichplot::dotplot(idd_iad_down_tree))idd_iad_up_tree <- enrichplot::pairwise_termsim(
mm_diet_treatment_gp[["IDD_vs_IAD_up"]][["REAC_enrich"]])
sm(enrichplot::dotplot(idd_iad_up_tree))idd_iad_down_tree <- enrichplot::pairwise_termsim(
mm_diet_treatment_gp[["IDD_vs_IAD_down"]][["REAC_enrich"]])
sm(enrichplot::dotplot(idd_iad_down_tree))mm_treatment_diet_gp <- all_gprofiler(mm_treatment_diet_sig, species = "mmusculus")
mm_treatment_diet_up_gp <- write_gprofiler_data(
mm_treatment_diet_gp[["IDD_vs_IAD_up"]],
excel = "excel/mm_treatment_diet_idd_vs_iad_up_gp.xlsx")## Warning: Workbook does not contain any worksheets. A worksheet will be added.
mm_treatment_diet_down_gp <- write_gprofiler_data(
mm_treatment_diet_gp[["IDD_vs_IAD_down"]],
excel = "excel/mm_treatment_diet_idd_vs_iad_down_gp.xlsx")## Warning: Workbook does not contain any worksheets. A worksheet will be added.
And some plots!
saline_cl_up_tree <- enrichplot::pairwise_termsim(
mm_treatment_diet_gp[["salinevehicle_vs_cl_up"]][["BP_enrich"]])
sm(enrichplot::treeplot(idd_iad_up_tree))sm(enrichplot::dotplot(idd_iad_up_tree))saline_cl_down_tree <- enrichplot::pairwise_termsim(
mm_treatment_diet_gp[["salinevehicle_vs_cl_down"]][["BP_enrich"]])
sm(enrichplot::dotplot(idd_iad_down_tree))mm_de_cp <- all_cprofiler(mm_de_sig, mm_de_tables, orgdb = "org.Mm.eg.db")## Error in loadNamespace(orgdb): there is no package called 'org.Mm.eg.db'
iddcl_vs_iadcl_gsea <- plot_topn_gsea(mm_de_cp[["IDDCL_vs_IADCL_up"]])## Error in h(simpleError(msg, call)): error in evaluating the argument 'gse' in selecting a method for function 'plot_topn_gsea': object 'mm_de_cp' not found
iddcl_vs_iadcl_gsea[["GO"]][[1]]## Error in eval(expr, envir, enclos): object 'iddcl_vs_iadcl_gsea' not found
enrichplot::dotplot(mm_de_cp[["IDDCL_vs_IADCL_up"]][["enrich_objects"]][["BP_all"]])## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'dotplot': object 'mm_de_cp' not found
iddsaline_vs_iadsaline_gsea <- plot_topn_gsea(mm_de_cp[["IDDSaline_vs_IADSaline_up"]])## Error in h(simpleError(msg, call)): error in evaluating the argument 'gse' in selecting a method for function 'plot_topn_gsea': object 'mm_de_cp' not found
iddsaline_vs_iadsaline_gsea[["GO"]][[1]]## Error in eval(expr, envir, enclos): object 'iddsaline_vs_iadsaline_gsea' not found
enrichplot::dotplot(mm_de_cp[["IDDSaline_vs_IADSaline_up"]][["enrich_objects"]][["BP_all"]])## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'dotplot': object 'mm_de_cp' not found
iadsaline_vs_iadcl_gsea <- plot_topn_gsea(mm_de_cp[["IADSaline_vs_IADCL_up"]])## Error in h(simpleError(msg, call)): error in evaluating the argument 'gse' in selecting a method for function 'plot_topn_gsea': object 'mm_de_cp' not found
iadsaline_vs_iadcl_gsea[["GO"]][[1]]## Error in eval(expr, envir, enclos): object 'iadsaline_vs_iadcl_gsea' not found
enrichplot::dotplot(mm_de_cp[["IADSaline_vs_IADCL_up"]][["enrich_objects"]][["BP_all"]])## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'dotplot': object 'mm_de_cp' not found
enrichplot::dotplot(mm_de_cp[["IADSaline_vs_IADCL_down"]][["enrich_objects"]][["BP_all"]])## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'dotplot': object 'mm_de_cp' not found