This document contains the various differential expression analyses of the data generated in tmrc3_datasets.
I am going to try to standardize how I name the various data structures created in this document. Most of the large data structures created are sets of differential expression analyses, their combined results, or the sets of results deemed ‘significant’.
Hopefully, by now they all follow this pattern:
{clinic(s)}_{sample-subset}_{primary-question(s)}_{datatype}_{batch-method}
With this in mind, ‘tc_biopsies_clinic_de_sva’ is the set of differential expression analyses of the Tumaco+Cali biopsy data comparing the clinics, using sva for the batch/surrogate estimate.
I suspect there remain some exceptions and/or errors.
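To make the convention concrete, here is a minimal sketch of a hypothetical helper, parse_de_name() (it is not part of hpgltools or any other package), which splits a name into its five fields. It assumes every field is a single token and that fields are separated by underscores, which is how the example above reads.

```r
## Hypothetical helper (not part of hpgltools): split a result name into the
## five fields of the naming convention. Assumes one token per field and
## underscores as separators.
parse_de_name <- function(name) {
  fields <- strsplit(name, split = "_", fixed = TRUE)[[1]]
  if (length(fields) != 5) {
    stop("Expected 5 underscore-separated fields, got ",
         length(fields), ": ", name)
  }
  names(fields) <- c("clinics", "subset", "question", "datatype", "batch")
  fields
}

parse_de_name("tc_biopsies_clinic_de_sva")
parse_de_name("tc_all_clinic_table_sva")
```

Names which skip a field, for example tc_monocytes_de_nobatch, fail this check; those are presumably among the exceptions mentioned above.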
Each of the following lists describes the set of contrasts that I think are interesting for one of the ways one might consider the TMRC3 dataset. The variables are named according to the data with which they are expected to be used; thus tc_cf_contrasts is intended for the Tumaco+Cali data and provides a series of cure/fail comparisons which (to the extent possible) span both locations. In every case, the name of each list element is used as the contrast name, and thus becomes the sheet name in the output xlsx file(s); the two elements of the character vector are the numerator and denominator of the associated contrast.
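As a small illustration of the numerator/denominator convention, the following sketch uses a hypothetical helper, contrasts_to_df(), to lay a contrast list out as a table and to warn when a numerator or denominator is not one of the conditions in the data. The helper and its conditions argument are my own assumptions, not part of hpgltools.

```r
## Hypothetical helper (not part of hpgltools): tabulate a contrast list as
## contrast name / numerator / denominator and warn about any level which is
## not among the experimental conditions.
contrasts_to_df <- function(contrasts, conditions = NULL) {
  df <- data.frame(
    contrast = names(contrasts),
    numerator = unname(vapply(contrasts, function(x) x[[1]], character(1))),
    denominator = unname(vapply(contrasts, function(x) x[[2]], character(1))),
    stringsAsFactors = FALSE)
  if (!is.null(conditions)) {
    ## Any level missing from the data would yield an empty or failed contrast.
    absent <- setdiff(c(df[["numerator"]], df[["denominator"]]), conditions)
    if (length(absent) > 0) {
      warning("Not found among the conditions: ", toString(absent))
    }
  }
  df
}

## Two elements of the Tumaco+Cali cure/fail list, checked against the
## conditions observed in the eosinophil and biopsy data.
contrasts_to_df(list("tumaco" = c("tumaco_failure", "tumaco_cure"),
                     "cure" = c("tumaco_cure", "cali_cure")),
                conditions = c("cali_cure", "tumaco_cure", "tumaco_failure"))
```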
Most (all?) of the over-representation/GSEA analyses used in this paper were performed with gProfiler and clusterProfiler rather than goseq/topGO/GOstats, primarily because gProfiler is so easy to invoke and clusterProfiler makes it very easy to run GSEA. One fun thing I did relatively recently is coerce the results from all methods into the clusterProfiler enrichment object type, so any result may be passed directly to the various enrichplot functions.
clinic_contrasts <- list(
"clinics" = c("cali", "tumaco"))
## In some cases we have no Cali failure samples, so there remain only 2
## contrasts that are likely of interest
tc_cf_contrasts <- list(
"tumaco" = c("tumaco_failure", "tumaco_cure"),
"cure" = c("tumaco_cure", "cali_cure"))
## In other cases, we have cure/fail for both places.
clinic_cf_contrasts <- list(
"cali" = c("cali_failure", "cali_cure"),
"tumaco" = c("tumaco_failure", "tumaco_cure"),
"cure" = c("tumaco_cure", "cali_cure"),
"fail" = c("tumaco_failure", "cali_failure"))
cf_contrast <- list(
"outcome" = c("tumaco_failure", "tumaco_cure"))
t_cf_contrast <- list(
"outcome" = c("failure", "cure"))
visitcf_contrasts <- list(
"v1cf" = c("v1_failure", "v1_cure"),
"v2cf" = c("v2_failure", "v2_cure"),
"v3cf" = c("v3_failure", "v3_cure"))
visit_contrasts <- list(
"v2v1" = c("c2", "c1"),
"v3v1" = c("c3", "c1"),
"v3v2" = c("c3", "c2"))
visit_v1later <- list(
"later_vs_first" = c("later", "first"))
celltypes <- list(
"eo_mono" = c("eosinophils", "monocytes"),
"ne_mono" = c("neutrophils", "monocytes"),
"eo_ne" = c("eosinophils", "neutrophils"))
ethnicity_contrasts <- list(
"mestizo_indigenous" = c("mestiza", "indigena"),
"mestizo_afrocol" = c("mestiza", "afrocol"),
"indigenous_afrocol" = c("indigena", "afrocol"))
Perform a svaseq-guided comparison of the two clinics. Ideally this will give some clue about just how strong the clinic-based batch effect really is and what its causes are.
tc_clinic_type <- tc_valid %>%
set_expt_conditions(fact = "clinic") %>%
set_expt_batches(fact = "typeofcells")
## The numbers of samples by condition are:
##
## cali tumaco
## 61 123
## The number of samples by batch are:
##
## biopsy eosinophils monocytes neutrophils
## 18 41 63 62
table(pData(tc_clinic_type)[["condition"]])
##
## cali tumaco
## 61 123
tc_all_clinic_de_sva <- all_pairwise(tc_clinic_type, model_batch = "svaseq",
filter = TRUE, methods = methods)
##
## cali tumaco
## 61 123
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tc_all_clinic_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tumc_vs_cl
## basic_vs_deseq 0.7977
## basic_vs_dream 0.9210
## basic_vs_ebseq 0.7545
## basic_vs_edger 0.8733
## basic_vs_limma 0.9696
## basic_vs_noiseq 0.9114
## deseq_vs_dream 0.8696
## deseq_vs_ebseq 0.8431
## deseq_vs_edger 0.9262
## deseq_vs_limma 0.7895
## deseq_vs_noiseq 0.8657
## dream_vs_ebseq 0.8550
## dream_vs_edger 0.9442
## dream_vs_limma 0.9372
## dream_vs_noiseq 0.9353
## ebseq_vs_edger 0.8612
## ebseq_vs_limma 0.7772
## ebseq_vs_noiseq 0.8376
## edger_vs_limma 0.8623
## edger_vs_noiseq 0.9364
## limma_vs_noiseq 0.8900
tc_all_clinic_de_sva[["deseq"]][["contrasts_performed"]]
## [1] "tumaco_vs_cali"
tc_all_clinic_table_sva <- combine_de_tables(
tc_all_clinic_de_sva, keepers = clinic_contrasts,
excel = glue("{clinic_prefix}/tc_all_clinic_table_sva-v{ver}.xlsx"))
tc_all_clinic_table_sva
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 tumaco_vs_cali-inverted 270 1788 322 1660
## limma_sigup limma_sigdown
## 1 388 604
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
tc_all_clinic_sig_sva <- extract_significant_genes(
tc_all_clinic_table_sva,
excel = glue("{clinic_prefix}/compare_clinics/tc_clinic_type_sig_sva-v{ver}.xlsx"))
tc_all_clinic_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## clinics 388 604 322 1660 270 1788 222
## ebseq_down basic_up basic_down
## clinics 420 0 5902
increased_tumaco_categories_up <- simple_gprofiler(
tc_all_clinic_sig_sva[["deseq"]][["ups"]][["clinics"]],
excel = glue("{gsea_prefix}/tumaco_categories_up-v{ver}.xlsx"))
increased_tumaco_categories_up
## A set of ontologies produced by gprofiler using 270
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 17 MF
## 5 BP
## 1 CC
## 1 KEGG
## 1 REAC
## 0 WP
## 105 TF
## 0 MIRNA
## 0 HPA
## 0 CORUM
## 0 HP hits.
increased_tumaco_categories_up[["pvalue_plots"]][["BP"]]
## NULL
increased_cali_categories <- simple_gprofiler(
tc_all_clinic_sig_sva[["deseq"]][["downs"]][["clinics"]],
excel = glue("{gsea_prefix}/cali_categories_up-v{ver}.xlsx"))
increased_cali_categories
## A set of ontologies produced by gprofiler using 1788
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 59 MF
## 689 BP
## 89 CC
## 2 KEGG
## 20 REAC
## 8 WP
## 356 TF
## 2 MIRNA
## 18 HPA
## 0 CORUM
## 15 HP hits.
increased_cali_categories[["pvalue_plots"]][["BP"]]
## NULL
Let us take a quick look at the results of the comparison of Tumaco/Cali.
Note: I keep re-introducing an error which causes these (volcano and MA) plots to be reversed with respect to the logFC values. Pay careful attention to them and make sure that they agree with the numbers of genes observed in the contrast.
I eventually adopted some code from Theresa which colors each side of the MA/volcano plots to match its corresponding numerator/denominator.
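A minimal sketch of the "is up really up?" check, written as a hypothetical helper, count_sig_by_sign(). It assumes the combined table carries <method>_logfc and <method>_adjp columns (as the deseq columns used in the next block do) and applies the same |logFC| > 1 and adjusted p < 0.05 cutoffs as the sum() check that follows.

```r
## Hypothetical helper: count significant genes by sign in a combined table.
## Assumes <method>_logfc and <method>_adjp columns; genes with a missing
## logFC or adjusted p-value are never counted.
count_sig_by_sign <- function(tbl, method = "deseq", lfc = 1, p = 0.05) {
  logfc <- tbl[[paste0(method, "_logfc")]]
  adjp <- tbl[[paste0(method, "_adjp")]]
  sig <- !is.na(adjp) & !is.na(logfc) & adjp < p
  c(up = sum(sig & logfc > lfc), down = sum(sig & logfc < -lfc))
}

## Usage against the clinic comparison; compare the two counts with the
## deseq_sigup/deseq_sigdown columns printed for the combined table.
## count_sig_by_sign(tc_all_clinic_table_sva[["data"]][["clinics"]])
```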
## Check that up is up
summary(tc_all_clinic_table_sva[["data"]][["clinics"]][["deseq_logfc"]])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -25.270 -0.583 -0.156 -0.255 0.171 3.483
## The median and mean logFC are negative, so most genes are lower in the numerator of this table.
sum(tc_all_clinic_table_sva$data$clinics$deseq_logfc < -1.0 &
tc_all_clinic_table_sva$data$clinics$deseq_adjp < 0.05)
## [1] 1788
tc_all_clinic_table_sva[["plots"]][["clinics"]][["deseq_vol_plots"]]
There appear to be many more genes which are increased in the Tumaco samples with respect to the Cali samples.
The remaining cell types all show fairly strong clinic-based variance, but I am not certain whether it is consistent across cell types.
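One way to ask whether the clinic effect is consistent across cell types is to correlate the logFC values of the same clinic contrast between two cell types. A minimal sketch, assuming the combined tables have gene IDs as rownames and a deseq_logfc column; compare_logfc() is a hypothetical helper, and Pearson correlation via cor.test() is my choice here rather than anything the later analyses necessarily use.

```r
## Hypothetical helper: correlate one logFC column between two combined
## tables, restricted to the genes (rownames) present in both.
compare_logfc <- function(tbl_x, tbl_y, column = "deseq_logfc",
                          method = "pearson") {
  shared <- intersect(rownames(tbl_x), rownames(tbl_y))
  if (length(shared) < 3) {
    stop("Too few shared genes to compute a correlation.")
  }
  cor.test(tbl_x[shared, column], tbl_y[shared, column], method = method)
}

## e.g. the Tumaco cure vs. Cali cure contrast in eosinophils vs. monocytes:
## compare_logfc(tc_eosinophils_clinic_table_sva[["data"]][["cure"]],
##               tc_monocytes_table_sva[["data"]][["cure"]])
```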
table(pData(tc_eosinophils)[["condition"]])
##
## cali_cure tumaco_cure tumaco_failure
## 15 17 9
tc_eosinophils_clinic_de_nobatch <- all_pairwise(tc_eosinophils, parallel = parallel,
model_batch = FALSE, filter = TRUE,
methods = methods)
##
## cali_cure tumaco_cure tumaco_failure
## 15 17 9
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tc_eosinophils_clinic_de_nobatch
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: none.
## The primary analysis performed 21 comparisons.
tc_eosinophils_clinic_de_nobatch[["deseq"]][["contrasts_performed"]]
## [1] "tumaco_failure_vs_tumaco_cure" "tumaco_failure_vs_cali_cure"
## [3] "tumaco_cure_vs_cali_cure"
tc_eosinophils_clinic_table_nobatch <- combine_de_tables(
tc_eosinophils_clinic_de_nobatch, keepers = tc_cf_contrasts,
excel = glue("{clinic_cf_prefix}/Eosinophils/tc_eosinophils_clinic_table_nobatch-v{ver}.xlsx"))
tc_eosinophils_clinic_table_nobatch
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 102 35 114
## 2 tumaco_cure_vs_cali_cure 828 811 778
## edger_sigdown limma_sigup limma_sigdown
## 1 31 61 17
## 2 885 711 700
## Plot describing unique/shared genes in a differential expression table.
tc_eosinophils_clinic_sig_nobatch <- extract_significant_genes(
tc_eosinophils_clinic_table_nobatch,
excel = glue("{clinic_cf_prefix}/Eosinophils/tc_eosinophils_clinic_sig_nobatch-v{ver}.xlsx"))
tc_eosinophils_clinic_sig_nobatch
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## tumaco 61 17 114 31 102 35 7
## cure 711 700 778 885 828 811 694
## ebseq_down basic_up basic_down
## tumaco 36 10 0
## cure 596 5543 0
tc_eosinophils_clinic_de_sva <- all_pairwise(tc_eosinophils, model_batch = "svaseq",
filter = TRUE, methods = methods)
##
## cali_cure tumaco_cure tumaco_failure
## 15 17 9
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tc_eosinophils_clinic_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
tc_eosinophils_clinic_de_sva[["deseq"]][["contrasts_performed"]]
## [1] "tumaco_failure_vs_tumaco_cure" "tumaco_failure_vs_cali_cure"
## [3] "tumaco_cure_vs_cali_cure"
tc_eosinophils_clinic_table_sva <- combine_de_tables(
tc_eosinophils_clinic_de_sva, keepers = tc_cf_contrasts,
excel = glue("{clinic_cf_prefix}/Eosinophils/tc_eosinophils_clinic_table_sva-v{ver}.xlsx"))
tc_eosinophils_clinic_table_sva
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 111 61 114
## 2 tumaco_cure_vs_cali_cure 785 861 716
## edger_sigdown limma_sigup limma_sigdown
## 1 36 74 35
## 2 925 734 672
## Plot describing unique/shared genes in a differential expression table.
tc_eosinophils_clinic_sig_sva <- extract_significant_genes(
tc_eosinophils_clinic_table_sva,
excel = glue("{clinic_cf_prefix}/Eosinophils/tc_eosinophils_clinic_sig_sva-v{ver}.xlsx"))
tc_eosinophils_clinic_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## tumaco 74 35 114 36 111 61 7
## cure 734 672 716 925 785 861 694
## ebseq_down basic_up basic_down
## tumaco 36 10 0
## cure 596 5543 0
Interestingly, the biopsy samples appear to have the least location-based variance. We can perform an explicit DE analysis to see how well that hypothesis holds up.
Note that these data include cure and fail samples for Tumaco, but only cure samples for Cali.
table(pData(tc_biopsies)[["condition"]])
##
## cali_cure tumaco_cure tumaco_failure
## 4 9 5
tc_biopsies_clinic_de_sva <- all_pairwise(tc_biopsies, parallel = parallel,
model_batch = "svaseq", filter = TRUE,
methods = methods)
##
## cali_cure tumaco_cure tumaco_failure
## 4 9 5
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tc_biopsies_clinic_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
tc_biopsies_clinic_de_sva[["deseq"]][["contrasts_performed"]]
## [1] "tumaco_failure_vs_tumaco_cure" "tumaco_failure_vs_cali_cure"
## [3] "tumaco_cure_vs_cali_cure"
tc_biopsies_clinic_table_sva <- combine_de_tables(
tc_biopsies_clinic_de_sva, keepers = tc_cf_contrasts,
excel = glue("{clinic_cf_prefix}/Biopsies/tc_biopsies_clinic_table_sva-v{ver}.xlsx"))
tc_biopsies_clinic_table_sva
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 14 11 18
## 2 tumaco_cure_vs_cali_cure 1 0 0
## edger_sigdown limma_sigup limma_sigdown
## 1 6 0 0
## 2 0 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
tc_biopsies_clinic_sig_sva <- extract_significant_genes(
tc_biopsies_clinic_table_sva,
excel = glue("{clinic_cf_prefix}/Biopsies/tc_biopsies_clinic_sig_sva-v{ver}.xlsx"))
tc_biopsies_clinic_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## tumaco 0 0 18 6 14 11 11
## cure 0 0 0 0 1 0 27
## ebseq_down basic_up basic_down
## tumaco 60 0 0
## cure 1 0 0
At least for the moment, I am only comparing the no-batch and sva results across clinics for the monocyte samples; this cell type was chosen mostly arbitrarily.
Our baseline is the comparison of the monocyte samples with neither batch in the model nor surrogate estimation. In theory at least, this should correspond to the PCA plot above in which no batch estimation was performed.
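To quantify how much svaseq changes the monocyte results, one could compare the significant gene sets of the same contrast from the no-batch and svaseq tables. A minimal sketch with a hypothetical helper, sig_overlap(), assuming gene IDs as rownames and the deseq_logfc/deseq_adjp columns with the cutoffs used throughout this document; note that it ignores the direction of change.

```r
## Hypothetical helper: compare the significant genes (|logFC| > lfc and
## adjusted p < p) of one contrast between two combined tables.
sig_overlap <- function(tbl_a, tbl_b, method = "deseq", lfc = 1, p = 0.05) {
  sig_ids <- function(tbl) {
    logfc <- tbl[[paste0(method, "_logfc")]]
    adjp <- tbl[[paste0(method, "_adjp")]]
    keep <- !is.na(adjp) & !is.na(logfc) & adjp < p & abs(logfc) > lfc
    rownames(tbl)[keep]
  }
  a <- sig_ids(tbl_a)
  b <- sig_ids(tbl_b)
  shared <- length(intersect(a, b))
  union_size <- length(union(a, b))
  c(only_a = length(setdiff(a, b)),
    only_b = length(setdiff(b, a)),
    shared = shared,
    ## Jaccard index of the two significant sets; NA when both are empty.
    jaccard = if (union_size == 0) NA_real_ else shared / union_size)
}

## e.g. Tumaco cure/fail in monocytes, without and with svaseq:
## sig_overlap(tc_monocytes_table_nobatch[["data"]][["tumaco"]],
##             tc_monocytes_table_sva[["data"]][["tumaco"]])
```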
table(pData(tc_monocytes)[["condition"]])
##
## cali_cure cali_failure tumaco_cure tumaco_failure
## 18 3 21 21
tc_monocytes_de_nobatch <- all_pairwise(tc_monocytes, model_batch = FALSE,
filter = TRUE,
methods = methods)
##
## cali_cure cali_failure tumaco_cure tumaco_failure
## 18 3 21 21
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tc_monocytes_de_nobatch
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: none.
## The primary analysis performed 21 comparisons.
tc_monocytes_table_nobatch <- combine_de_tables(
tc_monocytes_de_nobatch, keepers = clinic_cf_contrasts,
excel = glue("{clinic_cf_prefix}/Monocytes/tc_monocytes_clinic_table_nobatch-v{ver}.xlsx"))
tc_monocytes_table_nobatch
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 cali_failure_vs_cali_cure 16 20 32
## 2 tumaco_failure_vs_tumaco_cure 48 120 60
## 3 tumaco_cure_vs_cali_cure 781 724 773
## 4 tumaco_failure_vs_cali_failure 633 488 515
## edger_sigdown limma_sigup limma_sigdown
## 1 13 38 5
## 2 138 23 37
## 3 779 644 713
## 4 535 395 564
## Plot describing unique/shared genes in a differential expression table.
tc_monocytes_sig_nobatch <- extract_significant_genes(
tc_monocytes_table_nobatch,
excel = glue("{clinic_cf_prefix}/Monocytes/tc_monocytes_clinic_sig_nobatch-v{ver}.xlsx"))
tc_monocytes_sig_nobatch
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## cali 38 5 32 13 16 20 92
## tumaco 23 37 60 138 48 120 0
## cure 644 713 773 779 781 724 642
## fail 395 564 515 535 633 488 166
## ebseq_down basic_up basic_down
## cali 23 0 0
## tumaco 23 339 0
## cure 660 6378 0
## fail 525 2160 0
In contrast, the following comparison should give a view of the data corresponding to the svaseq PCA plot above. In the best-case scenario, we should therefore be able to see some differences between the Tumaco cure and fail samples.
tc_monocytes_de_sva <- all_pairwise(tc_monocytes, model_batch = "svaseq",
filter = TRUE,
methods = methods)
##
## cali_cure cali_failure tumaco_cure tumaco_failure
## 18 3 21 21
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tc_monocytes_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
tc_monocytes_table_sva <- combine_de_tables(
tc_monocytes_de_sva, keepers = clinic_cf_contrasts,
excel = glue("{clinic_cf_prefix}/Monocytes/tc_monocytes_clinic_table_sva-v{ver}.xlsx"))
tc_monocytes_table_sva
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 cali_failure_vs_cali_cure 27 36 40
## 2 tumaco_failure_vs_tumaco_cure 34 88 29
## 3 tumaco_cure_vs_cali_cure 763 728 711
## 4 tumaco_failure_vs_cali_failure 684 583 576
## edger_sigdown limma_sigup limma_sigdown
## 1 17 51 7
## 2 71 15 57
## 3 758 640 663
## 4 615 430 553
## Plot describing unique/shared genes in a differential expression table.
tc_monocytes_sig_sva <- extract_significant_genes(
tc_monocytes_table_sva,
excel = glue("{clinic_cf_prefix}/Monocytes/tc_monocytes_clinic_sig_sva-v{ver}.xlsx"))
tc_monocytes_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## cali 51 7 40 17 27 36 92
## tumaco 15 57 29 71 34 88 0
## cure 640 663 711 758 763 728 642
## fail 430 553 576 615 684 583 166
## ebseq_down basic_up basic_down
## cali 23 0 0
## tumaco 23 339 0
## cure 660 6378 0
## fail 525 2160 0
The following block shows that these two results are exceedingly different, suggesting that the Cali cure/fail and Tumaco cure/fail comparisons cannot easily be considered in the same analysis. While experimenting with my calculate_aucc function in this block, I found that it is broken in some important way, at least when the top-n genes are expanded beyond 20% of the number of genes in the data.
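For intuition about what an AUCC-style score measures, here is a concordance-at-the-top sketch. It is NOT the implementation of calculate_aucc(); the helper concordance_score() and its top_frac argument are hypothetical. It ranks the genes shared by both tables (assumed to be the rownames) by adjusted p-value and averages, over n = 1..top_n, the fraction of the top n genes which the two rankings have in common.

```r
## Hypothetical concordance-at-the-top score (not calculate_aucc()):
## mean over n = 1..top_n of |top_n(x) intersect top_n(y)| / n, using genes
## shared by both tables ranked by increasing adjusted p-value.
concordance_score <- function(tbl_x, tbl_y, px = "deseq_adjp",
                              py = "deseq_adjp", top_frac = 0.2) {
  shared <- intersect(rownames(tbl_x), rownames(tbl_y))
  rank_x <- shared[order(tbl_x[shared, px])]
  rank_y <- shared[order(tbl_y[shared, py])]
  top_n <- max(1, floor(length(shared) * top_frac))
  fractions <- vapply(seq_len(top_n), function(n) {
    length(intersect(rank_x[seq_len(n)], rank_y[seq_len(n)])) / n
  }, numeric(1))
  mean(fractions)
}

## e.g. Cali vs. Tumaco cure/fail in monocytes without batch estimation:
## concordance_score(tc_monocytes_table_nobatch[["data"]][["cali"]],
##                   tc_monocytes_table_nobatch[["data"]][["tumaco"]])
```

Identical rankings score 1. For two unrelated rankings of N genes the expected overlap fraction at n is about n / N, so the score of unrelated data drifts toward top_frac / 2 as top_frac grows; any score of this family therefore needs to be read relative to that baseline.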
cali_table <- tc_monocytes_table_nobatch[["data"]][["cali"]]
tumaco_table <- tc_monocytes_table_nobatch[["data"]][["tumaco"]]
cali_aucc <- calculate_aucc(cali_table, tumaco_table, px = "deseq_adjp", py = "deseq_adjp",
lx = "deseq_logfc", ly = "deseq_logfc")
cali_aucc
## These two tables have an aucc value of: 0.0659989114452595 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 1.3, df = 11084, p-value = 0.2
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.005944 0.031280
## sample estimates:
## cor
## 0.01267
cali_table_sva <- tc_monocytes_table_sva[["data"]][["cali"]]
tumaco_table_sva <- tc_monocytes_table_sva[["data"]][["tumaco"]]
cali_aucc_sva <- calculate_aucc(cali_table_sva, tumaco_table_sva, px = "deseq_adjp",
py = "deseq_adjp", lx = "deseq_logfc", ly = "deseq_logfc")
cali_aucc_sva
## These two tables have an aucc value of: 0.0846505658047588 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 17, df = 11084, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1396 0.1759
## sample estimates:
## cor
## 0.1578
tc_neutrophils_de_nobatch <- all_pairwise(tc_neutrophils, parallel = parallel,
model_batch = FALSE, filter = TRUE,
methods = methods)
##
## cali_cure cali_failure tumaco_cure tumaco_failure
## 18 3 20 21
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tc_neutrophils_de_nobatch
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: none.
## The primary analysis performed 21 comparisons.
tc_neutrophils_table_nobatch <- combine_de_tables(
tc_neutrophils_de_nobatch, keepers = clinic_cf_contrasts,
excel = glue("{clinic_cf_prefix}/Neutrophils/tc_neutrophils_table_nobatch-v{ver}.xlsx"))
tc_neutrophils_table_nobatch
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 cali_failure_vs_cali_cure 33 85 42
## 2 tumaco_failure_vs_tumaco_cure 95 49 112
## 3 tumaco_cure_vs_cali_cure 905 337 913
## 4 tumaco_failure_vs_cali_failure 983 256 803
## edger_sigdown limma_sigup limma_sigdown
## 1 33 37 10
## 2 55 7 12
## 3 355 627 520
## 4 281 380 460
## Plot describing unique/shared genes in a differential expression table.
tc_neutrophils_sig_nobatch <- extract_significant_genes(
tc_neutrophils_table_nobatch,
excel = glue("{clinic_cf_prefix}/Neutrophils/tc_neutrophils_sig_nobatch-v{ver}.xlsx"))
tc_neutrophils_sig_nobatch
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## cali 37 10 42 33 33 85 90
## tumaco 7 12 112 55 95 49 7
## cure 627 520 913 355 905 337 683
## fail 380 460 803 281 983 256 113
## ebseq_down basic_up basic_down
## cali 39 0 0
## tumaco 7 8 0
## cure 299 4589 0
## fail 310 1652 0
tc_neutrophils_de_sva <- all_pairwise(tc_neutrophils, parallel = parallel,
model_batch = "svaseq", filter = TRUE,
methods = methods)
##
## cali_cure cali_failure tumaco_cure tumaco_failure
## 18 3 20 21
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tc_neutrophils_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
tc_neutrophils_table_sva <- combine_de_tables(
tc_neutrophils_de_sva, keepers = clinic_cf_contrasts,
excel = glue("{clinic_cf_prefix}/Neutrophils/tc_neutrophils_table_sva-v{ver}.xlsx"))
tc_neutrophils_table_sva
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 cali_failure_vs_cali_cure 88 183 102
## 2 tumaco_failure_vs_tumaco_cure 92 42 80
## 3 tumaco_cure_vs_cali_cure 853 379 831
## 4 tumaco_failure_vs_cali_failure 704 201 611
## edger_sigdown limma_sigup limma_sigdown
## 1 121 76 49
## 2 24 44 51
## 3 384 650 486
## 4 220 312 332
## Plot describing unique/shared genes in a differential expression table.
tc_neutrophils_sig_sva <- extract_significant_genes(
tc_neutrophils_table_sva,
excel = glue("{clinic_cf_prefix}/Neutrophils/tc_neutrophils_sig_sva-v{ver}.xlsx"))
tc_neutrophils_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## cali 76 49 102 121 88 183 90
## tumaco 44 51 80 24 92 42 7
## cure 650 486 831 384 853 379 683
## fail 312 332 611 220 704 201 113
## ebseq_down basic_up basic_down
## cali 39 0 0
## tumaco 7 8 0
## cure 299 4589 0
## fail 310 1652 0
Given the above comparisons, we can extract the gene sets that resulted from those DE analyses and eventually perform ontology/KEGG/Reactome/etc. searches. This reminds me: I want to make my extract_significant_* functions return gene-set data structures, and my various ontology searches accept them as inputs. This should help avoid errors when extracting the up/down genes.
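As a minimal sketch of that gene-set idea, assuming a plain named list with a class tag is an adequate container: the helper make_sig_gene_set() and its fields are hypothetical and not part of hpgltools. One object carries the up, down, and combined IDs for a contrast, so an ontology search could take it directly instead of each caller re-extracting and concatenating the vectors by hand.

```r
## Hypothetical helper (not an hpgltools function): bundle the up/down gene
## IDs for one contrast so the up, down, and combined vectors stay in sync.
make_sig_gene_set <- function(up, down, contrast = NA_character_) {
  up <- unique(as.character(up))
  down <- unique(as.character(down))
  ## A gene cannot be significantly both up and down in one contrast, so
  ## treat any overlap as an extraction mistake.
  shared <- intersect(up, down)
  if (length(shared) > 0) {
    stop("Genes present in both the up and down lists: ",
         paste(shared, collapse = ", "))
  }
  structure(list(contrast = contrast, up = up, down = down,
                 all = c(up, down)),
            class = "sig_gene_set")
}

## Toy IDs standing in for rownames() of the deseq ups/downs tables.
example_set <- make_sig_gene_set(
  up = c("gene_a", "gene_b"), down = "gene_c", contrast = "cure")
example_set[["all"]]
```

Downstream code would then read example_set[["up"]] or example_set[["all"]] rather than rebuilding the combined vector separately for each cell type.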
clinic_sigenes_up <- rownames(tc_all_clinic_sig_sva[["deseq"]][["ups"]][["clinics"]])
clinic_sigenes_down <- rownames(tc_all_clinic_sig_sva[["deseq"]][["downs"]][["clinics"]])
clinic_sigenes <- c(clinic_sigenes_up, clinic_sigenes_down)
tc_eosinophils_sigenes_up <- rownames(tc_eosinophils_clinic_sig_sva[["deseq"]][["ups"]][["cure"]])
tc_eosinophils_sigenes_down <- rownames(tc_eosinophils_clinic_sig_sva[["deseq"]][["downs"]][["cure"]])
tc_monocytes_sigenes_up <- rownames(tc_monocytes_sig_sva[["deseq"]][["ups"]][["cure"]])
tc_monocytes_sigenes_down <- rownames(tc_monocytes_sig_sva[["deseq"]][["downs"]][["cure"]])
tc_neutrophils_sigenes_up <- rownames(tc_neutrophils_sig_sva[["deseq"]][["ups"]][["cure"]])
tc_neutrophils_sigenes_down <- rownames(tc_neutrophils_sig_sva[["deseq"]][["downs"]][["cure"]])
tc_eosinophils_sigenes <- c(tc_eosinophils_sigenes_up,
tc_eosinophils_sigenes_down)
tc_monocytes_sigenes <- c(tc_monocytes_sigenes_up,
tc_monocytes_sigenes_down)
tc_neutrophils_sigenes <- c(tc_neutrophils_sigenes_up,
tc_neutrophils_sigenes_down)
I was curious to understand why the two clinics appear to be so different vis-à-vis their PCA/DE, so I thought that gProfiler might help boil those results down to something more digestible.
Note that in the following block I used the function simple_gprofiler(), but later in this document I will use all_gprofiler(). The first invocation limits the search to a single table, while the second will iterate over every result in a pairwise differential expression analysis.
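The difference between the two invocation styles is essentially "one list" versus "map over every named result". Below is a minimal base-R sketch of that pattern; search_one() and search_all() are hypothetical stand-ins (search_one() only counts the IDs it would submit) and are not the actual simple_gprofiler() or all_gprofiler() internals.

```r
## Hypothetical stand-in for a one-table search such as simple_gprofiler();
## it only records how many IDs it would submit.
search_one <- function(gene_ids) {
  data.frame(n_query = length(gene_ids))
}

## Hypothetical analogue of the all_gprofiler() pattern: apply the one-table
## search to every named contrast, keeping the contrast names as list names.
search_all <- function(gene_lists, search_fun = search_one) {
  lapply(gene_lists, search_fun)
}

gene_lists <- list(cali = c("g1", "g2"), tumaco = "g3")
results <- search_all(gene_lists)
results[["cali"]][["n_query"]]
```

Keeping the contrast names on the result list is what lets each search be written to its own sheet later.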
In this instance, we are looking at the vector of gene IDs deemed significantly different between the two clinics in either the up or down direction.
One other thing worth noting: the new version of gProfiler provides some fun interactive plots, and I will add an example here.
tc_eosinophil_gprofiler <- simple_gprofiler(
tc_eosinophils_sigenes_up,
excel = glue("{gsea_prefix}/eosinophil_clinics_tumaco_up-v{ver}.xlsx"))
tc_eosinophil_gprofiler
## A set of ontologies produced by gprofiler using 785
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 21 MF
## 213 BP
## 16 CC
## 0 KEGG
## 3 REAC
## 0 WP
## 549 TF
## 11 MIRNA
## 0 HPA
## 4 CORUM
## 0 HP hits.
clinic_gp <- simple_gprofiler(
clinic_sigenes,
excel = glue("{gsea_prefix}/both_clinics_cali_up-v{ver}.xlsx"))
clinic_gp$pvalue_plots$REAC
clinic_gp$pvalue_plots$BP
## NULL
clinic_gp$pvalue_plots$TF
clinic_gp$interactive_plots$GO
## NULL
In the following block, I look at the groups gProfiler found over-represented across clinics in only the eosinophils: first for all significant genes (up or down), then for the up and down groups separately. Each of the following includes only the Reactome and GO:BP plots; these searches did not have many other hits, except in the transcription factor database.
tc_eosinophils_gp <- simple_gprofiler(
tc_eosinophils_sigenes,
excel = glue("{gsea_prefix}/eosinophil_clinics-v{ver}.xlsx"))
tc_eosinophils_gp
## A set of ontologies produced by gprofiler using 1646
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 39 MF
## 281 BP
## 50 CC
## 0 KEGG
## 0 REAC
## 0 WP
## 582 TF
## 13 MIRNA
## 0 HPA
## 5 CORUM
## 0 HP hits.
tc_eosinophils_gp$pvalue_plots$REAC
## NULL
tc_eosinophils_gp$pvalue_plots$BP
## NULL
tc_eosinophils_up_gp <- simple_gprofiler(
tc_eosinophils_sigenes_up,
excel = glue("{gsea_prefix}/eosinophil_clinics_tumaco_up-v{ver}.xlsx"))
tc_eosinophils_up_gp
## A set of ontologies produced by gprofiler using 785
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 21 MF
## 213 BP
## 16 CC
## 0 KEGG
## 3 REAC
## 0 WP
## 549 TF
## 11 MIRNA
## 0 HPA
## 4 CORUM
## 0 HP hits.
tc_eosinophils_up_gp$pvalue_plots$REAC
tc_eosinophils_down_gp <- simple_gprofiler(
tc_eosinophils_sigenes_down,
excel = glue("{gsea_prefix}/eosinophil_clinics_cali_up-v{ver}.xlsx"))
tc_eosinophils_down_gp
## A set of ontologies produced by gprofiler using 861
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 12 MF
## 126 BP
## 70 CC
## 3 KEGG
## 10 REAC
## 3 WP
## 70 TF
## 1 MIRNA
## 0 HPA
## 0 CORUM
## 0 HP hits.
tc_eosinophils_down_gp$pvalue_plots$REAC
In the following block, I repeat the above query, this time looking at the monocyte samples.
tc_monocytes_up_gp <- simple_gprofiler(
tc_monocytes_sigenes,
excel = glue("{gsea_prefix}/monocyte_clinics-v{ver}.xlsx"))
tc_monocytes_up_gp
## A set of ontologies produced by gprofiler using 1491
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 57 MF
## 480 BP
## 29 CC
## 0 KEGG
## 5 REAC
## 6 WP
## 545 TF
## 4 MIRNA
## 0 HPA
## 1 CORUM
## 0 HP hits.
tc_monocytes_up_gp$pvalue_plots$REAC
tc_monocytes_up_gp$pvalue_plots$BP
## NULL
tc_monocytes_down_gp <- simple_gprofiler(
tc_monocytes_sigenes_down,
excel = glue("{gsea_prefix}/monocyte_clinics_cali_up-v{ver}.xlsx"))
tc_monocytes_down_gp$pvalue_plots$REAC
tc_monocytes_down_gp$pvalue_plots$BP
## NULL
Ibid., this time looking at the neutrophils. The first two images (all significant genes) should therefore be a superset of the second and third pairs of images (up-only and down-only genes), assuming that pooling the up and down lists does not push any group below the significance threshold. Interestingly, the Reactome search did not return any hits for the increased (up) gene list.
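That superset expectation can fail because each over-representation test scores a term relative to the size of its own query list. Here is a worked sketch using a one-sided hypergeometric test via base R's phyper(), which stands in for gProfiler's own test and multiple-testing correction; all counts are invented toy numbers. A term that is strongly enriched in a short up list can be diluted to chance level once a long down list containing few members of that term is pooled with it.

```r
## Toy counts (invented): 20000 annotated genes, 200 of them in one term.
universe <- 20000
term_size <- 200

## One-sided over-representation p-value, P(X >= hits), under the
## hypergeometric null for a query list of size query_size.
ora_p <- function(hits, query_size) {
  phyper(hits - 1, term_size, universe - term_size, query_size,
         lower.tail = FALSE)
}

## Up list: 50 genes, 6 in the term, where about 0.5 are expected by chance.
p_up <- ora_p(hits = 6, query_size = 50)
## Pooled list: those 50 up genes plus 950 down genes, of which only 4 are in
## the term, giving 10 hits where about 10 are expected by chance.
p_all <- ora_p(hits = 10, query_size = 1000)
signif(c(up = p_up, combined = p_all), 2)
```

In this toy case the term is highly significant for the up list but not at all for the pooled list, which is exactly the caveat attached to the superset expectation.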
tc_neutrophils_gp <- simple_gprofiler(
tc_neutrophils_sigenes,
excel = glue("{gsea_prefix}/neutrophil_clinics-v{ver}.xlsx"))
## tc_neutrophils_gp$pvalue_plots$REAC ## no hits
tc_neutrophils_gp$pvalue_plots$BP
## NULL
tc_neutrophils_gp$pvalue_plots$TF
tc_neutrophils_up_gp <- simple_gprofiler(
tc_neutrophils_sigenes_up,
excel = glue("{gsea_prefix}/neutrophil_clinics_tumaco_up-v{ver}.xlsx"))
## tc_neutrophils_up_gp$pvalue_plots$REAC ## No hits
tc_neutrophils_up_gp$pvalue_plots$BP
## NULL
tc_neutrophils_down_gp <- simple_gprofiler(
tc_neutrophils_sigenes_down,
excel = glue("{gsea_prefix}/neutrophil_clinics_cali_up-v{ver}.xlsx"))
tc_neutrophils_down_gp$pvalue_plots$REAC
tc_neutrophils_down_gp$pvalue_plots$BP
## NULL
The following expands the cross-clinic query above to also test the neutrophils. Once again, I think it will pretty strongly support the hypothesis that the two clinics are not compatible.
We are concerned that the clinic-based batch effect may make our results essentially useless. One way to test this concern is to compare the genes observed to differ between Cali cure/fail and Tumaco cure/fail.
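The comparison below relies on merge() with by = "row.names". A small sketch on toy tables (invented values, not study data) shows what that produces: an inner join on gene IDs, the IDs moved into a "Row.names" column, and the shared deseq_logfc column split into .x (first table) and .y (second table) copies. cor.test() defaults to Pearson; the Spearman call here is only an optional, rank-based alternative I am suggesting, not what was run.

```r
## Two toy DE tables keyed by gene ID in their rownames; only some genes are
## shared, as can happen when each clinic is filtered separately.
cali <- data.frame(deseq_logfc = c(1.2, -0.4, 0.8, 2.0),
                   row.names = c("g1", "g2", "g3", "g4"))
tumaco <- data.frame(deseq_logfc = c(1.0, -0.1, 1.5),
                     row.names = c("g1", "g3", "g4"))

## merge() on row names keeps only the shared genes, stores their IDs in a
## "Row.names" column, and suffixes the clashing columns with .x and .y.
both <- merge(cali, tumaco, by = "row.names")
both

## A rank-based correlation is less sensitive to a few extreme logFC values.
cor(both[["deseq_logfc.x"]], both[["deseq_logfc.y"]], method = "spearman")
```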
cali_table_nobatch <- tc_neutrophils_table_nobatch[["data"]][["cali"]]
tumaco_table_nobatch <- tc_neutrophils_table_nobatch[["data"]][["tumaco"]]
cali_merged_nobatch <- merge(cali_table_nobatch, tumaco_table_nobatch, by="row.names")
cor.test(cali_merged_nobatch[, "deseq_logfc.x"], cali_merged_nobatch[, "deseq_logfc.y"])
##
## Pearson's product-moment correlation
##
## data: cali_merged_nobatch[, "deseq_logfc.x"] and cali_merged_nobatch[, "deseq_logfc.y"]
## t = -16, df = 9229, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1821 -0.1423
## sample estimates:
## cor
## -0.1623
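A single correlation over all shared genes can hide whether the top of the two rankings agrees. The calculate_aucc() call that follows summarizes that with an area under a concordance curve. Here is a minimal sketch of one common AUCC formulation, assumed here and not necessarily what calculate_aucc() implements (it also takes logFC columns, so it may rank genes differently): rank genes by p-value in each table, count the overlap of the top k genes for k = 1..K, sum those overlaps, and divide by the maximum possible sum, K(K + 1)/2. The function name aucc_sketch() and the toy p-values are hypothetical.

```r
## Hypothetical AUCC sketch: p_x and p_y are named vectors of p-values.
aucc_sketch <- function(p_x, p_y, K = 100) {
  shared <- intersect(names(p_x), names(p_y))
  K <- min(K, length(shared))
  ## Gene IDs ordered from most to least significant in each table.
  rank_x <- names(sort(p_x[shared]))
  rank_y <- names(sort(p_y[shared]))
  ## Overlap of the two top-k lists for every k from 1 to K.
  overlaps <- vapply(seq_len(K), function(k) {
    length(intersect(rank_x[seq_len(k)], rank_y[seq_len(k)]))
  }, integer(1))
  sum(overlaps) / (K * (K + 1) / 2)
}

set.seed(1)
genes <- paste0("g", 1:500)
p_x <- setNames(runif(500), genes)
p_y <- setNames(runif(500), genes)
aucc_sketch(p_x, p_x)  # identical rankings give exactly 1
aucc_sketch(p_x, p_y)  # unrelated rankings give a much smaller value
```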
cali_aucc_nobatch <- calculate_aucc(cali_table_nobatch, tumaco_table_nobatch, px = "deseq_adjp",
py = "deseq_adjp", lx = "deseq_logfc", ly = "deseq_logfc")
cali_aucc_nobatch$plot
In all of the above, we were trying to understand the differences between the two locations. Let us now step back and ask the original question: fail/cure without regard to location.
I performed this query with a few different parameters, notably with(out) sva, and again using each cell type, including biopsies. The main reason I am keeping these comparisons is the relatively weak hope that the full dataset carries sufficient signal to overcome the apparently ridiculous batch effect from the two clinics.
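The runs below alternate between model_batch = "svaseq" and model_batch = TRUE. As a sketch of the difference, assuming model_batch = TRUE amounts to adding the known batch factor to the design (I have not confirmed all_pairwise()'s exact internals), here is the design matrix that a known-batch model implies on a toy sample sheet. With svaseq, the batch columns would instead be surrogate variables estimated from the counts; that call is shown only as a comment because it needs the sva package and real data.

```r
## Toy sample sheet (invented): outcome plus visit used as the known batch.
samples <- data.frame(
  condition = factor(c("cure", "cure", "failure", "cure", "failure", "failure")),
  batch = factor(c("v1", "v2", "v1", "v3", "v2", "v3")))

## Known-batch design: intercept (cure at v1), offsets for v2 and v3, and
## the failure-vs-cure coefficient that the DE methods test.
design_batch <- model.matrix(~ batch + condition, data = samples)
colnames(design_batch)

## svaseq alternative (not run): estimate surrogate variables instead of
## using the known batch, then add them to the design in its place.
## sv <- sva::svaseq(counts, mod = model.matrix(~ condition, samples),
##                   mod0 = model.matrix(~ 1, samples))$sv
```

Note that every biopsy sample is v1 (see the biopsy batch table further down), so a known-batch term carries no information for that subset.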
table(pData(tc_valid)[["condition"]])
##
## cure failure
## 122 62
tc_all_cf_de_sva <- all_pairwise(tc_valid, filter = TRUE, methods = methods,
model_batch = "svaseq")
##
## cure failure
## 122 62
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tc_all_cf_table_sva <- combine_de_tables(
tc_all_cf_de_sva, keepers = t_cf_contrast,
excel = glue("{cf_prefix}/All_Samples/tc_valid_cf_table_sva-v{ver}.xlsx"))
tc_all_cf_sig_sva <- extract_significant_genes(
tc_all_cf_table_sva,
excel = glue("{cf_prefix}/All_Samples/tc_valid_cf_sig_sva-v{ver}.xlsx"))
tc_all_cf_de_batch <- all_pairwise(tc_valid, filter = TRUE, methods = methods,
model_batch = TRUE)
##
## cure failure
## 122 62
##
## v3 v2 v1
## 51 50 83
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tc_all_cf_table_batch <- combine_de_tables(
tc_all_cf_de_batch, keepers = t_cf_contrast,
excel = glue("{cf_prefix}/All_Samples/tc_valid_cf_table_batch-v{ver}.xlsx"))
tc_all_cf_sig_batch <- extract_significant_genes(
tc_all_cf_table_batch,
excel = glue("{cf_prefix}/All_Samples/tc_valid_cf_sig_batch-v{ver}.xlsx"))
I am not sure if this is the best choice, but I call the set of all samples excluding biopsies ‘clinical’.
table(pData(tc_clinical_nobiop)[["condition"]])
##
## cure failure
## 109 57
tc_clinical_cf_de_sva <- all_pairwise(tc_clinical_nobiop, filter = TRUE,
model_batch = "svaseq",
methods = methods)
##
## cure failure
## 109 57
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tc_clinical_cf_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## falr_vs_cr
## basic_vs_deseq 0.8865
## basic_vs_dream 0.9294
## basic_vs_ebseq 0.7595
## basic_vs_edger 0.9007
## basic_vs_limma 0.9415
## basic_vs_noiseq 0.9213
## deseq_vs_dream 0.8999
## deseq_vs_ebseq 0.8271
## deseq_vs_edger 0.9919
## deseq_vs_limma 0.8936
## deseq_vs_noiseq 0.9409
## dream_vs_ebseq 0.8102
## dream_vs_edger 0.9024
## dream_vs_limma 0.9855
## dream_vs_noiseq 0.8709
## ebseq_vs_edger 0.8188
## ebseq_vs_limma 0.8184
## ebseq_vs_noiseq 0.8312
## edger_vs_limma 0.8971
## edger_vs_noiseq 0.9517
## limma_vs_noiseq 0.8733
tc_clinical_cf_table_sva <- combine_de_tables(
tc_clinical_cf_de_sva, keepers = t_cf_contrast,
excel = glue("{cf_prefix}/Clinical_Samples/tc_clinical_cf_table_sva-v{ver}.xlsx"))
tc_clinical_cf_table_sva
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure 181 93 209 92
## limma_sigup limma_sigdown
## 1 96 78
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
tc_clinical_cf_sig_sva <- extract_significant_genes(
tc_clinical_cf_table_sva, according_to = "deseq",
excel = glue("{cf_prefix}/Clinical_Samples/tc_clinical_cf_sig_sva-v{ver}.xlsx"))
tc_clinical_cf_sig_sva
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## outcome 181 93
tc_clinical_cf_de_batch <- all_pairwise(tc_clinical_nobiop, filter = TRUE,
model_batch = TRUE,
methods = methods)
##
## cure failure
## 109 57
##
## eosinophils monocytes neutrophils
## 41 63 62
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tc_clinical_cf_de_batch
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: batch in model/limma.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## falr_vs_cr
## basic_vs_deseq 0.6444
## basic_vs_dream 0.7017
## basic_vs_ebseq 0.7595
## basic_vs_edger 0.6445
## basic_vs_limma 0.7122
## basic_vs_noiseq 0.9213
## deseq_vs_dream 0.8060
## deseq_vs_ebseq 0.7628
## deseq_vs_edger 0.9991
## deseq_vs_limma 0.8096
## deseq_vs_noiseq 0.7631
## dream_vs_ebseq 0.6730
## dream_vs_edger 0.8114
## dream_vs_limma 0.9716
## dream_vs_noiseq 0.6695
## ebseq_vs_edger 0.7626
## ebseq_vs_limma 0.6897
## ebseq_vs_noiseq 0.8312
## edger_vs_limma 0.8140
## edger_vs_noiseq 0.7631
## limma_vs_noiseq 0.6685
tc_clinical_cf_table_batch <- combine_de_tables(
tc_clinical_cf_de_batch, keepers = t_cf_contrast, label_column = "hgnc_symbol",
excel = glue("{cf_prefix}/Clinical_Samples/tc_clinical_cf_table_batch-v{ver}.xlsx"))
tc_clinical_cf_table_batch
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure 104 68 114 73
## limma_sigup limma_sigdown
## 1 81 45
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
tc_clinical_cf_sig_batch <- extract_significant_genes(
tc_clinical_cf_table_batch, according_to = "deseq",
excel = glue("{cf_prefix}/Clinical_Samples/tc_clinical_cf_sig_batch-v{ver}.xlsx"))
tc_clinical_cf_sig_batch
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## outcome 104 68
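The significance summaries above report an LFC cutoff of 1 and an adjusted p cutoff of 0.05, and the volcano plot below colors genes by the same kind of split. A minimal sketch of that classification on toy values follows; classify_de() is hypothetical, and whether hpgltools treats the cutoffs as inclusive (>=) or strict (>) is an assumption I have not checked.

```r
## Hypothetical classifier using the cutoffs reported above; assumes the
## thresholds are inclusive.
classify_de <- function(logfc, adjp, lfc_cutoff = 1, p_cutoff = 0.05) {
  status <- rep("not significant", length(logfc))
  status[adjp <= p_cutoff & logfc >= lfc_cutoff] <- "up"
  status[adjp <= p_cutoff & logfc <= -lfc_cutoff] <- "down"
  status
}

## Toy rows: significant up, significant down, large p but tiny change,
## and large change but non-significant p.
toy <- data.frame(deseq_logfc = c(2.1, -1.5, 0.3, 1.8),
                  deseq_adjp = c(0.001, 0.01, 0.0001, 0.2))
toy[["status"]] <- classify_de(toy[["deseq_logfc"]], toy[["deseq_adjp"]])
table(toy[["status"]])
```

A gene needs both conditions at once, which is why a tiny fold change with a very small p-value (third row) still lands in the grey middle of a volcano plot.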
## The table is failure_vs_cure, so failure is the numerator (positive logFC).
num_color <- color_choices[["cf"]][["failure"]]
den_color <- color_choices[["cf"]][["cure"]]
tc_clinical_cf_table <- tc_clinical_cf_table_sva[["data"]][["outcome"]]
tc_clinical_cf_volcano_top10 <- plot_volcano_condition_de(
tc_clinical_cf_table, "outcome", label = 10,
fc_col = "deseq_logfc", p_col = "deseq_adjp", line_position = NULL,
color_high = num_color, color_low = den_color, label_size = 6)
pp(file = "figures/s11c_tc_clinical_cf_volcano_labeled_top10.svg")
tc_clinical_cf_volcano_top10[["plot"]]
dev.off()
## png
## 2
tc_clinical_cf_volcano_top10[["plot"]]
In the following block, we repeat the same question, but using only the biopsy samples from both clinics.
tc_biopsies_cf <- set_expt_conditions(tc_biopsies, fact = "finaloutcome")
## The numbers of samples by condition are:
##
## cure failure
## 13 5
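The result tables in this section are named failure_vs_cure, which follows the numerator/denominator convention used for the contrast lists. The definition of t_cf_contrast is not shown in this section, so the keepers list below is a hypothetical reconstruction that only illustrates how a two-element contrast maps to a table name and to the sign of a fold change. The fold change here is a plain ratio of toy group means, not a model-based estimate like DESeq2's.

```r
## Hypothetical contrast spec: element name = contrast/sheet name,
## first value = numerator, second value = denominator.
keepers <- list(outcome = c("failure", "cure"))
table_name <- paste(keepers[["outcome"]][1], "vs", keepers[["outcome"]][2],
                    sep = "_")
table_name

## Toy group means for one gene; a positive log2 ratio means the gene is
## higher in the numerator group (failure).
mean_failure <- 80
mean_cure <- 20
lfc <- log2(mean_failure / mean_cure)
lfc
```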
tc_biopsies_cf_de_sva <- all_pairwise(tc_biopsies_cf, filter = TRUE, methods = methods,
model_batch = "svaseq")
##
## cure failure
## 13 5
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tc_biopsies_cf_table_sva <- combine_de_tables(
tc_biopsies_cf_de_sva, keepers = t_cf_contrast,
excel = glue("{cf_prefix}/Biopsies/tc_biopsies_cf_table_sva-v{ver}.xlsx"))
tc_biopsies_cf_sig_sva <- extract_significant_genes(
tc_biopsies_cf_table_sva,
excel = glue("{cf_prefix}/All_Samples/tc_biopsies_cf_sig_sva-v{ver}.xlsx"))
tc_biopsies_cf_de_batch <- all_pairwise(tc_biopsies_cf, filter = TRUE, methods = methods,
model_batch = TRUE)
##
## cure failure
## 13 5
##
## v1
## 18
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tc_biopsies_cf_table_batch <- combine_de_tables(
tc_biopsies_cf_de_batch, keepers = t_cf_contrast,
excel = glue("{cf_prefix}/All_Samples/tc_biopsies_cf_table_batch-v{ver}.xlsx"))
tc_biopsies_cf_sig_batch <- extract_significant_genes(
tc_biopsies_cf_table_batch,
excel = glue("{cf_prefix}/All_Samples/tc_biopsies_cf_sig_batch-v{ver}.xlsx"))
In the following block, we repeat the same question, but using only the Eosinophil samples from both clinics.
tc_eosinophils_cf <- set_expt_conditions(tc_eosinophils, fact = "finaloutcome")
## The numbers of samples by condition are:
##
## cure failure
## 32 9
tc_eosinophils_cf_de_sva <- all_pairwise(tc_eosinophils_cf, filter = TRUE, methods = methods,
model_batch = "svaseq")
##
## cure failure
## 32 9
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tc_eosinophils_cf_table_sva <- combine_de_tables(
tc_eosinophils_cf_de_sva, keepers = t_cf_contrast,
excel = glue("{cf_prefix}/Eosinophils/tc_eosinophils_cf_table_sva-v{ver}.xlsx"))
tc_eosinophils_cf_sig_sva <- extract_significant_genes(
tc_eosinophils_cf_table_sva,
excel = glue("{cf_prefix}/All_Samples/tc_eosinophils_cf_sig_sva-v{ver}.xlsx"))
tc_eosinophils_cf_de_batch <- all_pairwise(tc_eosinophils_cf, filter = TRUE,
model_batch = TRUE,
methods = methods)
##
## cure failure
## 32 9
##
## v3 v2 v1
## 13 14 14
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tc_eosinophils_cf_table_batch <- combine_de_tables(
tc_eosinophils_cf_de_batch, keepers = t_cf_contrast,
excel = glue("{cf_prefix}/All_Samples/tc_eosinophils_cf_table_batch-v{ver}.xlsx"))
tc_eosinophils_cf_sig_batch <- extract_significant_genes(
tc_eosinophils_cf_table_batch,
excel = glue("{cf_prefix}/All_Samples/tc_eosinophils_cf_sig_batch-v{ver}.xlsx"))
Repeat yet again, this time with the monocyte samples. The idea is to see whether there is a cell type which is particularly good (or bad) at discriminating cure from failure.
tc_monocytes_cf <- set_expt_conditions(tc_monocytes, fact = "finaloutcome")
## The numbers of samples by condition are:
##
## cure failure
## 39 24
tc_monocytes_cf_de_sva <- all_pairwise(tc_monocytes_cf, filter = TRUE, methods = methods,
model_batch = "svaseq")
##
## cure failure
## 39 24
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tc_monocytes_cf_table_sva <- combine_de_tables(
tc_monocytes_cf_de_sva, keepers = t_cf_contrast,
excel = glue("{cf_prefix}/Monocytes/tc_monocytes_cf_table_sva-v{ver}.xlsx"))
tc_monocytes_cf_sig_sva <- extract_significant_genes(
tc_monocytes_cf_table_sva,
excel = glue("{cf_prefix}/All_Samples/tc_monocytes_cf_sig_sva-v{ver}.xlsx"))
tc_monocytes_cf_de_batch <- all_pairwise(tc_monocytes_cf, filter = TRUE, methods = methods,
model_batch = TRUE)
##
## cure failure
## 39 24
##
## v3 v2 v1
## 19 18 26
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tc_monocytes_cf_table_batch <- combine_de_tables(
tc_monocytes_cf_de_batch, keepers = t_cf_contrast,
excel = glue("{cf_prefix}/All_Samples/tc_monocytes_cf_table_batch-v{ver}.xlsx"))
tc_monocytes_cf_sig_batch <- extract_significant_genes(
tc_monocytes_cf_table_batch,
excel = glue("{cf_prefix}/All_Samples/tc_monocytes_cf_sig_batch-v{ver}.xlsx"))
Last try, this time using the Neutrophil samples.
tc_neutrophils_cf <- set_expt_conditions(tc_neutrophils, fact = "finaloutcome")
## The numbers of samples by condition are:
##
## cure failure
## 38 24
tc_neutrophils_cf_de_sva <- all_pairwise(tc_neutrophils_cf, parallel = parallel,
filter = TRUE, model_batch = "svaseq",
methods = methods)
##
## cure failure
## 38 24
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tc_neutrophils_cf_table_sva <- combine_de_tables(
tc_neutrophils_cf_de_sva, keepers = t_cf_contrast,
excel = glue("{cf_prefix}/Neutrophils/tc_neutrophils_cf_table_sva-v{ver}.xlsx"))
tc_neutrophils_cf_sig_sva <- extract_significant_genes(
tc_neutrophils_cf_table_sva,
excel = glue("{cf_prefix}/All_Samples/tc_neutrophils_cf_sig_sva-v{ver}.xlsx"))
tc_neutrophils_cf_de_batch <- all_pairwise(tc_neutrophils_cf, filter = TRUE,
model_batch = TRUE,
methods = methods)
##
## cure failure
## 38 24
##
## v3 v2 v1
## 19 18 25
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tc_neutrophils_cf_table_batch <- combine_de_tables(
tc_neutrophils_cf_de_batch, keepers = t_cf_contrast,
excel = glue("{cf_prefix}/All_Samples/tc_neutrophils_cf_table_batch-v{ver}.xlsx"))
tc_neutrophils_cf_sig_batch <- extract_significant_genes(
tc_neutrophils_cf_table_batch,
excel = glue("{cf_prefix}/All_Samples/tc_neutrophils_cf_sig_batch-v{ver}.xlsx"))
Later in this document I perform a series of visit and cure/fail comparisons. In this block I explicitly compare visit 1 against all later visits. I did this often with the 2019 datasets but never carried it over into this document.
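The `first`/`later` labels in the sample counts printed by `all_pairwise()` suggest that visits v2 and v3 were pooled into a single `later` group. I have not confirmed how `tc_v1vs` was built. This base R sketch only illustrates that relabeling, on a hypothetical visit vector.

```r
## Hypothetical visit labels for eight samples.
visit <- factor(c("v1", "v2", "v3", "v1", "v3", "v2", "v1", "v2"))

## Collapse every visit other than v1 into a single "later" level.
v1_vs_later_condition <- factor(ifelse(visit == "v1", "first", "later"),
                                levels = c("first", "later"))
print(table(visit, v1_vs_later_condition))
print(table(v1_vs_later_condition))
```

Setting `levels` explicitly keeps the level order stable no matter which labels are chosen.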
v1_vs_later <- all_pairwise(tc_v1vs, model_batch = "svaseq", methods = methods,
filter = TRUE)
##
## first later
## 65 101
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
v1_vs_later_table <- combine_de_tables(
v1_vs_later, keepers = visit_v1later,
excel = glue("{visit_prefix}/v1_vs_later_tables-v{ver}.xlsx"))
v1_vs_later_sig <- extract_significant_genes(
v1_vs_later_table,
excel = glue("{visit_prefix}/v1_vs_later_sig-v{ver}.xlsx"))
v1later_gp <- all_gprofiler(v1_vs_later_sig)
v1later_gp[[1]]$pvalue_plots$REAC
v1later_gp[[2]]$pvalue_plots$REAC
tc_sex_de <- all_pairwise(tc_sex, model_batch = "svaseq", methods = methods,
filter = TRUE)
##
## female male
## 28 156
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tc_sex_table <- combine_de_tables(
tc_sex_de, excel = glue("{sex_prefix}/tc_sex_table-v{ver}.xlsx"))
tc_sex_sig <- extract_significant_genes(
tc_sex_table, excel = glue("{sex_prefix}/tc_sex_sig-v{ver}.xlsx"))
tc_sex_gp <- all_gprofiler(tc_sex_sig)
tc_sex_cure <- subset_expt(tc_sex, subset = "finaloutcome=='cure'")
## subset_expt(): There were 184, now there are 122 samples.
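`subset_expt()` takes the subset as a string that reads like an R expression over the sample metadata. The sketch below mimics that idea in base R, using a hypothetical metadata table and count matrix, and keeps the two in sync. It is an illustration only, not the hpgltools implementation.

```r
## Hypothetical sample metadata and a matching genes x samples count matrix.
meta <- data.frame(
  sample = paste0("s", 1:6),
  finaloutcome = c("cure", "failure", "cure", "cure", "failure", "cure"),
  sex = c("male", "female", "male", "female", "male", "male"))
counts <- matrix(1:24, nrow = 4,
                 dimnames = list(paste0("gene", 1:4), meta$sample))

## Evaluate the subset string against the metadata columns.
subset_string <- "finaloutcome=='cure'"
keep <- eval(parse(text = subset_string), envir = meta)

## Apply the same logical vector to the metadata rows and the count columns.
cure_meta <- meta[keep, ]
cure_counts <- counts[, keep, drop = FALSE]
message("There were ", ncol(counts), ", now there are ",
        ncol(cure_counts), " samples.")
print(table(cure_meta$sex))
```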
tc_sex_cure_de <- all_pairwise(tc_sex_cure, model_batch = "svaseq",
filter = TRUE,
methods = methods)
##
## female male
## 19 103
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tc_sex_cure_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## mal_vs_fml
## basic_vs_deseq 0.6829
## basic_vs_dream 0.9094
## basic_vs_ebseq 0.5372
## basic_vs_edger 0.7974
## basic_vs_limma 0.9066
## basic_vs_noiseq 0.8141
## deseq_vs_dream 0.6847
## deseq_vs_ebseq 0.6539
## deseq_vs_edger 0.8760
## deseq_vs_limma 0.6556
## deseq_vs_noiseq 0.7614
## dream_vs_ebseq 0.6272
## dream_vs_edger 0.8040
## dream_vs_limma 0.9643
## dream_vs_noiseq 0.7544
## ebseq_vs_edger 0.6871
## ebseq_vs_limma 0.5873
## ebseq_vs_noiseq 0.6317
## edger_vs_limma 0.7733
## edger_vs_noiseq 0.8654
## limma_vs_noiseq 0.7144
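The agreement table lists one value per pair of methods. The output does not state which statistic hpgltools computes. Assuming it is a correlation of logFC values across genes, the sketch below reproduces that layout for three methods with made-up values.

The table covers seven methods, because it includes dream even though the header line lists only six. That gives choose(7, 2) = 21 pairs, which matches its 21 rows.

```r
## Made-up logFC values for five genes from three methods.
logfc_by_method <- cbind(
  deseq = c(1.2, -0.8, 0.3, 2.1, -1.5),
  edger = c(1.1, -0.9, 0.2, 2.3, -1.4),
  limma = c(0.9, -0.5, 0.4, 1.8, -1.1))

## One value per unordered pair of methods, named like the table above.
method_pairs <- combn(colnames(logfc_by_method), 2)
agreement <- apply(method_pairs, 2, function(p) {
  cor(logfc_by_method[, p[1]], logfc_by_method[, p[2]])
})
names(agreement) <- paste(method_pairs[1, ], method_pairs[2, ], sep = "_vs_")
print(round(agreement, 4))

## Seven methods give choose(7, 2) pairs.
print(choose(7, 2))
```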
tc_sex_cure_table <- combine_de_tables(
tc_sex_cure_de, excel = glue("{sex_prefix}/tc_sex_cure_table-v{ver}.xlsx"))
tc_sex_cure_table
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 male_vs_female 68 74 62 80
## limma_sigup limma_sigdown
## 1 37 73
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
tc_sex_cure_sig <- extract_significant_genes(
tc_sex_cure_table, excel = glue("{sex_prefix}/tc_sex_cure_sig-v{ver}.xlsx"))
tc_sex_cure_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down
## male_vs_female 37 73 62 80 68 74
## ebseq_up ebseq_down basic_up basic_down
## male_vs_female 9 8 11 0
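The significance summary reports an LFC cutoff of 1 and an adjusted p-value cutoff of 0.05. Below is a minimal sketch of that rule on simulated data. It assumes Benjamini-Hochberg adjustment and inclusive cutoffs, and the output above confirms neither. `count_significant()` is a hypothetical helper.

```r
## Hypothetical helper: count up/down genes passing both cutoffs.
count_significant <- function(logfc, adj_p, lfc_cutoff = 1, p_cutoff = 0.05) {
  passes_p <- adj_p <= p_cutoff
  c(up = sum(logfc >= lfc_cutoff & passes_p),
    down = sum(logfc <= -lfc_cutoff & passes_p))
}

## Simulated results for 1000 genes: large effects get small raw p-values.
set.seed(1)
n_genes <- 1000
logfc <- rnorm(n_genes, sd = 1.5)
pvalue <- ifelse(abs(logfc) > 2, runif(n_genes, 0, 1e-4), runif(n_genes))

## Benjamini-Hochberg adjustment (an assumption), then apply the cutoffs.
adj_p <- p.adjust(pvalue, method = "BH")
print(count_significant(logfc, adj_p))
```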
tc_sex_cure_gp <- all_gprofiler(tc_sex_cure_sig)
tc_sex_cure_gp
## Running gProfiler on every set of significant genes found:
## BP CC CORUM HP HPA KEGG MIRNA MF REAC TF WP
## male_vs_female_up 3 2 0 1 0 1 0 1 0 4 1
## male_vs_female_down 3 0 0 0 0 0 0 0 1 0 0
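Each cell in the gProfiler table is presumably the number of enriched terms from one data source. gProfiler's g:GOSt scores a term with a cumulative hypergeometric test, and the sketch below computes that p-value for a single term with made-up counts. It omits gProfiler's multiple-testing correction across terms. It illustrates the test and is not a replacement for `all_gprofiler()`.

```r
## Made-up counts for one annotation term.
universe <- 20000  # annotated genes in the background
in_term <- 150     # background genes annotated to the term
query <- 140       # significant genes submitted as the query
overlap <- 6       # query genes that are annotated to the term

## Expected overlap by chance, and P(X >= overlap) under the hypergeometric null.
expected <- query * in_term / universe
p_enrich <- phyper(overlap - 1, in_term, universe - in_term, query,
                   lower.tail = FALSE)
print(c(expected = expected, p = p_enrich))
```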
tc_sex_cure_gp[[1]][["pvalue_plots"]][["BP"]]
## NULL
tc_sex_cure_gp[[2]][["pvalue_plots"]][["BP"]]
## NULL
tc_ethnicity_de <- all_pairwise(tc_etnia_expt, model_batch = "svaseq",
filter = TRUE,
methods = methods)
##
## afrocol indigena mestiza
## 91 46 47
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tc_ethnicity_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
tc_ethnicity_table <- combine_de_tables(
tc_ethnicity_de, keepers = ethnicity_contrasts,
excel = glue("{eth_prefix}/tc_ethnicity_table-v{ver}.xlsx"))
tc_ethnicity_table
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 mestiza_vs_indigena 47 22 54 23
## 2 mestiza_vs_afrocol 53 165 53 180
## 3 indigena_vs_afrocol 66 269 71 279
## limma_sigup limma_sigdown
## 1 23 14
## 2 41 90
## 3 75 143
## Plot describing unique/shared genes in a differential expression table.
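The contrast convention described at the top of this document is that the list element name becomes the contrast and sheet name, and the two values are the numerator and denominator. Under that convention, three ethnicity levels yield choose(3, 2) = 3 pairwise contrasts.

The list below is my guess at the shape of `ethnicity_contrasts`, inferred from the contrast names used in the plot and significance summaries. `example_contrasts` and the toy group means are hypothetical. A positive log2 fold change means higher expression in the numerator group.

```r
## Hypothetical reconstruction: contrast name -> c(numerator, denominator).
example_contrasts <- list(
  "mestizo_indigenous" = c("mestiza", "indigena"),
  "mestizo_afrocol" = c("mestiza", "afrocol"),
  "indigenous_afrocol" = c("indigena", "afrocol"))

## Toy mean expression of one gene in each group.
group_means <- c(mestiza = 40, indigena = 10, afrocol = 20)

## log2(numerator / denominator) for each named contrast.
log2fc <- sapply(example_contrasts, function(pair) {
  log2(group_means[[pair[1]]] / group_means[[pair[2]]])
})
print(log2fc)
```

With these toy means the three contrasts give 2, 1 and -1 respectively.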
tc_ethnicity_table[["plots"]][["mestizo_indigenous"]][["deseq_ma_plots"]]
tc_ethnicity_table[["plots"]][["mestizo_afrocol"]][["deseq_ma_plots"]]
tc_ethnicity_table[["plots"]][["indigenous_afrocol"]][["deseq_ma_plots"]]
tc_ethnicity_sig <- extract_significant_genes(
tc_ethnicity_table, excel = glue("{eth_prefix}/tc_ethnicity_sig-v{ver}.xlsx"))
ethnicity_cure <- subset_expt(tc_etnia_expt, subset = "finaloutcome=='cure'")
## subset_expt(): There were 184, now there are 122 samples.
ethnicity_cure_de <- all_pairwise(ethnicity_cure, model_batch = "svaseq",
filter = TRUE,
methods = methods)
##
## afrocol indigena mestiza
## 39 36 47
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
ethnicity_cure_table <- combine_de_tables(
ethnicity_cure_de, keepers = ethnicity_contrasts,
excel = glue("{eth_prefix}/ethnicity_cure_table-v{ver}.xlsx"))
ethnicity_cure_table
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 mestiza_vs_indigena 63 24 59 26
## 2 mestiza_vs_afrocol 66 167 76 165
## 3 indigena_vs_afrocol 86 350 94 340
## limma_sigup limma_sigdown
## 1 36 16
## 2 61 101
## 3 108 177
## Plot describing unique/shared genes in a differential expression table.
ethnicity_cure_table[["plots"]][["mestizo_indigenous"]][["deseq_ma_plots"]]
ethnicity_cure_table[["plots"]][["mestizo_afrocol"]][["deseq_ma_plots"]]
ethnicity_cure_table[["plots"]][["indigenous_afrocol"]][["deseq_ma_plots"]]
ethnicity_cure_sig <- extract_significant_genes(
ethnicity_cure_table, excel = glue("{eth_prefix}/ethnicity_cure_sig-v{ver}.xlsx"))
ethnicity_cure_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down
## mestizo_indigenous 36 16 59 26 63 24
## mestizo_afrocol 61 101 76 165 66 167
## indigenous_afrocol 108 177 94 340 86 350
## ebseq_up ebseq_down basic_up basic_down
## mestizo_indigenous 10 0 6 0
## mestizo_afrocol 5 16 44 0
## indigenous_afrocol 9 38 433 0
Performed once with both clinics and again with only Tumaco.
tc_ethnicity_gp <- all_gprofiler(tc_ethnicity_sig)
pander::pander(sessionInfo())
R version 4.4.1 (2024-06-14)
Platform: x86_64-conda-linux-gnu
locale: C
attached base packages: stats4, stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: edgeR(v.4.0.16), ruv(v.0.9.7.1), DOSE(v.3.28.2), forcats(v.1.0.0), dplyr(v.1.1.4), hpgltools(v.1.0), Matrix(v.1.6-5), glue(v.1.7.0), SummarizedExperiment(v.1.32.0), GenomicRanges(v.1.54.1), GenomeInfoDb(v.1.38.6), IRanges(v.2.36.0), S4Vectors(v.0.40.2), MatrixGenerics(v.1.14.0), matrixStats(v.1.2.0), Biobase(v.2.62.0) and BiocGenerics(v.0.48.1)
loaded via a namespace (and not attached): fs(v.1.6.3), bitops(v.1.0-7), enrichplot(v.1.22.0), blockmodeling(v.1.1.5), HDO.db(v.0.99.1), httr(v.1.4.7), RColorBrewer(v.1.1-3), numDeriv(v.2016.8-1.1), tools(v.4.4.1), backports(v.1.4.1), utf8(v.1.2.4), R6(v.2.5.1), lazyeval(v.0.2.2), mgcv(v.1.9-1), withr(v.3.0.0), gridExtra(v.2.3), preprocessCore(v.1.64.0), fdrtool(v.1.2.18), cli(v.3.6.2), scatterpie(v.0.2.1), labeling(v.0.4.3), slam(v.0.1-53), EBSeq(v.2.0.0), sass(v.0.4.8), mvtnorm(v.1.2-4), robustbase(v.0.99-4), genefilter(v.1.84.0), yulab.utils(v.0.1.7), ggupset(v.0.4.0), gson(v.0.1.0), R.utils(v.2.12.3), limma(v.3.58.1), RSQLite(v.2.3.5), gridGraphics(v.0.5-1), generics(v.0.1.3), gtools(v.3.9.5), crosstalk(v.1.2.1), zip(v.2.3.1), GO.db(v.3.18.0), fansi(v.1.0.6), abind(v.1.4-5), R.methodsS3(v.1.8.2), lifecycle(v.1.0.4), yaml(v.2.3.8), gplots(v.3.1.3.1), qvalue(v.2.34.0), SparseArray(v.1.2.4), grid(v.4.4.1), blob(v.1.2.4), promises(v.1.2.1), crayon(v.1.5.2), lattice(v.0.22-5), cowplot(v.1.1.3), annotate(v.1.80.0), KEGGREST(v.1.42.0), pillar(v.1.9.0), knitr(v.1.45), varhandle(v.2.0.6), fgsea(v.1.28.0), boot(v.1.3-29), corpcor(v.1.6.10), codetools(v.0.2-19), fastmatch(v.1.1-4), ggfun(v.0.1.8), data.table(v.1.15.0), Vennerable(v.3.1.0.9000), treeio(v.1.29.1), vctrs(v.0.6.5), png(v.0.1-8), Rdpack(v.2.6), testthat(v.3.2.1), gtable(v.0.3.4), cachem(v.1.0.8), xfun(v.0.42), openxlsx(v.4.2.5.2), rbibutils(v.2.2.16), S4Arrays(v.1.2.0), mime(v.0.12), RcppEigen(v.0.3.3.9.4), tidygraph(v.1.3.1), survival(v.3.5-8), iterators(v.1.0.14), NOISeq(v.2.46.0), statmod(v.1.5.0), ellipsis(v.0.3.2), nlme(v.3.1-164), pbkrtest(v.0.5.2), ggtree(v.3.15.0), bit64(v.4.0.5), EnvStats(v.2.8.1), UpSetR(v.1.4.0), rprojroot(v.2.0.4), bslib(v.0.6.1), KernSmooth(v.2.23-22), colorspace(v.2.1-0), DBI(v.1.2.2), DESeq2(v.1.42.0), tidyselect(v.1.2.0), bit(v.4.0.5), compiler(v.4.4.1), graph(v.1.80.0), desc(v.1.4.3), DelayedArray(v.0.28.0), plotly(v.4.10.4), shadowtext(v.0.1.3), scales(v.1.3.0), 
caTools(v.1.18.2), DEoptimR(v.1.1-3), remaCor(v.0.0.18), RBGL(v.1.78.0), stringr(v.1.5.1), digest(v.0.6.34), minqa(v.1.2.6), variancePartition(v.1.32.5), rmarkdown(v.2.25), aod(v.1.3.3), XVector(v.0.42.0), RhpcBLASctl(v.0.23-42), htmltools(v.0.5.7), pkgconfig(v.2.0.3), lme4(v.1.1-35.1), lpsymphony(v.1.30.0), highr(v.0.10), fastmap(v.1.1.1), rlang(v.1.1.3), htmlwidgets(v.1.6.4), shiny(v.1.8.0), farver(v.2.1.1), jquerylib(v.0.1.4), IHW(v.1.30.0), jsonlite(v.1.8.8), BiocParallel(v.1.36.0), GOSemSim(v.2.28.1), R.oo(v.1.26.0), RCurl(v.1.98-1.14), magrittr(v.2.0.3), GenomeInfoDbData(v.1.2.11), ggplotify(v.0.1.2), patchwork(v.1.2.0), munsell(v.0.5.0), Rcpp(v.1.0.12), ape(v.5.8), viridis(v.0.6.5), stringi(v.1.8.3), ggraph(v.2.1.0), brio(v.1.1.4), zlibbioc(v.1.48.0), MASS(v.7.3-60.0.1), plyr(v.1.8.9), parallel(v.4.4.1), ggrepel(v.0.9.5), Biostrings(v.2.70.2), graphlayouts(v.1.1.0), splines(v.4.4.1), pander(v.0.6.5), locfit(v.1.5-9.8), igraph(v.2.0.2), reshape2(v.1.4.4), pkgload(v.1.3.4), gprofiler2(v.0.2.3), XML(v.3.99-0.16.1), evaluate(v.0.23), BiocManager(v.1.30.25), nloptr(v.2.0.3), foreach(v.1.5.2), tweenr(v.2.0.2), httpuv(v.1.6.14), tidyr(v.1.3.1), purrr(v.1.0.2), polyclip(v.1.10-6), ggplot2(v.3.5.0), ggforce(v.0.4.2), broom(v.1.0.5), xtable(v.1.8-4), tidytree(v.0.4.6), fANCOVA(v.0.6-1), later(v.1.3.2), viridisLite(v.0.4.2), tibble(v.3.2.1), lmerTest(v.3.1-3), clusterProfiler(v.4.10.1), aplot(v.0.2.2), memoise(v.2.0.1), AnnotationDbi(v.1.64.1), sva(v.3.50.0) and GSEABase(v.1.64.0)
message("This is hpgltools commit: ", get_git_commit())
## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset 4664b0b9440028c89307a154494579648c639ae8
## This is hpgltools commit: Fri Mar 7 10:26:11 2025 -0500: 4664b0b9440028c89307a154494579648c639ae8
message("Saving to ", savefile)
## Saving to 03differential_expression_both.rda.xz
# tmp <- sm(saveme(filename = savefile))
tmp <- loadme(filename = savefile)