The various differential expression analyses of the data generated in tmrc3_datasets will occur in this document.
I am going to try to standardize how I name the various data structures created in this document. Most of the large data created are either sets of differential expression analyses, their combined results, or the set of results deemed ‘significant’.
Hopefully by now they all follow these guidelines:
{clinic(s)}_{sample-subset}_{primary-question(s)}_{datatype}_{batch-method}
With this in mind, ‘tc_biopsies_clinic_de_sva’ should be the Tumaco+Cali biopsy data after performing the differential expression analyses comparing the clinics using sva.
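As a minimal sketch of this naming scheme, the pieces can be pasted together with underscores. The helper name make_de_name() and its argument names are hypothetical; nothing like it is defined elsewhere in this document.

```r
## Hypothetical helper: join the pieces of the naming scheme with "_".
## NULL or empty pieces are dropped, since not every object uses every slot.
make_de_name <- function(clinics, subset, question, datatype, batch = NULL) {
  pieces <- c(clinics, subset, question, datatype, batch)
  pieces <- pieces[nzchar(pieces)]
  paste(pieces, collapse = "_")
}

example_name <- make_de_name("tc", "biopsies", "clinic", "de", "sva")
print(example_name)
```

The example reproduces the Tumaco+Cali biopsy name used above.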
I suspect there remain some exceptions and/or errors.
Each of the following lists describes the set of contrasts that I think are interesting for the various ways one might consider the TMRC3 dataset. The variables are named according to the data with which they are expected to be used; thus tc_cf_contrasts is expected to be used for the Tumaco+Cali data and provides a series of cure/fail comparisons which (to the extent possible) span both locations. In every case, the name of the list element will be used as the contrast name, and will thus be seen as the sheet name in the output xlsx file(s); the two elements of the character vector value are the numerator and denominator of the associated contrast.
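The convention of 'element name = c(numerator, denominator)' can be illustrated with a small base R sketch. The list below is a toy copy of two cure/fail contrasts, and the 'numerator_vs_denominator' label format is an assumption based on the contrast names printed later in this document; the variable names example_contrasts and contrast_labels are hypothetical.

```r
## Toy contrast list: the element name becomes the sheet name,
## the value is c(numerator, denominator).
example_contrasts <- list(
  "tumaco" = c("tumacofailure", "tumacocure"),
  "cure" = c("tumacocure", "calicure"))

## Build a 'numerator_vs_denominator' label for each named contrast.
contrast_labels <- vapply(
  example_contrasts,
  function(pair) paste0(pair[1], "_vs_", pair[2]),
  character(1))
print(contrast_labels)
```

The names of the resulting vector are the sheet names, and the values show which group is on top of each ratio.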
The GSEA analyses will follow each DE analysis in this document.
Most (all?) of the GSEA analyses used in this paper were done via gProfiler rather than goseq/clusterProfiler/topGO/GOstats, primarily because it is so easy to invoke gProfiler.
clinic_contrasts <- list(
  "clinics" = c("cali", "tumaco"))
## In some cases we have no Cali failure samples, so there remain only 2
## contrasts that are likely of interest
tc_cf_contrasts <- list(
  "tumaco" = c("tumacofailure", "tumacocure"),
  "cure" = c("tumacocure", "calicure"))
## In other cases, we have cure/fail for both places.
clinic_cf_contrasts <- list(
  "cali" = c("califailure", "calicure"),
  "tumaco" = c("tumacofailure", "tumacocure"),
  "cure" = c("tumacocure", "calicure"),
  "fail" = c("tumacofailure", "califailure"))
cf_contrast <- list(
  "outcome" = c("tumacofailure", "tumacocure"))
t_cf_contrast <- list(
  "outcome" = c("failure", "cure"))
visitcf_contrasts <- list(
  "v1cf" = c("v1failure", "v1cure"),
  "v2cf" = c("v2failure", "v2cure"),
  "v3cf" = c("v3failure", "v3cure"))
visit_contrasts <- list(
  "v2v1" = c("c2", "c1"),
  "v3v1" = c("c3", "c1"),
  "v3v2" = c("c3", "c2"))
visit_v1later <- list(
  "later_vs_first" = c("later", "first"))
celltypes <- list(
  "eo_mono" = c("eosinophils", "monocytes"),
  "ne_mono" = c("neutrophils", "monocytes"),
  "eo_ne" = c("eosinophils", "neutrophils"))
ethnicity_contrasts <- list(
  "mestizo_indigenous" = c("mestiza", "indigena"),
  "mestizo_afrocol" = c("mestiza", "afrocol"),
  "indigenous_afrocol" = c("indigena", "afrocol"))
Perform a svaseq-guided comparison of the two clinics. Ideally this will give some clue about just how strong the clinic-based batch effect really is and what its causes are.
tc_clinic_type <- tc_valid %>%
  set_expt_conditions(fact = "clinic") %>%
  set_expt_batches(fact = "typeofcells")
## The numbers of samples by condition are:
##
## cali tumaco
## 61 123
## The number of samples by batch are:
##
## biopsy eosinophils monocytes neutrophils
## 18 41 63 62
table(pData(tc_clinic_type)[["condition"]])
##
## cali tumaco
## 61 123
tc_all_clinic_de_sva <- all_pairwise(tc_clinic_type, model_batch = "svaseq",
  filter = TRUE, parallel = parallel, methods = methods)
##
## cali tumaco
## 61 123
## Removing 0 low-count genes (14298 remaining).
## Setting 31394 low elements to zero.
## transform_counts: Found 31394 values equal to 0, adding 1 to the matrix.
tc_all_clinic_de_sva
## A pairwise differential expression with results from: basic, deseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 10 comparisons.
## The logFC agreement among the methods follows:
## tumc_vs_cl
## limma_vs_deseq 0.7998
## limma_vs_edger 0.8621
## limma_vs_basic 0.9693
## limma_vs_noiseq -0.8362
## deseq_vs_edger 0.9377
## deseq_vs_basic 0.8086
## deseq_vs_noiseq -0.7112
## edger_vs_basic 0.8735
## edger_vs_noiseq -0.7662
## basic_vs_noiseq -0.8350
tc_all_clinic_de_sva[["deseq"]][["contrasts_performed"]]
## [1] "tumaco_vs_cali"
tc_all_clinic_table_sva <- combine_de_tables(
  tc_all_clinic_de_sva, keepers = clinic_contrasts,
  excel = glue("{clinic_prefix}/tc_all_clinic_table_sva-v{ver}.xlsx"))
tc_all_clinic_table_sva
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 tumaco_vs_cali-inverted 273 1799 323 1667
## limma_sigup limma_sigdown
## 1 392 606
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
tc_all_clinic_sig_sva <- extract_significant_genes(
  tc_all_clinic_table_sva,
  excel = glue("{clinic_prefix}/compare_clinics/tc_clinic_type_sig_sva-v{ver}.xlsx"))
tc_all_clinic_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down basic_up
## clinics 392 606 323 1667 273 1799 419
## basic_down
## clinics 489
increased_tumaco_categories_up <- simple_gprofiler(
  tc_all_clinic_sig_sva[["deseq"]][["ups"]][["clinics"]],
  excel = glue("{gsea_prefix}/tumaco_cateogies_up-v{ver}.xlsx"))
increased_tumaco_categories_up
## A set of ontologies produced by gprofiler using 273
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 17 MF
## 12 BP
## 1 KEGG
## 1 REAC
## 0 WP
## 100 TF
## 0 MIRNA
## 0 HPA
## 0 CORUM
## 0 HP hits.
increased_tumaco_categories_up[["pvalue_plots"]][["BP"]]
## NULL
increased_cali_categories <- simple_gprofiler(
  tc_all_clinic_sig_sva[["deseq"]][["downs"]][["clinics"]],
  excel = glue("{gsea_prefix}/cali_cateogies_up-v{ver}.xlsx"))
increased_cali_categories
## A set of ontologies produced by gprofiler using 1799
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 59 MF
## 686 BP
## 2 KEGG
## 20 REAC
## 7 WP
## 333 TF
## 2 MIRNA
## 16 HPA
## 0 CORUM
## 14 HP hits.
increased_cali_categories[["pvalue_plots"]][["BP"]]
## NULL
Let us take a quick look at the results of the Tumaco/Cali comparison.
Note: I keep re-introducing an error which causes these (volcano and MA) plots to be reversed with respect to the logFC values. Pay careful attention to these and make sure that they agree with the numbers of genes observed in the contrast.
## Check that up is up
summary(tc_all_clinic_table_sva[["data"]][["clinics"]][["deseq_logfc"]])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -20.280 -0.584 -0.155 -0.255 0.172 3.514
## I think we can assume that most genes are down when considering Tumaco/Cali.
sum(tc_all_clinic_table_sva$data$clinics$deseq_logfc < -1.0 &
    tc_all_clinic_table_sva$data$clinics$deseq_adjp < 0.05)
## [1] 1795
tc_all_clinic_table_sva[["plots"]][["clinics"]][["deseq_vol_plots"]]
## Ok, so it says 1794 up, but that is clearly the down side... Something is definitely messed up.
## The points are on the correct sides of the plot, but the categories of up/down are reversed.
## Theresa noted that she colors differently, and I think better: left side gets called
## 'increased in denominator', right side gets called 'increased in numerator';
## these two groups are colored according to their condition colors, and everything else is gray.
## I am checking out Theresa's helper_functions.R to get a sense of how she handles this, I think
## I can use a variant of her idea pretty easily:
## 1. Add a column 'Significance', which is a factor, and contains either 'Not enriched',
## 'Enriched in x', or 'Enriched in y' according to the logfc/adjp.
## 2. use the significance column for the geom_point color/fill in the volcano plot.
## My change to this idea would be to extract the colors from the input expressionset.
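The two-step idea in the comments above can be sketched in base R as follows. This is only an illustration under stated assumptions: the toy table and its values are made up, the helper name label_significance() is hypothetical, and which clinic plays the numerator here is purely illustrative and must be taken from the real contrast.

```r
## Toy table standing in for one combined DE result; all values are made up.
toy_de <- data.frame(
  deseq_logfc = c(-2.5, -0.3, 1.8, 0.1, -1.2, 3.0),
  deseq_adjp = c(0.001, 0.5, 0.01, 0.9, 0.2, 0.04),
  row.names = paste0("gene", 1:6))

## Hypothetical helper: label each gene by the group in which it is enriched.
## Positive logFC means higher in the numerator, negative logFC means higher
## in the denominator; everything else is 'Not enriched'.
label_significance <- function(tbl, numerator, denominator,
                               lfc = 1.0, adjp = 0.05,
                               lfc_col = "deseq_logfc", p_col = "deseq_adjp") {
  sig <- rep("Not enriched", nrow(tbl))
  passes_p <- tbl[[p_col]] < adjp
  sig[passes_p & tbl[[lfc_col]] > lfc] <- paste("Enriched in", numerator)
  sig[passes_p & tbl[[lfc_col]] < -lfc] <- paste("Enriched in", denominator)
  factor(sig, levels = c(paste("Enriched in", denominator), "Not enriched",
                         paste("Enriched in", numerator)))
}

toy_de[["Significance"]] <- label_significance(toy_de, "tumaco", "cali")
print(table(toy_de[["Significance"]]))
```

Because the labels name the groups rather than saying 'up' or 'down', a reversed plot legend would be visible immediately when compared against the logFC summary.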
There appear to be many more genes which are increased in the Tumaco samples with respect to the Cali samples.
The remaining cell types all have pretty strong clinic-based variance; but I am not certain if it is consistent across cell types.
table(pData(tc_eosinophils)[["condition"]])
##
## cali_cure tumaco_cure tumaco_failure
## 15 17 9
tc_eosinophils_clinic_de_nobatch <- all_pairwise(tc_eosinophils, parallel = parallel,
  model_batch = FALSE, filter = TRUE,
  methods = methods)
##
## cali_cure tumaco_cure tumaco_failure
## 15 17 9
tc_eosinophils_clinic_de_nobatch
## A pairwise differential expression with results from: basic, deseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: none.
## The primary analysis performed 10 comparisons.
tc_eosinophils_clinic_de_nobatch[["deseq"]][["contrasts_performed"]]
## [1] "tumacofailure_vs_tumacocure" "tumacofailure_vs_calicure"
## [3] "tumacocure_vs_calicure"
tc_eosinophils_clinic_table_nobatch <- combine_de_tables(
  tc_eosinophils_clinic_de_nobatch, keepers = tc_cf_contrasts,
  excel = glue("{clinic_cf_prefix}/Eosinophils/tc_eosinophils_clinic_table_nobatch-v{ver}.xlsx"))
tc_eosinophils_clinic_table_nobatch
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumacofailure_vs_tumacocure 102 35 114
## 2 tumacocure_vs_calicure 834 814 856
## edger_sigdown limma_sigup limma_sigdown
## 1 32 62 17
## 2 817 712 705
## Plot describing unique/shared genes in a differential expression table.
tc_eosinophils_clinic_sig_nobatch <- extract_significant_genes(
  tc_eosinophils_clinic_table_nobatch,
  excel = glue("{clinic_cf_prefix}/Eosinophils/tc_eosinophils_clinic_sig_nobatch-v{ver}.xlsx"))
tc_eosinophils_clinic_sig_nobatch
## A set of genes deemed significant according to limma, edger, deseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down basic_up
## tumaco 62 17 114 32 102 35 0
## cure 712 705 856 817 834 814 731
## basic_down
## tumaco 0
## cure 685
tc_eosinophils_clinic_de_sva <- all_pairwise(tc_eosinophils, model_batch = "svaseq",
  parallel = parallel, filter = TRUE, methods = methods)
##
## cali_cure tumaco_cure tumaco_failure
## 15 17 9
## Removing 0 low-count genes (10867 remaining).
## Setting 1048 low elements to zero.
## transform_counts: Found 1048 values equal to 0, adding 1 to the matrix.
tc_eosinophils_clinic_de_sva
## A pairwise differential expression with results from: basic, deseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 10 comparisons.
tc_eosinophils_clinic_de_sva[["deseq"]][["contrasts_performed"]]
## [1] "tumacofailure_vs_tumacocure" "tumacofailure_vs_calicure"
## [3] "tumacocure_vs_calicure"
tc_eosinophils_clinic_table_sva <- combine_de_tables(
  tc_eosinophils_clinic_de_sva, keepers = tc_cf_contrasts,
  excel = glue("{clinic_cf_prefix}/Eosinophils/tc_eosinophils_clinic_table_sva-v{ver}.xlsx"))
tc_eosinophils_clinic_table_sva
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumacofailure_vs_tumacocure 89 57 90
## 2 tumacocure_vs_calicure 777 808 781
## edger_sigdown limma_sigup limma_sigdown
## 1 41 77 30
## 2 806 723 710
## Plot describing unique/shared genes in a differential expression table.
tc_eosinophils_clinic_sig_sva <- extract_significant_genes(
  tc_eosinophils_clinic_table_sva,
  excel = glue("{clinic_cf_prefix}/Eosinophils/tc_eosinophils_clinic_sig_sva-v{ver}.xlsx"))
tc_eosinophils_clinic_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down basic_up
## tumaco 77 30 90 41 89 57 0
## cure 723 710 781 806 777 808 731
## basic_down
## tumaco 0
## cure 685
Interestingly to me, the biopsy samples appear to have the least location-based variance. But we can perform an explicit DE and see how well that hypothesis holds up.
Note that these data include cure and fail samples for
table(pData(tc_biopsies)[["condition"]])
##
## cali_cure tumaco_cure tumaco_failure
## 4 9 5
tc_biopsies_clinic_de_sva <- all_pairwise(tc_biopsies, parallel = parallel,
  model_batch = "svaseq", filter = TRUE,
  methods = methods)
##
## cali_cure tumaco_cure tumaco_failure
## 4 9 5
## Removing 0 low-count genes (13615 remaining).
## Setting 289 low elements to zero.
## transform_counts: Found 289 values equal to 0, adding 1 to the matrix.
tc_biopsies_clinic_de_sva
## A pairwise differential expression with results from: basic, deseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 10 comparisons.
tc_biopsies_clinic_de_sva[["deseq"]][["contrasts_performed"]]
## [1] "tumacofailure_vs_tumacocure" "tumacofailure_vs_calicure"
## [3] "tumacocure_vs_calicure"
tc_biopsies_clinic_table_sva <- combine_de_tables(
  tc_biopsies_clinic_de_sva, keepers = tc_cf_contrasts,
  excel = glue("{clinic_cf_prefix}/Biopsies/tc_biopsies_clinic_table_sva-v{ver}.xlsx"))
tc_biopsies_clinic_table_sva
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumacofailure_vs_tumacocure 14 11 18
## 2 tumacocure_vs_calicure 1 0 0
## edger_sigdown limma_sigup limma_sigdown
## 1 6 0 0
## 2 0 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
tc_biopsies_clinic_sig_sva <- extract_significant_genes(
  tc_biopsies_clinic_table_sva,
  excel = glue("{clinic_cf_prefix}/Biopsies/tc_biopsies_clinic_sig_sva-v{ver}.xlsx"))
tc_biopsies_clinic_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down basic_up
## tumaco 0 0 18 6 14 11 0
## cure 0 0 0 0 1 0 0
## basic_down
## tumaco 0
## cure 0
At least for the moment, I am only looking at the differences between no-batch vs. sva across clinics for the monocyte samples. This was chosen mostly arbitrarily.
Our baseline is the comparison of the monocyte samples without batch in the model or surrogate estimation. In theory at least, this should correspond to the PCA plot above when no batch estimation was performed.
table(pData(tc_monocytes)[["condition"]])
##
## cali_cure cali_failure tumaco_cure tumaco_failure
## 18 3 21 21
tc_monocytes_de_nobatch <- all_pairwise(tc_monocytes, model_batch = FALSE,
  parallel = parallel, filter = TRUE,
  methods = methods)
##
## cali_cure cali_failure tumaco_cure tumaco_failure
## 18 3 21 21
tc_monocytes_de_nobatch
## A pairwise differential expression with results from: basic, deseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: none.
## The primary analysis performed 10 comparisons.
tc_monocytes_table_nobatch <- combine_de_tables(
  tc_monocytes_de_nobatch, keepers = clinic_cf_contrasts,
  excel = glue("{clinic_cf_prefix}/Monocytes/tc_monocytes_clinic_table_nobatch-v{ver}.xlsx"))
tc_monocytes_table_nobatch
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 califailure_vs_calicure 16 20 32
## 2 tumacofailure_vs_tumacocure 48 121 60
## 3 tumacocure_vs_calicure 786 729 778
## 4 tumacofailure_vs_califailure 638 492 518
## edger_sigdown limma_sigup limma_sigdown
## 1 13 38 5
## 2 139 24 37
## 3 784 646 716
## 4 540 395 570
## Plot describing unique/shared genes in a differential expression table.
tc_monocytes_sig_nobatch <- extract_significant_genes(
  tc_monocytes_table_nobatch,
  excel = glue("{clinic_cf_prefix}/Monocytes/tc_monocytes_clinic_sig_nobatch-v{ver}.xlsx"))
tc_monocytes_sig_nobatch
## A set of genes deemed significant according to limma, edger, deseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down basic_up
## cali 38 5 32 13 16 20 90
## tumaco 24 37 60 139 48 121 11
## cure 646 716 778 784 786 729 648
## fail 395 570 518 540 638 492 444
## basic_down
## cali 53
## tumaco 19
## cure 706
## fail 529
In contrast, the following comparison should give a view of the data corresponding to the svaseq PCA plot above. In the best case scenario, we should therefore be able to see some significant differences between the Tumaco cure and fail samples.
tc_monocytes_de_sva <- all_pairwise(tc_monocytes, model_batch = "svaseq",
  parallel = parallel, filter = TRUE,
  methods = methods)
##
## cali_cure cali_failure tumaco_cure tumaco_failure
## 18 3 21 21
## Removing 0 low-count genes (11108 remaining).
## Setting 1455 low elements to zero.
## transform_counts: Found 1455 values equal to 0, adding 1 to the matrix.
tc_monocytes_de_sva
## A pairwise differential expression with results from: basic, deseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 10 comparisons.
tc_monocytes_table_sva <- combine_de_tables(
  tc_monocytes_de_sva, keepers = clinic_cf_contrasts,
  excel = glue("{clinic_cf_prefix}/Monocytes/tc_monocytes_clinic_table_sva-v{ver}.xlsx"))
tc_monocytes_table_sva
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 califailure_vs_calicure 28 36 40
## 2 tumacofailure_vs_tumacocure 34 86 29
## 3 tumacocure_vs_calicure 761 732 713
## 4 tumacofailure_vs_califailure 684 584 583
## edger_sigdown limma_sigup limma_sigdown
## 1 17 52 7
## 2 70 14 57
## 3 762 640 663
## 4 623 434 567
## Plot describing unique/shared genes in a differential expression table.
tc_monocytes_sig_sva <- extract_significant_genes(
  tc_monocytes_table_sva,
  excel = glue("{clinic_cf_prefix}/Monocytes/tc_monocytes_clinic_sig_sva-v{ver}.xlsx"))
tc_monocytes_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down basic_up
## cali 52 7 40 17 28 36 90
## tumaco 14 57 29 70 34 86 11
## cure 640 663 713 762 761 732 648
## fail 434 567 583 623 684 584 444
## basic_down
## cali 53
## tumaco 19
## cure 706
## fail 529
The following block shows that these two results are exceedingly different, suggesting that the Cali cure/fail and Tumaco cure/fail cannot easily be considered in the same analysis. I did some playing around with my calculate_aucc function in this block and found that it is in some important way broken, at least if one expands the top-n genes to more than 20% of the number of genes in the data.
cali_table <- tc_monocytes_table_nobatch[["data"]][["cali"]]
table <- tc_monocytes_table_nobatch[["data"]][["tumaco"]]

cali_aucc <- calculate_aucc(cali_table, table, px = "deseq_adjp", py = "deseq_adjp",
  lx = "deseq_logfc", ly = "deseq_logfc")
cali_aucc
## These two tables have an aucc value of: 0.0659267365585479 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 1.2, df = 11106, p-value = 0.2
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.006843 0.030345
## sample estimates:
## cor
## 0.01175
cali_table_sva <- tc_monocytes_table_sva[["data"]][["cali"]]
tumaco_table_sva <- tc_monocytes_table_sva[["data"]][["tumaco"]]
cali_aucc_sva <- calculate_aucc(cali_table_sva, tumaco_table_sva, px = "deseq_adjp",
  py = "deseq_adjp", lx = "deseq_logfc", ly = "deseq_logfc")
cali_aucc_sva
## These two tables have an aucc value of: 0.0842668799254026 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 16, df = 11106, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1356 0.1719
## sample estimates:
## cor
## 0.1538
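For readers unfamiliar with the 'area under the concordance curve' idea, here is a minimal base R sketch of one common formulation: rank the genes of each table by adjusted p-value, count how many genes the two top-k lists share for k = 1..n, and divide the summed counts by the value obtained from identical rankings. This is an assumption about the general technique, not the calculate_aucc() implementation; the helper toy_concordance() and the toy tables are hypothetical.

```r
## Two toy tables with made-up adjusted p-values for the same four genes.
toy_first <- data.frame(deseq_adjp = c(0.01, 0.02, 0.03, 0.90),
                        row.names = c("a", "b", "c", "d"))
toy_second <- data.frame(deseq_adjp = c(0.02, 0.01, 0.50, 0.03),
                         row.names = c("a", "b", "c", "d"))

## Hypothetical helper: concordance of the top-k gene lists for k = 1..topn.
toy_concordance <- function(first, second, p_col = "deseq_adjp", topn = 3) {
  first_rank <- rownames(first)[order(first[[p_col]])]
  second_rank <- rownames(second)[order(second[[p_col]])]
  ## Number of genes shared by the two top-k lists at each k.
  shared <- vapply(seq_len(topn), function(k) {
    length(intersect(first_rank[seq_len(k)], second_rank[seq_len(k)]))
  }, integer(1))
  ## Identical rankings would share k genes at every k: topn * (topn + 1) / 2.
  list(shared = shared, aucc = sum(shared) / (topn * (topn + 1) / 2))
}

toy_result <- toy_concordance(toy_first, toy_second)
print(toy_result)
```

A value near 1 means the two analyses rank the same genes at the top; a value near 0, as in the two comparisons above, means the top-ranked genes barely overlap.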
tc_neutrophils_de_nobatch <- all_pairwise(tc_neutrophils, parallel = parallel,
  model_batch = FALSE, filter = TRUE,
  methods = methods)
##
## cali_cure cali_failure tumaco_cure tumaco_failure
## 18 3 20 21
tc_neutrophils_de_nobatch
## A pairwise differential expression with results from: basic, deseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: none.
## The primary analysis performed 10 comparisons.
tc_neutrophils_table_nobatch <- combine_de_tables(
  tc_neutrophils_de_nobatch, keepers = clinic_cf_contrasts,
  excel = glue("{clinic_cf_prefix}/Neutrophils/tc_neutrophils_table_nobatch-v{ver}.xlsx"))
tc_neutrophils_table_nobatch
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 califailure_vs_calicure 33 83 42
## 2 tumacofailure_vs_tumacocure 95 50 112
## 3 tumacocure_vs_calicure 910 337 934
## 4 tumacofailure_vs_califailure 984 257 808
## edger_sigdown limma_sigup limma_sigdown
## 1 32 37 10
## 2 57 7 12
## 3 342 630 520
## 4 283 380 462
## Plot describing unique/shared genes in a differential expression table.
tc_neutrophils_sig_nobatch <- extract_significant_genes(
  tc_neutrophils_table_nobatch,
  excel = glue("{clinic_cf_prefix}/Neutrophils/tc_neutrophils_sig_nobatch-v{ver}.xlsx"))
tc_neutrophils_sig_nobatch
## A set of genes deemed significant according to limma, edger, deseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down basic_up
## cali 37 10 42 32 33 83 79
## tumaco 7 12 112 57 95 50 4
## cure 630 520 934 342 910 337 621
## fail 380 462 808 283 984 257 507
## basic_down
## cali 87
## tumaco 2
## cure 503
## fail 417
tc_neutrophils_de_sva <- all_pairwise(tc_neutrophils, parallel = parallel,
  model_batch = "svaseq", filter = TRUE,
  methods = methods)
##
## cali_cure cali_failure tumaco_cure tumaco_failure
## 18 3 20 21
## Removing 0 low-count genes (9244 remaining).
## Setting 1539 low elements to zero.
## transform_counts: Found 1539 values equal to 0, adding 1 to the matrix.
tc_neutrophils_de_sva
## A pairwise differential expression with results from: basic, deseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 10 comparisons.
tc_neutrophils_table_sva <- combine_de_tables(
  tc_neutrophils_de_sva, keepers = clinic_cf_contrasts,
  excel = glue("{clinic_cf_prefix}/Neutrophils/tc_neutrophils_table_sva-v{ver}.xlsx"))
tc_neutrophils_table_sva
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 califailure_vs_calicure 91 181 103
## 2 tumacofailure_vs_tumacocure 86 38 72
## 3 tumacocure_vs_calicure 844 379 843
## 4 tumacofailure_vs_califailure 696 197 608
## edger_sigdown limma_sigup limma_sigdown
## 1 122 75 50
## 2 23 37 47
## 3 367 645 481
## 4 214 310 325
## Plot describing unique/shared genes in a differential expression table.
tc_neutrophils_sig_sva <- extract_significant_genes(
  tc_neutrophils_table_sva,
  excel = glue("{clinic_cf_prefix}/Neutrophils/tc_neutrophils_sig_sva-v{ver}.xlsx"))
tc_neutrophils_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down basic_up
## cali 75 50 103 122 91 181 79
## tumaco 37 47 72 23 86 38 4
## cure 645 481 843 367 844 379 621
## fail 310 325 608 214 696 197 507
## basic_down
## cali 87
## tumaco 2
## cure 503
## fail 417
Given the above comparisons, we can extract some gene sets which resulted from those DE analyses and eventually perform some ontology/KEGG/reactome/etc searches. This reminds me, I want to make my extract_significant_ functions return gene-set data structures and my various ontology searches take them as inputs. This should help avoid potential errors when extracting up/down genes.
clinic_sigenes_up <- rownames(tc_all_clinic_sig_sva[["deseq"]][["ups"]][["clinics"]])
clinic_sigenes_down <- rownames(tc_all_clinic_sig_sva[["deseq"]][["downs"]][["clinics"]])
clinic_sigenes <- c(clinic_sigenes_up, clinic_sigenes_down)

tc_eosinophils_sigenes_up <- rownames(tc_eosinophils_clinic_sig_sva[["deseq"]][["ups"]][["cure"]])
tc_eosinophils_sigenes_down <- rownames(tc_eosinophils_clinic_sig_sva[["deseq"]][["downs"]][["cure"]])
tc_monocytes_sigenes_up <- rownames(tc_monocytes_sig_sva[["deseq"]][["ups"]][["cure"]])
tc_monocytes_sigenes_down <- rownames(tc_monocytes_sig_sva[["deseq"]][["downs"]][["cure"]])
tc_neutrophils_sigenes_up <- rownames(tc_neutrophils_sig_sva[["deseq"]][["ups"]][["cure"]])
tc_neutrophils_sigenes_down <- rownames(tc_neutrophils_sig_sva[["deseq"]][["downs"]][["cure"]])

tc_eosinophils_sigenes <- c(tc_eosinophils_sigenes_up,
                            tc_eosinophils_sigenes_down)
tc_monocytes_sigenes <- c(tc_monocytes_sigenes_up,
                          tc_monocytes_sigenes_down)
tc_neutrophils_sigenes <- c(tc_neutrophils_sigenes_up,
                            tc_neutrophils_sigenes_down)
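The gene-set data structure mentioned above might look something like the following base R sketch: one function call returns the up, down, and combined ID vectors together, so they cannot be mixed up by hand. The helper make_gene_sets(), the toy table, and the gene IDs are all hypothetical; the cutoffs mirror the |logFC| of 1 and adjusted p-value of 0.05 reported in the outputs above.

```r
## Toy DE table with made-up values and made-up gene IDs.
toy_table <- data.frame(
  deseq_logfc = c(2.1, -1.5, 0.2, -3.0, 1.2),
  deseq_adjp = c(0.01, 0.03, 0.5, 0.001, 0.2),
  row.names = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D", "ENSG_E"))

## Hypothetical helper: return up/down/both gene IDs as one named list.
make_gene_sets <- function(tbl, lfc = 1.0, adjp = 0.05,
                           lfc_col = "deseq_logfc", p_col = "deseq_adjp") {
  ## Rows with missing statistics are never called significant.
  keep <- !is.na(tbl[[p_col]]) & !is.na(tbl[[lfc_col]]) & tbl[[p_col]] <= adjp
  up <- rownames(tbl)[keep & tbl[[lfc_col]] >= lfc]
  down <- rownames(tbl)[keep & tbl[[lfc_col]] <= -lfc]
  list(up = up, down = down, both = c(up, down))
}

toy_sets <- make_gene_sets(toy_table)
str(toy_sets)
```

An ontology search could then accept toy_sets[["up"]] or toy_sets[["down"]] directly, which keeps the direction attached to the gene list.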
I was curious to try to understand why the two clinics appear to be so different vis a vis their PCA/DE; so I thought that gProfiler might help boil those results down to something more digestible.
Note that in the following block I used the function simple_gprofiler(), but later in this document I will use all_gprofiler(). The first invocation limits the search to a single table, while the second will iterate over every result in a pairwise differential expression analysis.
In this instance, we are looking at the vector of gene IDs deemed significantly different between the two clinics in either the up or down direction.
One other thing worth noting: the new version of gProfiler provides some fun interactive plots. I will add an example here.
tc_eosinophil_gprofiler <- simple_gprofiler(
  tc_eosinophils_sigenes_up,
  excel = glue("{gsea_prefix}/eosinophil_clinics_tumaco_up-v{ver}.xlsx"))
tc_eosinophil_gprofiler
## A set of ontologies produced by gprofiler using 777
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 20 MF
## 218 BP
## 0 KEGG
## 2 REAC
## 0 WP
## 540 TF
## 6 MIRNA
## 0 HPA
## 2 CORUM
## 0 HP hits.
clinic_gp <- simple_gprofiler(
  clinic_sigenes,
  excel = glue("{gsea_prefix}/both_clinics_cali_up-v{ver}.xlsx"))
clinic_gp$pvalue_plots$REAC
clinic_gp$pvalue_plots$BP
## NULL
clinic_gp$pvalue_plots$TF
clinic_gp$interactive_plots$GO
## NULL
In the following block, I am looking at the gProfiler over-represented groups observed across clinics in only the Eosinophils. First I do so for all genes (up or down), followed by only the up and down groups. Each of the following will include only the Reactome and GO:BP plots. These searches did not have too many other hits, excepting the transcription factor database.
tc_eosinophils_gp <- simple_gprofiler(
  tc_eosinophils_sigenes,
  excel = glue("{gsea_prefix}/eosinophil_clinics-v{ver}.xlsx"))
tc_eosinophils_gp
## A set of ontologies produced by gprofiler using 1585
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 39 MF
## 276 BP
## 0 KEGG
## 5 REAC
## 0 WP
## 563 TF
## 10 MIRNA
## 0 HPA
## 5 CORUM
## 0 HP hits.
tc_eosinophils_gp$pvalue_plots$REAC
tc_eosinophils_gp$pvalue_plots$BP
## NULL
tc_eosinophils_up_gp <- simple_gprofiler(
  tc_eosinophils_sigenes_up,
  excel = glue("{gsea_prefix}/eosinophil_clinics_tumaco_up-v{ver}.xlsx"))
tc_eosinophils_up_gp
## A set of ontologies produced by gprofiler using 777
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 20 MF
## 218 BP
## 0 KEGG
## 2 REAC
## 0 WP
## 540 TF
## 6 MIRNA
## 0 HPA
## 2 CORUM
## 0 HP hits.
tc_eosinophils_up_gp$pvalue_plots$REAC
tc_eosinophils_down_gp <- simple_gprofiler(
  tc_eosinophils_sigenes_down,
  excel = glue("{gsea_prefix}/eosinophil_clinics_cali_up-v{ver}.xlsx"))
tc_eosinophils_down_gp
## A set of ontologies produced by gprofiler using 808
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 14 MF
## 94 BP
## 2 KEGG
## 9 REAC
## 2 WP
## 77 TF
## 0 MIRNA
## 0 HPA
## 0 CORUM
## 0 HP hits.
tc_eosinophils_down_gp$pvalue_plots$REAC
In the following block I repeated the above query, but this time looking at the monocyte samples.
tc_monocytes_up_gp <- simple_gprofiler(
  tc_monocytes_sigenes,
  excel = glue("{gsea_prefix}/monocyte_clinics-v{ver}.xlsx"))
tc_monocytes_up_gp
## A set of ontologies produced by gprofiler using 1493
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 55 MF
## 476 BP
## 0 KEGG
## 6 REAC
## 4 WP
## 495 TF
## 2 MIRNA
## 0 HPA
## 1 CORUM
## 0 HP hits.
tc_monocytes_up_gp$pvalue_plots$REAC
tc_monocytes_up_gp$pvalue_plots$BP
## NULL
tc_monocytes_down_gp <- simple_gprofiler(
  tc_monocytes_sigenes_down,
  excel = glue("{gsea_prefix}/monocyte_clinics_cali_up-v{ver}.xlsx"))
tc_monocytes_down_gp$pvalue_plots$REAC
tc_monocytes_down_gp$pvalue_plots$BP
## NULL
Ibid. This time looking at the Neutrophils. Thus the first two images should be a superset of the second and third pairs of images; assuming that the genes in the up/down list do not cause the groups to no longer be significant. Interestingly, the reactome search did not return any hits for the increased search.
tc_neutrophils_gp <- simple_gprofiler(
  tc_neutrophils_sigenes,
  excel = glue("{gsea_prefix}/neutrophil_clinics-v{ver}.xlsx"))
## tc_neutrophils_gp$pvalue_plots$REAC ## no hits
tc_neutrophils_gp$pvalue_plots$BP
## NULL
tc_neutrophils_gp$pvalue_plots$TF
tc_neutrophils_up_gp <- simple_gprofiler(
  tc_neutrophils_sigenes_up,
  excel = glue("{gsea_prefix}/neutrophil_clinics_tumaco_up-v{ver}.xlsx"))
## tc_neutrophils_up_gp$pvalue_plots$REAC ## No hits
tc_neutrophils_up_gp$pvalue_plots$BP
## NULL
tc_neutrophils_down_gp <- simple_gprofiler(
  tc_neutrophils_sigenes_down,
  excel = glue("{gsea_prefix}/neutrophil_clinics_cali_up-v{ver}.xlsx"))
tc_neutrophils_down_gp$pvalue_plots$REAC
tc_neutrophils_down_gp$pvalue_plots$BP
## NULL
The following expands the cross-clinic query above to also test the neutrophils. Once again, I think it will pretty strongly support the hypothesis that the two clinics are not compatible.
We are concerned that the clinic-based batch effect may make our results essentially useless. One way to test this concern is to compare the set of genes observed to differ between the Cali Cure/Fail vs. the Tumaco Cure/Fail.
cali_table_nobatch <- tc_neutrophils_table_nobatch[["data"]][["cali"]]
tumaco_table_nobatch <- tc_neutrophils_table_nobatch[["data"]][["tumaco"]]

cali_merged_nobatch <- merge(cali_table_nobatch, tumaco_table_nobatch, by="row.names")
cor.test(cali_merged_nobatch[, "deseq_logfc.x"], cali_merged_nobatch[, "deseq_logfc.y"])
##
## Pearson's product-moment correlation
##
## data: cali_merged_nobatch[, "deseq_logfc.x"] and cali_merged_nobatch[, "deseq_logfc.y"]
## t = -16, df = 9242, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1798 -0.1401
## sample estimates:
## cor
## -0.16
cali_aucc_nobatch <- calculate_aucc(cali_table_nobatch, tumaco_table_nobatch, px = "deseq_adjp",
  py = "deseq_adjp", lx = "deseq_logfc", ly = "deseq_logfc")
cali_aucc_nobatch$plot
In all of the above, we are looking to understand the differences between the two locations. Let us now step back and address the original question: fail/cure without regard to location.
I performed this query with a few different parameters, notably with(out) sva and again using each cell type, including biopsies. The main reason I am keeping these comparisons is in the relatively weak hope that there will be sufficient signal in the full dataset that it might be able to overcome the apparently ridiculous batch effect from the two clinics.
table(pData(tc_valid)[["condition"]])
##
## cure failure
## 122 62
tc_all_cf_de_sva <- all_pairwise(tc_valid, filter = TRUE, methods = methods,
                                 parallel = parallel, model_batch = "svaseq")
##
## cure failure
## 122 62
## Removing 0 low-count genes (14298 remaining).
## Setting 27144 low elements to zero.
## transform_counts: Found 27144 values equal to 0, adding 1 to the matrix.
tc_all_cf_table_sva <- combine_de_tables(
  tc_all_cf_de_sva, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/All_Samples/tc_valid_cf_table_sva-v{ver}.xlsx"))
tc_all_cf_sig_sva <- extract_significant_genes(
  tc_all_cf_table_sva,
  excel = glue("{cf_prefix}/All_Samples/tc_valid_cf_sig_sva-v{ver}.xlsx"))
tc_all_cf_de_batch <- all_pairwise(tc_valid, filter = TRUE, methods = methods,
                                   parallel = parallel, model_batch = TRUE)
##
## cure failure
## 122 62
##
## 1 2 3
## 83 50 51
tc_all_cf_table_batch <- combine_de_tables(
  tc_all_cf_de_batch, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/All_Samples/tc_valid_cf_table_batch-v{ver}.xlsx"))
tc_all_cf_sig_batch <- extract_significant_genes(
  tc_all_cf_table_batch,
  excel = glue("{cf_prefix}/All_Samples/tc_valid_cf_sig_batch-v{ver}.xlsx"))
I am not sure if this is the best choice, but I call the set of all samples excluding biopsies ‘clinical’.
table(pData(tc_clinical_nobiop)[["condition"]])
##
## cure failure
## 109 57
tc_clinical_cf_de_sva <- all_pairwise(tc_clinical_nobiop, filter = TRUE,
                                      parallel = parallel, model_batch = "svaseq",
                                      methods = methods)
##
## cure failure
## 109 57
## Removing 0 low-count genes (12162 remaining).
## Setting 17777 low elements to zero.
## transform_counts: Found 17777 values equal to 0, adding 1 to the matrix.
tc_clinical_cf_de_sva
## A pairwise differential expression with results from: basic, deseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 10 comparisons.
## The logFC agreement among the methods follows:
## falr_vs_cr
## limma_vs_deseq 0.8939
## limma_vs_edger 0.8974
## limma_vs_basic 0.9413
## limma_vs_noiseq -0.8345
## deseq_vs_edger 0.9918
## deseq_vs_basic 0.8865
## deseq_vs_noiseq -0.8201
## edger_vs_basic 0.9008
## edger_vs_noiseq -0.8294
## basic_vs_noiseq -0.8598
tc_clinical_cf_table_sva <- combine_de_tables(
  tc_clinical_cf_de_sva, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/Clinical_Samples/tc_clinical_cf_table_sva-v{ver}.xlsx"))
tc_clinical_cf_table_sva
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure 186 96 214 93
## limma_sigup limma_sigdown
## 1 97 79
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
tc_clinical_cf_sig_sva <- extract_significant_genes(
  tc_clinical_cf_table_sva, according_to = "deseq",
  excel = glue("{cf_prefix}/Clinical_Samples/tc_clinical_cf_sig_sva-v{ver}.xlsx"))
tc_clinical_cf_sig_sva
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## outcome 186 96
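To make the cutoffs printed above concrete, here is a minimal base-R sketch of how such a filter could be applied to a single table. It assumes 'significant' means |logFC| >= 1 together with adjusted p <= 0.05; this is not the hpgltools implementation, and the table is made-up toy data.

```r
## Toy DE table; the column names mimic the deseq columns used in this document.
toy <- data.frame(
  deseq_logfc = c(2.3, -1.4, 0.6, 1.1, -3.0),
  deseq_adjp = c(0.001, 0.03, 0.0001, 0.2, 0.049),
  row.names = paste0("gene", 1:5))

lfc_cutoff <- 1   ## assumed to act on the absolute log2 fold change
p_cutoff <- 0.05  ## assumed to act on the adjusted p-value

## Apply the p-value filter once, then split by the direction of change.
passes_p <- toy[["deseq_adjp"]] <= p_cutoff
up <- toy[passes_p & toy[["deseq_logfc"]] >= lfc_cutoff, ]
down <- toy[passes_p & toy[["deseq_logfc"]] <= -lfc_cutoff, ]
c(up = nrow(up), down = nrow(down))
```

In this toy table gene3 fails the fold-change cutoff and gene4 fails the p-value cutoff, so neither is counted in either direction.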
tc_clinical_cf_de_batch <- all_pairwise(tc_clinical_nobiop, filter = TRUE,
                                        parallel = parallel, model_batch = TRUE,
                                        methods = methods)
##
## cure failure
## 109 57
##
## eosinophils monocytes neutrophils
## 41 63 62
tc_clinical_cf_de_batch
## A pairwise differential expression with results from: basic, deseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: batch in model/limma.
## The primary analysis performed 10 comparisons.
## The logFC agreement among the methods follows:
## falr_vs_cr
## limma_vs_deseq 0.8070
## limma_vs_edger 0.8115
## limma_vs_basic 0.7121
## limma_vs_noiseq -0.6328
## deseq_vs_edger 0.9991
## deseq_vs_basic 0.6416
## deseq_vs_noiseq -0.6422
## edger_vs_basic 0.6418
## edger_vs_noiseq -0.6441
## basic_vs_noiseq -0.8598
tc_clinical_cf_table_batch <- combine_de_tables(
  tc_clinical_cf_de_batch, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/Clinical_Samples/tc_clinical_cf_table_batch-v{ver}.xlsx"))
tc_clinical_cf_table_batch
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure 106 68 116 74
## limma_sigup limma_sigdown
## 1 83 45
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
tc_clinical_cf_sig_batch <- extract_significant_genes(
  tc_clinical_cf_table_batch, according_to = "deseq",
  excel = glue("{cf_prefix}/Clinical_Samples/tc_clinical_cf_sig_batch-v{ver}.xlsx"))
tc_clinical_cf_sig_batch
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## outcome 106 68
num_color <- color_choices[["cf"]][["cure"]]
den_color <- color_choices[["cf"]][["failure"]]
tc_clinical_cf_table <- tc_clinical_cf_table_sva[["data"]][["outcome"]]
tc_clinical_cf_volcano_top10 <- plot_volcano_condition_de(
  tc_clinical_cf_table, "outcome", label = 10,
  fc_col = "deseq_logfc", p_col = "deseq_adjp", line_position = NULL,
  color_high = num_color, color_low = den_color, label_size = 6)
pp(file = "figures/s11c_tc_clinical_cf_volcano_labeled_top10.svg")
tc_clinical_cf_volcano_top10[["plot"]]
dev.off()
## png
##   2
tc_clinical_cf_volcano_top10[["plot"]]
In the following block, we repeat the same question, but using only the biopsy samples from both clinics.
tc_biopsies_cf <- set_expt_conditions(tc_biopsies, fact = "finaloutcome")
## The numbers of samples by condition are:
##
## cure failure
## 13 5
tc_biopsies_cf_de_sva <- all_pairwise(tc_biopsies_cf, filter = TRUE, methods = methods,
                                      parallel = parallel, model_batch = "svaseq")
##
## cure failure
## 13 5
## Removing 0 low-count genes (13615 remaining).
## Setting 225 low elements to zero.
## transform_counts: Found 225 values equal to 0, adding 1 to the matrix.
tc_biopsies_cf_table_sva <- combine_de_tables(
  tc_biopsies_cf_de_sva, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/Biopsies/tc_biopsies_cf_table_sva-v{ver}.xlsx"))
tc_biopsies_cf_sig_sva <- extract_significant_genes(
  tc_biopsies_cf_table_sva,
  excel = glue("{cf_prefix}/All_Samples/tc_biopsies_cf_sig_sva-v{ver}.xlsx"))
tc_biopsies_cf_de_batch <- all_pairwise(tc_biopsies_cf, filter = TRUE, methods = methods,
                                        parallel = parallel, model_batch = TRUE)
##
## cure failure
## 13 5
##
## 1
## 18
tc_biopsies_cf_table_batch <- combine_de_tables(
  tc_biopsies_cf_de_batch, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/All_Samples/tc_biopsies_cf_table_batch-v{ver}.xlsx"))
tc_biopsies_cf_sig_batch <- extract_significant_genes(
  tc_biopsies_cf_table_batch,
  excel = glue("{cf_prefix}/All_Samples/tc_biopsies_cf_sig_batch-v{ver}.xlsx"))
In the following block, we repeat the same question, but using only the Eosinophil samples from both clinics.
tc_eosinophils_cf <- set_expt_conditions(tc_eosinophils, fact = "finaloutcome")
## The numbers of samples by condition are:
##
## cure failure
## 32 9
tc_eosinophils_cf_de_sva <- all_pairwise(tc_eosinophils_cf, filter = TRUE, methods = methods,
                                         parallel = parallel, model_batch = "svaseq")
##
## cure failure
## 32 9
## Removing 0 low-count genes (10867 remaining).
## Setting 860 low elements to zero.
## transform_counts: Found 860 values equal to 0, adding 1 to the matrix.
tc_eosinophils_cf_table_sva <- combine_de_tables(
  tc_eosinophils_cf_de_sva, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/Eosinophils/tc_eosinophils_cf_table_sva-v{ver}.xlsx"))
tc_eosinophils_cf_sig_sva <- extract_significant_genes(
  tc_eosinophils_cf_table_sva,
  excel = glue("{cf_prefix}/All_Samples/tc_eosinophils_cf_sig_sva-v{ver}.xlsx"))
tc_eosinophils_cf_de_batch <- all_pairwise(tc_eosinophils_cf, filter = TRUE,
                                           parallel = parallel, model_batch = TRUE,
                                           methods = methods)
##
## cure failure
## 32 9
##
## 3 2 1
## 13 14 14
tc_eosinophils_cf_table_batch <- combine_de_tables(
  tc_eosinophils_cf_de_batch, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/All_Samples/tc_eosinophils_cf_table_batch-v{ver}.xlsx"))
tc_eosinophils_cf_sig_batch <- extract_significant_genes(
  tc_eosinophils_cf_table_batch,
  excel = glue("{cf_prefix}/All_Samples/tc_eosinophils_cf_sig_batch-v{ver}.xlsx"))
Repeat yet again, this time with the monocyte samples. The idea is to see if there is a cell type which is particularly good (or bad) at discriminating the two clinics.
tc_monocytes_cf <- set_expt_conditions(tc_monocytes, fact = "finaloutcome")
## The numbers of samples by condition are:
##
## cure failure
## 39 24
tc_monocytes_cf_de_sva <- all_pairwise(tc_monocytes_cf, filter = TRUE, methods = methods,
                                       parallel = parallel, model_batch = "svaseq")
##
## cure failure
## 39 24
## Removing 0 low-count genes (11108 remaining).
## Setting 1330 low elements to zero.
## transform_counts: Found 1330 values equal to 0, adding 1 to the matrix.
tc_monocytes_cf_table_sva <- combine_de_tables(
  tc_monocytes_cf_de_sva, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/Monocytes/tc_monocytes_cf_table_sva-v{ver}.xlsx"))
tc_monocytes_cf_sig_sva <- extract_significant_genes(
  tc_monocytes_cf_table_sva,
  excel = glue("{cf_prefix}/All_Samples/tc_monocytes_cf_sig_sva-v{ver}.xlsx"))
tc_monocytes_cf_de_batch <- all_pairwise(tc_monocytes_cf, filter = TRUE, methods = methods,
                                         parallel = parallel, model_batch = TRUE)
##
## cure failure
## 39 24
##
## 3 2 1
## 19 18 26
tc_monocytes_cf_table_batch <- combine_de_tables(
  tc_monocytes_cf_de_batch, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/All_Samples/tc_monocytes_cf_table_batch-v{ver}.xlsx"))
tc_monocytes_cf_sig_batch <- extract_significant_genes(
  tc_monocytes_cf_table_batch,
  excel = glue("{cf_prefix}/All_Samples/tc_monocytes_cf_sig_batch-v{ver}.xlsx"))
Last try, this time using the Neutrophil samples.
tc_neutrophils_cf <- set_expt_conditions(tc_neutrophils, fact = "finaloutcome")
## The numbers of samples by condition are:
##
## cure failure
## 38 24
tc_neutrophils_cf_de_sva <- all_pairwise(tc_neutrophils_cf, parallel = parallel,
                                         filter = TRUE, model_batch = "svaseq",
                                         methods = methods)
##
## cure failure
## 38 24
## Removing 0 low-count genes (9244 remaining).
## Setting 1563 low elements to zero.
## transform_counts: Found 1563 values equal to 0, adding 1 to the matrix.
tc_neutrophils_cf_table_sva <- combine_de_tables(
  tc_neutrophils_cf_de_sva, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/Neutrophils/tc_neutrophils_cf_table_sva-v{ver}.xlsx"))
tc_neutrophils_cf_sig_sva <- extract_significant_genes(
  tc_neutrophils_cf_table_sva,
  excel = glue("{cf_prefix}/All_Samples/tc_neutrophils_cf_sig_sva-v{ver}.xlsx"))
tc_neutrophils_cf_de_batch <- all_pairwise(tc_neutrophils_cf, filter = TRUE,
                                           parallel = parallel, model_batch = TRUE,
                                           methods = methods)
##
## cure failure
## 38 24
##
## 3 2 1
## 19 18 25
tc_neutrophils_cf_table_batch <- combine_de_tables(
  tc_neutrophils_cf_de_batch, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/All_Samples/tc_neutrophils_cf_table_batch-v{ver}.xlsx"))
tc_neutrophils_cf_sig_batch <- extract_significant_genes(
  tc_neutrophils_cf_table_batch,
  excel = glue("{cf_prefix}/All_Samples/tc_neutrophils_cf_sig_batch-v{ver}.xlsx"))
Later in this document I do a bunch of visit/cf comparisons. In this block I want to explicitly only compare v1 to other visits. This is something I did quite a lot in the 2019 datasets, but never actually moved to this document.
v1_vs_later <- all_pairwise(tc_v1vs, model_batch = "svaseq", methods = methods,
                            parallel = parallel, filter = TRUE)
##
## first later
## 65 101
## Removing 0 low-count genes (12162 remaining).
## Setting 17758 low elements to zero.
## transform_counts: Found 17758 values equal to 0, adding 1 to the matrix.
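The first/later grouping counted above amounts to collapsing the visit factor into two levels. A minimal base-R sketch of that recoding follows; the `visits` vector is hypothetical toy data, and I am assuming the visit labels look like "v1", "v2", "v3" (how tc_v1vs was actually built is not shown in this section).

```r
## Hypothetical visit labels for six samples.
visits <- c("v1", "v2", "v3", "v1", "v2", "v1")

## Anything which is not the first visit is lumped into 'later'.
visit_group <- factor(ifelse(visits == "v1", "first", "later"),
                      levels = c("first", "later"))
table(visit_group)
```

Fixing the factor levels explicitly keeps 'first' as the reference level, regardless of the order in which samples appear.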
v1_vs_later_table <- combine_de_tables(
  v1_vs_later, keepers = visit_v1later,
  excel = glue("{visit_prefix}/v1_vs_later_tables-v{ver}.xlsx"))
v1_vs_later_sig <- extract_significant_genes(
  v1_vs_later_table,
  excel = glue("{visit_prefix}/v1_vs_later_sig-v{ver}.xlsx"))
v1later_gp <- all_gprofiler(v1_vs_later_sig)
v1later_gp[[1]]$pvalue_plots$REAC
v1later_gp[[2]]$pvalue_plots$REAC
tc_sex_de <- all_pairwise(tc_sex, model_batch = "svaseq", methods = methods,
                          parallel = parallel, filter = TRUE)
##
## female male
## 28 156
## Removing 0 low-count genes (14298 remaining).
## Setting 26368 low elements to zero.
## transform_counts: Found 26368 values equal to 0, adding 1 to the matrix.
tc_sex_table <- combine_de_tables(
  tc_sex_de, excel = glue("{sex_prefix}/tc_sex_table-v{ver}.xlsx"))
tc_sex_sig <- extract_significant_genes(
  tc_sex_table, excel = glue("{sex_prefix}/tc_sex_sig-v{ver}.xlsx"))
tc_sex_gp <- all_gprofiler(tc_sex_sig)
tc_sex_cure <- subset_expt(tc_sex, subset = "finaloutcome=='cure'")
## The samples excluded are: TMRC30178, TMRC30179, TMRC30221, TMRC30222, TMRC30223, TMRC30224, TMRC30017, TMRC30019, TMRC30071, TMRC30056, TMRC30105, TMRC30058, TMRC30094, TMRC30119, TMRC30122, TMRC30107, TMRC30096, TMRC30083, TMRC30115, TMRC30118, TMRC30121, TMRC30026, TMRC30048, TMRC30054, TMRC30046, TMRC30070, TMRC30049, TMRC30055, TMRC30047, TMRC30053, TMRC30068, TMRC30123, TMRC30072, TMRC30078, TMRC30116, TMRC30076, TMRC30088, TMRC30197, TMRC30199, TMRC30198, TMRC30201, TMRC30200, TMRC30203, TMRC30202, TMRC30205, TMRC30204, TMRC30177, TMRC30241, TMRC30237, TMRC30206, TMRC30207, TMRC30238, TMRC30074, TMRC30217, TMRC30208, TMRC30077, TMRC30219, TMRC30218, TMRC30079, TMRC30220, TMRC30264, TMRC30265.
## subset_expt(): There were 184, now there are 122 samples.
tc_sex_cure_de <- all_pairwise(tc_sex_cure, model_batch = "svaseq",
                               parallel = parallel, filter = TRUE,
                               methods = methods)
##
## female male
## 19 103
## Removing 0 low-count genes (14156 remaining).
## Setting 17015 low elements to zero.
## transform_counts: Found 17015 values equal to 0, adding 1 to the matrix.
tc_sex_cure_de
## A pairwise differential expression with results from: basic, deseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 10 comparisons.
## The logFC agreement among the methods follows:
## mal_vs_fml
## limma_vs_deseq 0.6481
## limma_vs_edger 0.7769
## limma_vs_basic 0.9099
## limma_vs_noiseq -0.7055
## deseq_vs_edger 0.8641
## deseq_vs_basic 0.6740
## deseq_vs_noiseq -0.5454
## edger_vs_basic 0.8003
## edger_vs_noiseq -0.6463
## basic_vs_noiseq -0.7563
tc_sex_cure_table <- combine_de_tables(
  tc_sex_cure_de, excel = glue("{sex_prefix}/tc_sex_cure_table-v{ver}.xlsx"))
tc_sex_cure_table
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 male_vs_female 70 74 64 80
## limma_sigup limma_sigdown
## 1 40 74
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
tc_sex_cure_sig <- extract_significant_genes(
  tc_sex_cure_table, excel = glue("{sex_prefix}/tc_sex_cure_sig-v{ver}.xlsx"))
tc_sex_cure_sig
## A set of genes deemed significant according to limma, edger, deseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down
## male_vs_female 40 74 64 80 70 74
## basic_up basic_down
## male_vs_female 12 5
tc_sex_cure_gp <- all_gprofiler(tc_sex_cure_sig)
tc_sex_cure_gp
## Running gProfiler on every set of significant genes found:
## BP CORUM HP HPA KEGG MIRNA MF REAC TF WP
## male_vs_female_up 3 0 1 0 1 0 1 0 3 0
## male_vs_female_down 3 0 0 0 0 0 0 1 0 0
tc_sex_cure_gp[[1]][["pvalue_plots"]][["BP"]]
## NULL
tc_sex_cure_gp[[2]][["pvalue_plots"]][["BP"]]
## NULL
tc_ethnicity_de <- all_pairwise(tc_etnia_expt, model_batch = "svaseq",
                                parallel = parallel, filter = TRUE,
                                methods = methods)
##
## afrocol indigena mestiza
## 91 46 47
## Removing 0 low-count genes (14298 remaining).
## Setting 28882 low elements to zero.
## transform_counts: Found 28882 values equal to 0, adding 1 to the matrix.
tc_ethnicity_de
## A pairwise differential expression with results from: basic, deseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 10 comparisons.
tc_ethnicity_table <- combine_de_tables(
  tc_ethnicity_de, keepers = ethnicity_contrasts,
  excel = glue("{eth_prefix}/tc_ethnicity_table-v{ver}.xlsx"))
tc_ethnicity_table
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 mestiza_vs_indigena 48 22 53 23
## 2 mestiza_vs_afrocol 51 171 58 180
## 3 indigena_vs_afrocol 67 269 72 280
## limma_sigup limma_sigdown
## 1 24 14
## 2 44 90
## 3 78 144
## Plot describing unique/shared genes in a differential expression table.
tc_ethnicity_table[["plots"]][["mestizo_indigenous"]][["deseq_ma_plots"]]
tc_ethnicity_table[["plots"]][["mestizo_afrocol"]][["deseq_ma_plots"]]
tc_ethnicity_table[["plots"]][["indigenous_afrocol"]][["deseq_ma_plots"]]
tc_ethnicity_sig <- extract_significant_genes(
  tc_ethnicity_table, excel = glue("{eth_prefix}/tc_ethnicity_sig-v{ver}.xlsx"))
ethnicity_cure <- subset_expt(tc_etnia_expt, subset = "finaloutcome=='cure'")
## The samples excluded are: TMRC30178, TMRC30179, TMRC30221, TMRC30222, TMRC30223, TMRC30224, TMRC30017, TMRC30019, TMRC30071, TMRC30056, TMRC30105, TMRC30058, TMRC30094, TMRC30119, TMRC30122, TMRC30107, TMRC30096, TMRC30083, TMRC30115, TMRC30118, TMRC30121, TMRC30026, TMRC30048, TMRC30054, TMRC30046, TMRC30070, TMRC30049, TMRC30055, TMRC30047, TMRC30053, TMRC30068, TMRC30123, TMRC30072, TMRC30078, TMRC30116, TMRC30076, TMRC30088, TMRC30197, TMRC30199, TMRC30198, TMRC30201, TMRC30200, TMRC30203, TMRC30202, TMRC30205, TMRC30204, TMRC30177, TMRC30241, TMRC30237, TMRC30206, TMRC30207, TMRC30238, TMRC30074, TMRC30217, TMRC30208, TMRC30077, TMRC30219, TMRC30218, TMRC30079, TMRC30220, TMRC30264, TMRC30265.
## subset_expt(): There were 184, now there are 122 samples.
ethnicity_cure_de <- all_pairwise(ethnicity_cure, model_batch = "svaseq",
                                  parallel = parallel, filter = TRUE,
                                  methods = methods)
##
## afrocol indigena mestiza
## 39 36 47
## Removing 0 low-count genes (14156 remaining).
## Setting 17594 low elements to zero.
## transform_counts: Found 17594 values equal to 0, adding 1 to the matrix.
ethnicity_cure_table <- combine_de_tables(
  ethnicity_cure_de, keepers = ethnicity_contrasts,
  excel = glue("{eth_prefix}/ethnicity_cure_table-v{ver}.xlsx"))
ethnicity_cure_table
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 mestiza_vs_indigena 64 24 60 26
## 2 mestiza_vs_afrocol 68 168 78 167
## 3 indigena_vs_afrocol 87 352 97 340
## limma_sigup limma_sigdown
## 1 36 15
## 2 63 100
## 3 109 180
## Plot describing unique/shared genes in a differential expression table.
ethnicity_cure_table[["plots"]][["mestizo_indigenous"]][["deseq_ma_plots"]]
ethnicity_cure_table[["plots"]][["mestizo_afrocol"]][["deseq_ma_plots"]]
ethnicity_cure_table[["plots"]][["indigenous_afrocol"]][["deseq_ma_plots"]]
ethnicity_cure_sig <- extract_significant_genes(
  ethnicity_cure_table, excel = glue("{eth_prefix}/ethnicity_cure_sig-v{ver}.xlsx"))
ethnicity_cure_sig
## A set of genes deemed significant according to limma, edger, deseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down
## mestizo_indigenous 36 15 60 26 64 24
## mestizo_afrocol 63 100 78 167 68 168
## indigenous_afrocol 109 180 97 340 87 352
## basic_up basic_down
## mestizo_indigenous 0 1
## mestizo_afrocol 6 13
## indigenous_afrocol 46 63
Performed once with both clinics and again with only Tumaco.
tc_ethnicity_gp <- all_gprofiler(tc_ethnicity_sig)
pander::pander(sessionInfo())
R version 4.4.1 (2024-06-14)
Platform: x86_64-conda-linux-gnu
locale: C
attached base packages: stats4, stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: ruv(v.0.9.7.1), DOSE(v.3.28.2), forcats(v.1.0.0), dplyr(v.1.1.4), hpgltools(v.1.0), Matrix(v.1.6-5), glue(v.1.7.0), SummarizedExperiment(v.1.32.0), GenomicRanges(v.1.54.1), GenomeInfoDb(v.1.38.6), IRanges(v.2.36.0), S4Vectors(v.0.40.2), MatrixGenerics(v.1.14.0), matrixStats(v.1.2.0), Biobase(v.2.62.0) and BiocGenerics(v.0.48.1)
loaded via a namespace (and not attached): splines(v.4.4.1), later(v.1.3.2), BiocIO(v.1.12.0), ggplotify(v.0.1.2), bitops(v.1.0-7), filelock(v.1.0.3), tibble(v.3.2.1), R.oo(v.1.26.0), polyclip(v.1.10-6), preprocessCore(v.1.64.0), graph(v.1.80.0), XML(v.3.99-0.16.1), lifecycle(v.1.0.4), edgeR(v.4.0.16), doParallel(v.1.0.17), lattice(v.0.22-5), MASS(v.7.3-60.0.1), crosstalk(v.1.2.1), backports(v.1.4.1), magrittr(v.2.0.3), openxlsx(v.4.2.5.2), limma(v.3.58.1), plotly(v.4.10.4), sass(v.0.4.8), rmarkdown(v.2.25), jquerylib(v.0.1.4), yaml(v.2.3.8), httpuv(v.1.6.14), zip(v.2.3.1), cowplot(v.1.1.3), DBI(v.1.2.2), RColorBrewer(v.1.1-3), abind(v.1.4-5), zlibbioc(v.1.48.0), purrr(v.1.0.2), R.utils(v.2.12.3), ggraph(v.2.1.0), RCurl(v.1.98-1.14), yulab.utils(v.0.1.7), tweenr(v.2.0.2), sva(v.3.50.0), GenomeInfoDbData(v.1.2.11), enrichplot(v.1.22.0), ggrepel(v.0.9.5), tidytree(v.0.4.6), genefilter(v.1.84.0), Vennerable(v.3.1.0.9000), annotate(v.1.80.0), codetools(v.0.2-19), DelayedArray(v.0.28.0), ggforce(v.0.4.2), tidyselect(v.1.2.0), aplot(v.0.2.2), farver(v.2.1.1), viridis(v.0.6.5), BiocFileCache(v.2.10.1), GenomicAlignments(v.1.38.2), jsonlite(v.1.8.8), tidygraph(v.1.3.1), ellipsis(v.0.3.2), survival(v.3.5-8), iterators(v.1.0.14), foreach(v.1.5.2), tools(v.4.4.1), treeio(v.1.29.1), Rcpp(v.1.0.12), gridExtra(v.2.3), SparseArray(v.1.2.4), xfun(v.0.42), mgcv(v.1.9-1), DESeq2(v.1.42.0), qvalue(v.2.34.0), withr(v.3.0.0), BiocManager(v.1.30.25), fastmap(v.1.1.1), fansi(v.1.0.6), digest(v.0.6.34), gridGraphics(v.0.5-1), R6(v.2.5.1), mime(v.0.12), colorspace(v.2.1-0), GO.db(v.3.18.0), gtools(v.3.9.5), RSQLite(v.2.3.5), R.methodsS3(v.1.8.2), UpSetR(v.1.4.0), utf8(v.1.2.4), tidyr(v.1.3.1), generics(v.0.1.3), data.table(v.1.15.0), corpcor(v.1.6.10), rtracklayer(v.1.62.0), robustbase(v.0.99-4), graphlayouts(v.1.1.0), httr(v.1.4.7), htmlwidgets(v.1.6.4), S4Arrays(v.1.2.0), scatterpie(v.0.2.1), pkgconfig(v.2.0.3), gtable(v.0.3.4), blob(v.1.2.4), XVector(v.0.42.0), shadowtext(v.0.1.3), 
clusterProfiler(v.4.10.1), htmltools(v.0.5.7), fgsea(v.1.28.0), RBGL(v.1.78.0), GSEABase(v.1.64.0), ggupset(v.0.4.0), scales(v.1.3.0), png(v.0.1-8), ggfun(v.0.1.6), knitr(v.1.45), reshape2(v.1.4.4), rjson(v.0.2.21), nlme(v.3.1-164), curl(v.5.2.0), cachem(v.1.0.8), stringr(v.1.5.1), parallel(v.4.4.1), HDO.db(v.0.99.1), AnnotationDbi(v.1.64.1), restfulr(v.0.0.15), pillar(v.1.9.0), grid(v.4.4.1), vctrs(v.0.6.5), promises(v.1.2.1), dbplyr(v.2.4.0), xtable(v.1.8-4), evaluate(v.0.23), cli(v.3.6.2), locfit(v.1.5-9.8), compiler(v.4.4.1), Rsamtools(v.2.18.0), rlang(v.1.1.3), crayon(v.1.5.2), gprofiler2(v.0.2.3), labeling(v.0.4.3), plyr(v.1.8.9), fs(v.1.6.3), pander(v.0.6.5), stringi(v.1.8.3), viridisLite(v.0.4.2), BiocParallel(v.1.36.0), munsell(v.0.5.0), Biostrings(v.2.70.2), lazyeval(v.0.2.2), GOSemSim(v.2.28.1), patchwork(v.1.2.0), bit64(v.4.0.5), ggplot2(v.3.5.0), KEGGREST(v.1.42.0), statmod(v.1.5.0), varhandle(v.2.0.6), shiny(v.1.8.0), highr(v.0.10), igraph(v.2.0.2), broom(v.1.0.5), memoise(v.2.0.1), bslib(v.0.6.1), ggtree(v.3.13.1), fastmatch(v.1.1-4), DEoptimR(v.1.1-3), bit(v.4.0.5), gson(v.0.1.0) and ape(v.5.8)
message("This is hpgltools commit: ", get_git_commit())
## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset e94559f9353874aac76346ceb4db55016a142abb
## This is hpgltools commit: Fri Sep 27 15:44:30 2024 -0400: e94559f9353874aac76346ceb4db55016a142abb
message("Saving to ", savefile)
## Saving to 03differential_expression_both.rda.xz
# tmp <- sm(saveme(filename = savefile))
tmp <- loadme(filename = savefile)