The various differential expression analyses of the data generated in tmrc3_datasets will occur in this document. Most of the actual work is via the function ‘all_pairwise()’; the word ‘all’ in the name does a lot of work; it is responsible for performing every possible pairwise contrast using each possible method for which I have sufficient understanding to be able to write a reasonably robust pairwise function. Currently this is limited to:
The first 3 methods allow one to add surrogate variable estimates to the model when performing the differential expression analyses. Noiseq handles surrogates using its own heuristics, EBSeq is inimicable to that kind of model, and I explicitly chose to not make that possible for basic. I am uncertain at this time how the random effect factors used with dream interact with surrogates from sva. With that in mind, in most instances I usually deal with surrogates/batches in one of a few ways:
The last two options are handled via a function named ‘all_adjusters’ in hpgltools which is responsible for ensuring that the data is sane for the assumptions made by each method and invokes each method (hopefully) properly. It returns both modified counts and model estimates when possible and has implementations for a fair number of methods in this realm. sva is my favorite by a pretty big margin, though I do sometimes use RUV (Risso et al. (2014)) and of course, in writing this document I stumbled into another interesting contender: (Molania et al. (2023)) all_adjusters() also has implementations of every example/method I got out of the papers for sva (e.g. ssva/fsva), isva, smartsva, and some others.
I have been changing hpgltools so that it is now possible to trivially pass arbitrarily complex models to the various methods; with the caveat that there is no good way currently to mix fixed effects and random effects across methods; so I am running dream separately and adding it to the result of all_pairwise post-facto.
Note 202502: This last limitation has been lifted; the code is smart enough now to take arbitrarily complex models and simultaneously take a separate mixed model for dream.
Each of the following lists describes the set of contrasts that I think are interesting for the various ways one might consider the TMRC3 dataset. The variables are named according to the assumed data with which they will be used, thus t_cf_contrasts is expected to be used for the Tumaco data and provide a series of cure/fail comparisons among those samples. In every case, the name of the list element will be used as the contrast name, and will thus be seen as the sheet name in the output xlsx file(s); the two pieces of the character vector value are the numerator and denominator of the associated contrast.
t_cf_contrast <- list(
"outcome" = c("tumaco_failure", "tumaco_cure"))
cf_contrast <- list(
"outcome" = c("failure", "cure"))
visitcf_contrasts <- list(
"v1cf" = c("v1_failure", "v1_cure"),
"v2cf" = c("v2_failure", "v2_cure"),
"v3cf" = c("v3_failure", "v3_cure"))
visit_contrasts <- list(
"v2v1" = c("v2", "v1"),
"v3v1" = c("v3", "v1"),
"v3v2" = c("v3", "v2"))
visit_v1later <- list(
"later_vs_first" = c("later", "first"))
celltypes <- list(
"eo_mono" = c("eosinophils", "monocytes"),
"ne_mono" = c("neutrophils", "monocytes"),
"eo_ne" = c("eosinophils", "neutrophils"))
ethnicity_contrasts <- list(
"mestizo_indigenous" = c("mestiza", "indigena"),
"mestizo_afrocol" = c("mestiza", "afrocol"),
"indigenous_afrocol" = c("indigena", "afrocol"))
outcometype_contrasts <- list(
"monocyte_cf" = c("failure_monocytes", "cure_monocytes"),
"neutrophil_cf" = c("failure_neutrophils", "cure_neutrophils"),
"eosinophil_cf" = c("failure_eosinophils", "cure_eosinophils"))
visittype_contrasts_mono <- list(
"v2v1_mono_cure" = c("monocytes_2_cure", "monocytes_1_cure"),
"v2v1_mono_failure" = c("monocytes_2_failure", "monocytes_1_failure"),
"v3v1_mono_cure" = c("monocytes_3_cure", "monocytes_1_cure"),
"v3v1_mono_failure" = c("monocytes_3_failure", "monocytes_1_failure"))
visittype_contrasts_eo <- list(
"v2v1_eo_cure" = c("eosinophils_2_cure", "eosinophils_1_cure"),
"v2v1_eo_failure" = c("eosinophils_2_failure", "eosinophils_1_failure"),
"v3v1_eo_cure" = c("eosinophils_3_cure", "eosinophils_1_cure"),
"v3v1_eo_failure" = c("eosinophils_3_failure", "eosinophils_1_failure"))
visittype_contrasts_ne <- list(
"v2v1_ne_cure" = c("neutrophils_2_cure", "neutrophils_1_cure"),
"v2v1_ne_failure" = c("neutrophils_2_failure", "neutrophils_1_failure"),
"v3v1_ne_cure" = c("neutrophils_3_cure", "neutrophils_1_cure"),
"v3v1_ne_failure" = c("neutrophils_3_failure", "neutrophils_1_failure"))
visittype_contrasts <- c(visittype_contrasts_mono,
visittype_contrasts_eo,
visittype_contrasts_ne)The over representation analyses (e.g. GO and friends) have moved multiple times in the process of producing these analyses. I recently mentally severed my conception of GO analyses into two camps: over representation analyses in which one provides a group of genes deemed significant in some way and asks if there are known categories which contain these genes more than one would expect at random. In contrast, I am defining Gene Set Enrichment Analyses explcitly as the process of passing all observed genes with their metric of choice (logFC, exprs, whatever) and asking if the distribution of all genes is significant with respect to the genes in each category. With that in mind, I added a series of explicitly GSEA analyses in my later iterations of these documents so that both ways of thinking are provided.
However, I moved those analyses to a separate document (05enrichment.Rmd) in the hopes of improving their organization.
Start over, this time with only the samples from Tumaco. We currently are assuming these will prove to be the only analyses used for final interpretation. This is primarily because we have insufficient samples which failed treatment from Cali. There is one disadvantage when using these samples: they had to travel further than the samples taken in Cali and there is significant variance observed between the two locations and we cannot discern its source. In the worst case scenario (one which I think unlikely), the variance is caused by degraded RNA during transit. We do know that the samples were well-stored in RNALater and frozen/etc, so I am inclined to discount that possibility. (Also, looking at the reads in IGV they don’t ‘look’ degraded to me: e.g. all the reads are not at the 3’ end as I would expect if they were significantly degraded (given that these were poly-A selections.) I think a more compelling difference lies in the different population demographics observed in the two locations. Actually, now that I have typed these sentences out, I think I can semi-test this hypothesis by looking at the set of DE genes between the two locations and compare that result to the Tumaco (and/or Cali) ethnicity comparison which is most representative of the ethnicity differences between them. If I get it into my head to try this, I will need to load the DE tables from the 03differential_expression_both.Rmd document; so I am most likely to try it out in the 07var_coef document, which was mostly written by Theresa and is already examining some similar questions.
Start by considering all Tumaco cell types. Note that in this case we only use SVA, primarily because I am not certain what would be an appropriate batch factor, perhaps visit?
t_cf_clinical_de_sva <- all_pairwise(t_clinical, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## cure failure
## 67 56
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_clinical <- t_cf_clinical_de_sva[["input"]]
t_cf_clinical_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## falr_vs_cr
## basic_vs_deseq 0.8276
## basic_vs_dream 0.8746
## basic_vs_ebseq 0.6353
## basic_vs_edger 0.8319
## basic_vs_limma 0.8721
## basic_vs_noiseq 0.8599
## deseq_vs_dream 0.8482
## deseq_vs_ebseq 0.6986
## deseq_vs_edger 0.9846
## deseq_vs_limma 0.8081
## deseq_vs_noiseq 0.9082
## dream_vs_ebseq 0.7288
## dream_vs_edger 0.8460
## dream_vs_limma 0.9402
## dream_vs_noiseq 0.7946
## ebseq_vs_edger 0.6721
## ebseq_vs_limma 0.6069
## ebseq_vs_noiseq 0.6916
## edger_vs_limma 0.8196
## edger_vs_noiseq 0.9047
## limma_vs_noiseq 0.7678
t_cf_clinical_table_sva <- combine_de_tables(
t_cf_clinical_de_sva, keepers = cf_contrast, label_column = "hgnc_symbol",
excel = glue("{cf_prefix}/All_Samples/t_clinical_cf_table_sva-v{ver}.xlsx"))
t_cf_clinical_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure 92 185 102 159
## limma_sigup limma_sigdown
## 1 52 39
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_clinical_table_sva[["plots"]][["outcome"]][["deseq_ma_plots"]]t_cf_clinical_sig_sva <- extract_significant_genes(
t_cf_clinical_table_sva,
excel = glue("{cf_prefix}/All_Samples/t_clinical_cf_sig_sva-v{ver}.xlsx"))
t_cf_clinical_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 52 39 102 159 92 185 0
## ebseq_down basic_up basic_down
## outcome 48 39 0
dim(t_cf_clinical_sig_sva[["deseq"]][["ups"]][[1]])## [1] 92 75
dim(t_cf_clinical_sig_sva[["deseq"]][["downs"]][[1]])## [1] 185 75
Repeat without the biopsies.
t_cf_clinicalnb_de_sva <- all_pairwise(t_clinical_nobiop, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## cure failure
## 58 51
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_clinical_nobiop <- t_cf_clinicalnb_de_sva[["input"]]
t_cf_clinicalnb_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## falr_vs_cr
## basic_vs_deseq 0.8312
## basic_vs_dream 0.8504
## basic_vs_ebseq 0.7427
## basic_vs_edger 0.8398
## basic_vs_limma 0.8586
## basic_vs_noiseq 0.8938
## deseq_vs_dream 0.8545
## deseq_vs_ebseq 0.8331
## deseq_vs_edger 0.9966
## deseq_vs_limma 0.8567
## deseq_vs_noiseq 0.8996
## dream_vs_ebseq 0.7825
## dream_vs_edger 0.8567
## dream_vs_limma 0.9848
## dream_vs_noiseq 0.7786
## ebseq_vs_edger 0.8277
## ebseq_vs_limma 0.7824
## ebseq_vs_noiseq 0.8561
## edger_vs_limma 0.8594
## edger_vs_noiseq 0.9041
## limma_vs_noiseq 0.7875
t_cf_clinicalnb_table_sva <- combine_de_tables(
t_cf_clinicalnb_de_sva, keepers = cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/All_Samples/t_clinical_nobiop_cf_table_sva-v{ver}.xlsx"))
t_cf_clinicalnb_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure 125 77 126 62
## limma_sigup limma_sigdown
## 1 50 45
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_clinicalnb_table_sva[["plots"]][["outcome"]][["deseq_ma_plots"]]t_cf_clinicalnb_sig_sva <- extract_significant_genes(
t_cf_clinicalnb_table_sva,
excel = glue("{cf_prefix}/All_Samples/t_clinical_nobiop_cf_sig_sva-v{ver}.xlsx"))
t_cf_clinicalnb_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 50 45 126 62 125 77 1
## ebseq_down basic_up basic_down
## outcome 7 88 0
dim(t_cf_clinicalnb_sig_sva[["deseq"]][["ups"]][[1]])## [1] 125 82
dim(t_cf_clinicalnb_sig_sva[["deseq"]][["downs"]][[1]])## [1] 77 82
As the data structure’s name suggests, the above comparison seeks to learn if there are fail/cure differences discernable across all clinical celltypes in samples taken in Tumaco.
The set of steps taken in this previous block will be essentially repeated for every set of contrasts and way of mixing/matching the data and follows the path:
These datastructures are all exposed to various functions in hpgltools which allow one to poke/compare them; I am not a fan of Excel, but I think the xlsx documents it creates are pretty decent, too.
Later in this document I do a bunch of visit/cf comparisons. In this block I want to explicitly only compare v1 to other visits. This is something I did quite a lot in the 2019 datasets, but never actually moved to this document.
tv1_vs_later <- all_pairwise(t_v1vs, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## first later
## 40 69
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_v1vs <- tv1_vs_later[["input"]]
tv1_vs_later## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## ltr_vs_frs
## basic_vs_deseq 0.7966
## basic_vs_dream 0.8031
## basic_vs_ebseq 0.7515
## basic_vs_edger 0.8005
## basic_vs_limma 0.8145
## basic_vs_noiseq 0.8903
## deseq_vs_dream 0.8507
## deseq_vs_ebseq 0.7823
## deseq_vs_edger 0.9983
## deseq_vs_limma 0.8405
## deseq_vs_noiseq 0.8607
## dream_vs_ebseq 0.8109
## dream_vs_edger 0.8575
## dream_vs_limma 0.9717
## dream_vs_noiseq 0.7528
## ebseq_vs_edger 0.7883
## ebseq_vs_limma 0.7799
## ebseq_vs_noiseq 0.8285
## edger_vs_limma 0.8470
## edger_vs_noiseq 0.8648
## limma_vs_noiseq 0.7446
tv1_vs_later_table <- combine_de_tables(
tv1_vs_later, keepers = visit_v1later, scale_p = TRUE,
excel = glue("{xlsx_prefix}/DE_Visits/tv1_vs_later_tables-v{ver}.xlsx"))
tv1_vs_later_table## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 later_vs_first 23 7 22 7
## limma_sigup limma_sigdown
## 1 23 7
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
tv1_vs_later_sig <- extract_significant_genes(
tv1_vs_later_table,
excel = glue("{xlsx_prefix}/DE_Visits/tv1_vs_later_sig-v{ver}.xlsx"))
tv1_vs_later_sig## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down
## later_vs_first 23 7 22 7 23 7
## ebseq_up ebseq_down basic_up basic_down
## later_vs_first 0 0 0 0
There is an important caveat when considering the sex of people in the study: there are very few females who failed. As a result I primarily concerned with the cure samples male/female.
t_sex <- subset_expt(tc_sex, subset = "clinic == 'tumaco'")## subset_expt(): There were 184, now there are 123 samples.
t_sex## A modified expressionSet containing 19858 and 123 sample. There are 164 metadata columns and 15 annotation columns.
## The primary condition is comprised of:
## female, male.
## Its current state is: raw(data).
t_sex_de <- all_pairwise(t_sex, model_batch = "svaseq", methods = methods,
filter = TRUE)##
## female male
## 22 101
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_sex <- t_sex_de[["input"]]
t_sex_de## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## mal_vs_fml
## basic_vs_deseq 0.8689
## basic_vs_dream 0.9344
## basic_vs_ebseq 0.7114
## basic_vs_edger 0.8734
## basic_vs_limma 0.9465
## basic_vs_noiseq 0.8505
## deseq_vs_dream 0.8805
## deseq_vs_ebseq 0.7571
## deseq_vs_edger 0.9909
## deseq_vs_limma 0.8584
## deseq_vs_noiseq 0.9111
## dream_vs_ebseq 0.7959
## dream_vs_edger 0.8852
## dream_vs_limma 0.9763
## dream_vs_noiseq 0.8244
## ebseq_vs_edger 0.7766
## ebseq_vs_limma 0.7731
## ebseq_vs_noiseq 0.7535
## edger_vs_limma 0.8652
## edger_vs_noiseq 0.9107
## limma_vs_noiseq 0.8124
t_sex_table <- combine_de_tables(
t_sex_de, scale_p = TRUE,
excel = glue("{xlsx_prefix}/Gene_Set_Enrichment/t_sex_table-v{ver}.xlsx"))
t_sex_table## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 male_vs_female 127 96 115 95
## limma_sigup limma_sigdown
## 1 52 73
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_sex_sig <- extract_significant_genes(
t_sex_table, excel = glue("{xlsx_prefix}/Gene_Set_Enrichment/t_sex_sig-v{ver}.xlsx"))
t_sex_sig## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down
## male_vs_female 52 73 115 95 127 96
## ebseq_up ebseq_down basic_up basic_down
## male_vs_female 11 12 22 0
In the following block I removed the failed people so that the comparison makes actual sense.
tc_sex_cure <- subset_expt(tc_sex, subset = "finaloutcome=='cure'")## subset_expt(): There were 184, now there are 122 samples.
t_sex_cure <- subset_expt(tc_sex_cure, subset = "clinic == 'tumaco'")## subset_expt(): There were 122, now there are 67 samples.
t_sex_cure_de <- all_pairwise(t_sex_cure, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## female male
## 13 54
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_sex_cure <- t_sex_cure_de[["input"]]
t_sex_cure_de## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## mal_vs_fml
## basic_vs_deseq 0.7953
## basic_vs_dream 0.9199
## basic_vs_ebseq 0.6642
## basic_vs_edger 0.8448
## basic_vs_limma 0.9268
## basic_vs_noiseq 0.8776
## deseq_vs_dream 0.8058
## deseq_vs_ebseq 0.7192
## deseq_vs_edger 0.9272
## deseq_vs_limma 0.7764
## deseq_vs_noiseq 0.8423
## dream_vs_ebseq 0.7794
## dream_vs_edger 0.8606
## dream_vs_limma 0.9692
## dream_vs_noiseq 0.8393
## ebseq_vs_edger 0.7666
## ebseq_vs_limma 0.7422
## ebseq_vs_noiseq 0.7078
## edger_vs_limma 0.8357
## edger_vs_noiseq 0.8868
## limma_vs_noiseq 0.8127
t_sex_cure_table <- combine_de_tables(
t_sex_cure_de, scale_p = TRUE,
excel = glue("{xlsx_prefix}/DE_Sex/t_sex_cure_table-v{ver}.xlsx"))
t_sex_cure_table## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 male_vs_female 179 135 154 143
## limma_sigup limma_sigdown
## 1 63 108
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_sex_cure_sig <- extract_significant_genes(
t_sex_cure_table, excel = glue("{xlsx_prefix}/DE_Sex/t_sex_cure_sig-v{ver}.xlsx"))
t_sex_cure_sig## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down
## male_vs_female 63 108 154 143 179 135
## ebseq_up ebseq_down basic_up basic_down
## male_vs_female 10 15 10 0
In a fashion similar to the putative sex comparisons; there are few/no fails for one ethnicity. In addition, the observed ethnicities are very different for the two clinics. This makes comparisons of the ethnicities tricky.
t_ethnicity_de <- all_pairwise(t_etnia_expt, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## afrocol indigena mestiza
## 76 19 28
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_etnia_expt <- t_ethnicity_de[["input"]]
t_ethnicity_de## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_ethnicity_table <- combine_de_tables(
t_ethnicity_de, keepers = ethnicity_contrasts, scale_p = TRUE,
excel = glue("{xlsx_prefix}/DE_Ethnicity/t_ethnicity_table-v{ver}.xlsx"))
t_ethnicity_table## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 mestiza_vs_indigena 78 96 65 107
## 2 mestiza_vs_afrocol 55 92 50 96
## 3 indigena_vs_afrocol 160 235 183 213
## limma_sigup limma_sigdown
## 1 57 55
## 2 40 53
## 3 163 145
## Plot describing unique/shared genes in a differential expression table.
t_ethnicity_sig <- extract_significant_genes(
t_ethnicity_table, according_to = "deseq",
excel = glue("{xlsx_prefix}/DE_Ethnicity/t_ethnicity_sig-v{ver}.xlsx"))
t_ethnicity_sig## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## mestizo_indigenous 78 96
## mestizo_afrocol 55 92
## indigenous_afrocol 160 235
One of the most compelling ideas in the data is the opportunity to find genes in the first visit which may help predict the likelihood that a person will respond well to treatment. The following block will therefore look at cure/fail from Tumaco at visit 1.
t_cf_clinical_v1_de_sva <- all_pairwise(tv1_samples, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## cure failure
## 30 24
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tv1_samples <- t_cf_clinical_v1_de_sva[["input"]]
t_cf_clinical_v1_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## falr_vs_cr
## basic_vs_deseq 0.6931
## basic_vs_dream 0.7284
## basic_vs_ebseq 0.6525
## basic_vs_edger 0.7205
## basic_vs_limma 0.6839
## basic_vs_noiseq 0.8249
## deseq_vs_dream 0.7920
## deseq_vs_ebseq 0.7110
## deseq_vs_edger 0.9533
## deseq_vs_limma 0.7404
## deseq_vs_noiseq 0.7787
## dream_vs_ebseq 0.6911
## dream_vs_edger 0.8280
## dream_vs_limma 0.9333
## dream_vs_noiseq 0.6880
## ebseq_vs_edger 0.6777
## ebseq_vs_limma 0.5517
## ebseq_vs_noiseq 0.7745
## edger_vs_limma 0.7838
## edger_vs_noiseq 0.7871
## limma_vs_noiseq 0.5946
t_cf_clinical_v1_table_sva <- combine_de_tables(
t_cf_clinical_v1_de_sva, keepers = cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Visits/t_clinical_v1_cf_table_sva-v{ver}.xlsx"))
t_cf_clinical_v1_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure 28 74 26 54
## limma_sigup limma_sigdown
## 1 2 3
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_clinical_v1_sig_sva <- extract_significant_genes(
t_cf_clinical_v1_table_sva,
excel = glue("{cf_prefix}/Visits/t_clinical_v1_cf_sig_sva-v{ver}.xlsx"))
t_cf_clinical_v1_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 2 3 26 54 28 74 0
## ebseq_down basic_up basic_down
## outcome 37 0 0
dim(t_cf_clinical_v1_sig_sva[["deseq"]][["ups"]][[1]])## [1] 28 82
dim(t_cf_clinical_v1_sig_sva[["deseq"]][["downs"]][[1]])## [1] 74 82
The visit 2 and visit 3 samples are interesting because they provide an opportunity to see if we can observe changes in response in the middle and end of treatment…
t_cf_clinical_v2_de_sva <- all_pairwise(tv2_samples, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## cure failure
## 20 15
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tv2_samples <- t_cf_clinical_v2_de_sva[["input"]]
t_cf_clinical_v2_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## falr_vs_cr
## basic_vs_deseq 0.7736
## basic_vs_dream 0.7195
## basic_vs_ebseq 0.7229
## basic_vs_edger 0.7749
## basic_vs_limma 0.7424
## basic_vs_noiseq 0.8551
## deseq_vs_dream 0.8077
## deseq_vs_ebseq 0.7925
## deseq_vs_edger 0.9986
## deseq_vs_limma 0.8163
## deseq_vs_noiseq 0.8446
## dream_vs_ebseq 0.6840
## dream_vs_edger 0.8102
## dream_vs_limma 0.9635
## dream_vs_noiseq 0.6059
## ebseq_vs_edger 0.7960
## ebseq_vs_limma 0.6918
## ebseq_vs_noiseq 0.8214
## edger_vs_limma 0.8187
## edger_vs_noiseq 0.8434
## limma_vs_noiseq 0.6312
t_cf_clinical_v2_table_sva <- combine_de_tables(
t_cf_clinical_v2_de_sva, keepers = cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Visits/t_clinical_v2_cf_table_sva-v{ver}.xlsx"))
t_cf_clinical_v2_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure 51 15 50 11
## limma_sigup limma_sigdown
## 1 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_clinical_v2_sig_sva <- extract_significant_genes(
t_cf_clinical_v2_table_sva,
excel = glue("{cf_prefix}/Visits/t_clinical_v2_cf_sig_sva-v{ver}.xlsx"))
t_cf_clinical_v2_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 0 0 50 11 51 15 0
## ebseq_down basic_up basic_down
## outcome 0 0 0
dim(t_cf_clinical_v2_sig_sva[["deseq"]][["ups"]][[1]])## [1] 51 82
dim(t_cf_clinical_v2_sig_sva[["deseq"]][["downs"]][[1]])## [1] 15 82
Repeat for visit 3
t_cf_clinical_v3_de_sva <- all_pairwise(tv3_samples, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## cure failure
## 17 17
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tv3_samples <- t_cf_clinical_v3_de_sva[["input"]]
t_cf_clinical_v3_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## falr_vs_cr
## basic_vs_deseq 0.7970
## basic_vs_dream 0.8029
## basic_vs_ebseq 0.7592
## basic_vs_edger 0.7992
## basic_vs_limma 0.8145
## basic_vs_noiseq 0.8989
## deseq_vs_dream 0.8560
## deseq_vs_ebseq 0.7991
## deseq_vs_edger 0.9983
## deseq_vs_limma 0.8538
## deseq_vs_noiseq 0.8720
## dream_vs_ebseq 0.7627
## dream_vs_edger 0.8602
## dream_vs_limma 0.9815
## dream_vs_noiseq 0.7324
## ebseq_vs_edger 0.8014
## ebseq_vs_limma 0.7578
## ebseq_vs_noiseq 0.8467
## edger_vs_limma 0.8578
## edger_vs_noiseq 0.8734
## limma_vs_noiseq 0.7356
t_cf_clinical_v3_table_sva <- combine_de_tables(
t_cf_clinical_v3_de_sva, keepers = cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Visits/t_clinical_v3_cf_table_sva-v{ver}.xlsx"))
t_cf_clinical_v3_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure 112 61 113 45
## limma_sigup limma_sigdown
## 1 3 1
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_clinical_v3_sig_sva <- extract_significant_genes(
t_cf_clinical_v3_table_sva,
excel = glue("{cf_prefix}/Visits/t_clinical_v3_cf_sig_sva-v{ver}.xlsx"))
t_cf_clinical_v3_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 3 1 113 45 112 61 0
## ebseq_down basic_up basic_down
## outcome 0 0 0
dim(t_cf_clinical_v3_sig_sva[["deseq"]][["ups"]][[1]])## [1] 112 82
dim(t_cf_clinical_v3_sig_sva[["deseq"]][["downs"]][[1]])## [1] 61 82
Now let us switch our view to each individual cell type collected. The hope here is that we will be able to learn some cell-specific differences in the response for people who did(not) respond well.
A primary hypothesis/assumption that we have held for quite a while with this data: the biopsy samples, given that they are comprised of hetergeneous tissue types as well as a mix of healthy and infected tissue; are unlikely to be very information rich vis a vis cure/fail. The following block seems to support that; we observe very few genes in the biopsies.
I therefore did not spend the time invoking other models.
t_cf_biopsy_de_sva <- all_pairwise(t_biopsies, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 9 5
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_biopsies <- t_cf_biopsy_de_sva[["input"]]
t_cf_biopsy_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8180
## basic_vs_dream 0.8530
## basic_vs_ebseq 0.8008
## basic_vs_edger 0.8825
## basic_vs_limma 0.8505
## basic_vs_noiseq 0.9168
## deseq_vs_dream 0.7998
## deseq_vs_ebseq 0.8643
## deseq_vs_edger 0.9517
## deseq_vs_limma 0.7924
## deseq_vs_noiseq 0.8702
## dream_vs_ebseq 0.7540
## dream_vs_edger 0.8694
## dream_vs_limma 0.9938
## dream_vs_noiseq 0.7766
## ebseq_vs_edger 0.8857
## ebseq_vs_limma 0.7345
## ebseq_vs_noiseq 0.8877
## edger_vs_limma 0.8625
## edger_vs_noiseq 0.9195
## limma_vs_noiseq 0.7667
t_cf_biopsy_table_sva <- combine_de_tables(
t_cf_biopsy_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Biopsies/t_biopsy_cf_table_sva-v{ver}.xlsx"))
t_cf_biopsy_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 16 11 19
## edger_sigdown limma_sigup limma_sigdown
## 1 15 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_biopsy_sig_sva <- extract_significant_genes(
t_cf_biopsy_table_sva,
excel = glue("{cf_prefix}/Biopsies/t_cf_biopsy_sig_sva-v{ver}.xlsx"))
t_cf_biopsy_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 0 0 19 15 16 11 11
## ebseq_down basic_up basic_down
## outcome 57 0 0
dim(t_cf_biopsy_sig_sva[["deseq"]][["ups"]][[1]])## [1] 16 82
dim(t_cf_biopsy_sig_sva[["deseq"]][["downs"]][[1]])## [1] 11 82
Same question, but this time looking at monocytes. In addition, this comparison was done twice, once using SVA and once using visit as a batch factor.
I have been using this block to ensure that changed I have been making to the hpgltools do not change the analysis results. Thus the comment with a few logFC values; those are the first 6 observed DESeq2 logFC values in my last result before I made some changes to hpgltools in order to be able to work with random effect models.
t_cf_monocyte_de_sva <- all_pairwise(t_monocytes, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 21 21
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## The svs are added to the expressionset during all_pairwise.
t_monocytes <- t_cf_monocyte_de_sva[["input"]]
t_cf_monocyte_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8606
## basic_vs_dream 0.9249
## basic_vs_ebseq 0.8504
## basic_vs_edger 0.8654
## basic_vs_limma 0.9288
## basic_vs_noiseq 0.9525
## deseq_vs_dream 0.8739
## deseq_vs_ebseq 0.8724
## deseq_vs_edger 0.9990
## deseq_vs_limma 0.8639
## deseq_vs_noiseq 0.9062
## dream_vs_ebseq 0.7940
## dream_vs_edger 0.8776
## dream_vs_limma 0.9905
## dream_vs_noiseq 0.8896
## ebseq_vs_edger 0.8727
## ebseq_vs_limma 0.7856
## ebseq_vs_noiseq 0.8937
## edger_vs_limma 0.8683
## edger_vs_noiseq 0.9102
## limma_vs_noiseq 0.8796
t_cf_monocyte_table_sva <- combine_de_tables(
t_cf_monocyte_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Monocytes/t_monocyte_cf_table_sva-v{ver}.xlsx"))
t_cf_monocyte_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 59 56 57
## edger_sigdown limma_sigup limma_sigdown
## 1 54 11 35
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
head(t_cf_monocyte_table_sva[["data"]][["outcome"]][["deseq_logfc"]])## [1] 0.33530 -0.06331 0.10740 -0.09094 -0.13910 0.23430
## The first few values in my pre-change result set are:
## 0.338, -0.072, 0.097, -0.091, -0.135, 0.233
t_cf_monocyte_sig_sva <- extract_significant_genes(
t_cf_monocyte_table_sva,
excel = glue("{cf_prefix}/Monocytes/t_monocyte_cf_sig_sva-v{ver}.xlsx"))
t_cf_monocyte_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 11 35 57 54 59 56 0
## ebseq_down basic_up basic_down
## outcome 23 339 0
dim(t_cf_monocyte_sig_sva[["deseq"]][["ups"]][[1]])## [1] 59 82
dim(t_cf_monocyte_sig_sva[["deseq"]][["downs"]][[1]])## [1] 56 82
t_cf_monocyte_de_batchvisit <- all_pairwise(t_monocytes, model_batch = TRUE,
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 21 21
##
## v3 v2 v1
## 13 13 16
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_monocyte_de_batchvisit## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: batch in model/limma.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8539
## basic_vs_dream 0.9422
## basic_vs_ebseq 0.8504
## basic_vs_edger 0.8574
## basic_vs_limma 0.9519
## basic_vs_noiseq 0.9525
## deseq_vs_dream 0.8213
## deseq_vs_ebseq 0.9932
## deseq_vs_edger 0.9998
## deseq_vs_limma 0.8157
## deseq_vs_noiseq 0.8919
## dream_vs_ebseq 0.8054
## dream_vs_edger 0.8238
## dream_vs_limma 0.9819
## dream_vs_noiseq 0.9094
## ebseq_vs_edger 0.9935
## ebseq_vs_limma 0.7992
## ebseq_vs_noiseq 0.8937
## edger_vs_limma 0.8187
## edger_vs_noiseq 0.8947
## limma_vs_noiseq 0.9014
t_cf_monocyte_table_batchvisit <- combine_de_tables(
t_cf_monocyte_de_batchvisit, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Monocytes/t_monocyte_cf_table_batchvisit-v{ver}.xlsx"))
t_cf_monocyte_table_batchvisit## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 43 93 47
## edger_sigdown limma_sigup limma_sigdown
## 1 105 6 13
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_monocyte_sig_batchvisit <- extract_significant_genes(
t_cf_monocyte_table_batchvisit,
excel = glue("{cf_prefix}/Monocytes/t_monocyte_cf_sig_batchvisit-v{ver}.xlsx"))
t_cf_monocyte_sig_batchvisit## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 6 13 47 105 43 93 0
## ebseq_down basic_up basic_down
## outcome 23 339 0
dim(t_cf_monocyte_sig_batchvisit[["deseq"]][["ups"]][[1]])## [1] 43 82
dim(t_cf_monocyte_sig_batchvisit[["deseq"]][["downs"]][[1]])## [1] 93 82
Now focus in on the monocyte samples on a per-visit basis.
t_cf_monocyte_v1_de_sva <- all_pairwise(tv1_monocytes, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 8 8
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tv1_monocytes <- t_cf_monocyte_v1_de_sva[["input"]]
t_cf_monocyte_v1_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8923
## basic_vs_dream 0.9319
## basic_vs_ebseq 0.9086
## basic_vs_edger 0.8924
## basic_vs_limma 0.9461
## basic_vs_noiseq 0.9491
## deseq_vs_dream 0.9007
## deseq_vs_ebseq 0.9025
## deseq_vs_edger 1.0000
## deseq_vs_limma 0.8929
## deseq_vs_noiseq 0.9168
## dream_vs_ebseq 0.8397
## dream_vs_edger 0.9006
## dream_vs_limma 0.9830
## dream_vs_noiseq 0.9003
## ebseq_vs_edger 0.9027
## ebseq_vs_limma 0.8319
## ebseq_vs_noiseq 0.9503
## edger_vs_limma 0.8930
## edger_vs_noiseq 0.9171
## limma_vs_noiseq 0.8861
t_cf_monocyte_v1_table_sva <- combine_de_tables(
t_cf_monocyte_v1_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Monocytes/t_monocyte_v1_cf_table_sva-v{ver}.xlsx"))
t_cf_monocyte_v1_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 14 52 15
## edger_sigdown limma_sigup limma_sigdown
## 1 58 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_monocyte_v1_sig_sva <- extract_significant_genes(
t_cf_monocyte_v1_table_sva,
excel = glue("{cf_prefix}/Monocytes/t_monocyte_v1_cf_sig_sva-v{ver}.xlsx"))
t_cf_monocyte_v1_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 0 0 15 58 14 52 0
## ebseq_down basic_up basic_down
## outcome 15 0 0
dim(t_cf_monocyte_v1_sig_sva[["deseq"]][["ups"]][[1]])## [1] 14 82
dim(t_cf_monocyte_v1_sig_sva[["deseq"]][["downs"]][[1]])## [1] 52 82
sva_aucc <- calculate_aucc(t_cf_monocyte_table_sva[["data"]][[1]],
tbl2 = t_cf_monocyte_table_batchvisit[["data"]][[1]],
py = "deseq_adjp", ly = "deseq_logfc")
sva_aucc## These two tables have an aucc value of: 0.692803577805363 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 196, df = 10838, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8792 0.8875
## sample estimates:
## cor
## 0.8834
shared_ids <- rownames(t_cf_monocyte_table_sva[["data"]][[1]]) %in%
rownames(t_cf_monocyte_table_batchvisit[["data"]][[1]])
first <- t_cf_monocyte_table_sva[["data"]][[1]][shared_ids, ]
second <- t_cf_monocyte_table_batchvisit[["data"]][[1]][rownames(first), ]
cor.test(first[["deseq_logfc"]], second[["deseq_logfc"]])##
## Pearson's product-moment correlation
##
## data: first[["deseq_logfc"]] and second[["deseq_logfc"]]
## t = 196, df = 10838, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8792 0.8875
## sample estimates:
## cor
## 0.8834
Switch context to the Neutrophils, once again repeat the analysis using SVA and visit as a batch factor.
t_cf_neutrophil_de_sva <- all_pairwise(t_neutrophils, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 20 21
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_neutrophil_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8718
## basic_vs_dream 0.9181
## basic_vs_ebseq 0.8610
## basic_vs_edger 0.8770
## basic_vs_limma 0.9324
## basic_vs_noiseq 0.9466
## deseq_vs_dream 0.8863
## deseq_vs_ebseq 0.9071
## deseq_vs_edger 0.9996
## deseq_vs_limma 0.8783
## deseq_vs_noiseq 0.9345
## dream_vs_ebseq 0.8374
## dream_vs_edger 0.8899
## dream_vs_limma 0.9852
## dream_vs_noiseq 0.8928
## ebseq_vs_edger 0.9077
## ebseq_vs_limma 0.8465
## ebseq_vs_noiseq 0.9238
## edger_vs_limma 0.8818
## edger_vs_noiseq 0.9378
## limma_vs_noiseq 0.8974
t_cf_neutrophil_table_sva <- combine_de_tables(
t_cf_neutrophil_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_cf_table_sva-v{ver}.xlsx"))
t_cf_neutrophil_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 127 30 110
## edger_sigdown limma_sigup limma_sigdown
## 1 27 14 11
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_neutrophil_sig_sva <- extract_significant_genes(
t_cf_neutrophil_table_sva,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_cf_sig_sva-v{ver}.xlsx"))
t_cf_neutrophil_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 14 11 110 27 127 30 7
## ebseq_down basic_up basic_down
## outcome 7 8 0
dim(t_cf_neutrophil_sig_sva[["deseq"]][["ups"]][[1]])## [1] 127 82
dim(t_cf_neutrophil_sig_sva[["deseq"]][["downs"]][[1]])## [1] 30 82
t_cf_neutrophil_de_batchvisit <- all_pairwise(t_neutrophils, model_batch = TRUE,
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 20 21
##
## v3 v2 v1
## 12 13 16
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_neutrophil_de_batchvisit## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: batch in model/limma.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8668
## basic_vs_dream 0.9580
## basic_vs_ebseq 0.8610
## basic_vs_edger 0.8695
## basic_vs_limma 0.9665
## basic_vs_noiseq 0.9466
## deseq_vs_dream 0.8384
## deseq_vs_ebseq 0.9812
## deseq_vs_edger 0.9999
## deseq_vs_limma 0.8405
## deseq_vs_noiseq 0.9204
## dream_vs_ebseq 0.8294
## dream_vs_edger 0.8405
## dream_vs_limma 0.9839
## dream_vs_noiseq 0.9168
## ebseq_vs_edger 0.9817
## ebseq_vs_limma 0.8310
## ebseq_vs_noiseq 0.9238
## edger_vs_limma 0.8427
## edger_vs_noiseq 0.9225
## limma_vs_noiseq 0.9134
t_cf_neutrophil_table_batchvisit <- combine_de_tables(
t_cf_neutrophil_de_batchvisit, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_cf_table_batchvisit-v{ver}.xlsx"))
t_cf_neutrophil_table_batchvisit## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 92 46 101
## edger_sigdown limma_sigup limma_sigdown
## 1 43 3 1
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_neutrophil_sig_batchvisit <- extract_significant_genes(
t_cf_neutrophil_table_batchvisit,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_cf_sig_batchvisit-v{ver}.xlsx"))
t_cf_neutrophil_sig_batchvisit## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 3 1 101 43 92 46 7
## ebseq_down basic_up basic_down
## outcome 7 8 0
dim(t_cf_neutrophil_sig_batchvisit[["deseq"]][["ups"]][[1]])## [1] 92 82
dim(t_cf_neutrophil_sig_batchvisit[["deseq"]][["downs"]][[1]])## [1] 46 82
When I did this with the monocytes, I split it up into multiple blocks for each visit. This time I am just going to run them all together.
visitcf_factor <- paste0(pData(t_neutrophils)[["visitnumber"]], "_",
pData(t_neutrophils)[["finaloutcome"]])
t_neutrophil_visitcf <- set_expt_conditions(t_neutrophils, fact = visitcf_factor)## The numbers of samples by condition are:
##
## v1_cure v1_failure v2_cure v2_failure v3_cure v3_failure
## 8 8 7 6 5 7
t_cf_neutrophil_visits_de_sva <- all_pairwise(t_neutrophil_visitcf, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## v1_cure v1_failure v2_cure v2_failure v3_cure v3_failure
## 8 8 7 6 5 7
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_neutrophil_visits_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_cf_neutrophil_visits_table_sva <- combine_de_tables(
t_cf_neutrophil_visits_de_sva, keepers = visitcf_contrasts, scale_p = TRUE,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_visitcf_table_sva-v{ver}.xlsx"))
t_cf_neutrophil_visits_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 v1_failure_vs_v1_cure 12 6 5 6
## 2 v2_failure_vs_v2_cure 2 6 2 3
## 3 v3_failure_vs_v3_cure 2 2 0 2
## limma_sigup limma_sigdown
## 1 1 0
## 2 0 0
## 3 0 0
## Plot describing unique/shared genes in a differential expression table.
t_cf_neutrophil_visits_sig_sva <- extract_significant_genes(
t_cf_neutrophil_visits_table_sva,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_visitcf_sig_sva-v{ver}.xlsx"))
t_cf_neutrophil_visits_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## v1cf 1 0 5 6 12 6 0
## v2cf 0 0 2 3 2 6 1
## v3cf 0 0 0 2 2 2 2
## ebseq_down basic_up basic_down
## v1cf 2 0 0
## v2cf 1 0 0
## v3cf 3 0 0
dim(t_cf_neutrophil_visits_sig_sva[["deseq"]][["ups"]][[1]])## [1] 12 82
dim(t_cf_neutrophil_visits_sig_sva[["deseq"]][["downs"]][[1]])## [1] 6 82
Now visit 1.
t_cf_neutrophil_v1_de_sva <- all_pairwise(tv1_neutrophils, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 8 8
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_neutrophil_v1_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8615
## basic_vs_dream 0.8836
## basic_vs_ebseq 0.8722
## basic_vs_edger 0.8779
## basic_vs_limma 0.8917
## basic_vs_noiseq 0.9340
## deseq_vs_dream 0.8246
## deseq_vs_ebseq 0.9451
## deseq_vs_edger 0.9946
## deseq_vs_limma 0.8208
## deseq_vs_noiseq 0.9226
## dream_vs_ebseq 0.7937
## dream_vs_edger 0.8402
## dream_vs_limma 0.9755
## dream_vs_noiseq 0.8441
## ebseq_vs_edger 0.9453
## ebseq_vs_limma 0.8007
## ebseq_vs_noiseq 0.9456
## edger_vs_limma 0.8347
## edger_vs_noiseq 0.9293
## limma_vs_noiseq 0.8390
t_cf_neutrophil_v1_table_sva <- combine_de_tables(
t_cf_neutrophil_v1_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_v1_cf_table_sva-v{ver}.xlsx"))
t_cf_neutrophil_v1_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 5 8 5
## edger_sigdown limma_sigup limma_sigdown
## 1 11 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_neutrophil_v1_sig_sva <- extract_significant_genes(
t_cf_neutrophil_v1_table_sva,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_v1_cf_sig_sva-v{ver}.xlsx"))
t_cf_neutrophil_v1_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 0 0 5 11 5 8 0
## ebseq_down basic_up basic_down
## outcome 2 0 0
dim(t_cf_neutrophil_v1_sig_sva[["deseq"]][["ups"]][[1]])## [1] 5 82
dim(t_cf_neutrophil_v1_sig_sva[["deseq"]][["downs"]][[1]])## [1] 8 82
Followed by visit 2.
t_cf_neutrophil_v2_de_sva <- all_pairwise(tv2_neutrophils, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 7 6
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_neutrophil_v2_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.9055
## basic_vs_dream 0.9521
## basic_vs_ebseq 0.8989
## basic_vs_edger 0.9059
## basic_vs_limma 0.9491
## basic_vs_noiseq 0.9226
## deseq_vs_dream 0.9004
## deseq_vs_ebseq 0.9783
## deseq_vs_edger 0.9986
## deseq_vs_limma 0.8932
## deseq_vs_noiseq 0.9661
## dream_vs_ebseq 0.8761
## dream_vs_edger 0.8987
## dream_vs_limma 0.9939
## dream_vs_noiseq 0.9000
## ebseq_vs_edger 0.9760
## ebseq_vs_limma 0.8673
## ebseq_vs_noiseq 0.9768
## edger_vs_limma 0.8919
## edger_vs_noiseq 0.9642
## limma_vs_noiseq 0.8912
t_cf_neutrophil_v2_table_sva <- combine_de_tables(
t_cf_neutrophil_v2_de_sva, scale_p = TRUE, keepers = t_cf_contrast,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_v2_cf_table_sva-v{ver}.xlsx"))
t_cf_neutrophil_v2_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 9 5 20
## edger_sigdown limma_sigup limma_sigdown
## 1 6 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_neutrophil_v2_sig_sva <- extract_significant_genes(
t_cf_neutrophil_v2_table_sva,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_v2_cf_sig_sva-v{ver}.xlsx"))
t_cf_neutrophil_v2_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 0 0 20 6 9 5 1
## ebseq_down basic_up basic_down
## outcome 1 0 0
dim(t_cf_neutrophil_v2_sig_sva[["deseq"]][["ups"]][[1]])## [1] 9 82
dim(t_cf_neutrophil_v2_sig_sva[["deseq"]][["downs"]][[1]])## [1] 5 82
and visit 3.
t_cf_neutrophil_v3_de_sva <- all_pairwise(tv3_neutrophils, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 5 7
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_neutrophil_v3_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.7570
## basic_vs_dream 0.7911
## basic_vs_ebseq 0.8744
## basic_vs_edger 0.7556
## basic_vs_limma 0.7991
## basic_vs_noiseq 0.9220
## deseq_vs_dream 0.8922
## deseq_vs_ebseq 0.7587
## deseq_vs_edger 0.9992
## deseq_vs_limma 0.8840
## deseq_vs_noiseq 0.8323
## dream_vs_ebseq 0.7694
## dream_vs_edger 0.8936
## dream_vs_limma 0.9846
## dream_vs_noiseq 0.7835
## ebseq_vs_edger 0.7635
## ebseq_vs_limma 0.7523
## ebseq_vs_noiseq 0.9518
## edger_vs_limma 0.8852
## edger_vs_noiseq 0.8340
## limma_vs_noiseq 0.7683
t_cf_neutrophil_v3_table_sva <- combine_de_tables(
t_cf_neutrophil_v3_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_v3_cf_table_sva-v{ver}.xlsx"))
t_cf_neutrophil_v3_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 5 1 6
## edger_sigdown limma_sigup limma_sigdown
## 1 1 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_neutrophil_v3_sig_sva <- extract_significant_genes(
t_cf_neutrophil_v3_table_sva,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_v3_cf_sig_sva-v{ver}.xlsx"))
t_cf_neutrophil_v3_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 0 0 6 1 5 1 2
## ebseq_down basic_up basic_down
## outcome 3 0 0
dim(t_cf_neutrophil_v3_sig_sva[["deseq"]][["ups"]][[1]])## [1] 5 82
dim(t_cf_neutrophil_v3_sig_sva[["deseq"]][["downs"]][[1]])## [1] 1 82
sva_aucc <- calculate_aucc(t_cf_neutrophil_table_sva[["data"]][[1]],
tbl2 = t_cf_neutrophil_table_batchvisit[["data"]][[1]],
py = "deseq_adjp", ly = "deseq_logfc")
sva_aucc## These two tables have an aucc value of: 0.67034417727487 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 210, df = 9086, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9070 0.9141
## sample estimates:
## cor
## 0.9106
shared_ids <- rownames(t_cf_neutrophil_table_sva[["data"]][[1]]) %in%
rownames(t_cf_neutrophil_table_batchvisit[["data"]][[1]])
first <- t_cf_neutrophil_table_sva[["data"]][[1]][shared_ids, ]
second <- t_cf_neutrophil_table_batchvisit[["data"]][[1]][rownames(first), ]
cor.test(first[["deseq_logfc"]], second[["deseq_logfc"]])##
## Pearson's product-moment correlation
##
## data: first[["deseq_logfc"]] and second[["deseq_logfc"]]
## t = 210, df = 9086, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9070 0.9141
## sample estimates:
## cor
## 0.9106
This time, with feeling! Repeating the same set of tasks with the eosinophil samples.
t_cf_eosinophil_de_sva <- all_pairwise(t_eosinophils, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 17 9
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_eosinophil_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8356
## basic_vs_dream 0.8608
## basic_vs_ebseq 0.8645
## basic_vs_edger 0.8402
## basic_vs_limma 0.8797
## basic_vs_noiseq 0.9087
## deseq_vs_dream 0.9083
## deseq_vs_ebseq 0.8118
## deseq_vs_edger 0.9993
## deseq_vs_limma 0.8949
## deseq_vs_noiseq 0.8694
## dream_vs_ebseq 0.7897
## dream_vs_edger 0.9139
## dream_vs_limma 0.9833
## dream_vs_noiseq 0.8420
## ebseq_vs_edger 0.8161
## ebseq_vs_limma 0.7941
## ebseq_vs_noiseq 0.8991
## edger_vs_limma 0.9004
## edger_vs_noiseq 0.8740
## limma_vs_noiseq 0.8137
t_cf_eosinophil_table_sva <- combine_de_tables(
t_cf_eosinophil_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_cf_table_sva-v{ver}.xlsx"))
t_cf_eosinophil_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 110 70 104
## edger_sigdown limma_sigup limma_sigdown
## 1 57 54 34
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_eosinophil_sig_sva <- extract_significant_genes(
t_cf_eosinophil_table_sva,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_cf_sig_sva-v{ver}.xlsx"))
t_cf_eosinophil_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 54 34 104 57 110 70 6
## ebseq_down basic_up basic_down
## outcome 33 10 0
dim(t_cf_eosinophil_sig_sva[["deseq"]][["ups"]][[1]])## [1] 110 82
dim(t_cf_eosinophil_sig_sva[["deseq"]][["downs"]][[1]])## [1] 70 82
head(t_cf_eosinophil_sig_sva[["deseq"]][["ups"]][[1]])head(t_cf_eosinophil_sig_sva[["deseq"]][["downs"]][[1]])Repeat with batch in the model.
t_cf_eosinophil_de_batchvisit <- all_pairwise(t_eosinophils, model_batch = TRUE,
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 17 9
##
## v3 v2 v1
## 9 9 8
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_eosinophil_de_batchvisit## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: batch in model/limma.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8969
## basic_vs_dream 0.9184
## basic_vs_ebseq 0.8645
## basic_vs_edger 0.8985
## basic_vs_limma 0.9675
## basic_vs_noiseq 0.9087
## deseq_vs_dream 0.8489
## deseq_vs_ebseq 0.9517
## deseq_vs_edger 0.9998
## deseq_vs_limma 0.8680
## deseq_vs_noiseq 0.9020
## dream_vs_ebseq 0.8127
## dream_vs_edger 0.8513
## dream_vs_limma 0.9482
## dream_vs_noiseq 0.9275
## ebseq_vs_edger 0.9556
## ebseq_vs_limma 0.8333
## ebseq_vs_noiseq 0.8991
## edger_vs_limma 0.8699
## edger_vs_noiseq 0.9053
## limma_vs_noiseq 0.8801
t_cf_eosinophil_table_batchvisit <- combine_de_tables(
t_cf_eosinophil_de_batchvisit, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_cf_table_batchvisit-v{ver}.xlsx"))
t_cf_eosinophil_table_batchvisit## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 99 35 103
## edger_sigdown limma_sigup limma_sigdown
## 1 24 35 15
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_eosinophil_sig_batchvisit <- extract_significant_genes(
t_cf_eosinophil_table_batchvisit,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_cf_sig_batchvisit-v{ver}.xlsx"))
t_cf_eosinophil_sig_batchvisit## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 35 15 103 24 99 35 6
## ebseq_down basic_up basic_down
## outcome 33 10 0
dim(t_cf_eosinophil_sig_batchvisit[["deseq"]][["ups"]][[1]])## [1] 99 82
dim(t_cf_eosinophil_sig_batchvisit[["deseq"]][["downs"]][[1]])## [1] 35 82
head(t_cf_eosinophil_sig_batchvisit[["deseq"]][["ups"]][[1]])head(t_cf_eosinophil_sig_batchvisit[["deseq"]][["downs"]][[1]])Repeat with visit in the condition contrast.
visitcf_factor <- paste0(pData(t_eosinophils)[["visitnumber"]], "_",
pData(t_eosinophils)[["finaloutcome"]])
t_eosinophil_visitcf <- set_expt_conditions(t_eosinophils, fact = visitcf_factor)## The numbers of samples by condition are:
##
## v1_cure v1_failure v2_cure v2_failure v3_cure v3_failure
## 5 3 6 3 6 3
t_cf_eosinophil_visits_de_sva <- all_pairwise(t_eosinophil_visitcf, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## v1_cure v1_failure v2_cure v2_failure v3_cure v3_failure
## 5 3 6 3 6 3
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_eosinophil_visits_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_cf_eosinophil_visits_table_sva <- combine_de_tables(
t_cf_eosinophil_visits_de_sva, keepers = visitcf_contrasts, scale_p = TRUE,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_visitcf_table_sva-v{ver}.xlsx"))
t_cf_eosinophil_visits_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 v1_failure_vs_v1_cure 8 10 4 4
## 2 v2_failure_vs_v2_cure 3 3 5 2
## 3 v3_failure_vs_v3_cure 12 7 13 2
## limma_sigup limma_sigdown
## 1 0 1
## 2 0 0
## 3 0 0
## Plot describing unique/shared genes in a differential expression table.
t_cf_eosinophil_visits_sig_sva <- extract_significant_genes(
t_cf_eosinophil_visits_table_sva,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_visitcf_sig_sva-v{ver}.xlsx"))
t_cf_eosinophil_visits_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## v1cf 0 1 4 4 8 10 4
## v2cf 0 0 5 2 3 3 11
## v3cf 0 0 13 2 12 7 3
## ebseq_down basic_up basic_down
## v1cf 84 0 0
## v2cf 18 0 0
## v3cf 10 0 0
dim(t_cf_eosinophil_visits_sig_sva[["deseq"]][["ups"]][[1]])## [1] 8 82
dim(t_cf_eosinophil_visits_sig_sva[["deseq"]][["downs"]][[1]])## [1] 10 82
head(t_cf_eosinophil_visits_sig_sva[["deseq"]][["ups"]][[1]])head(t_cf_eosinophil_visits_sig_sva[["deseq"]][["downs"]][[1]])As a reminder, there are a few genes of particular interest:
expected_genes <- c("IFI44L", "IFI27", "PRR5", "PRR5-ARHGAP8", "RHCE",
"FBXO39", "RSAD2", "SMTNL1", "USP18", "AFAP1")
annot <- fData(t_monocytes)
wanted_idx <- annot[["hgnc_symbol"]] %in% expected_genes
expected_ensg <- rownames(annot)[wanted_idx]Either above or below this section I have a nearly identical block which seeks to demonstrate the similarities/difference observed between my preferred/simplified model vs. a more explicitly correct and complex model. If the trend holds from what we observed with the eosinophils and neutrophils, I would expect to see that the results are marginally ‘better’ (as defined by the strength of the perceived interleukin response and raw number of ‘significant’ genes); but I remain worried that this will prove a more brittle and error-prone analysis.
Start out by extracting the perceived svs via svaseq on the filtered input.
Note to self: the model work I did in hpgltools makes this irrelevant; by replacing the foolish model_connd/model_batch arguments with fstring and model_sv I can do this trivially. In addition, it allows me to mix and match sv methods and compare them using the same data structure because it returns the expressionset with new columns named stuff like ‘svaseq_sv1’…
## The original pairwise invocation with sva:
##t_cf_monocyte_de_sva <- all_pairwise(t_monocyte, model_batch = "svaseq",
## filter = TRUE, parallel = FALSE,
## methods = methods)
test_monocytes <- normalize_expt(t_monocytes, filter = "simple")## Removing 0 low-count genes (10840 remaining).
test_mono_design <- pData(test_monocytes)
test_formula <- as.formula("~ finaloutcome + visitnumber")
test_model <- model.matrix(test_formula, data = test_mono_design)
null_formula <- as.formula("~ visitnumber")
null_model <- model.matrix(null_formula, data = test_mono_design)
linear_mtrx <- exprs(test_monocytes)
l2_mtrx <- log2(linear_mtrx + 1)
chosen_surrogates <- sva::num.sv(dat = l2_mtrx, mod = test_model)
chosen_surrogates## [1] 2
surrogate_result <- sva::svaseq(
dat = linear_mtrx, n.sv = chosen_surrogates, mod = test_model, mod0 = null_model)## Number of significant surrogate variables is: 2
## Iteration (out of 5 ):1 2 3 4 5
model_adjust <- as.matrix(surrogate_result[["sv"]])One neat relatively recent change I made: if one invokes normalize_expt() with the various batch arguments, it will add the new SVs to the example metadata with a prefix by the name of the surrogate method, e.g. ‘svaseq_sv1’, etc…
We can now create a new DESeq2 dataset which takes these putative surrogates into account.
colnames(model_adjust) <- paste0("SV", seq_len(chosen_surrogates))
rownames(model_adjust) <- rownames(pData(test_monocytes))
addition_string <- ""
for (sv in colnames(model_adjust)) {
addition_string <- paste0(addition_string, " + ", sv)
}
longer_model <- as.formula(glue("~ finaloutcome + visitnumber{addition_string}"))
mono_design_svs <- cbind(test_mono_design, model_adjust)
summarized <- DESeq2::DESeqDataSetFromMatrix(countData = linear_mtrx,
colData = mono_design_svs,
design = longer_model)## converting counts to integer mode
In order to compare these and the previous results, I tend to rely on simple correlations and aucc plots. I have been reading the modelr code recently and it looks like there is a suite of other metrics which might be more appropriate – with the caveat that most of the modelr (now yardstick) stuff is intended for lm() and predict() tasks and/or ML.
deseq_run <- DESeq2::DESeq(summarized)## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
deseq_table <- as.data.frame(DESeq2::results(object = deseq_run,
contrast = c("finaloutcome", "failure", "cure"),
format = "DataFrame"))
big_table <- t_cf_monocyte_table_sva[["data"]][["outcome"]]
only_deseq <- big_table[, c("deseq_logfc", "deseq_adjp")]
merged <- merge(deseq_table, only_deseq, by = "row.names")
rownames(merged) <- merged[["Row.names"]]
merged[["Row.names"]] <- NULL
cor_value <- cor.test(merged[["log2FoldChange"]], merged[["deseq_logfc"]])
cor_value##
## Pearson's product-moment correlation
##
## data: merged[["log2FoldChange"]] and merged[["deseq_logfc"]]
## t = 1097, df = 10838, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9954 0.9957
## sample estimates:
## cor
## 0.9955
logfc_plotter <- plot_linear_scatter(merged[, c("log2FoldChange", "deseq_logfc")],
add_cor = TRUE, add_rsq = TRUE, identity = TRUE,
add_equation = TRUE)
logfc_plot <- logfc_plotter[["scatter"]] +
xlab("DESeq2 log2FC: Visit explicitly in model") +
ylab("DESeq2 log2FC: Default pairwise comparison")
pp(file = "figures/compare_cf_and_visit_in_model_monocyte_logfc.svg")
logfc_plot
dev.off()## png
## 2
logfc_plotcor_value <- cor.test(merged[["padj"]], merged[["deseq_adjp"]], method = "spearman")## Warning in cor.test.default(merged[["padj"]], merged[["deseq_adjp"]], method =
## "spearman"): Cannot compute exact p-value with ties
cor_value##
## Spearman's rank correlation rho
##
## data: merged[["padj"]] and merged[["deseq_adjp"]]
## S = 1.3e+09, p-value <2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9941
adjp_plotter <- plot_linear_scatter(merged[, c("padj", "deseq_adjp")])
adjp_plot <- adjp_plotter[["scatter"]] +
xlab("DESeq2 adjp: Visit explicitly in model") +
ylab("DESeq2 adjp: Default pairwise comparison")
pp(file = "images/compare_cf_and_visit_in_model_monocyte_adjp.svg")
adjp_plot
dev.off()## png
## 2
adjp_plotprevious_sig_idx <- big_table[["deseq_adjp"]] <= 0.05 &
abs(big_table[["deseq_logfc"]] >= 1.0)
summary(previous_sig_idx)## Mode FALSE TRUE
## logical 10781 59
previous_genes <- rownames(big_table)[previous_sig_idx]
new_sig_idx <- abs(deseq_table[["log2FoldChange"]]) >= 1.0 &
deseq_table[["padj"]] < 0.05
new_genes <- rownames(deseq_table)[new_sig_idx]
na_idx <- is.na(new_genes)
new_genes <- new_genes[!na_idx]
Vennerable::Venn(list("previous" = previous_genes, "new" = new_genes))## A Venn object on 2 sets named
## previous,new
## 00 10 01 11
## 0 5 56 54
test_new <- simple_gprofiler(new_genes)
test_new## A set of ontologies produced by gprofiler using 110
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 9 MF
## 151 BP
## 21 CC
## 0 KEGG
## 0 REAC
## 2 WP
## 2 TF
## 0 MIRNA
## 0 HPA
## 0 CORUM
## 0 HP hits.
test_old <- simple_gprofiler(previous_genes)
test_old## A set of ontologies produced by gprofiler using 59
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 0 MF
## 43 BP
## 3 CC
## 3 KEGG
## 0 REAC
## 2 WP
## 1 TF
## 0 MIRNA
## 0 HPA
## 0 CORUM
## 0 HP hits.
new_annotated <- merge(fData(t_monocytes), deseq_table, by = "row.names")
rownames(new_annotated) <- new_annotated[["Row.names"]]
new_annotated[["Row.names"]] <- NULL
write_xlsx(data = new_annotated, excel = "excel/monocyte_visit_in_model_sva_cf_new.xlsx")## write_xlsx() wrote excel/monocyte_visit_in_model_sva_cf_new.xlsx.
## The cursor is on sheet first, row: 10843 column: 23.
old_annotated <- merge(fData(t_eosinophils), big_table, by = "row.names")
rownames(old_annotated) <- old_annotated[["Row.names"]]
old_annotated[["Row.names"]] <- NULL
write_xlsx(data = old_annotated, excel = "excel/monocyte_visit_in_model_sva_cf_old.xlsx")## write_xlsx() wrote excel/monocyte_visit_in_model_sva_cf_old.xlsx.
## The cursor is on sheet first, row: 10843 column: 99.
Are the expected Ensembl gene IDs found in this new set?
sum(new_genes %in% expected_ensg)## [1] 10
We wish to ensure that my model simplification did not do anything incorrect to the data for all three cell types, I already did this for the neutrophils, let us repeat for the eosinophils. I am therefore (mostly) copy/pasting the neutrophil section here.
## The original pairwise invocation with sva:
#t_cf_eosinophil_de_sva <- all_pairwise(t_eosinophils, model_batch = "svaseq",
# filter = TRUE, parallel=FALSE, methods = methods)
test_eosinophils <- normalize_expt(t_eosinophils, filter = "simple")## Removing 2618 low-count genes (17240 remaining).
test_eo_design <- pData(test_eosinophils)
test_formula <- as.formula("~ 0 + finaloutcome + visitnumber")
test_model <- model.matrix(test_formula, data = test_eo_design)
null_formula <- as.formula("~ 0 + visitnumber")
null_model <- model.matrix(null_formula, data = test_eo_design)
linear_mtrx <- exprs(test_eosinophils)
l2_mtrx <- log2(linear_mtrx + 1)
chosen_surrogates <- sva::num.sv(dat = l2_mtrx, mod = test_model)
chosen_surrogates## [1] 3
surrogate_result <- sva::svaseq(
dat = linear_mtrx, n.sv = chosen_surrogates, mod = test_model, mod0 = null_model)## Number of significant surrogate variables is: 3
## Iteration (out of 5 ):1 2 3 4 5
model_adjust <- as.matrix(surrogate_result[["sv"]])
colnames(model_adjust) <- c("SV1", "SV2", "SV3")
rownames(model_adjust) <- rownames(pData(test_eosinophils))
longer_model <- as.formula("~ finaloutcome + visitnumber + SV1 + SV2 + SV3")
eo_design_svs <- cbind(test_eo_design, model_adjust)
summarized <- DESeq2::DESeqDataSetFromMatrix(countData = linear_mtrx,
colData = eo_design_svs,
design = longer_model)## converting counts to integer mode
deseq_run <- DESeq2::DESeq(summarized)## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
deseq_table <- as.data.frame(DESeq2::results(object = deseq_run,
contrast = c("finaloutcome", "failure", "cure"),
format = "DataFrame"))
big_table <- t_cf_eosinophil_table_sva[["data"]][["outcome"]]
only_deseq <- big_table[, c("deseq_logfc", "deseq_adjp")]
merged <- merge(deseq_table, only_deseq, by = "row.names")
rownames(merged) <- merged[["Row.names"]]
merged[["Row.names"]] <- NULL
cor_value <- cor.test(merged[["log2FoldChange"]], merged[["deseq_logfc"]])
cor_value##
## Pearson's product-moment correlation
##
## data: merged[["log2FoldChange"]] and merged[["deseq_logfc"]]
## t = 194, df = 10516, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8798 0.8881
## sample estimates:
## cor
## 0.884
logfc_plotter <- plot_linear_scatter(merged[, c("log2FoldChange", "deseq_logfc")])
logfc_plot <- logfc_plotter[["scatter"]] +
xlab("DESeq2 log2FC: Visit explicitly in model") +
ylab("DESeq2 log2FC: Default pairwise comparison")
pp(file = "figures/compare_cf_and_visit_in_model_eosinophil_logfc.svg")
logfc_plot
dev.off()## png
## 2
logfc_plotcor_value <- cor.test(merged[["padj"]], merged[["deseq_adjp"]], method = "spearman")## Warning in cor.test.default(merged[["padj"]], merged[["deseq_adjp"]], method =
## "spearman"): Cannot compute exact p-value with ties
cor_value##
## Spearman's rank correlation rho
##
## data: merged[["padj"]] and merged[["deseq_adjp"]]
## S = 2.8e+10, p-value <2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.8565
adjp_plotter <- plot_linear_scatter(merged[, c("padj", "deseq_adjp")])
adjp_plot <- adjp_plotter[["scatter"]] +
xlab("DESeq2 adjp: Visit explicitly in model") +
ylab("DESeq2 adjp: Default pairwise comparison")
pp(file = "images/compare_cf_and_visit_in_model_eosinophil_adjp.svg")
adjp_plot
dev.off()## png
## 2
adjp_plotprevious_sig_idx <- big_table[["deseq_adjp"]] <= 0.05 &
abs(big_table[["deseq_logfc"]] >= 1.0)
summary(previous_sig_idx)## Mode FALSE TRUE
## logical 10408 110
previous_genes <- rownames(big_table)[previous_sig_idx]
new_sig_idx <- abs(deseq_table[["log2FoldChange"]]) >= 1.0 &
deseq_table[["padj"]] < 0.05
new_genes <- rownames(deseq_table)[new_sig_idx]
na_idx <- is.na(new_genes)
new_genes <- new_genes[!na_idx]
Vennerable::Venn(list("previous" = previous_genes, "new" = new_genes))## A Venn object on 2 sets named
## previous,new
## 00 10 01 11
## 0 38 197 72
test_new <- simple_gprofiler(new_genes)
test_new## A set of ontologies produced by gprofiler using 269
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 14 MF
## 182 BP
## 34 CC
## 4 KEGG
## 5 REAC
## 7 WP
## 10 TF
## 1 MIRNA
## 1 HPA
## 0 CORUM
## 0 HP hits.
test_old <- simple_gprofiler(previous_genes)
test_old## A set of ontologies produced by gprofiler using 110
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 21 MF
## 111 BP
## 14 CC
## 3 KEGG
## 7 REAC
## 4 WP
## 72 TF
## 1 MIRNA
## 0 HPA
## 0 CORUM
## 0 HP hits.
new_annotated <- merge(fData(t_eosinophils), deseq_table, by = "row.names")
rownames(new_annotated) <- new_annotated[["Row.names"]]
new_annotated[["Row.names"]] <- NULL
write_xlsx(data = new_annotated, excel = "excel/eosinophil_visit_in_model_sva_cf_new.xlsx")## write_xlsx() wrote excel/eosinophil_visit_in_model_sva_cf_new.xlsx.
## The cursor is on sheet first, row: 17243 column: 23.
old_annotated <- merge(fData(t_eosinophils), big_table, by = "row.names")
rownames(old_annotated) <- old_annotated[["Row.names"]]
old_annotated[["Row.names"]] <- NULL
write_xlsx(data = old_annotated, excel = "excel/eosinophil_visit_in_model_sva_cf_old.xlsx")## write_xlsx() wrote excel/eosinophil_visit_in_model_sva_cf_old.xlsx.
## The cursor is on sheet first, row: 10521 column: 99.
Check our genes of particular interest
sum(new_genes %in% expected_ensg)## [1] 6
Not quite as similar as the monocyte data.
## The original pairwise invocation with sva:
## t_cf_neutrophil_de_sva <- all_pairwise(t_neutrophils, model_batch = "svaseq",
## parallel = parallel, filter = TRUE,
## methods = methods)
test_neutrophils <- normalize_expt(t_neutrophils, filter = "simple")## Removing 2614 low-count genes (17244 remaining).
test_neut_design <- pData(test_neutrophils)
test_formula <- as.formula("~ 0 + finaloutcome + visitnumber")
test_model <- model.matrix(test_formula, data = test_neut_design)
## Note to self: double-check that the following line is correct.
null_formula <- as.formula("~ 0 + visitnumber")
## null_model <- test_model[, c(1, 2)]
null_model <- model.matrix(null_formula, data = test_neut_design)
linear_mtrx <- exprs(test_neutrophils)
l2_mtrx <- log2(linear_mtrx + 1)
chosen_surrogates <- sva::num.sv(dat = l2_mtrx, mod = test_model)
chosen_surrogates## [1] 4
surrogate_result <- sva::svaseq(
dat = linear_mtrx, n.sv = chosen_surrogates, mod = test_model, mod0 = null_model)## Number of significant surrogate variables is: 4
## Iteration (out of 5 ):1 2 3 4 5
model_adjust <- as.matrix(surrogate_result[["sv"]])
## I don't think the following is actually required, but it is weird to just have this
## unnamed matrix hanging out.
## Set the columns to the SV#s
colnames(model_adjust) <- c("SV1", "SV2", "SV3", "SV4")
## Set the rows the sample IDs
rownames(model_adjust) <- rownames(pData(test_neutrophils))
longer_model <- as.formula("~ finaloutcome + visitnumber + SV1 + SV2 + SV3 + SV4")
neut_design_svs <- cbind(test_neut_design, model_adjust)
summarized <- DESeq2::DESeqDataSetFromMatrix(countData = linear_mtrx,
colData = neut_design_svs,
design = longer_model)## converting counts to integer mode
deseq_run <- DESeq2::DESeq(summarized)## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
deseq_table <- as.data.frame(DESeq2::results(object = deseq_run,
contrast = c("finaloutcome", "failure", "cure"),
format = "DataFrame"))
## We should be able to directly compare this to the the deseq columns from the above
## data structure named: t_cf_neutrophil_table_sva
big_table <- t_cf_neutrophil_table_sva[["data"]][["outcome"]]
only_deseq <- big_table[, c("deseq_logfc", "deseq_adjp")]
merged <- merge(deseq_table, only_deseq, by = "row.names")
rownames(merged) <- merged[["Row.names"]]
merged[["Row.names"]] <- NULL
cor_value <- cor.test(merged[["log2FoldChange"]], merged[["deseq_logfc"]])
cor_value##
## Pearson's product-moment correlation
##
## data: merged[["log2FoldChange"]] and merged[["deseq_logfc"]]
## t = 400, df = 9086, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9716 0.9738
## sample estimates:
## cor
## 0.9728
logfc_plotter <- plot_linear_scatter(merged[, c("log2FoldChange", "deseq_logfc")])
logfc_plot <- logfc_plotter[["scatter"]] +
xlab("DESeq2 log2FC: Visit explicitly in model") +
ylab("DESeq2 log2FC: Default pairwise comparison")
pp(file = "figures/compare_cf_and_visit_in_model_neutrophil_logfc.svg")
logfc_plot
dev.off()## png
## 2
logfc_plotcor_value <- cor.test(merged[["padj"]], merged[["deseq_adjp"]], method = "spearman")## Warning in cor.test.default(merged[["padj"]], merged[["deseq_adjp"]], method =
## "spearman"): Cannot compute exact p-value with ties
cor_value##
## Spearman's rank correlation rho
##
## data: merged[["padj"]] and merged[["deseq_adjp"]]
## S = 6.9e+09, p-value <2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9448
adjp_plotter <- plot_linear_scatter(merged[, c("padj", "deseq_adjp")])
adjp_plot <- adjp_plotter[["scatter"]] +
xlab("DESeq2 adjp: Visit explicitly in model") +
ylab("DESeq2 adjp: Default pairwise comparison")
pp(file = "images/compare_cf_and_visit_in_model_neutrophil_adjp.svg")
adjp_plot
dev.off()## png
## 2
adjp_plotprevious_sig_idx <- big_table[["deseq_adjp"]] <= 0.05 &
abs(big_table[["deseq_logfc"]] >= 1.0)
summary(previous_sig_idx)## Mode FALSE TRUE
## logical 8961 127
previous_genes <- rownames(big_table)[previous_sig_idx]
new_sig_idx <- abs(deseq_table[["log2FoldChange"]]) >= 1.0 &
deseq_table[["padj"]] < 0.05
new_genes <- rownames(deseq_table)[new_sig_idx]
na_idx <- is.na(new_genes)
new_genes <- new_genes[!na_idx]
Vennerable::Venn(list("previous" = previous_genes, "new" = new_genes))## A Venn object on 2 sets named
## previous,new
## 00 10 01 11
## 0 47 97 80
test_new <- simple_gprofiler(new_genes)
test_new## A set of ontologies produced by gprofiler using 177
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 1 MF
## 15 BP
## 6 CC
## 0 KEGG
## 2 REAC
## 0 WP
## 9 TF
## 2 MIRNA
## 0 HPA
## 0 CORUM
## 0 HP hits.
test_old <- simple_gprofiler(previous_genes)
test_old## A set of ontologies produced by gprofiler using 127
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 4 MF
## 56 BP
## 4 CC
## 0 KEGG
## 4 REAC
## 2 WP
## 56 TF
## 0 MIRNA
## 0 HPA
## 0 CORUM
## 0 HP hits.
new_annotated <- merge(fData(t_neutrophils), deseq_table, by = "row.names")
rownames(new_annotated) <- new_annotated[["Row.names"]]
new_annotated[["Row.names"]] <- NULL
write_xlsx(data = new_annotated, excel = "excel/neutrophil_visit_in_model_sva_cf_new.xlsx")## write_xlsx() wrote excel/neutrophil_visit_in_model_sva_cf_new.xlsx.
## The cursor is on sheet first, row: 17247 column: 23.
old_annotated <- merge(fData(t_neutrophils), big_table, by = "row.names")
rownames(old_annotated) <- old_annotated[["Row.names"]]
old_annotated[["Row.names"]] <- NULL
write_xlsx(data = old_annotated, excel = "excel/neutrophil_visit_in_model_sva_cf_old.xlsx")## write_xlsx() wrote excel/neutrophil_visit_in_model_sva_cf_old.xlsx.
## The cursor is on sheet first, row: 9091 column: 99.
Once again, see how many of our favorite genes are here
sum(new_genes %in% expected_ensg)## [1] 8
The above makes one potential significant error: I did not consider the variance of each person in the contrasts above and thus potentially over-represented the significance/power of the results. This is because these models do not include the donor. My previous understanding was that it is sufficient to include visit in the model because that would result in a model matrix which separates samples from each person. This was wrong. Instead, one should make use of models which are able to handle multiple samples across one factor (donor in this case); this is done by using linear mixed models. These models allow one to specify a separate slope, y-intercept, or both, for the samples which share an important factor (again, donor).
Therefore, the previous couple of blocks I now think are not approaching this problem correctly. We spent some time talking with Neal and discussing the various models and methods we employed. He made a series of suggestions about ways which might prove more correct. Eventually he brought my attention to some reviews which explain mixed linear models and I quickly became a convert for this type of query. I think I can reasonably easily perform these with limma, via voom. Let us try and see what happens.
After doing some reading, there is an easier way to do this: use dream() from varianceParition, which is cool because I really like variancePartition it. Somehow I managed to use it all this time and gloss over the import of the models it exposes to the user.
As I write this, we are reasonably certain that a mixed linear model provides a statistically correct framework for representing our expression data as a function of finaloutcome, visit, and person, e.g:
exprs ~ finaloutcome + visit + (1|donor)
In our discussions surrounding the various ways to compare/contrast the various results with/out the mixed linear model; there were a few primary goals laid out by Maria Adelaida and Neal. The goal is to observe if/how well our previous analyses agree with results obtained using a mixed linear model. There are a couple of caveats:
So, with that in mind, Maria Adelaida, Najib, and Neal focused on repeating a useful subset of the analyses using the lmm and comparing them to our extant results rather than re-implementing everything. The following are the things they suggested are the most important comparison points:
A random question if anyone ever reads this: how does one evaluate if/when to allow the intercept, slope, or both to deviate across factors? In my reading I saw examples showing how to do these, but it was never clear why.
I have already written a skeleton function ‘dream_pairwise()’ as a sibling to my other *_pairwise() functions. I think that with some minor modifications (or maybe none at all, when I wrote it I was thinking about fun models that variancePartition supports) it can accept the mixed linear model of interest.
In the following block, the mixed formula will get passed to dream. I set the code to use the first element (after the intercept) as the ‘condition’ factor. Thus if I had made the model ‘~ 0 + visitnumber + finaloutcome + (1|donor)’, it would compare visits.
The dream_pairwise() function is responsible for making sure the variancePartition replacement functions are used for things like voom, lmfit, ebayes, and toptable. Strangely, some of them will automatically fall back to limma’s functions if there is no random-effect in the model, but others will not. As a result, I have a check to see if there is a mixed effect and invoke the appropriate functions explicitly in dream_pairwise().
I have a suspicion that there are still times when one should use the non-mixed version of eBayes() and topTable() with lmms; but I cannot see when that would be. If I figure that out I will add options to dream_pairwise() to allow the user to specify which is desired.
mixed_fstring <- "~ 0 + finaloutcome + visitnumber + (1|donor)"
mixed_form <- as.formula(mixed_fstring)
get_formula_factors(mixed_form)## $type
## [1] "cellmeans"
##
## $interaction
## [1] FALSE
##
## $mixed
## [1] TRUE
##
## $mixers
## [1] "1" "donor"
##
## $cellmeans_intercept
## [1] "0"
##
## $factors
## [1] "finaloutcome" "visitnumber" "donor"
##
## $contrast
## [1] "finaloutcome"
mixed_eosinophil_de <- dream_pairwise(t_eosinophils, alt_model = mixed_form)## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_eosinophil_de_xlsx <- write_de_table(
mixed_eosinophil_de, type = "limma",
excel = glue("excel/mixed_eosinophil_table-v{ver}.xlsx"))
mixed_monocyte_de <- dream_pairwise(t_monocytes, alt_model = mixed_form)## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_monocyte_de_xlsx <- write_de_table(
mixed_monocyte_de, type = "limma",
excel = glue("excel/mixed_monocyte_table-v{ver}.xlsx"))
mixed_neutrophil_de <- dream_pairwise(t_neutrophils, alt_model = mixed_form)## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_neutrophil_de_xlsx <- write_de_table(
mixed_neutrophil_de, type = "limma",
excel = glue("excel/mixed_neutrophil_table-v{ver}.xlsx"))In other words, the following invocations will go much faster and likely be nearly (or completely) identical to the results from limma using the same model since the ‘mixed_fstring_fv’ does not have a random effect.
mixed_fstring_fv <- "~ 0 + finaloutcome + visitnumber"
mixed_form_fv <- as.formula(mixed_fstring_fv)
get_formula_factors(mixed_form_fv)## $type
## [1] "cellmeans"
##
## $interaction
## [1] FALSE
##
## $mixed
## [1] FALSE
##
## $cellmeans_intercept
## [1] "0"
##
## $factors
## [1] "finaloutcome" "visitnumber"
##
## $contrast
## [1] "finaloutcome"
mixed_eosinophil_fv_de <- dream_pairwise(t_eosinophils, alt_model = mixed_form_fv)## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_eosinophil_de_nodonor_xlsx <- write_de_table(
mixed_eosinophil_fv_de, type = "limma",
excel = glue("excel/mixed_eosinophil_nodonor_table-v{ver}.xlsx"))
mixed_monocyte_fv_de <- dream_pairwise(t_monocytes, alt_model = mixed_form_fv)## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_monocyte_de_nodonor_xlsx <- write_de_table(mixed_monocyte_fv_de, type = "limma",
excel = glue("excel/mixed_monocyte_nodonor_table-v{ver}.xlsx"))
mixed_neutrophil_fv_de <- dream_pairwise(t_neutrophils, alt_model = mixed_form_fv)## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_neutrophil_de_nodonor_xlsx <- write_de_table(mixed_neutrophil_fv_de, type = "limma",
excel = glue("excel/mixed_neutrophil_nodonor_table-v{ver}.xlsx"))There remain a few differences, I am close to certain that these arise because I left in dream’s default values for parameters which are different than those provided by the limma authors.
There are a couple observations here which are important and/or troubling:
Thus, on the one hand we have support for the hypothesis that our methods were wrong and we are too lax when defining ‘significant’ genes. On the other hand, we have the examples from the dream publication which explicitly do not use the more ‘traditional’ or ‘strict’ definition of significance…
I suppose we will never know which is correct until/unless we are able to test in future patients who cure/fail treatment their relative levels of the observed genes. It seriously bothers me that this was such a blind spot in my understanding of these methods, though.
Najib asked if I would compare the set of overlapping genes observed with the various significance metrics provided. I think I should write a little function to do this because there are ample opportunities for [sic]typeographicl errors.
deseq_df <- t_cf_monocyte_table_sva[["data"]][["outcome"]]
deseq_gene_idx <- abs(deseq_df[["deseq_logfc"]]) >= 1.0 &
deseq_df[["deseq_adjp"]] <= 0.05
deseq_symb <- annot[deseq_gene_idx, "hgnc_symbol"]
deseq_symb## [1] "SYN1" "TENM1" "LTF" "PHLDB1"
## [5] "SLAMF7" "SCML1" "BCAR1" "BCAT1"
## [9] "TRIP13" "SLC12A1" "ADAMTS2" "GP6"
## [13] "SIGLEC1" "SIRPG" "CHKB" "IL2RB"
## [17] "CTSG" "PLEK2" "NTSR1" "MSLN"
## [21] "FZD3" "TULP2" "HAS1" "GSDME"
## [25] "SERPINE1" "PRUNE2" "PALD1" "UNC5B"
## [29] "CCL8" "FOLR1" "RAD51AP1" "PRLR"
## [33] "OTOF" "IL1R2" "IL1R1" "CD274"
## [37] "PRB2" "MAPK8IP1" "CXCR4" "SNAI1"
## [41] "IL1B" "LAMP5" "A4GALT" "MTUS1"
## [45] "IDO1" "TMTC1" "RSAD2" "HRK"
## [49] "IL6" "THBS1" "IFI44L" "ADAMTS10"
## [53] "GPR174" "LCN2" "TENM4" "PGM5"
## [57] "TBC1D24" "TRIM58" "HESX1" "CAMP"
## [61] "SAP30" "CFAP47" "AQP3" "HECTD2"
## [65] "IFI27" "C15orf48" "LAIR2" "ANGPTL4"
## [69] "RAB3IL1" "DDIT4" "KIF5C" "COL3A1"
## [73] "RNF150" "HTRA3" "S1PR1" "LGALS4"
## [77] "OLR1" "JUP" "HOXB2" "SH3PXD2B"
## [81] "FBXW8" "FBXO39" "THBD" "HLA-DQB1"
## [85] "OR6C2" "CSMD1" "EFHC2" "OLFML1"
## [89] "USP18" "RGPD2" "PRR5" "RHCE"
## [93] "AKR1C3" "AFAP1" "MMP1" "HLA-DQA1"
## [97] "SCAMP5" "SUCNR1" "OOEP" "GLYATL3"
## [101] "HLA-DMA" "NT5M" "POU5F1B" "SMTNL1"
## [105] "HLA-DQB2" "DEFA3" "UPK3B" "PRR5-ARHGAP8"
## [109] "TRNP1" "MGAM" "RNASE4" "MRC1"
## [113] "LINC02210-CRHR1" "FCGBP" "CCL3"
deseq_genes <- rownames(annot)[deseq_gene_idx]
overlap_sig <- function(mixed, deseq = deseq_genes, mixed_pcol = "P.Value",
annot = fData(t_monocytes), mixed_cutoff = 0.05, direction = "lt",
expected = expected_genes) {
if (direction == "lt") {
mixed_sig_idx <- abs(mixed[["logFC"]]) >= 1.0 &
mixed[[mixed_pcol]] <= mixed_cutoff
} else {
mixed_sig_idx <- abs(mixed[["logFC"]]) >= 1.0 &
mixed[[mixed_pcol]] >= mixed_cutoff
}
mixed_genes <- rownames(mixed)[mixed_sig_idx]
venn_lst <- list(
"mixed_model" = mixed_genes,
"DESeq_sva" = deseq)
mixed_deseq_comp <- Vennerable::Venn(venn_lst)
Vennerable::plot(mixed_deseq_comp)
mixed_ensg <- mixed_deseq_comp@IntersectionSets[["11"]]
overlap_genes <- annot[mixed_ensg, "hgnc_symbol"]
message("The set of all overlapping genes:")
print(overlap_genes)
found_idx <- expected %in% overlap_genes
message("Overlapping genes in the 10 favorites:")
print(expected[found_idx])
}In this block I am looking at the similarities between the mixed model with donor and without donor (which is no longer a mixed model; it is just using the dream functions). Keep in mind that dream falls back to the limma functions (with some slightly different parameters) when there is no random effect in the model. Therefore the differences observed here ought to give us a good sense of how much (if at all) these different invocations of limma change our view of ‘significant’. Ideally, they are identical, but I am no longer so foolish as to think that likely.
monocyte_visit_with_donor <- mixed_monocyte_de[["all_tables"]][["failure_vs_cure"]]
monocyte_visit_without_donor <- mixed_monocyte_fv_de[["all_tables"]][["failure_vs_cure"]]
donor_aucc <- calculate_aucc(monocyte_visit_with_donor, monocyte_visit_without_donor,
px = "adj.P.Val", py = "adj.P.Val",
lx = "logFC", ly = "logFC")
donor_aucc## These two tables have an aucc value of: 0.665366367949394 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 388, df = 10838, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9645 0.9670
## sample estimates:
## cor
## 0.9658
with_donor_genes <- abs(monocyte_visit_with_donor[["logFC"]]) >= 1.0 &
monocyte_visit_with_donor[["P.Value"]] <= 0.05
without_donor_genes <- abs(monocyte_visit_without_donor[["logFC"]]) >= 1.0 &
monocyte_visit_with_donor[["P.Value"]] <= 0.05
donor_genes <- rownames(monocyte_visit_with_donor)[with_donor_genes]
donor_z_idx <- abs(monocyte_visit_with_donor[["logFC"]]) >= 1.0 &
monocyte_visit_with_donor[["z.std"]] >= 1.0
donor_z_genes <- rownames(monocyte_visit_with_donor)[donor_z_idx]
overlap_sig(monocyte_visit_with_donor)## The set of all overlapping genes:
## [1] "DEFA3" "CTSG" "HRK" "A4GALT"
## [5] "TENM1" "PLEK2" "TMTC1" "HLA-DQB2"
## [9] "IDO1" "FBXO39" "PRR5-ARHGAP8" "TRIM58"
## [13] "CCL3" "IL6" "LAMP5" "CD274"
## [17] "NTSR1" "ANGPTL4" "DDIT4" "EFHC2"
## [21] "SH3PXD2B" "HECTD2" "MAPK8IP1" "SMTNL1"
## [25] "PGM5" "OLFML1" "TBC1D24" "ADAMTS10"
## [29] "LINC02210-CRHR1" "FZD3" "CHKB" "CXCR4"
## [33] "GPR174" "PRLR" "OOEP" "NT5M"
## [37] "IL1R1" "CAMP" "FBXW8"
## Overlapping genes in the 10 favorites:
## [1] "PRR5-ARHGAP8" "FBXO39" "SMTNL1"
overlap_sig(monocyte_visit_with_donor,
mixed_pcol = "z.std", direction = "gt", mixed_cutoff = 1.5)## The set of all overlapping genes:
## [1] "POU5F1B" "HRK" "COL3A1" "RGPD2"
## [5] "RNF150" "HLA-DQB2" "IDO1" "FBXO39"
## [9] "PRR5-ARHGAP8" "CCL3" "SUCNR1" "PRR5"
## [13] "IL6" "CD274" "UPK3B" "EFHC2"
## [17] "CSMD1" "HECTD2" "MAPK8IP1" "SMTNL1"
## [21] "PGM5" "OLFML1" "TBC1D24" "LINC02210-CRHR1"
## [25] "FZD3" "CHKB" "IFI44L" "GPR174"
## [29] "PRLR" "OOEP" "HLA-DMA" "PRB2"
## [33] "FBXW8"
## Overlapping genes in the 10 favorites:
## [1] "IFI44L" "PRR5" "PRR5-ARHGAP8" "FBXO39" "SMTNL1"
I would have sworn that the 2.0 z-score set was much larger than the p-value set and included all of the 10 genes. Apparently I was wrong.
Now examine the various models for the neutrophil samples.
neutrophil_visit_with_donor <- mixed_neutrophil_de[["all_tables"]][["failure_vs_cure"]]
neutrophil_visit_without_donor <- mixed_neutrophil_fv_de[["all_tables"]][["failure_vs_cure"]]
donor_aucc <- calculate_aucc(neutrophil_visit_with_donor, neutrophil_visit_without_donor,
px = "adj.P.Val", py = "adj.P.Val",
lx = "logFC", ly = "logFC")
donor_aucc## These two tables have an aucc value of: 0.541757070505111 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 743, df = 19856, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.982 0.983
## sample estimates:
## cor
## 0.9825
with_donor_genes <- abs(neutrophil_visit_with_donor[["logFC"]]) >= 1.0 &
neutrophil_visit_with_donor[["P.Value"]] <= 0.05
without_donor_genes <- abs(neutrophil_visit_without_donor[["logFC"]]) >= 1.0 &
neutrophil_visit_with_donor[["P.Value"]] <= 0.05
donor_genes <- rownames(neutrophil_visit_with_donor)[with_donor_genes]
visit_genes <- rownames(neutrophil_visit_with_donor)[without_donor_genes]
venn_lst <- list(
"with_donor" = donor_genes,
"with_visit" = visit_genes)
Vennerable::Venn(venn_lst)## A Venn object on 2 sets named
## with_donor,with_visit
## 00 10 01 11
## 0 0 14 212
overlap_sig(neutrophil_visit_with_donor)## The set of all overlapping genes:
## [1] "PRR5-ARHGAP8" "TENM1" "AFAP1" "PRR5" "BCAT1"
## [6] "HRK" "IFI44L" "CHKB" "FBXW8" "IDO1"
## [11] "SMTNL1" "POU5F1B" "OLR1" "AKR1C3"
## Overlapping genes in the 10 favorites:
## [1] "IFI44L" "PRR5" "PRR5-ARHGAP8" "SMTNL1" "AFAP1"
overlap_sig(neutrophil_visit_with_donor,
mixed_pcol = "z.std", direction = "gt", mixed_cutoff = 1.5)## The set of all overlapping genes:
## [1] "PRR5-ARHGAP8" "PRR5" "OTOF" "SIGLEC1" "HRK"
## [6] "IFI44L" "USP18" "JUP" "RSAD2" "CHKB"
## [11] "FBXW8" "FBXO39" "TBC1D24" "IDO1" "SMTNL1"
## [16] "POU5F1B"
## Overlapping genes in the 10 favorites:
## [1] "IFI44L" "PRR5" "PRR5-ARHGAP8" "FBXO39" "RSAD2"
## [6] "SMTNL1" "USP18"
Finally, compare for the eosinophil samples.
eosinophil_visit_with_donor <- mixed_eosinophil_de[["all_tables"]][["failure_vs_cure"]]
eosinophil_visit_without_donor <- mixed_eosinophil_fv_de[["all_tables"]][["failure_vs_cure"]]
donor_aucc <- calculate_aucc(eosinophil_visit_with_donor, eosinophil_visit_without_donor,
px = "adj.P.Val", py = "adj.P.Val",
lx = "logFC", ly = "logFC")
donor_aucc## These two tables have an aucc value of: 0.902187988288427 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 744, df = 19856, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.982 0.983
## sample estimates:
## cor
## 0.9825
with_donor_genes <- abs(eosinophil_visit_with_donor[["logFC"]]) >= 1.0 &
eosinophil_visit_with_donor[["P.Value"]] <= 0.05
without_donor_genes <- abs(eosinophil_visit_without_donor[["logFC"]]) >= 1.0 &
eosinophil_visit_with_donor[["P.Value"]] <= 0.05
donor_genes <- rownames(eosinophil_visit_with_donor)[with_donor_genes]
visit_genes <- rownames(eosinophil_visit_with_donor)[without_donor_genes]
venn_lst <- list(
"with_donor" = donor_genes,
"with_visit" = visit_genes)
Vennerable::Venn(venn_lst)## A Venn object on 2 sets named
## with_donor,with_visit
## 00 10 01 11
## 0 0 25 3672
overlap_sig(eosinophil_visit_with_donor)## The set of all overlapping genes:
## [1] "HLA-DQB1" "IFI27" "SIGLEC1" "CFAP47" "THBD" "OOEP"
## [7] "SMTNL1" "MGAM" "RNASE4" "CD274" "MSLN"
## Overlapping genes in the 10 favorites:
## [1] "IFI27" "SMTNL1"
overlap_sig(eosinophil_visit_with_donor,
mixed_pcol = "z.std", direction = "gt", mixed_cutoff = 1.5)## The set of all overlapping genes:
## [1] "IFI27" "PRR5-ARHGAP8" "IFI44L" "SIGLEC1" "OTOF"
## [6] "CFAP47" "FBXO39" "THBD" "OOEP" "SMTNL1"
## [11] "USP18" "RSAD2" "CD274" "HESX1"
## Overlapping genes in the 10 favorites:
## [1] "IFI44L" "IFI27" "PRR5-ARHGAP8" "FBXO39" "RSAD2"
## [6] "SMTNL1" "USP18"
Compare back to deseq with SVA and with SVA+visit and see how they look with respect to the dream invocation without the random donor effect.
deseq_aucc <- calculate_aucc(merged, monocyte_visit_without_donor,
px = "deseq_adjp", py = "P.Value",
lx = "deseq_logfc", ly = "logFC")
deseq_aucc## These two tables have an aucc value of: 0.164465237732564 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 40, df = 8565, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3826 0.4182
## sample estimates:
## cor
## 0.4006
deseq_genes_idx <- abs(merged[["deseq_logfc"]]) >= 1.0 &
merged[["deseq_adjp"]] <= 0.05
without_donor_genes_idx <- abs(monocyte_visit_without_donor[["logFC"]]) >= 1.0 &
monocyte_visit_with_donor[["P.Value"]] <= 0.05
deseq_genes <- rownames(merged)[deseq_genes_idx]
visit_genes <- rownames(monocyte_visit_with_donor)[without_donor_genes_idx]
venn_lst <- list(
"with_donor" = deseq_genes,
"with_visit" = visit_genes)
Vennerable::Venn(venn_lst)## A Venn object on 2 sets named
## with_donor,with_visit
## 00 10 01 11
## 0 148 44 9
This time we are comparing back to the monocyte results which did not include the random donor effect.
deseq_aucc <- calculate_aucc(merged, monocyte_visit_without_donor,
px = "log2FoldChange", py = "padj",
lx = "adj.P.Val", ly = "logFC")
deseq_aucc## These two tables have an aucc value of: 0.356334094947956 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = -32, df = 8565, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3476 -0.3099
## sample estimates:
## cor
## -0.3289
deseq_genes_idx <- abs(merged[["log2FoldChange"]]) >= 1.0 &
merged[["padj"]] <= 0.05
without_donor_genes_idx <- abs(monocyte_visit_without_donor[["logFC"]]) >= 1.0 &
monocyte_visit_with_donor[["P.Value"]] <= 0.05
deseq_genes <- rownames(merged)[deseq_genes_idx]
visit_genes <- rownames(monocyte_visit_with_donor)[without_donor_genes_idx]
venn_lst <- list(
"with_donor" = deseq_genes,
"with_visit" = visit_genes)
Vennerable::Venn(venn_lst)## A Venn object on 2 sets named
## with_donor,with_visit
## 00 10 01 11
## 0 108 45 8
This is the orthologous approach: include a random effect for donor and ignore the visit effect.
mixed_fstring_fd <- "~ 0 + finaloutcome + (1|donor)"
mixed_form_fd <- as.formula(mixed_fstring_fd)
get_formula_factors(mixed_form_fd)## $type
## [1] "cellmeans"
##
## $interaction
## [1] FALSE
##
## $mixed
## [1] TRUE
##
## $mixers
## [1] "1" "donor"
##
## $cellmeans_intercept
## [1] "0"
##
## $factors
## [1] "finaloutcome" "donor"
##
## $contrast
## [1] "finaloutcome"
mixed_eosinophil_fd_de <- dream_pairwise(t_eosinophils, alt_model = mixed_form_fd)## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_monocyte_fd_de <- dream_pairwise(t_monocytes, alt_model = mixed_form_fd)## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_neutrophil_fd_de <- dream_pairwise(t_neutrophils, alt_model = mixed_form_fd)## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
Now see how these results compare against our previous results…
monocyte_dream_result <- mixed_monocyte_de[["all_tables"]][["failure_vs_cure"]]
big_table <- t_cf_monocyte_table_sva[["data"]][["outcome"]]
merged <- merge(big_table, monocyte_dream_result, by = "row.names")
rownames(merged) <- merged[["Row.names"]]
merged[["Row.names"]] <- NULL
cor_value <- cor.test(merged[["logFC"]], merged[["deseq_logfc"]])
cor_value##
## Pearson's product-moment correlation
##
## data: merged[["logFC"]] and merged[["deseq_logfc"]]
## t = 193, df = 10838, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8761 0.8845
## sample estimates:
## cor
## 0.8804
t_cf_monocyte_de_sva[["dream"]] <- mixed_monocyte_de
test <- combine_de_tables(
t_cf_monocyte_de_sva, scale_p = TRUE,
excel = "excel/test_monocyte_combined.xlsx")
test_aucc <- calculate_aucc(merged,
px = "deseq_adjp", py = "adj.P.Val",
lx = "deseq_logfc", ly = "logFC")
test_aucc[["plot"]]test_aucc[["scatter"]][["scatter"]]logfc_plotter <- plot_linear_scatter(merged[, c("logFC", "deseq_logfc")])
logfc_plot <- logfc_plotter[["scatter"]] +
xlab("Dream log2FC with (1|donor) and visit in model") +
ylab("DESeq2 log2FC: Default pairwise comparison")
pp(file = "figures/compare_cf_and_full_dream_monocyte_logfc.svg")
logfc_plot
dev.off()## png
## 2
logfc_plotprevious_sig_idx <- merged[["deseq_adjp"]] <= 0.05 & abs(merged[["deseq_logfc"]] >= 1.0)
summary(previous_sig_idx)## Mode FALSE TRUE
## logical 10781 59
previous_genes <- rownames(merged)[previous_sig_idx]
new_sig_idx <- abs(merged[["logFC"]]) >= 1.0 & merged[["P.Value"]] < 0.05
summary(new_sig_idx)## Mode FALSE TRUE
## logical 10791 49
new_genes <- rownames(merged)[new_sig_idx]
na_idx <- is.na(new_genes)
new_genes <- new_genes[!na_idx]
annot <- fData(t_monocytes)
compare <- Vennerable::Venn(list("previous" = previous_genes, "new" = new_genes))
shared_genes <- compare@IntersectionSets[["11"]]
name_idx <- rownames(annot) %in% shared_genes
Vennerable::plot(compare)annot[name_idx, ]neutrophil_dream_result <- mixed_neutrophil_de[["all_tables"]][["failure_vs_cure"]]
big_table <- t_cf_neutrophil_table_sva[["data"]][["outcome"]]
merged <- merge(big_table, neutrophil_dream_result, by = "row.names")
rownames(merged) <- merged[["Row.names"]]
merged[["Row.names"]] <- NULL
cor_value <- cor.test(merged[["logFC"]], merged[["deseq_logfc"]])
cor_value##
## Pearson's product-moment correlation
##
## data: merged[["logFC"]] and merged[["deseq_logfc"]]
## t = 174, df = 9086, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8722 0.8817
## sample estimates:
## cor
## 0.877
t_cf_neutrophil_de_sva[["dream"]] <- mixed_neutrophil_de
test <- combine_de_tables(
t_cf_neutrophil_de_sva, scale_p = TRUE,
excel = "excel/test_neutrophil_combined.xlsx")
test_aucc <- calculate_aucc(merged,
px = "deseq_adjp", py = "adj.P.Val",
lx = "deseq_logfc", ly = "logFC")
test_aucc[["plot"]]test_aucc[["scatter"]][["scatter"]]logfc_plotter <- plot_linear_scatter(merged[, c("logFC", "deseq_logfc")])
logfc_plot <- logfc_plotter[["scatter"]] +
xlab("Dream log2FC with (1|donor) and visit in model") +
ylab("DESeq2 log2FC: Default pairwise comparison")
pp(file = "figures/compare_cf_and_full_dream_neutrophil_logfc.svg")
logfc_plot
dev.off()## png
## 2
logfc_plotprevious_sig_idx <- merged[["deseq_adjp"]] <= 0.05 & abs(merged[["deseq_logfc"]] >= 1.0)
summary(previous_sig_idx)## Mode FALSE TRUE
## logical 8961 127
previous_genes <- rownames(merged)[previous_sig_idx]
new_sig_idx <- abs(merged[["logFC"]]) >= 1.0 & merged[["P.Value"]] < 0.05
summary(new_sig_idx)## Mode FALSE TRUE
## logical 9012 76
new_genes <- rownames(merged)[new_sig_idx]
na_idx <- is.na(new_genes)
new_genes <- new_genes[!na_idx]
annot <- fData(t_neutrophils)
compare <- Vennerable::Venn(list("previous" = previous_genes, "new" = new_genes))
shared_genes <- compare@IntersectionSets[["11"]]
name_idx <- rownames(annot) %in% shared_genes
annot[name_idx, ]Vennerable::plot(compare)eosinophil_dream_result <- mixed_eosinophil_de[["all_tables"]][["failure_vs_cure"]]
big_table <- t_cf_eosinophil_table_sva[["data"]][["outcome"]]
merged <- merge(big_table, eosinophil_dream_result, by = "row.names")
rownames(merged) <- merged[["Row.names"]]
merged[["Row.names"]] <- NULL
cor_value <- cor.test(merged[["logFC"]], merged[["deseq_logfc"]])
cor_value##
## Pearson's product-moment correlation
##
## data: merged[["logFC"]] and merged[["deseq_logfc"]]
## t = 175, df = 10516, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8575 0.8673
## sample estimates:
## cor
## 0.8624
t_cf_eosinophil_de_sva[["dream"]] <- mixed_eosinophil_de
test <- combine_de_tables(
t_cf_eosinophil_de_sva, scale_p = TRUE,
excel = "excel/test_eosinophil_combined.xlsx")
test_aucc <- calculate_aucc(merged,
px = "deseq_adjp", py = "adj.P.Val",
lx = "deseq_logfc", ly = "logFC")
test_aucc[["plot"]]test_aucc[["scatter"]][["scatter"]]logfc_plotter <- plot_linear_scatter(merged[, c("logFC", "deseq_logfc")])
logfc_plot <- logfc_plotter[["scatter"]] +
xlab("Dream log2FC with (1|donor) and visit in model") +
ylab("DESeq2 log2FC: Default pairwise comparison")
pp(file = "figures/compare_cf_full_dream_eosinophil_logfc.svg")
logfc_plot
dev.off()## png
## 2
logfc_plotprevious_sig_idx <- merged[["deseq_adjp"]] <= 0.05 & abs(merged[["deseq_logfc"]] >= 1.0)
summary(previous_sig_idx)## Mode FALSE TRUE
## logical 10408 110
previous_genes <- rownames(merged)[previous_sig_idx]
new_sig_idx <- abs(merged[["logFC"]]) >= 1.0 & merged[["P.Value"]] < 0.05
summary(new_sig_idx)## Mode FALSE TRUE
## logical 10453 65
new_genes <- rownames(merged)[new_sig_idx]
na_idx <- is.na(new_genes)
new_genes <- new_genes[!na_idx]
annot <- fData(t_eosinophils)
compare <- Vennerable::Venn(list("previous" = previous_genes, "new" = new_genes))
shared_genes <- compare@IntersectionSets[["11"]]
name_idx <- rownames(annot) %in% shared_genes
annot[name_idx, ]Vennerable::plot(compare)Now that I have performed all of the above, I think it should be possible to have a working analysis using dream that includes celltype, visitnumber, finaloutcome, donor, and perhaps SVs.
mixed_fstring <- "~ 0 + finaloutcome + typeofcells + visitnumber + (1|donor)"
mixed_formula <- as.formula(mixed_fstring)
mixed_fstring_svs <- "~ 0 + finaloutcome + typeofcells + visitnumber + (1|donor) + svaseq_SV1 + svaseq_SV2 + svaseq_SV3 + svaseq_SV4"
mixed_formula_svs <- as.formula(mixed_fstring_svs)
all_dream_de <- dream_pairwise(t_clinical_nobiop, alt_model = mixed_formula)## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_all_celltypes_de_xlsx <- write_de_table(
all_dream_de, type = "limma",
excel = glue("excel/mixed_all_celltypes_nobiop_table-v{ver}.xlsx"))
all_dream_result <- all_dream_de[["all_tables"]][["failure_vs_cure"]] %>%
arrange(desc(logFC))
fc_sig_idx <- all_dream_result[["logFC"]] >= 1.0 & all_dream_result[["z.std"]] >= 2.0
dream_sig <- rownames(all_dream_result[fc_sig_idx, ])
svs_all_dream_de <- dream_pairwise(t_clinical_nobiop, alt_model = mixed_formula_svs)## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
test <- hpgl_padjust(svs_all_dream_de[["all_tables"]][["failure_vs_cure"]],
mean_column = "AveExpr", pvalue_column = "P.Value",
method = "ihw", type = "limma")t_clinical_outcomecell_fact <- paste0(pData(t_clinical_nobiop)[["finaloutcome"]], "_",
pData(t_clinical_nobiop)[["typeofcells"]])
t_clinical_outcomecell <- t_clinical_nobiop
pData(t_clinical_outcomecell)[["outcomecell"]] <- t_clinical_outcomecell_fact
t_clinical_outcomecell <- set_expt_conditions(t_clinical_outcomecell, fact = "outcomecell")## The numbers of samples by condition are:
##
## cure_eosinophils cure_monocytes cure_neutrophils failure_eosinophils
## 17 21 20 9
## failure_monocytes failure_neutrophils
## 21 21
t_clinical_outcomecell_de <- all_pairwise(t_clinical_outcomecell, keepers = outcometype_contrasts,
model_batch = "svaseq")##
## cure_eosinophils cure_monocytes cure_neutrophils failure_eosinophils
## 17 21 20 9
## failure_monocytes failure_neutrophils
## 21 21
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
mixed_fstring <- "~ 0 + condition + visitnumber + (1|donor)"
t_clinical_outcomecell_dream <- dream_pairwise(t_clinical_outcomecell,
alt_model = as.formula(mixed_fstring),
keepers = outcometype_contrasts)## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
t_clinical_outcomecell_table <- write_de_table(t_clinical_outcomecell_dream,
type = "limma",
excel = glue("excel/mixed_clinical_outcomecell-v{ver}.xlsx"))big_table <- t_cf_clinicalnb_table_sva[["data"]][["outcome"]]
merged <- merge(big_table, all_dream_result, by = "row.names")
rownames(merged) <- merged[["Row.names"]]
merged[["Row.names"]] <- NULL
cor_value <- cor.test(merged[["logFC"]], merged[["deseq_logfc"]])
cor_value##
## Pearson's product-moment correlation
##
## data: merged[["logFC"]] and merged[["deseq_logfc"]]
## t = 118, df = 11883, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.7275 0.7440
## sample estimates:
## cor
## 0.7359
test_aucc <- calculate_aucc(merged,
px = "deseq_adjp", py = "adj.P.Val",
lx = "deseq_logfc", ly = "logFC")
test_aucc[["plot"]]test_aucc[["scatter"]][["scatter"]]logfc_plotter <- plot_linear_scatter(merged[, c("logFC", "deseq_logfc")])
logfc_plot <- logfc_plotter[["scatter"]] +
xlab("Dream log2FC with (1|donor) and visit in model") +
ylab("DESeq2 log2FC: Default pairwise comparison")
pp(file = "images/compare_cf_and_dream_clinical_samples.png")
logfc_plot
dev.off()## png
## 2
logfc_plotcor_value <- cor.test(merged[["P.Value"]], merged[["deseq_adjp"]], method = "spearman")## Warning in cor.test.default(merged[["P.Value"]], merged[["deseq_adjp"]], :
## Cannot compute exact p-value with ties
cor_value##
## Spearman's rank correlation rho
##
## data: merged[["P.Value"]] and merged[["deseq_adjp"]]
## S = 1.8e+11, p-value <2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.3602
adjp_plotter <- plot_linear_scatter(merged[, c("P.Value", "deseq_adjp")])
adjp_plot <- adjp_plotter[["scatter"]] +
xlab("DESeq2 adjp: Dream not-adjusted p-value") +
ylab("DESeq2 adjp: Default pairwise comparison")
pp(file = "images/compare_cf_and_visit_in_model_monocyte_adjp.svg")
adjp_plot
dev.off()## png
## 2
adjp_plotprevious_sig_idx <- merged[["deseq_adjp"]] <= 0.05 & abs(merged[["deseq_logfc"]] >= 1.0)
summary(previous_sig_idx)## Mode FALSE TRUE
## logical 11760 125
previous_genes <- rownames(merged)[previous_sig_idx]
new_sig_idx <- abs(merged[["logFC"]]) >= 1.0 & merged[["P.Value"]] < 0.05
summary(new_sig_idx)## Mode FALSE TRUE
## logical 11844 41
new_genes <- rownames(merged)[new_sig_idx]
na_idx <- is.na(new_genes)
new_genes <- new_genes[!na_idx]
annot <- fData(t_monocytes)
compare <- Vennerable::Venn(list("previous" = previous_genes, "new" = new_genes))
shared_genes <- compare@IntersectionSets[["11"]]
name_idx <- rownames(annot) %in% shared_genes
annot[name_idx, ]Let us use the overlap_sig() from above to see how similar this result is to our DESeq2+SVA.
all_dream_table <- all_dream_de[["all_tables"]][["failure_vs_cure"]]
overlap_sig(all_dream_table)## The set of all overlapping genes:
## [1] "PRR5-ARHGAP8" "TENM1" "PRR5" "HRK" NA
## [6] "IFI44L" "SMTNL1" "ECHDC3" "DDX60" "HERC5"
## [11] "EXOC3L1" "FBXW8" "BCAT1"
## Overlapping genes in the 10 favorites:
## [1] "IFI44L" "PRR5" "PRR5-ARHGAP8" "SMTNL1"
overlap_sig(all_dream_table, direction = "gt", mixed_pcol = "z.std", mixed_cutoff = 1.5)## The set of all overlapping genes:
## [1] "PRR5-ARHGAP8" "PRR5" "HRK" NA "IFI44L"
## [6] "IFI44" "SMTNL1" "RSAD2" "LY6E" "OAS2"
## [11] "DDX60" "HERC5" "ISG15" "EXOC3L1" "FBXW8"
## [16] "DLC1" "SPATS2L"
## Overlapping genes in the 10 favorites:
## [1] "IFI44L" "PRR5" "PRR5-ARHGAP8" "RSAD2" "SMTNL1"
all_dream_table_svs <- svs_all_dream_de[["all_tables"]][["failure_vs_cure"]]
overlap_sig(all_dream_table_svs)## The set of all overlapping genes:
## [1] "PRR5-ARHGAP8" "TENM1" "PRR5" "SMTNL1" NA
## [6] NA "ECHDC3" "ISG15" "HERC5" "EXOC3L1"
## [11] "DDX60"
## Overlapping genes in the 10 favorites:
## [1] "PRR5" "PRR5-ARHGAP8" "SMTNL1"
overlap_sig(all_dream_table_svs, direction = "gt", mixed_pcol = "z.std", mixed_cutoff = 1.5)## The set of all overlapping genes:
## [1] "PRR5-ARHGAP8" "PRR5" NA "IFI44L" "RSAD2"
## [6] "IFI44" "LY6E" "SMTNL1" "HRK" "ISG15"
## [11] "USP18" "HERC5" "SIGLEC1" "DLC1" "OAS2"
## [16] "EXOC3L1" "SPATS2L" "DDX60"
## Overlapping genes in the 10 favorites:
## [1] "IFI44L" "PRR5" "PRR5-ARHGAP8" "RSAD2" "SMTNL1"
## [6] "USP18"
One figure I did not create is a venn diagram showing the overlap of the eosionphil, neutrophil, and monocyte results and the 10 genes shared among them all. At least in theory I should be easily able to create a similar/identical plot.
observed_eosinophils <- c(
rownames(t_cf_eosinophil_sig_sva[["deseq"]][["ups"]][["outcome"]]),
rownames(t_cf_eosinophil_sig_sva[["deseq"]][["downs"]][["outcome"]]))
observed_monocytes <- c(
rownames(t_cf_monocyte_sig_sva[["deseq"]][["ups"]][["outcome"]]),
rownames(t_cf_monocyte_sig_sva[["deseq"]][["downs"]][["outcome"]]))
observed_neutrophils <- c(
rownames(t_cf_neutrophil_sig_sva[["deseq"]][["ups"]][["outcome"]]),
rownames(t_cf_neutrophil_sig_sva[["deseq"]][["downs"]][["outcome"]]))
venn_input <- list(
"eosinophil" = observed_eosinophils,
"monocyte" = observed_monocytes,
"neutrophils" = observed_neutrophils)
shared <- Vennerable::Venn(venn_input)
shared## A Venn object on 3 sets named
## eosinophil,monocyte,neutrophils
## 000 100 010 110 001 101 011 111
## 0 128 85 8 103 32 10 12
Vennerable::plot(shared)intersect <- "eosinophil:monocyte:neutrophils"
celltype_upset <- UpSetR::upset(UpSetR::fromList(venn_input), text.scale = 2)
celltype_upsetcelltype_shared_genes <- overlap_groups(venn_input)
celltype_geneids <- overlap_geneids(celltype_shared_genes, intersect)
ids <- attr(celltype_shared_genes, "elements")[celltype_shared_genes[[intersect]]]
ids## [1] "ENSG00000115155" "ENSG00000137959" "ENSG00000089012" "ENSG00000165949"
## [5] "ENSG00000186654" "ENSG00000188672" "ENSG00000177294" "ENSG00000134321"
## [9] "ENSG00000214872" "ENSG00000184979" "ENSG00000179344" "ENSG00000196526"
rows <- fData(t_monocytes)[ids, ]
rows[["hgnc_symbol"]]## [1] "OTOF" "IFI44L" "SIRPG" "IFI27" "PRR5" "RHCE"
## [7] "FBXO39" "RSAD2" "SMTNL1" "USP18" "HLA-DQB1" "AFAP1"
Note to self, when I rendered the html, stupid R ran out of temp files and so did not actually print the darn html document, as a result I modified the render function to try to make sure there is a clean directory in which to work; testing now. If it continues to not work, I will need to remove some of the images created in this document.
Also, what the heck, this is complete nonsense; there should be billions of available names when running tempfile(), why does it crap out with “cannot find unused tempfile name” after a piddling ~ 150 images!?!? I spent hours trying to figure this out and finally got mad and replaced the tempfile() invocations with my own tempfile() that uses a md5 sum of some combination of the hostname and current time or some such foolishness. For a while I thought it was because I was using doParallel() for some tasks, but I removed that and am still getting this stupid error. The other obvious solutions would be some nfs shenanigans and/or running out of disk space; but this is being done on (and TMPDIR is) a local filesystem with something like 12Tb free.
Maria Adelaida has asked about the distribution of (non)adjusted p-values produced by the various methods we employed. I use BH by default; so lets take a moment to examine the distribution of p-values and how they get adjusted by BH and a few of the other methods.
dream_pvalues <- all_dream_table[["P.Value"]]
names(dream_pvalues) <- rownames(all_dream_table)
deseq_pvalues <- t_cf_clinicalnb_table_sva[["data"]][["outcome"]][["deseq_p"]]
names(deseq_pvalues) <- rownames(t_cf_clinicalnb_table_sva[["data"]][["outcome"]])
## Note, my xlsx files provide these images.
plot_histogram(dream_pvalues)plot_histogram(deseq_pvalues)Immediately we see that the values produced have very different distributions and that, though there are many low p-values produced by dream, they are far fewer than observed by deseq.
Now consider the BH correction; using it, we rank order the p-values from lowest to highest. Then we choose a denominator for every p-value which ranges from 1 to the number of elements in the set of p-values. Finally we take the minimum between 1 and the cumulative minimum of (#pvalues/denominator) * that-pvalue. Written out the process looks like this:
test_pvalues <- deseq_pvalues
idx <- order(test_pvalues)
test_pvalues <- test_pvalues[idx]
num_pvalues <- length(test_pvalues)
new_pvalues <- test_pvalues
for (i in seq_along(test_pvalues)) {
element <- test_pvalues[i]
new_pvalues[i] <- min(1, cummin((num_pvalues / i) * element))
}
test_against <- p.adjust(test_pvalues, method = "BH")So, consider for a moment the first p-values produced by deseq: 1.195e-24, 3.489e-22, 9.612e-22, 4.853e-18, 9.864e-15, 3.275e-14
The new p-values will be the (number of genes / the current position) * the current element
In contrast, consider the first few values from dream ordered in the same fashion: 2.162e-07, 3.757e-05, 8.119e-05, 1.664e-04, 3.123e-04, 5.600e-04
These start at values which are 1e17 higher than those from DESeq and so we can expect the resulting values to end up starting at ~ 5e11 higher than similar values. Thus when we do the math (and be amused at the fact that the number of p-values in the table is a factor of 2,3,4,5,6):
11910 * 2.16e-07: 0.002573 5955 * 3.757e-5: 0.223711 3970 * 8.119e-5: 0.322297 2978 * 1.664e-4: 0.4955 2382 * 3.123e-4: 0.743836 1985 * 5.600e-4: 1.112 which is caught by pmin() and reset to 1.
Having performed all of the above, let us plot some of the results with a few labels of the top-10 genes on each side of the contrasts.
num_color <- color_choices[["clinic_cf"]][["tumaco_failure"]]
den_color <- color_choices[["clinic_cf"]][["tumaco_cure"]]
cf_monocyte_table <- t_cf_monocyte_table_sva[["data"]][["outcome"]]
cf_monocyte_volcano <- plot_volcano_condition_de(
cf_monocyte_table, "outcome", label = expected_genes,
fc_col = "deseq_logfc", p_col = "deseq_adjp", line_position = NULL,
color_high = num_color, color_low = den_color, label_size = 6)
pp(file = "figures/cf_monocyte_volcano_labeled.svg")
cf_monocyte_volcano[["plot"]]
dev.off()## png
## 2
cf_monocyte_volcano[["plot"]]cf_monocyte_volcano_top10 <- plot_volcano_condition_de(
cf_monocyte_table, "outcome", label = 10,
fc_col = "deseq_logfc", p_col = "deseq_adjp", line_position = NULL,
color_high = num_color, color_low = den_color, label_size = 6)
pp(file = glue("images/cf_monocyte_volcano_labeled_top10-v{ver}.svg"))
cf_monocyte_volcano_top10[["plot"]]
## hahahaha the last time I edited here I was super tired and left behind
## aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaadev.off()
## that was awesome, at least it was only ~ 200 a's so I only drifted off for like 7 seconds...
dev.off()## png
## 2
cf_monocyte_volcano_top10[["plot"]]cf_eosinophil_table <- t_cf_eosinophil_table_sva[["data"]][["outcome"]]
cf_eosinophil_volcano <- plot_volcano_condition_de(
cf_eosinophil_table, "outcome", label = expected_genes,
fc_col = "deseq_logfc", p_col = "deseq_adjp", line_position = NULL,
color_high = num_color, color_low = den_color, label_size = 6)
pp(file = "figures/cf_eosinophil_volcano_labeled.svg")
cf_eosinophil_volcano[["plot"]]
dev.off()## png
## 2
cf_eosinophil_volcano[["plot"]]cf_eosinophil_volcano_top10 <- plot_volcano_condition_de(
cf_eosinophil_table, "outcome", label = 10,
fc_col = "deseq_logfc", p_col = "deseq_adjp", line_position = NULL,
color_high = num_color, color_low = den_color, label_size = 6)
pp(file = glue("images/cf_eosinophil_volcano_labeled_top10-v{ver}.svg"))
cf_eosinophil_volcano_top10[["plot"]]
dev.off()## png
## 2
cf_eosinophil_volcano_top10[["plot"]]cf_neutrophil_table <- t_cf_neutrophil_table_sva[["data"]][["outcome"]]
cf_neutrophil_volcano <- plot_volcano_condition_de(
cf_neutrophil_table, "outcome", label = expected_genes,
fc_col = "deseq_logfc", p_col = "deseq_adjp", line_position = NULL,
color_high = num_color, color_low = den_color, label_size = 6)
pp(file = "figures/cf_neutrophil_volcano_labeled.svg")
cf_neutrophil_volcano[["plot"]]
dev.off()## png
## 2
cf_neutrophil_volcano[["plot"]]cf_neutrophil_volcano_top10 <- plot_volcano_condition_de(
cf_neutrophil_table, "outcome", label = 10,
fc_col = "deseq_logfc", p_col = "deseq_adjp", line_position = NULL,
color_high = num_color, color_low = den_color, label_size = 6)
pp(file = glue("images/cf_neutrophil_volcano_labeled_top10-v{ver}.svg"))
cf_neutrophil_volcano_top10[["plot"]]
dev.off()## png
## 2
cf_neutrophil_volcano_top10[["plot"]]t_cf_eosinophil_v1_de_sva <- all_pairwise(tv1_eosinophils, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 5 3
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_eosinophil_v1_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8690
## basic_vs_dream 0.8868
## basic_vs_ebseq 0.8801
## basic_vs_edger 0.8704
## basic_vs_limma 0.9189
## basic_vs_noiseq 0.8824
## deseq_vs_dream 0.8334
## deseq_vs_ebseq 0.8624
## deseq_vs_edger 0.9995
## deseq_vs_limma 0.8465
## deseq_vs_noiseq 0.8556
## dream_vs_ebseq 0.8344
## dream_vs_edger 0.8360
## dream_vs_limma 0.9868
## dream_vs_noiseq 0.8513
## ebseq_vs_edger 0.8663
## ebseq_vs_limma 0.8447
## ebseq_vs_noiseq 0.9974
## edger_vs_limma 0.8493
## edger_vs_noiseq 0.8598
## limma_vs_noiseq 0.8602
t_cf_eosinophil_v1_table_sva <- combine_de_tables(
t_cf_eosinophil_v1_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_v1_cf_table_sva-v{ver}.xlsx"))
t_cf_eosinophil_v1_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 13 18 12
## edger_sigdown limma_sigup limma_sigdown
## 1 10 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_eosinophil_v1_sig_sva <- extract_significant_genes(
t_cf_eosinophil_v1_table_sva,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_v1_cf_sig_sva-v{ver}.xlsx"))
t_cf_eosinophil_v1_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 13 18 12
## edger_sigdown limma_sigup limma_sigdown
## 1 10 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
dim(t_cf_eosinophil_v1_sig_sva[["deseq"]][["ups"]][[1]])## [1] 13 82
dim(t_cf_eosinophil_v1_sig_sva[["deseq"]][["downs"]][[1]])## [1] 18 82
t_cf_eosinophil_v2_de_sva <- all_pairwise(tv2_eosinophils, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 6 3
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_eosinophil_v2_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8410
## basic_vs_dream 0.8545
## basic_vs_ebseq 0.8617
## basic_vs_edger 0.8430
## basic_vs_limma 0.8906
## basic_vs_noiseq 0.8904
## deseq_vs_dream 0.8308
## deseq_vs_ebseq 0.8782
## deseq_vs_edger 0.9996
## deseq_vs_limma 0.8537
## deseq_vs_noiseq 0.8615
## dream_vs_ebseq 0.7230
## dream_vs_edger 0.8346
## dream_vs_limma 0.9759
## dream_vs_noiseq 0.7978
## ebseq_vs_edger 0.8795
## ebseq_vs_limma 0.7540
## ebseq_vs_noiseq 0.9115
## edger_vs_limma 0.8579
## edger_vs_noiseq 0.8646
## limma_vs_noiseq 0.8122
t_cf_eosinophil_v2_table_sva <- combine_de_tables(
t_cf_eosinophil_v2_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_v2_cf_table_sva-v{ver}.xlsx"))
t_cf_eosinophil_v2_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 10 4 10
## edger_sigdown limma_sigup limma_sigdown
## 1 1 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_eosinophil_v2_sig_sva <- extract_significant_genes(
t_cf_eosinophil_v2_table_sva,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_v2_cf_sig_sva-v{ver}.xlsx"))
t_cf_eosinophil_v2_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 0 0 10 1 10 4 9
## ebseq_down basic_up basic_down
## outcome 17 0 0
dim(t_cf_eosinophil_v2_sig_sva[["deseq"]][["ups"]][[1]])## [1] 10 82
dim(t_cf_eosinophil_v2_sig_sva[["deseq"]][["downs"]][[1]])## [1] 4 82
t_cf_eosinophil_v3_de_sva <- all_pairwise(tv3_eosinophils, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 6 3
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## the design formula contains one or more numeric variables with integer values,
## specifying a model with increasing fold change for higher values.
## did you mean for this to be a factor? if so, first convert
## this variable to a factor using the factor() function
## the design formula contains one or more numeric variables with integer values,
## specifying a model with increasing fold change for higher values.
## did you mean for this to be a factor? if so, first convert
## this variable to a factor using the factor() function
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_eosinophil_v3_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8037
## basic_vs_dream 0.8315
## basic_vs_ebseq 0.8918
## basic_vs_edger 0.8039
## basic_vs_limma 0.8646
## basic_vs_noiseq 0.8918
## deseq_vs_dream 0.8645
## deseq_vs_ebseq 0.8871
## deseq_vs_edger 1.0000
## deseq_vs_limma 0.9126
## deseq_vs_noiseq 0.8166
## dream_vs_ebseq 0.7586
## dream_vs_edger 0.8647
## dream_vs_limma 0.9692
## dream_vs_noiseq 0.8101
## ebseq_vs_edger 0.8872
## ebseq_vs_limma 0.7989
## ebseq_vs_noiseq 0.9399
## edger_vs_limma 0.9127
## edger_vs_noiseq 0.8169
## limma_vs_noiseq 0.8004
t_cf_eosinophil_v3_table_sva <- combine_de_tables(
t_cf_eosinophil_v3_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_v3_cf_table_sva-v{ver}.xlsx"))
t_cf_eosinophil_v3_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 70 29 73
## edger_sigdown limma_sigup limma_sigdown
## 1 10 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_eosinophil_v3_sig_sva <- extract_significant_genes(
t_cf_eosinophil_v3_table_sva,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_v3_cf_sig_sva-v{ver}.xlsx"))
t_cf_eosinophil_v3_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 0 0 73 10 70 29 2
## ebseq_down basic_up basic_down
## outcome 9 0 0
dim(t_cf_eosinophil_v3_sig_sva[["deseq"]][["ups"]][[1]])## [1] 70 82
dim(t_cf_eosinophil_v3_sig_sva[["deseq"]][["downs"]][[1]])## [1] 29 82
sva_aucc <- calculate_aucc(t_cf_eosinophil_table_sva[["data"]][[1]],
tbl2 = t_cf_eosinophil_table_batchvisit[["data"]][[1]],
py = "deseq_adjp", ly = "deseq_logfc")
sva_aucc## These two tables have an aucc value of: 0.579225027894229 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 137, df = 10516, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.7944 0.8080
## sample estimates:
## cor
## 0.8013
shared_ids <- rownames(t_cf_eosinophil_table_sva[["data"]][[1]]) %in%
rownames(t_cf_eosinophil_table_batchvisit[["data"]][[1]])
first <- t_cf_eosinophil_table_sva[["data"]][[1]][shared_ids, ]
second <- t_cf_eosinophil_table_batchvisit[["data"]][[1]][rownames(first), ]
cor.test(first[["deseq_logfc"]], second[["deseq_logfc"]])##
## Pearson's product-moment correlation
##
## data: first[["deseq_logfc"]] and second[["deseq_logfc"]]
## t = 137, df = 10516, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.7944 0.8080
## sample estimates:
## cor
## 0.8013
t_mono_neut_sva_aucc <- calculate_aucc(t_cf_monocyte_table_sva[["data"]][["outcome"]],
tbl2 = t_cf_neutrophil_table_sva[["data"]][["outcome"]],
py = "deseq_adjp", ly = "deseq_logfc")
t_mono_neut_sva_aucc## These two tables have an aucc value of: 0.207298450864693 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 43, df = 8565, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4037 0.4385
## sample estimates:
## cor
## 0.4212
t_mono_eo_sva_aucc <- calculate_aucc(t_cf_monocyte_table_sva[["data"]][["outcome"]],
tbl2 = t_cf_eosinophil_table_sva[["data"]][["outcome"]],
py = "deseq_adjp", ly = "deseq_logfc")
t_mono_eo_sva_aucc## These two tables have an aucc value of: 0.0934803679834033 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 20, df = 9752, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1757 0.2139
## sample estimates:
## cor
## 0.1948
t_neut_eo_sva_aucc <- calculate_aucc(t_cf_neutrophil_table_sva[["data"]][["outcome"]],
tbl2 = t_cf_eosinophil_table_sva[["data"]][["outcome"]],
py = "deseq_adjp", ly = "deseq_logfc")
t_neut_eo_sva_aucc## These two tables have an aucc value of: 0.201385413266601 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 42, df = 8564, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3944 0.4296
## sample estimates:
## cor
## 0.4122
For these contrasts, we want to see fail_v1 vs. cure_v1, fail_v2 vs. cure_v2 etc. As a result, we will need to juggle the data slightly and add another set of contrasts.
t_visit_cf_all_de_sva <- all_pairwise(t_visitcf, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## v1_cure v1_failure v2_cure v2_failure v3_cure v3_failure
## 30 24 20 15 17 17
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_visit_cf_all_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_visit_cf_all_table_sva <- combine_de_tables(
t_visit_cf_all_de_sva, keepers = visitcf_contrasts, scale_p = TRUE,
excel = glue("{cf_prefix}/t_all_visitcf_table_sva-v{ver}.xlsx"))
t_visit_cf_all_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 v1_failure_vs_v1_cure 25 84 26 60
## 2 v2_failure_vs_v2_cure 52 41 46 33
## 3 v3_failure_vs_v3_cure 76 33 39 28
## limma_sigup limma_sigdown
## 1 9 18
## 2 3 0
## 3 3 0
## Plot describing unique/shared genes in a differential expression table.
t_visit_cf_all_sig_sva <- extract_significant_genes(
t_visit_cf_all_table_sva,
excel = glue("{cf_prefix}/t_all_visitcf_sig_sva-v{ver}.xlsx"))
t_visit_cf_all_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## v1cf 9 18 26 60 25 84 0
## v2cf 3 0 46 33 52 41 0
## v3cf 3 0 39 28 76 33 1
## ebseq_down basic_up basic_down
## v1cf 38 0 0
## v2cf 0 0 0
## v3cf 0 0 0
In the following block, I am including all samples for the monocytes and splitting them up by visit and then comparing v1 cure/fail, v2 cure/fail, v3 cure/fail.
I expect that this should be more robust than the datasets of only visit 1.
visitcf_factor <- paste0(pData(t_monocytes)[["visitnumber"]], "_",
pData(t_monocytes)[["finaloutcome"]])
t_monocytes_visitcf <- set_expt_conditions(t_monocytes, fact = visitcf_factor)## The numbers of samples by condition are:
##
## v1_cure v1_failure v2_cure v2_failure v3_cure v3_failure
## 8 8 7 6 6 7
t_visit_cf_monocyte_de_sva <- all_pairwise(t_monocytes_visitcf, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## v1_cure v1_failure v2_cure v2_failure v3_cure v3_failure
## 8 8 7 6 6 7
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_visit_cf_monocyte_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_visit_cf_monocyte_table_sva <- combine_de_tables(
t_visit_cf_monocyte_de_sva, keepers = visitcf_contrasts, scale_p = TRUE,
excel = glue("{cf_prefix}/Monocytes/t_monocyte_visitcf_table_sva-v{ver}.xlsx"))
t_visit_cf_monocyte_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 v1_failure_vs_v1_cure 18 13 11 13
## 2 v2_failure_vs_v2_cure 0 0 0 0
## 3 v3_failure_vs_v3_cure 0 0 0 0
## limma_sigup limma_sigdown
## 1 1 1
## 2 0 0
## 3 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_visit_cf_monocyte_sig_sva <- extract_significant_genes(
t_visit_cf_monocyte_table_sva,
excel = glue("{cf_prefix}/Monocytes/t_monocyte_visitcf_sig_sva-v{ver}.xlsx"))
t_visit_cf_monocyte_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## v1cf 1 1 11 13 18 13 0
## v2cf 0 0 0 0 0 0 1
## v3cf 0 0 0 0 0 0 0
## ebseq_down basic_up basic_down
## v1cf 15 0 0
## v2cf 5 0 0
## v3cf 1 0 0
t_v1fc_deseq_ma <- t_visit_cf_monocyte_table_sva[["plots"]][["v1cf"]][["deseq_ma_plots"]]
dev <- pp(file = "images/monocyte_cf_de_v1_maplot.png")
t_v1fc_deseq_ma
closed <- dev.off()
t_v1fc_deseq_mat_v2fc_deseq_ma <- t_visit_cf_monocyte_table_sva[["plots"]][["v2cf"]][["deseq_ma_plots"]]
dev <- pp(file = "images/monocyte_cf_de_v2_maplot.png")
t_v2fc_deseq_ma
closed <- dev.off()
t_v2fc_deseq_mat_v3fc_deseq_ma <- t_visit_cf_monocyte_table_sva[["plots"]][["v3cf"]][["deseq_ma_plots"]]
dev <- pp(file = "images/monocyte_cf_de_v3_maplot.png")
t_v3fc_deseq_ma
closed <- dev.off()
t_v3fc_deseq_maOne query from Alejandro is to look at the genes shared up/down across visits. I am not entirely certain we have enough samples for this to work, but let us find out.
I am thinking this is a good place to use the AUCC curves I learned about thanks to Julie Cridland.
Note that the following is all monocyte samples, this should therefore potentially be moved up and a version of this with only the Tumaco samples put here?
v1cf <- t_visit_cf_monocyte_table_sva[["data"]][["v1cf"]]
v2cf <- t_visit_cf_monocyte_table_sva[["data"]][["v2cf"]]
v3cf <- t_visit_cf_monocyte_table_sva[["data"]][["v3cf"]]
v1_sig <- c(
rownames(t_visit_cf_monocyte_sig_sva[["deseq"]][["ups"]][["v1cf"]]),
rownames(t_visit_cf_monocyte_sig_sva[["deseq"]][["downs"]][["v1cf"]]))
length(v1_sig)## [1] 31
v2_sig <- c(
rownames(t_visit_cf_monocyte_sig_sva[["deseq"]][["ups"]][["v2cf"]]),
rownames(t_visit_cf_monocyte_sig_sva[["deseq"]][["downs"]][["v2cf"]]))
length(v2_sig)## [1] 0
v3_sig <- c(
rownames(t_visit_cf_monocyte_sig_sva[["deseq"]][["ups"]][["v2cf"]]),
rownames(t_visit_cf_monocyte_sig_sva[["deseq"]][["downs"]][["v2cf"]]))
length(v3_sig)## [1] 0
t_monocyte_visit_aucc_v2v1 <- calculate_aucc(v1cf, tbl2 = v2cf,
py = "deseq_adjp", ly = "deseq_logfc")
dev <- pp(file = "images/monocyte_visit_v2v1_aucc.png")
t_monocyte_visit_aucc_v2v1[["plot"]]
closed <- dev.off()
t_monocyte_visit_aucc_v2v1[["plot"]]t_monocyte_visit_aucc_v3v1 <- calculate_aucc(v1cf, tbl2 = v3cf,
py = "deseq_adjp", ly = "deseq_logfc")
dev <- pp(file = "images/monocyte_visit_v3v1_aucc.png")
t_monocyte_visit_aucc_v3v1[["plot"]]
closed <- dev.off()
t_monocyte_visit_aucc_v3v1[["plot"]]visitcf_factor <- paste0(pData(t_neutrophils)[["visitnumber"]], "_",
pData(t_neutrophils)[["finaloutcome"]])
t_neutrophil_visitcf <- set_expt_conditions(t_neutrophils, fact = visitcf_factor)## The numbers of samples by condition are:
##
## v1_cure v1_failure v2_cure v2_failure v3_cure v3_failure
## 8 8 7 6 5 7
t_visit_cf_neutrophil_de_sva <- all_pairwise(t_neutrophil_visitcf, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## v1_cure v1_failure v2_cure v2_failure v3_cure v3_failure
## 8 8 7 6 5 7
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_visit_cf_neutrophil_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_visit_cf_neutrophil_table_sva <- combine_de_tables(
t_visit_cf_neutrophil_de_sva, keepers = visitcf_contrasts, scale_p = TRUE,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_visitcf_table_sva-v{ver}.xlsx"))## Deleting the file analyses/4_tumaco/DE_Cure_Fail/Neutrophils/t_neutrophil_visitcf_table_sva-v202503.xlsx before writing the tables.
t_visit_cf_neutrophil_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 v1_failure_vs_v1_cure 12 6 5 6
## 2 v2_failure_vs_v2_cure 2 6 2 3
## 3 v3_failure_vs_v3_cure 2 2 0 2
## limma_sigup limma_sigdown
## 1 1 0
## 2 0 0
## 3 0 0
## Plot describing unique/shared genes in a differential expression table.
t_visit_cf_neutrophil_sig_sva <- extract_significant_genes(
t_visit_cf_neutrophil_table_sva,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_visitcf_sig_sva-v{ver}.xlsx"))## Deleting the file analyses/4_tumaco/DE_Cure_Fail/Neutrophils/t_neutrophil_visitcf_sig_sva-v202503.xlsx before writing the tables.
t_visit_cf_neutrophil_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## v1cf 1 0 5 6 12 6 0
## v2cf 0 0 2 3 2 6 1
## v3cf 0 0 0 2 2 2 2
## ebseq_down basic_up basic_down
## v1cf 2 0 0
## v2cf 1 0 0
## v3cf 3 0 0
visitcf_factor <- paste0(pData(t_eosinophils)[["visitnumber"]], "_",
pData(t_eosinophils)[["finaloutcome"]])
t_eosinophil_visitcf <- set_expt_conditions(t_eosinophils, fact = visitcf_factor)## The numbers of samples by condition are:
##
## v1_cure v1_failure v2_cure v2_failure v3_cure v3_failure
## 5 3 6 3 6 3
t_visit_cf_eosinophil_de_sva <- all_pairwise(t_eosinophil_visitcf, model_batch = "svaseq",
filter = TRUE,
methods = methods, keepers = visitcf_contrasts)##
## v1_cure v1_failure v2_cure v2_failure v3_cure v3_failure
## 5 3 6 3 6 3
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_visit_cf_eosinophil_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_visit_cf_eosinophil_table_sva <- combine_de_tables(
t_visit_cf_eosinophil_de_sva, keepers = visitcf_contrasts, scale_p = TRUE,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_visitcf_table_sva-v{ver}.xlsx"))## Deleting the file analyses/4_tumaco/DE_Cure_Fail/Eosinophils/t_eosinophil_visitcf_table_sva-v202503.xlsx before writing the tables.
t_visit_cf_eosinophil_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 v1_failure_vs_v1_cure 8 10 4 4
## 2 v2_failure_vs_v2_cure 3 3 5 2
## 3 v3_failure_vs_v3_cure 12 7 13 2
## limma_sigup limma_sigdown
## 1 0 1
## 2 0 0
## 3 0 0
## Plot describing unique/shared genes in a differential expression table.
t_visit_cf_eosinophil_sig_sva <- extract_significant_genes(
t_visit_cf_eosinophil_table_sva,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_visitcf_sig_sva-v{ver}.xlsx"))## Deleting the file analyses/4_tumaco/DE_Cure_Fail/Eosinophils/t_eosinophil_visitcf_sig_sva-v202503.xlsx before writing the tables.
t_visit_cf_eosinophil_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## v1cf 0 1 4 4 8 10 4
## v2cf 0 0 5 2 3 3 11
## v3cf 0 0 13 2 12 7 3
## ebseq_down basic_up basic_down
## v1cf 84 0 0
## v2cf 18 0 0
## v3cf 10 0 0
Having put some SL read mapping information in the sample sheet, Maria Adelaida added a new column using it with the putative persistence state on a per-sample basis. One question which arised from that: what differences are observable between the persistent yes vs. no samples on a per-cell-type basis among the visit 3 samples.
First things first, create the datasets.
persistence_expt <- subset_expt(t_clinical, subset = "persistence=='Y'|persistence=='N'") %>%
subset_expt(subset = 'visitnumber==3') %>%
set_expt_conditions(fact = 'persistence')## subset_expt(): There were 123, now there are 83 samples.
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'pData': When the subset was taken, the resulting design has 0 members.
## persistence_biopsy <- subset_expt(persistence_expt, subset = "typeofcells=='biopsy'")
persistence_monocyte <- subset_expt(persistence_expt, subset = "typeofcells=='monocytes'")## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'subset_expt': object 'persistence_expt' not found
persistence_neutrophil <- subset_expt(persistence_expt, subset = "typeofcells=='neutrophils'")## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'subset_expt': object 'persistence_expt' not found
persistence_eosinophil <- subset_expt(persistence_expt, subset = "typeofcells=='eosinophils'")## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'subset_expt': object 'persistence_expt' not found
See if there are any patterns which look usable.
## All
persistence_norm <- normalize_expt(persistence_expt, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'normalize_expt': object 'persistence_expt' not found
plot_pca(persistence_norm)[["plot"]]## Error in eval(expr, envir, enclos): object 'persistence_norm' not found
persistence_nb <- normalize_expt(persistence_expt, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'normalize_expt': object 'persistence_expt' not found
plot_pca(persistence_nb)[["plot"]]## Error in eval(expr, envir, enclos): object 'persistence_nb' not found
## Biopsies
##persistence_biopsy_norm <- normalize_expt(persistence_biopsy, transform = "log2", convert = "cpm",
## norm = "quant", filter = TRUE)
##plot_pca(persistence_biopsy_norm)[["plot"]]
## Insufficient data
## Monocytes
persistence_monocyte_norm <- normalize_expt(persistence_monocyte, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'normalize_expt': object 'persistence_monocyte' not found
plot_pca(persistence_monocyte_norm)[["plot"]]## Error in eval(expr, envir, enclos): object 'persistence_monocyte_norm' not found
persistence_monocyte_nb <- normalize_expt(persistence_monocyte, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'normalize_expt': object 'persistence_monocyte' not found
plot_pca(persistence_monocyte_nb)[["plot"]]## Error in eval(expr, envir, enclos): object 'persistence_monocyte_nb' not found
## Neutrophils
persistence_neutrophil_norm <- normalize_expt(persistence_neutrophil, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'normalize_expt': object 'persistence_neutrophil' not found
plot_pca(persistence_neutrophil_norm)[["plot"]]## Error in eval(expr, envir, enclos): object 'persistence_neutrophil_norm' not found
persistence_neutrophil_nb <- normalize_expt(persistence_neutrophil, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'normalize_expt': object 'persistence_neutrophil' not found
plot_pca(persistence_neutrophil_nb)[["plot"]]## Error in eval(expr, envir, enclos): object 'persistence_neutrophil_nb' not found
## Eosinophils
persistence_eosinophil_norm <- normalize_expt(persistence_eosinophil, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'normalize_expt': object 'persistence_eosinophil' not found
plot_pca(persistence_eosinophil_norm)[["plot"]]## Error in eval(expr, envir, enclos): object 'persistence_eosinophil_norm' not found
persistence_eosinophil_nb <- normalize_expt(persistence_eosinophil, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'normalize_expt': object 'persistence_eosinophil' not found
plot_pca(persistence_eosinophil_nb)[["plot"]]## Error in eval(expr, envir, enclos): object 'persistence_eosinophil_nb' not found
This is pretty sparse and unlikely to yield any interesting results I am thinking.
persistence_de_sva <- all_pairwise(persistence_expt, filter = TRUE, methods = methods,
model_batch = "svaseq")## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'pData': object 'persistence_expt' not found
persistence_de_sva## Error in eval(expr, envir, enclos): object 'persistence_de_sva' not found
persistence_table_sva <- combine_de_tables(
persistence_de_sva, scale_p = TRUE,
excel = glue("{xlsx_prefix}/DE_Persistence/persistence_all_de_sva-v{ver}.xlsx"))## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'colors': object 'persistence_de_sva' not found
persistence_table_sva## Error in eval(expr, envir, enclos): object 'persistence_table_sva' not found
persistence_monocyte_de_sva <- all_pairwise(persistence_monocyte, filter = TRUE,
model_batch = "svaseq",
methods = methods)## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'pData': object 'persistence_monocyte' not found
persistence_monocyte_de_sva## Error in eval(expr, envir, enclos): object 'persistence_monocyte_de_sva' not found
persistence_monocyte_table_sva <- combine_de_tables(
persistence_monocyte_de_sva, scale_p = TRUE,
excel = glue("{xlsx_prefix}/DE_Persistence/persistence_monocyte_de_sva-v{ver}.xlsx"))## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'colors': object 'persistence_monocyte_de_sva' not found
persistence_monocyte_table_sva## Error in eval(expr, envir, enclos): object 'persistence_monocyte_table_sva' not found
persistence_neutrophil_de_sva <- all_pairwise(persistence_neutrophil, filter = TRUE,
model_batch = "svaseq",
methods = methods)## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'pData': object 'persistence_neutrophil' not found
persistence_neutrophil_de_sva## Error in eval(expr, envir, enclos): object 'persistence_neutrophil_de_sva' not found
persistence_neutrophil_table_sva <- combine_de_tables(
persistence_neutrophil_de_sva, scale_p = TRUE,
excel = glue("{xlsx_prefix}/DE_Persistence/persistence_neutrophil_de_sva-v{ver}.xlsx"))## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'colors': object 'persistence_neutrophil_de_sva' not found
persistence_neutrophil_table_sva## Error in eval(expr, envir, enclos): object 'persistence_neutrophil_table_sva' not found
## There are insufficient samples (1) in the 'N' category.
##persistence_eosinophil_de_sva <- all_pairwise(persistence_eosinophil, filter = TRUE,
#model_batch = "svaseq",
## methods = methods)
##persistence_eosinophil_de_sva
##persistence_eosinophil_table_sva <- combine_de_tables(
## persistence_eosinophil_de_sva,
## excel = glue("{xlsx_prefix}/DE_Persistence/persistence_eosinophil_de_sva-v{ver}.xlsx"))In the following, I am hoping to lower variance associated with factors other than visit via sva and therefore be able to see what genes are changing for everyone with respect to time.
This is the one instance where I think it would be really nice to have biopsy samples for all three visits; I presume that we would have a really nice signal of stuff like keratin and other wound-healing associated genes.
t_visit_all_de_sva <- all_pairwise(t_visit, filter = TRUE, methods = methods,
model_batch = "svaseq")##
## v3 v2 v1
## 34 35 40
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_visit_all_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_visit_all_table_sva <- combine_de_tables(
t_visit_all_de_sva, keepers = visit_contrasts, scale_p = TRUE,
excel = glue("{xlsx_prefix}/DE_Visits/t_all_visit_table_sva-v{ver}.xlsx"))
t_visit_all_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup
## 1 v2_vs_v1 20 8 20 10 19
## 2 v3_vs_v1 21 20 18 16 21
## 3 v3_vs_v2 0 2 0 2 0
## limma_sigdown
## 1 7
## 2 7
## 3 0
## Plot describing unique/shared genes in a differential expression table.
t_visit_all_sig_sva <- extract_significant_genes(
t_visit_all_table_sva,
excel = glue("{xlsx_prefix}/DE_Visits/t_all_visit_sig_sva-v{ver}.xlsx"))
t_visit_all_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## v2v1 19 7 20 10 20 8 0
## v3v1 21 7 18 16 21 20 0
## v3v2 0 0 0 2 0 2 0
## ebseq_down basic_up basic_down
## v2v1 0 0 0
## v3v1 0 0 0
## v3v2 0 0 0
t_visit_monocytes <- set_expt_conditions(t_monocytes, fact = "visitnumber")## The numbers of samples by condition are:
##
## v3 v2 v1
## 13 13 16
t_visit_monocyte_de_sva <- all_pairwise(t_visit_monocytes, filter = TRUE,
model_batch = "svaseq",
methods = methods)##
## v3 v2 v1
## 13 13 16
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_visit_monocyte_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_visit_monocyte_table_sva <- combine_de_tables(
t_visit_monocyte_de_sva, keepers = visit_contrasts, scale_p = TRUE,
excel = glue("{xlsx_prefix}/DE_Visits/Monocytes/t_monocyte_visit_table_sva-v{ver}.xlsx"))
t_visit_monocyte_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup
## 1 v2_vs_v1 1 2 1 1 0
## 2 v3_vs_v1 2 1 1 1 0
## 3 v3_vs_v2 0 0 0 0 0
## limma_sigdown
## 1 0
## 2 0
## 3 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_visit_monocyte_sig_sva <- extract_significant_genes(
t_visit_monocyte_table_sva,
excel = glue("{xlsx_prefix}/DE_Visits/Monocytes/t_monocyte_visit_sig_sva-v{ver}.xlsx"))
t_visit_monocyte_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## v2v1 0 0 1 1 1 2 0
## v3v1 0 0 1 1 2 1 0
## v3v2 0 0 0 0 0 0 0
## ebseq_down basic_up basic_down
## v2v1 1 1 0
## v3v1 0 0 0
## v3v2 1 0 0
t_visit_neutrophils <- set_expt_conditions(t_neutrophils, fact = "visitnumber")## The numbers of samples by condition are:
##
## v3 v2 v1
## 12 13 16
t_visit_neutrophil_de_sva <- all_pairwise(t_visit_neutrophils, filter = TRUE,
model_batch = "svaseq",
methods = methods)##
## v3 v2 v1
## 12 13 16
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = as.numeric(xdata[j, ]), y =
## as.numeric(ydata[j, : cannot compute exact p-value with ties
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_visit_neutrophil_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_visit_neutrophil_table_sva <- combine_de_tables(
t_visit_neutrophil_de_sva, keepers = visit_contrasts, scale_p = TRUE,
excel = glue("{xlsx_prefix}/DE_Visits/Neutrophils/t_neutrophil_visit_table_sva-v{ver}.xlsx"))
t_visit_neutrophil_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup
## 1 v2_vs_v1 112 88 113 87 117
## 2 v3_vs_v1 128 47 119 46 92
## 3 v3_vs_v2 1 0 0 0 0
## limma_sigdown
## 1 52
## 2 67
## 3 0
## Plot describing unique/shared genes in a differential expression table.
t_visit_neutrophil_sig_sva <- extract_significant_genes(
t_visit_neutrophil_table_sva,
excel = glue("{xlsx_prefix}/DE_Visits/Neutrophils/t_neutrophil_visit_sig_sva-v{ver}.xlsx"))
t_visit_neutrophil_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## v2v1 117 52 113 87 112 88 65
## v3v1 92 67 119 46 128 47 36
## v3v2 0 0 0 0 1 0 1
## ebseq_down basic_up basic_down
## v2v1 20 653 0
## v3v1 7 281 0
## v3v2 0 0 0
t_visit_eosinophils <- set_expt_conditions(t_eosinophils, fact="visitnumber")
t_visit_eosinophil_de <- all_pairwise(t_visit_eosinophils, filter = TRUE,
model_batch = "svaseq",
methods = methods)
t_visit_eosinophil_de
t_visit_eosinophil_table <- combine_de_tables(
t_visit_eosinophil_de, keepers = visit_contrasts, scale_p = TRUE
excel = glue("{xlsx_prefix}/DE_Visits/Eosinophils/t_eosinophil_visit_table_sva-v{ver}.xlsx"))
t_visit_eosinophil_table
t_visit_eosinophil_sig <- extract_significant_genes(
t_visit_eosinophil_table,
excel = glue("{xlsx_prefix}/DE_Visits/Eosinophils/t_eosinophil_visit_sig_sva-v{ver}.xlsx"))
## No significant genes observed.## Error: <text>:9:3: unexpected symbol
## 8: t_visit_eosinophil_de, keepers = visit_contrasts, scale_p = TRUE
## 9: excel
## ^
Alejandro showed some ROC curves for eosinophil data showing sensitivity vs. specificity of a couple genes which were observed in v1 eosinophils vs. all-times eosinophils across cure/fail. I am curious to better understand how this was done and what utility it might have in other contexts.
To that end, I want to try something similar myself. In order to properly perform the analysis with these various tools, I need to reconfigure the data in a pretty specific format:
If I intend to use this for our tx data, I will likely need a utility function to create the properly formatted input df.
For the purposes of my playing, I will choose three genes from the eosinophil C/F table, one which is significant, one which is not, and an arbitrary.
The input genes will therefore be chosen from the data structure: t_cf_eosinophil_table_sva:
ENSG00000198178, ENSG00000179344, ENSG00000182628
eo_rpkm <- normalize_expt(tv1_eosinophils, convert = "rpkm", column = "cds_length")## There appear to be 5382 genes without a length.
This paper is DOI:10.1126/scitranslmed.aax4204 (Amorim et al. (2019))
Variable gene expression and parasite load predict treatment outcome in cutaneous leishmaniasis.
One query from Maria Adelaida is to see how this data fits with ours. I have read this paper a couple of times now and I get confused on a couple of points every time, which I will explain in a moment. The expermental design is key to my confusion and key to what I think is being missed in our interpretation of the results:
external_norm <- normalize_expt(external_cf, filter = TRUE, norm = "quant",
convert = "cpm", transform = "log2")## Removing 7327 low-count genes (14154 remaining).
plot_pca(external_norm)## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure
## Shapes are defined by female, male.
external_nb <- normalize_expt(external_cf, filter = TRUE, batch = "svaseq",
convert = "cpm", transform = "log2")## Removing 7327 low-count genes (14154 remaining).
## transform_counts: Found 171 values less than 0.
## Warning in transform_counts(count_table, method = transform, ...): NaNs
## produced
plot_pca(external_nb)## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure
## Shapes are defined by female, male.
external_de <- all_pairwise(external_cf, filter = TRUE, methods = methods,
model_batch = "svaseq")##
## cure failure
## 14 7
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
external_de## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## falr_vs_cr
## basic_vs_deseq 0.3908
## basic_vs_dream 0.4488
## basic_vs_ebseq 0.9027
## basic_vs_edger 0.3914
## basic_vs_limma 0.4180
## basic_vs_noiseq 0.9604
## deseq_vs_dream 0.8718
## deseq_vs_ebseq 0.4149
## deseq_vs_edger 0.9997
## deseq_vs_limma 0.8487
## deseq_vs_noiseq 0.4412
## dream_vs_ebseq 0.4304
## dream_vs_edger 0.8727
## dream_vs_limma 0.9654
## dream_vs_noiseq 0.4269
## ebseq_vs_edger 0.4177
## ebseq_vs_limma 0.3577
## ebseq_vs_noiseq 0.9407
## edger_vs_limma 0.8497
## edger_vs_noiseq 0.4418
## limma_vs_noiseq 0.3627
external_table <- combine_de_tables(
external_de, scale_p = TRUE,
excel = "excel/scott_table.xlsx")
external_table## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure 0 0 0 0
## limma_sigup limma_sigdown
## 1 0 0
## Only has information, cannot create an UpSet.
## Plot describing unique/shared genes in a differential expression table.
## NULL
external_sig <- extract_significant_genes(external_table, excel = "excel/scott_sig.xlsx")
external_sig## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down
## failure_vs_cure 0 0 0 0 0 0
## ebseq_up ebseq_down basic_up basic_down
## failure_vs_cure 0 0 0 0
external_top100 <- extract_significant_genes(external_table, n = 100)
external_up <- external_top100[["deseq"]][["ups"]][["failure_vs_cure"]]
external_down <- external_top100[["deseq"]][["downs"]][["failure_vs_cure"]]I think I am getting a significantly different result from Scott, so I am going to do an explicit side-by-side comparison of our results at each step. In order to do this, I am using the capsule they kindly provided with their publication.
I am copy/pasting material from their publication with some modification which I will note as I go.
Here is their block ‘r packages’
Note/Spoiler alert: It actually turns out our results are basically relatively similar, I just didn’t understand what comparisons are actually in paper vs those I have primary interest. In addition, we handled gene IDs differently (gene card vs. EnsemblID) which has a surprisingly big effect.
Oh, I just realized that when I did these analyses, I did them in a completely separate tree and compared the results post-facto. This assumption remains in this document and therefore is unlikely to work properly in the containerized environment I am attempting to create. Given that the primary goal of this section is to show to myself that I compared the two datasets as thoroughly as I could, perhaps I should just disable them for the container and allow the reader to perform the exercise de-novo.
library(tidyverse)
library(ggthemes)
library(reshape2)
library(edgeR)
library(patchwork)
library(vegan)
library(DT)
library(tximport)
library(gplots)
library(FinCal)
library(ggrepel)
library(gt)
library(ggExtra)
library(EnsDb.Hsapiens.v86)
library(stringr)
library(cowplot)
library(ggpubr)I have a separate tree in which I copied the capsule and data. I performed exactly their kallisto quant steps within it and put the output data into the same place within it. I did change the commands slightly because I downloaded the files from SRA and so don’t have them with names like ‘host_CL01’, but instead ‘PRJNA… (well, SRR…)’. But the samples are in the same order, so I sent the output files to the same final filenames. Here is an example from the first sample:
cd preprocessing
module add kallisto
kallisto index -i Homo_sapiens.GRCh38.cdna.all.Index Homo_sapiens.GRCh38.cdna.all.fa
# Map reads to the indexed reference transcriptome for HOST
# first the healthy subjects (HS)
export LESS = '--buffers 0 -B'
kallisto quant -i Homo_sapiens.GRCh38.cdna.all.Index -o host_HS01 -t 24 -b 60 \
--single -l 250 -s 30 <(less SRR8668755/*-trimmed.fastq.xz) 2>host_HS01.log 1>&2 &I am going to change the path very slightly in the following block simply because I put the capsule in a separate directory and do not want to copy it here. Otherwise it is unmodified. Also, the function gt::tab_header() annoys the crap out of me.
import <- read_tsv("../scott_2019/capsule-6534016/data/studydesign.txt")
import %>% dplyr::filter(disease == "cutaneous") %>%
dplyr::select(-2) %>% gt() %>%
tab_header(title = md("Clinical metadata from patients with cutaneous leishmaniasis (CL)"),
subtitle = md("`(n=21)`")) %>% cols_align(align = "center", columns = TRUE)
targets.lesion <- import
targets.onlypatients <- targets.lesion[8:28,] # only CL lesions (n=21)
# Making factors that will be used for pairwise comparisons:
# HS vs. CL lesions as a factor:
disease.lesion <- factor(targets.lesion$disease)
# Cure vs. Failure lesions as a factor:
treatment.lesion <- factor(targets.onlypatients$treatment_outcome)They did use a slightly different annotation set, Ensembl revision 86. Once again I am modifying the paths slightly to reflect where I put the capsule.
# capturing Ensembl transcript IDs (tx) and gene symbols ("gene_name") from
# EnsDb.Hsapiens.v86 annotation package
Tx <- as.data.frame(transcripts(EnsDb.Hsapiens.v86,
columns=c(listColumns(EnsDb.Hsapiens.v86, "tx"),
"gene_name")))
Tx <- dplyr::rename(Tx, target_id = tx_id)
row.names(Tx) <- NULL
Tx <- Tx[,c(6,12)]
# getting file paths for Kallisto outputs
paths.all <- file.path("../scott_2019/capsule-6534016/data/readMapping/human", targets.lesion$sample, "abundance.h5")
paths.patients <- file.path("../scott_2019/capsule-6534016/data/readMapping/human", targets.onlypatients$sample, "abundance.h5")
# importing .h5 Kallisto data and collapsing transcript-level data to genes
Txi.lesion.coding <- tximport(paths.all,
type = "kallisto",
tx2gene = Tx,
txOut = FALSE,
ignoreTxVersion = TRUE,
countsFromAbundance = "lengthScaledTPM")
# importing againg, but this time just the CL patients
Txi.lesion.coding.onlypatients <- tximport(paths.patients,
type = "kallisto",
tx2gene = Tx,
txOut = FALSE,
ignoreTxVersion = TRUE,
countsFromAbundance = "lengthScaledTPM")The block ‘visualizationDatasets’ follows unchanged. In the next block I will add another plot or perhaps 2
# First make a DGEList from the counts:
Txi.lesion.coding.DGEList <- DGEList(Txi.lesion.coding$counts)
colnames(Txi.lesion.coding.DGEList$counts) <- targets.lesion$sample
colnames(Txi.lesion.coding$counts) <- targets.lesion$sample
Txi.lesion.coding.DGEList.OP <- DGEList(Txi.lesion.coding.onlypatients$counts)
colnames(Txi.lesion.coding.DGEList.OP) <- targets.onlypatients$sample
# Convert to counts per million:
Txi.lesion.coding.DGEList.cpm <- edgeR::cpm(Txi.lesion.coding.DGEList, log = TRUE)
Txi.lesion.coding.DGEList.OP.cpm <- edgeR::cpm(Txi.lesion.coding.DGEList.OP, log = TRUE)
keepers.coding <- rowSums(Txi.lesion.coding.DGEList.cpm>1)>=7
keepers.coding.OP <- rowSums(Txi.lesion.coding.DGEList.OP.cpm>1)>=7
Txi.lesion.coding.DGEList.filtered <- Txi.lesion.coding.DGEList[keepers.coding,]
Txi.lesion.coding.DGEList.OP.filtered <- Txi.lesion.coding.DGEList.OP[keepers.coding.OP,]
# convert back to cpm:
Txi.lesion.coding.DGEList.LogCPM.filtered <- edgeR::cpm(Txi.lesion.coding.DGEList.filtered,
log=TRUE)
Txi.lesion.coding.DGEList.LogCPM.OP.filtered <- edgeR::cpm(Txi.lesion.coding.DGEList.OP.filtered,
log=TRUE)
# Normalizing data:
calcNorm1 <- calcNormFactors(Txi.lesion.coding.DGEList.filtered, method = "TMM")
calcNorm2 <- calcNormFactors(Txi.lesion.coding.DGEList.OP.filtered, method = "TMM")
Txi.lesion.coding.DGEList.LogCPM.filtered.norm <- edgeR::cpm(calcNorm1, log=TRUE)
colnames(Txi.lesion.coding.DGEList.LogCPM.filtered.norm) <- targets.lesion$sample
Txi.lesion.coding.DGEList.OP.LogCPM.filtered.norm <- edgeR::cpm(calcNorm2, log=TRUE)
colnames(Txi.lesion.coding.DGEList.OP.LogCPM.filtered.norm) <- targets.onlypatients$sample
# Raw dataset:
V1 <- as.data.frame(Txi.lesion.coding.DGEList.cpm)
colnames(V1) <- targets.lesion$sample
V1 <- melt(V1)
colnames(V1) <- c("sample","expression")
# Filtered dataset:
V1.1 <- as.data.frame(Txi.lesion.coding.DGEList.LogCPM.filtered)
colnames(V1.1) <- targets.lesion$sample
V1.1 <- melt(V1.1)
colnames(V1.1) <- c("sample","expression")
# Filtered-normalized dataset:
V1.1.1 <- as.data.frame(Txi.lesion.coding.DGEList.LogCPM.filtered.norm)
colnames(V1.1.1) <- targets.lesion$sample
V1.1.1 <- melt(V1.1.1)
colnames(V1.1.1) <- c("sample","expression")
# plotting:
ggplot(V1, aes(x=sample, y=expression, fill=sample)) +
geom_violin(trim = TRUE, show.legend = TRUE) +
stat_summary(fun.y = "median", geom = "point", shape = 95, size = 10, color = "black") +
theme_bw() +
theme(legend.position = "none", axis.title=element_text(size=7),
axis.title.x=element_blank(), axis.text=element_text(size=5),
axis.text.x = element_text(angle = 90, hjust = 1),
plot.title = element_text(size = 7)) +
ggtitle("Raw dataset") +
ggplot(V1.1, aes(x=sample, y=expression, fill=sample)) +
geom_violin(trim = TRUE, show.legend = TRUE) +
stat_summary(fun.y = "median", geom = "point", shape = 95, size = 10, color = "black") +
theme_bw() +
theme(legend.position = "none", axis.title=element_text(size=7),
axis.title.x=element_blank(), axis.text=element_text(size=5),
axis.text.x = element_text(angle = 90, hjust = 1),
plot.title = element_text(size = 7)) +
ggtitle("Filtered dataset") +
ggplot(V1.1.1, aes(x=sample, y=expression, fill=sample)) +
geom_violin(trim = TRUE, show.legend = TRUE) +
stat_summary(fun.y = "median", geom = "point", shape = 95, size = 10, color = "black") +
theme_bw() +
theme(legend.position = "none", axis.title=element_text(size=7),
axis.title.x=element_blank(), axis.text=element_text(size=5),
axis.text.x = element_text(angle = 90, hjust = 1),
plot.title = element_text(size = 7)) +
ggtitle("Filtered and normalized dataset")The following block in their dataset recreated the matrix without filtering and will use that for differential expression. It is a little hard to follow for me because they subset based on the sample numbers (8 to 28, which if I am not mistaken just drops the healthy samples).
DataNotFiltered_Norm_OP <- calcNormFactors(Txi.lesion.coding.DGEList[,8:28],
method = "TMM")
DataNotFiltered_Norm_log2CPM_OP <- edgeR::cpm(DataNotFiltered_Norm_OP, log=TRUE)
colnames(DataNotFiltered_Norm_log2CPM_OP) <- targets.onlypatients$sample
CPM_normData_notfiltered_OP <- 2^(DataNotFiltered_Norm_log2CPM_OP)
#uncomment the next line to produce raw data that was uploaded to the Gene Expression Omnibus (GEO) for publication.
#write.table(Txi.lesion.coding$counts, file = "Amorim_GEO_raw.txt", sep = "\t", quote = FALSE)
# Including all the individuals (HS and CL patients) for public domain submission:
DataNotFiltered_Norm <- calcNormFactors(Txi.lesion.coding.DGEList, method = "TMM")
DataNotFiltered_Norm_log2CPM <- edgeR::cpm(DataNotFiltered_Norm, log=TRUE)
colnames(DataNotFiltered_Norm_log2CPM) <- targets.lesion$sample
CPM_normData_notfiltered <- 2^(DataNotFiltered_Norm_log2CPM)
#uncomment the next line to produce the normalized data file that was uploaded to the Gene Expression Omnibus (GEO) for publication.
#write.table(DataNotFiltered_Norm_log2CPM, "Amorim_GEO_normalized.txt", sep = "\t", quote = FALSE)The following block generated a couple of the figures in the paper and comprise a pretty straightforward PCA. I am going to make a following block containing the same image with the cure/fail visualization using the same method/data.
pca.res <- prcomp(t(Txi.lesion.coding.DGEList.LogCPM.filtered.norm), scale.=F, retx=T)
pc.var <- pca.res$sdev^2
pc.per <- round(pc.var/sum(pc.var)*100, 1)
data.frame <- as.data.frame(pca.res$x)
# Calculate distance between samples by permanova:
allsamples.dist <- vegdist(t(2^Txi.lesion.coding.DGEList.LogCPM.filtered.norm),
method = "bray")
vegan <- adonis2(allsamples.dist~targets.lesion$disease,
data=targets.lesion,
permutations = 999, method="bray")
targets.lesion$disease
ggplot(data.frame, aes(x=PC1, y=PC2, color=factor(targets.lesion$disease))) +
geom_point(size=5, shape=20) +
theme_calc() +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
axis.text.x = element_text(size = 15, vjust = 0.5),
axis.text.y = element_text(size = 15), axis.title = element_text(size = 15),
legend.position="none") +
scale_color_manual(values = c("#073F80","#EB512C")) +
annotate("text", x=-50, y=80, label=paste("Permanova Pr(>F) =",
vegan[1,5]), size=3, fontface="bold") +
xlab(paste("PC1 -",pc.per[1],"%")) +
ylab(paste("PC2 -",pc.per[2],"%")) +
xlim(-200,110)I just realized that somewhere along the way in creating this container, I messed up this analysis pretty badly:
When I originally did this on my workstation I had an actual 1:1 comparison and saw that our results were quite similar. I need to bring that back into this in order to show that neither we nor they are crazy people.
Either way, I think the main takeaway is that their dataset does not spend much time looking at cure/fail but instead control/infected for a reason.
Note, the fun aspects of the experiment (time to cure, size of lesion, etc) are not annotated in the metadata provided by SRA, but instead may be found in the capsule kindly provided by the lab. As a result, I copied that file into the sample_sheets/ directory and have added it to the expressionset. There is an important caveat, though: I did not include the non-diseased samples for this comparison; as a result the disease metadata factor is boring (e.g. it is only cutaneous).
external_cf[["accession"]] <- pData(external_cf)[["sample"]]
disease_factor <- pData(external_cf)[["disease"]]
table(disease_factor)## disease_factor
## cutaneous
## 21
external_disease <- set_expt_conditions(external_cf, fact = disease_factor)## The numbers of samples by condition are:
##
## cutaneous
## 21
external_l2cpm <- normalize_expt(external_cf, filter = TRUE,
convert = "cpm", transform = "log2")## Removing 7327 low-count genes (14154 remaining).
## transform_counts: Found 165 values equal to 0, adding 1 to the matrix.
plot_pca(external_l2cpm, plot_labels = "repel")## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure
## Shapes are defined by female, male.
Use the following block if you wish to bring together SRA-downloaded data with the experimental design from the Scott paper. It requires running the blocks above in which I loaded the capsule-derived metadata.
test <- pData(external_cf)
test_import <- as.data.frame(import)
test_import[["accession"]] <- pData(external_cf[["accession"]])
test_merged <- merge(test, import, by = "accession")This is real comparison point to their cure/fail analysis.
I am just copy/pasting their code again, but changing the color factor so that cure is purple, failure is red, and na(uninfected) is black.
The following plot should be the first direct comparison point between the two analysis pipelines. Thus, if you look back a few block at my invocation of plot_pca(external_norm), you will see a green/orange plot which is functionally identical if you note:
With those caveats in mind, it is trivial to find the same relationshipes in the samples. E.g. the bottom red/purple individual samples are in the same relative position as my top orange/green pair. the same 4 samples are relative x-axis outliers (my right green, their left purple). The last 6 samples (my orange, their red) are all in the relative orientation.
I think I can further prove the similarity of our inputs via a direct comparison of the datastructures: Txi.lesion.coding.DGEList.LogCPM.filtered.norm (ugh what a name) vs. external_cf. In order to make that comparison, I need to rename my rows to the genecard IDs and the columns.
their_norm_exprs <- Txi.lesion.coding.DGEList.LogCPM.filtered.norm
my_hgnc_ids <- make.names(fData(external_cf)[["hgnc_symbol"]], unique = TRUE)
my_renamed <- set_expt_genenames(external_cf, ids = my_hgnc_ids)
my_norm <- normalize_expt(my_renamed, filter = TRUE, transform = "log2", convert = "cpm")
my_norm_exprs <- as.data.frame(exprs(my_norm))
our_exprs <- merge(their_norm_exprs, my_norm_exprs, by = "row.names")
rownames(our_exprs) <- our_exprs[["Row.names"]]
our_exprs[["Row.names"]] <- NULL
dim(our_exprs)
## I fully expected a correlation heatmap of the combined
## data to show a set of paired samples across the board.
## That is absolutely not true.
correlations <- plot_corheat(our_exprs)
correlations[["scatter"]]
correlations[["plot"]]color_fact <- factor(targets.lesion$treatment_outcome)
levels(color_fact)
## Added by atb to see cure/fail on the same dataset
ggplot(data.frame, aes(x=PC1, y=PC2, color=color_fact)) +
geom_point(size=5, shape=20) +
theme_calc() +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
axis.text.x = element_text(size = 15, vjust = 0.5),
axis.text.y = element_text(size = 15), axis.title = element_text(size = 15),
legend.position="none") +
scale_color_manual(values = c("purple", "red","black")) +
annotate("text", x=-50, y=80, label=paste("Permanova Pr(>F) =",
vegan[1,5]), size=3, fontface="bold") +
xlab(paste("PC1 -",pc.per[1],"%")) +
ylab(paste("PC2 -",pc.per[2],"%")) +
xlim(-200,110)The following is their comparison of healthy tissue vs. CL lesion and Failure vs. Cure. I am going to follow it with my analagous examination using limma. Note, each of the pairs of variables created in the following block is xxx followed by xxx.treat; the former is healthy vs lesion and the latter is the fail vs cure set.
# Model matrices:
# CL lesions vs. HS:
design.lesion <- model.matrix(~0 + disease.lesion)
colnames(design.lesion) <- levels(disease.lesion)
# Failure vs. Cure:
design.lesion.treatment <- model.matrix(~0 + treatment.lesion)
colnames(design.lesion.treatment) <- levels(treatment.lesion)
myDGEList.lesion.coding <- DGEList(calcNorm1$counts)
myDGEList.OP.NotFil <- DGEList(CPM_normData_notfiltered_OP)
# Model mean-variance trend and fit linear model to data.
# Use VOOM function from Limma package to model the mean-variance relationship
normData.lesion.coding <- voom(myDGEList.lesion.coding, design.lesion)
normData.OP.NotFil <- voom(myDGEList.OP.NotFil, design.lesion.treatment)
colnames(normData.lesion.coding) <- targets.lesion$sample
colnames(normData.OP.NotFil) <- targets.onlypatients$sample
# fit a linear model to your data
fit.lesion.coding <- lmFit(normData.lesion.coding, design.lesion)
fit.lesion.coding.treatment <- lmFit(normData.OP.NotFil, design.lesion.treatment)
# contrast matrix
contrast.matrix.lesion <- makeContrasts(CL.vs.CON = cutaneous - control,
levels=design.lesion)
contrast.matrix.lesion.treat <- makeContrasts(failure.vs.cure = failure - cure,
levels=design.lesion.treatment)
# extract the linear model fit
fits.lesion.coding <- contrasts.fit(fit.lesion.coding,
contrast.matrix.lesion)
fits.lesion.coding.treat <- contrasts.fit(fit.lesion.coding.treatment,
contrast.matrix.lesion.treat)
# get bayesian stats for your linear model fit
ebFit.lesion.coding <- eBayes(fits.lesion.coding)
ebFit.lesion.coding.treat <- eBayes(fits.lesion.coding.treat)
# TopTable ----
allHits.lesion.coding <- topTable(ebFit.lesion.coding,
adjust ="BH", coef=1,
number=34935, sort.by="logFC")
allHits.lesion.coding.treat <- topTable(ebFit.lesion.coding.treat,
adjust ="BH", coef=1,
number=34776, sort.by="logFC")
myTopHits <- rownames_to_column(allHits.lesion.coding, "geneID")
myTopHits.treat <- rownames_to_column(allHits.lesion.coding.treat, "geneID")
# mutate the format of numeric values:
myTopHits <- mutate(myTopHits, log10Pval = round(-log10(adj.P.Val),2),
adj.P.Val = round(adj.P.Val, 2),
B = round(B, 2),
AveExpr = round(AveExpr, 2),
t = round(t, 2),
logFC = round(logFC, 2),
geneID = geneID)
myTopHits.treat <- mutate(myTopHits.treat, log10Pval = round(-log10(adj.P.Val),2),
adj.P.Val = round(adj.P.Val, 2),
B = round(B, 2),
AveExpr = round(AveExpr, 2),
t = round(t, 2),
logFC = round(logFC, 2),
geneID = geneID)
#save(myTopHits, file = "myTopHits")
#save(myTopHits.treat, file = "myTopHits.treat")my_filt <- normalize_expt(my_renamed, filter = "simple")
limma_cf <- limma_pairwise(my_filt, model_batch = FALSE)
my_table <- limma_cf[["all_tables"]][["failure_vs_cure"]]
their_table <- myTopHits.treat
dim(my_table)
dim(myTopHits.treat)
our_table <- merge(my_table, myTopHits.treat, by.x = "row.names", by.y = "geneID")
dim(our_table)
comparison <- plot_linear_scatter(our_table[, c("logFC.x", "logFC.y")])
comparison$scatter
comparison$correlation
comparison$lm_modelOk, so there is a constituitive difference in our results, and it is significant. What does that mean for the set of genes observed?
With that said, in my most recent manual run of this, the results are quite good, I got a 0.75 correlation; I bet the primary outliers (on the axes) are just genes for which we got different gene<->tx mappings due to me using hisat and their usage of kallisto.
(spoiler alert: when I recast my gene IDs to the gene symbols, which also collapses a bunch of genes, the remaining differences disappeared) – this brings up a whole different set of worries vis a vis how one makes these analyses more reproducible, but at least leaves me relatively confident that our methods are compatible.
I guess I can test this hypothesis by just swapping in their counts into my data structure.
test_counts <- as.data.frame(myDGEList.lesion.coding[["counts"]])
test_counts[["host_HS01"]] <- NULL
test_counts[["host_HS02"]] <- NULL
test_counts[["host_HS03"]] <- NULL
test_counts[["host_HS04"]] <- NULL
test_counts[["host_HS05"]] <- NULL
test_counts[["host_HS06"]] <- NULL
test_counts[["host_HS07"]] <- NULL
dim(test_counts)
dim(exprs(my_test))
## Oh, that surprises me, the kallisto data has ~ 6k fewer genes?only_tmrc3 <- subset_expt(tmrc3_external, subset = "condition=='Colombia'") %>%
set_expt_conditions(fact = "finaloutcome")## subset_expt(): There were 39, now there are 18 samples.
## The numbers of samples by condition are:
##
## failure cure
## 5 13
only_tmrc3_de <- all_pairwise(only_tmrc3, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## failure cure
## 5 13
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
only_tmrc3_de## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## falr_vs_cr
## basic_vs_deseq 0.7791
## basic_vs_dream 0.9009
## basic_vs_ebseq 0.7973
## basic_vs_edger 0.8970
## basic_vs_limma 0.9067
## basic_vs_noiseq 0.9377
## deseq_vs_dream 0.7196
## deseq_vs_ebseq 0.8927
## deseq_vs_edger 0.9249
## deseq_vs_limma 0.7158
## deseq_vs_noiseq 0.7853
## dream_vs_ebseq 0.7619
## dream_vs_edger 0.8365
## dream_vs_limma 0.9891
## dream_vs_noiseq 0.8748
## ebseq_vs_edger 0.9229
## ebseq_vs_limma 0.7494
## ebseq_vs_noiseq 0.8438
## edger_vs_limma 0.8312
## edger_vs_noiseq 0.9018
## limma_vs_noiseq 0.8630
only_tmrc3_table <- combine_de_tables(only_tmrc3_de, scale_p = TRUE)
only_tmrc3_table## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure 27 26 28 15
## limma_sigup limma_sigdown
## 1 1 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
only_tmrc3_top100 <- extract_significant_genes(only_tmrc3_table, n = 100)
only_tmrc3_up <- only_tmrc3_top100[["deseq"]][["ups"]][["failure_vs_cure"]]
only_tmrc3_down <- only_tmrc3_top100[["deseq"]][["downs"]][["failure_vs_cure"]]
tmrc3_external_de <- all_pairwise(tmrc3_external, model_batch = "svaseq",
filter = "simple",
methods = methods)##
## Brazil Colombia
## 21 18
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tmrc3_external_table <- combine_de_tables(
tmrc3_external_de, scale_p = TRUE,
excel = "excel/tmrc3_scott_biopsies.xlsx")## Error : colNames must be a unique vector (case sensitive)
tmrc3_external_sig <- extract_significant_genes(
tmrc3_external_table, excel = "excel/tmrc3_scott_biopsies_sig.xlsx")
tmrc3_external_cf <- set_expt_conditions(tmrc3_external, fact = "finaloutcome")## The numbers of samples by condition are:
##
## failure cure
## 12 27
tmrc3_external_cf <- set_expt_batches(tmrc3_external_cf, fact = "lab")## The number of samples by batch are:
##
## Brazil Colombia
## 21 18
tmrc3_external_cf_norm <- normalize_expt(tmrc3_external_cf, filter = TRUE,
norm = "quant", convert = "cpm", transform = "log2")## Removing 6908 low-count genes (14573 remaining).
## transform_counts: Found 18 values equal to 0, adding 1 to the matrix.
plot_pca(tmrc3_external_cf_norm)## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by failure, cure
## Shapes are defined by Brazil, Colombia.
tmrc3_external_cf_nb <- normalize_expt(tmrc3_external_cf, filter = TRUE,
batch = "svaseq", convert = "cpm", transform = "log2")## Removing 6908 low-count genes (14573 remaining).
## transform_counts: Found 1521 values less than 0.
## Warning in transform_counts(count_table, method = transform, ...): NaNs
## produced
plot_pca(tmrc3_external_cf_nb)## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by failure, cure
## Shapes are defined by Brazil, Colombia.
tmrc3_external_cf_de <- all_pairwise(tmrc3_external_cf, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## failure cure
## 12 27
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tmrc3_external_cf_de## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## falr_vs_cr
## basic_vs_deseq 0.7723
## basic_vs_dream 0.9094
## basic_vs_ebseq 0.8257
## basic_vs_edger 0.8155
## basic_vs_limma 0.9182
## basic_vs_noiseq 0.9409
## deseq_vs_dream 0.8238
## deseq_vs_ebseq 0.9087
## deseq_vs_edger 0.9499
## deseq_vs_limma 0.7945
## deseq_vs_noiseq 0.8240
## dream_vs_ebseq 0.8160
## dream_vs_edger 0.8843
## dream_vs_limma 0.9762
## dream_vs_noiseq 0.8646
## ebseq_vs_edger 0.9161
## ebseq_vs_limma 0.7870
## ebseq_vs_noiseq 0.9014
## edger_vs_limma 0.8540
## edger_vs_noiseq 0.8639
## limma_vs_noiseq 0.8505
tmrc3_external_cf_table <- combine_de_tables(
tmrc3_external_cf_de, scale_p = TRUE,
excel = "excel/tmrc3_scott_cf_table.xlsx")## Error : colNames must be a unique vector (case sensitive)
tmrc3_external_cf_table## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure 37 126 38 94
## limma_sigup limma_sigdown
## 1 7 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
tmrc3_external_cf_sig <- extract_significant_genes(
tmrc3_external_cf_table, excel = "excel/tmrc3_scott_cf_sig.xlsx")
tmrc3_external_cf_sig## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down
## failure_vs_cure 7 0 38 94 37 126
## ebseq_up ebseq_down basic_up basic_down
## failure_vs_cure 3 6 0 0
tmrc3_external_species <- set_expt_conditions(tmrc3_external, fact = "ParasiteSpecies") %>%
set_expt_colors(color_choices[["parasite"]])## The numbers of samples by condition are:
##
## lvbraziliensis lvpanamensis notapplicable
## 22 14 3
## Warning in set_expt_colors(., color_choices[["parasite"]]): Colors for the
## following categories are not being used: lvguyanensis.
Let us look at the top/bottom 100 genes of these two datasets and see if they have any similarities.
Note to self, set up s4 dispatch on compare_de_tables!
compared <- compare_de_tables(only_tmrc3_table, external_table, first_table = 1, second_table = 1)
compared$scattercompared$correlation##
## Pearson's product-moment correlation
##
## data: df[[xcol]] and df[[ycol]]
## t = 13, df = 13217, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.09922 0.13285
## sample estimates:
## cor
## 0.1161
The goal of the following is to look across visits for combinations of cure and fail (not fail/cure, but v2/v1) and across cell types.
Thus, in order to do this, I will need to combine those three parameters or set up a more complex model to handle this.
t_cellvisitcf <- set_expt_conditions(t_clinical_nobiop, fact = "cell_visit_cf")## The numbers of samples by condition are:
##
## eosinophils_v1_cure eosinophils_v1_failure eosinophils_v2_cure
## 5 3 6
## eosinophils_v2_failure eosinophils_v3_cure eosinophils_v3_failure
## 3 6 3
## monocytes_v1_cure monocytes_v1_failure monocytes_v2_cure
## 8 8 7
## monocytes_v2_failure monocytes_v3_cure monocytes_v3_failure
## 6 6 7
## neutrophils_v1_cure neutrophils_v1_failure neutrophils_v2_cure
## 8 8 7
## neutrophils_v2_failure neutrophils_v3_cure neutrophils_v3_failure
## 6 5 7
t_cellvisitcf_de <- all_pairwise(t_cellvisitcf, keepers = visittype_contrasts,
model_batch = "svaseq", filter = TRUE,
methods = methods)##
## eosinophils_v1_cure eosinophils_v1_failure eosinophils_v2_cure
## 5 3 6
## eosinophils_v2_failure eosinophils_v3_cure eosinophils_v3_failure
## 3 6 3
## monocytes_v1_cure monocytes_v1_failure monocytes_v2_cure
## 8 8 7
## monocytes_v2_failure monocytes_v3_cure monocytes_v3_failure
## 6 6 7
## neutrophils_v1_cure neutrophils_v1_failure neutrophils_v2_cure
## 8 8 7
## neutrophils_v2_failure neutrophils_v3_cure neutrophils_v3_failure
## 6 5 7
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Error in eval(ej, envir = levelsenv) :
## object 'conditionmonocytes_2_cure' not found
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## Error in eval(ej, envir = levelsenv) :
## object 'conditionmonocytes_2_cure' not found
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Error in eval(ej, envir = levelsenv) :
## object 'conditionmonocytes_2_cure' not found
## Error in eval(ej, envir = levelsenv) :
## object 'conditionmonocytes_2_cure' not found
## Error in eval(ej, envir = levelsenv) :
## object 'conditionmonocytes_2_cure' not found
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Error in eval(ej, envir = levelsenv) :
## object 'conditionmonocytes_2_cure' not found
## Error in eval(ej, envir = levelsenv) :
## object 'conditionmonocytes_2_cure' not found
## Error in retlst[[meth]]: attempt to select less than one element in get1index
t_cellvisitcf_de## Error in eval(expr, envir, enclos): object 't_cellvisitcf_de' not found
t_cellvisitcf_mono_table <- combine_de_tables(
t_cellvisitcf_de, keepers = visittype_contrasts_mono, scale_p = TRUE,
excel = glue("{xlsx_prefix}/DE_Visits/Cure_Fail/monocyte_visit_cf_combined_table_sva-v{ver}.xlsx"))## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'colors': object 't_cellvisitcf_de' not found
t_cellvisitcf_mono_table## Error in eval(expr, envir, enclos): object 't_cellvisitcf_mono_table' not found
t_cellvisitcf_mono_sig <- extract_significant_genes(
t_cellvisitcf_mono_table,
excel = glue("{xlsx_prefix}/DE_Visits/Cure_Fail/monocyte_visit_cf_combined_sig_sva-v{ver}.xlsx"))## Error in eval(expr, envir, enclos): object 't_cellvisitcf_mono_table' not found
t_cellvisitcf_mono_sig## Error in eval(expr, envir, enclos): object 't_cellvisitcf_mono_sig' not found
t_cellvisitcf_neut_table <- combine_de_tables(
t_cellvisitcf_de, keepers = visittype_contrasts_ne, scale_p = TRUE,
excel = glue("{xlsx_prefix}/DE_Visits/Cure_Fail/neutrophil_visit_cf_combined_table_sva-v{ver}.xlsx"))## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'colors': object 't_cellvisitcf_de' not found
t_cellvisitcf_neut_table## Error in eval(expr, envir, enclos): object 't_cellvisitcf_neut_table' not found
t_cellvisitcf_neut_sig <- extract_significant_genes(
t_cellvisitcf_neut_table,
excel = glue("{xlsx_prefix}/DE_Visits/Cure_Fail/neutrophil_visit_cf_combined_sig_sva-v{ver}.xlsx"))## Error in eval(expr, envir, enclos): object 't_cellvisitcf_neut_table' not found
t_cellvisitcf_neut_sig## Error in eval(expr, envir, enclos): object 't_cellvisitcf_neut_sig' not found
t_cellvisitcf_eo_table <- combine_de_tables(
t_cellvisitcf_de, keepers = visittype_contrasts_eo,
excel = glue("{xlsx_prefix}/DE_Visits/Cure_Fail/eosinophil_visit_cf_combined_table_sva-v{ver}.xlsx"))## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'colors': object 't_cellvisitcf_de' not found
t_cellvisitcf_eo_table## Error in eval(expr, envir, enclos): object 't_cellvisitcf_eo_table' not found
t_cellvisitcf_eo_sig <- extract_significant_genes(
t_cellvisitcf_eo_table,
excel = glue("{xlsx_prefix}/DE_Visits/Cure_Fail/eosinophil_visit_cf_combined_sig_sva-v{ver}.xlsx"))## Error in eval(expr, envir, enclos): object 't_cellvisitcf_eo_table' not found
t_cellvisitcf_eo_sig## Error in eval(expr, envir, enclos): object 't_cellvisitcf_eo_sig' not found
tmp <- loadme(filename = savefile)