The various differential expression analyses of the data generated in tmrc3_datasets will occur in this document. Most of the actual work is via the function ‘all_pairwise()’; the word ‘all’ in the name does a lot of work; it is responsible for performing all possible pairwise contrasts using all possible methods for which I have sufficient understanding to be able to write a reasonably robust pairwise function. Currently this is limited to:
The first 3 methods allow one to add surrogate variable estimates to the model when performing the differential expression analyses. Noiseq handles surrogates using its own heuristics, EBSeq is inimicable to that kind of model, and I explicitly chose to not make that possible for basic. I am uncertain at this time how the random effect factors used with dream interact with surrogates from sva. With that in mind, in most instances I usually deal with surrogates/batches in one of a few ways:
The last two options are handled via a function named ‘all_adjusters’ in hpgltools which is responsible for ensuring that the data is sane for the assumptions made by each method and invokes each method (hopefully) properly. It returns both modified counts and model estimates when possible and has implementations for a fair number of methods in this realm. sva is my favorite by a pretty big margin, though I do sometimes use RUV (Risso et al. (2014)) and of course, in writing this document I stumbled into another interesting contender: (Molania et al. (2023)) all_adjusters() also has implementations of every example/method I got out of the papers for sva (e.g. ssva/fsva), isva, smartsva, and some others.
I have been changing hpgltools so that it is now possible to trivially pass arbitrarily complex models to the various methods; with the caveat that there is no good way currently to mix fixed effects and random effects across methods; so I am running dream separately and adding it to the result of all_pairwise post-facto.
Each of the following lists describes the set of contrasts that I think are interesting for the various ways one might consider the TMRC3 dataset. The variables are named according to the assumed data with which they will be used, thus tc_cf_contrasts is expected to be used for the Tumaco+Cali data and provide a series of cure/fail comparisons which (to the extent possible) across both locations. In every case, the name of the list element will be used as the contrast name, and will thus be seen as the sheet name in the output xlsx file(s); the two pieces of the character vector value are the numerator and denominator of the associated contrast.
t_cf_contrast <- list(
"outcome" = c("tumaco_failure", "tumaco_cure"))
cf_contrast <- list(
"outcome" = c("failure", "cure"))
visitcf_contrasts <- list(
"v1cf" = c("v1_failure", "v1_cure"),
"v2cf" = c("v2_failure", "v2_cure"),
"v3cf" = c("v3_failure", "v3_cure"))
visit_contrasts <- list(
"v2v1" = c("c2", "c1"),
"v3v1" = c("c3", "c1"),
"v3v2" = c("c3", "c2"))
visit_v1later <- list(
"later_vs_first" = c("later", "first"))
celltypes <- list(
"eo_mono" = c("eosinophils", "monocytes"),
"ne_mono" = c("neutrophils", "monocytes"),
"eo_ne" = c("eosinophils", "neutrophils"))
ethnicity_contrasts <- list(
"mestizo_indigenous" = c("mestiza", "indigena"),
"mestizo_afrocol" = c("mestiza", "afrocol"),
"indigenous_afrocol" = c("indigena", "afrocol"))
outcometype_contrasts <- list(
"monocyte_cf" = c("failure_monocytes", "cure_monocytes"),
"neutrophil_cf" = c("failure_neutrophils", "cure_neutrophils"),
"eosinophil_cf" = c("failure_eosinophils", "cure_eosinophils"))
visittype_contrasts_mono <- list(
"v2v1_mono_cure" = c("monocytes_2_cure", "monocytes_1_cure"),
"v2v1_mono_failure" = c("monocytes_2_failure", "monocytes_1_failure"),
"v3v1_mono_cure" = c("monocytes_3_cure", "monocytes_1_cure"),
"v3v1_mono_failure" = c("monocytes_3_failure", "monocytes_1_failure"))
visittype_contrasts_eo <- list(
"v2v1_eo_cure" = c("eosinophils_2_cure", "eosinophils_1_cure"),
"v2v1_eo_failure" = c("eosinophils_2_failure", "eosinophils_1_failure"),
"v3v1_eo_cure" = c("eosinophils_3_cure", "eosinophils_1_cure"),
"v3v1_eo_failure" = c("eosinophils_3_failure", "eosinophils_1_failure"))
visittype_contrasts_ne <- list(
"v2v1_ne_cure" = c("neutrophils_2_cure", "neutrophils_1_cure"),
"v2v1_ne_failure" = c("neutrophils_2_failure", "neutrophils_1_failure"),
"v3v1_ne_cure" = c("neutrophils_3_cure", "neutrophils_1_cure"),
"v3v1_ne_failure" = c("neutrophils_3_failure", "neutrophils_1_failure"))
visittype_contrasts <- c(visittype_contrasts_mono,
visittype_contrasts_eo,
visittype_contrasts_ne)Previously, the over representation analyses (e.g. GO and friends) followed each DE analysis during this document. I recently mentally severed my conception of GO analyses into two camps: over representation analyses in which one provides a group of genes deemed significant in some way and asks if there are known categories which contain these genes more than one would expect at random. In contrast, I am defining gene set enrichment analyses explcitly as the process of passing all genes with their metric of choice (logFC, exprs, whatever) and asking if the distribution of all genes is significant with respect to the categories. With that in mind, I added a series of explicitly GSEA analyses in my later iterations of these documents so that both ways of thinking are provided.
However, I moved those analyses to a separate document (05enrichment.Rmd) in the hopes of improving their organization.
Start over, this time with only the samples from Tumaco. We currently are assuming these will prove to be the only analyses used for final interpretation. This is primarily because we have insufficient samples which failed treatment from Cali. There is one disadvantage when using these samples: they had to travel further than the samples taken in Cali and there is significant variance observed between the two locations and we cannot discern its source. In the worst case scenario (one which I think unlikely), the variance is caused by degraded RNA during transit. We do know that the samples were well-stored in RNALater and frozen/etc, so I am inclined to discount that possibility. (Also, looking at the reads in IGV they don’t ‘look’ degreaded to me.) I think a more compelling difference lies in the different population demographics observed in the two locations. Actually, now that I have typed these sentences out, I think I can semi-test this hypothesis by looking at the set of DE genes between the two locations and compare that result to the Tumaco (and/or Cali) ethnicity comparison which is most representative of the ethnicity differences between them. If I get it into my head to try this, I will need to load the DE tables from the 03differential_expression_both.Rmd document; so I am most likely to try it out in the 07var_coef document, which was mostly written by Theresa and is already examining some similar questions.
Start by considering all Tumaco cell types. Note that in this case we only use SVA, primarily because I am not certain what would be an appropriate batch factor, perhaps visit?
t_cf_clinical_de_sva <- all_pairwise(t_clinical, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## cure failure
## 67 56
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_clinical <- t_cf_clinical_de_sva[["input"]]
t_cf_clinical_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## falr_vs_cr
## basic_vs_deseq 0.8242
## basic_vs_dream 0.8735
## basic_vs_ebseq 0.6341
## basic_vs_edger 0.8285
## basic_vs_limma 0.8704
## basic_vs_noiseq 0.8582
## deseq_vs_dream 0.8464
## deseq_vs_ebseq 0.6981
## deseq_vs_edger 0.9845
## deseq_vs_limma 0.8063
## deseq_vs_noiseq 0.9062
## dream_vs_ebseq 0.7273
## dream_vs_edger 0.8442
## dream_vs_limma 0.9405
## dream_vs_noiseq 0.7922
## ebseq_vs_edger 0.6715
## ebseq_vs_limma 0.6054
## ebseq_vs_noiseq 0.6921
## edger_vs_limma 0.8177
## edger_vs_noiseq 0.9027
## limma_vs_noiseq 0.7652
t_cf_clinical_table_sva <- combine_de_tables(
t_cf_clinical_de_sva, keepers = cf_contrast,
excel = glue("{cf_prefix}/All_Samples/t_clinical_cf_table_sva-v{ver}.xlsx"))## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_clinical_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure 94 183 103 159
## limma_sigup limma_sigdown
## 1 50 38
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_clinical_table_sva[["plots"]][["outcome"]][["deseq_ma_plots"]]t_cf_clinical_sig_sva <- extract_significant_genes(
t_cf_clinical_table_sva,
excel = glue("{cf_prefix}/All_Samples/t_clinical_cf_sig_sva-v{ver}.xlsx"))
t_cf_clinical_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 50 38 103 159 94 183 0
## ebseq_down basic_up basic_down
## outcome 49 29 6
dim(t_cf_clinical_sig_sva[["deseq"]][["ups"]][[1]])## [1] 94 77
dim(t_cf_clinical_sig_sva[["deseq"]][["downs"]][[1]])## [1] 183 77
Repeat without the biopsies.
t_cf_clinicalnb_de_sva <- all_pairwise(t_clinical_nobiop, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## cure failure
## 58 51
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_clinical_nobiop <- t_cf_clinicalnb_de_sva[["input"]]
t_cf_clinicalnb_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## falr_vs_cr
## basic_vs_deseq 0.8266
## basic_vs_dream 0.8478
## basic_vs_ebseq 0.7405
## basic_vs_edger 0.8367
## basic_vs_limma 0.8571
## basic_vs_noiseq 0.8916
## deseq_vs_dream 0.8452
## deseq_vs_ebseq 0.8187
## deseq_vs_edger 0.9964
## deseq_vs_limma 0.8463
## deseq_vs_noiseq 0.8874
## dream_vs_ebseq 0.7810
## dream_vs_edger 0.8487
## dream_vs_limma 0.9851
## dream_vs_noiseq 0.7767
## ebseq_vs_edger 0.8142
## ebseq_vs_limma 0.7814
## ebseq_vs_noiseq 0.8561
## edger_vs_limma 0.8506
## edger_vs_noiseq 0.8933
## limma_vs_noiseq 0.7865
t_cf_clinicalnb_table_sva <- combine_de_tables(
t_cf_clinicalnb_de_sva, keepers = cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/All_Samples/t_clinical_nobiop_cf_table_sva-v{ver}.xlsx"))## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_clinicalnb_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure 140 75 142 67
## limma_sigup limma_sigdown
## 1 54 46
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_clinicalnb_table_sva[["plots"]][["outcome"]][["deseq_ma_plots"]]t_cf_clinicalnb_sig_sva <- extract_significant_genes(
t_cf_clinicalnb_table_sva,
excel = glue("{cf_prefix}/All_Samples/t_clinical_nobiop_cf_sig_sva-v{ver}.xlsx"))
t_cf_clinicalnb_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 54 46 142 67 140 75 1
## ebseq_down basic_up basic_down
## outcome 7 83 30
dim(t_cf_clinicalnb_sig_sva[["deseq"]][["ups"]][[1]])## [1] 140 84
dim(t_cf_clinicalnb_sig_sva[["deseq"]][["downs"]][[1]])## [1] 75 84
As the data structure’s name suggests, the above comparison seeks to learn if there are fail/cure differences discernable across all clinical celltypes in samples taken in Tumaco.
The set of steps taken in this previous block will be essentially repeated for every set of contrasts and way of mixing/matching the data and follows the path:
These datastructures are all exposed to various functions in hpgltools which allow one to poke/compare them; I am not a fan of Excel, but I think the xlsx documents it creates are pretty decent, too.
Later in this document I do a bunch of visit/cf comparisons. In this block I want to explicitly only compare v1 to other visits. This is something I did quite a lot in the 2019 datasets, but never actually moved to this document.
tv1_vs_later <- all_pairwise(t_v1vs, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## first later
## 40 69
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_v1vs <- tv1_vs_later[["input"]]
tv1_vs_later## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## ltr_vs_frs
## basic_vs_deseq 0.7946
## basic_vs_dream 0.8022
## basic_vs_ebseq 0.7513
## basic_vs_edger 0.7983
## basic_vs_limma 0.8133
## basic_vs_noiseq 0.8895
## deseq_vs_dream 0.8498
## deseq_vs_ebseq 0.7809
## deseq_vs_edger 0.9983
## deseq_vs_limma 0.8394
## deseq_vs_noiseq 0.8587
## dream_vs_ebseq 0.8100
## dream_vs_edger 0.8564
## dream_vs_limma 0.9717
## dream_vs_noiseq 0.7516
## ebseq_vs_edger 0.7868
## ebseq_vs_limma 0.7791
## ebseq_vs_noiseq 0.8284
## edger_vs_limma 0.8457
## edger_vs_noiseq 0.8626
## limma_vs_noiseq 0.7433
tv1_vs_later_table <- combine_de_tables(
tv1_vs_later, keepers = visit_v1later, scale_p = TRUE,
excel = glue("{xlsx_prefix}/DE_Visits/tv1_vs_later_tables-v{ver}.xlsx"))## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
tv1_vs_later_table## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 later_vs_first 24 7 22 7
## limma_sigup limma_sigdown
## 1 23 7
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
tv1_vs_later_sig <- extract_significant_genes(
tv1_vs_later_table,
excel = glue("{xlsx_prefix}/DE_Visits/tv1_vs_later_sig-v{ver}.xlsx"))
tv1_vs_later_sig## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down
## later_vs_first 23 7 22 7 24 7
## ebseq_up ebseq_down basic_up basic_down
## later_vs_first 0 0 0 3
There is an important caveat when considering the sex of people in the study: there are very few females who failed. As a result I primarily concerned with the cure samples male/female.
t_sex <- subset_expt(tc_sex, subset = "clinic == 'tumaco'")## subset_expt(): There were 184, now there are 123 samples.
t_sex## A modified expressionSet containing 19952 and 123 sample. There are 164 metadata columns and 15 annotation columns.
## The primary condition is comprised of:
## female, male.
## Its current state is: raw(data).
t_sex_de <- all_pairwise(t_sex, model_batch = "svaseq", methods = methods,
filter = TRUE)##
## female male
## 22 101
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_sex <- t_sex_de[["input"]]
t_sex_de## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## mal_vs_fml
## basic_vs_deseq 0.8703
## basic_vs_dream 0.9363
## basic_vs_ebseq 0.7161
## basic_vs_edger 0.8748
## basic_vs_limma 0.9481
## basic_vs_noiseq 0.8530
## deseq_vs_dream 0.8815
## deseq_vs_ebseq 0.7608
## deseq_vs_edger 0.9909
## deseq_vs_limma 0.8596
## deseq_vs_noiseq 0.9120
## dream_vs_ebseq 0.7985
## dream_vs_edger 0.8862
## dream_vs_limma 0.9769
## dream_vs_noiseq 0.8273
## ebseq_vs_edger 0.7802
## ebseq_vs_limma 0.7762
## ebseq_vs_noiseq 0.7579
## edger_vs_limma 0.8663
## edger_vs_noiseq 0.9116
## limma_vs_noiseq 0.8154
t_sex_table <- combine_de_tables(
t_sex_de, scale_p = TRUE,
excel = glue("{xlsx_prefix}/Gene_Set_Enrichment/t_sex_table-v{ver}.xlsx"))## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_sex_table## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 male_vs_female 129 96 116 95
## limma_sigup limma_sigdown
## 1 54 74
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_sex_sig <- extract_significant_genes(
t_sex_table, excel = glue("{xlsx_prefix}/Gene_Set_Enrichment/t_sex_sig-v{ver}.xlsx"))
t_sex_sig## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down
## male_vs_female 54 74 116 95 129 96
## ebseq_up ebseq_down basic_up basic_down
## male_vs_female 12 13 18 11
In the following block I removed the failed people so that the comparison makes actual sense.
tc_sex_cure <- subset_expt(tc_sex, subset = "finaloutcome=='cure'")## subset_expt(): There were 184, now there are 122 samples.
t_sex_cure <- subset_expt(tc_sex_cure, subset = "clinic == 'tumaco'")## subset_expt(): There were 122, now there are 67 samples.
t_sex_cure_de <- all_pairwise(t_sex_cure, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## female male
## 13 54
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_sex_cure <- t_sex_cure_de[["input"]]
t_sex_cure_de## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## mal_vs_fml
## basic_vs_deseq 0.7995
## basic_vs_dream 0.9214
## basic_vs_ebseq 0.6679
## basic_vs_edger 0.8474
## basic_vs_limma 0.9284
## basic_vs_noiseq 0.8792
## deseq_vs_dream 0.8093
## deseq_vs_ebseq 0.7225
## deseq_vs_edger 0.9294
## deseq_vs_limma 0.7804
## deseq_vs_noiseq 0.8453
## dream_vs_ebseq 0.7812
## dream_vs_edger 0.8625
## dream_vs_limma 0.9698
## dream_vs_noiseq 0.8411
## ebseq_vs_edger 0.7687
## ebseq_vs_limma 0.7446
## ebseq_vs_noiseq 0.7109
## edger_vs_limma 0.8380
## edger_vs_noiseq 0.8881
## limma_vs_noiseq 0.8149
t_sex_cure_table <- combine_de_tables(
t_sex_cure_de, scale_p = TRUE,
excel = glue("{xlsx_prefix}/DE_Sex/t_sex_cure_table-v{ver}.xlsx"))## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_sex_cure_table## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 male_vs_female 176 134 162 143
## limma_sigup limma_sigdown
## 1 64 108
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_sex_cure_sig <- extract_significant_genes(
t_sex_cure_table, excel = glue("{xlsx_prefix}/DE_Sex/t_sex_cure_sig-v{ver}.xlsx"))
t_sex_cure_sig## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down
## male_vs_female 64 108 162 143 176 134
## ebseq_up ebseq_down basic_up basic_down
## male_vs_female 11 15 14 5
In a fashion similar to the putative sex comparisons; there are few/no fails for one ethnicity. In addition, the observed ethnicities are very different for the two clinics. This makes comparisons of the ethnicities tricky.
t_ethnicity_de <- all_pairwise(t_etnia_expt, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## afrocol indigena mestiza
## 76 19 28
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_etnia_expt <- t_ethnicity_de[["input"]]
t_ethnicity_de## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_ethnicity_table <- combine_de_tables(
t_ethnicity_de, keepers = ethnicity_contrasts, scale_p = TRUE,
excel = glue("{xlsx_prefix}/DE_Ethnicity/t_ethnicity_table-v{ver}.xlsx"))## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_ethnicity_table## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 mestiza_vs_indigena 83 97 67 108
## 2 mestiza_vs_afrocol 57 92 52 96
## 3 indigena_vs_afrocol 165 236 187 216
## limma_sigup limma_sigdown
## 1 58 56
## 2 42 53
## 3 165 147
## Plot describing unique/shared genes in a differential expression table.
t_ethnicity_sig <- extract_significant_genes(
t_ethnicity_table, according_to = "deseq",
excel = glue("{xlsx_prefix}/DE_Ethnicity/t_ethnicity_sig-v{ver}.xlsx"))
t_ethnicity_sig## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## mestizo_indigenous 83 97
## mestizo_afrocol 57 92
## indigenous_afrocol 165 236
One of the most compelling ideas in the data is the opportunity to find genes in the first visit which may help predict the likelihood that a person will respond well to treatment. The following block will therefore look at cure/fail from Tumaco at visit 1.
t_cf_clinical_v1_de_sva <- all_pairwise(tv1_samples, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## cure failure
## 30 24
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tv1_samples <- t_cf_clinical_v1_de_sva[["input"]]
t_cf_clinical_v1_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## falr_vs_cr
## basic_vs_deseq 0.6955
## basic_vs_dream 0.7310
## basic_vs_ebseq 0.6519
## basic_vs_edger 0.7228
## basic_vs_limma 0.6886
## basic_vs_noiseq 0.8245
## deseq_vs_dream 0.7917
## deseq_vs_ebseq 0.7127
## deseq_vs_edger 0.9537
## deseq_vs_limma 0.7398
## deseq_vs_noiseq 0.7815
## dream_vs_ebseq 0.6921
## dream_vs_edger 0.8274
## dream_vs_limma 0.9332
## dream_vs_noiseq 0.6911
## ebseq_vs_edger 0.6798
## ebseq_vs_limma 0.5529
## ebseq_vs_noiseq 0.7747
## edger_vs_limma 0.7829
## edger_vs_noiseq 0.7899
## limma_vs_noiseq 0.5978
t_cf_clinical_v1_table_sva <- combine_de_tables(
t_cf_clinical_v1_de_sva, keepers = cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Visits/t_clinical_v1_cf_table_sva-v{ver}.xlsx"))## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_clinical_v1_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure 27 75 28 55
## limma_sigup limma_sigdown
## 1 3 3
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_clinical_v1_sig_sva <- extract_significant_genes(
t_cf_clinical_v1_table_sva,
excel = glue("{cf_prefix}/Visits/t_clinical_v1_cf_sig_sva-v{ver}.xlsx"))
t_cf_clinical_v1_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 3 3 28 55 27 75 0
## ebseq_down basic_up basic_down
## outcome 37 0 0
dim(t_cf_clinical_v1_sig_sva[["deseq"]][["ups"]][[1]])## [1] 27 84
dim(t_cf_clinical_v1_sig_sva[["deseq"]][["downs"]][[1]])## [1] 75 84
The visit 2 and visit 3 samples are interesting because they provide an opportunity to see if we can observe changes in response in the middle and end of treatment…
t_cf_clinical_v2_de_sva <- all_pairwise(tv2_samples, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## cure failure
## 20 15
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tv2_samples <- t_cf_clinical_v2_de_sva[["input"]]
t_cf_clinical_v2_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## falr_vs_cr
## basic_vs_deseq 0.7689
## basic_vs_dream 0.7173
## basic_vs_ebseq 0.7215
## basic_vs_edger 0.7701
## basic_vs_limma 0.7404
## basic_vs_noiseq 0.8528
## deseq_vs_dream 0.8053
## deseq_vs_ebseq 0.7893
## deseq_vs_edger 0.9986
## deseq_vs_limma 0.8138
## deseq_vs_noiseq 0.8412
## dream_vs_ebseq 0.6823
## dream_vs_edger 0.8077
## dream_vs_limma 0.9633
## dream_vs_noiseq 0.6034
## ebseq_vs_edger 0.7929
## ebseq_vs_limma 0.6902
## ebseq_vs_noiseq 0.8218
## edger_vs_limma 0.8162
## edger_vs_noiseq 0.8401
## limma_vs_noiseq 0.6291
t_cf_clinical_v2_table_sva <- combine_de_tables(
t_cf_clinical_v2_de_sva, keepers = cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Visits/t_clinical_v2_cf_table_sva-v{ver}.xlsx"))## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_clinical_v2_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure 51 15 50 11
## limma_sigup limma_sigdown
## 1 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_clinical_v2_sig_sva <- extract_significant_genes(
t_cf_clinical_v2_table_sva,
excel = glue("{cf_prefix}/Visits/t_clinical_v2_cf_sig_sva-v{ver}.xlsx"))
t_cf_clinical_v2_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 0 0 50 11 51 15 0
## ebseq_down basic_up basic_down
## outcome 0 0 0
dim(t_cf_clinical_v2_sig_sva[["deseq"]][["ups"]][[1]])## [1] 51 84
dim(t_cf_clinical_v2_sig_sva[["deseq"]][["downs"]][[1]])## [1] 15 84
Repeat for visit 3
t_cf_clinical_v3_de_sva <- all_pairwise(tv3_samples, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## cure failure
## 17 17
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tv3_samples <- t_cf_clinical_v3_de_sva[["input"]]
t_cf_clinical_v3_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## falr_vs_cr
## basic_vs_deseq 0.7969
## basic_vs_dream 0.8072
## basic_vs_ebseq 0.7585
## basic_vs_edger 0.8030
## basic_vs_limma 0.8193
## basic_vs_noiseq 0.8988
## deseq_vs_dream 0.8559
## deseq_vs_ebseq 0.8006
## deseq_vs_edger 0.9978
## deseq_vs_limma 0.8530
## deseq_vs_noiseq 0.8716
## dream_vs_ebseq 0.7661
## dream_vs_edger 0.8635
## dream_vs_limma 0.9817
## dream_vs_noiseq 0.7378
## ebseq_vs_edger 0.8040
## ebseq_vs_limma 0.7614
## ebseq_vs_noiseq 0.8465
## edger_vs_limma 0.8605
## edger_vs_noiseq 0.8769
## limma_vs_noiseq 0.7409
t_cf_clinical_v3_table_sva <- combine_de_tables(
t_cf_clinical_v3_de_sva, keepers = cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Visits/t_clinical_v3_cf_table_sva-v{ver}.xlsx"))## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_clinical_v3_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure 120 61 120 50
## limma_sigup limma_sigdown
## 1 3 1
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_clinical_v3_sig_sva <- extract_significant_genes(
t_cf_clinical_v3_table_sva,
excel = glue("{cf_prefix}/Visits/t_clinical_v3_cf_sig_sva-v{ver}.xlsx"))
t_cf_clinical_v3_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 3 1 120 50 120 61 0
## ebseq_down basic_up basic_down
## outcome 0 0 0
dim(t_cf_clinical_v3_sig_sva[["deseq"]][["ups"]][[1]])## [1] 120 84
dim(t_cf_clinical_v3_sig_sva[["deseq"]][["downs"]][[1]])## [1] 61 84
Now let us switch our view to each individual cell type collected. The hope here is that we will be able to learn some cell-specific differences in the response for people who did(not) respond well.
A primary hypothesis/assumption that we have held for quite a while with this data: the biopsy samples, given that they are comprised of hetergeneous tissue types as well as a mix of healthy and infected tissue; are unlikely to be very information rich vis a vis cure/fail. The following block seems to support that; we observe very few genes in the biopsies.
I therefore did not spend the time invoking other models.
t_cf_biopsy_de_sva <- all_pairwise(t_biopsies, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 9 5
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_biopsies <- t_cf_biopsy_de_sva[["input"]]
t_cf_biopsy_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8164
## basic_vs_dream 0.8519
## basic_vs_ebseq 0.8011
## basic_vs_edger 0.8809
## basic_vs_limma 0.8497
## basic_vs_noiseq 0.9162
## deseq_vs_dream 0.7992
## deseq_vs_ebseq 0.8628
## deseq_vs_edger 0.9516
## deseq_vs_limma 0.7927
## deseq_vs_noiseq 0.8685
## dream_vs_ebseq 0.7538
## dream_vs_edger 0.8689
## dream_vs_limma 0.9937
## dream_vs_noiseq 0.7760
## ebseq_vs_edger 0.8843
## ebseq_vs_limma 0.7354
## ebseq_vs_noiseq 0.8872
## edger_vs_limma 0.8628
## edger_vs_noiseq 0.9181
## limma_vs_noiseq 0.7668
t_cf_biopsy_table_sva <- combine_de_tables(
t_cf_biopsy_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Biopsies/t_biopsy_cf_table_sva-v{ver}.xlsx"))## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_biopsy_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 17 11 19
## edger_sigdown limma_sigup limma_sigdown
## 1 15 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_biopsy_sig_sva <- extract_significant_genes(
t_cf_biopsy_table_sva,
excel = glue("{cf_prefix}/Biopsies/t_cf_biopsy_sig_sva-v{ver}.xlsx"))
t_cf_biopsy_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 0 0 19 15 17 11 11
## ebseq_down basic_up basic_down
## outcome 57 0 0
dim(t_cf_biopsy_sig_sva[["deseq"]][["ups"]][[1]])## [1] 17 84
dim(t_cf_biopsy_sig_sva[["deseq"]][["downs"]][[1]])## [1] 11 84
Same question, but this time looking at monocytes. In addition, this comparison was done twice, once using SVA and once using visit as a batch factor.
I have been using this block to ensure that changed I have been making to the hpgltools do not change the analysis results. Thus the comment with a few logFC values; those are the first 6 observed DESeq2 logFC values in my last result before I made some changes to hpgltools in order to be able to work with random effect models.
t_cf_monocyte_de_sva <- all_pairwise(t_monocytes, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 21 21
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## The svs are added to the expressionset during all_pairwise.
t_monocytes <- t_cf_monocyte_de_sva[["input"]]
t_cf_monocyte_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8506
## basic_vs_dream 0.9183
## basic_vs_ebseq 0.8470
## basic_vs_edger 0.8560
## basic_vs_limma 0.9210
## basic_vs_noiseq 0.9525
## deseq_vs_dream 0.8713
## deseq_vs_ebseq 0.8556
## deseq_vs_edger 0.9989
## deseq_vs_limma 0.8614
## deseq_vs_noiseq 0.8955
## dream_vs_ebseq 0.7883
## dream_vs_edger 0.8755
## dream_vs_limma 0.9910
## dream_vs_noiseq 0.8827
## ebseq_vs_edger 0.8563
## ebseq_vs_limma 0.7794
## ebseq_vs_noiseq 0.8874
## edger_vs_limma 0.8663
## edger_vs_noiseq 0.9000
## limma_vs_noiseq 0.8720
t_cf_monocyte_table_sva <- combine_de_tables(
t_cf_monocyte_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Monocytes/t_monocyte_cf_table_sva-v{ver}.xlsx"))## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_monocyte_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 60 52 56
## edger_sigdown limma_sigup limma_sigdown
## 1 51 11 34
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
head(t_cf_monocyte_table_sva[["data"]][["outcome"]][["deseq_logfc"]])## [1] 0.33760 -0.07193 0.09665 -0.09082 -0.13500 0.23270
## The first few values in my pre-change result set are:
## 0.338, -0.072, 0.097, -0.091, -0.135, 0.233
t_cf_monocyte_sig_sva <- extract_significant_genes(
t_cf_monocyte_table_sva,
excel = glue("{cf_prefix}/Monocytes/t_monocyte_cf_sig_sva-v{ver}.xlsx"))
t_cf_monocyte_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 11 34 56 51 60 52 0
## ebseq_down basic_up basic_down
## outcome 23 168 197
dim(t_cf_monocyte_sig_sva[["deseq"]][["ups"]][[1]])## [1] 60 84
dim(t_cf_monocyte_sig_sva[["deseq"]][["downs"]][[1]])## [1] 52 84
t_cf_monocyte_de_batchvisit <- all_pairwise(t_monocytes, model_batch = TRUE,
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 21 21
##
## 3 2 1
## 13 13 16
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_monocyte_de_batchvisit## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: batch in model/limma.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8505
## basic_vs_dream 0.9414
## basic_vs_ebseq 0.8470
## basic_vs_edger 0.8540
## basic_vs_limma 0.9509
## basic_vs_noiseq 0.9525
## deseq_vs_dream 0.8178
## deseq_vs_ebseq 0.9932
## deseq_vs_edger 0.9998
## deseq_vs_limma 0.8120
## deseq_vs_noiseq 0.8857
## dream_vs_ebseq 0.8016
## dream_vs_edger 0.8202
## dream_vs_limma 0.9819
## dream_vs_noiseq 0.9085
## ebseq_vs_edger 0.9935
## ebseq_vs_limma 0.7952
## ebseq_vs_noiseq 0.8874
## edger_vs_limma 0.8150
## edger_vs_noiseq 0.8884
## limma_vs_noiseq 0.9004
t_cf_monocyte_table_batchvisit <- combine_de_tables(
t_cf_monocyte_de_batchvisit, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Monocytes/t_monocyte_cf_table_batchvisit-v{ver}.xlsx"))## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_monocyte_table_batchvisit## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 43 93 47
## edger_sigdown limma_sigup limma_sigdown
## 1 105 6 13
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_monocyte_sig_batchvisit <- extract_significant_genes(
t_cf_monocyte_table_batchvisit,
excel = glue("{cf_prefix}/Monocytes/t_monocyte_cf_sig_batchvisit-v{ver}.xlsx"))
t_cf_monocyte_sig_batchvisit## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 6 13 47 105 43 93 0
## ebseq_down basic_up basic_down
## outcome 23 168 197
dim(t_cf_monocyte_sig_batchvisit[["deseq"]][["ups"]][[1]])## [1] 43 84
dim(t_cf_monocyte_sig_batchvisit[["deseq"]][["downs"]][[1]])## [1] 93 84
Now focus in on the monocyte samples on a per-visit basis.
t_cf_monocyte_v1_de_sva <- all_pairwise(tv1_monocytes, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 8 8
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tv1_monocytes <- t_cf_monocyte_v1_de_sva[["input"]]
t_cf_monocyte_v1_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8866
## basic_vs_dream 0.9307
## basic_vs_ebseq 0.9060
## basic_vs_edger 0.8870
## basic_vs_limma 0.9440
## basic_vs_noiseq 0.9486
## deseq_vs_dream 0.8965
## deseq_vs_ebseq 0.8945
## deseq_vs_edger 0.9999
## deseq_vs_limma 0.8902
## deseq_vs_noiseq 0.9130
## dream_vs_ebseq 0.8365
## dream_vs_edger 0.8965
## dream_vs_limma 0.9832
## dream_vs_noiseq 0.8987
## ebseq_vs_edger 0.8950
## ebseq_vs_limma 0.8280
## ebseq_vs_noiseq 0.9484
## edger_vs_limma 0.8905
## edger_vs_noiseq 0.9134
## limma_vs_noiseq 0.8847
t_cf_monocyte_v1_table_sva <- combine_de_tables(
t_cf_monocyte_v1_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Monocytes/t_monocyte_v1_cf_table_sva-v{ver}.xlsx"))## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_monocyte_v1_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 14 52 15
## edger_sigdown limma_sigup limma_sigdown
## 1 57 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_monocyte_v1_sig_sva <- extract_significant_genes(
t_cf_monocyte_v1_table_sva,
excel = glue("{cf_prefix}/Monocytes/t_monocyte_v1_cf_sig_sva-v{ver}.xlsx"))
t_cf_monocyte_v1_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 0 0 15 57 14 52 0
## ebseq_down basic_up basic_down
## outcome 15 0 0
dim(t_cf_monocyte_v1_sig_sva[["deseq"]][["ups"]][[1]])## [1] 14 84
dim(t_cf_monocyte_v1_sig_sva[["deseq"]][["downs"]][[1]])## [1] 52 84
sva_aucc <- calculate_aucc(t_cf_monocyte_table_sva[["data"]][[1]],
tbl2 = t_cf_monocyte_table_batchvisit[["data"]][[1]],
py = "deseq_adjp", ly = "deseq_logfc")
sva_aucc## These two tables have an aucc value of: 0.694200173169544 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 182, df = 10860, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8633 0.8726
## sample estimates:
## cor
## 0.8681
shared_ids <- rownames(t_cf_monocyte_table_sva[["data"]][[1]]) %in%
rownames(t_cf_monocyte_table_batchvisit[["data"]][[1]])
first <- t_cf_monocyte_table_sva[["data"]][[1]][shared_ids, ]
second <- t_cf_monocyte_table_batchvisit[["data"]][[1]][rownames(first), ]
cor.test(first[["deseq_logfc"]], second[["deseq_logfc"]])##
## Pearson's product-moment correlation
##
## data: first[["deseq_logfc"]] and second[["deseq_logfc"]]
## t = 182, df = 10860, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8633 0.8726
## sample estimates:
## cor
## 0.8681
Switch context to the Neutrophils, once again repeat the analysis using SVA and visit as a batch factor.
t_cf_neutrophil_de_sva <- all_pairwise(t_neutrophils, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 20 21
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_neutrophil_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8754
## basic_vs_dream 0.9210
## basic_vs_ebseq 0.8587
## basic_vs_edger 0.8812
## basic_vs_limma 0.9321
## basic_vs_noiseq 0.9457
## deseq_vs_dream 0.8840
## deseq_vs_ebseq 0.9062
## deseq_vs_edger 0.9994
## deseq_vs_limma 0.8742
## deseq_vs_noiseq 0.9367
## dream_vs_ebseq 0.8365
## dream_vs_edger 0.8882
## dream_vs_limma 0.9861
## dream_vs_noiseq 0.8922
## ebseq_vs_edger 0.9068
## ebseq_vs_limma 0.8430
## ebseq_vs_noiseq 0.9212
## edger_vs_limma 0.8784
## edger_vs_noiseq 0.9404
## limma_vs_noiseq 0.8943
t_cf_neutrophil_table_sva <- combine_de_tables(
t_cf_neutrophil_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_cf_table_sva-v{ver}.xlsx"))## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_neutrophil_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 130 30 120
## edger_sigdown limma_sigup limma_sigdown
## 1 27 12 12
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_neutrophil_sig_sva <- extract_significant_genes(
t_cf_neutrophil_table_sva,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_cf_sig_sva-v{ver}.xlsx"))
t_cf_neutrophil_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 12 12 120 27 130 30 7
## ebseq_down basic_up basic_down
## outcome 7 7 3
dim(t_cf_neutrophil_sig_sva[["deseq"]][["ups"]][[1]])## [1] 130 84
dim(t_cf_neutrophil_sig_sva[["deseq"]][["downs"]][[1]])## [1] 30 84
t_cf_neutrophil_de_batchvisit <- all_pairwise(t_neutrophils, model_batch = TRUE,
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 20 21
##
## 3 2 1
## 12 13 16
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_neutrophil_de_batchvisit## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: batch in model/limma.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8644
## basic_vs_dream 0.9574
## basic_vs_ebseq 0.8587
## basic_vs_edger 0.8671
## basic_vs_limma 0.9658
## basic_vs_noiseq 0.9457
## deseq_vs_dream 0.8356
## deseq_vs_ebseq 0.9813
## deseq_vs_edger 0.9999
## deseq_vs_limma 0.8380
## deseq_vs_noiseq 0.9184
## dream_vs_ebseq 0.8264
## dream_vs_edger 0.8377
## dream_vs_limma 0.9840
## dream_vs_noiseq 0.9157
## ebseq_vs_edger 0.9818
## ebseq_vs_limma 0.8284
## ebseq_vs_noiseq 0.9212
## edger_vs_limma 0.8401
## edger_vs_noiseq 0.9204
## limma_vs_noiseq 0.9125
t_cf_neutrophil_table_batchvisit <- combine_de_tables(
t_cf_neutrophil_de_batchvisit, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_cf_table_batchvisit-v{ver}.xlsx"))## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_neutrophil_table_batchvisit## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 92 47 101
## edger_sigdown limma_sigup limma_sigdown
## 1 44 3 1
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_neutrophil_sig_batchvisit <- extract_significant_genes(
t_cf_neutrophil_table_batchvisit,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_cf_sig_batchvisit-v{ver}.xlsx"))
t_cf_neutrophil_sig_batchvisit## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 3 1 101 44 92 47 7
## ebseq_down basic_up basic_down
## outcome 7 7 3
dim(t_cf_neutrophil_sig_batchvisit[["deseq"]][["ups"]][[1]])## [1] 92 84
dim(t_cf_neutrophil_sig_batchvisit[["deseq"]][["downs"]][[1]])## [1] 47 84
When I did this with the monocytes, I split it up into multiple blocks for each visit. This time I am just going to run them all together.
visitcf_factor <- paste0("v", pData(t_neutrophils)[["visitnumber"]], "_",
pData(t_neutrophils)[["finaloutcome"]])
t_neutrophil_visitcf <- set_expt_conditions(t_neutrophils, fact=visitcf_factor)## The numbers of samples by condition are:
##
## v1_cure v1_failure v2_cure v2_failure v3_cure v3_failure
## 8 8 7 6 5 7
t_cf_neutrophil_visits_de_sva <- all_pairwise(t_neutrophil_visitcf, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## v1_cure v1_failure v2_cure v2_failure v3_cure v3_failure
## 8 8 7 6 5 7
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_neutrophil_visits_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_cf_neutrophil_visits_table_sva <- combine_de_tables(
t_cf_neutrophil_visits_de_sva, keepers = visitcf_contrasts, scale_p = TRUE,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_visitcf_table_sva-v{ver}.xlsx"))## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_neutrophil_visits_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 v1_failure_vs_v1_cure 12 6 6 6
## 2 v2_failure_vs_v2_cure 2 6 2 3
## 3 v3_failure_vs_v3_cure 2 2 0 2
## limma_sigup limma_sigdown
## 1 1 0
## 2 0 0
## 3 0 0
## Plot describing unique/shared genes in a differential expression table.
t_cf_neutrophil_visits_sig_sva <- extract_significant_genes(
t_cf_neutrophil_visits_table_sva,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_visitcf_sig_sva-v{ver}.xlsx"))
t_cf_neutrophil_visits_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## v1cf 1 0 6 6 12 6 0
## v2cf 0 0 2 3 2 6 1
## v3cf 0 0 0 2 2 2 2
## ebseq_down basic_up basic_down
## v1cf 2 0 0
## v2cf 1 0 0
## v3cf 3 0 0
dim(t_cf_neutrophil_visits_sig_sva[["deseq"]][["ups"]][[1]])## [1] 12 84
dim(t_cf_neutrophil_visits_sig_sva[["deseq"]][["downs"]][[1]])## [1] 6 84
Now V1
t_cf_neutrophil_v1_de_sva <- all_pairwise(tv1_neutrophils, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 8 8
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_neutrophil_v1_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8627
## basic_vs_dream 0.8835
## basic_vs_ebseq 0.8706
## basic_vs_edger 0.8792
## basic_vs_limma 0.8909
## basic_vs_noiseq 0.9312
## deseq_vs_dream 0.8208
## deseq_vs_ebseq 0.9418
## deseq_vs_edger 0.9946
## deseq_vs_limma 0.8180
## deseq_vs_noiseq 0.9183
## dream_vs_ebseq 0.7912
## dream_vs_edger 0.8365
## dream_vs_limma 0.9761
## dream_vs_noiseq 0.8407
## ebseq_vs_edger 0.9421
## ebseq_vs_limma 0.7986
## ebseq_vs_noiseq 0.9456
## edger_vs_limma 0.8319
## edger_vs_noiseq 0.9251
## limma_vs_noiseq 0.8360
t_cf_neutrophil_v1_table_sva <- combine_de_tables(
t_cf_neutrophil_v1_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_v1_cf_table_sva-v{ver}.xlsx"))## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_neutrophil_v1_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 5 8 5
## edger_sigdown limma_sigup limma_sigdown
## 1 11 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_neutrophil_v1_sig_sva <- extract_significant_genes(
t_cf_neutrophil_v1_table_sva,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_v1_cf_sig_sva-v{ver}.xlsx"))
t_cf_neutrophil_v1_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 0 0 5 11 5 8 0
## ebseq_down basic_up basic_down
## outcome 2 0 0
dim(t_cf_neutrophil_v1_sig_sva[["deseq"]][["ups"]][[1]])## [1] 5 84
dim(t_cf_neutrophil_v1_sig_sva[["deseq"]][["downs"]][[1]])## [1] 8 84
Followed by visit 2.
t_cf_neutrophil_v2_de_sva <- all_pairwise(tv2_neutrophils, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 7 6
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_neutrophil_v2_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.9021
## basic_vs_dream 0.9514
## basic_vs_ebseq 0.8977
## basic_vs_edger 0.9026
## basic_vs_limma 0.9485
## basic_vs_noiseq 0.9224
## deseq_vs_dream 0.8964
## deseq_vs_ebseq 0.9777
## deseq_vs_edger 0.9986
## deseq_vs_limma 0.8893
## deseq_vs_noiseq 0.9631
## dream_vs_ebseq 0.8740
## dream_vs_edger 0.8948
## dream_vs_limma 0.9938
## dream_vs_noiseq 0.8987
## ebseq_vs_edger 0.9754
## ebseq_vs_limma 0.8654
## ebseq_vs_noiseq 0.9756
## edger_vs_limma 0.8880
## edger_vs_noiseq 0.9613
## limma_vs_noiseq 0.8903
t_cf_neutrophil_v2_table_sva <- combine_de_tables(
t_cf_neutrophil_v2_de_sva, scale_p = TRUE, keepers = t_cf_contrast,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_v2_cf_table_sva-v{ver}.xlsx"))## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_neutrophil_v2_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 9 3 20
## edger_sigdown limma_sigup limma_sigdown
## 1 6 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_neutrophil_v2_sig_sva <- extract_significant_genes(
t_cf_neutrophil_v2_table_sva,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_v2_cf_sig_sva-v{ver}.xlsx"))
t_cf_neutrophil_v2_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 0 0 20 6 9 3 1
## ebseq_down basic_up basic_down
## outcome 1 0 0
dim(t_cf_neutrophil_v2_sig_sva[["deseq"]][["ups"]][[1]])## [1] 9 84
dim(t_cf_neutrophil_v2_sig_sva[["deseq"]][["downs"]][[1]])## [1] 3 84
and visit 3.
t_cf_neutrophil_v3_de_sva <- all_pairwise(tv3_neutrophils, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 5 7
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_neutrophil_v3_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.7528
## basic_vs_dream 0.7868
## basic_vs_ebseq 0.8738
## basic_vs_edger 0.7514
## basic_vs_limma 0.7952
## basic_vs_noiseq 0.9212
## deseq_vs_dream 0.8919
## deseq_vs_ebseq 0.7550
## deseq_vs_edger 0.9993
## deseq_vs_limma 0.8849
## deseq_vs_noiseq 0.8275
## dream_vs_ebseq 0.7659
## dream_vs_edger 0.8932
## dream_vs_limma 0.9848
## dream_vs_noiseq 0.7798
## ebseq_vs_edger 0.7594
## ebseq_vs_limma 0.7499
## ebseq_vs_noiseq 0.9516
## edger_vs_limma 0.8859
## edger_vs_noiseq 0.8291
## limma_vs_noiseq 0.7658
t_cf_neutrophil_v3_table_sva <- combine_de_tables(
t_cf_neutrophil_v3_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_v3_cf_table_sva-v{ver}.xlsx"))## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_neutrophil_v3_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 5 1 5
## edger_sigdown limma_sigup limma_sigdown
## 1 1 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_neutrophil_v3_sig_sva <- extract_significant_genes(
t_cf_neutrophil_v3_table_sva,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_v3_cf_sig_sva-v{ver}.xlsx"))
t_cf_neutrophil_v3_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 0 0 5 1 5 1 2
## ebseq_down basic_up basic_down
## outcome 3 0 0
dim(t_cf_neutrophil_v3_sig_sva[["deseq"]][["ups"]][[1]])## [1] 5 84
dim(t_cf_neutrophil_v3_sig_sva[["deseq"]][["downs"]][[1]])## [1] 1 84
sva_aucc <- calculate_aucc(t_cf_neutrophil_table_sva[["data"]][[1]],
tbl2 = t_cf_neutrophil_table_batchvisit[["data"]][[1]],
py = "deseq_adjp", ly = "deseq_logfc")
sva_aucc## These two tables have an aucc value of: 0.673209505652166 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 209, df = 9099, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9060 0.9131
## sample estimates:
## cor
## 0.9096
shared_ids <- rownames(t_cf_neutrophil_table_sva[["data"]][[1]]) %in%
rownames(t_cf_neutrophil_table_batchvisit[["data"]][[1]])
first <- t_cf_neutrophil_table_sva[["data"]][[1]][shared_ids, ]
second <- t_cf_neutrophil_table_batchvisit[["data"]][[1]][rownames(first), ]
cor.test(first[["deseq_logfc"]], second[["deseq_logfc"]])##
## Pearson's product-moment correlation
##
## data: first[["deseq_logfc"]] and second[["deseq_logfc"]]
## t = 209, df = 9099, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9060 0.9131
## sample estimates:
## cor
## 0.9096
This time, with feeling! Repeating the same set of tasks with the eosinophil samples.
t_cf_eosinophil_de_sva <- all_pairwise(t_eosinophils, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 17 9
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_eosinophil_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8488
## basic_vs_dream 0.8573
## basic_vs_ebseq 0.8636
## basic_vs_edger 0.8546
## basic_vs_limma 0.8756
## basic_vs_noiseq 0.9094
## deseq_vs_dream 0.9218
## deseq_vs_ebseq 0.8058
## deseq_vs_edger 0.9973
## deseq_vs_limma 0.9099
## deseq_vs_noiseq 0.8693
## dream_vs_ebseq 0.7957
## dream_vs_edger 0.9290
## dream_vs_limma 0.9842
## dream_vs_noiseq 0.8409
## ebseq_vs_edger 0.8134
## ebseq_vs_limma 0.8005
## ebseq_vs_noiseq 0.8986
## edger_vs_limma 0.9174
## edger_vs_noiseq 0.8773
## limma_vs_noiseq 0.8128
t_cf_eosinophil_table_sva <- combine_de_tables(
t_cf_eosinophil_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_cf_table_sva-v{ver}.xlsx"))## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_eosinophil_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 116 75 112
## edger_sigdown limma_sigup limma_sigdown
## 1 63 57 34
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_eosinophil_sig_sva <- extract_significant_genes(
t_cf_eosinophil_table_sva,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_cf_sig_sva-v{ver}.xlsx"))
t_cf_eosinophil_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 57 34 112 63 116 75 7
## ebseq_down basic_up basic_down
## outcome 33 0 0
dim(t_cf_eosinophil_sig_sva[["deseq"]][["ups"]][[1]])## [1] 116 84
dim(t_cf_eosinophil_sig_sva[["deseq"]][["downs"]][[1]])## [1] 75 84
knitr::kable(head(t_cf_eosinophil_sig_sva[["deseq"]][["ups"]][[1]]))| ensembl_gene_id | ensembl_transcript_id | version | transcript_version | description | gene_biotype | cds_length | chromosome_name | strand | start_position | end_position | hgnc_symbol | uniprot_gn_symbol | transcript | mean_cds_len | basic_logfc | basic_adjp | deseq_logfc | deseq_adjp | dream_logfc | dream_adjp | ebseq_logfc | ebseq_adjp | edger_logfc | edger_adjp | limma_logfc | limma_adjp | noiseq_logfc | noiseq_adjp | basic_num | basic_den | basic_numvar | basic_denvar | basic_t | basic_p | deseq_basemean | deseq_lfcse | deseq_stat | deseq_p | deseq_num | deseq_den | dream_ave | dream_t | dream_p | dream_b | ebseq_fc | ebseq_c1mean | ebseq_c2mean | ebseq_mean | ebseq_postfc | ebseq_ppee | ebseq_ppde | edger_logcpm | edger_lr | edger_p | limma_ave | limma_t | limma_p | limma_b | noiseq_num | noiseq_den | noiseq_mean | noiseq_theta | noiseq_prob | noiseq_p | limma_adjp_ihw | limma_p_zstd | dream_adjp_ihw | dream_p_zstd | deseq_adjp_ihw | deseq_p_zstd | edger_adjp_ihw | edger_p_zstd | ebseq_adjp_ihw | ebseq_p_zstd | basic_adjp_ihw | basic_p_zstd | noiseq_adjp_ihw | noiseq_p_zstd | lfc_meta | lfc_var | lfc_varbymed | p_meta | p_var | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ENSG00000198178 | ENSG00000198178 | ENST00000537530 | 10 | 1 | C-type lectin domain family 4 member C [Source:HGNC Symbol;Acc:HGNC:13258] | protein_coding | 267 | 12 | - | 7729415 | 7751605 | CLEC4C | CLEC4C | ENSG00000198178.1 | 510.5 | 3.920 | 0.1398 | 5.537 | 2e-03 | 4.768 | 0.0117 | 2.072 | 0.8248 | 5.152 | 0.1934 | 4.225 | 0.0191 | 2.247 | 1 | 2.0400 | -2.764 | 8.784 | 8.946 | 0.0012 | 4.804 | 193.80 | 1.3030 | 4.249 | 0e+00 | 9.617 | 4.0806 | -1.5610 | 5.030 | 0.0000 | 1.3490 | 4.205 | 89.396 | 375.93 | 188.58 | 4.095 | 0.8248 | 0.1752 | 2.2170 | 5.953 | 0.0147 | -1.4190 | 4.4420 | 0.0002 | 0.1256 | 366.62 | 77.257 | 221.94 | 1.036 | 0.9139 | 0.0861 | 0.0158 | -1.2450 | 0.0123 | -1.3020 | 0.0017 | -1.2450 | 0.1912 | -1.2450 | 0.1527 | 0.8604 | 1 | 13.140 | 0.9751 | -1.451 | 4.982 | 4.880e-03 | 9.796e-04 | 4.956e-03 | 7.107e-05 |
| ENSG00000187569 | ENSG00000187569 | ENST00000345088 | 3 | 3 | developmental pluripotency associated 3 [Source:HGNC Symbol;Acc:HGNC:19199] | protein_coding | 480 | 12 | + | 7711433 | 7717559 | DPPA3 | DPPA3 | ENSG00000187569.3 | 480 | 3.662 | 0.1989 | 5.448 | 7e-03 | 3.969 | 0.0408 | 4.587 | 0.0605 | 4.721 | 0.0547 | 3.504 | 0.0470 | 3.622 | 1 | -0.6872 | -4.301 | 7.263 | 2.839 | 0.0035 | 3.614 | 23.10 | 1.4300 | 3.810 | 1e-04 | 5.866 | 0.4183 | -3.6360 | 3.937 | 0.0006 | -1.1340 | 24.038 | 2.470 | 59.59 | 22.24 | 21.893 | 0.0605 | 0.9395 | -0.5913 | 9.779 | 0.0018 | -3.3720 | 3.6990 | 0.0011 | -1.6550 | 56.07 | 4.555 | 30.31 | 2.207 | 0.9935 | 0.0065 | 0.0339 | -1.2420 | 0.0334 | -1.3000 | 0.0049 | -1.2420 | 0.0550 | -1.2420 | 0.7668 | 5.8330 | 1 | 9.882 | 0.9751 | -1.704 | 4.443 | 4.098e-01 | 9.225e-02 | 9.866e-04 | 6.647e-07 |
| ENSG00000136235 | ENSG00000136235 | ENST00000479625 | 16 | 1 | glycoprotein nmb [Source:HGNC Symbol;Acc:HGNC:4462] | protein_coding | undefined | 7 | + | 23235967 | 23275108 | GPNMB | GPNMB | ENSG00000136235.1 | 1447.5 | 2.102 | 0.4987 | 5.410 | 4e-04 | 4.617 | 0.0900 | 5.629 | 0.8546 | 5.374 | 0.0001 | 3.867 | 0.1754 | 4.475 | 1 | -1.1190 | -3.695 | 12.101 | 2.665 | 0.0621 | 2.576 | 53.03 | 1.1380 | 4.752 | 0e+00 | 6.906 | 1.4965 | -3.2100 | 3.221 | 0.0035 | -2.1580 | 49.486 | 2.881 | 143.07 | 51.41 | 39.921 | 0.8546 | 0.1454 | 0.4580 | 25.540 | 0.0000 | -3.2370 | 2.5210 | 0.0184 | -3.2570 | 125.19 | 5.631 | 65.41 | 2.264 | 0.9978 | 0.0022 | 0.1386 | -1.1860 | 0.0918 | -1.2910 | 0.0002 | -1.1860 | 0.0001 | -1.1860 | 0.1253 | 0.6664 | 1 | 7.044 | 0.6634 | -1.718 | 4.798 | 5.245e-02 | 1.093e-02 | 6.124e-03 | 1.125e-04 |
| ENSG00000089012 | ENSG00000089012 | ENST00000497407 | 14 | 2 | signal regulatory protein gamma [Source:HGNC Symbol;Acc:HGNC:15757] | protein_coding | undefined | 20 | - | 1629152 | 1657779 | SIRPG | SIRPG | ENSG00000089012.2 | 880.8 | 1.974 | 0.5427 | 4.040 | 0e+00 | 1.758 | 0.6538 | 5.912 | 0.7384 | 4.018 | 0.0000 | 1.598 | 0.6625 | 5.479 | 0 | 0.8317 | -1.681 | 13.876 | 1.355 | 0.0805 | 2.513 | 272.50 | 0.7310 | 5.526 | 0e+00 | 7.574 | 3.5336 | -1.1950 | 1.093 | 0.2846 | -4.9740 | 60.217 | 12.007 | 723.63 | 258.34 | 50.266 | 0.7384 | 0.2616 | 2.7060 | 32.750 | 0.0000 | -1.1480 | 0.9807 | 0.3360 | -5.0020 | 771.37 | 17.288 | 394.33 | 3.129 | 1.0000 | 0.0000 | 0.5902 | -0.1572 | 0.5723 | -0.3745 | 0.0000 | -0.1572 | 0.0000 | -0.1572 | 0.2245 | 1.4220 | 1 | 6.872 | 0.0000 | -1.725 | 3.148 | 1.579e+00 | 5.016e-01 | 1.120e-01 | 3.763e-02 |
| ENSG00000089127 | ENSG00000089127 | ENST00000540589 | 13 | 2 | 2’-5’-oligoadenylate synthetase 1 [Source:HGNC Symbol;Acc:HGNC:8086] | protein_coding | 68 | 12 | + | 112906783 | 112933222 | OAS1 | OAS1 | ENSG00000089127.2 | 682.8 | 3.284 | 0.2669 | 3.933 | 0e+00 | 3.518 | 0.0562 | 4.691 | 0.0845 | 3.943 | 0.0000 | 3.339 | 0.0596 | 4.036 | 1 | 1.9510 | -1.301 | 8.237 | 1.116 | 0.0092 | 3.252 | 184.60 | 0.5478 | 7.180 | 0e+00 | 7.841 | 3.9081 | -0.5641 | 3.632 | 0.0012 | -0.9652 | 25.834 | 18.535 | 479.09 | 177.96 | 23.947 | 0.0845 | 0.9155 | 2.1560 | 44.580 | 0.0000 | -0.4596 | 3.4950 | 0.0018 | -1.3000 | 410.17 | 25.003 | 217.58 | 2.621 | 0.9894 | 0.0106 | 0.0493 | -1.2400 | 0.0529 | -1.2980 | 0.0000 | -1.2400 | 0.0000 | -1.2400 | 0.7325 | 5.6770 | 1 | 8.893 | 0.9751 | -1.691 | 3.722 | 1.794e-02 | 4.820e-03 | 5.900e-04 | 1.044e-06 |
| ENSG00000137959 | ENSG00000137959 | ENST00000450498 | 16 | 1 | interferon induced protein 44 like [Source:HGNC Symbol;Acc:HGNC:17817] | protein_coding | 699 | 1 | + | 78619922 | 78646145 | IFI44L | IFI44L | ENSG00000137959.1 | 783.333333333333 | 3.909 | 0.1645 | 3.828 | 0e+00 | 3.369 | 0.0199 | 4.022 | 0.7568 | 3.831 | 0.0000 | 3.443 | 0.0123 | 4.334 | 0 | 5.5560 | 1.793 | 6.584 | 3.318 | 0.0020 | 3.763 | 1932.00 | 0.5401 | 7.087 | 0e+00 | 11.304 | 7.4755 | 2.8960 | 4.483 | 0.0001 | 1.0980 | 16.246 | 295.965 | 4808.31 | 1857.93 | 14.896 | 0.7568 | 0.2432 | 5.4900 | 57.400 | 0.0000 | 3.0090 | 4.7380 | 0.0001 | 1.6850 | 5616.14 | 278.525 | 2947.33 | 3.056 | 1.0000 | 0.0000 | 0.0135 | -1.2450 | 0.0216 | -1.3020 | 0.0000 | -1.2450 | 0.0000 | -1.2450 | 0.2506 | 1.3030 | 1 | 10.290 | 0.0000 | -1.725 | 3.691 | 2.660e-03 | 7.207e-04 | 2.392e-05 | 1.716e-09 |
knitr::kable(head(t_cf_eosinophil_sig_sva[["deseq"]][["downs"]][[1]]))| ensembl_gene_id | ensembl_transcript_id | version | transcript_version | description | gene_biotype | cds_length | chromosome_name | strand | start_position | end_position | hgnc_symbol | uniprot_gn_symbol | transcript | mean_cds_len | basic_logfc | basic_adjp | deseq_logfc | deseq_adjp | dream_logfc | dream_adjp | ebseq_logfc | ebseq_adjp | edger_logfc | edger_adjp | limma_logfc | limma_adjp | noiseq_logfc | noiseq_adjp | basic_num | basic_den | basic_numvar | basic_denvar | basic_t | basic_p | deseq_basemean | deseq_lfcse | deseq_stat | deseq_p | deseq_num | deseq_den | dream_ave | dream_t | dream_p | dream_b | ebseq_fc | ebseq_c1mean | ebseq_c2mean | ebseq_mean | ebseq_postfc | ebseq_ppee | ebseq_ppde | edger_logcpm | edger_lr | edger_p | limma_ave | limma_t | limma_p | limma_b | noiseq_num | noiseq_den | noiseq_mean | noiseq_theta | noiseq_prob | noiseq_p | limma_adjp_ihw | limma_p_zstd | dream_adjp_ihw | dream_p_zstd | deseq_adjp_ihw | deseq_p_zstd | edger_adjp_ihw | edger_p_zstd | ebseq_adjp_ihw | ebseq_p_zstd | basic_adjp_ihw | basic_p_zstd | noiseq_adjp_ihw | noiseq_p_zstd | lfc_meta | lfc_var | lfc_varbymed | p_meta | p_var | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ENSG00000179344 | ENSG00000179344 | ENST00000399084 | 16 | 5 | major histocompatibility complex, class II, DQ beta 1 [Source:HGNC Symbol;Acc:HGNC:4944] | protein_coding | 786 | 6 | - | 32659467 | 32668383 | HLA-DQB1 | HLA-DQB1 | ENSG00000179344.5 | 645.5 | -5.227 | 0.0515 | -5.676 | 0.0000 | -7.319 | 0.0138 | -4.507 | 0.0008 | -5.668 | 0.0000 | -7.612 | 0.0155 | -5.899 | 0 | 0.0597 | 5.6120 | 5.9251 | 7.993 | 0.0001 | -5.552 | 4151.00 | 0.8485 | -6.689 | 0.0000 | 6.782 | 12.458 | 3.7070 | -4.809 | 0.0001 | 1.843 | 0.0440 | 6266.46 | 275.6710 | 4192.72 | 0.0418 | 0.0008 | 0.9992 | 6.575 | 36.280 | 0.0000 | 3.5810 | -4.604 | 0.0001 | 1.291 | 138.566 | 8270.46 | 4204.51 | -3.688 | 1.0000 | 0.0000 | 0.0228 | -1.245 | 0.0137 | -1.3020 | 0.0000 | -1.245 | 0.0000 | -1.245 | 0.9901 | 6.2210 | -20730 | -15.180 | 0.0000 | -1.725 | -5.952 | 1.301e+00 | -2.186e-01 | 3.393e-05 | 3.454e-09 |
| ENSG00000112139 | ENSG00000112139 | ENST00000515437 | 16 | 5 | MAM domain containing glycosylphosphatidylinositol anchor 1 [Source:HGNC Symbol;Acc:HGNC:19267] | protein_coding | 388 | 6 | - | 37630679 | 37699306 | MDGA1 | MDGA1 | ENSG00000112139.5 | 1438.71428571429 | -2.249 | 0.4206 | -5.037 | 0.0015 | -2.682 | 0.3355 | -2.084 | 0.0001 | -4.942 | 0.0398 | -2.844 | 0.2584 | -2.532 | 1 | -3.3850 | -0.1629 | 10.6461 | 14.783 | 0.0366 | -3.222 | 149.80 | 1.1640 | -4.327 | 0.0000 | 2.708 | 7.744 | -1.4100 | -1.996 | 0.0566 | -3.877 | 0.2358 | 205.20 | 48.3754 | 150.91 | 0.2301 | 0.0001 | 0.9999 | 1.813 | 10.640 | 0.0011 | -1.5710 | -2.141 | 0.0421 | -3.687 | 24.552 | 142.01 | 83.28 | -1.170 | 0.9195 | 0.0805 | 0.1984 | -1.109 | 0.3083 | -1.1180 | 0.0011 | -1.109 | 0.0277 | -1.109 | 1.0000 | 6.2260 | -18990 | -8.808 | 0.9751 | -1.468 | -3.895 | 9.304e-01 | -2.389e-01 | 1.440e-02 | 5.749e-04 |
| ENSG00000203972 | ENSG00000203972 | ENST00000545705 | 10 | 1 | glycine-N-acyltransferase like 3 [Source:HGNC Symbol;Acc:HGNC:21349] | protein_coding | 468 | 6 | + | 49499923 | 49528078 | GLYATL3 | GLYATL3 | ENSG00000203972.1 | 667.5 | -3.493 | 0.1675 | -4.718 | 0.0498 | -2.629 | 0.3287 | -6.257 | 0.7136 | -4.599 | 0.0363 | -2.962 | 0.1916 | -3.352 | 1 | -5.7280 | -3.3770 | 0.5218 | 6.712 | 0.0023 | -2.351 | 27.99 | 1.5560 | -3.032 | 0.0024 | -1.881 | 2.837 | -4.7950 | -2.016 | 0.0544 | -3.941 | 0.0131 | 42.18 | 0.5417 | 27.77 | 0.0138 | 0.7136 | 0.2864 | -0.443 | 10.870 | 0.0010 | -4.5450 | -2.444 | 0.0218 | -3.577 | 2.858 | 29.19 | 16.02 | -1.461 | 0.9575 | 0.0425 | 0.1465 | -1.175 | 0.3346 | -1.1250 | 0.0361 | -1.175 | 0.0247 | -1.175 | 0.2995 | 1.5840 | -9924 | -6.426 | 0.9751 | -1.589 | -3.901 | 8.898e-02 | -2.281e-02 | 8.413e-03 | 1.355e-04 |
| ENSG00000196526 | ENSG00000196526 | ENST00000358461 | 10 | 6 | actin filament associated protein 1 [Source:HGNC Symbol;Acc:HGNC:24017] | protein_coding | 2193 | 4 | - | 7758714 | 7939926 | AFAP1 | AFAP1 | ENSG00000196526.6 | 1911 | -2.168 | 0.4335 | -3.294 | 0.0252 | -2.375 | 0.3793 | -3.967 | 0.0000 | -3.293 | 0.0574 | -2.538 | 0.2974 | -3.791 | 1 | 0.5888 | 2.6700 | 2.1638 | 11.578 | 0.0405 | -2.081 | 982.20 | 0.9856 | -3.342 | 0.0008 | 6.891 | 10.185 | 1.7240 | -1.864 | 0.0739 | -4.410 | 0.0640 | 1485.49 | 95.0002 | 1004.17 | 0.0623 | 0.0000 | 1.0000 | 4.492 | 9.604 | 0.0019 | 1.8040 | -1.999 | 0.0565 | -4.138 | 128.406 | 1777.17 | 952.79 | -2.550 | 0.9896 | 0.0104 | 0.3036 | -1.062 | 0.3570 | -1.0610 | 0.0280 | -1.062 | 0.0576 | -1.062 | 1.0000 | 6.2270 | -3031 | -5.688 | 0.9751 | -1.692 | -2.956 | 1.507e-01 | -5.097e-02 | 1.976e-02 | 1.013e-03 |
| ENSG00000175592 | ENSG00000175592 | ENST00000312562 | 9 | 7 | FOS like 1, AP-1 transcription factor subunit [Source:HGNC Symbol;Acc:HGNC:13718] | protein_coding | 816 | 11 | - | 65892049 | 65900573 | FOSL1 | FOSL1 | ENSG00000175592.7 | 496 | -2.045 | 0.4818 | -3.097 | 0.0000 | -2.039 | 0.1876 | -2.017 | 0.9748 | -3.081 | 0.0000 | -2.221 | 0.1738 | -1.557 | 1 | 0.1522 | 1.8020 | 3.6253 | 4.212 | 0.0561 | -1.650 | 267.80 | 0.5692 | -5.441 | 0.0000 | 5.087 | 8.184 | 1.1760 | -2.584 | 0.0158 | -3.011 | 0.2471 | 363.07 | 89.7162 | 268.44 | 0.2384 | 0.9748 | 0.0252 | 2.640 | 30.640 | 0.0000 | 1.0720 | -2.528 | 0.0181 | -3.082 | 93.436 | 274.89 | 184.16 | -1.214 | 0.9249 | 0.0751 | 0.1335 | -1.187 | 0.1854 | -1.2500 | 0.0000 | -1.187 | 0.0000 | -1.187 | 0.0310 | -0.1156 | -2206 | -4.510 | 0.9751 | -1.485 | -2.752 | 2.007e-01 | -7.295e-02 | 6.027e-03 | 1.090e-04 |
| ENSG00000122877 | ENSG00000122877 | ENST00000637191 | 16 | 1 | early growth response 2 [Source:HGNC Symbol;Acc:HGNC:3239] | protein_coding | 418 | 10 | - | 62811996 | 62919900 | EGR2 | EGR2 | ENSG00000122877.1 | 1140.25 | -1.878 | 0.5187 | -2.789 | 0.0117 | -1.437 | 0.4815 | -2.589 | 0.8231 | -2.779 | 0.0110 | -2.011 | 0.2749 | -1.495 | 1 | -1.2330 | -0.0550 | 0.8328 | 5.118 | 0.0731 | -1.178 | 96.53 | 0.7679 | -3.632 | 0.0003 | 3.932 | 6.721 | -0.6343 | -1.561 | 0.1308 | -4.503 | 0.1663 | 136.07 | 22.6139 | 96.80 | 0.1592 | 0.8231 | 0.1769 | 1.188 | 14.120 | 0.0002 | -0.7511 | -2.078 | 0.0480 | -3.767 | 29.831 | 84.08 | 56.96 | -1.112 | 0.9087 | 0.0913 | 0.2254 | -1.090 | 0.4497 | -0.8758 | 0.0088 | -1.090 | 0.0073 | -1.090 | 0.1454 | 0.8716 | -1025 | -3.219 | 0.9751 | -1.434 | -2.514 | 2.515e-01 | -1.000e-01 | 1.616e-02 | 7.617e-04 |
Repeat with batch in the model.
t_cf_eosinophil_de_batchvisit <- all_pairwise(t_eosinophils, model_batch = TRUE,
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 17 9
##
## 3 2 1
## 9 9 8
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_eosinophil_de_batchvisit## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: batch in model/limma.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8961
## basic_vs_dream 0.9176
## basic_vs_ebseq 0.8636
## basic_vs_edger 0.8977
## basic_vs_limma 0.9676
## basic_vs_noiseq 0.9094
## deseq_vs_dream 0.8493
## deseq_vs_ebseq 0.9519
## deseq_vs_edger 0.9998
## deseq_vs_limma 0.8678
## deseq_vs_noiseq 0.9024
## dream_vs_ebseq 0.8133
## dream_vs_edger 0.8517
## dream_vs_limma 0.9469
## dream_vs_noiseq 0.9304
## ebseq_vs_edger 0.9559
## ebseq_vs_limma 0.8328
## ebseq_vs_noiseq 0.8986
## edger_vs_limma 0.8696
## edger_vs_noiseq 0.9056
## limma_vs_noiseq 0.8816
t_cf_eosinophil_table_batchvisit <- combine_de_tables(
t_cf_eosinophil_de_batchvisit, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_cf_table_batchvisit-v{ver}.xlsx"))## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_eosinophil_table_batchvisit## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 99 35 103
## edger_sigdown limma_sigup limma_sigdown
## 1 24 35 15
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_eosinophil_sig_batchvisit <- extract_significant_genes(
t_cf_eosinophil_table_batchvisit,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_cf_sig_batchvisit-v{ver}.xlsx"))
t_cf_eosinophil_sig_batchvisit## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 35 15 103 24 99 35 7
## ebseq_down basic_up basic_down
## outcome 33 0 0
dim(t_cf_eosinophil_sig_batchvisit[["deseq"]][["ups"]][[1]])## [1] 99 84
dim(t_cf_eosinophil_sig_batchvisit[["deseq"]][["downs"]][[1]])## [1] 35 84
knitr::kable(head(t_cf_eosinophil_sig_batchvisit[["deseq"]][["ups"]][[1]]))| ensembl_gene_id | ensembl_transcript_id | version | transcript_version | description | gene_biotype | cds_length | chromosome_name | strand | start_position | end_position | hgnc_symbol | uniprot_gn_symbol | transcript | mean_cds_len | basic_logfc | basic_adjp | deseq_logfc | deseq_adjp | dream_logfc | dream_adjp | ebseq_logfc | ebseq_adjp | edger_logfc | edger_adjp | limma_logfc | limma_adjp | noiseq_logfc | noiseq_adjp | basic_num | basic_den | basic_numvar | basic_denvar | basic_t | basic_p | deseq_basemean | deseq_lfcse | deseq_stat | deseq_p | deseq_num | deseq_den | dream_ave | dream_t | dream_p | dream_b | ebseq_fc | ebseq_c1mean | ebseq_c2mean | ebseq_mean | ebseq_postfc | ebseq_ppee | ebseq_ppde | edger_logcpm | edger_lr | edger_p | limma_ave | limma_t | limma_p | limma_b | noiseq_num | noiseq_den | noiseq_mean | noiseq_theta | noiseq_prob | noiseq_p | limma_adjp_ihw | limma_p_zstd | dream_adjp_ihw | dream_p_zstd | deseq_adjp_ihw | deseq_p_zstd | edger_adjp_ihw | edger_p_zstd | ebseq_adjp_ihw | ebseq_p_zstd | basic_adjp_ihw | basic_p_zstd | noiseq_adjp_ihw | noiseq_p_zstd | lfc_meta | lfc_var | lfc_varbymed | p_meta | p_var | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ENSG00000165949 | ENSG00000165949 | ENST00000611954 | 12 | 4 | interferon alpha inducible protein 27 [Source:HGNC Symbol;Acc:HGNC:5397] | protein_coding | 180 | 14 | + | 94104836 | 94116698 | IFI27 | IFI27 | ENSG00000165949.4 | 287.454545454545 | 3.323 | 0.2504 | 5.668 | 0e+00 | 4.793 | 0.0485 | 5.470 | 0.8279 | 5.624 | 0e+00 | 4.397 | 0.0591 | 3.963 | 1.000 | -0.1291 | -3.3160 | 7.379 | 1.696 | 0.0077 | 3.186 | 54.63 | 0.8188 | 6.923 | 0 | 7.505 | 1.8370 | -2.7330 | 4.106 | 0.0003 | -0.8668 | 44.34 | 3.243 | 144.20 | 52.03 | 35.61 | 0.8279 | 0.1721 | 0.4882 | 47.95 | 0 | -2.6550 | 3.858 | 0.0007 | -1.1070 | 118.68 | 7.612 | 63.15 | 1.930 | 0.9823 | 0.0177 | 0.0453 | -1.3740 | 0.0350 | -1.422 | 0e+00 | -1.3740 | 0e+00 | -1.3740 | 0.1420 | 0.8403 | 1 | 8.712 | 0.9751 | -1.668 | 5.230 | 0.000e+00 | 0.000e+00 | 2.264e-04 | 1.537e-07 |
| ENSG00000187569 | ENSG00000187569 | ENST00000345088 | 3 | 3 | developmental pluripotency associated 3 [Source:HGNC Symbol;Acc:HGNC:19199] | protein_coding | 480 | 12 | + | 7711433 | 7717559 | DPPA3 | DPPA3 | ENSG00000187569.3 | 480 | 3.662 | 0.1989 | 5.537 | 4e-04 | 4.906 | 0.0485 | 4.587 | 0.0605 | 5.386 | 1e-04 | 4.404 | 0.0351 | 3.622 | 1.000 | -0.6872 | -4.3010 | 7.263 | 2.839 | 0.0035 | 3.614 | 23.10 | 1.1660 | 4.747 | 0 | 5.915 | 0.3782 | -3.6360 | 4.124 | 0.0003 | -1.1500 | 24.04 | 2.470 | 59.59 | 22.24 | 21.89 | 0.0605 | 0.9395 | -0.6406 | 25.31 | 0 | -3.3720 | 4.263 | 0.0002 | -0.7244 | 56.07 | 4.555 | 30.31 | 2.207 | 0.9935 | 0.0065 | 0.0300 | -1.3750 | 0.0334 | -1.422 | 3e-04 | -1.3750 | 1e-04 | -1.3750 | 0.7668 | 5.8330 | 1 | 9.882 | 0.9751 | -1.704 | 5.088 | 3.906e-02 | 7.677e-03 | 7.965e-05 | 1.843e-08 |
| ENSG00000136235 | ENSG00000136235 | ENST00000479625 | 16 | 1 | glycoprotein nmb [Source:HGNC Symbol;Acc:HGNC:4462] | protein_coding | undefined | 7 | + | 23235967 | 23275108 | GPNMB | GPNMB | ENSG00000136235.1 | 1447.5 | 2.102 | 0.4987 | 5.426 | 2e-04 | 3.031 | 0.3933 | 5.629 | 0.8546 | 5.360 | 0e+00 | 2.104 | 0.5947 | 4.475 | 1.000 | -1.1190 | -3.6950 | 12.101 | 2.665 | 0.0621 | 2.576 | 53.03 | 1.0740 | 5.053 | 0 | 7.515 | 2.0897 | -3.2100 | 2.161 | 0.0400 | -3.6970 | 49.49 | 2.881 | 143.07 | 51.41 | 39.92 | 0.8546 | 0.1454 | 0.4259 | 29.50 | 0 | -3.2370 | 1.413 | 0.1695 | -4.4990 | 125.19 | 5.631 | 65.41 | 2.264 | 0.9978 | 0.0022 | 0.5334 | -0.8155 | 0.2802 | -1.291 | 1e-04 | -0.8155 | 0e+00 | -0.8155 | 0.1253 | 0.6664 | 1 | 7.044 | 0.6634 | -1.718 | 4.060 | 2.227e+00 | 5.485e-01 | 5.650e-02 | 9.577e-03 |
| ENSG00000089127 | ENSG00000089127 | ENST00000540589 | 13 | 2 | 2’-5’-oligoadenylate synthetase 1 [Source:HGNC Symbol;Acc:HGNC:8086] | protein_coding | 68 | 12 | + | 112906783 | 112933222 | OAS1 | OAS1 | ENSG00000089127.2 | 682.8 | 3.284 | 0.2669 | 4.820 | 0e+00 | 4.229 | 0.1407 | 4.691 | 0.0845 | 4.830 | 0e+00 | 3.978 | 0.1341 | 4.036 | 1.000 | 1.9510 | -1.3010 | 8.237 | 1.116 | 0.0092 | 3.252 | 184.60 | 0.7144 | 6.746 | 0 | 8.840 | 4.0197 | -0.5641 | 3.205 | 0.0035 | -1.8560 | 25.83 | 18.535 | 479.09 | 177.96 | 23.95 | 0.0845 | 0.9155 | 2.1430 | 56.76 | 0 | -0.4596 | 3.160 | 0.0040 | -1.9060 | 410.17 | 25.003 | 217.58 | 2.621 | 0.9894 | 0.0106 | 0.1062 | -1.3630 | 0.1038 | -1.411 | 0e+00 | -1.3630 | 0e+00 | -1.3630 | 0.7325 | 5.6770 | 1 | 8.893 | 0.9751 | -1.691 | 4.652 | 5.441e-02 | 1.170e-02 | 1.328e-03 | 5.293e-06 |
| ENSG00000111335 | ENSG00000111335 | ENST00000551603 | 12 | 1 | 2’-5’-oligoadenylate synthetase 2 [Source:HGNC Symbol;Acc:HGNC:8087] | protein_coding | 183 | 12 | + | 112978395 | 113011723 | OAS2 | OAS2 | ENSG00000111335.1 | 1319.5 | 2.537 | 0.3953 | 4.447 | 0e+00 | 3.840 | 0.1974 | 4.238 | 0.4415 | 4.461 | 0e+00 | 3.601 | 0.2035 | 3.944 | 0.943 | 4.0180 | 0.8398 | 12.517 | 3.025 | 0.0293 | 3.178 | 1028.00 | 0.8157 | 5.451 | 0 | 11.444 | 6.9971 | 1.6630 | 2.826 | 0.0089 | -2.5810 | 18.87 | 137.565 | 2596.53 | 988.75 | 17.44 | 0.4415 | 0.5585 | 4.5830 | 38.11 | 0 | 1.7970 | 2.779 | 0.0100 | -2.7210 | 2117.85 | 137.605 | 1127.73 | 2.734 | 0.9999 | 0.0001 | 0.2023 | -1.3430 | 0.1659 | -1.393 | 0e+00 | -1.3430 | 0e+00 | -1.3430 | 0.5743 | 3.3540 | 1 | 8.690 | 0.0315 | -1.725 | 4.110 | 6.269e-02 | 1.525e-02 | 3.340e-03 | 3.347e-05 |
| ENSG00000137959 | ENSG00000137959 | ENST00000450498 | 16 | 1 | interferon induced protein 44 like [Source:HGNC Symbol;Acc:HGNC:17817] | protein_coding | 699 | 1 | + | 78619922 | 78646145 | IFI44L | IFI44L | ENSG00000137959.1 | 783.333333333333 | 3.909 | 0.1645 | 4.200 | 0e+00 | 4.025 | 0.0544 | 4.022 | 0.7568 | 4.213 | 0e+00 | 3.902 | 0.0422 | 4.334 | 0.000 | 5.5560 | 1.7930 | 6.584 | 3.318 | 0.0020 | 3.763 | 1932.00 | 0.7590 | 5.534 | 0 | 12.253 | 8.0528 | 2.8960 | 3.985 | 0.0005 | -0.0497 | 16.25 | 295.965 | 4808.31 | 1857.93 | 14.90 | 0.7568 | 0.2432 | 5.4890 | 40.52 | 0 | 3.0090 | 4.118 | 0.0003 | 0.2407 | 5616.14 | 278.525 | 2947.33 | 3.056 | 1.0000 | 0.0000 | 0.0448 | -1.3750 | 0.0666 | -1.421 | 0e+00 | -1.3750 | 0e+00 | -1.3750 | 0.2506 | 1.3030 | 1 | 10.290 | 0.0000 | -1.725 | 4.204 | 7.374e-02 | 1.754e-02 | 1.151e-04 | 3.976e-08 |
knitr::kable(head(t_cf_eosinophil_sig_batchvisit[["deseq"]][["downs"]][[1]]))| ensembl_gene_id | ensembl_transcript_id | version | transcript_version | description | gene_biotype | cds_length | chromosome_name | strand | start_position | end_position | hgnc_symbol | uniprot_gn_symbol | transcript | mean_cds_len | basic_logfc | basic_adjp | deseq_logfc | deseq_adjp | dream_logfc | dream_adjp | ebseq_logfc | ebseq_adjp | edger_logfc | edger_adjp | limma_logfc | limma_adjp | noiseq_logfc | noiseq_adjp | basic_num | basic_den | basic_numvar | basic_denvar | basic_t | basic_p | deseq_basemean | deseq_lfcse | deseq_stat | deseq_p | deseq_num | deseq_den | dream_ave | dream_t | dream_p | dream_b | ebseq_fc | ebseq_c1mean | ebseq_c2mean | ebseq_mean | ebseq_postfc | ebseq_ppee | ebseq_ppde | edger_logcpm | edger_lr | edger_p | limma_ave | limma_t | limma_p | limma_b | noiseq_num | noiseq_den | noiseq_mean | noiseq_theta | noiseq_prob | noiseq_p | limma_adjp_ihw | limma_p_zstd | dream_adjp_ihw | dream_p_zstd | deseq_adjp_ihw | deseq_p_zstd | edger_adjp_ihw | edger_p_zstd | ebseq_adjp_ihw | ebseq_p_zstd | basic_adjp_ihw | basic_p_zstd | noiseq_adjp_ihw | noiseq_p_zstd | lfc_meta | lfc_var | lfc_varbymed | p_meta | p_var | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ENSG00000189430 | ENSG00000189430 | ENST00000338835 | 13 | 9 | natural cytotoxicity triggering receptor 1 [Source:HGNC Symbol;Acc:HGNC:6731] | protein_coding | 864 | 19 | + | 54906148 | 54916140 | NCR1 | NCR1 | ENSG00000189430.9 | 798.5 | -3.624 | 0.1472 | -5.820 | 0.0002 | -2.5340 | 0.4295 | -5.817 | 0.0000 | -5.752 | 0.0019 | -3.214 | 0.2894 | -3.025 | 1 | -4.4710 | -1.0630 | 1.835 | 11.567 | 0.0014 | -3.408 | 93.09 | 1.1550 | -5.038 | 0e+00 | 1.4989 | 7.319 | -2.587 | -2.0530 | 0.0502 | -3.8380 | 0.0177 | 144.85 | 2.560 | 95.59 | 0.0182 | 0.0000 | 1.0000 | 1.0570 | 19.19 | 0e+00 | -2.621 | -2.436 | 0.0220 | -3.3850 | 5.676 | 46.20 | 25.94 | -2.1077 | 0.9822 | 0.0178 | 0.1690 | -1.303 | 0.2756 | -1.2570 | 0.0001 | -1.303 | 0.0012 | -1.303 | 0.7824 | 6.227 | -8796 | -9.316 | 0.9751 | -1.668 | -5.110 | 1.583e+00 | -3.098e-01 | 7.344e-03 | 1.615e-04 |
| ENSG00000179344 | ENSG00000179344 | ENST00000399084 | 16 | 5 | major histocompatibility complex, class II, DQ beta 1 [Source:HGNC Symbol;Acc:HGNC:4944] | protein_coding | 786 | 6 | - | 32659467 | 32668383 | HLA-DQB1 | HLA-DQB1 | ENSG00000179344.5 | 645.5 | -5.227 | 0.0515 | -5.667 | 0.0000 | -5.5390 | 0.0390 | -4.507 | 0.0008 | -5.653 | 0.0006 | -5.936 | 0.0332 | -5.899 | 0 | 0.0597 | 5.6120 | 5.925 | 7.993 | 0.0001 | -5.552 | 4151.00 | 0.8879 | -6.382 | 0e+00 | 8.4098 | 14.076 | 3.707 | -4.3850 | 0.0002 | 0.9012 | 0.0440 | 6266.46 | 275.671 | 4192.72 | 0.0418 | 0.0008 | 0.9992 | 6.5750 | 21.99 | 0e+00 | 3.581 | -4.357 | 0.0002 | 0.8177 | 138.566 | 8270.46 | 4204.51 | -3.6877 | 1.0000 | 0.0000 | 0.0336 | -1.376 | 0.0259 | -1.4220 | 0.0000 | -1.376 | 0.0007 | -1.376 | 0.9901 | 6.221 | -20730 | -15.180 | 0.0000 | -1.725 | -5.464 | 1.039e-01 | -1.902e-02 | 6.255e-05 | 1.123e-08 |
| ENSG00000162669 | ENSG00000162669 | ENST00000427444 | 16 | 1 | helicase for meiosis 1 [Source:HGNC Symbol;Acc:HGNC:20193] | protein_coding | 589 | 1 | - | 91260766 | 91404856 | HFM1 | HFM1 | ENSG00000162669.1 | 1528.4 | -3.444 | 0.1672 | -4.617 | 0.0012 | -1.8420 | 0.3845 | -5.349 | 0.0008 | -4.588 | 0.0054 | -2.665 | 0.1784 | -3.246 | 1 | -3.0190 | -0.0765 | 2.110 | 8.425 | 0.0021 | -2.942 | 207.70 | 1.0200 | -4.526 | 0e+00 | 4.2354 | 8.853 | -1.469 | -2.1860 | 0.0379 | -3.6080 | 0.0245 | 332.37 | 8.142 | 220.14 | 0.0241 | 0.0008 | 0.9992 | 2.1480 | 16.51 | 0e+00 | -1.450 | -2.912 | 0.0073 | -2.4740 | 10.706 | 101.59 | 56.15 | -1.8950 | 0.9634 | 0.0366 | 0.1410 | -1.352 | 0.2994 | -1.2980 | 0.0012 | -1.352 | 0.0031 | -1.352 | 0.8276 | 6.222 | -5395 | -8.042 | 0.9751 | -1.608 | -3.922 | 2.638e-01 | -6.727e-02 | 2.452e-03 | 1.764e-05 |
| ENSG00000167634 | ENSG00000167634 | ENST00000328092 | 12 | 9 | NLR family pyrin domain containing 7 [Source:HGNC Symbol;Acc:HGNC:22947] | protein_coding | 3030 | 19 | - | 54923509 | 54966312 | NLRP7 | NLRP7 | ENSG00000167634.9 | 2077.44444444444 | -2.682 | 0.3229 | -4.061 | 0.0071 | -0.9355 | 0.7928 | -4.236 | 0.0168 | -4.004 | 0.0317 | -1.875 | 0.4968 | -1.571 | 1 | -4.1970 | -2.2500 | 0.258 | 8.472 | 0.0153 | -1.947 | 27.08 | 1.0170 | -3.995 | 1e-04 | 0.8671 | 4.928 | -3.312 | -0.9051 | 0.3737 | -4.7730 | 0.0531 | 41.32 | 2.183 | 27.77 | 0.0550 | 0.0168 | 0.9832 | -0.7663 | 12.24 | 5e-04 | -3.363 | -1.721 | 0.0971 | -4.2100 | 5.676 | 16.86 | 11.27 | -1.1972 | 0.9233 | 0.0767 | 0.3157 | -1.055 | 0.7221 | -0.1889 | 0.0044 | -1.055 | 0.0216 | -1.055 | 0.8001 | 6.118 | -940200 | -5.322 | 0.9751 | -1.481 | -3.179 | 1.172e+00 | -3.687e-01 | 3.255e-02 | 3.126e-03 |
| ENSG00000196526 | ENSG00000196526 | ENST00000358461 | 10 | 6 | actin filament associated protein 1 [Source:HGNC Symbol;Acc:HGNC:24017] | protein_coding | 2193 | 4 | - | 7758714 | 7939926 | AFAP1 | AFAP1 | ENSG00000196526.6 | 1911 | -2.168 | 0.4335 | -3.879 | 0.0038 | -2.0550 | 0.5347 | -3.967 | 0.0000 | -3.877 | 0.0088 | -2.357 | 0.3886 | -3.791 | 1 | 0.5888 | 2.6700 | 2.164 | 11.578 | 0.0405 | -2.081 | 982.20 | 0.9252 | -4.192 | 0e+00 | 6.7815 | 10.660 | 1.724 | -1.7170 | 0.0978 | -4.5770 | 0.0640 | 1485.49 | 95.000 | 1004.17 | 0.0623 | 0.0000 | 1.0000 | 4.4950 | 15.42 | 1e-04 | 1.804 | -2.067 | 0.0489 | -4.0740 | 128.406 | 1777.17 | 952.79 | -2.5503 | 0.9896 | 0.0104 | 0.3539 | -1.215 | 0.4110 | -1.1000 | 0.0041 | -1.215 | 0.0090 | -1.215 | 1.0000 | 6.227 | -3031 | -5.688 | 0.9751 | -1.692 | -3.320 | 3.403e-01 | -1.025e-01 | 1.633e-02 | 7.942e-04 |
| ENSG00000277150 | ENSG00000277150 | ENST00000622749 | 1 | 1 | coagulation factor VIII associated 3 [Source:HGNC Symbol;Acc:HGNC:31850] | protein_coding | 1116 | X | - | 155456914 | 155458672 | F8A3 | F8A1 | ENSG00000277150.1 | 1116 | -3.020 | 0.2249 | -3.788 | 0.0153 | -0.8834 | 0.7374 | -4.133 | 0.1777 | -3.704 | 0.0589 | -1.712 | 0.3848 | -1.821 | 1 | -4.5610 | -2.3170 | 1.574 | 6.415 | 0.0059 | -2.244 | 21.72 | 1.0170 | -3.724 | 2e-04 | 1.9848 | 5.772 | -3.641 | -1.0960 | 0.2830 | -4.6810 | 0.0570 | 34.05 | 1.931 | 22.93 | 0.0606 | 0.1777 | 0.8223 | -1.5770 | 10.78 | 1e-03 | -3.506 | -2.080 | 0.0476 | -3.8470 | 5.524 | 19.52 | 12.52 | -0.9781 | 0.8722 | 0.1278 | 0.2415 | -1.219 | 0.6628 | -0.4884 | 0.0102 | -1.219 | 0.0375 | -1.219 | 0.8350 | 5.070 | -2709000 | -6.134 | 0.9751 | -1.317 | -2.842 | 9.067e-01 | -3.191e-01 | 1.628e-02 | 7.364e-04 |
Repeat with visit in the condition contrast.
visitcf_factor <- paste0("v", pData(t_eosinophils)[["visitnumber"]], "_",
pData(t_eosinophils)[["finaloutcome"]])
t_eosinophil_visitcf <- set_expt_conditions(t_eosinophils, fact = visitcf_factor)## The numbers of samples by condition are:
##
## v1_cure v1_failure v2_cure v2_failure v3_cure v3_failure
## 5 3 6 3 6 3
t_cf_eosinophil_visits_de_sva <- all_pairwise(t_eosinophil_visitcf, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## v1_cure v1_failure v2_cure v2_failure v3_cure v3_failure
## 5 3 6 3 6 3
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_eosinophil_visits_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_cf_eosinophil_visits_table_sva <- combine_de_tables(
t_cf_eosinophil_visits_de_sva, keepers = visitcf_contrasts, scale_p = TRUE,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_visitcf_table_sva-v{ver}.xlsx"))## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_eosinophil_visits_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 v1_failure_vs_v1_cure 9 11 2 3
## 2 v2_failure_vs_v2_cure 4 3 5 2
## 3 v3_failure_vs_v3_cure 14 7 17 2
## limma_sigup limma_sigdown
## 1 0 1
## 2 0 0
## 3 0 0
## Plot describing unique/shared genes in a differential expression table.
t_cf_eosinophil_visits_sig_sva <- extract_significant_genes(
t_cf_eosinophil_visits_table_sva,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_visitcf_sig_sva-v{ver}.xlsx"))
t_cf_eosinophil_visits_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## v1cf 0 1 2 3 9 11 4
## v2cf 0 0 5 2 4 3 11
## v3cf 0 0 17 2 14 7 3
## ebseq_down basic_up basic_down
## v1cf 86 0 0
## v2cf 18 0 0
## v3cf 10 0 0
dim(t_cf_eosinophil_visits_sig_sva[["deseq"]][["ups"]][[1]])## [1] 9 84
dim(t_cf_eosinophil_visits_sig_sva[["deseq"]][["downs"]][[1]])## [1] 11 84
knitr::kable(head(t_cf_eosinophil_visits_sig_sva[["deseq"]][["ups"]][[1]]))| ensembl_gene_id | ensembl_transcript_id | version | transcript_version | description | gene_biotype | cds_length | chromosome_name | strand | start_position | end_position | hgnc_symbol | uniprot_gn_symbol | transcript | mean_cds_len | basic_logfc | basic_adjp | deseq_logfc | deseq_adjp | dream_logfc | dream_adjp | ebseq_logfc | ebseq_adjp | edger_logfc | edger_adjp | limma_logfc | limma_adjp | noiseq_logfc | noiseq_adjp | basic_num | basic_den | basic_numvar | basic_denvar | basic_t | basic_p | deseq_basemean | deseq_lfcse | deseq_stat | deseq_p | deseq_num | deseq_den | dream_ave | dream_t | dream_p | dream_b | ebseq_fc | ebseq_c1mean | ebseq_c2mean | ebseq_mean | ebseq_postfc | ebseq_ppee | ebseq_ppde | edger_logcpm | edger_lr | edger_p | limma_ave | limma_t | limma_p | limma_b | noiseq_num | noiseq_den | noiseq_mean | noiseq_theta | noiseq_prob | noiseq_p | limma_adjp_ihw | limma_p_zstd | dream_adjp_ihw | dream_p_zstd | deseq_adjp_ihw | deseq_p_zstd | edger_adjp_ihw | edger_p_zstd | ebseq_adjp_ihw | ebseq_p_zstd | basic_adjp_ihw | basic_p_zstd | noiseq_adjp_ihw | noiseq_p_zstd | lfc_meta | lfc_var | lfc_varbymed | p_meta | p_var | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ENSG00000143416 | ENSG00000143416 | ENST00000443708 | 21 | 5 | selenium binding protein 1 [Source:HGNC Symbol;Acc:HGNC:10719] | protein_coding | 372 | 1 | - | 151364304 | 151372707 | SELENBP1 | SELENBP1 | ENSG00000143416.5 | 727.272727272727 | 0.6636 | 0.9946 | 21.040 | 0.0292 | 6.664 | 0.3331 | 13.6378 | 0.7073 | 6.667 | 0.8813 | 6.969 | 0.2738 | 4.3217 | 1 | -2.9540 | -5.0410 | 29.4269 | 0.4130 | 0.5742 | 2.0870 | 20.49 | 5.0190 | 4.192 | 0e+00 | -0.4417 | -21.477 | -4.9810 | 2.606 | 0.0160 | -3.5460 | 12746.087 | 0.00 | 127.45 | 47.79 | 249.156 | 0.7073 | 0.2927 | -0.7976 | 2.956 | 0.0856 | -4.6440 | 3.162 | 0.0045 | -2.8790 | 53.22 | 2.661 | 27.94 | 1.1523 | 0.9511 | 0.0489 | 0.2738 | -1.433 | 0.3524 | -1.373 | 0.0226 | -1.433 | 0.7929 | -1.433 | 0.2363 | 0.5755 | 1 | 4.3750 | 0.4164 | -1.1210 | 9.641 | 2.099e+01 | 2.178e+00 | 3.004e-02 | 2.319e-03 |
| ENSG00000136732 | ENSG00000136732 | ENST00000459787 | 16 | 1 | glycophorin C (Gerbich blood group) [Source:HGNC Symbol;Acc:HGNC:4704] | protein_coding | undefined | 2 | + | 126656133 | 126696667 | GYPC | GYPC | ENSG00000136732.1 | 347 | 1.2540 | 0.9946 | 5.298 | 0.0465 | 4.031 | 0.3123 | 4.0095 | 0.9653 | 5.290 | 0.0505 | 4.195 | 0.3253 | 4.0248 | 1 | 4.1470 | 1.8540 | 9.5905 | 0.7295 | 0.3274 | 2.2930 | 601.60 | 1.3500 | 3.925 | 1e-04 | 12.1164 | 6.818 | 2.1910 | 2.834 | 0.0095 | -2.6190 | 16.105 | 198.13 | 3191.08 | 1320.49 | 12.632 | 0.9653 | 0.0347 | 3.8060 | 17.080 | 0.0000 | 2.2010 | 2.907 | 0.0082 | -2.5120 | 3558.42 | 218.611 | 1888.51 | 1.4435 | 0.9490 | 0.0510 | 0.3253 | -1.421 | 0.3208 | -1.394 | 0.0418 | -1.421 | 0.0558 | -1.421 | 0.0524 | -0.3585 | 1 | 4.8060 | 0.4164 | -1.1150 | 5.055 | 4.846e-01 | 9.587e-02 | 2.758e-03 | 2.181e-05 |
| ENSG00000136689 | ENSG00000136689 | ENST00000472292 | 18 | 1 | interleukin 1 receptor antagonist [Source:HGNC Symbol;Acc:HGNC:6000] | protein_coding | undefined | 2 | + | 113107214 | 113134016 | IL1RN | IL1RN | ENSG00000136689.1 | 484.2 | 2.1950 | 0.9946 | 4.047 | 0.0292 | 3.608 | 0.2141 | 1.1638 | 0.6966 | 4.009 | 0.0602 | 3.902 | 0.2145 | 2.0500 | 1 | -0.1165 | -1.7740 | 0.6101 | 1.8351 | 0.0707 | 1.6570 | 43.99 | 0.9437 | 4.289 | 0e+00 | 6.7104 | 2.663 | -0.8124 | 3.775 | 0.0010 | -1.3260 | 2.240 | 19.86 | 44.50 | 29.10 | 2.161 | 0.6966 | 0.3034 | 0.0573 | 16.080 | 0.0001 | -0.7526 | 4.085 | 0.0005 | -0.7619 | 57.68 | 13.928 | 35.80 | 1.1572 | 0.9485 | 0.0515 | 0.2144 | -1.446 | 0.1971 | -1.422 | 0.0203 | -1.446 | 0.0632 | -1.446 | 0.2493 | 0.6145 | 1 | 3.4740 | 0.4164 | -1.1130 | 4.002 | 1.351e-03 | 3.376e-04 | 1.883e-04 | 6.705e-08 |
| ENSG00000169429 | ENSG00000169429 | ENST00000401931 | 11 | 1 | C-X-C motif chemokine ligand 8 [Source:HGNC Symbol;Acc:HGNC:6025] | protein_coding | 288 | 4 | + | 73740541 | 73743716 | CXCL8 | CXCL8 | ENSG00000169429.1 | 294 | 0.3227 | 0.9946 | 3.654 | 0.0465 | 4.434 | 0.2287 | 0.6389 | 0.9741 | 3.645 | 0.0448 | 4.721 | 0.2145 | 0.9802 | 1 | 0.8932 | 0.5284 | 3.2382 | 0.9915 | 0.7698 | 0.3648 | 281.80 | 0.9314 | 3.923 | 1e-04 | 8.5916 | 4.937 | 0.7983 | 3.595 | 0.0016 | -1.1320 | 1.557 | 80.86 | 125.92 | 97.76 | 1.455 | 0.9741 | 0.0259 | 2.7020 | 18.170 | 0.0000 | 0.8090 | 3.852 | 0.0009 | -0.6052 | 147.68 | 74.856 | 111.27 | 0.4793 | 0.9080 | 0.0920 | 0.2144 | -1.445 | 0.2226 | -1.420 | 0.0356 | -1.445 | 0.0480 | -1.445 | 0.0307 | -0.3901 | 1 | 0.7655 | 0.4164 | -0.9926 | 3.957 | 5.385e-01 | 1.361e-01 | 3.224e-04 | 2.176e-07 |
| ENSG00000135862 | ENSG00000135862 | ENST00000258341 | 6 | 5 | laminin subunit gamma 1 [Source:HGNC Symbol;Acc:HGNC:6492] | protein_coding | 4830 | 1 | + | 183023420 | 183145592 | LAMC1 | LAMC1 | ENSG00000135862.5 | 2471.5 | 1.4800 | 0.9946 | 3.168 | 0.0488 | 2.838 | 0.2799 | 0.8170 | 0.8664 | 3.157 | 0.1489 | 2.875 | 0.2422 | 1.3755 | 1 | 0.7111 | -0.8383 | 1.0858 | 3.6722 | 0.1895 | 1.5490 | 74.31 | 0.8152 | 3.886 | 1e-04 | 7.2209 | 4.053 | 0.0654 | 3.164 | 0.0044 | -2.0790 | 1.762 | 47.92 | 84.43 | 61.61 | 1.690 | 0.8664 | 0.1336 | 0.7674 | 12.500 | 0.0004 | 0.1736 | 3.369 | 0.0028 | -1.6790 | 95.47 | 36.794 | 66.13 | 0.6791 | 0.9196 | 0.0804 | 0.2421 | -1.439 | 0.2733 | -1.410 | 0.0356 | -1.439 | 0.1218 | -1.439 | 0.1172 | -0.0003 | 1 | 3.2470 | 0.4164 | -1.0270 | 3.075 | 2.285e-04 | 7.430e-05 | 1.089e-03 | 2.115e-06 |
| ENSG00000105889 | ENSG00000105889 | ENST00000424363 | 15 | 5 | STEAP family member 1B [Source:HGNC Symbol;Acc:HGNC:41907] | protein_coding | 762 | 7 | - | 22419444 | 22727613 | STEAP1B | STEAP1B | ENSG00000105889.5 | 794.75 | 2.0340 | 0.9946 | 1.952 | 0.0339 | 1.536 | 0.2141 | 0.6165 | 0.9309 | 1.961 | 0.1193 | 1.736 | 0.1766 | 0.9709 | 1 | 1.9180 | 1.0580 | 0.2279 | 0.5140 | 0.0900 | 0.8599 | 98.64 | 0.4730 | 4.127 | 0e+00 | 7.9468 | 5.995 | 1.0130 | 3.858 | 0.0008 | -0.5549 | 1.533 | 107.00 | 164.06 | 128.40 | 1.513 | 0.9309 | 0.0691 | 1.1870 | 13.590 | 0.0002 | 1.0240 | 4.730 | 0.0001 | 1.2500 | 232.04 | 118.381 | 175.21 | 0.5638 | 0.9145 | 0.0855 | 0.1766 | -1.447 | 0.1971 | -1.422 | 0.0217 | -1.447 | 0.1035 | -1.447 | 0.0648 | -0.2337 | 1 | 1.8030 | 0.4164 | -1.0120 | 1.868 | 7.500e-03 | 4.016e-03 | 1.216e-04 | 9.417e-09 |
knitr::kable(head(t_cf_eosinophil_visits_sig_sva[["deseq"]][["downs"]][[1]]))| ensembl_gene_id | ensembl_transcript_id | version | transcript_version | description | gene_biotype | cds_length | chromosome_name | strand | start_position | end_position | hgnc_symbol | uniprot_gn_symbol | transcript | mean_cds_len | basic_logfc | basic_adjp | deseq_logfc | deseq_adjp | dream_logfc | dream_adjp | ebseq_logfc | ebseq_adjp | edger_logfc | edger_adjp | limma_logfc | limma_adjp | noiseq_logfc | noiseq_adjp | basic_num | basic_den | basic_numvar | basic_denvar | basic_t | basic_p | deseq_basemean | deseq_lfcse | deseq_stat | deseq_p | deseq_num | deseq_den | dream_ave | dream_t | dream_p | dream_b | ebseq_fc | ebseq_c1mean | ebseq_c2mean | ebseq_mean | ebseq_postfc | ebseq_ppee | ebseq_ppde | edger_logcpm | edger_lr | edger_p | limma_ave | limma_t | limma_p | limma_b | noiseq_num | noiseq_den | noiseq_mean | noiseq_theta | noiseq_prob | noiseq_p | limma_adjp_ihw | limma_p_zstd | dream_adjp_ihw | dream_p_zstd | deseq_adjp_ihw | deseq_p_zstd | edger_adjp_ihw | edger_p_zstd | ebseq_adjp_ihw | ebseq_p_zstd | basic_adjp_ihw | basic_p_zstd | noiseq_adjp_ihw | noiseq_p_zstd | lfc_meta | lfc_var | lfc_varbymed | p_meta | p_var | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ENSG00000129295 | ENSG00000129295 | ENST00000522789 | 9 | 5 | leucine rich repeat containing 6 [Source:HGNC Symbol;Acc:HGNC:16725] | protein_coding | 621 | 8 | - | 132570416 | 132675592 | LRRC6 | LRRC6 | ENSG00000129295.5 | 998.375 | -2.3670 | 0.9946 | -4.504 | 0.0210 | -4.670 | 0.2141 | -2.7493 | 0.0409 | -4.506 | 0.0448 | -4.525 | 0.2264 | -2.4407 | 1 | 0.8385 | 3.226 | 1.6048 | 2.4094 | 0.0628 | -2.3870 | 357.7 | 0.9891 | -4.554 | 0e+00 | 5.009 | 9.513 | 2.201 | -3.738 | 0.0011 | -0.7583 | 0.1487 | 629.8 | 93.66 | 428.8 | 0.1450 | 0.0409 | 0.9591 | 3.048 | 18.13 | 0e+00 | 2.194 | -3.609 | 0.0015 | -1.0380 | 122.7 | 666.2 | 394.4 | -1.0714 | 0.9557 | 0.0443 | 0.2264 | -1.443 | 0.1609 | -1.421 | 0.0203 | -1.443 | 0.0480 | -1.443 | 0.9875 | 2.9880 | -1964.000 | -5.0020 | 0.4164 | -1.1350 | -4.671 | 3.654e-01 | -7.823e-02 | 5.246e-04 | 7.855e-07 |
| ENSG00000140090 | ENSG00000140090 | ENST00000526482 | 17 | 1 | solute carrier family 24 member 4 [Source:HGNC Symbol;Acc:HGNC:10978] | protein_coding | undefined | 14 | + | 92322581 | 92501483 | SLC24A4 | SLC24A4 | ENSG00000140090.1 | 1388.6 | -1.0630 | 0.9946 | -3.452 | 0.0465 | -3.501 | 0.2652 | -1.2758 | 0.9609 | -3.445 | 0.1115 | -3.286 | 0.3003 | -0.9902 | 1 | 0.4640 | 1.562 | 2.0840 | 1.8659 | 0.3460 | -1.0980 | 194.0 | 0.8672 | -3.980 | 1e-04 | 4.475 | 7.926 | 1.560 | -3.286 | 0.0033 | -1.7060 | 0.4130 | 183.4 | 75.75 | 143.0 | 0.3983 | 0.9609 | 0.0391 | 2.181 | 14.22 | 2e-04 | 1.560 | -3.011 | 0.0064 | -2.2390 | 103.0 | 204.6 | 153.8 | -0.3861 | 0.8614 | 0.1386 | 0.3003 | -1.427 | 0.2516 | -1.414 | 0.0307 | -1.427 | 0.1035 | -1.427 | 0.0495 | -0.3425 | -172.600 | -2.3000 | 0.4472 | -0.8542 | -3.447 | 4.091e-02 | -1.187e-02 | 2.213e-03 | 1.320e-05 |
| ENSG00000120049 | ENSG00000120049 | ENST00000343195 | 19 | 8 | potassium voltage-gated channel interacting protein 2 [Source:HGNC Symbol;Acc:HGNC:15522] | protein_coding | 663 | 10 | - | 101825974 | 101843920 | KCNIP2 | KCNIP2 | ENSG00000120049.8 | 679.2 | -1.3150 | 0.9946 | -1.956 | 0.0001 | -2.216 | 0.0314 | -1.1698 | 0.9280 | -1.957 | 0.0068 | -2.038 | 0.0446 | -0.8999 | 1 | 2.9440 | 3.887 | 1.1125 | 0.7187 | 0.2664 | -0.9432 | 629.3 | 0.3415 | -5.729 | 0e+00 | 7.752 | 9.708 | 3.696 | -6.173 | 0.0000 | 4.5490 | 0.4445 | 814.0 | 361.80 | 644.4 | 0.4356 | 0.9280 | 0.0720 | 3.861 | 24.77 | 0e+00 | 3.710 | -6.053 | 0.0000 | 4.3070 | 469.4 | 875.8 | 672.6 | -0.5744 | 0.9304 | 0.0696 | 0.0446 | -1.448 | 0.0314 | -1.425 | 0.0001 | -1.448 | 0.0076 | -1.448 | 0.0915 | -0.2231 | -16.900 | -1.9760 | 0.4164 | -1.0590 | -2.001 | 8.748e-03 | -4.372e-03 | 1.629e-06 | 5.185e-12 |
| ENSG00000089335 | ENSG00000089335 | ENST00000502743 | 21 | 5 | zinc finger protein 302 [Source:HGNC Symbol;Acc:HGNC:13848] | protein_coding | 360 | 19 | + | 34677639 | 34686397 | ZNF302 | ZNF302 | ENSG00000089335.5 | 803.25 | -1.5630 | 0.9946 | -1.763 | 0.0465 | -2.202 | 0.1999 | -1.0398 | 0.5318 | -1.763 | 0.1323 | -1.975 | 0.1762 | -0.7208 | 1 | 1.9470 | 2.847 | 0.7883 | 0.3437 | 0.2137 | -0.8999 | 294.4 | 0.4451 | -3.962 | 1e-04 | 6.794 | 8.557 | 2.642 | -4.901 | 0.0001 | 1.7480 | 0.4864 | 365.2 | 177.62 | 294.8 | 0.4825 | 0.5318 | 0.4682 | 2.775 | 13.21 | 3e-04 | 2.644 | -5.118 | 0.0000 | 2.2240 | 253.4 | 417.7 | 335.6 | -0.4143 | 0.8741 | 0.1259 | 0.1762 | -1.448 | 0.1609 | -1.425 | 0.0287 | -1.448 | 0.1178 | -1.448 | 0.4589 | 1.2110 | -15.230 | -1.8850 | 0.4472 | -0.8918 | -1.860 | 2.803e-02 | -1.507e-02 | 1.306e-04 | 1.664e-08 |
| ENSG00000169330 | ENSG00000169330 | ENST00000305428 | 9 | 8 | membrane integral NOTCH2 associated receptor 1 [Source:HGNC Symbol;Acc:HGNC:29172] | protein_coding | 2751 | 15 | + | 79432336 | 79472304 | MINAR1 | MINAR1 | ENSG00000169330.8 | 2658 | -0.7188 | 0.9946 | -1.621 | 0.0292 | -1.859 | 0.2141 | -0.7262 | 0.9063 | -1.622 | 0.0505 | -1.718 | 0.2145 | -0.3545 | 1 | 4.3750 | 4.695 | 0.1995 | 0.6554 | 0.4993 | -0.3195 | 1482.0 | 0.3843 | -4.218 | 0e+00 | 8.983 | 10.604 | 4.913 | -4.336 | 0.0003 | 0.5512 | 0.6045 | 1439.1 | 869.93 | 1225.7 | 0.5975 | 0.9063 | 0.0937 | 5.098 | 17.09 | 0e+00 | 4.935 | -4.533 | 0.0002 | 0.9596 | 1215.2 | 1553.7 | 1384.4 | -0.3327 | 0.8341 | 0.1659 | 0.2144 | -1.447 | 1.0000 | -1.424 | 0.0353 | -1.447 | 1.0000 | -1.447 | 0.1190 | -0.1447 | -1.123 | -0.6686 | 0.4663 | -0.7729 | -1.659 | 8.008e-04 | -4.828e-04 | 7.427e-05 | 5.882e-09 |
| ENSG00000282246 | ENSG00000282246 | ENST00000596044 | 1 | 5 | novel protein | protein_coding | 57 | 10 | + | 13610047 | 13655929 | ENSG00000282246.5 | 257 | -0.7066 | 0.9946 | -1.575 | 0.0465 | -1.673 | 0.2141 | -0.7959 | 0.9518 | -1.575 | 0.1115 | -1.490 | 0.2145 | -0.3864 | 1 | 3.2860 | 3.645 | 0.2675 | 0.8440 | 0.5063 | -0.3589 | 556.1 | 0.4018 | -3.921 | 1e-04 | 7.883 | 9.458 | 3.543 | -3.907 | 0.0007 | -0.3918 | 0.5760 | 711.5 | 409.82 | 598.4 | 0.5635 | 0.9518 | 0.0482 | 3.688 | 14.19 | 2e-04 | 3.546 | -3.824 | 0.0009 | -0.6119 | 590.2 | 771.4 | 680.8 | -0.2528 | 0.7316 | 0.2684 | 0.2144 | -1.445 | 0.1971 | -1.423 | 0.0647 | -1.445 | 0.1140 | -1.445 | 0.0630 | -0.3095 | -2.869 | -0.7512 | 0.5820 | -0.4678 | -1.556 | 3.292e-03 | -2.115e-03 | 3.915e-04 | 2.120e-07 |
As a reminder, there are a few genes of particular interest:
expected_genes <- c("IFI44L", "IFI27", "PRR5", "PRR5-ARHGAP8", "RHCE",
"FBXO39", "RSAD2", "SMTNL1", "USP18", "AFAP1")
annot <- fData(t_monocytes)
wanted_idx <- annot[["hgnc_symbol"]] %in% expected_genes
expected_ensg <- rownames(annot)[wanted_idx]Either above or below this section I have a nearly identical block which seeks to demonstrate the similarities/difference observed between my preferred/simplified model vs. a more explicitly correct and complex model. If the trend holds from what we observed with the eosinophils and neutrophils, I would expect to see that the results are marginally ‘better’ (as defined by the strength of the perceived interleukin response and raw number of ‘significant’ genes); but I remain worried that this will prove a more brittle and error-prone analysis.
Start out by extracting the perceived svs via svaseq on the filtered input.
## The original pairwise invocation with sva:
##t_cf_monocyte_de_sva <- all_pairwise(t_monocyte, model_batch = "svaseq",
## filter = TRUE, parallel = FALSE,
## methods = methods)
test_monocytes <- normalize_expt(t_monocytes, filter = "simple")## Removing 0 low-count genes (10862 remaining).
test_mono_design <- pData(test_monocytes)
test_formula <- as.formula("~ finaloutcome + visitnumber")
test_model <- model.matrix(test_formula, data = test_mono_design)
null_formula <- as.formula("~ visitnumber")
null_model <- model.matrix(null_formula, data = test_mono_design)
linear_mtrx <- exprs(test_monocytes)
l2_mtrx <- log2(linear_mtrx + 1)
chosen_surrogates <- sva::num.sv(dat = l2_mtrx, mod = test_model)
chosen_surrogates## [1] 2
surrogate_result <- sva::svaseq(
dat = linear_mtrx, n.sv = chosen_surrogates, mod = test_model, mod0 = null_model)## Number of significant surrogate variables is: 2
## Iteration (out of 5 ):1 2 3 4 5
model_adjust <- as.matrix(surrogate_result[["sv"]])We can now create a new DESeq2 dataset which takes these putative surrogates into account.
colnames(model_adjust) <- paste0("SV", seq_len(chosen_surrogates))
rownames(model_adjust) <- rownames(pData(test_monocytes))
addition_string <- ""
for (sv in colnames(model_adjust)) {
addition_string <- paste0(addition_string, " + ", sv)
}
longer_model <- as.formula(glue("~ finaloutcome + visitnumber{addition_string}"))
mono_design_svs <- cbind(test_mono_design, model_adjust)
summarized <- DESeq2::DESeqDataSetFromMatrix(countData = linear_mtrx,
colData = mono_design_svs,
design = longer_model)## converting counts to integer mode
In order to compare these and the previous results, I tend to rely on simple correlations and aucc plots. I have been reading the modelr code recently and it looks like there is a suite of other metrics which might be more appropriate.
deseq_run <- DESeq2::DESeq(summarized)## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
deseq_table <- as.data.frame(DESeq2::results(object = deseq_run,
contrast = c("finaloutcome", "failure", "cure"),
format = "DataFrame"))
big_table <- t_cf_monocyte_table_sva[["data"]][["outcome"]]
only_deseq <- big_table[, c("deseq_logfc", "deseq_adjp")]
merged <- merge(deseq_table, only_deseq, by = "row.names")
rownames(merged) <- merged[["Row.names"]]
merged[["Row.names"]] <- NULL
cor_value <- cor.test(merged[["log2FoldChange"]], merged[["deseq_logfc"]])
cor_value##
## Pearson's product-moment correlation
##
## data: merged[["log2FoldChange"]] and merged[["deseq_logfc"]]
## t = 1075, df = 10860, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9952 0.9955
## sample estimates:
## cor
## 0.9953
logfc_plotter <- plot_linear_scatter(merged[, c("log2FoldChange", "deseq_logfc")],
add_cor = TRUE, add_rsq = TRUE, identity = TRUE,
add_equation = TRUE)
logfc_plot <- logfc_plotter[["scatter"]] +
xlab("DESeq2 log2FC: Visit explicitly in model") +
ylab("DESeq2 log2FC: Default pairwise comparison")
pp(file = "figures/compare_cf_and_visit_in_model_monocyte_logfc.svg")
logfc_plot
dev.off()## png
## 2
logfc_plotcor_value <- cor.test(merged[["padj"]], merged[["deseq_adjp"]], method = "spearman")## Warning in cor.test.default(merged[["padj"]], merged[["deseq_adjp"]], method =
## "spearman"): Cannot compute exact p-value with ties
cor_value##
## Spearman's rank correlation rho
##
## data: merged[["padj"]] and merged[["deseq_adjp"]]
## S = 1.3e+09, p-value <2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9938
adjp_plotter <- plot_linear_scatter(merged[, c("padj", "deseq_adjp")])
adjp_plot <- adjp_plotter[["scatter"]] +
xlab("DESeq2 adjp: Visit explicitly in model") +
ylab("DESeq2 adjp: Default pairwise comparison")
pp(file = "images/compare_cf_and_visit_in_model_monocyte_adjp.svg")
adjp_plot
dev.off()## png
## 2
adjp_plotprevious_sig_idx <- big_table[["deseq_adjp"]] <= 0.05 &
abs(big_table[["deseq_logfc"]] >= 1.0)
summary(previous_sig_idx)## Mode FALSE TRUE
## logical 10802 60
previous_genes <- rownames(big_table)[previous_sig_idx]
new_sig_idx <- abs(deseq_table[["log2FoldChange"]]) >= 1.0 &
deseq_table[["padj"]] < 0.05
new_genes <- rownames(deseq_table)[new_sig_idx]
na_idx <- is.na(new_genes)
new_genes <- new_genes[!na_idx]
Vennerable::Venn(list("previous" = previous_genes, "new" = new_genes))## A Venn object on 2 sets named
## previous,new
## 00 10 01 11
## 0 7 57 53
test_new <- simple_gprofiler(new_genes)
test_new## A set of ontologies produced by gprofiler using 110
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 9 MF
## 147 BP
## 0 KEGG
## 0 REAC
## 0 WP
## 2 TF
## 0 MIRNA
## 0 HPA
## 0 CORUM
## 0 HP hits.
test_old <- simple_gprofiler(previous_genes)
test_old## A set of ontologies produced by gprofiler using 60
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 0 MF
## 44 BP
## 3 KEGG
## 0 REAC
## 2 WP
## 0 TF
## 0 MIRNA
## 0 HPA
## 0 CORUM
## 0 HP hits.
new_annotated <- merge(fData(t_monocytes), deseq_table, by = "row.names")
rownames(new_annotated) <- new_annotated[["Row.names"]]
new_annotated[["Row.names"]] <- NULL
write_xlsx(data = new_annotated, excel = "excel/monocyte_visit_in_model_sva_cf_new.xlsx")## write_xlsx() wrote excel/monocyte_visit_in_model_sva_cf_new.xlsx.
## The cursor is on sheet first, row: 10865 column: 23.
old_annotated <- merge(fData(t_eosinophils), big_table, by = "row.names")
rownames(old_annotated) <- old_annotated[["Row.names"]]
old_annotated[["Row.names"]] <- NULL
write_xlsx(data = old_annotated, excel = "excel/monocyte_visit_in_model_sva_cf_old.xlsx")## write_xlsx() wrote excel/monocyte_visit_in_model_sva_cf_old.xlsx.
## The cursor is on sheet first, row: 10865 column: 101.
Are the expected Ensembl gene IDs found in this new set?
sum(new_genes %in% expected_ensg)## [1] 10
We wish to ensure that my model simplification did not do anything incorrect to the data for all three cell types, I already did this for the neutrophils, let us repeat for the eosinophils. I am therefore (mostly) copy/pasting the neutrophil section here.
## The original pairwise invocation with sva:
#t_cf_eosinophil_de_sva <- all_pairwise(t_eosinophils, model_batch = "svaseq",
# filter = TRUE, parallel=FALSE, methods = methods)
test_eosinophils <- normalize_expt(t_eosinophils, filter = "simple")## Removing 2652 low-count genes (17300 remaining).
test_eo_design <- pData(test_eosinophils)
test_formula <- as.formula("~ 0 + finaloutcome + visitnumber")
test_model <- model.matrix(test_formula, data = test_eo_design)
null_formula <- as.formula("~ 0 + visitnumber")
null_model <- model.matrix(null_formula, data = test_eo_design)
linear_mtrx <- exprs(test_eosinophils)
l2_mtrx <- log2(linear_mtrx + 1)
chosen_surrogates <- sva::num.sv(dat = l2_mtrx, mod = test_model)
chosen_surrogates## [1] 3
surrogate_result <- sva::svaseq(
dat = linear_mtrx, n.sv = chosen_surrogates, mod = test_model, mod0 = null_model)## Number of significant surrogate variables is: 3
## Iteration (out of 5 ):1 2 3 4 5
model_adjust <- as.matrix(surrogate_result[["sv"]])
colnames(model_adjust) <- c("SV1", "SV2", "SV3")
rownames(model_adjust) <- rownames(pData(test_eosinophils))
longer_model <- as.formula("~ finaloutcome + visitnumber + SV1 + SV2 + SV3")
eo_design_svs <- cbind(test_eo_design, model_adjust)
summarized <- DESeq2::DESeqDataSetFromMatrix(countData = linear_mtrx,
colData = eo_design_svs,
design = longer_model)## converting counts to integer mode
deseq_run <- DESeq2::DESeq(summarized)## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
deseq_table <- as.data.frame(DESeq2::results(object = deseq_run,
contrast = c("finaloutcome", "failure", "cure"),
format = "DataFrame"))
big_table <- t_cf_eosinophil_table_sva[["data"]][["outcome"]]
only_deseq <- big_table[, c("deseq_logfc", "deseq_adjp")]
merged <- merge(deseq_table, only_deseq, by = "row.names")
rownames(merged) <- merged[["Row.names"]]
merged[["Row.names"]] <- NULL
cor_value <- cor.test(merged[["log2FoldChange"]], merged[["deseq_logfc"]])
cor_value##
## Pearson's product-moment correlation
##
## data: merged[["log2FoldChange"]] and merged[["deseq_logfc"]]
## t = 228, df = 10530, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9084 0.9149
## sample estimates:
## cor
## 0.9117
logfc_plotter <- plot_linear_scatter(merged[, c("log2FoldChange", "deseq_logfc")])
logfc_plot <- logfc_plotter[["scatter"]] +
xlab("DESeq2 log2FC: Visit explicitly in model") +
ylab("DESeq2 log2FC: Default pairwise comparison")
pp(file = "figures/compare_cf_and_visit_in_model_eosinophil_logfc.svg")
logfc_plot
dev.off()## png
## 2
logfc_plotcor_value <- cor.test(merged[["padj"]], merged[["deseq_adjp"]], method = "spearman")## Warning in cor.test.default(merged[["padj"]], merged[["deseq_adjp"]], method =
## "spearman"): Cannot compute exact p-value with ties
cor_value##
## Spearman's rank correlation rho
##
## data: merged[["padj"]] and merged[["deseq_adjp"]]
## S = 3.5e+10, p-value <2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.8214
adjp_plotter <- plot_linear_scatter(merged[, c("padj", "deseq_adjp")])
adjp_plot <- adjp_plotter[["scatter"]] +
xlab("DESeq2 adjp: Visit explicitly in model") +
ylab("DESeq2 adjp: Default pairwise comparison")
pp(file = "images/compare_cf_and_visit_in_model_eosinophil_adjp.svg")
adjp_plot
dev.off()## png
## 2
adjp_plotprevious_sig_idx <- big_table[["deseq_adjp"]] <= 0.05 &
abs(big_table[["deseq_logfc"]] >= 1.0)
summary(previous_sig_idx)## Mode FALSE TRUE
## logical 10416 116
previous_genes <- rownames(big_table)[previous_sig_idx]
new_sig_idx <- abs(deseq_table[["log2FoldChange"]]) >= 1.0 &
deseq_table[["padj"]] < 0.05
new_genes <- rownames(deseq_table)[new_sig_idx]
na_idx <- is.na(new_genes)
new_genes <- new_genes[!na_idx]
Vennerable::Venn(list("previous" = previous_genes, "new" = new_genes))## A Venn object on 2 sets named
## previous,new
## 00 10 01 11
## 0 38 193 78
test_new <- simple_gprofiler(new_genes)
test_new## A set of ontologies produced by gprofiler using 271
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 11 MF
## 186 BP
## 4 KEGG
## 6 REAC
## 5 WP
## 7 TF
## 0 MIRNA
## 1 HPA
## 0 CORUM
## 0 HP hits.
test_old <- simple_gprofiler(previous_genes)
test_old## A set of ontologies produced by gprofiler using 116
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 26 MF
## 112 BP
## 3 KEGG
## 7 REAC
## 4 WP
## 72 TF
## 1 MIRNA
## 0 HPA
## 0 CORUM
## 0 HP hits.
new_annotated <- merge(fData(t_eosinophils), deseq_table, by = "row.names")
rownames(new_annotated) <- new_annotated[["Row.names"]]
new_annotated[["Row.names"]] <- NULL
write_xlsx(data = new_annotated, excel = "excel/eosinophil_visit_in_model_sva_cf_new.xlsx")## write_xlsx() wrote excel/eosinophil_visit_in_model_sva_cf_new.xlsx.
## The cursor is on sheet first, row: 17303 column: 23.
old_annotated <- merge(fData(t_eosinophils), big_table, by = "row.names")
rownames(old_annotated) <- old_annotated[["Row.names"]]
old_annotated[["Row.names"]] <- NULL
write_xlsx(data = old_annotated, excel = "excel/eosinophil_visit_in_model_sva_cf_old.xlsx")## write_xlsx() wrote excel/eosinophil_visit_in_model_sva_cf_old.xlsx.
## The cursor is on sheet first, row: 10535 column: 101.
Check our genes of particular interest
sum(new_genes %in% expected_ensg)## [1] 5
Not quite as similar as the monocyte data.
## The original pairwise invocation with sva:
## t_cf_neutrophil_de_sva <- all_pairwise(t_neutrophils, model_batch = "svaseq",
## parallel = parallel, filter = TRUE,
## methods = methods)
test_neutrophils <- normalize_expt(t_neutrophils, filter = "simple")## Removing 2652 low-count genes (17300 remaining).
test_neut_design <- pData(test_neutrophils)
test_formula <- as.formula("~ 0 + finaloutcome + visitnumber")
test_model <- model.matrix(test_formula, data = test_neut_design)
## Note to self: double-check that the following line is correct.
null_formula <- as.formula("~ 0 + visitnumber")
## null_model <- test_model[, c(1, 2)]
null_model <- model.matrix(null_formula, data = test_neut_design)
linear_mtrx <- exprs(test_neutrophils)
l2_mtrx <- log2(linear_mtrx + 1)
chosen_surrogates <- sva::num.sv(dat = l2_mtrx, mod = test_model)
chosen_surrogates## [1] 4
surrogate_result <- sva::svaseq(
dat = linear_mtrx, n.sv = chosen_surrogates, mod = test_model, mod0 = null_model)## Number of significant surrogate variables is: 4
## Iteration (out of 5 ):1 2 3 4 5
model_adjust <- as.matrix(surrogate_result[["sv"]])
## I don't think the following is actually required, but it is weird to just have this
## unnamed matrix hangingout.
## Set the columns to the SV#s
colnames(model_adjust) <- c("SV1", "SV2", "SV3", "SV4")
## Set the rows the sample IDs
rownames(model_adjust) <- rownames(pData(test_neutrophils))
longer_model <- as.formula("~ finaloutcome + visitnumber + SV1 + SV2 + SV3 + SV4")
neut_design_svs <- cbind(test_neut_design, model_adjust)
summarized <- DESeq2::DESeqDataSetFromMatrix(countData = linear_mtrx,
colData = neut_design_svs,
design = longer_model)## converting counts to integer mode
deseq_run <- DESeq2::DESeq(summarized)## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
deseq_table <- as.data.frame(DESeq2::results(object = deseq_run,
contrast = c("finaloutcome", "failure", "cure"),
format = "DataFrame"))
## We should be able to directly compare this to the the deseq columns from the above
## data structure named: t_cf_neutrophil_table_sva
big_table <- t_cf_neutrophil_table_sva[["data"]][["outcome"]]
only_deseq <- big_table[, c("deseq_logfc", "deseq_adjp")]
merged <- merge(deseq_table, only_deseq, by = "row.names")
rownames(merged) <- merged[["Row.names"]]
merged[["Row.names"]] <- NULL
cor_value <- cor.test(merged[["log2FoldChange"]], merged[["deseq_logfc"]])
cor_value##
## Pearson's product-moment correlation
##
## data: merged[["log2FoldChange"]] and merged[["deseq_logfc"]]
## t = 393, df = 9099, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9706 0.9729
## sample estimates:
## cor
## 0.9718
logfc_plotter <- plot_linear_scatter(merged[, c("log2FoldChange", "deseq_logfc")])
logfc_plot <- logfc_plotter[["scatter"]] +
xlab("DESeq2 log2FC: Visit explicitly in model") +
ylab("DESeq2 log2FC: Default pairwise comparison")
pp(file = "figures/compare_cf_and_visit_in_model_neutrophil_logfc.svg")
logfc_plot
dev.off()## png
## 2
logfc_plotcor_value <- cor.test(merged[["padj"]], merged[["deseq_adjp"]], method = "spearman")## Warning in cor.test.default(merged[["padj"]], merged[["deseq_adjp"]], method =
## "spearman"): Cannot compute exact p-value with ties
cor_value##
## Spearman's rank correlation rho
##
## data: merged[["padj"]] and merged[["deseq_adjp"]]
## S = 1e+10, p-value <2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9202
adjp_plotter <- plot_linear_scatter(merged[, c("padj", "deseq_adjp")])
adjp_plot <- adjp_plotter[["scatter"]] +
xlab("DESeq2 adjp: Visit explicitly in model") +
ylab("DESeq2 adjp: Default pairwise comparison")
pp(file = "images/compare_cf_and_visit_in_model_neutrophil_adjp.svg")
adjp_plot
dev.off()## png
## 2
adjp_plotprevious_sig_idx <- big_table[["deseq_adjp"]] <= 0.05 &
abs(big_table[["deseq_logfc"]] >= 1.0)
summary(previous_sig_idx)## Mode FALSE TRUE
## logical 8971 130
previous_genes <- rownames(big_table)[previous_sig_idx]
new_sig_idx <- abs(deseq_table[["log2FoldChange"]]) >= 1.0 &
deseq_table[["padj"]] < 0.05
new_genes <- rownames(deseq_table)[new_sig_idx]
na_idx <- is.na(new_genes)
new_genes <- new_genes[!na_idx]
Vennerable::Venn(list("previous" = previous_genes, "new" = new_genes))## A Venn object on 2 sets named
## previous,new
## 00 10 01 11
## 0 51 92 79
test_new <- simple_gprofiler(new_genes)
test_new## A set of ontologies produced by gprofiler using 171
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 1 MF
## 12 BP
## 0 KEGG
## 2 REAC
## 0 WP
## 3 TF
## 2 MIRNA
## 0 HPA
## 0 CORUM
## 0 HP hits.
test_old <- simple_gprofiler(previous_genes)
test_old## A set of ontologies produced by gprofiler using 130
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are:
## 4 MF
## 67 BP
## 0 KEGG
## 5 REAC
## 2 WP
## 57 TF
## 0 MIRNA
## 0 HPA
## 0 CORUM
## 0 HP hits.
new_annotated <- merge(fData(t_neutrophils), deseq_table, by = "row.names")
rownames(new_annotated) <- new_annotated[["Row.names"]]
new_annotated[["Row.names"]] <- NULL
write_xlsx(data = new_annotated, excel = "excel/neutrophil_visit_in_model_sva_cf_new.xlsx")## write_xlsx() wrote excel/neutrophil_visit_in_model_sva_cf_new.xlsx.
## The cursor is on sheet first, row: 17303 column: 23.
old_annotated <- merge(fData(t_neutrophils), big_table, by = "row.names")
rownames(old_annotated) <- old_annotated[["Row.names"]]
old_annotated[["Row.names"]] <- NULL
write_xlsx(data = old_annotated, excel = "excel/neutrophil_visit_in_model_sva_cf_old.xlsx")## write_xlsx() wrote excel/neutrophil_visit_in_model_sva_cf_old.xlsx.
## The cursor is on sheet first, row: 9104 column: 101.
Once again, see how many of our favorite genes are here
sum(new_genes %in% expected_ensg)## [1] 8
When the above work was reviewed for publication, one concern raised arose because we are not considering the variance of each person in the contrasts above and are potentially over-representing the significance/power of the results because the models we are using do not include the donor. My previous understanding was that it is sufficient to include visit in the model because that would result in a model matrix which separates samples from each person; but I am now reasonably certain this is incorrect.
Therefore, the previous couple of blocks I now think are not approaching this problem correctly. We spent some time talking with Neal and discussing the various models and methods we employed. He made a series of suggestions about ways which might prove more correct. It seems that a mixed linear model is the most appropriate method for this type of query. I think I can perform that with limma, via voom. Let us try and see what happens. After doing some reading, I think the most appropriate way to perform this is to use dream() from varianceParition, which is cool because I really like it.
As I write this, we are reasonably certain that a mixed linear model provides a statistically correct framework for representing our expression data as a function of finaloutcome, visit, and person, e.g:
exprs ~ finaloutcome + visit + (1|donor)
In our discussions surrounding the various ways to compare/contrast the various results with/out the mixed linear model; there were a few primary goals laid out by Maria Adelaida and Neal. The goal is to observe if/how well our previous analyses agree with results obtained using a mixed linear model. There are a couple of caveats:
So, with that in mind, Maria Adelaida, Najib, and Neal focused on repeating a useful subset of the analyses using the mlm and comparing them to our extant results rather than re-implementing everything. The following are the things they suggested are the most important comparison points:
I have already written a skeleton function ‘dream_pairwise()’ as a sibling to my other *_pairwise() functions. I think that with some minor modifications (or maybe none at all, when I wrote it I was thinking about fun models that variancePartition supports) it can accept the mixed linear model of interest.
In the following block, the mixed formula will get passed to dream. I set the code to use the first element (after the intercept) as the ‘condition’ factor. Thus if I had made the model ‘~ 0 + visitnumber + finaloutcome + (1|donor)’, it would compare visits.
The dream_pairwise() function is responsible for making sure the variancePartition replacement functions are used for things like voom, lmfit, ebayes, and toptable. Strangely, some of them will automatically fall back to limma’s functions if there is no random-effect in the model, but others will not. As a result, I have a check and invoke the appropriate functions explicitly in dream_pairwise().
mixed_fstring <- "~ 0 + finaloutcome + visitnumber + (1|donor)"
mixed_form <- as.formula(mixed_fstring)
get_formula_factors(mixed_form)## $type
## [1] "cellmeans"
##
## $interaction
## [1] FALSE
##
## $mixed
## [1] TRUE
##
## $mixers
## [1] "1" "donor"
##
## $cellmeans_intercept
## [1] "0"
##
## $factors
## [1] "finaloutcome" "visitnumber" "donor"
##
## $contrast
## [1] "finaloutcome"
mixed_eosinophil_de <- dream_pairwise(t_eosinophils, alt_model = mixed_form)## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_eosinophil_de_xlsx <- write_de_table(mixed_eosinophil_de, type = "limma",
excel = glue("excel/mixed_eosinophil_table-v{ver}.xlsx"))
mixed_monocyte_de <- dream_pairwise(t_monocytes, alt_model = mixed_form)## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_monocyte_de_xlsx <- write_de_table(mixed_monocyte_de, type = "limma",
excel = glue("excel/mixed_monocyte_table-v{ver}.xlsx"))
mixed_neutrophil_de <- dream_pairwise(t_neutrophils, alt_model = mixed_form)## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_neutrophil_de_xlsx <- write_de_table(mixed_neutrophil_de, type = "limma",
excel = glue("excel/mixed_neutrophil_table-v{ver}.xlsx"))In other words, the following invocations will go much faster and likely be nearly (or completely) identical to the results from limma using the same model since the ‘mixed_fstring_fv’ does not have a random effect.
mixed_fstring_fv <- "~ 0 + finaloutcome + visitnumber"
mixed_form_fv <- as.formula(mixed_fstring_fv)
get_formula_factors(mixed_form_fv)## $type
## [1] "cellmeans"
##
## $interaction
## [1] FALSE
##
## $mixed
## [1] FALSE
##
## $cellmeans_intercept
## [1] "0"
##
## $factors
## [1] "finaloutcome" "visitnumber"
##
## $contrast
## [1] "finaloutcome"
mixed_eosinophil_fv_de <- dream_pairwise(t_eosinophils, alt_model = mixed_form_fv)## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_eosinophil_de_nodonor_xlsx <- write_de_table(mixed_eosinophil_fv_de, type = "limma",
excel = glue("excel/mixed_eosinophil_nodonor_table-v{ver}.xlsx"))
mixed_monocyte_fv_de <- dream_pairwise(t_monocytes, alt_model = mixed_form_fv)## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_monocyte_de_nodonor_xlsx <- write_de_table(mixed_monocyte_fv_de, type = "limma",
excel = glue("excel/mixed_monocyte_nodonor_table-v{ver}.xlsx"))
mixed_neutrophil_fv_de <- dream_pairwise(t_neutrophils, alt_model = mixed_form_fv)## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_neutrophil_de_nodonor_xlsx <- write_de_table(mixed_neutrophil_fv_de, type = "limma",
excel = glue("excel/mixed_neutrophil_nodonor_table-v{ver}.xlsx"))There are a couple observations here which are important and/or troubling:
Najib asked if I would compare the set of overlapping genes observed with the various significance metrics provided. I think I should write a little function to do this because there are ample opportunities for typeos.
deseq_df <- t_cf_monocyte_table_sva[["data"]][["outcome"]]
deseq_gene_idx <- abs(deseq_df[["deseq_logfc"]]) >= 1.0 &
deseq_df[["deseq_adjp"]] <= 0.05
deseq_symb <- annot[deseq_gene_idx, "hgnc_symbol"]
deseq_symb## [1] "SYN1" "TENM1" "LTF" "PHLDB1"
## [5] "SLAMF7" "SCML1" "BCAR1" "TRIP13"
## [9] "SLC12A1" "ADAMTS2" "GP6" "SIGLEC1"
## [13] "SIRPG" "CHKB" "IL2RB" "CTSG"
## [17] "PLEK2" "NTSR1" "MSLN" "FZD3"
## [21] "TULP2" "HAS1" "GSDME" "PRUNE2"
## [25] "PALD1" "UNC5B" "CCL8" "FOLR1"
## [29] "RAD51AP1" "PRLR" "OTOF" "IL1R2"
## [33] "IL1R1" "CD274" "PRB2" "MAPK8IP1"
## [37] "CXCR4" "IL1B" "LAMP5" "MTUS1"
## [41] "IDO1" "TMTC1" "RSAD2" "HRK"
## [45] "IL6" "THBS1" "IFI44L" "ADAMTS10"
## [49] "GPR174" "LCN2" "TENM4" "CD8A"
## [53] "PGM5" "TBC1D24" "TRIM58" "HESX1"
## [57] "CAMP" "SAP30" "CFAP47" "AQP3"
## [61] "HECTD2" "IFI27" "C15orf48" "LAIR2"
## [65] "ANGPTL4" "RAB3IL1" "DDIT4" "KIF5C"
## [69] "COL3A1" "RNF150" "HTRA3" "S1PR1"
## [73] "LGALS4" "OLR1" "JUP" "HOXB2"
## [77] "SH3PXD2B" "FBXW8" "FBXO39" "HLA-DQB1"
## [81] "OR6C2" "CSMD1" "EFHC2" "OLFML1"
## [85] "USP18" "RGPD2" "PRR5" "RHCE"
## [89] "AKR1C3" "AFAP1" "MMP1" "HLA-DQA1"
## [93] "SCAMP5" "SUCNR1" "OOEP" "GLYATL3"
## [97] "HLA-DMA" "C9orf129" "NT5M" "POU5F1B"
## [101] "SMTNL1" "HLA-DQB2" "DEFA3" "UPK3B"
## [105] "PRR5-ARHGAP8" "TRNP1" "MGAM" "RNASE4"
## [109] "MRC1" "LINC02210-CRHR1" "FCGBP" "CCL3"
deseq_genes <- rownames(annot)[deseq_gene_idx]
overlap_sig <- function(mixed, deseq = deseq_genes, mixed_pcol = "P.Value",
annot = fData(t_monocytes), mixed_cutoff = 0.05, direction = "lt",
expected = expected_genes) {
if (direction == "lt") {
mixed_sig_idx <- abs(mixed[["logFC"]]) >= 1.0 &
mixed[[mixed_pcol]] <= mixed_cutoff
} else {
mixed_sig_idx <- abs(mixed[["logFC"]]) >= 1.0 &
mixed[[mixed_pcol]] >= mixed_cutoff
}
mixed_genes <- rownames(mixed)[mixed_sig_idx]
venn_lst <- list(
"mixed_model" = mixed_genes,
"DESeq_sva" = deseq)
mixed_deseq_comp <- Vennerable::Venn(venn_lst)
Vennerable::plot(mixed_deseq_comp)
mixed_ensg <- mixed_deseq_comp@IntersectionSets[["11"]]
overlap_genes <- annot[mixed_ensg, "hgnc_symbol"]
message("The set of all overlapping genes:")
print(overlap_genes)
found_idx <- expected %in% overlap_genes
message("Overlapping genes in the 10 favorites:")
print(expected[found_idx])
}In this block I am looking at the similarities between the mixed model with donor and without donor (which is no longer a mixed model; it is just using the dream functions (which I am pretty sure just fall back to limma when there is not a random effect)).
monocyte_visit_with_donor <- mixed_monocyte_de[["all_tables"]][["failure_vs_cure"]]
monocyte_visit_without_donor <- mixed_monocyte_fv_de[["all_tables"]][["failure_vs_cure"]]
donor_aucc <- calculate_aucc(monocyte_visit_with_donor, monocyte_visit_without_donor,
px = "adj.P.Val", py = "adj.P.Val",
lx = "logFC", ly = "logFC")
donor_aucc## These two tables have an aucc value of: 0.66601108014503 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 384, df = 10860, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9637 0.9663
## sample estimates:
## cor
## 0.965
with_donor_genes <- abs(monocyte_visit_with_donor[["logFC"]]) >= 1.0 &
monocyte_visit_with_donor[["P.Value"]] <= 0.05
without_donor_genes <- abs(monocyte_visit_without_donor[["logFC"]]) >= 1.0 &
monocyte_visit_with_donor[["P.Value"]] <= 0.05
donor_genes <- rownames(monocyte_visit_with_donor)[with_donor_genes]
donor_z_idx <- abs(monocyte_visit_with_donor[["logFC"]]) >= 1.0 &
monocyte_visit_with_donor[["z.std"]] >= 1.0
donor_z_genes <- rownames(monocyte_visit_with_donor)[donor_z_idx]
overlap_sig(monocyte_visit_with_donor)## The set of all overlapping genes:
## [1] "DEFA3" "CTSG" "HRK" "TENM1"
## [5] "PLEK2" "TMTC1" "HLA-DQB2" "IDO1"
## [9] "FBXO39" "PRR5-ARHGAP8" "TRIM58" "CCL3"
## [13] "IL6" "LAMP5" "CD274" "NTSR1"
## [17] "ANGPTL4" "DDIT4" "EFHC2" "SH3PXD2B"
## [21] "HECTD2" "MAPK8IP1" "SMTNL1" "PGM5"
## [25] "OLFML1" "TBC1D24" "ADAMTS10" "LINC02210-CRHR1"
## [29] "FZD3" "CHKB" "CXCR4" "GPR174"
## [33] "PRLR" "OOEP" "NT5M" "IL1R1"
## [37] "CAMP" "FBXW8"
## Overlapping genes in the 10 favorites:
## [1] "PRR5-ARHGAP8" "FBXO39" "SMTNL1"
overlap_sig(monocyte_visit_with_donor,
mixed_pcol = "z.std", direction = "gt", mixed_cutoff = 1.5)## The set of all overlapping genes:
## [1] "POU5F1B" "HRK" "COL3A1" "RGPD2"
## [5] "RNF150" "HLA-DQB2" "IDO1" "FBXO39"
## [9] "PRR5-ARHGAP8" "CCL3" "SUCNR1" "PRR5"
## [13] "IL6" "CD274" "UPK3B" "EFHC2"
## [17] "CSMD1" "HECTD2" "MAPK8IP1" "SMTNL1"
## [21] "PGM5" "OLFML1" "TBC1D24" "LINC02210-CRHR1"
## [25] "FZD3" "CHKB" "IFI44L" "GPR174"
## [29] "PRLR" "OOEP" "HLA-DMA" "PRB2"
## [33] "FBXW8"
## Overlapping genes in the 10 favorites:
## [1] "IFI44L" "PRR5" "PRR5-ARHGAP8" "FBXO39" "SMTNL1"
I would have sworn that the 2.0 z-score set was much larger than the p-value set and included all of the 10 genes. Apparently I was very wrong.
Now examine the various models for the neutrophil samples.
neutrophil_visit_with_donor <- mixed_neutrophil_de[["all_tables"]][["failure_vs_cure"]]
neutrophil_visit_without_donor <- mixed_neutrophil_fv_de[["all_tables"]][["failure_vs_cure"]]
donor_aucc <- calculate_aucc(neutrophil_visit_with_donor, neutrophil_visit_without_donor,
px = "adj.P.Val", py = "adj.P.Val",
lx = "logFC", ly = "logFC")
donor_aucc## These two tables have an aucc value of: 0.544934636423573 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 742, df = 19950, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9819 0.9828
## sample estimates:
## cor
## 0.9824
with_donor_genes <- abs(neutrophil_visit_with_donor[["logFC"]]) >= 1.0 &
neutrophil_visit_with_donor[["P.Value"]] <= 0.05
without_donor_genes <- abs(neutrophil_visit_without_donor[["logFC"]]) >= 1.0 &
neutrophil_visit_with_donor[["P.Value"]] <= 0.05
donor_genes <- rownames(neutrophil_visit_with_donor)[with_donor_genes]
visit_genes <- rownames(neutrophil_visit_with_donor)[without_donor_genes]
venn_lst <- list(
"with_donor" = donor_genes,
"with_visit" = visit_genes)
Vennerable::Venn(venn_lst)## A Venn object on 2 sets named
## with_donor,with_visit
## 00 10 01 11
## 0 0 15 214
overlap_sig(neutrophil_visit_with_donor)## The set of all overlapping genes:
## [1] "PRR5-ARHGAP8" "TENM1" "AFAP1" "PRR5" "HRK"
## [6] "IFI44L" "CHKB" "FBXW8" "IDO1" "SMTNL1"
## [11] "POU5F1B" "OLR1" "AKR1C3"
## Overlapping genes in the 10 favorites:
## [1] "IFI44L" "PRR5" "PRR5-ARHGAP8" "SMTNL1" "AFAP1"
overlap_sig(neutrophil_visit_with_donor,
mixed_pcol = "z.std", direction = "gt", mixed_cutoff = 1.5)## The set of all overlapping genes:
## [1] "PRR5-ARHGAP8" "PRR5" "OTOF" "SIGLEC1" "HRK"
## [6] "IFI44L" "USP18" "JUP" "RSAD2" "CHKB"
## [11] "FBXW8" "FBXO39" "TBC1D24" "IDO1" "SMTNL1"
## [16] "POU5F1B"
## Overlapping genes in the 10 favorites:
## [1] "IFI44L" "PRR5" "PRR5-ARHGAP8" "FBXO39" "RSAD2"
## [6] "SMTNL1" "USP18"
Finally, compare for the eosinophil samples.
eosinophil_visit_with_donor <- mixed_eosinophil_de[["all_tables"]][["failure_vs_cure"]]
eosinophil_visit_without_donor <- mixed_eosinophil_fv_de[["all_tables"]][["failure_vs_cure"]]
donor_aucc <- calculate_aucc(eosinophil_visit_with_donor, eosinophil_visit_without_donor,
px = "adj.P.Val", py = "adj.P.Val",
lx = "logFC", ly = "logFC")
donor_aucc## These two tables have an aucc value of: 0.90324615179282 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 746, df = 19950, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.982 0.983
## sample estimates:
## cor
## 0.9825
with_donor_genes <- abs(eosinophil_visit_with_donor[["logFC"]]) >= 1.0 &
eosinophil_visit_with_donor[["P.Value"]] <= 0.05
without_donor_genes <- abs(eosinophil_visit_without_donor[["logFC"]]) >= 1.0 &
eosinophil_visit_with_donor[["P.Value"]] <= 0.05
donor_genes <- rownames(eosinophil_visit_with_donor)[with_donor_genes]
visit_genes <- rownames(eosinophil_visit_with_donor)[without_donor_genes]
venn_lst <- list(
"with_donor" = donor_genes,
"with_visit" = visit_genes)
Vennerable::Venn(venn_lst)## A Venn object on 2 sets named
## with_donor,with_visit
## 00 10 01 11
## 0 0 26 3709
overlap_sig(eosinophil_visit_with_donor)## The set of all overlapping genes:
## [1] "HLA-DQB1" "IFI27" "SIGLEC1" "CFAP47" "OOEP" "SMTNL1"
## [7] "MGAM" "RNASE4" "CD274" "MSLN"
## Overlapping genes in the 10 favorites:
## [1] "IFI27" "SMTNL1"
overlap_sig(eosinophil_visit_with_donor,
mixed_pcol = "z.std", direction = "gt", mixed_cutoff = 1.5)## The set of all overlapping genes:
## [1] "IFI27" "PRR5-ARHGAP8" "IFI44L" "SIGLEC1" "OTOF"
## [6] "CFAP47" "FBXO39" "OOEP" "SMTNL1" "USP18"
## [11] "RSAD2" "CD274" "HESX1"
## Overlapping genes in the 10 favorites:
## [1] "IFI44L" "IFI27" "PRR5-ARHGAP8" "FBXO39" "RSAD2"
## [6] "SMTNL1" "USP18"
Compare back to deseq with SVA and with SVA+visit and see how they look with respect to the dream invocation without the random donor effect.
deseq_aucc <- calculate_aucc(merged, monocyte_visit_without_donor,
px = "deseq_adjp", py = "P.Value",
lx = "deseq_logfc", ly = "logFC")
deseq_aucc## These two tables have an aucc value of: 0.163279700348566 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 41, df = 8577, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3869 0.4223
## sample estimates:
## cor
## 0.4048
deseq_genes_idx <- abs(merged[["deseq_logfc"]]) >= 1.0 &
merged[["deseq_adjp"]] <= 0.05
without_donor_genes_idx <- abs(monocyte_visit_without_donor[["logFC"]]) >= 1.0 &
monocyte_visit_with_donor[["P.Value"]] <= 0.05
deseq_genes <- rownames(merged)[deseq_genes_idx]
visit_genes <- rownames(monocyte_visit_with_donor)[without_donor_genes_idx]
venn_lst <- list(
"with_donor" = deseq_genes,
"with_visit" = visit_genes)
Vennerable::Venn(venn_lst)## A Venn object on 2 sets named
## with_donor,with_visit
## 00 10 01 11
## 0 152 44 8
This time we are comparing back to the monocyte results which did not include the random donor effect.
deseq_aucc <- calculate_aucc(merged, monocyte_visit_without_donor,
px = "log2FoldChange", py = "padj",
lx = "adj.P.Val", ly = "logFC")
deseq_aucc## These two tables have an aucc value of: 0.359950025998036 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = -32, df = 8577, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3471 -0.3093
## sample estimates:
## cor
## -0.3283
deseq_genes_idx <- abs(merged[["log2FoldChange"]]) >= 1.0 &
merged[["padj"]] <= 0.05
without_donor_genes_idx <- abs(monocyte_visit_without_donor[["logFC"]]) >= 1.0 &
monocyte_visit_with_donor[["P.Value"]] <= 0.05
deseq_genes <- rownames(merged)[deseq_genes_idx]
visit_genes <- rownames(monocyte_visit_with_donor)[without_donor_genes_idx]
venn_lst <- list(
"with_donor" = deseq_genes,
"with_visit" = visit_genes)
Vennerable::Venn(venn_lst)## A Venn object on 2 sets named
## with_donor,with_visit
## 00 10 01 11
## 0 106 45 7
This is the orthologous approach: include a random effect for donor and ignore the visit effect.
mixed_fstring_fd <- "~ 0 + finaloutcome + (1|donor)"
mixed_form_fd <- as.formula(mixed_fstring_fd)
get_formula_factors(mixed_form_fd)## $type
## [1] "cellmeans"
##
## $interaction
## [1] FALSE
##
## $mixed
## [1] TRUE
##
## $mixers
## [1] "1" "donor"
##
## $cellmeans_intercept
## [1] "0"
##
## $factors
## [1] "finaloutcome" "donor"
##
## $contrast
## [1] "finaloutcome"
mixed_eosinophil_fd_de <- dream_pairwise(t_eosinophils, alt_model = mixed_form_fd)## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_monocyte_fd_de <- dream_pairwise(t_monocytes, alt_model = mixed_form_fd)## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_neutrophil_fd_de <- dream_pairwise(t_neutrophils, alt_model = mixed_form_fd)## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
Now see how these results compare against our previous results…
monocyte_dream_result <- mixed_monocyte_de[["all_tables"]][["failure_vs_cure"]]
big_table <- t_cf_monocyte_table_sva[["data"]][["outcome"]]
merged <- merge(big_table, monocyte_dream_result, by = "row.names")
rownames(merged) <- merged[["Row.names"]]
merged[["Row.names"]] <- NULL
cor_value <- cor.test(merged[["logFC"]], merged[["deseq_logfc"]])
cor_value##
## Pearson's product-moment correlation
##
## data: merged[["logFC"]] and merged[["deseq_logfc"]]
## t = 184, df = 10860, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8655 0.8746
## sample estimates:
## cor
## 0.8701
t_cf_monocyte_de_sva[["dream"]] <- mixed_monocyte_de
test <- combine_de_tables(
t_cf_monocyte_de_sva, scale_p = TRUE,
excel = "excel/test_monocyte_combined.xlsx")
test_aucc <- calculate_aucc(big_table, tbl2 = monocyte_dream_result,
px = "deseq_adjp", py = "adj.P.Val",
lx = "deseq_logfc", ly = "logFC")
logfc_plotter <- plot_linear_scatter(merged[, c("logFC", "deseq_logfc")])
logfc_plot <- logfc_plotter[["scatter"]] +
xlab("Dream log2FC with (1|donor) and visit in model") +
ylab("DESeq2 log2FC: Default pairwise comparison")
pp(file = "figures/compare_cf_and_visit_in_model_monocyte_logfc.svg")
logfc_plot
dev.off()## png
## 2
logfc_plotprevious_sig_idx <- merged[["deseq_adjp"]] <= 0.05 & abs(merged[["deseq_logfc"]] >= 1.0)
summary(previous_sig_idx)## Mode FALSE TRUE
## logical 10802 60
previous_genes <- rownames(merged)[previous_sig_idx]
new_sig_idx <- abs(merged[["logFC"]]) >= 1.0 & merged[["P.Value"]] < 0.05
summary(new_sig_idx)## Mode FALSE TRUE
## logical 10812 50
new_genes <- rownames(merged)[new_sig_idx]
na_idx <- is.na(new_genes)
new_genes <- new_genes[!na_idx]
annot <- fData(t_monocytes)
compare <- Vennerable::Venn(list("previous" = previous_genes, "new" = new_genes))
shared_genes <- compare@IntersectionSets[["11"]]
name_idx <- rownames(annot) %in% shared_genes
annot[name_idx, ]## ensembl_gene_id ensembl_transcript_id version
## ENSG00000100288 ENSG00000100288 ENST00000479003 19
## ENSG00000104290 ENSG00000104290 ENST00000537916 11
## ENSG00000113494 ENSG00000113494 ENST00000231423 17
## ENSG00000120217 ENSG00000120217 ENST00000492923 14
## ENSG00000121653 ENSG00000121653 ENST00000395629 11
## ENSG00000131203 ENSG00000131203 ENST00000519154 13
## ENSG00000135116 ENSG00000135116 ENST00000586941 9
## ENSG00000136244 ENSG00000136244 ENST00000401630 12
## ENSG00000147138 ENSG00000147138 ENST00000645147 2
## ENSG00000154330 ENSG00000154330 ENST00000396392 13
## ENSG00000162065 ENSG00000162065 ENST00000627285 14
## ENSG00000165338 ENSG00000165338 ENST00000498446 16
## ENSG00000174989 ENSG00000174989 ENST00000455858 13
## ENSG00000177294 ENSG00000177294 ENST00000572251 7
## ENSG00000183690 ENSG00000183690 ENST00000343571 13
## ENSG00000183801 ENSG00000183801 ENST00000329293 8
## ENSG00000203907 ENSG00000203907 ENST00000441145 9
## ENSG00000214872 ENSG00000214872 ENST00000399154 8
## ENSG00000232629 ENSG00000232629 ENST00000427449 9
## ENSG00000248405 ENSG00000248405 ENST00000361473 10
## ENSG00000263715 ENSG00000263715 ENST00000587305 7
## ENSG00000277632 ENSG00000277632 ENST00000613922 2
## transcript_version
## ENSG00000100288 5
## ENSG00000104290 2
## ENSG00000113494 7
## ENSG00000120217 1
## ENSG00000121653 2
## ENSG00000131203 5
## ENSG00000135116 1
## ENSG00000136244 7
## ENSG00000147138 1
## ENSG00000154330 5
## ENSG00000162065 1
## ENSG00000165338 1
## ENSG00000174989 2
## ENSG00000177294 1
## ENSG00000183690 3
## ENSG00000183801 4
## ENSG00000203907 1
## ENSG00000214872 3
## ENSG00000232629 1
## ENSG00000248405 9
## ENSG00000263715 1
## ENSG00000277632 2
## description
## ENSG00000100288 choline kinase beta [Source:HGNC Symbol;Acc:HGNC:1938]
## ENSG00000104290 frizzled class receptor 3 [Source:HGNC Symbol;Acc:HGNC:4041]
## ENSG00000113494 prolactin receptor [Source:HGNC Symbol;Acc:HGNC:9446]
## ENSG00000120217 CD274 molecule [Source:HGNC Symbol;Acc:HGNC:17635]
## ENSG00000121653 mitogen-activated protein kinase 8 interacting protein 1 [Source:HGNC Symbol;Acc:HGNC:6882]
## ENSG00000131203 indoleamine 2,3-dioxygenase 1 [Source:HGNC Symbol;Acc:HGNC:6059]
## ENSG00000135116 harakiri, BCL2 interacting protein [Source:HGNC Symbol;Acc:HGNC:5185]
## ENSG00000136244 interleukin 6 [Source:HGNC Symbol;Acc:HGNC:6018]
## ENSG00000147138 G protein-coupled receptor 174 [Source:HGNC Symbol;Acc:HGNC:30245]
## ENSG00000154330 phosphoglucomutase 5 [Source:HGNC Symbol;Acc:HGNC:8908]
## ENSG00000162065 TBC1 domain family member 24 [Source:HGNC Symbol;Acc:HGNC:29203]
## ENSG00000165338 HECT domain E3 ubiquitin protein ligase 2 [Source:HGNC Symbol;Acc:HGNC:26736]
## ENSG00000174989 F-box and WD repeat domain containing 8 [Source:HGNC Symbol;Acc:HGNC:13597]
## ENSG00000177294 F-box protein 39 [Source:HGNC Symbol;Acc:HGNC:28565]
## ENSG00000183690 EF-hand domain containing 2 [Source:HGNC Symbol;Acc:HGNC:26233]
## ENSG00000183801 olfactomedin like 1 [Source:HGNC Symbol;Acc:HGNC:24473]
## ENSG00000203907 oocyte expressed protein [Source:HGNC Symbol;Acc:HGNC:21382]
## ENSG00000214872 smoothelin like 1 [Source:HGNC Symbol;Acc:HGNC:32394]
## ENSG00000232629 major histocompatibility complex, class II, DQ beta 2 [Source:HGNC Symbol;Acc:HGNC:4945]
## ENSG00000248405 PRR5-ARHGAP8 readthrough [Source:HGNC Symbol;Acc:HGNC:34512]
## ENSG00000263715 LINC02210-CRHR1 readthrough [Source:HGNC Symbol;Acc:HGNC:51483]
## ENSG00000277632 C-C motif chemokine ligand 3 [Source:HGNC Symbol;Acc:HGNC:10627]
## gene_biotype cds_length chromosome_name strand start_position
## ENSG00000100288 protein_coding undefined 22 - 50578949
## ENSG00000104290 protein_coding 2001 8 + 28494205
## ENSG00000113494 protein_coding 1131 5 - 35048756
## ENSG00000120217 protein_coding undefined 9 + 5450503
## ENSG00000121653 protein_coding 2106 11 + 45885651
## ENSG00000131203 protein_coding 540 8 + 39902275
## ENSG00000135116 protein_coding undefined 12 - 116856144
## ENSG00000136244 protein_coding 570 7 + 22725884
## ENSG00000147138 protein_coding 1002 X + 79144663
## ENSG00000154330 protein_coding 1164 9 + 68328308
## ENSG00000162065 protein_coding 1662 16 + 2475051
## ENSG00000165338 protein_coding undefined 10 + 91409280
## ENSG00000174989 protein_coding 1599 12 + 116910950
## ENSG00000177294 protein_coding 151 17 + 6776215
## ENSG00000183690 protein_coding undefined X - 44147872
## ENSG00000183801 protein_coding 1209 11 + 7485388
## ENSG00000203907 protein_coding 201 6 - 73368555
## ENSG00000214872 protein_coding 1374 11 + 57542641
## ENSG00000232629 protein_coding 656 6 - 32756098
## ENSG00000248405 protein_coding 1695 22 + 44702233
## ENSG00000263715 protein_coding undefined 17 + 45620344
## ENSG00000277632 protein_coding 279 17 - 36088256
## end_position hgnc_symbol uniprot_gn_symbol
## ENSG00000100288 50601455 CHKB CHKB
## ENSG00000104290 28574267 FZD3 FZD3
## ENSG00000113494 35230487 PRLR PRLR
## ENSG00000120217 5470566 CD274 CD274
## ENSG00000121653 45906465 MAPK8IP1 MAPK8IP1
## ENSG00000131203 39928790 IDO1 IDO1
## ENSG00000135116 116881441 HRK HRK
## ENSG00000136244 22732002 IL6 IL6
## ENSG00000147138 79175315 GPR174 GPR174
## ENSG00000154330 68531061 PGM5 PGM5
## ENSG00000162065 2509560 TBC1D24 TBC1D24
## ENSG00000165338 91514829 HECTD2 HECTD2
## ENSG00000174989 117031148 FBXW8 FBXW8
## ENSG00000177294 6797101 FBXO39 FBXO39
## ENSG00000183690 44343672 EFHC2 EFHC2
## ENSG00000183801 7511377 OLFML1 OLFML1
## ENSG00000203907 73395133 OOEP OOEP
## ENSG00000214872 57550274 SMTNL1 SMTNL1
## ENSG00000232629 32763532 HLA-DQB2 HLA-DQB2
## ENSG00000248405 44862706 PRR5-ARHGAP8 PRR5-ARHGAP8
## ENSG00000263715 45835826 LINC02210-CRHR1 CRHR1
## ENSG00000277632 36090169 CCL3 CCL3
## transcript mean_cds_len
## ENSG00000100288 ENSG00000100288.5 1188
## ENSG00000104290 ENSG00000104290.2 1369.33333333333
## ENSG00000113494 ENSG00000113494.7 753.421052631579
## ENSG00000120217 ENSG00000120217.1 702
## ENSG00000121653 ENSG00000121653.2 2121
## ENSG00000131203 ENSG00000131203.5 642.166666666667
## ENSG00000135116 ENSG00000135116.1 166.5
## ENSG00000136244 ENSG00000136244.7 526.142857142857
## ENSG00000147138 ENSG00000147138.1 1002
## ENSG00000154330 ENSG00000154330.5 1161.33333333333
## ENSG00000162065 ENSG00000162065.1 1332
## ENSG00000165338 ENSG00000165338.1 1329.8
## ENSG00000174989 ENSG00000174989.2 1627
## ENSG00000177294 ENSG00000177294.1 530.333333333333
## ENSG00000183690 ENSG00000183690.3 2250
## ENSG00000183801 ENSG00000183801.4 698.5
## ENSG00000203907 ENSG00000203907.1 312
## ENSG00000214872 ENSG00000214872.3 1429.5
## ENSG00000232629 ENSG00000232629.1 740.75
## ENSG00000248405 ENSG00000248405.9 1713.66666666667
## ENSG00000263715 ENSG00000263715.1 723
## ENSG00000277632 ENSG00000277632.2 279
Vennerable::plot(compare)neutrophil_dream_result <- mixed_neutrophil_de[["all_tables"]][["failure_vs_cure"]]
big_table <- t_cf_neutrophil_table_sva[["data"]][["outcome"]]
merged <- merge(big_table, neutrophil_dream_result, by = "row.names")
rownames(merged) <- merged[["Row.names"]]
merged[["Row.names"]] <- NULL
cor_value <- cor.test(merged[["logFC"]], merged[["deseq_logfc"]])
cor_value##
## Pearson's product-moment correlation
##
## data: merged[["logFC"]] and merged[["deseq_logfc"]]
## t = 175, df = 9099, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8727 0.8821
## sample estimates:
## cor
## 0.8775
t_cf_neutrophil_de_sva[["dream"]] <- mixed_neutrophil_de
test <- combine_de_tables(
t_cf_neutrophil_de_sva, scale_p = TRUE,
excel = "excel/test_neutrophil_combined.xlsx")
test_aucc <- calculate_aucc(big_table, tbl2 = neutrophil_dream_result,
px = "deseq_adjp", py = "adj.P.Val",
lx = "deseq_logfc", ly = "logFC")
logfc_plotter <- plot_linear_scatter(merged[, c("logFC", "deseq_logfc")])
logfc_plot <- logfc_plotter[["scatter"]] +
xlab("Dream log2FC with (1|donor) and visit in model") +
ylab("DESeq2 log2FC: Default pairwise comparison")
pp(file = "figures/compare_cf_and_visit_in_model_neutrophil_logfc.svg")
logfc_plot
dev.off()## png
## 2
logfc_plotprevious_sig_idx <- merged[["deseq_adjp"]] <= 0.05 & abs(merged[["deseq_logfc"]] >= 1.0)
summary(previous_sig_idx)## Mode FALSE TRUE
## logical 8971 130
previous_genes <- rownames(merged)[previous_sig_idx]
new_sig_idx <- abs(merged[["logFC"]]) >= 1.0 & merged[["P.Value"]] < 0.05
summary(new_sig_idx)## Mode FALSE TRUE
## logical 9025 76
new_genes <- rownames(merged)[new_sig_idx]
na_idx <- is.na(new_genes)
new_genes <- new_genes[!na_idx]
annot <- fData(t_neutrophils)
compare <- Vennerable::Venn(list("previous" = previous_genes, "new" = new_genes))
shared_genes <- compare@IntersectionSets[["11"]]
name_idx <- rownames(annot) %in% shared_genes
annot[name_idx, ]## ensembl_gene_id ensembl_transcript_id version
## ENSG00000020129 ENSG00000020129 ENST00000356090 16
## ENSG00000023171 ENSG00000023171 ENST00000532581 18
## ENSG00000078098 ENSG00000078098 ENST00000480044 14
## ENSG00000101342 ENSG00000101342 ENST00000602922 10
## ENSG00000106804 ENSG00000106804 ENST00000480188 8
## ENSG00000108387 ENSG00000108387 ENST00000321691 15
## ENSG00000108771 ENSG00000108771 ENST00000590637 13
## ENSG00000120008 ENSG00000120008 ENST00000605178 16
## ENSG00000120675 ENSG00000120675 ENST00000379221 6
## ENSG00000122729 ENSG00000122729 ENST00000309951 19
## ENSG00000131203 ENSG00000131203 ENST00000519154 13
## ENSG00000133106 ENSG00000133106 ENST00000313640 14
## ENSG00000134326 ENSG00000134326 ENST00000458098 11
## ENSG00000134809 ENSG00000134809 ENST00000257245 9
## ENSG00000135116 ENSG00000135116 ENST00000586941 9
## ENSG00000136514 ENSG00000136514 ENST00000259030 3
## ENSG00000137628 ENSG00000137628 ENST00000513997 17
## ENSG00000137959 ENSG00000137959 ENST00000450498 16
## ENSG00000138646 ENSG00000138646 ENST00000502913 9
## ENSG00000141664 ENSG00000141664 ENST00000585873 10
## ENSG00000145244 ENSG00000145244 ENST00000610355 12
## ENSG00000146205 ENSG00000146205 ENST00000481071 13
## ENSG00000151692 ENSG00000151692 ENST00000432850 15
## ENSG00000152056 ENSG00000152056 ENST00000444408 17
## ENSG00000155158 ENSG00000155158 ENST00000512701 20
## ENSG00000155363 ENSG00000155363 ENST00000468624 18
## ENSG00000160469 ENSG00000160469 ENST00000585418 17
## ENSG00000160932 ENSG00000160932 ENST00000521182 11
## ENSG00000162772 ENSG00000162772 ENST00000492118 17
## ENSG00000163644 ENSG00000163644 ENST00000514204 15
## ENSG00000164125 ENSG00000164125 ENST00000592057 15
## ENSG00000164136 ENSG00000164136 ENST00000296545 17
## ENSG00000166257 ENSG00000166257 ENST00000392770 9
## ENSG00000167014 ENSG00000167014 ENST00000557864 11
## ENSG00000170448 ENSG00000170448 ENST00000464756 12
## ENSG00000171365 ENSG00000171365 ENST00000642885 17
## ENSG00000172159 ENSG00000172159 ENST00000376438 16
## ENSG00000172716 ENSG00000172716 ENST00000591682 16
## ENSG00000174989 ENSG00000174989 ENST00000455858 13
## ENSG00000179044 ENSG00000179044 ENST00000564324 16
## ENSG00000180061 ENSG00000180061 ENST00000585918 10
## ENSG00000186654 ENSG00000186654 ENST00000403696 21
## ENSG00000187608 ENSG00000187608 ENST00000624697 10
## ENSG00000188157 ENSG00000188157 ENST00000620552 15
## ENSG00000188290 ENSG00000188290 ENST00000304952 10
## ENSG00000196141 ENSG00000196141 ENST00000421573 14
## ENSG00000196369 ENSG00000196369 ENST00000494534 11
## ENSG00000196405 ENSG00000196405 ENST00000555048 13
## ENSG00000198087 ENSG00000198087 ENST00000479857 7
## ENSG00000214872 ENSG00000214872 ENST00000399154 8
## ENSG00000228696 ENSG00000228696 ENST00000575960 9
## ENSG00000248405 ENSG00000248405 ENST00000361473 10
## ENSG00000269720 ENSG00000269720 ENST00000597169 2
## transcript_version
## ENSG00000020129 8
## ENSG00000023171 5
## ENSG00000078098 5
## ENSG00000101342 5
## ENSG00000106804 1
## ENSG00000108387 3
## ENSG00000108771 1
## ENSG00000120008 5
## ENSG00000120675 4
## ENSG00000122729 8
## ENSG00000131203 5
## ENSG00000133106 11
## ENSG00000134326 5
## ENSG00000134809 9
## ENSG00000135116 1
## ENSG00000136514 3
## ENSG00000137628 1
## ENSG00000137959 1
## ENSG00000138646 1
## ENSG00000141664 5
## ENSG00000145244 4
## ENSG00000146205 1
## ENSG00000151692 1
## ENSG00000152056 1
## ENSG00000155158 6
## ENSG00000155363 5
## ENSG00000160469 1
## ENSG00000160932 5
## ENSG00000162772 2
## ENSG00000163644 1
## ENSG00000164125 1
## ENSG00000164136 11
## ENSG00000166257 6
## ENSG00000167014 1
## ENSG00000170448 6
## ENSG00000171365 1
## ENSG00000172159 5
## ENSG00000172716 5
## ENSG00000174989 2
## ENSG00000179044 5
## ENSG00000180061 5
## ENSG00000186654 5
## ENSG00000187608 4
## ENSG00000188157 4
## ENSG00000188290 10
## ENSG00000196141 5
## ENSG00000196369 1
## ENSG00000196405 5
## ENSG00000198087 1
## ENSG00000214872 3
## ENSG00000228696 5
## ENSG00000248405 9
## ENSG00000269720 1
## description
## ENSG00000020129 neurochondrin [Source:HGNC Symbol;Acc:HGNC:17597]
## ENSG00000023171 GRAM domain containing 1B [Source:HGNC Symbol;Acc:HGNC:29214]
## ENSG00000078098 fibroblast activation protein alpha [Source:HGNC Symbol;Acc:HGNC:3590]
## ENSG00000101342 TBC/LysM-associated domain containing 2 [Source:HGNC Symbol;Acc:HGNC:16112]
## ENSG00000106804 complement C5 [Source:HGNC Symbol;Acc:HGNC:1331]
## ENSG00000108387 septin 4 [Source:HGNC Symbol;Acc:HGNC:9165]
## ENSG00000108771 DExH-box helicase 58 [Source:HGNC Symbol;Acc:HGNC:29517]
## ENSG00000120008 WD repeat domain 11 [Source:HGNC Symbol;Acc:HGNC:13831]
## ENSG00000120675 DnaJ heat shock protein family (Hsp40) member C15 [Source:HGNC Symbol;Acc:HGNC:20325]
## ENSG00000122729 aconitase 1 [Source:HGNC Symbol;Acc:HGNC:117]
## ENSG00000131203 indoleamine 2,3-dioxygenase 1 [Source:HGNC Symbol;Acc:HGNC:6059]
## ENSG00000133106 epithelial stromal interaction 1 [Source:HGNC Symbol;Acc:HGNC:16465]
## ENSG00000134326 cytidine/uridine monophosphate kinase 2 [Source:HGNC Symbol;Acc:HGNC:27015]
## ENSG00000134809 translocase of inner mitochondrial membrane 10 [Source:HGNC Symbol;Acc:HGNC:11814]
## ENSG00000135116 harakiri, BCL2 interacting protein [Source:HGNC Symbol;Acc:HGNC:5185]
## ENSG00000136514 receptor transporter protein 4 [Source:HGNC Symbol;Acc:HGNC:23992]
## ENSG00000137628 DExD/H-box helicase 60 [Source:HGNC Symbol;Acc:HGNC:25942]
## ENSG00000137959 interferon induced protein 44 like [Source:HGNC Symbol;Acc:HGNC:17817]
## ENSG00000138646 HECT and RLD domain containing E3 ubiquitin protein ligase 5 [Source:HGNC Symbol;Acc:HGNC:24368]
## ENSG00000141664 zinc finger CCHC-type containing 2 [Source:HGNC Symbol;Acc:HGNC:22916]
## ENSG00000145244 corin, serine peptidase [Source:HGNC Symbol;Acc:HGNC:19012]
## ENSG00000146205 anoctamin 7 [Source:HGNC Symbol;Acc:HGNC:31677]
## ENSG00000151692 ring finger protein 144A [Source:HGNC Symbol;Acc:HGNC:20457]
## ENSG00000152056 adaptor related protein complex 1 subunit sigma 3 [Source:HGNC Symbol;Acc:HGNC:18971]
## ENSG00000155158 tetratricopeptide repeat domain 39B [Source:HGNC Symbol;Acc:HGNC:23704]
## ENSG00000155363 Mov10 RISC complex RNA helicase [Source:HGNC Symbol;Acc:HGNC:7200]
## ENSG00000160469 BR serine/threonine kinase 1 [Source:HGNC Symbol;Acc:HGNC:18994]
## ENSG00000160932 lymphocyte antigen 6 family member E [Source:HGNC Symbol;Acc:HGNC:6727]
## ENSG00000162772 activating transcription factor 3 [Source:HGNC Symbol;Acc:HGNC:785]
## ENSG00000163644 protein phosphatase, Mg2+/Mn2+ dependent 1K [Source:HGNC Symbol;Acc:HGNC:25415]
## ENSG00000164125 golgi associated kinase 1B [Source:HGNC Symbol;Acc:HGNC:25312]
## ENSG00000164136 interleukin 15 [Source:HGNC Symbol;Acc:HGNC:5977]
## ENSG00000166257 sodium voltage-gated channel beta subunit 3 [Source:HGNC Symbol;Acc:HGNC:20665]
## ENSG00000167014 telomere repeat binding bouquet formation protein 2 [Source:HGNC Symbol;Acc:HGNC:28520]
## ENSG00000170448 nuclear transcription factor, X-box binding like 1 [Source:HGNC Symbol;Acc:HGNC:18726]
## ENSG00000171365 chloride voltage-gated channel 5 [Source:HGNC Symbol;Acc:HGNC:2023]
## ENSG00000172159 FERM domain containing 3 [Source:HGNC Symbol;Acc:HGNC:24125]
## ENSG00000172716 schlafen family member 11 [Source:HGNC Symbol;Acc:HGNC:26633]
## ENSG00000174989 F-box and WD repeat domain containing 8 [Source:HGNC Symbol;Acc:HGNC:13597]
## ENSG00000179044 exocyst complex component 3 like 1 [Source:HGNC Symbol;Acc:HGNC:27540]
## ENSG00000180061 transmembrane protein 150B [Source:HGNC Symbol;Acc:HGNC:34415]
## ENSG00000186654 proline rich 5 [Source:HGNC Symbol;Acc:HGNC:31682]
## ENSG00000187608 ISG15 ubiquitin like modifier [Source:HGNC Symbol;Acc:HGNC:4053]
## ENSG00000188157 agrin [Source:HGNC Symbol;Acc:HGNC:329]
## ENSG00000188290 hes family bHLH transcription factor 4 [Source:HGNC Symbol;Acc:HGNC:24149]
## ENSG00000196141 spermatogenesis associated serine rich 2 like [Source:HGNC Symbol;Acc:HGNC:24574]
## ENSG00000196369 SLIT-ROBO Rho GTPase activating protein 2B [Source:HGNC Symbol;Acc:HGNC:35237]
## ENSG00000196405 Enah/Vasp-like [Source:HGNC Symbol;Acc:HGNC:20234]
## ENSG00000198087 CD2 associated protein [Source:HGNC Symbol;Acc:HGNC:14258]
## ENSG00000214872 smoothelin like 1 [Source:HGNC Symbol;Acc:HGNC:32394]
## ENSG00000228696 ADP ribosylation factor like GTPase 17B [Source:HGNC Symbol;Acc:HGNC:32387]
## ENSG00000248405 PRR5-ARHGAP8 readthrough [Source:HGNC Symbol;Acc:HGNC:34512]
## ENSG00000269720 coiled-coil domain containing 194 [Source:HGNC Symbol;Acc:HGNC:53438]
## gene_biotype cds_length chromosome_name strand start_position
## ENSG00000020129 protein_coding 2190 1 + 35557473
## ENSG00000023171 protein_coding undefined 11 + 123358428
## ENSG00000078098 protein_coding undefined 2 - 162170684
## ENSG00000101342 protein_coding 648 20 + 36876121
## ENSG00000106804 protein_coding undefined 9 - 120952335
## ENSG00000108387 protein_coding 1713 17 - 58520250
## ENSG00000108771 protein_coding undefined 17 - 42101404
## ENSG00000120008 protein_coding undefined 10 + 120851305
## ENSG00000120675 protein_coding 453 13 + 43023203
## ENSG00000122729 protein_coding 2670 9 + 32384603
## ENSG00000131203 protein_coding 540 8 + 39902275
## ENSG00000133106 protein_coding 1233 13 - 42886388
## ENSG00000134326 protein_coding 1101 2 - 6840570
## ENSG00000134809 protein_coding 273 11 - 57528464
## ENSG00000135116 protein_coding undefined 12 - 116856144
## ENSG00000136514 protein_coding 741 3 + 187368385
## ENSG00000137628 protein_coding 314 4 - 168216294
## ENSG00000137959 protein_coding 699 1 + 78619922
## ENSG00000138646 protein_coding undefined 4 + 88457119
## ENSG00000141664 protein_coding 3295 18 + 62523025
## ENSG00000145244 protein_coding 2817 4 - 47593999
## ENSG00000146205 protein_coding undefined 2 + 241188509
## ENSG00000151692 protein_coding 794 2 + 6917412
## ENSG00000152056 protein_coding 170 2 - 223751686
## ENSG00000155158 protein_coding 2049 9 - 15163622
## ENSG00000155363 protein_coding undefined 1 + 112673141
## ENSG00000160469 protein_coding 1032 19 + 55282072
## ENSG00000160932 protein_coding 126 8 + 143017982
## ENSG00000162772 protein_coding undefined 1 + 212565334
## ENSG00000163644 protein_coding 549 4 - 88257620
## ENSG00000164125 protein_coding 996 4 - 158124474
## ENSG00000164136 protein_coding 489 4 + 141636583
## ENSG00000166257 protein_coding 648 11 - 123629187
## ENSG00000167014 protein_coding 131 15 + 44956687
## ENSG00000170448 protein_coding 2202 4 - 47847233
## ENSG00000171365 protein_coding 2241 X + 49922596
## ENSG00000172159 protein_coding 1671 9 - 83242990
## ENSG00000172716 protein_coding 369 17 - 35350305
## ENSG00000174989 protein_coding 1599 12 + 116910950
## ENSG00000179044 protein_coding 243 16 - 67184379
## ENSG00000180061 protein_coding 457 19 - 55312801
## ENSG00000186654 protein_coding 532 22 + 44668547
## ENSG00000187608 protein_coding 474 1 + 1001138
## ENSG00000188157 protein_coding 5793 1 + 1020120
## ENSG00000188290 protein_coding 666 1 - 998962
## ENSG00000196141 protein_coding 463 2 + 200305881
## ENSG00000196369 protein_coding undefined 1 - 144887265
## ENSG00000196405 protein_coding 852 14 + 99971449
## ENSG00000198087 protein_coding undefined 6 + 47477789
## ENSG00000214872 protein_coding 1374 11 + 57542641
## ENSG00000228696 protein_coding 546 17 - 46274784
## ENSG00000248405 protein_coding 1695 22 + 44702233
## ENSG00000269720 protein_coding undefined 19 - 17390509
## end_position hgnc_symbol uniprot_gn_symbol transcript
## ENSG00000020129 35567274 NCDN NCDN ENSG00000020129.8
## ENSG00000023171 123627774 GRAMD1B GRAMD1B ENSG00000023171.5
## ENSG00000078098 162245151 FAP FAP ENSG00000078098.5
## ENSG00000101342 36894235 TLDC2 TLDC2 ENSG00000101342.5
## ENSG00000106804 121050275 C5 C5 ENSG00000106804.1
## ENSG00000108387 58544368 SEPTIN4 C17orf47 ENSG00000108387.3
## ENSG00000108771 42112714 DHX58 DHX58 ENSG00000108771.1
## ENSG00000120008 120909524 WDR11 WDR11 ENSG00000120008.5
## ENSG00000120675 43114213 DNAJC15 DNAJC15 ENSG00000120675.4
## ENSG00000122729 32454769 ACO1 ACO1 ENSG00000122729.8
## ENSG00000131203 39928790 IDO1 IDO1 ENSG00000131203.5
## ENSG00000133106 42992271 EPSTI1 EPSTI1 ENSG00000133106.11
## ENSG00000134326 6866635 CMPK2 CMPK2 ENSG00000134326.5
## ENSG00000134809 57530803 TIMM10 TIMM10 ENSG00000134809.9
## ENSG00000135116 116881441 HRK HRK ENSG00000135116.1
## ENSG00000136514 187372076 RTP4 RTP4 ENSG00000136514.3
## ENSG00000137628 168318804 DDX60 DDX60 ENSG00000137628.1
## ENSG00000137959 78646145 IFI44L IFI44L ENSG00000137959.1
## ENSG00000138646 88506163 HERC5 HERC5 ENSG00000138646.1
## ENSG00000141664 62587709 ZCCHC2 ZCCHC2 ENSG00000141664.5
## ENSG00000145244 47838106 CORIN CORIN ENSG00000145244.4
## ENSG00000146205 241225377 ANO7 ANO7 ENSG00000146205.1
## ENSG00000151692 7068286 RNF144A RNF144A ENSG00000151692.1
## ENSG00000152056 223838027 AP1S3 AP1S3 ENSG00000152056.1
## ENSG00000155158 15307360 TTC39B TTC39B ENSG00000155158.6
## ENSG00000155363 112700746 MOV10 MOV10 ENSG00000155363.5
## ENSG00000160469 55312562 BRSK1 BRSK1 ENSG00000160469.1
## ENSG00000160932 143023832 LY6E LY6E ENSG00000160932.5
## ENSG00000162772 212620777 ATF3 ATF3 ENSG00000162772.2
## ENSG00000163644 88284769 PPM1K PPM1K ENSG00000163644.1
## ENSG00000164125 158173318 GASK1B GASK1B ENSG00000164125.1
## ENSG00000164136 141733987 IL15 IL15 ENSG00000164136.11
## ENSG00000166257 123655244 SCN3B SCN3B ENSG00000166257.6
## ENSG00000167014 44979229 TERB2 TERB2 ENSG00000167014.1
## ENSG00000170448 47914667 NFXL1 NFXL1 ENSG00000170448.6
## ENSG00000171365 50099235 CLCN5 CLCN5 ENSG00000171365.1
## ENSG00000172159 83538546 FRMD3 FRMD3 ENSG00000172159.5
## ENSG00000172716 35373701 SLFN11 SLFN11 ENSG00000172716.5
## ENSG00000174989 117031148 FBXW8 FBXW8 ENSG00000174989.2
## ENSG00000179044 67190185 EXOC3L1 EXOC3L1 ENSG00000179044.5
## ENSG00000180061 55334048 TMEM150B TMEM150B ENSG00000180061.5
## ENSG00000186654 44737681 PRR5 PRR5 ENSG00000186654.5
## ENSG00000187608 1014540 ISG15 ISG15 ENSG00000187608.4
## ENSG00000188157 1056118 AGRN AGRN ENSG00000188157.4
## ENSG00000188290 1000172 HES4 HES4 ENSG00000188290.10
## ENSG00000196141 200482264 SPATS2L SPATS2L ENSG00000196141.5
## ENSG00000196369 145095528 SRGAP2B SRGAP2B ENSG00000196369.1
## ENSG00000196405 100144236 EVL EVL ENSG00000196405.5
## ENSG00000198087 47627263 CD2AP CD2AP ENSG00000198087.1
## ENSG00000214872 57550274 SMTNL1 SMTNL1 ENSG00000214872.3
## ENSG00000228696 46361797 ARL17B ARL17A ENSG00000228696.5
## ENSG00000248405 44862706 PRR5-ARHGAP8 PRR5-ARHGAP8 ENSG00000248405.9
## ENSG00000269720 17394158 CCDC194 CCDC194 ENSG00000269720.1
## mean_cds_len
## ENSG00000020129 1566.4
## ENSG00000023171 1518.2
## ENSG00000078098 1117.57142857143
## ENSG00000101342 460
## ENSG00000106804 5031
## ENSG00000108387 1062.89473684211
## ENSG00000108771 773
## ENSG00000120008 1374.75
## ENSG00000120675 453
## ENSG00000122729 2670
## ENSG00000131203 642.166666666667
## ENSG00000133106 718.4
## ENSG00000134326 1227
## ENSG00000134809 273
## ENSG00000135116 166.5
## ENSG00000136514 741
## ENSG00000137628 1596.75
## ENSG00000137959 783.333333333333
## ENSG00000138646 2532
## ENSG00000141664 1908.16666666667
## ENSG00000145244 2801.5
## ENSG00000146205 1637.25
## ENSG00000151692 500.75
## ENSG00000152056 345.25
## ENSG00000155158 1536.5
## ENSG00000155363 2970
## ENSG00000160469 1293.5
## ENSG00000160932 306.642857142857
## ENSG00000162772 440.555555555556
## ENSG00000163644 512.142857142857
## ENSG00000164125 898.5
## ENSG00000164136 448.5
## ENSG00000166257 560.428571428571
## ENSG00000167014 313.666666666667
## ENSG00000170448 2602.5
## ENSG00000171365 1620.11111111111
## ENSG00000172159 1213.5
## ENSG00000172716 749.8
## ENSG00000174989 1627
## ENSG00000179044 1423.6
## ENSG00000180061 366.833333333333
## ENSG00000186654 833.727272727273
## ENSG00000187608 467.666666666667
## ENSG00000188157 5911.5
## ENSG00000188290 660
## ENSG00000196141 950.952380952381
## ENSG00000196369 951.5
## ENSG00000196405 702.090909090909
## ENSG00000198087 1920
## ENSG00000214872 1429.5
## ENSG00000228696 391.888888888889
## ENSG00000248405 1713.66666666667
## ENSG00000269720 705
Vennerable::plot(compare)eosinophil_dream_result <- mixed_eosinophil_de[["all_tables"]][["failure_vs_cure"]]
big_table <- t_cf_eosinophil_table_sva[["data"]][["outcome"]]
merged <- merge(big_table, eosinophil_dream_result, by = "row.names")
rownames(merged) <- merged[["Row.names"]]
merged[["Row.names"]] <- NULL
cor_value <- cor.test(merged[["logFC"]], merged[["deseq_logfc"]])
cor_value##
## Pearson's product-moment correlation
##
## data: merged[["logFC"]] and merged[["deseq_logfc"]]
## t = 177, df = 10530, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8601 0.8697
## sample estimates:
## cor
## 0.865
t_cf_eosinophil_de_sva[["dream"]] <- mixed_eosinophil_de
test <- combine_de_tables(
t_cf_eosinophil_de_sva, scale_p = TRUE,
excel = "excel/test_eosinophil_combined.xlsx")
test_aucc <- calculate_aucc(big_table, tbl2 = eosinophil_dream_result,
px = "deseq_adjp", py = "adj.P.Val",
lx = "deseq_logfc", ly = "logFC")
logfc_plotter <- plot_linear_scatter(merged[, c("logFC", "deseq_logfc")])
logfc_plot <- logfc_plotter[["scatter"]] +
xlab("Dream log2FC with (1|donor) and visit in model") +
ylab("DESeq2 log2FC: Default pairwise comparison")
pp(file = "figures/compare_cf_and_visit_in_model_eosinophil_logfc.svg")
logfc_plot
dev.off()## png
## 2
logfc_plotprevious_sig_idx <- merged[["deseq_adjp"]] <= 0.05 & abs(merged[["deseq_logfc"]] >= 1.0)
summary(previous_sig_idx)## Mode FALSE TRUE
## logical 10416 116
previous_genes <- rownames(merged)[previous_sig_idx]
new_sig_idx <- abs(merged[["logFC"]]) >= 1.0 & merged[["P.Value"]] < 0.05
summary(new_sig_idx)## Mode FALSE TRUE
## logical 10467 65
new_genes <- rownames(merged)[new_sig_idx]
na_idx <- is.na(new_genes)
new_genes <- new_genes[!na_idx]
annot <- fData(t_eosinophils)
compare <- Vennerable::Venn(list("previous" = previous_genes, "new" = new_genes))
shared_genes <- compare@IntersectionSets[["11"]]
name_idx <- rownames(annot) %in% shared_genes
annot[name_idx, ]## ensembl_gene_id ensembl_transcript_id version
## ENSG00000011105 ENSG00000011105 ENST00000649909 14
## ENSG00000087237 ENSG00000087237 ENST00000379780 12
## ENSG00000089127 ENSG00000089127 ENST00000540589 13
## ENSG00000106211 ENSG00000106211 ENST00000248553 9
## ENSG00000108679 ENSG00000108679 ENST00000592255 13
## ENSG00000108774 ENSG00000108774 ENST00000547517 14
## ENSG00000115267 ENSG00000115267 ENST00000464129 8
## ENSG00000117228 ENSG00000117228 ENST00000495131 10
## ENSG00000118785 ENSG00000118785 ENST00000513981 14
## ENSG00000123689 ENSG00000123689 ENST00000367029 6
## ENSG00000126709 ENSG00000126709 ENST00000339145 15
## ENSG00000130203 ENSG00000130203 ENST00000446996 10
## ENSG00000135047 ENSG00000135047 ENST00000495822 15
## ENSG00000136235 ENSG00000136235 ENST00000479625 16
## ENSG00000136689 ENSG00000136689 ENST00000472292 18
## ENSG00000137965 ENSG00000137965 ENST00000476911 11
## ENSG00000138646 ENSG00000138646 ENST00000502913 9
## ENSG00000141574 ENSG00000141574 ENST00000269389 8
## ENSG00000145431 ENSG00000145431 ENST00000511985 11
## ENSG00000152689 ENSG00000152689 ENST00000490150 18
## ENSG00000152778 ENSG00000152778 ENST00000371795 9
## ENSG00000154451 ENSG00000154451 ENST00000481145 14
## ENSG00000157680 ENSG00000157680 ENST00000424189 15
## ENSG00000160932 ENSG00000160932 ENST00000521182 11
## ENSG00000161055 ENSG00000161055 ENST00000292641 4
## ENSG00000162645 ENSG00000162645 ENST00000370466 13
## ENSG00000162654 ENSG00000162654 ENST00000355754 9
## ENSG00000165949 ENSG00000165949 ENST00000611954 12
## ENSG00000173762 ENSG00000173762 ENST00000312648 8
## ENSG00000174276 ENSG00000174276 ENST00000310597 7
## ENSG00000177989 ENSG00000177989 ENST00000405135 13
## ENSG00000179455 ENSG00000179455 ENST00000649065 9
## ENSG00000187569 ENSG00000187569 ENST00000345088 3
## ENSG00000187608 ENSG00000187608 ENST00000624697 10
## ENSG00000196141 ENSG00000196141 ENST00000421573 14
## ENSG00000198178 ENSG00000198178 ENST00000537530 10
## ENSG00000198286 ENSG00000198286 ENST00000355508 9
## ENSG00000203907 ENSG00000203907 ENST00000441145 9
## ENSG00000204291 ENSG00000204291 ENST00000471477 11
## ENSG00000204632 ENSG00000204632 ENST00000428701 11
## ENSG00000214872 ENSG00000214872 ENST00000399154 8
## ENSG00000221963 ENSG00000221963 ENST00000409652 6
## ENSG00000231767 ENSG00000231767 ENST00000443181 4
## ENSG00000269720 ENSG00000269720 ENST00000597169 2
## transcript_version
## ENSG00000011105 1
## ENSG00000087237 6
## ENSG00000089127 2
## ENSG00000106211 7
## ENSG00000108679 1
## ENSG00000108774 5
## ENSG00000115267 1
## ENSG00000117228 1
## ENSG00000118785 5
## ENSG00000123689 5
## ENSG00000126709 8
## ENSG00000130203 5
## ENSG00000135047 1
## ENSG00000136235 1
## ENSG00000136689 1
## ENSG00000137965 1
## ENSG00000138646 1
## ENSG00000141574 8
## ENSG00000145431 1
## ENSG00000152689 1
## ENSG00000152778 5
## ENSG00000154451 1
## ENSG00000157680 6
## ENSG00000160932 5
## ENSG00000161055 4
## ENSG00000162645 4
## ENSG00000162654 7
## ENSG00000165949 4
## ENSG00000173762 8
## ENSG00000174276 6
## ENSG00000177989 5
## ENSG00000179455 1
## ENSG00000187569 3
## ENSG00000187608 4
## ENSG00000196141 5
## ENSG00000198178 1
## ENSG00000198286 3
## ENSG00000203907 1
## ENSG00000204291 1
## ENSG00000204632 5
## ENSG00000214872 3
## ENSG00000221963 5
## ENSG00000231767 2
## ENSG00000269720 1
## description
## ENSG00000011105 tetraspanin 9 [Source:HGNC Symbol;Acc:HGNC:21640]
## ENSG00000087237 cholesteryl ester transfer protein [Source:HGNC Symbol;Acc:HGNC:1869]
## ENSG00000089127 2'-5'-oligoadenylate synthetase 1 [Source:HGNC Symbol;Acc:HGNC:8086]
## ENSG00000106211 heat shock protein family B (small) member 1 [Source:HGNC Symbol;Acc:HGNC:5246]
## ENSG00000108679 galectin 3 binding protein [Source:HGNC Symbol;Acc:HGNC:6564]
## ENSG00000108774 RAB5C, member RAS oncogene family [Source:HGNC Symbol;Acc:HGNC:9785]
## ENSG00000115267 interferon induced with helicase C domain 1 [Source:HGNC Symbol;Acc:HGNC:18873]
## ENSG00000117228 guanylate binding protein 1 [Source:HGNC Symbol;Acc:HGNC:4182]
## ENSG00000118785 secreted phosphoprotein 1 [Source:HGNC Symbol;Acc:HGNC:11255]
## ENSG00000123689 G0/G1 switch 2 [Source:HGNC Symbol;Acc:HGNC:30229]
## ENSG00000126709 interferon alpha inducible protein 6 [Source:HGNC Symbol;Acc:HGNC:4054]
## ENSG00000130203 apolipoprotein E [Source:HGNC Symbol;Acc:HGNC:613]
## ENSG00000135047 cathepsin L [Source:HGNC Symbol;Acc:HGNC:2537]
## ENSG00000136235 glycoprotein nmb [Source:HGNC Symbol;Acc:HGNC:4462]
## ENSG00000136689 interleukin 1 receptor antagonist [Source:HGNC Symbol;Acc:HGNC:6000]
## ENSG00000137965 interferon induced protein 44 [Source:HGNC Symbol;Acc:HGNC:16938]
## ENSG00000138646 HECT and RLD domain containing E3 ubiquitin protein ligase 5 [Source:HGNC Symbol;Acc:HGNC:24368]
## ENSG00000141574 secreted and transmembrane 1 [Source:HGNC Symbol;Acc:HGNC:10707]
## ENSG00000145431 platelet derived growth factor C [Source:HGNC Symbol;Acc:HGNC:8801]
## ENSG00000152689 RAS guanyl releasing protein 3 [Source:HGNC Symbol;Acc:HGNC:14545]
## ENSG00000152778 interferon induced protein with tetratricopeptide repeats 5 [Source:HGNC Symbol;Acc:HGNC:13328]
## ENSG00000154451 guanylate binding protein 5 [Source:HGNC Symbol;Acc:HGNC:19895]
## ENSG00000157680 diacylglycerol kinase iota [Source:HGNC Symbol;Acc:HGNC:2855]
## ENSG00000160932 lymphocyte antigen 6 family member E [Source:HGNC Symbol;Acc:HGNC:6727]
## ENSG00000161055 secretoglobin family 3A member 1 [Source:HGNC Symbol;Acc:HGNC:18384]
## ENSG00000162645 guanylate binding protein 2 [Source:HGNC Symbol;Acc:HGNC:4183]
## ENSG00000162654 guanylate binding protein 4 [Source:HGNC Symbol;Acc:HGNC:20480]
## ENSG00000165949 interferon alpha inducible protein 27 [Source:HGNC Symbol;Acc:HGNC:5397]
## ENSG00000173762 CD7 molecule [Source:HGNC Symbol;Acc:HGNC:1695]
## ENSG00000174276 zinc finger HIT-type containing 2 [Source:HGNC Symbol;Acc:HGNC:1177]
## ENSG00000177989 outer dense fiber of sperm tails 3B [Source:HGNC Symbol;Acc:HGNC:34388]
## ENSG00000179455 makorin ring finger protein 3 [Source:HGNC Symbol;Acc:HGNC:7114]
## ENSG00000187569 developmental pluripotency associated 3 [Source:HGNC Symbol;Acc:HGNC:19199]
## ENSG00000187608 ISG15 ubiquitin like modifier [Source:HGNC Symbol;Acc:HGNC:4053]
## ENSG00000196141 spermatogenesis associated serine rich 2 like [Source:HGNC Symbol;Acc:HGNC:24574]
## ENSG00000198178 C-type lectin domain family 4 member C [Source:HGNC Symbol;Acc:HGNC:13258]
## ENSG00000198286 caspase recruitment domain family member 11 [Source:HGNC Symbol;Acc:HGNC:16393]
## ENSG00000203907 oocyte expressed protein [Source:HGNC Symbol;Acc:HGNC:21382]
## ENSG00000204291 collagen type XV alpha 1 chain [Source:HGNC Symbol;Acc:HGNC:2192]
## ENSG00000204632 major histocompatibility complex, class I, G [Source:HGNC Symbol;Acc:HGNC:4964]
## ENSG00000214872 smoothelin like 1 [Source:HGNC Symbol;Acc:HGNC:32394]
## ENSG00000221963 apolipoprotein L6 [Source:HGNC Symbol;Acc:HGNC:14870]
## ENSG00000231767 novel protein similar to ribosomal protein S27a RPS27A
## ENSG00000269720 coiled-coil domain containing 194 [Source:HGNC Symbol;Acc:HGNC:53438]
## gene_biotype cds_length chromosome_name strand start_position
## ENSG00000011105 protein_coding 395 12 + 3077355
## ENSG00000087237 protein_coding 1302 16 + 56961923
## ENSG00000089127 protein_coding 68 12 + 112906783
## ENSG00000106211 protein_coding 618 7 + 76302673
## ENSG00000108679 protein_coding undefined 17 - 78971238
## ENSG00000108774 protein_coding 750 17 - 42124976
## ENSG00000115267 protein_coding undefined 2 - 162267074
## ENSG00000117228 protein_coding undefined 1 - 89052319
## ENSG00000118785 protein_coding undefined 4 + 87975650
## ENSG00000123689 protein_coding 312 1 + 209675412
## ENSG00000126709 protein_coding 417 1 - 27666064
## ENSG00000130203 protein_coding 648 19 + 44905791
## ENSG00000135047 protein_coding undefined 9 + 87726109
## ENSG00000136235 protein_coding undefined 7 + 23235967
## ENSG00000136689 protein_coding undefined 2 + 113107214
## ENSG00000137965 protein_coding undefined 1 + 78649796
## ENSG00000138646 protein_coding undefined 4 + 88457119
## ENSG00000141574 protein_coding 747 17 - 82321024
## ENSG00000145431 protein_coding undefined 4 - 156760454
## ENSG00000152689 protein_coding undefined 2 + 33436324
## ENSG00000152778 protein_coding 1449 10 + 89414568
## ENSG00000154451 protein_coding undefined 1 - 89258950
## ENSG00000157680 protein_coding 3237 7 - 137381037
## ENSG00000160932 protein_coding 126 8 + 143017982
## ENSG00000161055 protein_coding 315 5 - 180590105
## ENSG00000162645 protein_coding 1776 1 - 89106132
## ENSG00000162654 protein_coding 1923 1 - 89181144
## ENSG00000165949 protein_coding 180 14 + 94104836
## ENSG00000173762 protein_coding 723 17 - 82314868
## ENSG00000174276 protein_coding 1212 11 - 65116403
## ENSG00000177989 protein_coding 846 22 - 50529710
## ENSG00000179455 protein_coding 486 15 + 23565674
## ENSG00000187569 protein_coding 480 12 + 7711433
## ENSG00000187608 protein_coding 474 1 + 1001138
## ENSG00000196141 protein_coding 463 2 + 200305881
## ENSG00000198178 protein_coding 267 12 - 7729415
## ENSG00000198286 protein_coding 815 7 - 2906141
## ENSG00000203907 protein_coding 201 6 - 73368555
## ENSG00000204291 protein_coding undefined 9 + 98943179
## ENSG00000204632 protein_coding 1017 6 + 29826967
## ENSG00000214872 protein_coding 1374 11 + 57542641
## ENSG00000221963 protein_coding 1032 22 + 35648446
## ENSG00000231767 protein_coding 468 1 + 192716132
## ENSG00000269720 protein_coding undefined 19 - 17390509
## end_position hgnc_symbol uniprot_gn_symbol transcript
## ENSG00000011105 3286564 TSPAN9 TSPAN9 ENSG00000011105.1
## ENSG00000087237 56983845 CETP CETP ENSG00000087237.6
## ENSG00000089127 112933222 OAS1 OAS1 ENSG00000089127.2
## ENSG00000106211 76304295 HSPB1 HSPB1 ENSG00000106211.7
## ENSG00000108679 78979947 LGALS3BP LGALS3BP ENSG00000108679.1
## ENSG00000108774 42155044 RAB5C RAB5C ENSG00000108774.5
## ENSG00000115267 162318684 IFIH1 IFIH1 ENSG00000115267.1
## ENSG00000117228 89065230 GBP1 GBP1 ENSG00000117228.1
## ENSG00000118785 87983426 SPP1 SPP1 ENSG00000118785.5
## ENSG00000123689 209676390 G0S2 G0S2 ENSG00000123689.5
## ENSG00000126709 27672198 IFI6 IFI6 ENSG00000126709.8
## ENSG00000130203 44909393 APOE APOE ENSG00000130203.5
## ENSG00000135047 87731469 CTSL CTSL ENSG00000135047.1
## ENSG00000136235 23275108 GPNMB GPNMB ENSG00000136235.1
## ENSG00000136689 113134016 IL1RN IL1RN ENSG00000136689.1
## ENSG00000137965 78664078 IFI44 IFI44 ENSG00000137965.1
## ENSG00000138646 88506163 HERC5 HERC5 ENSG00000138646.1
## ENSG00000141574 82334074 SECTM1 SECTM1 ENSG00000141574.8
## ENSG00000145431 156971799 PDGFC PDGFC ENSG00000145431.1
## ENSG00000152689 33564750 RASGRP3 RASGRP3 ENSG00000152689.1
## ENSG00000152778 89420997 IFIT5 IFIT5 ENSG00000152778.5
## ENSG00000154451 89272804 GBP5 GBP5 ENSG00000154451.1
## ENSG00000157680 137847092 DGKI DGKI ENSG00000157680.6
## ENSG00000160932 143023832 LY6E LY6E ENSG00000160932.5
## ENSG00000161055 180591499 SCGB3A1 SCGB3A1 ENSG00000161055.4
## ENSG00000162645 89150456 GBP2 GBP2 ENSG00000162645.4
## ENSG00000162654 89198942 GBP4 GBP4 ENSG00000162654.7
## ENSG00000165949 94116698 IFI27 IFI27 ENSG00000165949.4
## ENSG00000173762 82317608 CD7 CD7 ENSG00000173762.8
## ENSG00000174276 65117701 ZNHIT2 ZNHIT2 ENSG00000174276.6
## ENSG00000177989 50532580 ODF3B ODF3B ENSG00000177989.5
## ENSG00000179455 23630075 MKRN3 MKRN3 ENSG00000179455.1
## ENSG00000187569 7717559 DPPA3 DPPA3 ENSG00000187569.3
## ENSG00000187608 1014540 ISG15 ISG15 ENSG00000187608.4
## ENSG00000196141 200482264 SPATS2L SPATS2L ENSG00000196141.5
## ENSG00000198178 7751605 CLEC4C CLEC4C ENSG00000198178.1
## ENSG00000198286 3043945 CARD11 CARD11 ENSG00000198286.3
## ENSG00000203907 73395133 OOEP OOEP ENSG00000203907.1
## ENSG00000204291 99070792 COL15A1 COL15A1 ENSG00000204291.1
## ENSG00000204632 29831125 HLA-G HLA-G ENSG00000204632.5
## ENSG00000214872 57550274 SMTNL1 SMTNL1 ENSG00000214872.3
## ENSG00000221963 35668404 APOL6 APOL6 ENSG00000221963.5
## ENSG00000231767 192716653 ENSG00000231767.2
## ENSG00000269720 17394158 CCDC194 CCDC194 ENSG00000269720.1
## mean_cds_len
## ENSG00000011105 559.833333333333
## ENSG00000087237 1357
## ENSG00000089127 682.8
## ENSG00000106211 431
## ENSG00000108679 445.666666666667
## ENSG00000108774 482.142857142857
## ENSG00000115267 2235
## ENSG00000117228 1779
## ENSG00000118785 863
## ENSG00000123689 312
## ENSG00000126709 405
## ENSG00000130203 766.75
## ENSG00000135047 894
## ENSG00000136235 1447.5
## ENSG00000136689 484.2
## ENSG00000137965 716.666666666667
## ENSG00000138646 2532
## ENSG00000141574 400.375
## ENSG00000145431 550.25
## ENSG00000152689 906.555555555556
## ENSG00000152778 1449
## ENSG00000154451 1605.5
## ENSG00000157680 2916.6
## ENSG00000160932 306.642857142857
## ENSG00000161055 315
## ENSG00000162645 1776
## ENSG00000162654 1923
## ENSG00000165949 287.454545454545
## ENSG00000173762 461
## ENSG00000174276 885.5
## ENSG00000177989 632.5
## ENSG00000179455 547.166666666667
## ENSG00000187569 480
## ENSG00000187608 467.666666666667
## ENSG00000196141 950.952380952381
## ENSG00000198178 510.5
## ENSG00000198286 1499.66666666667
## ENSG00000203907 312
## ENSG00000204291 4146
## ENSG00000204632 835.5
## ENSG00000214872 1429.5
## ENSG00000221963 1032
## ENSG00000231767 469.5
## ENSG00000269720 705
Vennerable::plot(compare)Now that I have performed all of the above, I think it should be possible to have a working analysis using dream that includes celltype, visitnumber, finaloutcome, donor, and perhaps SVs.
mixed_fstring <- "~ 0 + finaloutcome + typeofcells + visitnumber + (1|donor)"
mixed_formula <- as.formula(mixed_fstring)
mixed_fstring_svs <- "~ 0 + finaloutcome + typeofcells + visitnumber + (1|donor) + svaseq_SV1 + svaseq_SV2 + svaseq_SV3 + svaseq_SV4"
mixed_formula_svs <- as.formula(mixed_fstring_svs)
all_dream_de <- dream_pairwise(t_clinical_nobiop, alt_model = mixed_formula)
mixed_all_celltypes_de_xlsx <- write_de_table(all_dream_de, type = "limma", excel = glue("excel/mixed_all_celltypes_nobiop_table-v{ver}.xlsx"))
all_dream_result <- all_dream_de[["all_tables"]][["failure_vs_cure"]] %>%
arrange(desc(logFC))
fc_sig_idx <- all_dream_result[["logFC"]] >= 1.0 & all_dream_result[["z.std"]] >= 2.0
dream_sig <- rownames(all_dream_result[fc_sig_idx, ])
svs_all_dream_de <- dream_pairwise(t_clinical_nobiop, alt_model = mixed_formula_svs)
test <- hpgl_padjust(svs_all_dream_de[["all_tables"]][["failure_vs_cure"]]
mean_column = "AveExpr", method = "ihw", type = "limma")t_clinical_outcomecell_fact <- paste0(pData(t_clinical_nobiop)[["finaloutcome"]], "_",
pData(t_clinical_nobiop)[["typeofcells"]])
t_clinical_outcomecell <- t_clinical_nobiop
pData(t_clinical_outcomecell)[["outcomecell"]] <- t_clinical_outcomecell_fact
t_clinical_outcomecell <- set_expt_conditions(t_clinical_outcomecell, fact = "outcomecell")## The numbers of samples by condition are:
##
## cure_eosinophils cure_monocytes cure_neutrophils failure_eosinophils
## 17 21 20 9
## failure_monocytes failure_neutrophils
## 21 21
t_clinical_outcomecell_de <- all_pairwise(t_clinical_outcomecell, keepers = outcometype_contrasts,
model_batch = "svaseq")##
## cure_eosinophils cure_monocytes cure_neutrophils failure_eosinophils
## 17 21 20 9
## failure_monocytes failure_neutrophils
## 21 21
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, cure_monocytes_vs_cure_eosinophils and
## edger, cure_monocytes_vs_cure_eosinophils failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, cure_neutrophils_vs_cure_eosinophils and
## edger, cure_neutrophils_vs_cure_eosinophils failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, cure_monocytes_vs_cure_eosinophils and
## limma, cure_monocytes_vs_cure_eosinophils failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, cure_neutrophils_vs_cure_eosinophils and
## limma, cure_neutrophils_vs_cure_eosinophils failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, cure_monocytes_vs_cure_eosinophils and
## noiseq, cure_monocytes_vs_cure_eosinophils failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, cure_neutrophils_vs_cure_eosinophils and
## noiseq, cure_neutrophils_vs_cure_eosinophils failed.
mixed_fstring <- "~ 0 + condition + visitnumber + (1|donor)"
t_clinical_outcomecell_dream <- dream_pairwise(t_clinical_outcomecell,
alt_model = as.formula(mixed_fstring),
keepers = outcometype_contrasts)## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
t_clinical_outcomecell_table <- write_de_table(t_clinical_outcomecell_dream,
type = "limma",
excel = glue("excel/mixed_clinical_outcomecell-v{ver}.xlsx"))big_table <- t_cf_clinicalnb_table_sva[["data"]][["outcome"]]
merged <- merge(big_table, all_dream_result, by = "row.names")## Error in h(simpleError(msg, call)): error in evaluating the argument 'y' in selecting a method for function 'merge': object 'all_dream_result' not found
rownames(merged) <- merged[["Row.names"]]
merged[["Row.names"]] <- NULL
cor_value <- cor.test(merged[["logFC"]], merged[["deseq_logfc"]])
cor_value##
## Pearson's product-moment correlation
##
## data: merged[["logFC"]] and merged[["deseq_logfc"]]
## t = 177, df = 10530, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8601 0.8697
## sample estimates:
## cor
## 0.865
test_aucc <- calculate_aucc(big_table, tbl2 = monocyte_dream_result,
px = "deseq_adjp", py = "adj.P.Val",
lx = "deseq_logfc", ly = "logFC")
test_aucc## These two tables have an aucc value of: 0.215501890577542 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 80, df = 10860, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5952 0.6190
## sample estimates:
## cor
## 0.6072
logfc_plotter <- plot_linear_scatter(merged[, c("logFC", "deseq_logfc")])
logfc_plot <- logfc_plotter[["scatter"]] +
xlab("Dream log2FC with (1|donor) and visit in model") +
ylab("DESeq2 log2FC: Default pairwise comparison")
pp(file = "images/compare_cf_and_dream_clinical_samples.png")
logfc_plot
dev.off()## png
## 2
logfc_plotcor_value <- cor.test(merged[["P.Value"]], merged[["deseq_adjp"]], method = "spearman")## Warning in cor.test.default(merged[["P.Value"]], merged[["deseq_adjp"]], :
## Cannot compute exact p-value with ties
cor_value##
## Spearman's rank correlation rho
##
## data: merged[["P.Value"]] and merged[["deseq_adjp"]]
## S = 8.3e+10, p-value <2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.5745
adjp_plotter <- plot_linear_scatter(merged[, c("P.Value", "deseq_adjp")])## Warning in lmrob.S(x, y, control = control): S refinements did not converge (to
## refine.tol=1e-07) in 200 (= k.max) steps
## Warning in lmrob.fit(x, y, control, init = init): initial estim. 'init' not
## converged -- will be return()ed basically unchanged
adjp_plot <- adjp_plotter[["scatter"]] +
xlab("DESeq2 adjp: Dream not-adjusted p-value") +
ylab("DESeq2 adjp: Default pairwise comparison")
pp(file = "images/compare_cf_and_visit_in_model_monocyte_adjp.svg")
adjp_plot
dev.off()## png
## 2
adjp_plotprevious_sig_idx <- merged[["deseq_adjp"]] <= 0.05 & abs(merged[["deseq_logfc"]] >= 1.0)
summary(previous_sig_idx)## Mode FALSE TRUE
## logical 10416 116
previous_genes <- rownames(merged)[previous_sig_idx]
new_sig_idx <- abs(merged[["logFC"]]) >= 1.0 & merged[["P.Value"]] < 0.05
summary(new_sig_idx)## Mode FALSE TRUE
## logical 10467 65
new_genes <- rownames(merged)[new_sig_idx]
na_idx <- is.na(new_genes)
new_genes <- new_genes[!na_idx]
annot <- fData(t_monocytes)
compare <- Vennerable::Venn(list("previous" = previous_genes, "new" = new_genes))
shared_genes <- compare@IntersectionSets[["11"]]
name_idx <- rownames(annot) %in% shared_genes
annot[name_idx, ]## [1] ensembl_gene_id ensembl_transcript_id version
## [4] transcript_version description gene_biotype
## [7] cds_length chromosome_name strand
## [10] start_position end_position hgnc_symbol
## [13] uniprot_gn_symbol transcript mean_cds_len
## <0 rows> (or 0-length row.names)
Let us use the overlap_sig() from above to see how similar this result is to our DESeq2+SVA.
all_dream_table <- all_dream_de[["all_tables"]][["failure_vs_cure"]]## Error in eval(expr, envir, enclos): object 'all_dream_de' not found
overlap_sig(all_dream_table)## Error in eval(expr, envir, enclos): object 'all_dream_table' not found
overlap_sig(all_dream_table, direction = "gt", mixed_pcol = "z.std", mixed_cutoff = 1.5)## Error in eval(expr, envir, enclos): object 'all_dream_table' not found
all_dream_table_svs <- svs_all_dream_de[["all_tables"]][["failure_vs_cure"]]## Error in eval(expr, envir, enclos): object 'svs_all_dream_de' not found
overlap_sig(all_dream_table_svs)## Error in eval(expr, envir, enclos): object 'all_dream_table_svs' not found
overlap_sig(all_dream_table_svs, direction = "gt", mixed_pcol = "z.std", mixed_cutoff = 1.5)## Error in eval(expr, envir, enclos): object 'all_dream_table_svs' not found
One figure I did not create is a venn diagram showing the overlap of the eosionphil, neutrophil, and monocyte results and the 10 genes shared among them all. At least in theory I should be easily able to create a similar/identical plot.
observed_eosinophils <- c(
rownames(t_cf_eosinophil_sig_sva[["deseq"]][["ups"]][["outcome"]]),
rownames(t_cf_eosinophil_sig_sva[["deseq"]][["downs"]][["outcome"]]))
observed_monocytes <- c(
rownames(t_cf_monocyte_sig_sva[["deseq"]][["ups"]][["outcome"]]),
rownames(t_cf_monocyte_sig_sva[["deseq"]][["downs"]][["outcome"]]))
observed_neutrophils <- c(
rownames(t_cf_neutrophil_sig_sva[["deseq"]][["ups"]][["outcome"]]),
rownames(t_cf_neutrophil_sig_sva[["deseq"]][["downs"]][["outcome"]]))
venn_input <- list(
"eosinophil" = observed_eosinophils,
"monocyte" = observed_monocytes,
"neutrophils" = observed_neutrophils)
shared <- Vennerable::Venn(venn_input)
shared## A Venn object on 3 sets named
## eosinophil,monocyte,neutrophils
## 000 100 010 110 001 101 011 111
## 0 136 81 10 106 33 9 12
Vennerable::plot(shared)intersect <- "eosinophil:monocyte:neutrophils"
celltype_upset <- UpSetR::upset(UpSetR::fromList(venn_input), text.scale = 2)
celltype_upsetcelltype_shared_genes <- overlap_groups(venn_input)
celltype_geneids <- overlap_geneids(celltype_shared_genes, intersect)
ids <- attr(celltype_shared_genes, "elements")[celltype_shared_genes[[intersect]]]
ids## eosinophil4 eosinophil6 eosinophil7 eosinophil9
## "ENSG00000089012" "ENSG00000137959" "ENSG00000115155" "ENSG00000165949"
## eosinophil23 eosinophil24 eosinophil28 eosinophil41
## "ENSG00000186654" "ENSG00000248405" "ENSG00000188672" "ENSG00000177294"
## eosinophil46 eosinophil52 eosinophil54 eosinophil120
## "ENSG00000134321" "ENSG00000214872" "ENSG00000184979" "ENSG00000196526"
rows <- fData(t_monocytes)[ids, ]
rows[["hgnc_symbol"]]## [1] "SIRPG" "IFI44L" "OTOF" "IFI27" "PRR5"
## [6] "PRR5-ARHGAP8" "RHCE" "FBXO39" "RSAD2" "SMTNL1"
## [11] "USP18" "AFAP1"
Note to self, when I rendered the html, stupid R ran out of temp files and so did not actually print the darn html document, as a result I modified the render function to try to make sure there is a clean directory in which to work; testing now. If it continues to not work, I will need to remove some of the images created in this document.
Maria Adelaida has asked about the distribution of (non)adjusted p-values produced by the various methods we employed. I use BH by default; so lets take a moment to examine the distribution of p-values and how they get adjusted by BH and a few of the other methods.
dream_pvalues <- all_dream_table[["P.Value"]]## Error in eval(expr, envir, enclos): object 'all_dream_table' not found
names(dream_pvalues) <- rownames(all_dream_table)## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'all_dream_table' not found
deseq_pvalues <- t_cf_clinicalnb_table_sva[["data"]][["outcome"]][["deseq_p"]]
names(deseq_pvalues) <- rownames(t_cf_clinicalnb_table_sva[["data"]][["outcome"]])
## Note, my xlsx files provide these images.
plot_histogram(dream_pvalues)## Error in eval(expr, envir, enclos): object 'dream_pvalues' not found
plot_histogram(deseq_pvalues)Immediately we see that the values produced have very different distributions and that, though there are many low p-values produced by dream, they are far fewer than observed by deseq.
Now consider the BH correction; using it, we rank order the p-values from lowest to highest. Then we choose a denominator for every p-value which ranges from 1 to the number of elements in the set of p-values. Finally we take the minimum between 1 and the cumulative minimum of (#pvalues/denominator) * that-pvalue. Written out the process looks like this:
test_pvalues <- deseq_pvalues
idx <- order(test_pvalues)
test_pvalues <- test_pvalues[idx]
num_pvalues <- length(test_pvalues)
new_pvalues <- test_pvalues
for (i in seq_along(test_pvalues)) {
element <- test_pvalues[i]
new_pvalues[i] <- min(1, cummin((num_pvalues / i) * element))
}
test_against <- p.adjust(test_pvalues, method = "BH")So, consider for a moment the first p-values produced by deseq: 1.195e-24, 3.489e-22, 9.612e-22, 4.853e-18, 9.864e-15, 3.275e-14
The new p-values will be the (number of genes / the current position) * the current element
In contrast, consider the first few values from dream ordered in the same fashion: 2.162e-07, 3.757e-05, 8.119e-05, 1.664e-04, 3.123e-04, 5.600e-04
These start at values which are 1e17 higher than those from DESeq and so we can expect the resulting values to end up starting at ~ 5e11 higher than similar values. Thus when we do the math (and be amused at the fact that the number of p-values in the table is a factor of 2,3,4,5,6):
11910 * 2.16e-07: 0.002573 5955 * 3.757e-5: 0.223711 3970 * 8.119e-5: 0.322297 2978 * 1.664e-4: 0.4955 2382 * 3.123e-4: 0.743836 1985 * 5.600e-4: 1.112 which is caught by pmin() and reset to 1.
Having performed all of the above, let us plot some of the results with a few labels of the top-10 genes on each side of the contrasts.
num_color <- color_choices[["clinic_cf"]][["tumaco_failure"]]
den_color <- color_choices[["clinic_cf"]][["tumaco_cure"]]
cf_monocyte_table <- t_cf_monocyte_table_sva[["data"]][["outcome"]]
cf_monocyte_volcano <- plot_volcano_condition_de(
cf_monocyte_table, "outcome", label = expected_genes,
fc_col = "deseq_logfc", p_col = "deseq_adjp", line_position = NULL,
color_high = num_color, color_low = den_color, label_size = 6)
pp(file = "figures/cf_monocyte_volcano_labeled.svg")
cf_monocyte_volcano[["plot"]]
dev.off()## png
## 2
cf_monocyte_volcano[["plot"]]cf_monocyte_volcano_top10 <- plot_volcano_condition_de(
cf_monocyte_table, "outcome", label = 10,
fc_col = "deseq_logfc", p_col = "deseq_adjp", line_position = NULL,
color_high = num_color, color_low = den_color, label_size = 6)
pp(file = glue("images/cf_monocyte_volcano_labeled_top10-v{ver}.svg"))
cf_monocyte_volcano_top10[["plot"]]
dev.off()## png
## 2
cf_monocyte_volcano_top10[["plot"]]cf_eosinophil_table <- t_cf_eosinophil_table_sva[["data"]][["outcome"]]
cf_eosinophil_volcano <- plot_volcano_condition_de(
cf_eosinophil_table, "outcome", label = expected_genes,
fc_col = "deseq_logfc", p_col = "deseq_adjp", line_position = NULL,
color_high = num_color, color_low = den_color, label_size = 6)
pp(file = "figures/cf_eosinophil_volcano_labeled.svg")
cf_eosinophil_volcano[["plot"]]
dev.off()## png
## 2
cf_eosinophil_volcano[["plot"]]cf_eosinophil_volcano_top10 <- plot_volcano_condition_de(
cf_eosinophil_table, "outcome", label = 10,
fc_col = "deseq_logfc", p_col = "deseq_adjp", line_position = NULL,
color_high = num_color, color_low = den_color, label_size = 6)
pp(file = glue("images/cf_eosinophil_volcano_labeled_top10-v{ver}.svg"))
cf_eosinophil_volcano_top10[["plot"]]
dev.off()## png
## 2
cf_eosinophil_volcano_top10[["plot"]]cf_neutrophil_table <- t_cf_neutrophil_table_sva[["data"]][["outcome"]]
cf_neutrophil_volcano <- plot_volcano_condition_de(
cf_neutrophil_table, "outcome", label = expected_genes,
fc_col = "deseq_logfc", p_col = "deseq_adjp", line_position = NULL,
color_high = num_color, color_low = den_color, label_size = 6)
pp(file = "figures/cf_neutrophil_volcano_labeled.svg")
cf_neutrophil_volcano[["plot"]]
dev.off()## png
## 2
cf_neutrophil_volcano[["plot"]]cf_neutrophil_volcano_top10 <- plot_volcano_condition_de(
cf_neutrophil_table, "outcome", label = 10,
fc_col = "deseq_logfc", p_col = "deseq_adjp", line_position = NULL,
color_high = num_color, color_low = den_color, label_size = 6)
pp(file = glue("images/cf_neutrophil_volcano_labeled_top10-v{ver}.svg"))
cf_neutrophil_volcano_top10[["plot"]]
dev.off()## png
## 2
cf_neutrophil_volcano_top10[["plot"]]t_cf_eosinophil_v1_de_sva <- all_pairwise(tv1_eosinophils, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 5 3
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_eosinophil_v1_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8703
## basic_vs_dream 0.8892
## basic_vs_ebseq 0.8820
## basic_vs_edger 0.8716
## basic_vs_limma 0.9207
## basic_vs_noiseq 0.8842
## deseq_vs_dream 0.8326
## deseq_vs_ebseq 0.8647
## deseq_vs_edger 0.9996
## deseq_vs_limma 0.8464
## deseq_vs_noiseq 0.8575
## dream_vs_ebseq 0.8401
## dream_vs_edger 0.8359
## dream_vs_limma 0.9865
## dream_vs_noiseq 0.8565
## ebseq_vs_edger 0.8681
## ebseq_vs_limma 0.8493
## ebseq_vs_noiseq 0.9975
## edger_vs_limma 0.8498
## edger_vs_noiseq 0.8613
## limma_vs_noiseq 0.8644
t_cf_eosinophil_v1_table_sva <- combine_de_tables(
t_cf_eosinophil_v1_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_v1_cf_table_sva-v{ver}.xlsx"))## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_eosinophil_v1_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 13 19 11
## edger_sigdown limma_sigup limma_sigdown
## 1 10 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_eosinophil_v1_sig_sva <- extract_significant_genes(
t_cf_eosinophil_v1_table_sva,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_v1_cf_sig_sva-v{ver}.xlsx"))
t_cf_eosinophil_v1_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 13 19 11
## edger_sigdown limma_sigup limma_sigdown
## 1 10 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
dim(t_cf_eosinophil_v1_sig_sva[["deseq"]][["ups"]][[1]])## [1] 13 84
dim(t_cf_eosinophil_v1_sig_sva[["deseq"]][["downs"]][[1]])## [1] 19 84
t_cf_eosinophil_v2_de_sva <- all_pairwise(tv2_eosinophils, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 6 3
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_eosinophil_v2_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8446
## basic_vs_dream 0.8578
## basic_vs_ebseq 0.8616
## basic_vs_edger 0.8465
## basic_vs_limma 0.8936
## basic_vs_noiseq 0.8899
## deseq_vs_dream 0.8310
## deseq_vs_ebseq 0.8802
## deseq_vs_edger 0.9996
## deseq_vs_limma 0.8540
## deseq_vs_noiseq 0.8632
## dream_vs_ebseq 0.7282
## dream_vs_edger 0.8348
## dream_vs_limma 0.9758
## dream_vs_noiseq 0.8017
## ebseq_vs_edger 0.8815
## ebseq_vs_limma 0.7589
## ebseq_vs_noiseq 0.9116
## edger_vs_limma 0.8581
## edger_vs_noiseq 0.8662
## limma_vs_noiseq 0.8158
t_cf_eosinophil_v2_table_sva <- combine_de_tables(
t_cf_eosinophil_v2_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_v2_cf_table_sva-v{ver}.xlsx"))## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_eosinophil_v2_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 9 4 10
## edger_sigdown limma_sigup limma_sigdown
## 1 1 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_eosinophil_v2_sig_sva <- extract_significant_genes(
t_cf_eosinophil_v2_table_sva,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_v2_cf_sig_sva-v{ver}.xlsx"))
t_cf_eosinophil_v2_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 0 0 10 1 9 4 9
## ebseq_down basic_up basic_down
## outcome 17 0 0
dim(t_cf_eosinophil_v2_sig_sva[["deseq"]][["ups"]][[1]])## [1] 9 84
dim(t_cf_eosinophil_v2_sig_sva[["deseq"]][["downs"]][[1]])## [1] 4 84
t_cf_eosinophil_v3_de_sva <- all_pairwise(tv3_eosinophils, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## tumaco_cure tumaco_failure
## 6 3
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## the design formula contains one or more numeric variables with integer values,
## specifying a model with increasing fold change for higher values.
## did you mean for this to be a factor? if so, first convert
## this variable to a factor using the factor() function
## the design formula contains one or more numeric variables with integer values,
## specifying a model with increasing fold change for higher values.
## did you mean for this to be a factor? if so, first convert
## this variable to a factor using the factor() function
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_cf_eosinophil_v3_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## tmc_flr___
## basic_vs_deseq 0.8019
## basic_vs_dream 0.8287
## basic_vs_ebseq 0.8927
## basic_vs_edger 0.8021
## basic_vs_limma 0.8620
## basic_vs_noiseq 0.8925
## deseq_vs_dream 0.8661
## deseq_vs_ebseq 0.8832
## deseq_vs_edger 1.0000
## deseq_vs_limma 0.9137
## deseq_vs_noiseq 0.8126
## dream_vs_ebseq 0.7588
## dream_vs_edger 0.8663
## dream_vs_limma 0.9683
## dream_vs_noiseq 0.8097
## ebseq_vs_edger 0.8833
## ebseq_vs_limma 0.7984
## ebseq_vs_noiseq 0.9402
## edger_vs_limma 0.9138
## edger_vs_noiseq 0.8129
## limma_vs_noiseq 0.7990
t_cf_eosinophil_v3_table_sva <- combine_de_tables(
t_cf_eosinophil_v3_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_v3_cf_table_sva-v{ver}.xlsx"))## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_eosinophil_v3_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure 68 29 73
## edger_sigdown limma_sigup limma_sigdown
## 1 10 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_cf_eosinophil_v3_sig_sva <- extract_significant_genes(
t_cf_eosinophil_v3_table_sva,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_v3_cf_sig_sva-v{ver}.xlsx"))
t_cf_eosinophil_v3_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome 0 0 73 10 68 29 2
## ebseq_down basic_up basic_down
## outcome 9 0 0
dim(t_cf_eosinophil_v3_sig_sva[["deseq"]][["ups"]][[1]])## [1] 68 84
dim(t_cf_eosinophil_v3_sig_sva[["deseq"]][["downs"]][[1]])## [1] 29 84
sva_aucc <- calculate_aucc(t_cf_eosinophil_table_sva[["data"]][[1]],
tbl2 = t_cf_eosinophil_table_batchvisit[["data"]][[1]],
py = "deseq_adjp", ly = "deseq_logfc")
sva_aucc## These two tables have an aucc value of: 0.576029928864987 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 152, df = 10530, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.823 0.835
## sample estimates:
## cor
## 0.8291
shared_ids <- rownames(t_cf_eosinophil_table_sva[["data"]][[1]]) %in%
rownames(t_cf_eosinophil_table_batchvisit[["data"]][[1]])
first <- t_cf_eosinophil_table_sva[["data"]][[1]][shared_ids, ]
second <- t_cf_eosinophil_table_batchvisit[["data"]][[1]][rownames(first), ]
cor.test(first[["deseq_logfc"]], second[["deseq_logfc"]])##
## Pearson's product-moment correlation
##
## data: first[["deseq_logfc"]] and second[["deseq_logfc"]]
## t = 152, df = 10530, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.823 0.835
## sample estimates:
## cor
## 0.8291
t_mono_neut_sva_aucc <- calculate_aucc(t_cf_monocyte_table_sva[["data"]][["outcome"]],
tbl2 = t_cf_neutrophil_table_sva[["data"]][["outcome"]],
py = "deseq_adjp", ly = "deseq_logfc")
t_mono_neut_sva_aucc## These two tables have an aucc value of: 0.204316386168083 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 43, df = 8577, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4028 0.4376
## sample estimates:
## cor
## 0.4203
t_mono_eo_sva_aucc <- calculate_aucc(t_cf_monocyte_table_sva[["data"]][["outcome"]],
tbl2 = t_cf_eosinophil_table_sva[["data"]][["outcome"]],
py = "deseq_adjp", ly = "deseq_logfc")
t_mono_eo_sva_aucc## These two tables have an aucc value of: 0.0963678364630121 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 22, df = 9765, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2015 0.2393
## sample estimates:
## cor
## 0.2205
t_neut_eo_sva_aucc <- calculate_aucc(t_cf_neutrophil_table_sva[["data"]][["outcome"]],
tbl2 = t_cf_eosinophil_table_sva[["data"]][["outcome"]],
py = "deseq_adjp", ly = "deseq_logfc")
t_neut_eo_sva_aucc## These two tables have an aucc value of: 0.20148477670576 and correlation:
##
## Pearson's product-moment correlation
##
## data: tbl[[lx]] and tbl[[ly]]
## t = 42, df = 8571, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3973 0.4323
## sample estimates:
## cor
## 0.415
For these contrasts, we want to see fail_v1 vs. cure_v1, fail_v2 vs. cure_v2 etc. As a result, we will need to juggle the data slightly and add another set of contrasts.
t_visit_cf_all_de_sva <- all_pairwise(t_visitcf, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## v1_cure v1_failure v2_cure v2_failure v3_cure v3_failure
## 30 24 20 15 17 17
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_visit_cf_all_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_visit_cf_all_table_sva <- combine_de_tables(
t_visit_cf_all_de_sva, keepers = visitcf_contrasts, scale_p = TRUE,
excel = glue("{cf_prefix}/t_all_visitcf_table_sva-v{ver}.xlsx"))## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_visit_cf_all_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 v1_failure_vs_v1_cure 26 76 26 58
## 2 v2_failure_vs_v2_cure 51 41 43 28
## 3 v3_failure_vs_v3_cure 77 32 33 25
## limma_sigup limma_sigdown
## 1 9 17
## 2 3 0
## 3 3 0
## Plot describing unique/shared genes in a differential expression table.
t_visit_cf_all_sig_sva <- extract_significant_genes(
t_visit_cf_all_table_sva,
excel = glue("{cf_prefix}/t_all_visitcf_sig_sva-v{ver}.xlsx"))
t_visit_cf_all_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## v1cf 9 17 26 58 26 76 0
## v2cf 3 0 43 28 51 41 0
## v3cf 3 0 33 25 77 32 1
## ebseq_down basic_up basic_down
## v1cf 37 0 0
## v2cf 0 0 0
## v3cf 0 0 0
In the following block, I am including all samples for the monocytes and splitting them up by visit and then comparing v1 cure/fail, v2 cure/fail, v3 cure/fail.
I expect that this should be more robust than the datasets of only visit 1.
visitcf_factor <- paste0("v", pData(t_monocytes)[["visitnumber"]], "_",
pData(t_monocytes)[["finaloutcome"]])
t_monocytes_visitcf <- set_expt_conditions(t_monocytes, fact = visitcf_factor)## The numbers of samples by condition are:
##
## v1_cure v1_failure v2_cure v2_failure v3_cure v3_failure
## 8 8 7 6 6 7
t_visit_cf_monocyte_de_sva <- all_pairwise(t_monocytes_visitcf, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## v1_cure v1_failure v2_cure v2_failure v3_cure v3_failure
## 8 8 7 6 6 7
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_visit_cf_monocyte_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_visit_cf_monocyte_table_sva <- combine_de_tables(
t_visit_cf_monocyte_de_sva, keepers = visitcf_contrasts, scale_p = TRUE,
excel = glue("{cf_prefix}/Monocytes/t_monocyte_visitcf_table_sva-v{ver}.xlsx"))## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_visit_cf_monocyte_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 v1_failure_vs_v1_cure 15 10 10 13
## 2 v2_failure_vs_v2_cure 0 0 0 0
## 3 v3_failure_vs_v3_cure 0 0 0 0
## limma_sigup limma_sigdown
## 1 1 1
## 2 0 0
## 3 0 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
t_visit_cf_monocyte_sig_sva <- extract_significant_genes(
t_visit_cf_monocyte_table_sva,
excel = glue("{cf_prefix}/Monocytes/t_monocyte_visitcf_sig_sva-v{ver}.xlsx"))
t_visit_cf_monocyte_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## v1cf 1 1 10 13 15 10 0
## v2cf 0 0 0 0 0 0 1
## v3cf 0 0 0 0 0 0 0
## ebseq_down basic_up basic_down
## v1cf 15 0 0
## v2cf 5 0 0
## v3cf 1 0 0
t_v1fc_deseq_ma <- t_visit_cf_monocyte_table_sva[["plots"]][["v1cf"]][["deseq_ma_plots"]]
dev <- pp(file = "images/monocyte_cf_de_v1_maplot.png")
t_v1fc_deseq_ma
closed <- dev.off()
t_v1fc_deseq_mat_v2fc_deseq_ma <- t_visit_cf_monocyte_table_sva[["plots"]][["v2cf"]][["deseq_ma_plots"]]
dev <- pp(file = "images/monocyte_cf_de_v2_maplot.png")
t_v2fc_deseq_ma
closed <- dev.off()
t_v2fc_deseq_mat_v3fc_deseq_ma <- t_visit_cf_monocyte_table_sva[["plots"]][["v3cf"]][["deseq_ma_plots"]]
dev <- pp(file = "images/monocyte_cf_de_v3_maplot.png")
t_v3fc_deseq_ma
closed <- dev.off()
t_v3fc_deseq_maOne query from Alejandro is to look at the genes shared up/down across visits. I am not entirely certain we have enough samples for this to work, but let us find out.
I am thinking this is a good place to use the AUCC curves I learned about thanks to Julie Cridland.
Note that the following is all monocyte samples, this should therefore potentially be moved up and a version of this with only the Tumaco samples put here?
v1cf <- t_visit_cf_monocyte_table_sva[["data"]][["v1cf"]]
v2cf <- t_visit_cf_monocyte_table_sva[["data"]][["v2cf"]]
v3cf <- t_visit_cf_monocyte_table_sva[["data"]][["v3cf"]]
v1_sig <- c(
rownames(t_visit_cf_monocyte_sig_sva[["deseq"]][["ups"]][["v1cf"]]),
rownames(t_visit_cf_monocyte_sig_sva[["deseq"]][["downs"]][["v1cf"]]))
length(v1_sig)## [1] 25
v2_sig <- c(
rownames(t_visit_cf_monocyte_sig_sva[["deseq"]][["ups"]][["v2cf"]]),
rownames(t_visit_cf_monocyte_sig_sva[["deseq"]][["downs"]][["v2cf"]]))
length(v2_sig)## [1] 0
v3_sig <- c(
rownames(t_visit_cf_monocyte_sig_sva[["deseq"]][["ups"]][["v2cf"]]),
rownames(t_visit_cf_monocyte_sig_sva[["deseq"]][["downs"]][["v2cf"]]))
length(v3_sig)## [1] 0
t_monocyte_visit_aucc_v2v1 <- calculate_aucc(v1cf, tbl2 = v2cf,
py = "deseq_adjp", ly = "deseq_logfc")
dev <- pp(file = "images/monocyte_visit_v2v1_aucc.png")
t_monocyte_visit_aucc_v2v1[["plot"]]
closed <- dev.off()
t_monocyte_visit_aucc_v2v1[["plot"]]t_monocyte_visit_aucc_v3v1 <- calculate_aucc(v1cf, tbl2 = v3cf,
py = "deseq_adjp", ly = "deseq_logfc")
dev <- pp(file = "images/monocyte_visit_v3v1_aucc.png")
t_monocyte_visit_aucc_v3v1[["plot"]]
closed <- dev.off()
t_monocyte_visit_aucc_v3v1[["plot"]]visitcf_factor <- paste0("v", pData(t_neutrophils)[["visitnumber"]], "_",
pData(t_neutrophils)[["finaloutcome"]])
t_neutrophil_visitcf <- set_expt_conditions(t_neutrophils, fact = visitcf_factor)## The numbers of samples by condition are:
##
## v1_cure v1_failure v2_cure v2_failure v3_cure v3_failure
## 8 8 7 6 5 7
t_visit_cf_neutrophil_de_sva <- all_pairwise(t_neutrophil_visitcf, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## v1_cure v1_failure v2_cure v2_failure v3_cure v3_failure
## 8 8 7 6 5 7
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
t_visit_cf_neutrophil_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_visit_cf_neutrophil_table_sva <- combine_de_tables(
t_visit_cf_neutrophil_de_sva, keepers = visitcf_contrasts, scale_p = TRUE,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_visitcf_table_sva-v{ver}.xlsx"))## Deleting the file analyses/4_tumaco/DE_Cure_Fail/Neutrophils/t_neutrophil_visitcf_table_sva-v202501.xlsx before writing the tables.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_visit_cf_neutrophil_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 v1_failure_vs_v1_cure 12 6 6 6
## 2 v2_failure_vs_v2_cure 2 6 2 3
## 3 v3_failure_vs_v3_cure 2 2 0 2
## limma_sigup limma_sigdown
## 1 1 0
## 2 0 0
## 3 0 0
## Plot describing unique/shared genes in a differential expression table.
t_visit_cf_neutrophil_sig_sva <- extract_significant_genes(
t_visit_cf_neutrophil_table_sva,
excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_visitcf_sig_sva-v{ver}.xlsx"))## Deleting the file analyses/4_tumaco/DE_Cure_Fail/Neutrophils/t_neutrophil_visitcf_sig_sva-v202501.xlsx before writing the tables.
t_visit_cf_neutrophil_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## v1cf 1 0 6 6 12 6 0
## v2cf 0 0 2 3 2 6 1
## v3cf 0 0 0 2 2 2 2
## ebseq_down basic_up basic_down
## v1cf 2 0 0
## v2cf 1 0 0
## v3cf 3 0 0
visitcf_factor <- paste0("v", pData(t_eosinophils)[["visitnumber"]], "_",
pData(t_eosinophils)[["finaloutcome"]])
t_eosinophil_visitcf <- set_expt_conditions(t_eosinophils, fact = visitcf_factor)## The numbers of samples by condition are:
##
## v1_cure v1_failure v2_cure v2_failure v3_cure v3_failure
## 5 3 6 3 6 3
t_visit_cf_eosinophil_de_sva <- all_pairwise(t_eosinophil_visitcf, model_batch = "svaseq",
filter = TRUE,
methods = methods, keepers = visitcf_contrasts)##
## v1_cure v1_failure v2_cure v2_failure v3_cure v3_failure
## 5 3 6 3 6 3
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, v2_cure_vs_v1_cure and edger,
## v2_cure_vs_v1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, v2_failure_vs_v1_cure and edger,
## v2_failure_vs_v1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, v2_cure_vs_v1_cure and limma,
## v2_cure_vs_v1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, v2_failure_vs_v1_cure and limma,
## v2_failure_vs_v1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, v2_cure_vs_v1_cure and noiseq,
## v2_cure_vs_v1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, v2_failure_vs_v1_cure and noiseq,
## v2_failure_vs_v1_cure failed.
t_visit_cf_eosinophil_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_visit_cf_eosinophil_table_sva <- combine_de_tables(
t_visit_cf_eosinophil_de_sva, keepers = visitcf_contrasts, scale_p = TRUE,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_visitcf_table_sva-v{ver}.xlsx"))## Deleting the file analyses/4_tumaco/DE_Cure_Fail/Eosinophils/t_eosinophil_visitcf_table_sva-v202501.xlsx before writing the tables.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_visit_cf_eosinophil_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 v1_failure_vs_v1_cure 9 11 2 3
## 2 v2_failure_vs_v2_cure 4 3 5 2
## 3 v3_failure_vs_v3_cure 14 7 17 2
## limma_sigup limma_sigdown
## 1 0 1
## 2 0 0
## 3 0 0
## Plot describing unique/shared genes in a differential expression table.
t_visit_cf_eosinophil_sig_sva <- extract_significant_genes(
t_visit_cf_eosinophil_table_sva,
excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_visitcf_sig_sva-v{ver}.xlsx"))## Deleting the file analyses/4_tumaco/DE_Cure_Fail/Eosinophils/t_eosinophil_visitcf_sig_sva-v202501.xlsx before writing the tables.
t_visit_cf_eosinophil_sig_sva## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## v1cf 0 1 2 3 9 11 4
## v2cf 0 0 5 2 4 3 11
## v3cf 0 0 17 2 14 7 3
## ebseq_down basic_up basic_down
## v1cf 86 0 0
## v2cf 18 0 0
## v3cf 10 0 0
Having put some SL read mapping information in the sample sheet, Maria Adelaida added a new column using it with the putative persistence state on a per-sample basis. One question which arised from that: what differences are observable between the persistent yes vs. no samples on a per-cell-type basis among the visit 3 samples.
First things first, create the datasets.
persistence_expt <- subset_expt(t_clinical, subset = "persistence=='Y'|persistence=='N'") %>%
subset_expt(subset = 'visitnumber==3') %>%
set_expt_conditions(fact = 'persistence')## subset_expt(): There were 123, now there are 83 samples.
## subset_expt(): There were 83, now there are 30 samples.
## The numbers of samples by condition are:
##
## N Y
## 6 24
## persistence_biopsy <- subset_expt(persistence_expt, subset = "typeofcells=='biopsy'")
persistence_monocyte <- subset_expt(persistence_expt, subset = "typeofcells=='monocytes'")## subset_expt(): There were 30, now there are 12 samples.
persistence_neutrophil <- subset_expt(persistence_expt, subset = "typeofcells=='neutrophils'")## subset_expt(): There were 30, now there are 10 samples.
persistence_eosinophil <- subset_expt(persistence_expt, subset = "typeofcells=='eosinophils'")## subset_expt(): There were 30, now there are 8 samples.
See if there are any patterns which look usable.
## All
persistence_norm <- normalize_expt(persistence_expt, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)## Removing 2767 low-count genes (11389 remaining).
## transform_counts: Found 15 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_norm)[["plot"]]persistence_nb <- normalize_expt(persistence_expt, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)## Removing 2767 low-count genes (11389 remaining).
## Setting 1544 low elements to zero.
## transform_counts: Found 1544 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_nb)[["plot"]]## Biopsies
##persistence_biopsy_norm <- normalize_expt(persistence_biopsy, transform = "log2", convert = "cpm",
## norm = "quant", filter = TRUE)
##plot_pca(persistence_biopsy_norm)[["plot"]]
## Insufficient data
## Monocytes
persistence_monocyte_norm <- normalize_expt(persistence_monocyte, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)## Removing 3827 low-count genes (10329 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_monocyte_norm)[["plot"]]persistence_monocyte_nb <- normalize_expt(persistence_monocyte, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)## Removing 3827 low-count genes (10329 remaining).
## Setting 47 low elements to zero.
## transform_counts: Found 47 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_monocyte_nb)[["plot"]]## Neutrophils
persistence_neutrophil_norm <- normalize_expt(persistence_neutrophil, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)## Removing 5762 low-count genes (8394 remaining).
## transform_counts: Found 2 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_neutrophil_norm)[["plot"]]persistence_neutrophil_nb <- normalize_expt(persistence_neutrophil, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)## Removing 5762 low-count genes (8394 remaining).
## Setting 46 low elements to zero.
## transform_counts: Found 46 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_neutrophil_nb)[["plot"]]## Eosinophils
persistence_eosinophil_norm <- normalize_expt(persistence_eosinophil, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)## Removing 4126 low-count genes (10030 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_eosinophil_norm)[["plot"]]persistence_eosinophil_nb <- normalize_expt(persistence_eosinophil, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)## Removing 4126 low-count genes (10030 remaining).
## Setting 25 low elements to zero.
## transform_counts: Found 25 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_eosinophil_nb)[["plot"]]This is pretty sparse and unlikely to yield any interesting results I am thinking.
persistence_de_sva <- all_pairwise(persistence_expt, filter = TRUE, methods = methods,
model_batch = "svaseq")##
## N Y
## 6 24
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
persistence_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## Y_vs_N
## basic_vs_deseq 0.7178
## basic_vs_dream 0.7992
## basic_vs_ebseq 0.7451
## basic_vs_edger 0.7791
## basic_vs_limma 0.8217
## basic_vs_noiseq 0.9152
## deseq_vs_dream 0.8040
## deseq_vs_ebseq 0.7777
## deseq_vs_edger 0.9605
## deseq_vs_limma 0.8112
## deseq_vs_noiseq 0.7448
## dream_vs_ebseq 0.7899
## dream_vs_edger 0.8695
## dream_vs_limma 0.9789
## dream_vs_noiseq 0.7236
## ebseq_vs_edger 0.7900
## ebseq_vs_limma 0.7876
## ebseq_vs_noiseq 0.8327
## edger_vs_limma 0.8765
## edger_vs_noiseq 0.8002
## limma_vs_noiseq 0.7477
persistence_table_sva <- combine_de_tables(
persistence_de_sva, scale_p = TRUE,
excel = glue("{xlsx_prefix}/DE_Persistence/persistence_all_de_sva-v{ver}.xlsx"))## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
persistence_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup
## 1 Y_vs_N 55 44 26 49 7
## limma_sigdown
## 1 22
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
persistence_monocyte_de_sva <- all_pairwise(persistence_monocyte, filter = TRUE,
model_batch = "svaseq",
methods = methods)##
## N Y
## 2 10
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
persistence_monocyte_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## Y_vs_N
## basic_vs_deseq 0.9237
## basic_vs_dream 0.9683
## basic_vs_ebseq 0.9209
## basic_vs_edger 0.9245
## basic_vs_limma 0.9858
## basic_vs_noiseq 0.9405
## deseq_vs_dream 0.9268
## deseq_vs_ebseq 0.9808
## deseq_vs_edger 1.0000
## deseq_vs_limma 0.9260
## deseq_vs_noiseq 0.9677
## dream_vs_ebseq 0.9180
## dream_vs_edger 0.9277
## dream_vs_limma 0.9821
## dream_vs_noiseq 0.9421
## ebseq_vs_edger 0.9809
## ebseq_vs_limma 0.9239
## ebseq_vs_noiseq 0.9823
## edger_vs_limma 0.9270
## edger_vs_noiseq 0.9685
## limma_vs_noiseq 0.9426
persistence_monocyte_table_sva <- combine_de_tables(
persistence_monocyte_de_sva, scale_p = TRUE,
excel = glue("{xlsx_prefix}/DE_Persistence/persistence_monocyte_de_sva-v{ver}.xlsx"))## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
persistence_monocyte_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup
## 1 Y_vs_N 1 0 0 1 0
## limma_sigdown
## 1 0
## Only Y_vs_N_up has information, cannot create an UpSet.
## Plot describing unique/shared genes in a differential expression table.
## NULL
persistence_neutrophil_de_sva <- all_pairwise(persistence_neutrophil, filter = TRUE,
model_batch = "svaseq",
methods = methods)##
## N Y
## 3 7
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
persistence_neutrophil_de_sva## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## Y_vs_N
## basic_vs_deseq 0.8270
## basic_vs_dream 0.8581
## basic_vs_ebseq 0.9144
## basic_vs_edger 0.8283
## basic_vs_limma 0.8808
## basic_vs_noiseq 0.9393
## deseq_vs_dream 0.9564
## deseq_vs_ebseq 0.7485
## deseq_vs_edger 0.9985
## deseq_vs_limma 0.9407
## deseq_vs_noiseq 0.8211
## dream_vs_ebseq 0.7597
## dream_vs_edger 0.9558
## dream_vs_limma 0.9858
## dream_vs_noiseq 0.8212
## ebseq_vs_edger 0.7601
## ebseq_vs_limma 0.7776
## ebseq_vs_noiseq 0.9725
## edger_vs_limma 0.9408
## edger_vs_noiseq 0.8250
## limma_vs_noiseq 0.8296
persistence_neutrophil_table_sva <- combine_de_tables(
persistence_neutrophil_de_sva, scale_p = TRUE,
excel = glue("{xlsx_prefix}/DE_Persistence/persistence_neutrophil_de_sva-v{ver}.xlsx"))## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
persistence_neutrophil_table_sva## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup
## 1 Y_vs_N 26 49 17 35 0
## limma_sigdown
## 1 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## There are insufficient samples (1) in the 'N' category.
##persistence_eosinophil_de_sva <- all_pairwise(persistence_eosinophil, filter = TRUE,
#model_batch = "svaseq",
## methods = methods)
##persistence_eosinophil_de_sva
##persistence_eosinophil_table_sva <- combine_de_tables(
## persistence_eosinophil_de_sva,
## excel = glue("{xlsx_prefix}/DE_Persistence/persistence_eosinophil_de_sva-v{ver}.xlsx"))In the following, I am hoping to lower variance associated with factors other than visit via sva and therefore be able to see what genes are changing for everyone with respect to time.
This is the one instance where I think it would be really nice to have biopsy samples for all three visits; I presume that we would have a really nice signal of stuff like keratin and other wound-healing associated genes.
t_visit_all_de_sva <- all_pairwise(t_visit, filter = TRUE, methods = methods,
model_batch = "svaseq")##
## 3 2 1
## 34 35 40
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## The contrast c2 is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast c3 is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast c3 is not in the results.
## If this is not an extra contrast, then this is an error.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of basic, c2_vs_c1 and ebseq, c2_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of basic, c3_vs_c1 and ebseq, c3_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of basic, c3_vs_c2 and ebseq, c3_vs_c2 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of deseq, c2_vs_c1 and ebseq, c2_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of deseq, c3_vs_c1 and ebseq, c3_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of deseq, c3_vs_c2 and ebseq, c3_vs_c2 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of dream, c2_vs_c1 and ebseq, c2_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of dream, c3_vs_c1 and ebseq, c3_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of dream, c3_vs_c2 and ebseq, c3_vs_c2 failed.
## Error in if (contr %in% extra_eval_names) {: argument is of length zero
t_visit_all_de_sva## Error in eval(expr, envir, enclos): object 't_visit_all_de_sva' not found
t_visit_all_table_sva <- combine_de_tables(
t_visit_all_de_sva, keepers = visit_contrasts, scale_p = TRUE,
excel = glue("{xlsx_prefix}/DE_Visits/t_all_visit_table_sva-v{ver}.xlsx"))## Error in eval(expr, envir, enclos): object 't_visit_all_de_sva' not found
t_visit_all_table_sva## Error in eval(expr, envir, enclos): object 't_visit_all_table_sva' not found
t_visit_all_sig_sva <- extract_significant_genes(
t_visit_all_table_sva,
excel = glue("{xlsx_prefix}/DE_Visits/t_all_visit_sig_sva-v{ver}.xlsx"))## Error in eval(expr, envir, enclos): object 't_visit_all_table_sva' not found
t_visit_all_sig_sva## Error in eval(expr, envir, enclos): object 't_visit_all_sig_sva' not found
t_visit_monocytes <- set_expt_conditions(t_monocytes, fact = "visitnumber")## The numbers of samples by condition are:
##
## 3 2 1
## 13 13 16
t_visit_monocyte_de_sva <- all_pairwise(t_visit_monocytes, filter = TRUE,
model_batch = "svaseq",
methods = methods)##
## 3 2 1
## 13 13 16
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## The contrast c2 is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast c3 is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast c3 is not in the results.
## If this is not an extra contrast, then this is an error.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of basic, c2_vs_c1 and ebseq, c2_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of basic, c3_vs_c1 and ebseq, c3_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of basic, c3_vs_c2 and ebseq, c3_vs_c2 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of deseq, c2_vs_c1 and ebseq, c2_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of deseq, c3_vs_c1 and ebseq, c3_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of deseq, c3_vs_c2 and ebseq, c3_vs_c2 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of dream, c2_vs_c1 and ebseq, c2_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of dream, c3_vs_c1 and ebseq, c3_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of dream, c3_vs_c2 and ebseq, c3_vs_c2 failed.
## Error in if (contr %in% extra_eval_names) {: argument is of length zero
t_visit_monocyte_de_sva## Error in eval(expr, envir, enclos): object 't_visit_monocyte_de_sva' not found
t_visit_monocyte_table_sva <- combine_de_tables(
t_visit_monocyte_de_sva, keepers = visit_contrasts, scale_p = TRUE,
excel = glue("{xlsx_prefix}/DE_Visits/Monocytes/t_monocyte_visit_table_sva-v{ver}.xlsx"))## Error in eval(expr, envir, enclos): object 't_visit_monocyte_de_sva' not found
t_visit_monocyte_table_sva## Error in eval(expr, envir, enclos): object 't_visit_monocyte_table_sva' not found
t_visit_monocyte_sig_sva <- extract_significant_genes(
t_visit_monocyte_table_sva,
excel = glue("{xlsx_prefix}/DE_Visits/Monocytes/t_monocyte_visit_sig_sva-v{ver}.xlsx"))## Error in eval(expr, envir, enclos): object 't_visit_monocyte_table_sva' not found
t_visit_monocyte_sig_sva## Error in eval(expr, envir, enclos): object 't_visit_monocyte_sig_sva' not found
t_visit_neutrophils <- set_expt_conditions(t_neutrophils, fact = "visitnumber")## The numbers of samples by condition are:
##
## 3 2 1
## 12 13 16
t_visit_neutrophil_de_sva <- all_pairwise(t_visit_neutrophils, filter = TRUE,
model_batch = "svaseq",
methods = methods)##
## 3 2 1
## 12 13 16
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## The contrast c2 is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast c3 is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast c3 is not in the results.
## If this is not an extra contrast, then this is an error.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of basic, c2_vs_c1 and ebseq, c2_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of basic, c3_vs_c1 and ebseq, c3_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of basic, c3_vs_c2 and ebseq, c3_vs_c2 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of deseq, c2_vs_c1 and ebseq, c2_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of deseq, c3_vs_c1 and ebseq, c3_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of deseq, c3_vs_c2 and ebseq, c3_vs_c2 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of dream, c2_vs_c1 and ebseq, c2_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of dream, c3_vs_c1 and ebseq, c3_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of dream, c3_vs_c2 and ebseq, c3_vs_c2 failed.
## Error in if (contr %in% extra_eval_names) {: argument is of length zero
t_visit_neutrophil_de_sva## Error in eval(expr, envir, enclos): object 't_visit_neutrophil_de_sva' not found
t_visit_neutrophil_table_sva <- combine_de_tables(
t_visit_neutrophil_de_sva, keepers = visit_contrasts, scale_p = TRUE,
excel = glue("{xlsx_prefix}/DE_Visits/Neutrophils/t_neutrophil_visit_table_sva-v{ver}.xlsx"))## Error in eval(expr, envir, enclos): object 't_visit_neutrophil_de_sva' not found
t_visit_neutrophil_table_sva## Error in eval(expr, envir, enclos): object 't_visit_neutrophil_table_sva' not found
t_visit_neutrophil_sig_sva <- extract_significant_genes(
t_visit_neutrophil_table_sva,
excel = glue("{xlsx_prefix}/DE_Visits/Neutrophils/t_neutrophil_visit_sig_sva-v{ver}.xlsx"))## Error in eval(expr, envir, enclos): object 't_visit_neutrophil_table_sva' not found
t_visit_neutrophil_sig_sva## Error in eval(expr, envir, enclos): object 't_visit_neutrophil_sig_sva' not found
t_visit_eosinophils <- set_expt_conditions(t_eosinophils, fact="visitnumber")
t_visit_eosinophil_de <- all_pairwise(t_visit_eosinophils, filter = TRUE,
model_batch = "svaseq",
methods = methods)
t_visit_eosinophil_de
t_visit_eosinophil_table <- combine_de_tables(
t_visit_eosinophil_de, keepers = visit_contrasts, scale_p = TRUE
excel = glue("{xlsx_prefix}/DE_Visits/Eosinophils/t_eosinophil_visit_table_sva-v{ver}.xlsx"))
t_visit_eosinophil_table
t_visit_eosinophil_sig <- extract_significant_genes(
t_visit_eosinophil_table,
excel = glue("{xlsx_prefix}/DE_Visits/Eosinophils/t_eosinophil_visit_sig_sva-v{ver}.xlsx"))
## No significant genes observed.## Error: <text>:9:3: unexpected symbol
## 8: t_visit_eosinophil_de, keepers = visit_contrasts, scale_p = TRUE
## 9: excel
## ^
Alejandro showed some ROC curves for eosinophil data showing sensitivity vs. specificity of a couple genes which were observed in v1 eosinophils vs. all-times eosinophils across cure/fail. I am curious to better understand how this was done and what utility it might have in other contexts.
To that end, I want to try something similar myself. In order to properly perform the analysis with these various tools, I need to reconfigure the data in a pretty specific format:
If I intend to use this for our tx data, I will likely need a utility function to create the properly formatted input df.
For the purposes of my playing, I will choose three genes from the eosinophil C/F table, one which is significant, one which is not, and an arbitrary.
The input genes will therefore be chosen from the data structure: t_cf_eosinophil_table_sva:
ENSG00000198178, ENSG00000179344, ENSG00000182628
eo_rpkm <- normalize_expt(tv1_eosinophils, convert = "rpkm", column = "cds_length")## There appear to be 5355 genes without a length.
This paper is DOI:10.1126/scitranslmed.aax4204
Variable gene expression and parasite load predict treatment outcome in cutaneous leishmaniasis.
One query from Maria Adelaida is to see how this data fits with ours. I have read this paper a couple of times now and I get confused on a couple of points every time, which I will explain in a moment. The expermental design is key to my confusion and key to what I think is being missed in our interpretation of the results:
external_norm <- normalize_expt(external_cf, filter = TRUE, norm = "quant",
convert = "cpm", transform = "log2")## Removing 7327 low-count genes (14154 remaining).
plot_pca(external_norm)## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure
## Shapes are defined by female, male.
external_nb <- normalize_expt(external_cf, filter = TRUE, batch = "svaseq",
convert = "cpm", transform = "log2")## Removing 7327 low-count genes (14154 remaining).
## Setting 171 low elements to zero.
## transform_counts: Found 171 values equal to 0, adding 1 to the matrix.
plot_pca(external_nb)## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure
## Shapes are defined by female, male.
external_de <- all_pairwise(external_cf, filter = TRUE, methods = methods,
model_batch = "svaseq")##
## cure failure
## 14 7
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
external_de## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## falr_vs_cr
## basic_vs_deseq 0.3908
## basic_vs_dream 0.4488
## basic_vs_ebseq 0.9027
## basic_vs_edger 0.3914
## basic_vs_limma 0.4180
## basic_vs_noiseq 0.9604
## deseq_vs_dream 0.8718
## deseq_vs_ebseq 0.4149
## deseq_vs_edger 0.9997
## deseq_vs_limma 0.8487
## deseq_vs_noiseq 0.4412
## dream_vs_ebseq 0.4304
## dream_vs_edger 0.8727
## dream_vs_limma 0.9654
## dream_vs_noiseq 0.4269
## ebseq_vs_edger 0.4177
## ebseq_vs_limma 0.3577
## ebseq_vs_noiseq 0.9407
## edger_vs_limma 0.8497
## edger_vs_noiseq 0.4418
## limma_vs_noiseq 0.3627
external_table <- combine_de_tables(
external_de, scale_p = TRUE,
excel = "excel/scott_table.xlsx")## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
external_table## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure 0 0 0 0
## limma_sigup limma_sigdown
## 1 0 0
## Only has information, cannot create an UpSet.
## Plot describing unique/shared genes in a differential expression table.
## NULL
external_sig <- extract_significant_genes(external_table, excel = "excel/scott_sig.xlsx")
external_sig## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down
## failure_vs_cure 0 0 0 0 0 0
## ebseq_up ebseq_down basic_up basic_down
## failure_vs_cure 0 0 0 0
external_top100 <- extract_significant_genes(external_table, n = 100)
external_up <- external_top100[["deseq"]][["ups"]][["failure_vs_cure"]]
external_down <- external_top100[["deseq"]][["downs"]][["failure_vs_cure"]]I think I am getting a significantly different result from Scott, so I am going to do an explicit side-by-side comparison of our results at each step. In order to do this, I am using the capsule they kindly provided with their publication.
I am copy/pasting material from their publication with some modification which I will note as I go.
Here is their block ‘r packages’
Note/Spoiler alert: It actually turns out our results are basically relatively similar, I just didn’t understand what comparisons are actually in paper vs those I have primary interest. In addition, we handled gene IDs differently (gene card vs. EnsemblID) which has a surprisingly big effect.
Oh, I just realized that when I did these analyses, I did them in a completely separate tree and compared the results post-facto. This assumption remains in this document and therefore is unlikely to work properly in the containerized environment I am attempting to create. Given that the primary goal of this section is to show to myself that I compared the two datasets as thoroughly as I could, perhaps I should just disable them for the container and allow the reader to perform the exercise de-novo.
library(tidyverse)
library(ggthemes)
library(reshape2)
library(edgeR)
library(patchwork)
library(vegan)
library(DT)
library(tximport)
library(gplots)
library(FinCal)
library(ggrepel)
library(gt)
library(ggExtra)
library(EnsDb.Hsapiens.v86)
library(stringr)
library(cowplot)
library(ggpubr)I have a separate tree in which I copied the capsule and data. I performed exactly their steps kallisto quant steps within it and put the output data into the same place within it. I did change the commands slightly because I downloaded the files from SRA and so don’t have them with names like ‘host_CL01’, but instead ‘PRJNA…’. But the samples are in the same order, so I sent the output files to the same final filenames. Here is an example from the first sample:
cd preprocessing
module add kallisto
kallisto index -i Homo_sapiens.GRCh38.cdna.all.Index Homo_sapiens.GRCh38.cdna.all.fa
# Map reads to the indexed reference transcriptome for HOST
# first the healthy subjects (HS)
export LESS = '--buffers 0 -B'
kallisto quant -i Homo_sapiens.GRCh38.cdna.all.Index -o host_HS01 -t 24 -b 60 \
--single -l 250 -s 30 <(less SRR8668755/*-trimmed.fastq.xz) 2>host_HS01.log 1>&2 &I am going to change the path very slightly in the following block simply because I put the capsule in a separate directory and do not want to copy it here. Otherwise it is unmodified. Also, the function gt::tab_header() annoys the crap out of me.
import <- read_tsv("../scott_2019/capsule-6534016/data/studydesign.txt")
import %>% dplyr::filter(disease == "cutaneous") %>%
dplyr::select(-2) %>% gt() %>%
tab_header(title = md("Clinical metadata from patients with cutaneous leishmaniasis (CL)"),
subtitle = md("`(n=21)`")) %>% cols_align(align = "center", columns = TRUE)
targets.lesion <- import
targets.onlypatients <- targets.lesion[8:28,] # only CL lesions (n=21)
# Making factors that will be used for pairwise comparisons:
# HS vs. CL lesions as a factor:
disease.lesion <- factor(targets.lesion$disease)
# Cure vs. Failure lesions as a factor:
treatment.lesion <- factor(targets.onlypatients$treatment_outcome)They did use a slightly different annotation set, Ensembl revision 86. Once again I am modifying the paths slightly to reflect where I put the capsule.
# capturing Ensembl transcript IDs (tx) and gene symbols ("gene_name") from
# EnsDb.Hsapiens.v86 annotation package
Tx <- as.data.frame(transcripts(EnsDb.Hsapiens.v86,
columns=c(listColumns(EnsDb.Hsapiens.v86, "tx"),
"gene_name")))
Tx <- dplyr::rename(Tx, target_id = tx_id)
row.names(Tx) <- NULL
Tx <- Tx[,c(6,12)]
# getting file paths for Kallisto outputs
paths.all <- file.path("../scott_2019/capsule-6534016/data/readMapping/human", targets.lesion$sample, "abundance.h5")
paths.patients <- file.path("../scott_2019/capsule-6534016/data/readMapping/human", targets.onlypatients$sample, "abundance.h5")
# importing .h5 Kallisto data and collapsing transcript-level data to genes
Txi.lesion.coding <- tximport(paths.all,
type = "kallisto",
tx2gene = Tx,
txOut = FALSE,
ignoreTxVersion = TRUE,
countsFromAbundance = "lengthScaledTPM")
# importing againg, but this time just the CL patients
Txi.lesion.coding.onlypatients <- tximport(paths.patients,
type = "kallisto",
tx2gene = Tx,
txOut = FALSE,
ignoreTxVersion = TRUE,
countsFromAbundance = "lengthScaledTPM")The block ‘visualizationDatasets’ follows unchanged. In the next block I will add another plot or perhaps 2
# First make a DGEList from the counts:
Txi.lesion.coding.DGEList <- DGEList(Txi.lesion.coding$counts)
colnames(Txi.lesion.coding.DGEList$counts) <- targets.lesion$sample
colnames(Txi.lesion.coding$counts) <- targets.lesion$sample
Txi.lesion.coding.DGEList.OP <- DGEList(Txi.lesion.coding.onlypatients$counts)
colnames(Txi.lesion.coding.DGEList.OP) <- targets.onlypatients$sample
# Convert to counts per million:
Txi.lesion.coding.DGEList.cpm <- edgeR::cpm(Txi.lesion.coding.DGEList, log = TRUE)
Txi.lesion.coding.DGEList.OP.cpm <- edgeR::cpm(Txi.lesion.coding.DGEList.OP, log = TRUE)
keepers.coding <- rowSums(Txi.lesion.coding.DGEList.cpm>1)>=7
keepers.coding.OP <- rowSums(Txi.lesion.coding.DGEList.OP.cpm>1)>=7
Txi.lesion.coding.DGEList.filtered <- Txi.lesion.coding.DGEList[keepers.coding,]
Txi.lesion.coding.DGEList.OP.filtered <- Txi.lesion.coding.DGEList.OP[keepers.coding.OP,]
# convert back to cpm:
Txi.lesion.coding.DGEList.LogCPM.filtered <- edgeR::cpm(Txi.lesion.coding.DGEList.filtered,
log=TRUE)
Txi.lesion.coding.DGEList.LogCPM.OP.filtered <- edgeR::cpm(Txi.lesion.coding.DGEList.OP.filtered,
log=TRUE)
# Normalizing data:
calcNorm1 <- calcNormFactors(Txi.lesion.coding.DGEList.filtered, method = "TMM")
calcNorm2 <- calcNormFactors(Txi.lesion.coding.DGEList.OP.filtered, method = "TMM")
Txi.lesion.coding.DGEList.LogCPM.filtered.norm <- edgeR::cpm(calcNorm1, log=TRUE)
colnames(Txi.lesion.coding.DGEList.LogCPM.filtered.norm) <- targets.lesion$sample
Txi.lesion.coding.DGEList.OP.LogCPM.filtered.norm <- edgeR::cpm(calcNorm2, log=TRUE)
colnames(Txi.lesion.coding.DGEList.OP.LogCPM.filtered.norm) <- targets.onlypatients$sample
# Raw dataset:
V1 <- as.data.frame(Txi.lesion.coding.DGEList.cpm)
colnames(V1) <- targets.lesion$sample
V1 <- melt(V1)
colnames(V1) <- c("sample","expression")
# Filtered dataset:
V1.1 <- as.data.frame(Txi.lesion.coding.DGEList.LogCPM.filtered)
colnames(V1.1) <- targets.lesion$sample
V1.1 <- melt(V1.1)
colnames(V1.1) <- c("sample","expression")
# Filtered-normalized dataset:
V1.1.1 <- as.data.frame(Txi.lesion.coding.DGEList.LogCPM.filtered.norm)
colnames(V1.1.1) <- targets.lesion$sample
V1.1.1 <- melt(V1.1.1)
colnames(V1.1.1) <- c("sample","expression")
# plotting:
ggplot(V1, aes(x=sample, y=expression, fill=sample)) +
geom_violin(trim = TRUE, show.legend = TRUE) +
stat_summary(fun.y = "median", geom = "point", shape = 95, size = 10, color = "black") +
theme_bw() +
theme(legend.position = "none", axis.title=element_text(size=7),
axis.title.x=element_blank(), axis.text=element_text(size=5),
axis.text.x = element_text(angle = 90, hjust = 1),
plot.title = element_text(size = 7)) +
ggtitle("Raw dataset") +
ggplot(V1.1, aes(x=sample, y=expression, fill=sample)) +
geom_violin(trim = TRUE, show.legend = TRUE) +
stat_summary(fun.y = "median", geom = "point", shape = 95, size = 10, color = "black") +
theme_bw() +
theme(legend.position = "none", axis.title=element_text(size=7),
axis.title.x=element_blank(), axis.text=element_text(size=5),
axis.text.x = element_text(angle = 90, hjust = 1),
plot.title = element_text(size = 7)) +
ggtitle("Filtered dataset") +
ggplot(V1.1.1, aes(x=sample, y=expression, fill=sample)) +
geom_violin(trim = TRUE, show.legend = TRUE) +
stat_summary(fun.y = "median", geom = "point", shape = 95, size = 10, color = "black") +
theme_bw() +
theme(legend.position = "none", axis.title=element_text(size=7),
axis.title.x=element_blank(), axis.text=element_text(size=5),
axis.text.x = element_text(angle = 90, hjust = 1),
plot.title = element_text(size = 7)) +
ggtitle("Filtered and normalized dataset")The following block in their dataset recreated the matrix without filtering and will use that for differential expression. It is a little hard to follow for me because they subset based on the sample numbers (8 to 28, which if I am not mistaken just drops the healthy samples).
DataNotFiltered_Norm_OP <- calcNormFactors(Txi.lesion.coding.DGEList[,8:28],
method = "TMM")
DataNotFiltered_Norm_log2CPM_OP <- edgeR::cpm(DataNotFiltered_Norm_OP, log=TRUE)
colnames(DataNotFiltered_Norm_log2CPM_OP) <- targets.onlypatients$sample
CPM_normData_notfiltered_OP <- 2^(DataNotFiltered_Norm_log2CPM_OP)
#uncomment the next line to produce raw data that was uploaded to the Gene Expression Omnibus (GEO) for publication.
#write.table(Txi.lesion.coding$counts, file = "Amorim_GEO_raw.txt", sep = "\t", quote = FALSE)
# Including all the individuals (HS and CL patients) for public domain submission:
DataNotFiltered_Norm <- calcNormFactors(Txi.lesion.coding.DGEList, method = "TMM")
DataNotFiltered_Norm_log2CPM <- edgeR::cpm(DataNotFiltered_Norm, log=TRUE)
colnames(DataNotFiltered_Norm_log2CPM) <- targets.lesion$sample
CPM_normData_notfiltered <- 2^(DataNotFiltered_Norm_log2CPM)
#uncomment the next line to produce the normalized data file that was uploaded to the Gene Expression Omnibus (GEO) for publication.
#write.table(DataNotFiltered_Norm_log2CPM, "Amorim_GEO_normalized.txt", sep = "\t", quote = FALSE)The following block generated a couple of the figures in the paper and comprise a pretty straightforward PCA. I am going to make a following block containing the same image with the cure/fail visualization using the same method/data.
pca.res <- prcomp(t(Txi.lesion.coding.DGEList.LogCPM.filtered.norm), scale.=F, retx=T)
pc.var <- pca.res$sdev^2
pc.per <- round(pc.var/sum(pc.var)*100, 1)
data.frame <- as.data.frame(pca.res$x)
# Calculate distance between samples by permanova:
allsamples.dist <- vegdist(t(2^Txi.lesion.coding.DGEList.LogCPM.filtered.norm),
method = "bray")
vegan <- adonis2(allsamples.dist~targets.lesion$disease,
data=targets.lesion,
permutations = 999, method="bray")
targets.lesion$disease
ggplot(data.frame, aes(x=PC1, y=PC2, color=factor(targets.lesion$disease))) +
geom_point(size=5, shape=20) +
theme_calc() +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
axis.text.x = element_text(size = 15, vjust = 0.5),
axis.text.y = element_text(size = 15), axis.title = element_text(size = 15),
legend.position="none") +
scale_color_manual(values = c("#073F80","#EB512C")) +
annotate("text", x=-50, y=80, label=paste("Permanova Pr(>F) =",
vegan[1,5]), size=3, fontface="bold") +
xlab(paste("PC1 -",pc.per[1],"%")) +
ylab(paste("PC2 -",pc.per[2],"%")) +
xlim(-200,110)I just realized that somewhere along the way in creating this container, I messed up this analysis pretty badly:
When I originally did this on my workstation I had an actual 1:1 comparison and saw that our results were quite similar. I need to bring that back into this in order to show that neither we nor they are crazy people.
Either way, I think the main takeaway is that their dataset does not spend much time looking at cure/fail but instead control/infected for a reason.
Note, the fun aspects of the experiment (time to cure, size of lesion, etc) are not annotated in the metadata provided by SRA, but instead may be found in the capsule kindly provided by the lab. As a result, I copied that file into the sample_sheets/ directory and have added it to the expressionset. There is an important caveat, though: I did not include the non-diseased samples for this comparison; as a result the disease metadata factor is boring (e.g. it is only cutaneous).
external_cf[["accession"]] <- pData(external_cf)[["sample"]]
disease_factor <- pData(external_cf)[["disease"]]
table(disease_factor)## disease_factor
## cutaneous
## 21
external_disease <- set_expt_conditions(external_cf, fact = disease_factor)## The numbers of samples by condition are:
##
## cutaneous
## 21
external_l2cpm <- normalize_expt(external_cf, filter = TRUE,
convert = "cpm", transform = "log2")## Removing 7327 low-count genes (14154 remaining).
## transform_counts: Found 165 values equal to 0, adding 1 to the matrix.
plot_pca(external_l2cpm, plot_labels = "repel")## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure
## Shapes are defined by female, male.
Use the following block if you wish to bring together SRA-downloaded data with the experimental design from the Scott paper. It requires running the blocks above in which I loaded the capsule-derived metadata.
test <- pData(external_cf)
test_import <- as.data.frame(import)
test_import[["accession"]] <- pData(external_cf[["accession"]])
test_merged <- merge(test, import, by = "accession")This is real comparison point to their cure/fail analysis.
I am just copy/pasting their code again, but changing the color factor so that cure is purple, failure is red, and na(uninfected) is black.
The following plot should be the first direct comparison point between the two analysis pipelines. Thus, if you look back a few block at my invocation of plot_pca(external_norm), you will see a green/orange plot which is functionally identical if you note:
With those caveats in mind, it is trivial to find the same relationshipes in the samples. E.g. the bottom red/purple individual samples are in the same relative position as my top orange/green pair. the same 4 samples are relative x-axis outliers (my right green, their left purple). The last 6 samples (my orange, their red) are all in the relative orientation.
I think I can further prove the similarity of our inputs via a direct comparison of the datastructures: Txi.lesion.coding.DGEList.LogCPM.filtered.norm (ugh what a name) vs. external_cf. In order to make that comparison, I need to rename my rows to the genecard IDs and the columns.
their_norm_exprs <- Txi.lesion.coding.DGEList.LogCPM.filtered.norm
my_hgnc_ids <- make.names(fData(external_cf)[["hgnc_symbol"]], unique = TRUE)
my_renamed <- set_expt_genenames(external_cf, ids = my_hgnc_ids)
my_norm <- normalize_expt(my_renamed, filter = TRUE, transform = "log2", convert = "cpm")
my_norm_exprs <- as.data.frame(exprs(my_norm))
our_exprs <- merge(their_norm_exprs, my_norm_exprs, by = "row.names")
rownames(our_exprs) <- our_exprs[["Row.names"]]
our_exprs[["Row.names"]] <- NULL
dim(our_exprs)
## I fully expected a correlation heatmap of the combined
## data to show a set of paired samples across the board.
## That is absolutely not true.
correlations <- plot_corheat(our_exprs)
correlations[["scatter"]]
correlations[["plot"]]color_fact <- factor(targets.lesion$treatment_outcome)
levels(color_fact)
## Added by atb to see cure/fail on the same dataset
ggplot(data.frame, aes(x=PC1, y=PC2, color=color_fact)) +
geom_point(size=5, shape=20) +
theme_calc() +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
axis.text.x = element_text(size = 15, vjust = 0.5),
axis.text.y = element_text(size = 15), axis.title = element_text(size = 15),
legend.position="none") +
scale_color_manual(values = c("purple", "red","black")) +
annotate("text", x=-50, y=80, label=paste("Permanova Pr(>F) =",
vegan[1,5]), size=3, fontface="bold") +
xlab(paste("PC1 -",pc.per[1],"%")) +
ylab(paste("PC2 -",pc.per[2],"%")) +
xlim(-200,110)The following is their comparison of healthy tissue vs. CL lesion and Failure vs. Cure. I am going to follow it with my analagous examination using limma. Note, each of the pairs of variables created in the following block is xxx followed by xxx.treat; the former is healthy vs lesion and the latter is the fail vs cure set.
# Model matrices:
# CL lesions vs. HS:
design.lesion <- model.matrix(~0 + disease.lesion)
colnames(design.lesion) <- levels(disease.lesion)
# Failure vs. Cure:
design.lesion.treatment <- model.matrix(~0 + treatment.lesion)
colnames(design.lesion.treatment) <- levels(treatment.lesion)
myDGEList.lesion.coding <- DGEList(calcNorm1$counts)
myDGEList.OP.NotFil <- DGEList(CPM_normData_notfiltered_OP)
# Model mean-variance trend and fit linear model to data.
# Use VOOM function from Limma package to model the mean-variance relationship
normData.lesion.coding <- voom(myDGEList.lesion.coding, design.lesion)
normData.OP.NotFil <- voom(myDGEList.OP.NotFil, design.lesion.treatment)
colnames(normData.lesion.coding) <- targets.lesion$sample
colnames(normData.OP.NotFil) <- targets.onlypatients$sample
# fit a linear model to your data
fit.lesion.coding <- lmFit(normData.lesion.coding, design.lesion)
fit.lesion.coding.treatment <- lmFit(normData.OP.NotFil, design.lesion.treatment)
# contrast matrix
contrast.matrix.lesion <- makeContrasts(CL.vs.CON = cutaneous - control,
levels=design.lesion)
contrast.matrix.lesion.treat <- makeContrasts(failure.vs.cure = failure - cure,
levels=design.lesion.treatment)
# extract the linear model fit
fits.lesion.coding <- contrasts.fit(fit.lesion.coding,
contrast.matrix.lesion)
fits.lesion.coding.treat <- contrasts.fit(fit.lesion.coding.treatment,
contrast.matrix.lesion.treat)
# get bayesian stats for your linear model fit
ebFit.lesion.coding <- eBayes(fits.lesion.coding)
ebFit.lesion.coding.treat <- eBayes(fits.lesion.coding.treat)
# TopTable ----
allHits.lesion.coding <- topTable(ebFit.lesion.coding,
adjust ="BH", coef=1,
number=34935, sort.by="logFC")
allHits.lesion.coding.treat <- topTable(ebFit.lesion.coding.treat,
adjust ="BH", coef=1,
number=34776, sort.by="logFC")
myTopHits <- rownames_to_column(allHits.lesion.coding, "geneID")
myTopHits.treat <- rownames_to_column(allHits.lesion.coding.treat, "geneID")
# mutate the format of numeric values:
myTopHits <- mutate(myTopHits, log10Pval = round(-log10(adj.P.Val),2),
adj.P.Val = round(adj.P.Val, 2),
B = round(B, 2),
AveExpr = round(AveExpr, 2),
t = round(t, 2),
logFC = round(logFC, 2),
geneID = geneID)
myTopHits.treat <- mutate(myTopHits.treat, log10Pval = round(-log10(adj.P.Val),2),
adj.P.Val = round(adj.P.Val, 2),
B = round(B, 2),
AveExpr = round(AveExpr, 2),
t = round(t, 2),
logFC = round(logFC, 2),
geneID = geneID)
#save(myTopHits, file = "myTopHits")
#save(myTopHits.treat, file = "myTopHits.treat")my_filt <- normalize_expt(my_renamed, filter = "simple")
limma_cf <- limma_pairwise(my_filt, model_batch = FALSE)
my_table <- limma_cf[["all_tables"]][["failure_vs_cure"]]
their_table <- myTopHits.treat
dim(my_table)
dim(myTopHits.treat)
our_table <- merge(my_table, myTopHits.treat, by.x = "row.names", by.y = "geneID")
dim(our_table)
comparison <- plot_linear_scatter(our_table[, c("logFC.x", "logFC.y")])
comparison$scatter
comparison$correlation
comparison$lm_modelOk, so there is a constituitive difference in our results, and it is significant. What does that mean for the set of genes observed?
With that said, in my most recent manual run of this, the results are quite good, I got a 0.75 correlation; I bet the primary outliers (on the axes) are just genes for which we got different gene<->tx mappings due to me using hisat and their usage of kallisto.
I guess I can test this hypothesis by just swapping in their counts into my data structure.
test_counts <- as.data.frame(myDGEList.lesion.coding[["counts"]])
test_counts[["host_HS01"]] <- NULL
test_counts[["host_HS02"]] <- NULL
test_counts[["host_HS03"]] <- NULL
test_counts[["host_HS04"]] <- NULL
test_counts[["host_HS05"]] <- NULL
test_counts[["host_HS06"]] <- NULL
test_counts[["host_HS07"]] <- NULL
dim(test_counts)
dim(exprs(my_test))
## Oh, that surprises me, the kallisto data has ~ 6k fewer genes?only_tmrc3 <- subset_expt(tmrc3_external, subset = "condition=='Colombia'") %>%
set_expt_conditions(fact = "finaloutcome")## subset_expt(): There were 39, now there are 18 samples.
## The numbers of samples by condition are:
##
## failure cure
## 5 13
only_tmrc3_de <- all_pairwise(only_tmrc3, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## failure cure
## 5 13
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
only_tmrc3_de## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## falr_vs_cr
## basic_vs_deseq 0.7781
## basic_vs_dream 0.9001
## basic_vs_ebseq 0.7965
## basic_vs_edger 0.8963
## basic_vs_limma 0.9061
## basic_vs_noiseq 0.9366
## deseq_vs_dream 0.7191
## deseq_vs_ebseq 0.8921
## deseq_vs_edger 0.9247
## deseq_vs_limma 0.7154
## deseq_vs_noiseq 0.7816
## dream_vs_ebseq 0.7621
## dream_vs_edger 0.8361
## dream_vs_limma 0.9890
## dream_vs_noiseq 0.8748
## ebseq_vs_edger 0.9223
## ebseq_vs_limma 0.7494
## ebseq_vs_noiseq 0.8417
## edger_vs_limma 0.8311
## edger_vs_noiseq 0.8987
## limma_vs_noiseq 0.8629
only_tmrc3_table <- combine_de_tables(only_tmrc3_de, scale_p = TRUE)## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
only_tmrc3_table## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure 27 26 28 15
## limma_sigup limma_sigdown
## 1 1 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
only_tmrc3_top100 <- extract_significant_genes(only_tmrc3_table, n = 100)
only_tmrc3_up <- only_tmrc3_top100[["deseq"]][["ups"]][["failure_vs_cure"]]
only_tmrc3_down <- only_tmrc3_top100[["deseq"]][["downs"]][["failure_vs_cure"]]
tmrc3_external_de <- all_pairwise(tmrc3_external, model_batch = "svaseq",
filter = "simple",
methods = methods)##
## Brazil Colombia
## 21 18
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tmrc3_external_table <- combine_de_tables(
tmrc3_external_de, scale_p = TRUE,
excel = "excel/tmrc3_scott_biopsies.xlsx")## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Error : colNames must be a unique vector (case sensitive)
tmrc3_external_sig <- extract_significant_genes(
tmrc3_external_table, excel = "excel/tmrc3_scott_biopsies_sig.xlsx")
tmrc3_external_cf <- set_expt_conditions(tmrc3_external, fact = "finaloutcome")## The numbers of samples by condition are:
##
## failure cure
## 12 27
tmrc3_external_cf <- set_expt_batches(tmrc3_external_cf, fact = "lab")## The number of samples by batch are:
##
## Brazil Colombia
## 21 18
tmrc3_external_cf_norm <- normalize_expt(tmrc3_external_cf, filter = TRUE,
norm = "quant", convert = "cpm", transform = "log2")## Removing 6904 low-count genes (14577 remaining).
## transform_counts: Found 18 values equal to 0, adding 1 to the matrix.
plot_pca(tmrc3_external_cf_norm)## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by failure, cure
## Shapes are defined by Brazil, Colombia.
tmrc3_external_cf_nb <- normalize_expt(tmrc3_external_cf, filter = TRUE,
batch = "svaseq", convert = "cpm", transform = "log2")## Removing 6904 low-count genes (14577 remaining).
## Setting 1515 low elements to zero.
## transform_counts: Found 1515 values equal to 0, adding 1 to the matrix.
plot_pca(tmrc3_external_cf_nb)## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by failure, cure
## Shapes are defined by Brazil, Colombia.
tmrc3_external_cf_de <- all_pairwise(tmrc3_external_cf, model_batch = "svaseq",
filter = TRUE,
methods = methods)##
## failure cure
## 12 27
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
tmrc3_external_cf_de## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
## falr_vs_cr
## basic_vs_deseq 0.7725
## basic_vs_dream 0.9080
## basic_vs_ebseq 0.8250
## basic_vs_edger 0.8165
## basic_vs_limma 0.9167
## basic_vs_noiseq 0.9416
## deseq_vs_dream 0.8238
## deseq_vs_ebseq 0.9092
## deseq_vs_edger 0.9500
## deseq_vs_limma 0.7961
## deseq_vs_noiseq 0.8259
## dream_vs_ebseq 0.8159
## dream_vs_edger 0.8854
## dream_vs_limma 0.9769
## dream_vs_noiseq 0.8648
## ebseq_vs_edger 0.9177
## ebseq_vs_limma 0.7869
## ebseq_vs_noiseq 0.9009
## edger_vs_limma 0.8568
## edger_vs_noiseq 0.8677
## limma_vs_noiseq 0.8497
tmrc3_external_cf_table <- combine_de_tables(
tmrc3_external_cf_de, scale_p = TRUE,
excel = "excel/tmrc3_scott_cf_table.xlsx")## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Error : colNames must be a unique vector (case sensitive)
tmrc3_external_cf_table## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure 37 127 38 91
## limma_sigup limma_sigdown
## 1 7 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
tmrc3_external_cf_sig <- extract_significant_genes(
tmrc3_external_cf_table, excel = "excel/tmrc3_scott_cf_sig.xlsx")
tmrc3_external_cf_sig## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down
## failure_vs_cure 7 0 38 91 37 127
## ebseq_up ebseq_down basic_up basic_down
## failure_vs_cure 3 6 0 0
tmrc3_external_species <- set_expt_conditions(tmrc3_external, fact = "ParasiteSpecies") %>%
set_expt_colors(color_choices[["parasite"]])## The numbers of samples by condition are:
##
## lvbraziliensis lvpanamensis notapplicable
## 22 14 3
## Warning in set_expt_colors(., color_choices[["parasite"]]): Colors for the
## following categories are not being used: lvguyanensis.
Let us look at the top/bottom 100 genes of these two datasets and see if they have any similarities.
Note to self, set up s4 dispatch on compare_de_tables!
compared <- compare_de_tables(only_tmrc3_table, external_table, first_table = 1, second_table = 1)
compared$scattercompared$correlation##
## Pearson's product-moment correlation
##
## data: df[[xcol]] and df[[ycol]]
## t = 14, df = 13240, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1033 0.1368
## sample estimates:
## cor
## 0.1201
I assume this request came out of the review process, but I am not quite sure where to put it. If I understand it correctly, the goal is to look across visits for combinations of cure and fail (not fail/cure, but v2/v1) and across cell types.
Thus, in order to do this, I will need to combine those three parameters or set up a more complex model to handle this.
t_cellvisitcf <- set_expt_conditions(t_clinical_nobiop, fact = "cell_visit_cf")## The numbers of samples by condition are:
##
## eosinophils_1_cure eosinophils_1_failure eosinophils_2_cure
## 5 3 6
## eosinophils_2_failure eosinophils_3_cure eosinophils_3_failure
## 3 6 3
## monocytes_1_cure monocytes_1_failure monocytes_2_cure
## 8 8 7
## monocytes_2_failure monocytes_3_cure monocytes_3_failure
## 6 6 7
## neutrophils_1_cure neutrophils_1_failure neutrophils_2_cure
## 8 8 7
## neutrophils_2_failure neutrophils_3_cure neutrophils_3_failure
## 6 5 7
t_cellvisitcf_de <- all_pairwise(t_cellvisitcf, keepers = visittype_contrasts,
model_batch = "svaseq", filter = TRUE,
methods = methods)##
## eosinophils_1_cure eosinophils_1_failure eosinophils_2_cure
## 5 3 6
## eosinophils_2_failure eosinophils_3_cure eosinophils_3_failure
## 3 6 3
## monocytes_1_cure monocytes_1_failure monocytes_2_cure
## 8 8 7
## monocytes_2_failure monocytes_3_cure monocytes_3_failure
## 6 6 7
## neutrophils_1_cure neutrophils_1_failure neutrophils_2_cure
## 8 8 7
## neutrophils_2_failure neutrophils_3_cure neutrophils_3_failure
## 6 5 7
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq,
## eosinophils_1_failure_vs_eosinophils_1_cure and edger,
## eosinophils_1_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq,
## eosinophils_2_failure_vs_eosinophils_1_cure and edger,
## eosinophils_2_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq,
## eosinophils_3_failure_vs_eosinophils_1_cure and edger,
## eosinophils_3_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_1_cure_vs_eosinophils_1_cure
## and edger, monocytes_1_cure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_1_failure_vs_eosinophils_1_cure
## and edger, monocytes_1_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_2_cure_vs_eosinophils_1_cure
## and edger, monocytes_2_cure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_2_failure_vs_eosinophils_1_cure
## and edger, monocytes_2_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_3_cure_vs_eosinophils_1_cure
## and edger, monocytes_3_cure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_3_failure_vs_eosinophils_1_cure
## and edger, monocytes_3_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, neutrophils_1_cure_vs_eosinophils_1_cure
## and edger, neutrophils_1_cure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq,
## eosinophils_1_failure_vs_eosinophils_1_cure and limma,
## eosinophils_1_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq,
## eosinophils_2_failure_vs_eosinophils_1_cure and limma,
## eosinophils_2_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq,
## eosinophils_3_failure_vs_eosinophils_1_cure and limma,
## eosinophils_3_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_1_cure_vs_eosinophils_1_cure
## and limma, monocytes_1_cure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_1_failure_vs_eosinophils_1_cure
## and limma, monocytes_1_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_2_cure_vs_eosinophils_1_cure
## and limma, monocytes_2_cure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_2_failure_vs_eosinophils_1_cure
## and limma, monocytes_2_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_3_cure_vs_eosinophils_1_cure
## and limma, monocytes_3_cure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_3_failure_vs_eosinophils_1_cure
## and limma, monocytes_3_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, neutrophils_1_cure_vs_eosinophils_1_cure
## and limma, neutrophils_1_cure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq,
## eosinophils_1_failure_vs_eosinophils_1_cure and noiseq,
## eosinophils_1_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq,
## eosinophils_2_failure_vs_eosinophils_1_cure and noiseq,
## eosinophils_2_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq,
## eosinophils_3_failure_vs_eosinophils_1_cure and noiseq,
## eosinophils_3_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_1_cure_vs_eosinophils_1_cure
## and noiseq, monocytes_1_cure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_1_failure_vs_eosinophils_1_cure
## and noiseq, monocytes_1_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_2_cure_vs_eosinophils_1_cure
## and noiseq, monocytes_2_cure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_2_failure_vs_eosinophils_1_cure
## and noiseq, monocytes_2_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_3_cure_vs_eosinophils_1_cure
## and noiseq, monocytes_3_cure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_3_failure_vs_eosinophils_1_cure
## and noiseq, monocytes_3_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, neutrophils_1_cure_vs_eosinophils_1_cure
## and noiseq, neutrophils_1_cure_vs_eosinophils_1_cure failed.
t_cellvisitcf_de## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_cellvisitcf_mono_table <- combine_de_tables(
t_cellvisitcf_de, keepers = visittype_contrasts_mono, scale_p = TRUE,
excel = glue("{xlsx_prefix}/DE_Visits/Cure_Fail/monocyte_visit_cf_combined_table_sva-v{ver}.xlsx"))## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cellvisitcf_mono_table## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown
## 1 monocytes_2_cure_vs_monocytes_1_cure 0 2
## 2 monocytes_2_failure_vs_monocytes_1_failure 2 2
## 3 monocytes_3_cure_vs_monocytes_1_cure 1 3
## 4 monocytes_3_failure_vs_monocytes_1_failure 1 3
## edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 0 0 0 0
## 2 1 1 1 0
## 3 0 0 0 0
## 4 0 0 0 0
## Plot describing unique/shared genes in a differential expression table.
t_cellvisitcf_mono_sig <- extract_significant_genes(
t_cellvisitcf_mono_table,
excel = glue("{xlsx_prefix}/DE_Visits/Cure_Fail/monocyte_visit_cf_combined_sig_sva-v{ver}.xlsx"))
t_cellvisitcf_mono_sig## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down
## v2v1_mono_cure 0 0 0 0 0 2
## v2v1_mono_failure 1 0 1 1 2 2
## v3v1_mono_cure 0 0 0 0 1 3
## v3v1_mono_failure 0 0 0 0 1 3
## ebseq_up ebseq_down basic_up basic_down
## v2v1_mono_cure 0 0 0 0
## v2v1_mono_failure 3 3 0 0
## v3v1_mono_cure 2 1 0 0
## v3v1_mono_failure 0 1 0 0
t_cellvisitcf_neut_table <- combine_de_tables(
t_cellvisitcf_de, keepers = visittype_contrasts_ne, scale_p = TRUE,
excel = glue("{xlsx_prefix}/DE_Visits/Cure_Fail/neutrophil_visit_cf_combined_table_sva-v{ver}.xlsx"))## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cellvisitcf_neut_table## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown
## 1 neutrophils_2_cure_vs_neutrophils_1_cure 85 132
## 2 neutrophils_2_failure_vs_neutrophils_1_failure 127 150
## 3 neutrophils_3_cure_vs_neutrophils_1_cure 105 195
## 4 neutrophils_3_failure_vs_neutrophils_1_failure 87 24
## edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 75 118 90 31
## 2 116 175 105 42
## 3 110 157 114 39
## 4 77 20 56 36
## Plot describing unique/shared genes in a differential expression table.
t_cellvisitcf_neut_sig <- extract_significant_genes(
t_cellvisitcf_neut_table,
excel = glue("{xlsx_prefix}/DE_Visits/Cure_Fail/neutrophil_visit_cf_combined_sig_sva-v{ver}.xlsx"))
t_cellvisitcf_neut_sig## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down
## v2v1_ne_cure 90 31 75 118 85 132
## v2v1_ne_failure 105 42 116 175 127 150
## v3v1_ne_cure 114 39 110 157 105 195
## v3v1_ne_failure 56 36 77 20 87 24
## ebseq_up ebseq_down basic_up basic_down
## v2v1_ne_cure 24 16 3 2
## v2v1_ne_failure 75 10 6 1
## v3v1_ne_cure 44 8 0 0
## v3v1_ne_failure 17 5 0 0
t_cellvisitcf_eo_table <- combine_de_tables(
t_cellvisitcf_de, keepers = visittype_contrasts_eo,
excel = glue("{xlsx_prefix}/DE_Visits/Cure_Fail/eosinophil_visit_cf_combined_table_sva-v{ver}.xlsx"))## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cellvisitcf_eo_table## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown
## 1 eosinophils_2_cure_vs_eosinophils_1_cure 5 1
## 2 eosinophils_2_failure_vs_eosinophils_1_failure 1 5
## 3 eosinophils_3_cure_vs_eosinophils_1_cure 9 1
## 4 eosinophils_3_failure_vs_eosinophils_1_failure 0 8
## edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 0 0 0 0
## 2 0 0 0 0
## 3 0 0 0 1
## 4 0 0 0 0
## Plot describing unique/shared genes in a differential expression table.
t_cellvisitcf_eo_sig <- extract_significant_genes(
t_cellvisitcf_eo_table,
excel = glue("{xlsx_prefix}/DE_Visits/Cure_Fail/eosinophil_visit_cf_combined_sig_sva-v{ver}.xlsx"))
t_cellvisitcf_eo_sig## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## limma_up limma_down edger_up edger_down deseq_up deseq_down
## v2v1_eo_cure 0 0 0 0 5 1
## v2v1_eo_failure 0 0 0 0 1 5
## v3v1_eo_cure 0 1 0 0 9 1
## v3v1_eo_failure 0 0 0 0 0 8
## ebseq_up ebseq_down basic_up basic_down
## v2v1_eo_cure 1 0 0 0
## v2v1_eo_failure 4 2 0 0
## v3v1_eo_cure 1 1 0 0
## v3v1_eo_failure 17 5 0 0
tmp <- loadme(filename = savefile)