TMRC3 202501: Differential Expression analyses, Tumaco only.

atb

2025-01-02

1 Changelog

  • 202412: Reorganizing the lme work
  • 202411: Working on the addition of linear mixed models.
  • 202406: Added an explicit comparison of different model constructions using our most variable cell type, the neutrophils.
  • 202406: Working entirely out of the container now, separated GSE/GSEA analyses, added a full treatment with clusterProfiler; I am not currently writing the cp results out as xlsx files until/unless someone expresses interest in them.
  • 202309: Disabled GSVA analyses until/unless we get permission to include the mSigDB 7.5.1 release (what I used). I will simplify the filenames so that one may easily drop in a downloaded copy of the data and run those blocks. Until then, I guess you (fictitious reader) will have to trust me when I say those blocks all work? (Also, GSVA was moved to a separate document)
  • 202309: Moved all gene set enrichment analyses to 04lrt_gsea_gsva.Rmd
  • 202309 next day: Moving gene set enrichment back because it adds too much complexity to save/reload the DE results for gProfiler and friends.
  • Still hunting for messed up colors, changed input data to match new version.

2 Notes/TODOs for 202412+

  • What do we think about dream’s adjusted p-value results?
  • Create tables of the mlm results as xlsx files, do not bother pulling them into the tables with deseq etc. (5 tables: monocyte, neutrophil, eosinophil, all, all+sva)
  • Create scatter plots showing similarities between p-values perhaps and z-scores, and logFC.
  • Perform GO etc with mlm results.

3 Introduction

The various differential expression analyses of the data generated in tmrc3_datasets will occur in this document. Most of the actual work is via the function ‘all_pairwise()’; the word ‘all’ in the name does a lot of work; it is responsible for performing all possible pairwise contrasts using all possible methods for which I have sufficient understanding to be able to write a reasonably robust pairwise function. Currently this is limited to:

  • DESeq2 (Love, Huber, and Anders (2014)): Our ‘default’
  • edgeR (McCarthy, Chen, and Smyth (2012)): shares a close conceptual lineage with DESeq2 I think.
  • limma (Ritchie et al. (2015)): along with voom this provides a nicely robust set of tools.
  • EBSeq (Leng et al. (2013)): I think it is not as robust as the previous entries, but I like using it because it is an almost purely Bayesian method and as such provides a different perspective on any dataset.
  • Noiseq (Tarazona et al. (2015)): I noticed this method relatively recently and was sufficiently intrigued that I threw a method together using it. The authors appear to me to be looking to understand a lot of the questions on which I spend a lot of time.
  • Dream (Hoffman and Roussos (2020)): I mostly like this because it uses variancePartition, which I think is a really nice toy when trying to understand what is going on in a dataset.
  • basic is my own, explicitly uninformed analysis. It is my ‘negative control’ method because, if something agrees entirely with it, then I know that all the fancy math and statistics performed by that method worked out just the same as some doofus (me) just log2 subtracting the expression values. It is not quite that basic, but pretty close.
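To make the 'negative control' idea concrete, here is a minimal base R sketch of a log2 subtraction on toy counts. It is not the hpgltools implementation: the counts are made up, and the choice of counts-per-million scaling, a pseudocount of 1, and per-condition medians are all assumptions made for illustration.

```r
# Toy counts: 4 genes x 6 samples (hypothetical values, not TMRC3 data).
counts <- matrix(
  c(10, 12, 11, 40, 44, 48,
    100, 90, 110, 95, 105, 100,
    5, 6, 4, 1, 2, 1,
    0, 0, 1, 0, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:6)))
condition <- factor(c("cure", "cure", "cure", "failure", "failure", "failure"))

# Scale each sample to counts per million, then log2 with a pseudocount of 1.
cpm <- t(t(counts) / colSums(counts)) * 1e6
log_expr <- log2(cpm + 1)

# Median log2 expression of every gene within each condition.
med <- sapply(levels(condition), function(lv) {
  apply(log_expr[, condition == lv, drop = FALSE], 1, median)
})

# The 'uninformed' logFC: numerator (failure) minus denominator (cure).
naive_logfc <- med[, "failure"] - med[, "cure"]
naive_logfc
```

If a fancier method produces logFC values which are nearly identical to something like naive_logfc, then its modeling did not change the answer much for that contrast.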

The first 3 methods allow one to add surrogate variable estimates to the model when performing the differential expression analyses. Noiseq handles surrogates using its own heuristics, EBSeq is inimical to that kind of model, and I explicitly chose to not make that possible for basic. I am uncertain at this time how the random effect factors used with dream interact with surrogates from sva. With that in mind, I usually deal with surrogates/batches in one of a few ways:

  1. If the data is absurdly pretty, do nothing (pretty much only for well-controlled bacterial data).
  2. Add a known batch factor to the model (the default).
  3. Try to ensure the data is suitable and invoke sva to acquire estimates and add them to the model.
  4. If the data has a known batch factor and it is particularly pathological, use the combat implementation in sva. As a general rule I do not like this option because it is data destructive.

The last two options are handled via a function named ‘all_adjusters’ in hpgltools which is responsible for ensuring that the data is sane for the assumptions made by each method and for invoking each method (hopefully) properly. It returns both modified counts and model estimates when possible and has implementations for a fair number of methods in this realm. sva is my favorite by a pretty big margin, though I do sometimes use RUV (Risso et al. (2014)) and of course, in writing this document I stumbled into another interesting contender: (Molania et al. (2023)). all_adjusters() also has implementations of every example/method I got out of the papers for sva (e.g. ssva/fsva), isva, smartsva, and some others.
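The first three options in the numbered list above differ mainly in what ends up in the design matrix. The following base R sketch shows the three shapes side by side. The metadata is hypothetical, and the surrogate columns are random placeholders standing in for estimates which would really come from sva; nothing here calls hpgltools or sva.

```r
# Hypothetical sample sheet: six samples, two conditions, two batches.
meta <- data.frame(
  condition = factor(c("cure", "cure", "cure", "failure", "failure", "failure")),
  batch = factor(c("a", "b", "a", "b", "a", "b")))

# Option 1: do nothing, the model contains only the condition.
design_none <- model.matrix(~ 0 + condition, data = meta)

# Option 2: add a known batch factor to the model.
design_batch <- model.matrix(~ 0 + condition + batch, data = meta)

# Option 3: append surrogate estimates as extra numeric columns.
# PLACEHOLDER values: real surrogates would be estimated from the data.
set.seed(1)
surrogates <- matrix(rnorm(nrow(meta) * 2), ncol = 2,
                     dimnames = list(NULL, c("SV1", "SV2")))
design_sv <- cbind(design_none, surrogates)

colnames(design_batch)
colnames(design_sv)
```

Option 4 (combat) is different in kind: it changes the counts themselves rather than the model, which is why I called it data destructive.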

I have been changing hpgltools so that it is now possible to trivially pass arbitrarily complex models to the various methods, with the caveat that there is currently no good way to mix fixed effects and random effects across methods. I am therefore running dream separately and adding it to the result of all_pairwise after the fact.

3.1 Define contrasts for DE analyses

Each of the following lists describes the set of contrasts that I think are interesting for the various ways one might consider the TMRC3 dataset. The variables are named according to the assumed data with which they will be used, thus tc_cf_contrasts is expected to be used for the Tumaco+Cali data and provide a series of cure/fail comparisons which (to the extent possible) apply across both locations. In every case, the name of the list element will be used as the contrast name, and will thus be seen as the sheet name in the output xlsx file(s); the two pieces of the character vector value are the numerator and denominator of the associated contrast.

  • Our primary question: fail/cure: Any excel file written using this contrast will get a single worksheet comparing fail/cure.
  • Compare fail/cure for each visit: This takes a more granular view of the previous contrast. If one is so inclined, one could compare results from this contrast against the previous and following contrasts to learn about the dynamics of the healing (or not) process.
  • All samples by visit: This is effectively the opposite of the previous and compares all samples of visit x against visit y.
  • Visit 1 vs everything else: When I first did the previous set of contrasts I quickly realized that visits 2 and 3 are relatively similar and that it may be possible to gain a little power and learn a little more by combining them.
  • Directly compare celltypes: We have three clinical cell types in the data and the differences among them are quite interesting.
  • Ethnicities: We also have three ethnic groups in the data, though there are some wacky confounded variables when considering them through the lens of cure/fail; so any results comparing them should be treated with caution.
  • Powerless visits+celltype+cf: This is a last-minute addition requested by Maria Adelaida. I assume it was suggested by a reviewer, though I do not recall seeing anything in the reviews which made this request. The number of samples we have in the data just barely supports these contrasts, and given the strength of all the various surrogates, I would be somewhat reluctant to trust any genes deemed DE in them without some other evidence. It should be noted that this is the intellectual counterpoint to the critique from a different reviewer, that artificially merging factors like this is problematic. I personally tend to agree with the latter argument more than the former, with the caveat that the added complexity (with respect to what is actually typed by the person (me)) can be a problem. Thus I tend to do the thing which is explicitly less statistically correct (but I can also show pretty definitively that the results are very nearly identical) in order to make it easier to show that no mistakes were made, i.e. there is a tension between ‘correctness’ and ‘robustness’.
t_cf_contrast <- list(
  "outcome" = c("tumaco_failure", "tumaco_cure"))
cf_contrast <- list(
  "outcome" = c("failure", "cure"))
visitcf_contrasts <- list(
  "v1cf" = c("v1_failure", "v1_cure"),
  "v2cf" = c("v2_failure", "v2_cure"),
  "v3cf" = c("v3_failure", "v3_cure"))
visit_contrasts <- list(
  "v2v1" = c("c2", "c1"),
  "v3v1" = c("c3", "c1"),
  "v3v2" = c("c3", "c2"))
visit_v1later <- list(
  "later_vs_first" = c("later", "first"))
celltypes <- list(
  "eo_mono" = c("eosinophils", "monocytes"),
  "ne_mono" = c("neutrophils", "monocytes"),
  "eo_ne" = c("eosinophils", "neutrophils"))
ethnicity_contrasts <- list(
  "mestizo_indigenous" = c("mestiza", "indigena"),
  "mestizo_afrocol" = c("mestiza", "afrocol"),
  "indigenous_afrocol" = c("indigena", "afrocol"))
outcometype_contrasts <- list(
  "monocyte_cf" = c("failure_monocytes", "cure_monocytes"),
  "neutrophil_cf" = c("failure_neutrophils", "cure_neutrophils"),
  "eosinophil_cf" = c("failure_eosinophils", "cure_eosinophils"))
visittype_contrasts_mono <- list(
  "v2v1_mono_cure" = c("monocytes_2_cure", "monocytes_1_cure"),
  "v2v1_mono_failure" = c("monocytes_2_failure", "monocytes_1_failure"),
  "v3v1_mono_cure" = c("monocytes_3_cure", "monocytes_1_cure"),
  "v3v1_mono_failure" = c("monocytes_3_failure", "monocytes_1_failure"))
visittype_contrasts_eo <- list(
  "v2v1_eo_cure" = c("eosinophils_2_cure", "eosinophils_1_cure"),
  "v2v1_eo_failure" = c("eosinophils_2_failure", "eosinophils_1_failure"),
  "v3v1_eo_cure" = c("eosinophils_3_cure", "eosinophils_1_cure"),
  "v3v1_eo_failure" = c("eosinophils_3_failure", "eosinophils_1_failure"))
visittype_contrasts_ne <- list(
  "v2v1_ne_cure" = c("neutrophils_2_cure", "neutrophils_1_cure"),
  "v2v1_ne_failure" = c("neutrophils_2_failure", "neutrophils_1_failure"),
  "v3v1_ne_cure" = c("neutrophils_3_cure", "neutrophils_1_cure"),
  "v3v1_ne_failure" = c("neutrophils_3_failure", "neutrophils_1_failure"))
visittype_contrasts <- c(visittype_contrasts_mono,
                         visittype_contrasts_eo,
                         visittype_contrasts_ne)
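As a quick sanity check on lists like these, the following sketch turns each element into a 'numerator_vs_denominator' label, which mirrors table names such as failure_vs_cure that appear in the printed results of this document. The helper contrast_label() is hypothetical (it is not part of hpgltools), and how hpgltools builds its names internally is an assumption on my part.

```r
# Hypothetical helper: one "numerator_vs_denominator" label per contrast,
# keeping the list element name (the eventual worksheet name) as the name.
contrast_label <- function(contrasts) {
  vapply(contrasts, function(pair) {
    stopifnot(is.character(pair), length(pair) == 2)
    paste0(pair[1], "_vs_", pair[2])
  }, character(1))
}

cf_contrast <- list("outcome" = c("failure", "cure"))
cf_labels <- contrast_label(cf_contrast)
cf_labels

# c() on named lists concatenates them and keeps the names, which is how
# visittype_contrasts above is assembled from its three pieces.
mono <- list("v2v1_mono_cure" = c("monocytes_2_cure", "monocytes_1_cure"))
eo <- list("v2v1_eo_cure" = c("eosinophils_2_cure", "eosinophils_1_cure"))
combined <- c(mono, eo)
combined_labels <- contrast_label(combined)
names(combined_labels)
```

The stopifnot() inside the helper is there because a contrast with a missing numerator or denominator is the kind of typo which is easy to make and annoying to find later.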

3.2 Gene Set Enrichment / over representation

Previously, the over representation analyses (e.g. GO and friends) followed each DE analysis in this document. I recently mentally severed my conception of GO analyses into two camps. The first is over representation analysis, in which one provides a group of genes deemed significant in some way and asks if there are known categories which contain these genes more often than one would expect at random. In contrast, I am defining gene set enrichment analysis explicitly as the process of passing all genes with their metric of choice (logFC, exprs, whatever) and asking if the distribution of all genes is significant with respect to the categories. With that in mind, I added a series of explicitly GSEA analyses in my later iterations of these documents so that both ways of thinking are provided.

However, I moved those analyses to a separate document (05enrichment.Rmd) in the hopes of improving their organization.
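The distinction between the two camps can be sketched with base R on simulated data. The gene names, the category, and the logFC values below are all made up. The over representation half uses a hypergeometric test; the second half uses a Wilcoxon rank-sum test as a simple stand-in for a rank-based enrichment test, which is explicitly not the running-sum statistic used by GSEA proper.

```r
set.seed(42)
universe <- paste0("gene", 1:1000)
category <- universe[1:50]              # hypothetical gene set

# Simulated logFC for every gene; the category is shifted upward by 1.
logfc <- rnorm(length(universe))
names(logfc) <- universe
logfc[category] <- logfc[category] + 1

# Camp 1, over representation: start from a list of 'significant' genes
# and ask whether the category is hit more often than expected at random.
sig <- names(logfc)[logfc >= 1]
overlap <- length(intersect(sig, category))
ora_p <- phyper(overlap - 1,
                m = length(category),
                n = length(universe) - length(category),
                k = length(sig),
                lower.tail = FALSE)     # P(X >= overlap)

# Camp 2, enrichment: use all genes and their metric, no cutoff at all.
in_set <- names(logfc) %in% category
rank_p <- wilcox.test(logfc[in_set], logfc[!in_set],
                      alternative = "greater")$p.value

c(ora = ora_p, rank_based = rank_p)
```

The practical difference is that the first test depends on where one draws the significance cutoff, while the second never sees a cutoff.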

4 Only Tumaco samples

Start over, this time with only the samples from Tumaco. We currently are assuming these will prove to be the only analyses used for final interpretation. This is primarily because we have insufficient samples which failed treatment from Cali. There is one disadvantage when using these samples: they had to travel further than the samples taken in Cali and there is significant variance observed between the two locations and we cannot discern its source. In the worst case scenario (one which I think unlikely), the variance is caused by degraded RNA during transit. We do know that the samples were well-stored in RNALater and frozen/etc, so I am inclined to discount that possibility. (Also, looking at the reads in IGV they don’t ‘look’ degraded to me.) I think a more compelling difference lies in the different population demographics observed in the two locations. Actually, now that I have typed these sentences out, I think I can semi-test this hypothesis by looking at the set of DE genes between the two locations and comparing that result to the Tumaco (and/or Cali) ethnicity comparison which is most representative of the ethnicity differences between them. If I get it into my head to try this, I will need to load the DE tables from the 03differential_expression_both.Rmd document; so I am most likely to try it out in the 07var_coef document, which was mostly written by Theresa and is already examining some similar questions.

4.1 All samples

Start by considering all Tumaco cell types. Note that in this case we only use SVA, primarily because I am not certain what would be an appropriate batch factor, perhaps visit?

t_cf_clinical_de_sva <- all_pairwise(t_clinical, model_batch = "svaseq",
                                     filter = TRUE,
                                     methods = methods)
## 
##    cure failure 
##      67      56
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

t_clinical <- t_cf_clinical_de_sva[["input"]]
t_cf_clinical_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 falr_vs_cr
## basic_vs_deseq      0.8242
## basic_vs_dream      0.8735
## basic_vs_ebseq      0.6341
## basic_vs_edger      0.8285
## basic_vs_limma      0.8704
## basic_vs_noiseq     0.8582
## deseq_vs_dream      0.8464
## deseq_vs_ebseq      0.6981
## deseq_vs_edger      0.9845
## deseq_vs_limma      0.8063
## deseq_vs_noiseq     0.9062
## dream_vs_ebseq      0.7273
## dream_vs_edger      0.8442
## dream_vs_limma      0.9405
## dream_vs_noiseq     0.7922
## ebseq_vs_edger      0.6715
## ebseq_vs_limma      0.6054
## ebseq_vs_noiseq     0.6921
## edger_vs_limma      0.8177
## edger_vs_noiseq     0.9027
## limma_vs_noiseq     0.7652
t_cf_clinical_table_sva <- combine_de_tables(
  t_cf_clinical_de_sva, keepers = cf_contrast,
  excel = glue("{cf_prefix}/All_Samples/t_clinical_cf_table_sva-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_clinical_table_sva
## A set of combined differential expression results.
##             table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure          94           183         103           159
##   limma_sigup limma_sigdown
## 1          50            38
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

t_cf_clinical_table_sva[["plots"]][["outcome"]][["deseq_ma_plots"]]

t_cf_clinical_sig_sva <- extract_significant_genes(
  t_cf_clinical_table_sva,
  excel = glue("{cf_prefix}/All_Samples/t_clinical_cf_sig_sva-v{ver}.xlsx"))
t_cf_clinical_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##         limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome       50         38      103        159       94        183        0
##         ebseq_down basic_up basic_down
## outcome         49       29          6

dim(t_cf_clinical_sig_sva[["deseq"]][["ups"]][[1]])
## [1] 94 77
dim(t_cf_clinical_sig_sva[["deseq"]][["downs"]][[1]])
## [1] 183  77
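The agreement table printed above has one row per pair of methods. A minimal sketch of how such a table could be produced follows, assuming the agreement score is a Pearson correlation of per-gene logFC values between two methods; I have not confirmed here that this is exactly what all_pairwise reports. The logFC values are simulated as a shared signal plus method-specific noise.

```r
set.seed(7)
method_names <- c("basic", "deseq", "dream", "ebseq", "edger", "limma", "noiseq")

# Simulated per-gene logFC: every 'method' sees the same signal plus noise.
true_lfc <- rnorm(500)
lfc <- sapply(method_names, function(m) true_lfc + rnorm(500, sd = 0.5))

# Correlation of logFC between every pair of methods.
agreement <- cor(lfc)

# Flatten the upper triangle into named "a_vs_b" entries.
method_pairs <- combn(method_names, 2)
pair_scores <- setNames(
  apply(method_pairs, 2, function(p) agreement[p[1], p[2]]),
  apply(method_pairs, 2, function(p) paste0(p[1], "_vs_", p[2])))

length(pair_scores)   # choose(7, 2) pairs
round(head(pair_scores, 3), 4)
```

Seven methods yield choose(7, 2) = 21 pairs, which matches the 21 rows of the table above.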

Repeat without the biopsies.

t_cf_clinicalnb_de_sva <- all_pairwise(t_clinical_nobiop, model_batch = "svaseq",
                                       filter = TRUE,
                                       methods = methods)
## 
##    cure failure 
##      58      51
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

t_clinical_nobiop <- t_cf_clinicalnb_de_sva[["input"]]
t_cf_clinicalnb_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 falr_vs_cr
## basic_vs_deseq      0.8266
## basic_vs_dream      0.8478
## basic_vs_ebseq      0.7405
## basic_vs_edger      0.8367
## basic_vs_limma      0.8571
## basic_vs_noiseq     0.8916
## deseq_vs_dream      0.8452
## deseq_vs_ebseq      0.8187
## deseq_vs_edger      0.9964
## deseq_vs_limma      0.8463
## deseq_vs_noiseq     0.8874
## dream_vs_ebseq      0.7810
## dream_vs_edger      0.8487
## dream_vs_limma      0.9851
## dream_vs_noiseq     0.7767
## ebseq_vs_edger      0.8142
## ebseq_vs_limma      0.7814
## ebseq_vs_noiseq     0.8561
## edger_vs_limma      0.8506
## edger_vs_noiseq     0.8933
## limma_vs_noiseq     0.7865
t_cf_clinicalnb_table_sva <- combine_de_tables(
  t_cf_clinicalnb_de_sva, keepers = cf_contrast, scale_p = TRUE,
  excel = glue("{cf_prefix}/All_Samples/t_clinical_nobiop_cf_table_sva-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_clinicalnb_table_sva
## A set of combined differential expression results.
##             table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure         140            75         142            67
##   limma_sigup limma_sigdown
## 1          54            46
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

t_cf_clinicalnb_table_sva[["plots"]][["outcome"]][["deseq_ma_plots"]]

t_cf_clinicalnb_sig_sva <- extract_significant_genes(
  t_cf_clinicalnb_table_sva,
  excel = glue("{cf_prefix}/All_Samples/t_clinical_nobiop_cf_sig_sva-v{ver}.xlsx"))
t_cf_clinicalnb_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##         limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome       54         46      142         67      140         75        1
##         ebseq_down basic_up basic_down
## outcome          7       83         30

dim(t_cf_clinicalnb_sig_sva[["deseq"]][["ups"]][[1]])
## [1] 140  84
dim(t_cf_clinicalnb_sig_sva[["deseq"]][["downs"]][[1]])
## [1] 75 84

As the data structure’s name suggests, the above comparison seeks to learn if there are fail/cure differences discernible across all clinical celltypes in samples taken in Tumaco.

The set of steps taken in this previous block will be essentially repeated for every set of contrasts and way of mixing/matching the data and follows the path:

  1. Run all_pairwise to run deseq and friends using surrogate estimates provided by sva when appropriate/possible. This creates an unwieldy datastructure containing the results from all methods and all contrasts as a series of nested lists.
  2. Mash them together with combine_de_tables, use the ‘keepers’ argument to define the desired numerators/denominators, and write the tables to the file provided in the ‘excel’ argument.
  3. Yank out the ‘significant’ genes and send them to a separate excel document. In all cases, ‘significant’ is the set with a |log2FC| >= 1.0 and adjusted p-value <= 0.05. This reminds me, one of the reviewers mentioned a set of international guidelines for significant genes. I thought I basically knew what I was doing, but this caught me completely unaware. If anyone ever reads this (no one will, let us be honest) I would love to know. The closest thing I found is: (Chung et al. (2021)), but I do not think it really addresses this idea (I have not yet read it carefully).

These datastructures are all exposed to various functions in hpgltools which allow one to poke/compare them; I am not a fan of Excel, but I think the xlsx documents it creates are pretty decent, too.
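The significance rule from step 3 is simple enough to sketch in base R. The table below is simulated and its column names (logFC, P.Value, adj.P.Val) are illustrative choices rather than the columns written by combine_de_tables; the Benjamini-Hochberg adjustment is likewise an assumption made for this example.

```r
set.seed(3)
# Simulated DE table, one row per gene (hypothetical values).
de_table <- data.frame(
  row.names = paste0("gene", 1:200),
  logFC = rnorm(200, sd = 1.5),
  P.Value = runif(200)^3)
de_table$adj.P.Val <- p.adjust(de_table$P.Value, method = "BH")

lfc_cutoff <- 1.0
p_cutoff <- 0.05

# 'Significant' means |log2FC| >= 1.0 and adjusted p-value <= 0.05.
keep <- abs(de_table$logFC) >= lfc_cutoff & de_table$adj.P.Val <= p_cutoff
ups <- de_table[keep & de_table$logFC > 0, ]
downs <- de_table[keep & de_table$logFC < 0, ]

c(up = nrow(ups), down = nrow(downs))
```

Splitting into ups and downs matches the way the significant sets are reported per method throughout this document.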

5 Visit comparisons

Later in this document I do a bunch of visit/cf comparisons. In this block I want to explicitly only compare v1 to other visits. This is something I did quite a lot in the 2019 datasets, but never actually moved to this document.

tv1_vs_later <- all_pairwise(t_v1vs, model_batch = "svaseq",
                             filter = TRUE,
                             methods = methods)
## 
## first later 
##    40    69
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

t_v1vs <- tv1_vs_later[["input"]]
tv1_vs_later
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 ltr_vs_frs
## basic_vs_deseq      0.7946
## basic_vs_dream      0.8022
## basic_vs_ebseq      0.7513
## basic_vs_edger      0.7983
## basic_vs_limma      0.8133
## basic_vs_noiseq     0.8895
## deseq_vs_dream      0.8498
## deseq_vs_ebseq      0.7809
## deseq_vs_edger      0.9983
## deseq_vs_limma      0.8394
## deseq_vs_noiseq     0.8587
## dream_vs_ebseq      0.8100
## dream_vs_edger      0.8564
## dream_vs_limma      0.9717
## dream_vs_noiseq     0.7516
## ebseq_vs_edger      0.7868
## ebseq_vs_limma      0.7791
## ebseq_vs_noiseq     0.8284
## edger_vs_limma      0.8457
## edger_vs_noiseq     0.8626
## limma_vs_noiseq     0.7433
tv1_vs_later_table <- combine_de_tables(
  tv1_vs_later, keepers = visit_v1later, scale_p = TRUE,
  excel = glue("{xlsx_prefix}/DE_Visits/tv1_vs_later_tables-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
tv1_vs_later_table
## A set of combined differential expression results.
##            table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 later_vs_first          24             7          22             7
##   limma_sigup limma_sigdown
## 1          23             7
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

tv1_vs_later_sig <- extract_significant_genes(
  tv1_vs_later_table,
  excel = glue("{xlsx_prefix}/DE_Visits/tv1_vs_later_sig-v{ver}.xlsx"))
tv1_vs_later_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##                limma_up limma_down edger_up edger_down deseq_up deseq_down
## later_vs_first       23          7       22          7       24          7
##                ebseq_up ebseq_down basic_up basic_down
## later_vs_first        0          0        0          3

6 Sex comparison

There is an important caveat when considering the sex of people in the study: there are very few females who failed. As a result I am primarily concerned with the male/female comparison among the cure samples.

t_sex <- subset_expt(tc_sex, subset = "clinic == 'tumaco'")
## subset_expt(): There were 184, now there are 123 samples.
t_sex
## A modified expressionSet containing 19952  and 123 sample. There are 164 metadata columns and 15 annotation columns.
## The primary condition is comprised of:
## female, male.
## Its current state is: raw(data).
t_sex_de <- all_pairwise(t_sex, model_batch = "svaseq", methods = methods,
                         filter = TRUE)
## 
## female   male 
##     22    101
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

t_sex <- t_sex_de[["input"]]
t_sex_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 mal_vs_fml
## basic_vs_deseq      0.8703
## basic_vs_dream      0.9363
## basic_vs_ebseq      0.7161
## basic_vs_edger      0.8748
## basic_vs_limma      0.9481
## basic_vs_noiseq     0.8530
## deseq_vs_dream      0.8815
## deseq_vs_ebseq      0.7608
## deseq_vs_edger      0.9909
## deseq_vs_limma      0.8596
## deseq_vs_noiseq     0.9120
## dream_vs_ebseq      0.7985
## dream_vs_edger      0.8862
## dream_vs_limma      0.9769
## dream_vs_noiseq     0.8273
## ebseq_vs_edger      0.7802
## ebseq_vs_limma      0.7762
## ebseq_vs_noiseq     0.7579
## edger_vs_limma      0.8663
## edger_vs_noiseq     0.9116
## limma_vs_noiseq     0.8154
t_sex_table <- combine_de_tables(
  t_sex_de, scale_p = TRUE,
  excel = glue("{xlsx_prefix}/Gene_Set_Enrichment/t_sex_table-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_sex_table
## A set of combined differential expression results.
##            table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 male_vs_female         129            96         116            95
##   limma_sigup limma_sigdown
## 1          54            74
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

t_sex_sig <- extract_significant_genes(
  t_sex_table, excel = glue("{xlsx_prefix}/Gene_Set_Enrichment/t_sex_sig-v{ver}.xlsx"))
t_sex_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##                limma_up limma_down edger_up edger_down deseq_up deseq_down
## male_vs_female       54         74      116         95      129         96
##                ebseq_up ebseq_down basic_up basic_down
## male_vs_female       12         13       18         11

In the following block I removed the failed people so that the comparison makes actual sense.

tc_sex_cure <- subset_expt(tc_sex, subset = "finaloutcome=='cure'")
## subset_expt(): There were 184, now there are 122 samples.
t_sex_cure <- subset_expt(tc_sex_cure, subset = "clinic == 'tumaco'")
## subset_expt(): There were 122, now there are 67 samples.
t_sex_cure_de <- all_pairwise(t_sex_cure, model_batch = "svaseq",
                              filter = TRUE,
                              methods = methods)
## 
## female   male 
##     13     54
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

t_sex_cure <- t_sex_cure_de[["input"]]
t_sex_cure_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 mal_vs_fml
## basic_vs_deseq      0.7995
## basic_vs_dream      0.9214
## basic_vs_ebseq      0.6679
## basic_vs_edger      0.8474
## basic_vs_limma      0.9284
## basic_vs_noiseq     0.8792
## deseq_vs_dream      0.8093
## deseq_vs_ebseq      0.7225
## deseq_vs_edger      0.9294
## deseq_vs_limma      0.7804
## deseq_vs_noiseq     0.8453
## dream_vs_ebseq      0.7812
## dream_vs_edger      0.8625
## dream_vs_limma      0.9698
## dream_vs_noiseq     0.8411
## ebseq_vs_edger      0.7687
## ebseq_vs_limma      0.7446
## ebseq_vs_noiseq     0.7109
## edger_vs_limma      0.8380
## edger_vs_noiseq     0.8881
## limma_vs_noiseq     0.8149
t_sex_cure_table <- combine_de_tables(
  t_sex_cure_de, scale_p = TRUE,
  excel = glue("{xlsx_prefix}/DE_Sex/t_sex_cure_table-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_sex_cure_table
## A set of combined differential expression results.
##            table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 male_vs_female         176           134         162           143
##   limma_sigup limma_sigdown
## 1          64           108
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

t_sex_cure_sig <- extract_significant_genes(
  t_sex_cure_table, excel = glue("{xlsx_prefix}/DE_Sex/t_sex_cure_sig-v{ver}.xlsx"))
t_sex_cure_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##                limma_up limma_down edger_up edger_down deseq_up deseq_down
## male_vs_female       64        108      162        143      176        134
##                ebseq_up ebseq_down basic_up basic_down
## male_vs_female       11         15       14          5

7 Ethnicity comparisons

As with the putative sex comparisons, there are few or no failures for one ethnicity. In addition, the observed ethnicities are very different for the two clinics. Together, these make comparisons among the ethnicities tricky.
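
One way to see why such a comparison is tricky is to cross-tabulate ethnicity against outcome and look for near-empty cells. The following is a minimal sketch using base R only; the `meta` data frame and every count in it are invented for illustration and are not the TMRC3 metadata.

```r
## Toy metadata: all counts here are made up purely for illustration.
meta <- data.frame(
  ethnicity = rep(c("afrocol", "indigena", "mestiza"), times = c(10, 4, 6)),
  outcome = c(rep(c("cure", "failure"), times = c(6, 4)),
              rep("cure", times = 4),
              rep(c("cure", "failure"), times = c(4, 2))),
  stringsAsFactors = FALSE)

## Cross-tabulate the two factors; a near-empty cell means that ethnicity and
## outcome cannot be cleanly separated for that group.
crosstab <- table(meta[["ethnicity"]], meta[["outcome"]])
crosstab

## Flag any ethnicity with fewer than 3 samples in either outcome.
sparse <- rownames(crosstab)[apply(crosstab, 1, min) < 3]
sparse
```

In this toy example the flagged groups are the ones for which an ethnicity contrast would be hard to distinguish from an outcome contrast.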

t_ethnicity_de <- all_pairwise(t_etnia_expt, model_batch = "svaseq",
                               filter = TRUE,
                               methods = methods)
## 
##  afrocol indigena  mestiza 
##       76       19       28
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

t_etnia_expt <- t_ethnicity_de[["input"]]
t_ethnicity_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_ethnicity_table <- combine_de_tables(
  t_ethnicity_de, keepers = ethnicity_contrasts, scale_p = TRUE,
  excel = glue("{xlsx_prefix}/DE_Ethnicity/t_ethnicity_table-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_ethnicity_table
## A set of combined differential expression results.
##                 table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 mestiza_vs_indigena          83            97          67           108
## 2  mestiza_vs_afrocol          57            92          52            96
## 3 indigena_vs_afrocol         165           236         187           216
##   limma_sigup limma_sigdown
## 1          58            56
## 2          42            53
## 3         165           147
## Plot describing unique/shared genes in a differential expression table.

t_ethnicity_sig <- extract_significant_genes(
  t_ethnicity_table, according_to = "deseq",
  excel = glue("{xlsx_prefix}/DE_Ethnicity/t_ethnicity_sig-v{ver}.xlsx"))
t_ethnicity_sig
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##                    deseq_up deseq_down
## mestizo_indigenous       83         97
## mestizo_afrocol          57         92
## indigenous_afrocol      165        236

8 Separate the Tumaco data by visit

One of the most compelling ideas in the data is the opportunity to find genes in the first visit which may help predict the likelihood that a person will respond well to treatment. The following block will therefore look at cure/fail from Tumaco at visit 1.

8.1 Cure/Fail, Tumaco Visit 1

t_cf_clinical_v1_de_sva <- all_pairwise(tv1_samples, model_batch = "svaseq",
                                        filter = TRUE,
                                        methods = methods)
## 
##    cure failure 
##      30      24
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

tv1_samples <- t_cf_clinical_v1_de_sva[["input"]]
t_cf_clinical_v1_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 falr_vs_cr
## basic_vs_deseq      0.6955
## basic_vs_dream      0.7310
## basic_vs_ebseq      0.6519
## basic_vs_edger      0.7228
## basic_vs_limma      0.6886
## basic_vs_noiseq     0.8245
## deseq_vs_dream      0.7917
## deseq_vs_ebseq      0.7127
## deseq_vs_edger      0.9537
## deseq_vs_limma      0.7398
## deseq_vs_noiseq     0.7815
## dream_vs_ebseq      0.6921
## dream_vs_edger      0.8274
## dream_vs_limma      0.9332
## dream_vs_noiseq     0.6911
## ebseq_vs_edger      0.6798
## ebseq_vs_limma      0.5529
## ebseq_vs_noiseq     0.7747
## edger_vs_limma      0.7829
## edger_vs_noiseq     0.7899
## limma_vs_noiseq     0.5978
t_cf_clinical_v1_table_sva <- combine_de_tables(
  t_cf_clinical_v1_de_sva, keepers = cf_contrast, scale_p = TRUE,
  excel = glue("{cf_prefix}/Visits/t_clinical_v1_cf_table_sva-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_clinical_v1_table_sva
## A set of combined differential expression results.
##             table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure          27            75          28            55
##   limma_sigup limma_sigdown
## 1           3             3
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

t_cf_clinical_v1_sig_sva <- extract_significant_genes(
  t_cf_clinical_v1_table_sva,
  excel = glue("{cf_prefix}/Visits/t_clinical_v1_cf_sig_sva-v{ver}.xlsx"))
t_cf_clinical_v1_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##         limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome        3          3       28         55       27         75        0
##         ebseq_down basic_up basic_down
## outcome         37        0          0

dim(t_cf_clinical_v1_sig_sva[["deseq"]][["ups"]][[1]])
## [1] 27 84
dim(t_cf_clinical_v1_sig_sva[["deseq"]][["downs"]][[1]])
## [1] 75 84
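
The up/down counts above follow from the stated cutoffs (|logFC| of 1 and adjusted p-value of 0.05). A minimal sketch of that kind of filter follows, assuming inclusive thresholds and using the `deseq_logfc`/`deseq_adjp` column names that appear elsewhere in this document. The `toy` table and its values are made up, and `extract_significant_genes()` may differ in its details.

```r
## Toy table using the deseq_logfc/deseq_adjp column names seen elsewhere in
## this document; the values are made up.
toy <- data.frame(
  row.names = paste0("gene", 1:6),
  deseq_logfc = c(2.1, -1.5, 0.4, 1.2, -3.0, -0.2),
  deseq_adjp = c(0.001, 0.03, 0.0001, 0.20, 0.04, 0.90))

lfc_cutoff <- 1
adjp_cutoff <- 0.05

## A gene must pass the adjusted p-value cutoff and exceed the logFC cutoff
## in the relevant direction.
significant <- toy[["deseq_adjp"]] <= adjp_cutoff
ups <- toy[significant & toy[["deseq_logfc"]] >= lfc_cutoff, ]
downs <- toy[significant & toy[["deseq_logfc"]] <= -lfc_cutoff, ]
dim(ups)
dim(downs)
```

Note that gene3 has the smallest adjusted p-value in the toy table but is excluded because its fold change is too small, and gene4 has a sufficient fold change but fails the p-value cutoff.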

8.2 Cure/Fail, Tumaco Visit 2

The visit 2 and visit 3 samples are interesting because they provide an opportunity to see if we can observe changes in response in the middle and end of treatment…

t_cf_clinical_v2_de_sva <- all_pairwise(tv2_samples, model_batch = "svaseq",
                                        filter = TRUE,
                                        methods = methods)
## 
##    cure failure 
##      20      15
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

tv2_samples <- t_cf_clinical_v2_de_sva[["input"]]
t_cf_clinical_v2_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 falr_vs_cr
## basic_vs_deseq      0.7689
## basic_vs_dream      0.7173
## basic_vs_ebseq      0.7215
## basic_vs_edger      0.7701
## basic_vs_limma      0.7404
## basic_vs_noiseq     0.8528
## deseq_vs_dream      0.8053
## deseq_vs_ebseq      0.7893
## deseq_vs_edger      0.9986
## deseq_vs_limma      0.8138
## deseq_vs_noiseq     0.8412
## dream_vs_ebseq      0.6823
## dream_vs_edger      0.8077
## dream_vs_limma      0.9633
## dream_vs_noiseq     0.6034
## ebseq_vs_edger      0.7929
## ebseq_vs_limma      0.6902
## ebseq_vs_noiseq     0.8218
## edger_vs_limma      0.8162
## edger_vs_noiseq     0.8401
## limma_vs_noiseq     0.6291
t_cf_clinical_v2_table_sva <- combine_de_tables(
  t_cf_clinical_v2_de_sva, keepers = cf_contrast, scale_p = TRUE,
  excel = glue("{cf_prefix}/Visits/t_clinical_v2_cf_table_sva-v{ver}.xlsx"))
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_clinical_v2_table_sva
## A set of combined differential expression results.
##             table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure          51            15          50            11
##   limma_sigup limma_sigdown
## 1           0             0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

t_cf_clinical_v2_sig_sva <- extract_significant_genes(
  t_cf_clinical_v2_table_sva,
  excel = glue("{cf_prefix}/Visits/t_clinical_v2_cf_sig_sva-v{ver}.xlsx"))
t_cf_clinical_v2_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##         limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome        0          0       50         11       51         15        0
##         ebseq_down basic_up basic_down
## outcome          0        0          0

dim(t_cf_clinical_v2_sig_sva[["deseq"]][["ups"]][[1]])
## [1] 51 84
dim(t_cf_clinical_v2_sig_sva[["deseq"]][["downs"]][[1]])
## [1] 15 84

8.3 Cure/Fail, Tumaco Visit 3

Repeat for visit 3.

t_cf_clinical_v3_de_sva <- all_pairwise(tv3_samples, model_batch = "svaseq",
                                        filter = TRUE,
                                        methods = methods)
## 
##    cure failure 
##      17      17
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

tv3_samples <- t_cf_clinical_v3_de_sva[["input"]]
t_cf_clinical_v3_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 falr_vs_cr
## basic_vs_deseq      0.7969
## basic_vs_dream      0.8072
## basic_vs_ebseq      0.7585
## basic_vs_edger      0.8030
## basic_vs_limma      0.8193
## basic_vs_noiseq     0.8988
## deseq_vs_dream      0.8559
## deseq_vs_ebseq      0.8006
## deseq_vs_edger      0.9978
## deseq_vs_limma      0.8530
## deseq_vs_noiseq     0.8716
## dream_vs_ebseq      0.7661
## dream_vs_edger      0.8635
## dream_vs_limma      0.9817
## dream_vs_noiseq     0.7378
## ebseq_vs_edger      0.8040
## ebseq_vs_limma      0.7614
## ebseq_vs_noiseq     0.8465
## edger_vs_limma      0.8605
## edger_vs_noiseq     0.8769
## limma_vs_noiseq     0.7409
t_cf_clinical_v3_table_sva <- combine_de_tables(
  t_cf_clinical_v3_de_sva, keepers = cf_contrast, scale_p = TRUE,
  excel = glue("{cf_prefix}/Visits/t_clinical_v3_cf_table_sva-v{ver}.xlsx"))
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_clinical_v3_table_sva
## A set of combined differential expression results.
##             table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure         120            61         120            50
##   limma_sigup limma_sigdown
## 1           3             1
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

t_cf_clinical_v3_sig_sva <- extract_significant_genes(
  t_cf_clinical_v3_table_sva,
  excel = glue("{cf_prefix}/Visits/t_clinical_v3_cf_sig_sva-v{ver}.xlsx"))
t_cf_clinical_v3_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##         limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome        3          1      120         50      120         61        0
##         ebseq_down basic_up basic_down
## outcome          0        0          0

dim(t_cf_clinical_v3_sig_sva[["deseq"]][["ups"]][[1]])
## [1] 120  84
dim(t_cf_clinical_v3_sig_sva[["deseq"]][["downs"]][[1]])
## [1] 61 84
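
To compare the three visits at a glance, the DESeq2 counts printed above can be collected into one small table. This is only a convenience sketch: the numbers are copied by hand from the outputs of sections 8.1 through 8.3, and the `visit_counts` name is hypothetical.

```r
## DESeq2 significant gene counts (failure vs. cure), copied from the
## dim() outputs of the three visit-specific blocks above.
visit_counts <- data.frame(
  visit = c("v1", "v2", "v3"),
  deseq_up = c(27, 51, 120),
  deseq_down = c(75, 15, 61))
visit_counts[["total"]] <- visit_counts[["deseq_up"]] +
  visit_counts[["deseq_down"]]
visit_counts
```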

9 By cell type

Now let us switch our view to each individual cell type collected. The hope here is that we will be able to learn some cell-specific differences in the response between people who did and did not respond well.

9.1 Cure/Fail, Biopsies

A primary hypothesis/assumption that we have held for quite a while with this data: the biopsy samples, given that they are composed of heterogeneous tissue types as well as a mix of healthy and infected tissue, are unlikely to be very information rich with respect to cure/fail. The following block seems to support that; we observe very few significant genes in the biopsies.

I therefore did not spend the time invoking other models.

t_cf_biopsy_de_sva <- all_pairwise(t_biopsies, model_batch = "svaseq",
                                   filter = TRUE,
                                   methods = methods)
## 
##    tumaco_cure tumaco_failure 
##              9              5
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

t_biopsies <- t_cf_biopsy_de_sva[["input"]]
t_cf_biopsy_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 tmc_flr___
## basic_vs_deseq      0.8164
## basic_vs_dream      0.8519
## basic_vs_ebseq      0.8011
## basic_vs_edger      0.8809
## basic_vs_limma      0.8497
## basic_vs_noiseq     0.9162
## deseq_vs_dream      0.7992
## deseq_vs_ebseq      0.8628
## deseq_vs_edger      0.9516
## deseq_vs_limma      0.7927
## deseq_vs_noiseq     0.8685
## dream_vs_ebseq      0.7538
## dream_vs_edger      0.8689
## dream_vs_limma      0.9937
## dream_vs_noiseq     0.7760
## ebseq_vs_edger      0.8843
## ebseq_vs_limma      0.7354
## ebseq_vs_noiseq     0.8872
## edger_vs_limma      0.8628
## edger_vs_noiseq     0.9181
## limma_vs_noiseq     0.7668
t_cf_biopsy_table_sva <- combine_de_tables(
  t_cf_biopsy_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
  excel = glue("{cf_prefix}/Biopsies/t_biopsy_cf_table_sva-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_biopsy_table_sva
## A set of combined differential expression results.
##                           table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure          17            11          19
##   edger_sigdown limma_sigup limma_sigdown
## 1            15           0             0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

t_cf_biopsy_sig_sva <- extract_significant_genes(
  t_cf_biopsy_table_sva,
  excel = glue("{cf_prefix}/Biopsies/t_cf_biopsy_sig_sva-v{ver}.xlsx"))
t_cf_biopsy_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##         limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome        0          0       19         15       17         11       11
##         ebseq_down basic_up basic_down
## outcome         57        0          0

dim(t_cf_biopsy_sig_sva[["deseq"]][["ups"]][[1]])
## [1] 17 84
dim(t_cf_biopsy_sig_sva[["deseq"]][["downs"]][[1]])
## [1] 11 84

9.2 Cure/Fail, Monocytes

Same question, but this time looking at monocytes. In addition, this comparison was done twice, once using SVA and once using visit as a batch factor.

I have been using this block to ensure that the changes I have been making to hpgltools do not change the analysis results. Thus the comment with a few logFC values: those are the first 6 observed DESeq2 logFC values in my last result before I made some changes to hpgltools in order to be able to work with random effect models.
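
The check described above can also be made explicit rather than left to visual inspection. The sketch below uses the six values recorded in the comment of the monocyte block as the reference and the six values printed by the current run as the observation. The variable names and the tolerance are assumptions chosen for illustration.

```r
## First six DESeq2 logFC values from the current run, as printed by head().
observed <- c(0.33760, -0.07193, 0.09665, -0.09082, -0.13500, 0.23270)
## Reference values recorded before the hpgltools changes (3 decimal places).
expected <- c(0.338, -0.072, 0.097, -0.091, -0.135, 0.233)

## The reference was rounded, so compare with an absolute tolerance instead
## of testing for exact equality.
max_difference <- max(abs(observed - expected))
max_difference
stopifnot(max_difference < 1e-3)
```

If a change to the package shifted the results, `stopifnot()` would halt the document at this point instead of letting the difference pass unnoticed.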

t_cf_monocyte_de_sva <- all_pairwise(t_monocytes, model_batch = "svaseq",
                                     filter = TRUE,
                                     methods = methods)
## 
##    tumaco_cure tumaco_failure 
##             21             21
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

## The svs are added to the expressionset during all_pairwise.
t_monocytes <- t_cf_monocyte_de_sva[["input"]]
t_cf_monocyte_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 tmc_flr___
## basic_vs_deseq      0.8506
## basic_vs_dream      0.9183
## basic_vs_ebseq      0.8470
## basic_vs_edger      0.8560
## basic_vs_limma      0.9210
## basic_vs_noiseq     0.9525
## deseq_vs_dream      0.8713
## deseq_vs_ebseq      0.8556
## deseq_vs_edger      0.9989
## deseq_vs_limma      0.8614
## deseq_vs_noiseq     0.8955
## dream_vs_ebseq      0.7883
## dream_vs_edger      0.8755
## dream_vs_limma      0.9910
## dream_vs_noiseq     0.8827
## ebseq_vs_edger      0.8563
## ebseq_vs_limma      0.7794
## ebseq_vs_noiseq     0.8874
## edger_vs_limma      0.8663
## edger_vs_noiseq     0.9000
## limma_vs_noiseq     0.8720
t_cf_monocyte_table_sva <- combine_de_tables(
  t_cf_monocyte_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
  excel = glue("{cf_prefix}/Monocytes/t_monocyte_cf_table_sva-v{ver}.xlsx"))
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_monocyte_table_sva
## A set of combined differential expression results.
##                           table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure          60            52          56
##   edger_sigdown limma_sigup limma_sigdown
## 1            51          11            34
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

head(t_cf_monocyte_table_sva[["data"]][["outcome"]][["deseq_logfc"]])
## [1]  0.33760 -0.07193  0.09665 -0.09082 -0.13500  0.23270
## The first few values in my pre-change result set are:
## 0.338, -0.072, 0.097, -0.091, -0.135, 0.233
t_cf_monocyte_sig_sva <- extract_significant_genes(
  t_cf_monocyte_table_sva,
  excel = glue("{cf_prefix}/Monocytes/t_monocyte_cf_sig_sva-v{ver}.xlsx"))
t_cf_monocyte_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##         limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome       11         34       56         51       60         52        0
##         ebseq_down basic_up basic_down
## outcome         23      168        197

dim(t_cf_monocyte_sig_sva[["deseq"]][["ups"]][[1]])
## [1] 60 84
dim(t_cf_monocyte_sig_sva[["deseq"]][["downs"]][[1]])
## [1] 52 84
t_cf_monocyte_de_batchvisit <- all_pairwise(t_monocytes, model_batch = TRUE,
                                            filter = TRUE,
                                            methods = methods)
## 
##    tumaco_cure tumaco_failure 
##             21             21 
## 
##  3  2  1 
## 13 13 16
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

t_cf_monocyte_de_batchvisit
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: batch in model/limma.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 tmc_flr___
## basic_vs_deseq      0.8505
## basic_vs_dream      0.9414
## basic_vs_ebseq      0.8470
## basic_vs_edger      0.8540
## basic_vs_limma      0.9509
## basic_vs_noiseq     0.9525
## deseq_vs_dream      0.8178
## deseq_vs_ebseq      0.9932
## deseq_vs_edger      0.9998
## deseq_vs_limma      0.8120
## deseq_vs_noiseq     0.8857
## dream_vs_ebseq      0.8016
## dream_vs_edger      0.8202
## dream_vs_limma      0.9819
## dream_vs_noiseq     0.9085
## ebseq_vs_edger      0.9935
## ebseq_vs_limma      0.7952
## ebseq_vs_noiseq     0.8874
## edger_vs_limma      0.8150
## edger_vs_noiseq     0.8884
## limma_vs_noiseq     0.9004
t_cf_monocyte_table_batchvisit <- combine_de_tables(
  t_cf_monocyte_de_batchvisit, keepers = t_cf_contrast, scale_p = TRUE,
  excel = glue("{cf_prefix}/Monocytes/t_monocyte_cf_table_batchvisit-v{ver}.xlsx"))
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_monocyte_table_batchvisit
## A set of combined differential expression results.
##                           table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure          43            93          47
##   edger_sigdown limma_sigup limma_sigdown
## 1           105           6            13
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

t_cf_monocyte_sig_batchvisit <- extract_significant_genes(
  t_cf_monocyte_table_batchvisit,
  excel = glue("{cf_prefix}/Monocytes/t_monocyte_cf_sig_batchvisit-v{ver}.xlsx"))
t_cf_monocyte_sig_batchvisit
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##         limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome        6         13       47        105       43         93        0
##         ebseq_down basic_up basic_down
## outcome         23      168        197

dim(t_cf_monocyte_sig_batchvisit[["deseq"]][["ups"]][[1]])
## [1] 43 84
dim(t_cf_monocyte_sig_batchvisit[["deseq"]][["downs"]][[1]])
## [1] 93 84

9.3 Individual visits, Monocytes

Now focus in on the monocyte samples on a per-visit basis.

9.3.1 Visit 1

t_cf_monocyte_v1_de_sva <- all_pairwise(tv1_monocytes, model_batch = "svaseq",
                                        filter = TRUE,
                                        methods = methods)
## 
##    tumaco_cure tumaco_failure 
##              8              8
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

tv1_monocytes <- t_cf_monocyte_v1_de_sva[["input"]]
t_cf_monocyte_v1_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 tmc_flr___
## basic_vs_deseq      0.8866
## basic_vs_dream      0.9307
## basic_vs_ebseq      0.9060
## basic_vs_edger      0.8870
## basic_vs_limma      0.9440
## basic_vs_noiseq     0.9486
## deseq_vs_dream      0.8965
## deseq_vs_ebseq      0.8945
## deseq_vs_edger      0.9999
## deseq_vs_limma      0.8902
## deseq_vs_noiseq     0.9130
## dream_vs_ebseq      0.8365
## dream_vs_edger      0.8965
## dream_vs_limma      0.9832
## dream_vs_noiseq     0.8987
## ebseq_vs_edger      0.8950
## ebseq_vs_limma      0.8280
## ebseq_vs_noiseq     0.9484
## edger_vs_limma      0.8905
## edger_vs_noiseq     0.9134
## limma_vs_noiseq     0.8847
t_cf_monocyte_v1_table_sva <- combine_de_tables(
  t_cf_monocyte_v1_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
  excel = glue("{cf_prefix}/Monocytes/t_monocyte_v1_cf_table_sva-v{ver}.xlsx"))
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_monocyte_v1_table_sva
## A set of combined differential expression results.
##                           table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure          14            52          15
##   edger_sigdown limma_sigup limma_sigdown
## 1            57           0             0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

t_cf_monocyte_v1_sig_sva <- extract_significant_genes(
  t_cf_monocyte_v1_table_sva,
  excel = glue("{cf_prefix}/Monocytes/t_monocyte_v1_cf_sig_sva-v{ver}.xlsx"))
t_cf_monocyte_v1_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##         limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome        0          0       15         57       14         52        0
##         ebseq_down basic_up basic_down
## outcome         15        0          0

dim(t_cf_monocyte_v1_sig_sva[["deseq"]][["ups"]][[1]])
## [1] 14 84
dim(t_cf_monocyte_v1_sig_sva[["deseq"]][["downs"]][[1]])
## [1] 52 84

9.3.2 Monocytes: Compare sva to batch-in-model

sva_aucc <- calculate_aucc(t_cf_monocyte_table_sva[["data"]][[1]],
                           tbl2 = t_cf_monocyte_table_batchvisit[["data"]][[1]],
                           py = "deseq_adjp", ly = "deseq_logfc")
sva_aucc
## These two tables have an aucc value of: 0.694200173169544 and correlation:
## 
##  Pearson's product-moment correlation
## 
## data:  tbl[[lx]] and tbl[[ly]]
## t = 182, df = 10860, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8633 0.8726
## sample estimates:
##    cor 
## 0.8681

shared_ids <- rownames(t_cf_monocyte_table_sva[["data"]][[1]]) %in%
  rownames(t_cf_monocyte_table_batchvisit[["data"]][[1]])
first <- t_cf_monocyte_table_sva[["data"]][[1]][shared_ids, ]
second <- t_cf_monocyte_table_batchvisit[["data"]][[1]][rownames(first), ]
cor.test(first[["deseq_logfc"]], second[["deseq_logfc"]])
## 
##  Pearson's product-moment correlation
## 
## data:  first[["deseq_logfc"]] and second[["deseq_logfc"]]
## t = 182, df = 10860, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8633 0.8726
## sample estimates:
##    cor 
## 0.8681
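
For readers unfamiliar with the AUCC value reported above, the following is a generic sketch of an area under the concordance curve: for each i up to k, count the genes shared by the two top-i lists, then normalize by the area obtained from identical rankings. The `simple_aucc()` function, the default `k`, and the p-values are all made up for illustration; `calculate_aucc()` in hpgltools may be implemented differently.

```r
## Area under the concordance curve for two named vectors of adjusted p-values.
## This is a generic sketch; calculate_aucc() in hpgltools may differ in detail.
simple_aucc <- function(p1, p2, k = 5) {
  shared <- intersect(names(p1), names(p2))
  rank1 <- names(sort(p1[shared]))
  rank2 <- names(sort(p2[shared]))
  k <- min(k, length(shared))
  ## Number of genes shared by the two top-i lists, for i = 1..k.
  concordance <- vapply(seq_len(k), function(i) {
    length(intersect(rank1[seq_len(i)], rank2[seq_len(i)]))
  }, integer(1))
  ## Normalize by the area obtained when the rankings are identical.
  sum(concordance) / (k * (k + 1) / 2)
}

## Made-up adjusted p-values for six genes under two models.
genes <- paste0("gene", 1:6)
sva_adjp <- setNames(c(0.001, 0.002, 0.01, 0.2, 0.5, 0.9), genes)
batch_adjp <- setNames(c(0.002, 0.001, 0.3, 0.02, 0.6, 0.8), genes)

aucc_same <- simple_aucc(sva_adjp, sva_adjp)
aucc_diff <- simple_aucc(sva_adjp, batch_adjp)
aucc_same
aucc_diff
```

Identical rankings give a value of 1, and the value drops as the top-ranked genes of the two tables diverge. It therefore complements the logFC correlation, which looks at all genes rather than at the top of the list.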

9.4 Neutrophil samples

Switch context to the neutrophils; once again, repeat the analysis once using SVA and once using visit as a batch factor.
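
The difference between the two approaches comes down to which extra columns enter the design matrix. The sketch below uses base R `model.matrix()` on a toy sample sheet. The `meta` data frame, the `SV1` values, and the exact formulas are assumptions for illustration; the designs that `all_pairwise()` builds internally may differ.

```r
## Toy sample sheet: condition is the factor of interest, visit the batch
## factor, and SV1 a made-up surrogate variable standing in for svaseq output.
meta <- data.frame(
  condition = factor(rep(c("cure", "failure"), each = 3)),
  visit = factor(c("1", "2", "3", "1", "2", "3")),
  SV1 = c(-0.4, 0.1, 0.3, -0.2, 0.5, -0.3))

## Batch-in-model: visit enters as a categorical covariate.
batch_design <- model.matrix(~ 0 + condition + visit, data = meta)
## SVA: estimated surrogate variables enter as numeric covariates.
sva_design <- model.matrix(~ 0 + condition + SV1, data = meta)
colnames(batch_design)
colnames(sva_design)
```

In both cases the cure/failure contrast is taken between the two condition columns; only the nuisance terms differ.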

t_cf_neutrophil_de_sva <- all_pairwise(t_neutrophils, model_batch = "svaseq",
                                       filter = TRUE,
                                       methods = methods)
## 
##    tumaco_cure tumaco_failure 
##             20             21
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

t_cf_neutrophil_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 tmc_flr___
## basic_vs_deseq      0.8754
## basic_vs_dream      0.9210
## basic_vs_ebseq      0.8587
## basic_vs_edger      0.8812
## basic_vs_limma      0.9321
## basic_vs_noiseq     0.9457
## deseq_vs_dream      0.8840
## deseq_vs_ebseq      0.9062
## deseq_vs_edger      0.9994
## deseq_vs_limma      0.8742
## deseq_vs_noiseq     0.9367
## dream_vs_ebseq      0.8365
## dream_vs_edger      0.8882
## dream_vs_limma      0.9861
## dream_vs_noiseq     0.8922
## ebseq_vs_edger      0.9068
## ebseq_vs_limma      0.8430
## ebseq_vs_noiseq     0.9212
## edger_vs_limma      0.8784
## edger_vs_noiseq     0.9404
## limma_vs_noiseq     0.8943
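
The agreement table has one row per unordered pair of methods: seven sets of logFC values (basic, deseq, dream, ebseq, edger, limma, noiseq) give choose(7, 2) = 21 pairs. The sketch below shows how such a table could be built if each score is read as a Pearson correlation between two logFC columns; that reading is an assumption, and the simulated data is invented for illustration only.

```r
## Simulated logFC columns: a shared signal plus method-specific noise.
set.seed(2)
truth <- rnorm(200)
methods_used <- c("basic", "deseq", "dream", "ebseq", "edger", "limma", "noiseq")
toy <- as.data.frame(sapply(paste0(methods_used, "_logfc"),
                            function(m) truth + rnorm(200, sd = 0.4)))

## Every unordered pair of methods, one column per pair.
pairs <- combn(methods_used, 2)
agreement <- apply(pairs, 2, function(p) {
  cor(toy[[paste0(p[1], "_logfc")]], toy[[paste0(p[2], "_logfc")]])
})
names(agreement) <- paste0(pairs[1, ], "_vs_", pairs[2, ])
round(agreement, 4)
```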
t_cf_neutrophil_table_sva <- combine_de_tables(
  t_cf_neutrophil_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
  excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_cf_table_sva-v{ver}.xlsx"))
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_neutrophil_table_sva
## A set of combined differential expression results.
##                           table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure         130            30         120
##   edger_sigdown limma_sigup limma_sigdown
## 1            27          12            12
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

t_cf_neutrophil_sig_sva <- extract_significant_genes(
  t_cf_neutrophil_table_sva,
  excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_cf_sig_sva-v{ver}.xlsx"))
t_cf_neutrophil_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##         limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome       12         12      120         27      130         30        7
##         ebseq_down basic_up basic_down
## outcome          7        7          3

dim(t_cf_neutrophil_sig_sva[["deseq"]][["ups"]][[1]])
## [1] 130  84
dim(t_cf_neutrophil_sig_sva[["deseq"]][["downs"]][[1]])
## [1] 30 84
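
The cutoffs printed above (logFC cutoff of 1, adjusted p cutoff of 0.05) can be illustrated with a small base-R filter. The helper `toy_significant` and the toy table are hypothetical; `extract_significant_genes()` may treat boundary values and missing data differently.

```r
## Hypothetical helper: split a DE table into up- and down-regulated genes.
toy_significant <- function(tbl, lfc_col = "deseq_logfc", p_col = "deseq_adjp",
                            lfc = 1, p = 0.05) {
  ## Drop genes with a missing or too-large adjusted p-value.
  keep <- !is.na(tbl[[p_col]]) & tbl[[p_col]] <= p
  list(
    ups = tbl[keep & tbl[[lfc_col]] >= lfc, , drop = FALSE],
    downs = tbl[keep & tbl[[lfc_col]] <= -lfc, , drop = FALSE])
}

## Made-up values, chosen to exercise each branch of the filter.
toy <- data.frame(
  deseq_logfc = c(2.5, 1.2, 0.4, -1.8, -3.0, 1.5),
  deseq_adjp = c(0.001, 0.20, 0.01, 0.03, 0.0004, NA),
  row.names = paste0("gene", 1:6))
sig <- toy_significant(toy)
rownames(sig[["ups"]])    ## large positive logFC and small adjusted p
rownames(sig[["downs"]])  ## large negative logFC and small adjusted p
```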
t_cf_neutrophil_de_batchvisit <- all_pairwise(t_neutrophils, model_batch = TRUE,
                                              filter = TRUE,
                                              methods = methods)
## 
##    tumaco_cure tumaco_failure 
##             20             21 
## 
##  3  2  1 
## 12 13 16
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

t_cf_neutrophil_de_batchvisit
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: batch in model/limma.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 tmc_flr___
## basic_vs_deseq      0.8644
## basic_vs_dream      0.9574
## basic_vs_ebseq      0.8587
## basic_vs_edger      0.8671
## basic_vs_limma      0.9658
## basic_vs_noiseq     0.9457
## deseq_vs_dream      0.8356
## deseq_vs_ebseq      0.9813
## deseq_vs_edger      0.9999
## deseq_vs_limma      0.8380
## deseq_vs_noiseq     0.9184
## dream_vs_ebseq      0.8264
## dream_vs_edger      0.8377
## dream_vs_limma      0.9840
## dream_vs_noiseq     0.9157
## ebseq_vs_edger      0.9818
## ebseq_vs_limma      0.8284
## ebseq_vs_noiseq     0.9212
## edger_vs_limma      0.8401
## edger_vs_noiseq     0.9204
## limma_vs_noiseq     0.9125
t_cf_neutrophil_table_batchvisit <- combine_de_tables(
  t_cf_neutrophil_de_batchvisit, keepers = t_cf_contrast, scale_p = TRUE,
  excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_cf_table_batchvisit-v{ver}.xlsx"))
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_neutrophil_table_batchvisit
## A set of combined differential expression results.
##                           table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure          92            47         101
##   edger_sigdown limma_sigup limma_sigdown
## 1            44           3             1
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

t_cf_neutrophil_sig_batchvisit <- extract_significant_genes(
  t_cf_neutrophil_table_batchvisit,
  excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_cf_sig_batchvisit-v{ver}.xlsx"))
t_cf_neutrophil_sig_batchvisit
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##         limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome        3          1      101         44       92         47        7
##         ebseq_down basic_up basic_down
## outcome          7        7          3

dim(t_cf_neutrophil_sig_batchvisit[["deseq"]][["ups"]][[1]])
## [1] 92 84
dim(t_cf_neutrophil_sig_batchvisit[["deseq"]][["downs"]][[1]])
## [1] 47 84
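
The two models call different numbers of DESeq2 genes (130 up and 30 down with SVA, 92 up and 47 down with visit in the model). A simple set comparison of the gene IDs shows how much of each list is shared. The sketch below uses made-up IDs and a hypothetical helper, `compare_sets`; with the real objects one would presumably pass the row names of the `ups` or `downs` tables, assuming those row names are the gene IDs.

```r
## Hypothetical helper: summarise the overlap between two vectors of gene IDs.
compare_sets <- function(ids_a, ids_b) {
  shared <- intersect(ids_a, ids_b)
  c(shared = length(shared),
    only_a = length(setdiff(ids_a, ids_b)),
    only_b = length(setdiff(ids_b, ids_a)),
    jaccard = length(shared) / length(union(ids_a, ids_b)))
}

## Invented IDs standing in for the SVA and batch-in-model 'up' lists.
sva_up <- c("ENSG01", "ENSG02", "ENSG03", "ENSG04")
batch_up <- c("ENSG03", "ENSG04", "ENSG05")
overlap <- compare_sets(sva_up, batch_up)
overlap
```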

9.4.1 Neutrophils by visit

When I did this with the monocytes, I split it up into multiple blocks for each visit. This time I first run all three visits together using a combined visit/outcome condition, then repeat each visit on its own.

# Combine visit number and outcome into one condition, e.g. "v1_cure".
visitcf_factor <- paste0("v", pData(t_neutrophils)[["visitnumber"]], "_",
                         pData(t_neutrophils)[["finaloutcome"]])
t_neutrophil_visitcf <- set_expt_conditions(t_neutrophils, fact = visitcf_factor)
## The numbers of samples by condition are:
## 
##    v1_cure v1_failure    v2_cure v2_failure    v3_cure v3_failure 
##          8          8          7          6          5          7
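
The factor above glues the visit number to the outcome so that every visit/outcome pair becomes its own condition. A toy illustration follows; the column names match the ones used above, but the rows are invented, and the contrast list is only a guess at the shape of `visitcf_contrasts`, which is defined elsewhere in this document.

```r
## Made-up sample metadata with the same two columns used above.
meta <- data.frame(
  visitnumber = c(1, 1, 2, 2, 3, 3, 1),
  finaloutcome = c("cure", "failure", "cure", "failure", "cure", "failure", "cure"))
visit_outcome <- paste0("v", meta[["visitnumber"]], "_", meta[["finaloutcome"]])
table(visit_outcome)

## Within-visit contrasts written as numerator/denominator pairs.
## Hypothetical: the real visitcf_contrasts may be structured differently.
toy_contrasts <- lapply(1:3, function(v) {
  c(paste0("v", v, "_failure"), paste0("v", v, "_cure"))
})
names(toy_contrasts) <- paste0("v", 1:3, "cf")
toy_contrasts
```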
t_cf_neutrophil_visits_de_sva <- all_pairwise(t_neutrophil_visitcf, model_batch = "svaseq",
                                              filter = TRUE,
                                              methods = methods)
## 
##    v1_cure v1_failure    v2_cure v2_failure    v3_cure v3_failure 
##          8          8          7          6          5          7
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

t_cf_neutrophil_visits_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_cf_neutrophil_visits_table_sva <- combine_de_tables(
  t_cf_neutrophil_visits_de_sva, keepers = visitcf_contrasts, scale_p = TRUE,
  excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_visitcf_table_sva-v{ver}.xlsx"))
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_neutrophil_visits_table_sva
## A set of combined differential expression results.
##                   table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 v1_failure_vs_v1_cure          12             6           6             6
## 2 v2_failure_vs_v2_cure           2             6           2             3
## 3 v3_failure_vs_v3_cure           2             2           0             2
##   limma_sigup limma_sigdown
## 1           1             0
## 2           0             0
## 3           0             0
## Plot describing unique/shared genes in a differential expression table.

t_cf_neutrophil_visits_sig_sva <- extract_significant_genes(
  t_cf_neutrophil_visits_table_sva,
  excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_visitcf_sig_sva-v{ver}.xlsx"))
t_cf_neutrophil_visits_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##      limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## v1cf        1          0        6          6       12          6        0
## v2cf        0          0        2          3        2          6        1
## v3cf        0          0        0          2        2          2        2
##      ebseq_down basic_up basic_down
## v1cf          2        0          0
## v2cf          1        0          0
## v3cf          3        0          0

dim(t_cf_neutrophil_visits_sig_sva[["deseq"]][["ups"]][[1]])
## [1] 12 84
dim(t_cf_neutrophil_visits_sig_sva[["deseq"]][["downs"]][[1]])
## [1]  6 84

Now visit 1 on its own.

t_cf_neutrophil_v1_de_sva <- all_pairwise(tv1_neutrophils, model_batch = "svaseq",
                                          filter = TRUE,
                                          methods = methods)
## 
##    tumaco_cure tumaco_failure 
##              8              8
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

t_cf_neutrophil_v1_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 tmc_flr___
## basic_vs_deseq      0.8627
## basic_vs_dream      0.8835
## basic_vs_ebseq      0.8706
## basic_vs_edger      0.8792
## basic_vs_limma      0.8909
## basic_vs_noiseq     0.9312
## deseq_vs_dream      0.8208
## deseq_vs_ebseq      0.9418
## deseq_vs_edger      0.9946
## deseq_vs_limma      0.8180
## deseq_vs_noiseq     0.9183
## dream_vs_ebseq      0.7912
## dream_vs_edger      0.8365
## dream_vs_limma      0.9761
## dream_vs_noiseq     0.8407
## ebseq_vs_edger      0.9421
## ebseq_vs_limma      0.7986
## ebseq_vs_noiseq     0.9456
## edger_vs_limma      0.8319
## edger_vs_noiseq     0.9251
## limma_vs_noiseq     0.8360
t_cf_neutrophil_v1_table_sva <- combine_de_tables(
  t_cf_neutrophil_v1_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
  excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_v1_cf_table_sva-v{ver}.xlsx"))
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_neutrophil_v1_table_sva
## A set of combined differential expression results.
##                           table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure           5             8           5
##   edger_sigdown limma_sigup limma_sigdown
## 1            11           0             0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

t_cf_neutrophil_v1_sig_sva <- extract_significant_genes(
  t_cf_neutrophil_v1_table_sva,
  excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_v1_cf_sig_sva-v{ver}.xlsx"))
t_cf_neutrophil_v1_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##         limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome        0          0        5         11        5          8        0
##         ebseq_down basic_up basic_down
## outcome          2        0          0

dim(t_cf_neutrophil_v1_sig_sva[["deseq"]][["ups"]][[1]])
## [1]  5 84
dim(t_cf_neutrophil_v1_sig_sva[["deseq"]][["downs"]][[1]])
## [1]  8 84

Followed by visit 2.

t_cf_neutrophil_v2_de_sva <- all_pairwise(tv2_neutrophils, model_batch = "svaseq",
                                          filter = TRUE,
                                          methods = methods)
## 
##    tumaco_cure tumaco_failure 
##              7              6
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

t_cf_neutrophil_v2_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 tmc_flr___
## basic_vs_deseq      0.9021
## basic_vs_dream      0.9514
## basic_vs_ebseq      0.8977
## basic_vs_edger      0.9026
## basic_vs_limma      0.9485
## basic_vs_noiseq     0.9224
## deseq_vs_dream      0.8964
## deseq_vs_ebseq      0.9777
## deseq_vs_edger      0.9986
## deseq_vs_limma      0.8893
## deseq_vs_noiseq     0.9631
## dream_vs_ebseq      0.8740
## dream_vs_edger      0.8948
## dream_vs_limma      0.9938
## dream_vs_noiseq     0.8987
## ebseq_vs_edger      0.9754
## ebseq_vs_limma      0.8654
## ebseq_vs_noiseq     0.9756
## edger_vs_limma      0.8880
## edger_vs_noiseq     0.9613
## limma_vs_noiseq     0.8903
t_cf_neutrophil_v2_table_sva <- combine_de_tables(
  t_cf_neutrophil_v2_de_sva, scale_p = TRUE, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_v2_cf_table_sva-v{ver}.xlsx"))
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_neutrophil_v2_table_sva
## A set of combined differential expression results.
##                           table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure           9             3          20
##   edger_sigdown limma_sigup limma_sigdown
## 1             6           0             0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

t_cf_neutrophil_v2_sig_sva <- extract_significant_genes(
  t_cf_neutrophil_v2_table_sva,
  excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_v2_cf_sig_sva-v{ver}.xlsx"))
t_cf_neutrophil_v2_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##         limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome        0          0       20          6        9          3        1
##         ebseq_down basic_up basic_down
## outcome          1        0          0

dim(t_cf_neutrophil_v2_sig_sva[["deseq"]][["ups"]][[1]])
## [1]  9 84
dim(t_cf_neutrophil_v2_sig_sva[["deseq"]][["downs"]][[1]])
## [1]  3 84

And finally visit 3.

t_cf_neutrophil_v3_de_sva <- all_pairwise(tv3_neutrophils, model_batch = "svaseq",
                                          filter = TRUE,
                                          methods = methods)
## 
##    tumaco_cure tumaco_failure 
##              5              7
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

t_cf_neutrophil_v3_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 tmc_flr___
## basic_vs_deseq      0.7528
## basic_vs_dream      0.7868
## basic_vs_ebseq      0.8738
## basic_vs_edger      0.7514
## basic_vs_limma      0.7952
## basic_vs_noiseq     0.9212
## deseq_vs_dream      0.8919
## deseq_vs_ebseq      0.7550
## deseq_vs_edger      0.9993
## deseq_vs_limma      0.8849
## deseq_vs_noiseq     0.8275
## dream_vs_ebseq      0.7659
## dream_vs_edger      0.8932
## dream_vs_limma      0.9848
## dream_vs_noiseq     0.7798
## ebseq_vs_edger      0.7594
## ebseq_vs_limma      0.7499
## ebseq_vs_noiseq     0.9516
## edger_vs_limma      0.8859
## edger_vs_noiseq     0.8291
## limma_vs_noiseq     0.7658
t_cf_neutrophil_v3_table_sva <- combine_de_tables(
  t_cf_neutrophil_v3_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
  excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_v3_cf_table_sva-v{ver}.xlsx"))
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_neutrophil_v3_table_sva
## A set of combined differential expression results.
##                           table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure           5             1           5
##   edger_sigdown limma_sigup limma_sigdown
## 1             1           0             0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

t_cf_neutrophil_v3_sig_sva <- extract_significant_genes(
  t_cf_neutrophil_v3_table_sva,
  excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_v3_cf_sig_sva-v{ver}.xlsx"))
t_cf_neutrophil_v3_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##         limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome        0          0        5          1        5          1        2
##         ebseq_down basic_up basic_down
## outcome          3        0          0

dim(t_cf_neutrophil_v3_sig_sva[["deseq"]][["ups"]][[1]])
## [1]  5 84
dim(t_cf_neutrophil_v3_sig_sva[["deseq"]][["downs"]][[1]])
## [1]  1 84

9.4.2 Neutrophils: Compare sva to batch-in-model

sva_aucc <- calculate_aucc(t_cf_neutrophil_table_sva[["data"]][[1]],
                           tbl2 = t_cf_neutrophil_table_batchvisit[["data"]][[1]],
                           py = "deseq_adjp", ly = "deseq_logfc")
sva_aucc
## These two tables have an aucc value of: 0.673209505652166 and correlation:
## 
##  Pearson's product-moment correlation
## 
## data:  tbl[[lx]] and tbl[[ly]]
## t = 209, df = 9099, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9060 0.9131
## sample estimates:
##    cor 
## 0.9096

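# Sanity check: recompute the logFC correlation by hand on the genes present
# in both tables; it should match the value reported by calculate_aucc() above.
shared_ids <- rownames(t_cf_neutrophil_table_sva[["data"]][[1]]) %in%
  rownames(t_cf_neutrophil_table_batchvisit[["data"]][[1]])
first <- t_cf_neutrophil_table_sva[["data"]][[1]][shared_ids, ]
# Index the second table by the first table's row names so the rows line up.
second <- t_cf_neutrophil_table_batchvisit[["data"]][[1]][rownames(first), ]
cor.test(first[["deseq_logfc"]], second[["deseq_logfc"]])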
## 
##  Pearson's product-moment correlation
## 
## data:  first[["deseq_logfc"]] and second[["deseq_logfc"]]
## t = 209, df = 9099, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9060 0.9131
## sample estimates:
##    cor 
## 0.9096
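
The alignment step matters because two result tables need not hold the same genes in the same order. A toy illustration with invented tables follows; it uses `intersect()` rather than the `%in%` mask above, which is an equivalent way to find the shared row names.

```r
## Two made-up tables with different gene sets and different row orders.
tbl_a <- data.frame(deseq_logfc = c(1.0, -2.0, 0.5, 3.0),
                    row.names = c("g1", "g2", "g3", "g4"))
tbl_b <- data.frame(deseq_logfc = c(2.8, 0.9, -1.7),
                    row.names = c("g4", "g1", "g2"))

## Keep only the shared genes and put both tables in the same order.
shared <- intersect(rownames(tbl_a), rownames(tbl_b))
first <- tbl_a[shared, , drop = FALSE]
second <- tbl_b[shared, , drop = FALSE]
stopifnot(identical(rownames(first), rownames(second)))

## Correlating without this alignment would pair up unrelated genes.
aligned_cor <- cor(first[["deseq_logfc"]], second[["deseq_logfc"]])
aligned_cor
```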

9.5 Eosinophils

This time, with feeling! Repeating the same set of tasks with the eosinophil samples.

t_cf_eosinophil_de_sva <- all_pairwise(t_eosinophils, model_batch = "svaseq",
                                       filter = TRUE,
                                       methods = methods)
## 
##    tumaco_cure tumaco_failure 
##             17              9
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

t_cf_eosinophil_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 tmc_flr___
## basic_vs_deseq      0.8488
## basic_vs_dream      0.8573
## basic_vs_ebseq      0.8636
## basic_vs_edger      0.8546
## basic_vs_limma      0.8756
## basic_vs_noiseq     0.9094
## deseq_vs_dream      0.9218
## deseq_vs_ebseq      0.8058
## deseq_vs_edger      0.9973
## deseq_vs_limma      0.9099
## deseq_vs_noiseq     0.8693
## dream_vs_ebseq      0.7957
## dream_vs_edger      0.9290
## dream_vs_limma      0.9842
## dream_vs_noiseq     0.8409
## ebseq_vs_edger      0.8134
## ebseq_vs_limma      0.8005
## ebseq_vs_noiseq     0.8986
## edger_vs_limma      0.9174
## edger_vs_noiseq     0.8773
## limma_vs_noiseq     0.8128
t_cf_eosinophil_table_sva <- combine_de_tables(
  t_cf_eosinophil_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
  excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_cf_table_sva-v{ver}.xlsx"))
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_eosinophil_table_sva
## A set of combined differential expression results.
##                           table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure         116            75         112
##   edger_sigdown limma_sigup limma_sigdown
## 1            63          57            34
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

t_cf_eosinophil_sig_sva <- extract_significant_genes(
  t_cf_eosinophil_table_sva,
  excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_cf_sig_sva-v{ver}.xlsx"))
t_cf_eosinophil_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##         limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome       57         34      112         63      116         75        7
##         ebseq_down basic_up basic_down
## outcome         33        0          0

dim(t_cf_eosinophil_sig_sva[["deseq"]][["ups"]][[1]])
## [1] 116  84
dim(t_cf_eosinophil_sig_sva[["deseq"]][["downs"]][[1]])
## [1] 75 84
knitr::kable(head(t_cf_eosinophil_sig_sva[["deseq"]][["ups"]][[1]]))
ensembl_gene_id ensembl_transcript_id version transcript_version description gene_biotype cds_length chromosome_name strand start_position end_position hgnc_symbol uniprot_gn_symbol transcript mean_cds_len basic_logfc basic_adjp deseq_logfc deseq_adjp dream_logfc dream_adjp ebseq_logfc ebseq_adjp edger_logfc edger_adjp limma_logfc limma_adjp noiseq_logfc noiseq_adjp basic_num basic_den basic_numvar basic_denvar basic_t basic_p deseq_basemean deseq_lfcse deseq_stat deseq_p deseq_num deseq_den dream_ave dream_t dream_p dream_b ebseq_fc ebseq_c1mean ebseq_c2mean ebseq_mean ebseq_postfc ebseq_ppee ebseq_ppde edger_logcpm edger_lr edger_p limma_ave limma_t limma_p limma_b noiseq_num noiseq_den noiseq_mean noiseq_theta noiseq_prob noiseq_p limma_adjp_ihw limma_p_zstd dream_adjp_ihw dream_p_zstd deseq_adjp_ihw deseq_p_zstd edger_adjp_ihw edger_p_zstd ebseq_adjp_ihw ebseq_p_zstd basic_adjp_ihw basic_p_zstd noiseq_adjp_ihw noiseq_p_zstd lfc_meta lfc_var lfc_varbymed p_meta p_var
ENSG00000198178 ENSG00000198178 ENST00000537530 10 1 C-type lectin domain family 4 member C [Source:HGNC Symbol;Acc:HGNC:13258] protein_coding 267 12 - 7729415 7751605 CLEC4C CLEC4C ENSG00000198178.1 510.5 3.920 0.1398 5.537 2e-03 4.768 0.0117 2.072 0.8248 5.152 0.1934 4.225 0.0191 2.247 1 2.0400 -2.764 8.784 8.946 0.0012 4.804 193.80 1.3030 4.249 0e+00 9.617 4.0806 -1.5610 5.030 0.0000 1.3490 4.205 89.396 375.93 188.58 4.095 0.8248 0.1752 2.2170 5.953 0.0147 -1.4190 4.4420 0.0002 0.1256 366.62 77.257 221.94 1.036 0.9139 0.0861 0.0158 -1.2450 0.0123 -1.3020 0.0017 -1.2450 0.1912 -1.2450 0.1527 0.8604 1 13.140 0.9751 -1.451 4.982 4.880e-03 9.796e-04 4.956e-03 7.107e-05
ENSG00000187569 ENSG00000187569 ENST00000345088 3 3 developmental pluripotency associated 3 [Source:HGNC Symbol;Acc:HGNC:19199] protein_coding 480 12 + 7711433 7717559 DPPA3 DPPA3 ENSG00000187569.3 480 3.662 0.1989 5.448 7e-03 3.969 0.0408 4.587 0.0605 4.721 0.0547 3.504 0.0470 3.622 1 -0.6872 -4.301 7.263 2.839 0.0035 3.614 23.10 1.4300 3.810 1e-04 5.866 0.4183 -3.6360 3.937 0.0006 -1.1340 24.038 2.470 59.59 22.24 21.893 0.0605 0.9395 -0.5913 9.779 0.0018 -3.3720 3.6990 0.0011 -1.6550 56.07 4.555 30.31 2.207 0.9935 0.0065 0.0339 -1.2420 0.0334 -1.3000 0.0049 -1.2420 0.0550 -1.2420 0.7668 5.8330 1 9.882 0.9751 -1.704 4.443 4.098e-01 9.225e-02 9.866e-04 6.647e-07
ENSG00000136235 ENSG00000136235 ENST00000479625 16 1 glycoprotein nmb [Source:HGNC Symbol;Acc:HGNC:4462] protein_coding undefined 7 + 23235967 23275108 GPNMB GPNMB ENSG00000136235.1 1447.5 2.102 0.4987 5.410 4e-04 4.617 0.0900 5.629 0.8546 5.374 0.0001 3.867 0.1754 4.475 1 -1.1190 -3.695 12.101 2.665 0.0621 2.576 53.03 1.1380 4.752 0e+00 6.906 1.4965 -3.2100 3.221 0.0035 -2.1580 49.486 2.881 143.07 51.41 39.921 0.8546 0.1454 0.4580 25.540 0.0000 -3.2370 2.5210 0.0184 -3.2570 125.19 5.631 65.41 2.264 0.9978 0.0022 0.1386 -1.1860 0.0918 -1.2910 0.0002 -1.1860 0.0001 -1.1860 0.1253 0.6664 1 7.044 0.6634 -1.718 4.798 5.245e-02 1.093e-02 6.124e-03 1.125e-04
ENSG00000089012 ENSG00000089012 ENST00000497407 14 2 signal regulatory protein gamma [Source:HGNC Symbol;Acc:HGNC:15757] protein_coding undefined 20 - 1629152 1657779 SIRPG SIRPG ENSG00000089012.2 880.8 1.974 0.5427 4.040 0e+00 1.758 0.6538 5.912 0.7384 4.018 0.0000 1.598 0.6625 5.479 0 0.8317 -1.681 13.876 1.355 0.0805 2.513 272.50 0.7310 5.526 0e+00 7.574 3.5336 -1.1950 1.093 0.2846 -4.9740 60.217 12.007 723.63 258.34 50.266 0.7384 0.2616 2.7060 32.750 0.0000 -1.1480 0.9807 0.3360 -5.0020 771.37 17.288 394.33 3.129 1.0000 0.0000 0.5902 -0.1572 0.5723 -0.3745 0.0000 -0.1572 0.0000 -0.1572 0.2245 1.4220 1 6.872 0.0000 -1.725 3.148 1.579e+00 5.016e-01 1.120e-01 3.763e-02
ENSG00000089127 ENSG00000089127 ENST00000540589 13 2 2’-5’-oligoadenylate synthetase 1 [Source:HGNC Symbol;Acc:HGNC:8086] protein_coding 68 12 + 112906783 112933222 OAS1 OAS1 ENSG00000089127.2 682.8 3.284 0.2669 3.933 0e+00 3.518 0.0562 4.691 0.0845 3.943 0.0000 3.339 0.0596 4.036 1 1.9510 -1.301 8.237 1.116 0.0092 3.252 184.60 0.5478 7.180 0e+00 7.841 3.9081 -0.5641 3.632 0.0012 -0.9652 25.834 18.535 479.09 177.96 23.947 0.0845 0.9155 2.1560 44.580 0.0000 -0.4596 3.4950 0.0018 -1.3000 410.17 25.003 217.58 2.621 0.9894 0.0106 0.0493 -1.2400 0.0529 -1.2980 0.0000 -1.2400 0.0000 -1.2400 0.7325 5.6770 1 8.893 0.9751 -1.691 3.722 1.794e-02 4.820e-03 5.900e-04 1.044e-06
ENSG00000137959 ENSG00000137959 ENST00000450498 16 1 interferon induced protein 44 like [Source:HGNC Symbol;Acc:HGNC:17817] protein_coding 699 1 + 78619922 78646145 IFI44L IFI44L ENSG00000137959.1 783.333333333333 3.909 0.1645 3.828 0e+00 3.369 0.0199 4.022 0.7568 3.831 0.0000 3.443 0.0123 4.334 0 5.5560 1.793 6.584 3.318 0.0020 3.763 1932.00 0.5401 7.087 0e+00 11.304 7.4755 2.8960 4.483 0.0001 1.0980 16.246 295.965 4808.31 1857.93 14.896 0.7568 0.2432 5.4900 57.400 0.0000 3.0090 4.7380 0.0001 1.6850 5616.14 278.525 2947.33 3.056 1.0000 0.0000 0.0135 -1.2450 0.0216 -1.3020 0.0000 -1.2450 0.0000 -1.2450 0.2506 1.3030 1 10.290 0.0000 -1.725 3.691 2.660e-03 7.207e-04 2.392e-05 1.716e-09
knitr::kable(head(t_cf_eosinophil_sig_sva[["deseq"]][["downs"]][[1]]))
ensembl_gene_id ensembl_transcript_id version transcript_version description gene_biotype cds_length chromosome_name strand start_position end_position hgnc_symbol uniprot_gn_symbol transcript mean_cds_len basic_logfc basic_adjp deseq_logfc deseq_adjp dream_logfc dream_adjp ebseq_logfc ebseq_adjp edger_logfc edger_adjp limma_logfc limma_adjp noiseq_logfc noiseq_adjp basic_num basic_den basic_numvar basic_denvar basic_t basic_p deseq_basemean deseq_lfcse deseq_stat deseq_p deseq_num deseq_den dream_ave dream_t dream_p dream_b ebseq_fc ebseq_c1mean ebseq_c2mean ebseq_mean ebseq_postfc ebseq_ppee ebseq_ppde edger_logcpm edger_lr edger_p limma_ave limma_t limma_p limma_b noiseq_num noiseq_den noiseq_mean noiseq_theta noiseq_prob noiseq_p limma_adjp_ihw limma_p_zstd dream_adjp_ihw dream_p_zstd deseq_adjp_ihw deseq_p_zstd edger_adjp_ihw edger_p_zstd ebseq_adjp_ihw ebseq_p_zstd basic_adjp_ihw basic_p_zstd noiseq_adjp_ihw noiseq_p_zstd lfc_meta lfc_var lfc_varbymed p_meta p_var
ENSG00000179344 ENSG00000179344 ENST00000399084 16 5 major histocompatibility complex, class II, DQ beta 1 [Source:HGNC Symbol;Acc:HGNC:4944] protein_coding 786 6 - 32659467 32668383 HLA-DQB1 HLA-DQB1 ENSG00000179344.5 645.5 -5.227 0.0515 -5.676 0.0000 -7.319 0.0138 -4.507 0.0008 -5.668 0.0000 -7.612 0.0155 -5.899 0 0.0597 5.6120 5.9251 7.993 0.0001 -5.552 4151.00 0.8485 -6.689 0.0000 6.782 12.458 3.7070 -4.809 0.0001 1.843 0.0440 6266.46 275.6710 4192.72 0.0418 0.0008 0.9992 6.575 36.280 0.0000 3.5810 -4.604 0.0001 1.291 138.566 8270.46 4204.51 -3.688 1.0000 0.0000 0.0228 -1.245 0.0137 -1.3020 0.0000 -1.245 0.0000 -1.245 0.9901 6.2210 -20730 -15.180 0.0000 -1.725 -5.952 1.301e+00 -2.186e-01 3.393e-05 3.454e-09
ENSG00000112139 ENSG00000112139 ENST00000515437 16 5 MAM domain containing glycosylphosphatidylinositol anchor 1 [Source:HGNC Symbol;Acc:HGNC:19267] protein_coding 388 6 - 37630679 37699306 MDGA1 MDGA1 ENSG00000112139.5 1438.71428571429 -2.249 0.4206 -5.037 0.0015 -2.682 0.3355 -2.084 0.0001 -4.942 0.0398 -2.844 0.2584 -2.532 1 -3.3850 -0.1629 10.6461 14.783 0.0366 -3.222 149.80 1.1640 -4.327 0.0000 2.708 7.744 -1.4100 -1.996 0.0566 -3.877 0.2358 205.20 48.3754 150.91 0.2301 0.0001 0.9999 1.813 10.640 0.0011 -1.5710 -2.141 0.0421 -3.687 24.552 142.01 83.28 -1.170 0.9195 0.0805 0.1984 -1.109 0.3083 -1.1180 0.0011 -1.109 0.0277 -1.109 1.0000 6.2260 -18990 -8.808 0.9751 -1.468 -3.895 9.304e-01 -2.389e-01 1.440e-02 5.749e-04
ENSG00000203972 ENSG00000203972 ENST00000545705 10 1 glycine-N-acyltransferase like 3 [Source:HGNC Symbol;Acc:HGNC:21349] protein_coding 468 6 + 49499923 49528078 GLYATL3 GLYATL3 ENSG00000203972.1 667.5 -3.493 0.1675 -4.718 0.0498 -2.629 0.3287 -6.257 0.7136 -4.599 0.0363 -2.962 0.1916 -3.352 1 -5.7280 -3.3770 0.5218 6.712 0.0023 -2.351 27.99 1.5560 -3.032 0.0024 -1.881 2.837 -4.7950 -2.016 0.0544 -3.941 0.0131 42.18 0.5417 27.77 0.0138 0.7136 0.2864 -0.443 10.870 0.0010 -4.5450 -2.444 0.0218 -3.577 2.858 29.19 16.02 -1.461 0.9575 0.0425 0.1465 -1.175 0.3346 -1.1250 0.0361 -1.175 0.0247 -1.175 0.2995 1.5840 -9924 -6.426 0.9751 -1.589 -3.901 8.898e-02 -2.281e-02 8.413e-03 1.355e-04
ENSG00000196526 ENSG00000196526 ENST00000358461 10 6 actin filament associated protein 1 [Source:HGNC Symbol;Acc:HGNC:24017] protein_coding 2193 4 - 7758714 7939926 AFAP1 AFAP1 ENSG00000196526.6 1911 -2.168 0.4335 -3.294 0.0252 -2.375 0.3793 -3.967 0.0000 -3.293 0.0574 -2.538 0.2974 -3.791 1 0.5888 2.6700 2.1638 11.578 0.0405 -2.081 982.20 0.9856 -3.342 0.0008 6.891 10.185 1.7240 -1.864 0.0739 -4.410 0.0640 1485.49 95.0002 1004.17 0.0623 0.0000 1.0000 4.492 9.604 0.0019 1.8040 -1.999 0.0565 -4.138 128.406 1777.17 952.79 -2.550 0.9896 0.0104 0.3036 -1.062 0.3570 -1.0610 0.0280 -1.062 0.0576 -1.062 1.0000 6.2270 -3031 -5.688 0.9751 -1.692 -2.956 1.507e-01 -5.097e-02 1.976e-02 1.013e-03
ENSG00000175592 ENSG00000175592 ENST00000312562 9 7 FOS like 1, AP-1 transcription factor subunit [Source:HGNC Symbol;Acc:HGNC:13718] protein_coding 816 11 - 65892049 65900573 FOSL1 FOSL1 ENSG00000175592.7 496 -2.045 0.4818 -3.097 0.0000 -2.039 0.1876 -2.017 0.9748 -3.081 0.0000 -2.221 0.1738 -1.557 1 0.1522 1.8020 3.6253 4.212 0.0561 -1.650 267.80 0.5692 -5.441 0.0000 5.087 8.184 1.1760 -2.584 0.0158 -3.011 0.2471 363.07 89.7162 268.44 0.2384 0.9748 0.0252 2.640 30.640 0.0000 1.0720 -2.528 0.0181 -3.082 93.436 274.89 184.16 -1.214 0.9249 0.0751 0.1335 -1.187 0.1854 -1.2500 0.0000 -1.187 0.0000 -1.187 0.0310 -0.1156 -2206 -4.510 0.9751 -1.485 -2.752 2.007e-01 -7.295e-02 6.027e-03 1.090e-04
ENSG00000122877 ENSG00000122877 ENST00000637191 16 1 early growth response 2 [Source:HGNC Symbol;Acc:HGNC:3239] protein_coding 418 10 - 62811996 62919900 EGR2 EGR2 ENSG00000122877.1 1140.25 -1.878 0.5187 -2.789 0.0117 -1.437 0.4815 -2.589 0.8231 -2.779 0.0110 -2.011 0.2749 -1.495 1 -1.2330 -0.0550 0.8328 5.118 0.0731 -1.178 96.53 0.7679 -3.632 0.0003 3.932 6.721 -0.6343 -1.561 0.1308 -4.503 0.1663 136.07 22.6139 96.80 0.1592 0.8231 0.1769 1.188 14.120 0.0002 -0.7511 -2.078 0.0480 -3.767 29.831 84.08 56.96 -1.112 0.9087 0.0913 0.2254 -1.090 0.4497 -0.8758 0.0088 -1.090 0.0073 -1.090 0.1454 0.8716 -1025 -3.219 0.9751 -1.434 -2.514 2.515e-01 -1.000e-01 1.616e-02 7.617e-04

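Repeat with batch in the model. Judging by the sample counts printed below (9, 9, and 8 samples for visits 3, 2, and 1), the batch factor here is the visit number.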

t_cf_eosinophil_de_batchvisit <- all_pairwise(t_eosinophils, model_batch = TRUE,
                                              filter = TRUE,
                                              methods = methods)
## 
##    tumaco_cure tumaco_failure 
##             17              9 
## 
## 3 2 1 
## 9 9 8
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

t_cf_eosinophil_de_batchvisit
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: batch in model/limma.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 tmc_flr___
## basic_vs_deseq      0.8961
## basic_vs_dream      0.9176
## basic_vs_ebseq      0.8636
## basic_vs_edger      0.8977
## basic_vs_limma      0.9676
## basic_vs_noiseq     0.9094
## deseq_vs_dream      0.8493
## deseq_vs_ebseq      0.9519
## deseq_vs_edger      0.9998
## deseq_vs_limma      0.8678
## deseq_vs_noiseq     0.9024
## dream_vs_ebseq      0.8133
## dream_vs_edger      0.8517
## dream_vs_limma      0.9469
## dream_vs_noiseq     0.9304
## ebseq_vs_edger      0.9559
## ebseq_vs_limma      0.8328
## ebseq_vs_noiseq     0.8986
## edger_vs_limma      0.8696
## edger_vs_noiseq     0.9056
## limma_vs_noiseq     0.8816
t_cf_eosinophil_table_batchvisit <- combine_de_tables(
  t_cf_eosinophil_de_batchvisit, keepers = t_cf_contrast, scale_p = TRUE,
  excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_cf_table_batchvisit-v{ver}.xlsx"))
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_eosinophil_table_batchvisit
## A set of combined differential expression results.
##                           table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure          99            35         103
##   edger_sigdown limma_sigup limma_sigdown
## 1            24          35            15
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

t_cf_eosinophil_sig_batchvisit <- extract_significant_genes(
  t_cf_eosinophil_table_batchvisit,
  excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_cf_sig_batchvisit-v{ver}.xlsx"))
t_cf_eosinophil_sig_batchvisit
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##         limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome       35         15      103         24       99         35        7
##         ebseq_down basic_up basic_down
## outcome         33        0          0

dim(t_cf_eosinophil_sig_batchvisit[["deseq"]][["ups"]][[1]])
## [1] 99 84
dim(t_cf_eosinophil_sig_batchvisit[["deseq"]][["downs"]][[1]])
## [1] 35 84
knitr::kable(head(t_cf_eosinophil_sig_batchvisit[["deseq"]][["ups"]][[1]]))
ensembl_gene_id ensembl_transcript_id version transcript_version description gene_biotype cds_length chromosome_name strand start_position end_position hgnc_symbol uniprot_gn_symbol transcript mean_cds_len basic_logfc basic_adjp deseq_logfc deseq_adjp dream_logfc dream_adjp ebseq_logfc ebseq_adjp edger_logfc edger_adjp limma_logfc limma_adjp noiseq_logfc noiseq_adjp basic_num basic_den basic_numvar basic_denvar basic_t basic_p deseq_basemean deseq_lfcse deseq_stat deseq_p deseq_num deseq_den dream_ave dream_t dream_p dream_b ebseq_fc ebseq_c1mean ebseq_c2mean ebseq_mean ebseq_postfc ebseq_ppee ebseq_ppde edger_logcpm edger_lr edger_p limma_ave limma_t limma_p limma_b noiseq_num noiseq_den noiseq_mean noiseq_theta noiseq_prob noiseq_p limma_adjp_ihw limma_p_zstd dream_adjp_ihw dream_p_zstd deseq_adjp_ihw deseq_p_zstd edger_adjp_ihw edger_p_zstd ebseq_adjp_ihw ebseq_p_zstd basic_adjp_ihw basic_p_zstd noiseq_adjp_ihw noiseq_p_zstd lfc_meta lfc_var lfc_varbymed p_meta p_var
ENSG00000165949 ENSG00000165949 ENST00000611954 12 4 interferon alpha inducible protein 27 [Source:HGNC Symbol;Acc:HGNC:5397] protein_coding 180 14 + 94104836 94116698 IFI27 IFI27 ENSG00000165949.4 287.454545454545 3.323 0.2504 5.668 0e+00 4.793 0.0485 5.470 0.8279 5.624 0e+00 4.397 0.0591 3.963 1.000 -0.1291 -3.3160 7.379 1.696 0.0077 3.186 54.63 0.8188 6.923 0 7.505 1.8370 -2.7330 4.106 0.0003 -0.8668 44.34 3.243 144.20 52.03 35.61 0.8279 0.1721 0.4882 47.95 0 -2.6550 3.858 0.0007 -1.1070 118.68 7.612 63.15 1.930 0.9823 0.0177 0.0453 -1.3740 0.0350 -1.422 0e+00 -1.3740 0e+00 -1.3740 0.1420 0.8403 1 8.712 0.9751 -1.668 5.230 0.000e+00 0.000e+00 2.264e-04 1.537e-07
ENSG00000187569 ENSG00000187569 ENST00000345088 3 3 developmental pluripotency associated 3 [Source:HGNC Symbol;Acc:HGNC:19199] protein_coding 480 12 + 7711433 7717559 DPPA3 DPPA3 ENSG00000187569.3 480 3.662 0.1989 5.537 4e-04 4.906 0.0485 4.587 0.0605 5.386 1e-04 4.404 0.0351 3.622 1.000 -0.6872 -4.3010 7.263 2.839 0.0035 3.614 23.10 1.1660 4.747 0 5.915 0.3782 -3.6360 4.124 0.0003 -1.1500 24.04 2.470 59.59 22.24 21.89 0.0605 0.9395 -0.6406 25.31 0 -3.3720 4.263 0.0002 -0.7244 56.07 4.555 30.31 2.207 0.9935 0.0065 0.0300 -1.3750 0.0334 -1.422 3e-04 -1.3750 1e-04 -1.3750 0.7668 5.8330 1 9.882 0.9751 -1.704 5.088 3.906e-02 7.677e-03 7.965e-05 1.843e-08
ENSG00000136235 ENSG00000136235 ENST00000479625 16 1 glycoprotein nmb [Source:HGNC Symbol;Acc:HGNC:4462] protein_coding undefined 7 + 23235967 23275108 GPNMB GPNMB ENSG00000136235.1 1447.5 2.102 0.4987 5.426 2e-04 3.031 0.3933 5.629 0.8546 5.360 0e+00 2.104 0.5947 4.475 1.000 -1.1190 -3.6950 12.101 2.665 0.0621 2.576 53.03 1.0740 5.053 0 7.515 2.0897 -3.2100 2.161 0.0400 -3.6970 49.49 2.881 143.07 51.41 39.92 0.8546 0.1454 0.4259 29.50 0 -3.2370 1.413 0.1695 -4.4990 125.19 5.631 65.41 2.264 0.9978 0.0022 0.5334 -0.8155 0.2802 -1.291 1e-04 -0.8155 0e+00 -0.8155 0.1253 0.6664 1 7.044 0.6634 -1.718 4.060 2.227e+00 5.485e-01 5.650e-02 9.577e-03
ENSG00000089127 ENSG00000089127 ENST00000540589 13 2 2’-5’-oligoadenylate synthetase 1 [Source:HGNC Symbol;Acc:HGNC:8086] protein_coding 68 12 + 112906783 112933222 OAS1 OAS1 ENSG00000089127.2 682.8 3.284 0.2669 4.820 0e+00 4.229 0.1407 4.691 0.0845 4.830 0e+00 3.978 0.1341 4.036 1.000 1.9510 -1.3010 8.237 1.116 0.0092 3.252 184.60 0.7144 6.746 0 8.840 4.0197 -0.5641 3.205 0.0035 -1.8560 25.83 18.535 479.09 177.96 23.95 0.0845 0.9155 2.1430 56.76 0 -0.4596 3.160 0.0040 -1.9060 410.17 25.003 217.58 2.621 0.9894 0.0106 0.1062 -1.3630 0.1038 -1.411 0e+00 -1.3630 0e+00 -1.3630 0.7325 5.6770 1 8.893 0.9751 -1.691 4.652 5.441e-02 1.170e-02 1.328e-03 5.293e-06
ENSG00000111335 ENSG00000111335 ENST00000551603 12 1 2’-5’-oligoadenylate synthetase 2 [Source:HGNC Symbol;Acc:HGNC:8087] protein_coding 183 12 + 112978395 113011723 OAS2 OAS2 ENSG00000111335.1 1319.5 2.537 0.3953 4.447 0e+00 3.840 0.1974 4.238 0.4415 4.461 0e+00 3.601 0.2035 3.944 0.943 4.0180 0.8398 12.517 3.025 0.0293 3.178 1028.00 0.8157 5.451 0 11.444 6.9971 1.6630 2.826 0.0089 -2.5810 18.87 137.565 2596.53 988.75 17.44 0.4415 0.5585 4.5830 38.11 0 1.7970 2.779 0.0100 -2.7210 2117.85 137.605 1127.73 2.734 0.9999 0.0001 0.2023 -1.3430 0.1659 -1.393 0e+00 -1.3430 0e+00 -1.3430 0.5743 3.3540 1 8.690 0.0315 -1.725 4.110 6.269e-02 1.525e-02 3.340e-03 3.347e-05
ENSG00000137959 ENSG00000137959 ENST00000450498 16 1 interferon induced protein 44 like [Source:HGNC Symbol;Acc:HGNC:17817] protein_coding 699 1 + 78619922 78646145 IFI44L IFI44L ENSG00000137959.1 783.333333333333 3.909 0.1645 4.200 0e+00 4.025 0.0544 4.022 0.7568 4.213 0e+00 3.902 0.0422 4.334 0.000 5.5560 1.7930 6.584 3.318 0.0020 3.763 1932.00 0.7590 5.534 0 12.253 8.0528 2.8960 3.985 0.0005 -0.0497 16.25 295.965 4808.31 1857.93 14.90 0.7568 0.2432 5.4890 40.52 0 3.0090 4.118 0.0003 0.2407 5616.14 278.525 2947.33 3.056 1.0000 0.0000 0.0448 -1.3750 0.0666 -1.421 0e+00 -1.3750 0e+00 -1.3750 0.2506 1.3030 1 10.290 0.0000 -1.725 4.204 7.374e-02 1.754e-02 1.151e-04 3.976e-08
knitr::kable(head(t_cf_eosinophil_sig_batchvisit[["deseq"]][["downs"]][[1]]))
ensembl_gene_id ensembl_transcript_id version transcript_version description gene_biotype cds_length chromosome_name strand start_position end_position hgnc_symbol uniprot_gn_symbol transcript mean_cds_len basic_logfc basic_adjp deseq_logfc deseq_adjp dream_logfc dream_adjp ebseq_logfc ebseq_adjp edger_logfc edger_adjp limma_logfc limma_adjp noiseq_logfc noiseq_adjp basic_num basic_den basic_numvar basic_denvar basic_t basic_p deseq_basemean deseq_lfcse deseq_stat deseq_p deseq_num deseq_den dream_ave dream_t dream_p dream_b ebseq_fc ebseq_c1mean ebseq_c2mean ebseq_mean ebseq_postfc ebseq_ppee ebseq_ppde edger_logcpm edger_lr edger_p limma_ave limma_t limma_p limma_b noiseq_num noiseq_den noiseq_mean noiseq_theta noiseq_prob noiseq_p limma_adjp_ihw limma_p_zstd dream_adjp_ihw dream_p_zstd deseq_adjp_ihw deseq_p_zstd edger_adjp_ihw edger_p_zstd ebseq_adjp_ihw ebseq_p_zstd basic_adjp_ihw basic_p_zstd noiseq_adjp_ihw noiseq_p_zstd lfc_meta lfc_var lfc_varbymed p_meta p_var
ENSG00000189430 ENSG00000189430 ENST00000338835 13 9 natural cytotoxicity triggering receptor 1 [Source:HGNC Symbol;Acc:HGNC:6731] protein_coding 864 19 + 54906148 54916140 NCR1 NCR1 ENSG00000189430.9 798.5 -3.624 0.1472 -5.820 0.0002 -2.5340 0.4295 -5.817 0.0000 -5.752 0.0019 -3.214 0.2894 -3.025 1 -4.4710 -1.0630 1.835 11.567 0.0014 -3.408 93.09 1.1550 -5.038 0e+00 1.4989 7.319 -2.587 -2.0530 0.0502 -3.8380 0.0177 144.85 2.560 95.59 0.0182 0.0000 1.0000 1.0570 19.19 0e+00 -2.621 -2.436 0.0220 -3.3850 5.676 46.20 25.94 -2.1077 0.9822 0.0178 0.1690 -1.303 0.2756 -1.2570 0.0001 -1.303 0.0012 -1.303 0.7824 6.227 -8796 -9.316 0.9751 -1.668 -5.110 1.583e+00 -3.098e-01 7.344e-03 1.615e-04
ENSG00000179344 ENSG00000179344 ENST00000399084 16 5 major histocompatibility complex, class II, DQ beta 1 [Source:HGNC Symbol;Acc:HGNC:4944] protein_coding 786 6 - 32659467 32668383 HLA-DQB1 HLA-DQB1 ENSG00000179344.5 645.5 -5.227 0.0515 -5.667 0.0000 -5.5390 0.0390 -4.507 0.0008 -5.653 0.0006 -5.936 0.0332 -5.899 0 0.0597 5.6120 5.925 7.993 0.0001 -5.552 4151.00 0.8879 -6.382 0e+00 8.4098 14.076 3.707 -4.3850 0.0002 0.9012 0.0440 6266.46 275.671 4192.72 0.0418 0.0008 0.9992 6.5750 21.99 0e+00 3.581 -4.357 0.0002 0.8177 138.566 8270.46 4204.51 -3.6877 1.0000 0.0000 0.0336 -1.376 0.0259 -1.4220 0.0000 -1.376 0.0007 -1.376 0.9901 6.221 -20730 -15.180 0.0000 -1.725 -5.464 1.039e-01 -1.902e-02 6.255e-05 1.123e-08
ENSG00000162669 ENSG00000162669 ENST00000427444 16 1 helicase for meiosis 1 [Source:HGNC Symbol;Acc:HGNC:20193] protein_coding 589 1 - 91260766 91404856 HFM1 HFM1 ENSG00000162669.1 1528.4 -3.444 0.1672 -4.617 0.0012 -1.8420 0.3845 -5.349 0.0008 -4.588 0.0054 -2.665 0.1784 -3.246 1 -3.0190 -0.0765 2.110 8.425 0.0021 -2.942 207.70 1.0200 -4.526 0e+00 4.2354 8.853 -1.469 -2.1860 0.0379 -3.6080 0.0245 332.37 8.142 220.14 0.0241 0.0008 0.9992 2.1480 16.51 0e+00 -1.450 -2.912 0.0073 -2.4740 10.706 101.59 56.15 -1.8950 0.9634 0.0366 0.1410 -1.352 0.2994 -1.2980 0.0012 -1.352 0.0031 -1.352 0.8276 6.222 -5395 -8.042 0.9751 -1.608 -3.922 2.638e-01 -6.727e-02 2.452e-03 1.764e-05
ENSG00000167634 ENSG00000167634 ENST00000328092 12 9 NLR family pyrin domain containing 7 [Source:HGNC Symbol;Acc:HGNC:22947] protein_coding 3030 19 - 54923509 54966312 NLRP7 NLRP7 ENSG00000167634.9 2077.44444444444 -2.682 0.3229 -4.061 0.0071 -0.9355 0.7928 -4.236 0.0168 -4.004 0.0317 -1.875 0.4968 -1.571 1 -4.1970 -2.2500 0.258 8.472 0.0153 -1.947 27.08 1.0170 -3.995 1e-04 0.8671 4.928 -3.312 -0.9051 0.3737 -4.7730 0.0531 41.32 2.183 27.77 0.0550 0.0168 0.9832 -0.7663 12.24 5e-04 -3.363 -1.721 0.0971 -4.2100 5.676 16.86 11.27 -1.1972 0.9233 0.0767 0.3157 -1.055 0.7221 -0.1889 0.0044 -1.055 0.0216 -1.055 0.8001 6.118 -940200 -5.322 0.9751 -1.481 -3.179 1.172e+00 -3.687e-01 3.255e-02 3.126e-03
ENSG00000196526 ENSG00000196526 ENST00000358461 10 6 actin filament associated protein 1 [Source:HGNC Symbol;Acc:HGNC:24017] protein_coding 2193 4 - 7758714 7939926 AFAP1 AFAP1 ENSG00000196526.6 1911 -2.168 0.4335 -3.879 0.0038 -2.0550 0.5347 -3.967 0.0000 -3.877 0.0088 -2.357 0.3886 -3.791 1 0.5888 2.6700 2.164 11.578 0.0405 -2.081 982.20 0.9252 -4.192 0e+00 6.7815 10.660 1.724 -1.7170 0.0978 -4.5770 0.0640 1485.49 95.000 1004.17 0.0623 0.0000 1.0000 4.4950 15.42 1e-04 1.804 -2.067 0.0489 -4.0740 128.406 1777.17 952.79 -2.5503 0.9896 0.0104 0.3539 -1.215 0.4110 -1.1000 0.0041 -1.215 0.0090 -1.215 1.0000 6.227 -3031 -5.688 0.9751 -1.692 -3.320 3.403e-01 -1.025e-01 1.633e-02 7.942e-04
ENSG00000277150 ENSG00000277150 ENST00000622749 1 1 coagulation factor VIII associated 3 [Source:HGNC Symbol;Acc:HGNC:31850] protein_coding 1116 X - 155456914 155458672 F8A3 F8A1 ENSG00000277150.1 1116 -3.020 0.2249 -3.788 0.0153 -0.8834 0.7374 -4.133 0.1777 -3.704 0.0589 -1.712 0.3848 -1.821 1 -4.5610 -2.3170 1.574 6.415 0.0059 -2.244 21.72 1.0170 -3.724 2e-04 1.9848 5.772 -3.641 -1.0960 0.2830 -4.6810 0.0570 34.05 1.931 22.93 0.0606 0.1777 0.8223 -1.5770 10.78 1e-03 -3.506 -2.080 0.0476 -3.8470 5.524 19.52 12.52 -0.9781 0.8722 0.1278 0.2415 -1.219 0.6628 -0.4884 0.0102 -1.219 0.0375 -1.219 0.8350 5.070 -2709000 -6.134 0.9751 -1.317 -2.842 9.067e-01 -3.191e-01 1.628e-02 7.364e-04

Repeat with visit folded into the condition, so that each contrast compares failure vs. cure within a single visit.

visitcf_factor <- paste0("v", pData(t_eosinophils)[["visitnumber"]], "_",
                         pData(t_eosinophils)[["finaloutcome"]])
t_eosinophil_visitcf <- set_expt_conditions(t_eosinophils, fact = visitcf_factor)
## The numbers of samples by condition are:
## 
##    v1_cure v1_failure    v2_cure v2_failure    v3_cure v3_failure 
##          5          3          6          3          6          3
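The composite visit/outcome factor can be sketched on toy data. This is only an illustration in base R: `toy_meta` and its values are invented stand-ins for `pData(t_eosinophils)`, and the check at the end is my own suggestion rather than something `set_expt_conditions()` is known to perform.

```r
## Hypothetical stand-in for pData(t_eosinophils); the values are invented.
toy_meta <- data.frame(
  visitnumber = c(1, 1, 2, 2, 3, 3),
  finaloutcome = c("cure", "failure", "cure", "failure", "cure", "cure"),
  stringsAsFactors = FALSE)

## Same construction as above: one level per visit/outcome combination.
toy_factor <- paste0("v", toy_meta[["visitnumber"]], "_", toy_meta[["finaloutcome"]])
toy_counts <- table(toy_factor)

## A within-visit failure vs. cure contrast needs both outcomes at that visit.
visits <- sort(unique(toy_meta[["visitnumber"]]))
testable <- vapply(visits, function(v) {
  all(paste0("v", v, "_", c("cure", "failure")) %in% names(toy_counts))
}, logical(1))
names(testable) <- paste0("v", visits)
testable
```

In the toy metadata, visit 3 has no failures and so could not support a v3 contrast; the real data above has at least three samples in every cell.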
t_cf_eosinophil_visits_de_sva <- all_pairwise(t_eosinophil_visitcf, model_batch = "svaseq",
                                              filter = TRUE,
                                              methods = methods)
## 
##    v1_cure v1_failure    v2_cure v2_failure    v3_cure v3_failure 
##          5          3          6          3          6          3
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

t_cf_eosinophil_visits_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_cf_eosinophil_visits_table_sva <- combine_de_tables(
  t_cf_eosinophil_visits_de_sva, keepers = visitcf_contrasts, scale_p = TRUE,
  excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_visitcf_table_sva-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_eosinophil_visits_table_sva
## A set of combined differential expression results.
##                   table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 v1_failure_vs_v1_cure           9            11           2             3
## 2 v2_failure_vs_v2_cure           4             3           5             2
## 3 v3_failure_vs_v3_cure          14             7          17             2
##   limma_sigup limma_sigdown
## 1           0             1
## 2           0             0
## 3           0             0
## Plot describing unique/shared genes in a differential expression table.

t_cf_eosinophil_visits_sig_sva <- extract_significant_genes(
  t_cf_eosinophil_visits_table_sva,
  excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_visitcf_sig_sva-v{ver}.xlsx"))
t_cf_eosinophil_visits_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##      limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## v1cf        0          1        2          3        9         11        4
## v2cf        0          0        5          2        4          3       11
## v3cf        0          0       17          2       14          7        3
##      ebseq_down basic_up basic_down
## v1cf         86        0          0
## v2cf         18        0          0
## v3cf         10        0          0

dim(t_cf_eosinophil_visits_sig_sva[["deseq"]][["ups"]][[1]])
## [1]  9 84
dim(t_cf_eosinophil_visits_sig_sva[["deseq"]][["downs"]][[1]])
## [1] 11 84
knitr::kable(head(t_cf_eosinophil_visits_sig_sva[["deseq"]][["ups"]][[1]]))
ensembl_gene_id ensembl_transcript_id version transcript_version description gene_biotype cds_length chromosome_name strand start_position end_position hgnc_symbol uniprot_gn_symbol transcript mean_cds_len basic_logfc basic_adjp deseq_logfc deseq_adjp dream_logfc dream_adjp ebseq_logfc ebseq_adjp edger_logfc edger_adjp limma_logfc limma_adjp noiseq_logfc noiseq_adjp basic_num basic_den basic_numvar basic_denvar basic_t basic_p deseq_basemean deseq_lfcse deseq_stat deseq_p deseq_num deseq_den dream_ave dream_t dream_p dream_b ebseq_fc ebseq_c1mean ebseq_c2mean ebseq_mean ebseq_postfc ebseq_ppee ebseq_ppde edger_logcpm edger_lr edger_p limma_ave limma_t limma_p limma_b noiseq_num noiseq_den noiseq_mean noiseq_theta noiseq_prob noiseq_p limma_adjp_ihw limma_p_zstd dream_adjp_ihw dream_p_zstd deseq_adjp_ihw deseq_p_zstd edger_adjp_ihw edger_p_zstd ebseq_adjp_ihw ebseq_p_zstd basic_adjp_ihw basic_p_zstd noiseq_adjp_ihw noiseq_p_zstd lfc_meta lfc_var lfc_varbymed p_meta p_var
ENSG00000143416 ENSG00000143416 ENST00000443708 21 5 selenium binding protein 1 [Source:HGNC Symbol;Acc:HGNC:10719] protein_coding 372 1 - 151364304 151372707 SELENBP1 SELENBP1 ENSG00000143416.5 727.272727272727 0.6636 0.9946 21.040 0.0292 6.664 0.3331 13.6378 0.7073 6.667 0.8813 6.969 0.2738 4.3217 1 -2.9540 -5.0410 29.4269 0.4130 0.5742 2.0870 20.49 5.0190 4.192 0e+00 -0.4417 -21.477 -4.9810 2.606 0.0160 -3.5460 12746.087 0.00 127.45 47.79 249.156 0.7073 0.2927 -0.7976 2.956 0.0856 -4.6440 3.162 0.0045 -2.8790 53.22 2.661 27.94 1.1523 0.9511 0.0489 0.2738 -1.433 0.3524 -1.373 0.0226 -1.433 0.7929 -1.433 0.2363 0.5755 1 4.3750 0.4164 -1.1210 9.641 2.099e+01 2.178e+00 3.004e-02 2.319e-03
ENSG00000136732 ENSG00000136732 ENST00000459787 16 1 glycophorin C (Gerbich blood group) [Source:HGNC Symbol;Acc:HGNC:4704] protein_coding undefined 2 + 126656133 126696667 GYPC GYPC ENSG00000136732.1 347 1.2540 0.9946 5.298 0.0465 4.031 0.3123 4.0095 0.9653 5.290 0.0505 4.195 0.3253 4.0248 1 4.1470 1.8540 9.5905 0.7295 0.3274 2.2930 601.60 1.3500 3.925 1e-04 12.1164 6.818 2.1910 2.834 0.0095 -2.6190 16.105 198.13 3191.08 1320.49 12.632 0.9653 0.0347 3.8060 17.080 0.0000 2.2010 2.907 0.0082 -2.5120 3558.42 218.611 1888.51 1.4435 0.9490 0.0510 0.3253 -1.421 0.3208 -1.394 0.0418 -1.421 0.0558 -1.421 0.0524 -0.3585 1 4.8060 0.4164 -1.1150 5.055 4.846e-01 9.587e-02 2.758e-03 2.181e-05
ENSG00000136689 ENSG00000136689 ENST00000472292 18 1 interleukin 1 receptor antagonist [Source:HGNC Symbol;Acc:HGNC:6000] protein_coding undefined 2 + 113107214 113134016 IL1RN IL1RN ENSG00000136689.1 484.2 2.1950 0.9946 4.047 0.0292 3.608 0.2141 1.1638 0.6966 4.009 0.0602 3.902 0.2145 2.0500 1 -0.1165 -1.7740 0.6101 1.8351 0.0707 1.6570 43.99 0.9437 4.289 0e+00 6.7104 2.663 -0.8124 3.775 0.0010 -1.3260 2.240 19.86 44.50 29.10 2.161 0.6966 0.3034 0.0573 16.080 0.0001 -0.7526 4.085 0.0005 -0.7619 57.68 13.928 35.80 1.1572 0.9485 0.0515 0.2144 -1.446 0.1971 -1.422 0.0203 -1.446 0.0632 -1.446 0.2493 0.6145 1 3.4740 0.4164 -1.1130 4.002 1.351e-03 3.376e-04 1.883e-04 6.705e-08
ENSG00000169429 ENSG00000169429 ENST00000401931 11 1 C-X-C motif chemokine ligand 8 [Source:HGNC Symbol;Acc:HGNC:6025] protein_coding 288 4 + 73740541 73743716 CXCL8 CXCL8 ENSG00000169429.1 294 0.3227 0.9946 3.654 0.0465 4.434 0.2287 0.6389 0.9741 3.645 0.0448 4.721 0.2145 0.9802 1 0.8932 0.5284 3.2382 0.9915 0.7698 0.3648 281.80 0.9314 3.923 1e-04 8.5916 4.937 0.7983 3.595 0.0016 -1.1320 1.557 80.86 125.92 97.76 1.455 0.9741 0.0259 2.7020 18.170 0.0000 0.8090 3.852 0.0009 -0.6052 147.68 74.856 111.27 0.4793 0.9080 0.0920 0.2144 -1.445 0.2226 -1.420 0.0356 -1.445 0.0480 -1.445 0.0307 -0.3901 1 0.7655 0.4164 -0.9926 3.957 5.385e-01 1.361e-01 3.224e-04 2.176e-07
ENSG00000135862 ENSG00000135862 ENST00000258341 6 5 laminin subunit gamma 1 [Source:HGNC Symbol;Acc:HGNC:6492] protein_coding 4830 1 + 183023420 183145592 LAMC1 LAMC1 ENSG00000135862.5 2471.5 1.4800 0.9946 3.168 0.0488 2.838 0.2799 0.8170 0.8664 3.157 0.1489 2.875 0.2422 1.3755 1 0.7111 -0.8383 1.0858 3.6722 0.1895 1.5490 74.31 0.8152 3.886 1e-04 7.2209 4.053 0.0654 3.164 0.0044 -2.0790 1.762 47.92 84.43 61.61 1.690 0.8664 0.1336 0.7674 12.500 0.0004 0.1736 3.369 0.0028 -1.6790 95.47 36.794 66.13 0.6791 0.9196 0.0804 0.2421 -1.439 0.2733 -1.410 0.0356 -1.439 0.1218 -1.439 0.1172 -0.0003 1 3.2470 0.4164 -1.0270 3.075 2.285e-04 7.430e-05 1.089e-03 2.115e-06
ENSG00000105889 ENSG00000105889 ENST00000424363 15 5 STEAP family member 1B [Source:HGNC Symbol;Acc:HGNC:41907] protein_coding 762 7 - 22419444 22727613 STEAP1B STEAP1B ENSG00000105889.5 794.75 2.0340 0.9946 1.952 0.0339 1.536 0.2141 0.6165 0.9309 1.961 0.1193 1.736 0.1766 0.9709 1 1.9180 1.0580 0.2279 0.5140 0.0900 0.8599 98.64 0.4730 4.127 0e+00 7.9468 5.995 1.0130 3.858 0.0008 -0.5549 1.533 107.00 164.06 128.40 1.513 0.9309 0.0691 1.1870 13.590 0.0002 1.0240 4.730 0.0001 1.2500 232.04 118.381 175.21 0.5638 0.9145 0.0855 0.1766 -1.447 0.1971 -1.422 0.0217 -1.447 0.1035 -1.447 0.0648 -0.2337 1 1.8030 0.4164 -1.0120 1.868 7.500e-03 4.016e-03 1.216e-04 9.417e-09
knitr::kable(head(t_cf_eosinophil_visits_sig_sva[["deseq"]][["downs"]][[1]]))
ensembl_gene_id ensembl_transcript_id version transcript_version description gene_biotype cds_length chromosome_name strand start_position end_position hgnc_symbol uniprot_gn_symbol transcript mean_cds_len basic_logfc basic_adjp deseq_logfc deseq_adjp dream_logfc dream_adjp ebseq_logfc ebseq_adjp edger_logfc edger_adjp limma_logfc limma_adjp noiseq_logfc noiseq_adjp basic_num basic_den basic_numvar basic_denvar basic_t basic_p deseq_basemean deseq_lfcse deseq_stat deseq_p deseq_num deseq_den dream_ave dream_t dream_p dream_b ebseq_fc ebseq_c1mean ebseq_c2mean ebseq_mean ebseq_postfc ebseq_ppee ebseq_ppde edger_logcpm edger_lr edger_p limma_ave limma_t limma_p limma_b noiseq_num noiseq_den noiseq_mean noiseq_theta noiseq_prob noiseq_p limma_adjp_ihw limma_p_zstd dream_adjp_ihw dream_p_zstd deseq_adjp_ihw deseq_p_zstd edger_adjp_ihw edger_p_zstd ebseq_adjp_ihw ebseq_p_zstd basic_adjp_ihw basic_p_zstd noiseq_adjp_ihw noiseq_p_zstd lfc_meta lfc_var lfc_varbymed p_meta p_var
ENSG00000129295 ENSG00000129295 ENST00000522789 9 5 leucine rich repeat containing 6 [Source:HGNC Symbol;Acc:HGNC:16725] protein_coding 621 8 - 132570416 132675592 LRRC6 LRRC6 ENSG00000129295.5 998.375 -2.3670 0.9946 -4.504 0.0210 -4.670 0.2141 -2.7493 0.0409 -4.506 0.0448 -4.525 0.2264 -2.4407 1 0.8385 3.226 1.6048 2.4094 0.0628 -2.3870 357.7 0.9891 -4.554 0e+00 5.009 9.513 2.201 -3.738 0.0011 -0.7583 0.1487 629.8 93.66 428.8 0.1450 0.0409 0.9591 3.048 18.13 0e+00 2.194 -3.609 0.0015 -1.0380 122.7 666.2 394.4 -1.0714 0.9557 0.0443 0.2264 -1.443 0.1609 -1.421 0.0203 -1.443 0.0480 -1.443 0.9875 2.9880 -1964.000 -5.0020 0.4164 -1.1350 -4.671 3.654e-01 -7.823e-02 5.246e-04 7.855e-07
ENSG00000140090 ENSG00000140090 ENST00000526482 17 1 solute carrier family 24 member 4 [Source:HGNC Symbol;Acc:HGNC:10978] protein_coding undefined 14 + 92322581 92501483 SLC24A4 SLC24A4 ENSG00000140090.1 1388.6 -1.0630 0.9946 -3.452 0.0465 -3.501 0.2652 -1.2758 0.9609 -3.445 0.1115 -3.286 0.3003 -0.9902 1 0.4640 1.562 2.0840 1.8659 0.3460 -1.0980 194.0 0.8672 -3.980 1e-04 4.475 7.926 1.560 -3.286 0.0033 -1.7060 0.4130 183.4 75.75 143.0 0.3983 0.9609 0.0391 2.181 14.22 2e-04 1.560 -3.011 0.0064 -2.2390 103.0 204.6 153.8 -0.3861 0.8614 0.1386 0.3003 -1.427 0.2516 -1.414 0.0307 -1.427 0.1035 -1.427 0.0495 -0.3425 -172.600 -2.3000 0.4472 -0.8542 -3.447 4.091e-02 -1.187e-02 2.213e-03 1.320e-05
ENSG00000120049 ENSG00000120049 ENST00000343195 19 8 potassium voltage-gated channel interacting protein 2 [Source:HGNC Symbol;Acc:HGNC:15522] protein_coding 663 10 - 101825974 101843920 KCNIP2 KCNIP2 ENSG00000120049.8 679.2 -1.3150 0.9946 -1.956 0.0001 -2.216 0.0314 -1.1698 0.9280 -1.957 0.0068 -2.038 0.0446 -0.8999 1 2.9440 3.887 1.1125 0.7187 0.2664 -0.9432 629.3 0.3415 -5.729 0e+00 7.752 9.708 3.696 -6.173 0.0000 4.5490 0.4445 814.0 361.80 644.4 0.4356 0.9280 0.0720 3.861 24.77 0e+00 3.710 -6.053 0.0000 4.3070 469.4 875.8 672.6 -0.5744 0.9304 0.0696 0.0446 -1.448 0.0314 -1.425 0.0001 -1.448 0.0076 -1.448 0.0915 -0.2231 -16.900 -1.9760 0.4164 -1.0590 -2.001 8.748e-03 -4.372e-03 1.629e-06 5.185e-12
ENSG00000089335 ENSG00000089335 ENST00000502743 21 5 zinc finger protein 302 [Source:HGNC Symbol;Acc:HGNC:13848] protein_coding 360 19 + 34677639 34686397 ZNF302 ZNF302 ENSG00000089335.5 803.25 -1.5630 0.9946 -1.763 0.0465 -2.202 0.1999 -1.0398 0.5318 -1.763 0.1323 -1.975 0.1762 -0.7208 1 1.9470 2.847 0.7883 0.3437 0.2137 -0.8999 294.4 0.4451 -3.962 1e-04 6.794 8.557 2.642 -4.901 0.0001 1.7480 0.4864 365.2 177.62 294.8 0.4825 0.5318 0.4682 2.775 13.21 3e-04 2.644 -5.118 0.0000 2.2240 253.4 417.7 335.6 -0.4143 0.8741 0.1259 0.1762 -1.448 0.1609 -1.425 0.0287 -1.448 0.1178 -1.448 0.4589 1.2110 -15.230 -1.8850 0.4472 -0.8918 -1.860 2.803e-02 -1.507e-02 1.306e-04 1.664e-08
ENSG00000169330 ENSG00000169330 ENST00000305428 9 8 membrane integral NOTCH2 associated receptor 1 [Source:HGNC Symbol;Acc:HGNC:29172] protein_coding 2751 15 + 79432336 79472304 MINAR1 MINAR1 ENSG00000169330.8 2658 -0.7188 0.9946 -1.621 0.0292 -1.859 0.2141 -0.7262 0.9063 -1.622 0.0505 -1.718 0.2145 -0.3545 1 4.3750 4.695 0.1995 0.6554 0.4993 -0.3195 1482.0 0.3843 -4.218 0e+00 8.983 10.604 4.913 -4.336 0.0003 0.5512 0.6045 1439.1 869.93 1225.7 0.5975 0.9063 0.0937 5.098 17.09 0e+00 4.935 -4.533 0.0002 0.9596 1215.2 1553.7 1384.4 -0.3327 0.8341 0.1659 0.2144 -1.447 1.0000 -1.424 0.0353 -1.447 1.0000 -1.447 0.1190 -0.1447 -1.123 -0.6686 0.4663 -0.7729 -1.659 8.008e-04 -4.828e-04 7.427e-05 5.882e-09
ENSG00000282246 ENSG00000282246 ENST00000596044 1 5 novel protein protein_coding 57 10 + 13610047 13655929 ENSG00000282246.5 257 -0.7066 0.9946 -1.575 0.0465 -1.673 0.2141 -0.7959 0.9518 -1.575 0.1115 -1.490 0.2145 -0.3864 1 3.2860 3.645 0.2675 0.8440 0.5063 -0.3589 556.1 0.4018 -3.921 1e-04 7.883 9.458 3.543 -3.907 0.0007 -0.3918 0.5760 711.5 409.82 598.4 0.5635 0.9518 0.0482 3.688 14.19 2e-04 3.546 -3.824 0.0009 -0.6119 590.2 771.4 680.8 -0.2528 0.7316 0.2684 0.2144 -1.445 0.1971 -1.423 0.0647 -1.445 0.1140 -1.445 0.0630 -0.3095 -2.869 -0.7512 0.5820 -0.4678 -1.556 3.292e-03 -2.115e-03 3.915e-04 2.120e-07

10 Compare to Visit explicitly in the model

As a reminder, there are a few genes of particular interest:

expected_genes <- c("IFI44L", "IFI27", "PRR5", "PRR5-ARHGAP8", "RHCE",
                    "FBXO39", "RSAD2", "SMTNL1", "USP18", "AFAP1")
annot <- fData(t_monocytes)
wanted_idx <- annot[["hgnc_symbol"]] %in% expected_genes
expected_ensg <- rownames(annot)[wanted_idx]
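
A minimal sketch of the symbol-to-Ensembl lookup above, using a toy annotation table (the IDs and the `toy_*` names are invented for illustration). It shows one thing worth checking: `%in%` silently drops symbols which are absent from the annotation, so `setdiff()` can be used to report them.

```r
## Toy annotation: rownames are (made-up) gene IDs, one column of symbols.
toy_annot <- data.frame(
  hgnc_symbol = c("IFI44L", "IFI27", "RSAD2", "ACTB"),
  row.names = c("ENSG_toy_1", "ENSG_toy_2", "ENSG_toy_3", "ENSG_toy_4"),
  stringsAsFactors = FALSE)
toy_expected <- c("IFI44L", "IFI27", "USP18")

## Same lookup as above: logical index over annotation rows, then row names.
toy_idx <- toy_annot[["hgnc_symbol"]] %in% toy_expected
toy_ensg <- rownames(toy_annot)[toy_idx]
toy_ensg

## Symbols which were requested but are not present in the annotation.
toy_missing <- setdiff(toy_expected, toy_annot[["hgnc_symbol"]])
toy_missing
```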

10.1 Monocytes

The eosinophil and neutrophil sections of this document contain nearly identical blocks which seek to demonstrate the similarities/differences observed between my preferred/simplified model and a more explicitly correct and complex model. If the trend observed with the eosinophils and neutrophils holds, I would expect the results to be marginally ‘better’ (as defined by the strength of the perceived interleukin response and raw number of ‘significant’ genes); but I remain worried that this will prove a more brittle and error-prone analysis.

10.1.1 Filter the data and perform svaseq

Start out by extracting the perceived svs via svaseq on the filtered input.

## The original pairwise invocation with sva:
##t_cf_monocyte_de_sva <- all_pairwise(t_monocytes, model_batch = "svaseq",
##                                     filter = TRUE, parallel = FALSE,
##                                     methods = methods)
test_monocytes <- normalize_expt(t_monocytes, filter = "simple")
## Removing 0 low-count genes (10862 remaining).
test_mono_design <- pData(test_monocytes)
test_formula <- as.formula("~ finaloutcome + visitnumber")
test_model <- model.matrix(test_formula, data = test_mono_design)
null_formula <- as.formula("~ visitnumber")
null_model <- model.matrix(null_formula, data = test_mono_design)

linear_mtrx <- exprs(test_monocytes)
l2_mtrx <- log2(linear_mtrx + 1)
chosen_surrogates <- sva::num.sv(dat = l2_mtrx, mod = test_model)
chosen_surrogates
## [1] 2
surrogate_result <- sva::svaseq(
  dat = linear_mtrx, n.sv = chosen_surrogates, mod = test_model, mod0 = null_model)
## Number of significant surrogate variables is:  2 
## Iteration (out of 5 ):1  2  3  4  5
model_adjust <- as.matrix(surrogate_result[["sv"]])
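
To make the relationship between the full and null models concrete, here is a small sketch with an invented six-sample design (the `toy_design` values are assumptions, not the TMRC3 metadata). The null model handed to svaseq should contain everything in the full model except the factor of interest, so that the surrogate estimation does not absorb the outcome signal.

```r
toy_design <- data.frame(
  finaloutcome = factor(c("cure", "cure", "cure", "failure", "failure", "failure")),
  visitnumber = factor(c(1, 2, 3, 1, 2, 3)))

toy_full <- model.matrix(~ finaloutcome + visitnumber, data = toy_design)
toy_null <- model.matrix(~ visitnumber, data = toy_design)

colnames(toy_full)
colnames(toy_null)
## The only column the null model lacks is the outcome coefficient.
toy_difference <- setdiff(colnames(toy_full), colnames(toy_null))
toy_difference
```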

10.1.2 Add the svs to the data model and create a new DESeq2 dataset

We can now create a new DESeq2 dataset which takes these putative surrogates into account.

colnames(model_adjust) <- paste0("SV", seq_len(chosen_surrogates))
rownames(model_adjust) <- rownames(pData(test_monocytes))
## Build the " + SV1 + SV2 ..." suffix in one vectorized call.
addition_string <- paste0(" + ", colnames(model_adjust), collapse = "")
longer_model <- as.formula(glue("~ finaloutcome + visitnumber{addition_string}"))
mono_design_svs <- cbind(test_mono_design, model_adjust)

summarized <- DESeq2::DESeqDataSetFromMatrix(countData = linear_mtrx,
                                             colData = mono_design_svs,
                                             design = longer_model)
## converting counts to integer mode
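
An alternative to pasting the formula string together is base R's `reformulate()`, which builds a one-sided formula from a character vector of term labels. This is only a sketch with hypothetical names (`n_sv`, `toy_terms`, `toy_formula`); it should yield the same design formula as the string-based approach.

```r
n_sv <- 2
sv_names <- paste0("SV", seq_len(n_sv))
toy_terms <- c("finaloutcome", "visitnumber", sv_names)
## With no response given, reformulate() returns a one-sided formula.
toy_formula <- reformulate(toy_terms)
toy_formula
```

One small advantage is that the number of surrogates never needs to be written out by hand, so the same lines work whether num.sv() returns 2, 3, or 4.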

10.1.3 Run DESeq and compare the results to our previous invocation

In order to compare these and the previous results, I tend to rely on simple correlations and aucc plots. I have been reading the modelr code recently and it looks like there is a suite of other metrics which might be more appropriate.
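
As a sketch of why the two correlations below use different methods: fold changes are roughly symmetric, so Pearson is a reasonable choice, while p-values are bounded and far from normally distributed, so a rank-based (Spearman) correlation is arguably safer. The data here are simulated and all variable names are hypothetical.

```r
set.seed(2025)
n_genes <- 1000
## Simulated log2 fold changes from two analyses which mostly agree.
lfc_a <- rnorm(n_genes)
lfc_b <- lfc_a + rnorm(n_genes, sd = 0.1)
pearson <- cor(lfc_a, lfc_b, method = "pearson")

## Two-sided p-values derived from the same simulated values.
p_a <- 2 * pnorm(-abs(lfc_a))
p_b <- 2 * pnorm(-abs(lfc_b))
spearman <- cor(p_a, p_b, method = "spearman")

c(pearson = pearson, spearman = spearman)
```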

deseq_run <- DESeq2::DESeq(summarized)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
deseq_table <- as.data.frame(DESeq2::results(object = deseq_run,
                                             contrast = c("finaloutcome", "failure", "cure"),
                                             format = "DataFrame"))

big_table <- t_cf_monocyte_table_sva[["data"]][["outcome"]]
only_deseq <- big_table[, c("deseq_logfc", "deseq_adjp")]
merged <- merge(deseq_table, only_deseq, by = "row.names")
rownames(merged) <- merged[["Row.names"]]
merged[["Row.names"]] <- NULL

cor_value <- cor.test(merged[["log2FoldChange"]], merged[["deseq_logfc"]])
cor_value
## 
##  Pearson's product-moment correlation
## 
## data:  merged[["log2FoldChange"]] and merged[["deseq_logfc"]]
## t = 1075, df = 10860, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9952 0.9955
## sample estimates:
##    cor 
## 0.9953
logfc_plotter <- plot_linear_scatter(merged[, c("log2FoldChange", "deseq_logfc")],
                                     add_cor = TRUE, add_rsq = TRUE, identity = TRUE,
                                     add_equation = TRUE)
logfc_plot <- logfc_plotter[["scatter"]] +
  xlab("DESeq2 log2FC: Visit explicitly in model") +
  ylab("DESeq2 log2FC: Default pairwise comparison")
pp(file = "figures/compare_cf_and_visit_in_model_monocyte_logfc.svg")
logfc_plot
dev.off()
## png 
##   2
logfc_plot

cor_value <- cor.test(merged[["padj"]], merged[["deseq_adjp"]], method = "spearman")
## Warning in cor.test.default(merged[["padj"]], merged[["deseq_adjp"]], method =
## "spearman"): Cannot compute exact p-value with ties
cor_value
## 
##  Spearman's rank correlation rho
## 
## data:  merged[["padj"]] and merged[["deseq_adjp"]]
## S = 1.3e+09, p-value <2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##    rho 
## 0.9938
adjp_plotter <- plot_linear_scatter(merged[, c("padj", "deseq_adjp")])
adjp_plot <- adjp_plotter[["scatter"]] +
  xlab("DESeq2 adjp: Visit explicitly in model") +
  ylab("DESeq2 adjp: Default pairwise comparison")
pp(file = "images/compare_cf_and_visit_in_model_monocyte_adjp.svg")
adjp_plot
dev.off()
## png 
##   2
adjp_plot

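## abs() wraps only the logFC here. The earlier form, abs(x >= 1.0), kept only genes
## with logFC >= 1, and the counts recorded below still reflect it.
previous_sig_idx <- big_table[["deseq_adjp"]] <= 0.05 &
  abs(big_table[["deseq_logfc"]]) >= 1.0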
summary(previous_sig_idx)
##    Mode   FALSE    TRUE 
## logical   10802      60
previous_genes <- rownames(big_table)[previous_sig_idx]

new_sig_idx <- abs(deseq_table[["log2FoldChange"]]) >= 1.0 &
  deseq_table[["padj"]] < 0.05
new_genes <- rownames(deseq_table)[new_sig_idx]
na_idx <- is.na(new_genes)
new_genes <- new_genes[!na_idx]

Vennerable::Venn(list("previous" = previous_genes, "new" = new_genes))
## A Venn object on 2 sets named
## previous,new 
## 00 10 01 11 
##  0  7 57 53
test_new <- simple_gprofiler(new_genes)
test_new
## A set of ontologies produced by gprofiler using 110
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are: 
## 9 MF
## 147 BP
## 0 KEGG
## 0 REAC
## 0 WP
## 2 TF
## 0 MIRNA
## 0 HPA
## 0 CORUM
## 0 HP hits.
test_old <- simple_gprofiler(previous_genes)
test_old
## A set of ontologies produced by gprofiler using 60
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are: 
## 0 MF
## 44 BP
## 3 KEGG
## 0 REAC
## 2 WP
## 0 TF
## 0 MIRNA
## 0 HPA
## 0 CORUM
## 0 HP hits.
new_annotated <- merge(fData(t_monocytes), deseq_table, by = "row.names")
rownames(new_annotated) <- new_annotated[["Row.names"]]
new_annotated[["Row.names"]] <- NULL
write_xlsx(data = new_annotated, excel = "excel/monocyte_visit_in_model_sva_cf_new.xlsx")
## write_xlsx() wrote excel/monocyte_visit_in_model_sva_cf_new.xlsx.
## The cursor is on sheet first, row: 10865 column: 23.
old_annotated <- merge(fData(t_monocytes), big_table, by = "row.names")
rownames(old_annotated) <- old_annotated[["Row.names"]]
old_annotated[["Row.names"]] <- NULL
write_xlsx(data = old_annotated, excel = "excel/monocyte_visit_in_model_sva_cf_old.xlsx")
## write_xlsx() wrote excel/monocyte_visit_in_model_sva_cf_old.xlsx.
## The cursor is on sheet first, row: 10865 column: 101.
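
The filter-and-overlap logic above can be sketched in base R with a toy table (all values invented). The placement of the closing parenthesis of `abs()` matters: `abs(x) >= 1` keeps large changes in both directions, whereas `abs(x >= 1)` takes the absolute value of a logical vector and therefore keeps only the increases. `intersect()` and `setdiff()` provide the same three counts that the Venn object prints.

```r
toy <- data.frame(
  logfc = c(2.1, -1.8, 0.3, 1.2, -2.5),
  adjp = c(0.01, 0.02, 0.001, 0.20, 0.04),
  row.names = paste0("gene", 1:5))

both_directions <- abs(toy[["logfc"]]) >= 1.0 & toy[["adjp"]] <= 0.05
only_increased <- abs(toy[["logfc"]] >= 1.0) & toy[["adjp"]] <= 0.05

set_a <- rownames(toy)[both_directions]
set_b <- rownames(toy)[only_increased]

## Counts analogous to the Venn categories "10", "01" and "11".
toy_counts <- c("10" = length(setdiff(set_a, set_b)),
                "01" = length(setdiff(set_b, set_a)),
                "11" = length(intersect(set_a, set_b)))
toy_counts
```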

Are the expected Ensembl gene IDs found in this new set?

sum(new_genes %in% expected_ensg)
## [1] 10

10.2 Eosinophils

We wish to ensure that my model simplification did not do anything incorrect to the data for all three cell types. I already did this for the neutrophils; let us repeat it for the eosinophils. I am therefore (mostly) copy/pasting the neutrophil section here.

## The original pairwise invocation with sva:
#t_cf_eosinophil_de_sva <- all_pairwise(t_eosinophils, model_batch = "svaseq",
#                                       filter = TRUE, parallel=FALSE, methods = methods)
test_eosinophils <- normalize_expt(t_eosinophils, filter = "simple")
## Removing 2652 low-count genes (17300 remaining).
test_eo_design <- pData(test_eosinophils)
test_formula <- as.formula("~ 0 + finaloutcome + visitnumber")
test_model <- model.matrix(test_formula, data = test_eo_design)
null_formula <- as.formula("~ 0 + visitnumber")
null_model <- model.matrix(null_formula, data = test_eo_design)

linear_mtrx <- exprs(test_eosinophils)
l2_mtrx <- log2(linear_mtrx + 1)
chosen_surrogates <- sva::num.sv(dat = l2_mtrx, mod = test_model)
chosen_surrogates
## [1] 3
surrogate_result <- sva::svaseq(
  dat = linear_mtrx, n.sv = chosen_surrogates, mod = test_model, mod0 = null_model)
## Number of significant surrogate variables is:  3 
## Iteration (out of 5 ):1  2  3  4  5
model_adjust <- as.matrix(surrogate_result[["sv"]])

colnames(model_adjust) <- c("SV1", "SV2", "SV3")
rownames(model_adjust) <- rownames(pData(test_eosinophils))
longer_model <- as.formula("~ finaloutcome + visitnumber + SV1 + SV2 + SV3")
eo_design_svs <- cbind(test_eo_design, model_adjust)
summarized <- DESeq2::DESeqDataSetFromMatrix(countData = linear_mtrx,
                                             colData = eo_design_svs,
                                             design = longer_model)
## converting counts to integer mode
deseq_run <- DESeq2::DESeq(summarized)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
deseq_table <- as.data.frame(DESeq2::results(object = deseq_run,
                                             contrast = c("finaloutcome", "failure", "cure"),
                                             format = "DataFrame"))

big_table <- t_cf_eosinophil_table_sva[["data"]][["outcome"]]
only_deseq <- big_table[, c("deseq_logfc", "deseq_adjp")]
merged <- merge(deseq_table, only_deseq, by = "row.names")
rownames(merged) <- merged[["Row.names"]]
merged[["Row.names"]] <- NULL

cor_value <- cor.test(merged[["log2FoldChange"]], merged[["deseq_logfc"]])
cor_value
## 
##  Pearson's product-moment correlation
## 
## data:  merged[["log2FoldChange"]] and merged[["deseq_logfc"]]
## t = 228, df = 10530, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9084 0.9149
## sample estimates:
##    cor 
## 0.9117
logfc_plotter <- plot_linear_scatter(merged[, c("log2FoldChange", "deseq_logfc")])
logfc_plot <- logfc_plotter[["scatter"]] +
  xlab("DESeq2 log2FC: Visit explicitly in model") +
  ylab("DESeq2 log2FC: Default pairwise comparison")
pp(file = "figures/compare_cf_and_visit_in_model_eosinophil_logfc.svg")
logfc_plot
dev.off()
## png 
##   2
logfc_plot

cor_value <- cor.test(merged[["padj"]], merged[["deseq_adjp"]], method = "spearman")
## Warning in cor.test.default(merged[["padj"]], merged[["deseq_adjp"]], method =
## "spearman"): Cannot compute exact p-value with ties
cor_value
## 
##  Spearman's rank correlation rho
## 
## data:  merged[["padj"]] and merged[["deseq_adjp"]]
## S = 3.5e+10, p-value <2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##    rho 
## 0.8214
adjp_plotter <- plot_linear_scatter(merged[, c("padj", "deseq_adjp")])
adjp_plot <- adjp_plotter[["scatter"]] +
  xlab("DESeq2 adjp: Visit explicitly in model") +
  ylab("DESeq2 adjp: Default pairwise comparison")
pp(file = "images/compare_cf_and_visit_in_model_eosinophil_adjp.svg")
adjp_plot
dev.off()
## png 
##   2
adjp_plot

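## abs() wraps only the logFC here. The earlier form, abs(x >= 1.0), kept only genes
## with logFC >= 1, and the counts recorded below still reflect it.
previous_sig_idx <- big_table[["deseq_adjp"]] <= 0.05 &
  abs(big_table[["deseq_logfc"]]) >= 1.0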
summary(previous_sig_idx)
##    Mode   FALSE    TRUE 
## logical   10416     116
previous_genes <- rownames(big_table)[previous_sig_idx]

new_sig_idx <- abs(deseq_table[["log2FoldChange"]]) >= 1.0 &
  deseq_table[["padj"]] < 0.05
new_genes <- rownames(deseq_table)[new_sig_idx]
na_idx <- is.na(new_genes)
new_genes <- new_genes[!na_idx]

Vennerable::Venn(list("previous" = previous_genes, "new" = new_genes))
## A Venn object on 2 sets named
## previous,new 
##  00  10  01  11 
##   0  38 193  78
test_new <- simple_gprofiler(new_genes)
test_new
## A set of ontologies produced by gprofiler using 271
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are: 
## 11 MF
## 186 BP
## 4 KEGG
## 6 REAC
## 5 WP
## 7 TF
## 0 MIRNA
## 1 HPA
## 0 CORUM
## 0 HP hits.
test_old <- simple_gprofiler(previous_genes)
test_old
## A set of ontologies produced by gprofiler using 116
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are: 
## 26 MF
## 112 BP
## 3 KEGG
## 7 REAC
## 4 WP
## 72 TF
## 1 MIRNA
## 0 HPA
## 0 CORUM
## 0 HP hits.
new_annotated <- merge(fData(t_eosinophils), deseq_table, by = "row.names")
rownames(new_annotated) <- new_annotated[["Row.names"]]
new_annotated[["Row.names"]] <- NULL
write_xlsx(data = new_annotated, excel = "excel/eosinophil_visit_in_model_sva_cf_new.xlsx")
## write_xlsx() wrote excel/eosinophil_visit_in_model_sva_cf_new.xlsx.
## The cursor is on sheet first, row: 17303 column: 23.
old_annotated <- merge(fData(t_eosinophils), big_table, by = "row.names")
rownames(old_annotated) <- old_annotated[["Row.names"]]
old_annotated[["Row.names"]] <- NULL
write_xlsx(data = old_annotated, excel = "excel/eosinophil_visit_in_model_sva_cf_old.xlsx")
## write_xlsx() wrote excel/eosinophil_visit_in_model_sva_cf_old.xlsx.
## The cursor is on sheet first, row: 10535 column: 101.

Check our genes of particular interest

sum(new_genes %in% expected_ensg)
## [1] 5

Not quite as similar as the monocyte data.

10.3 Neutrophils

## The original pairwise invocation with sva:
## t_cf_neutrophil_de_sva <- all_pairwise(t_neutrophils, model_batch = "svaseq",
##                                        parallel = parallel, filter = TRUE,
##                                        methods = methods)
test_neutrophils <- normalize_expt(t_neutrophils, filter = "simple")
## Removing 2652 low-count genes (17300 remaining).
test_neut_design <- pData(test_neutrophils)
test_formula <- as.formula("~ 0 + finaloutcome + visitnumber")
test_model <- model.matrix(test_formula, data = test_neut_design)
## Note to self: double-check that the following line is correct.
null_formula <- as.formula("~ 0 + visitnumber")
## null_model <- test_model[, c(1, 2)]
null_model <- model.matrix(null_formula, data = test_neut_design)

linear_mtrx <- exprs(test_neutrophils)
l2_mtrx <- log2(linear_mtrx + 1)
chosen_surrogates <- sva::num.sv(dat = l2_mtrx, mod = test_model)
chosen_surrogates
## [1] 4
surrogate_result <- sva::svaseq(
  dat = linear_mtrx, n.sv = chosen_surrogates, mod = test_model, mod0 = null_model)
## Number of significant surrogate variables is:  4 
## Iteration (out of 5 ):1  2  3  4  5
model_adjust <- as.matrix(surrogate_result[["sv"]])

## I don't think the following is actually required, but it is weird to just have this
## unnamed matrix hanging out.
## Set the columns to the SV#s
colnames(model_adjust) <- c("SV1", "SV2", "SV3", "SV4")
## Set the rows to the sample IDs
rownames(model_adjust) <- rownames(pData(test_neutrophils))

longer_model <- as.formula("~ finaloutcome + visitnumber + SV1 + SV2 + SV3 + SV4")
neut_design_svs <- cbind(test_neut_design, model_adjust)
summarized <- DESeq2::DESeqDataSetFromMatrix(countData = linear_mtrx,
                                             colData = neut_design_svs,
                                             design = longer_model)
## converting counts to integer mode
deseq_run <- DESeq2::DESeq(summarized)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
deseq_table <- as.data.frame(DESeq2::results(object = deseq_run,
                                             contrast = c("finaloutcome", "failure", "cure"),
                                             format = "DataFrame"))

## We should be able to directly compare this to the deseq columns from the above
## data structure named: t_cf_neutrophil_table_sva

big_table <- t_cf_neutrophil_table_sva[["data"]][["outcome"]]
only_deseq <- big_table[, c("deseq_logfc", "deseq_adjp")]
merged <- merge(deseq_table, only_deseq, by = "row.names")
rownames(merged) <- merged[["Row.names"]]
merged[["Row.names"]] <- NULL

cor_value <- cor.test(merged[["log2FoldChange"]], merged[["deseq_logfc"]])
cor_value
## 
##  Pearson's product-moment correlation
## 
## data:  merged[["log2FoldChange"]] and merged[["deseq_logfc"]]
## t = 393, df = 9099, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9706 0.9729
## sample estimates:
##    cor 
## 0.9718
logfc_plotter <- plot_linear_scatter(merged[, c("log2FoldChange", "deseq_logfc")])
logfc_plot <- logfc_plotter[["scatter"]] +
  xlab("DESeq2 log2FC: Visit explicitly in model") +
  ylab("DESeq2 log2FC: Default pairwise comparison")
pp(file = "figures/compare_cf_and_visit_in_model_neutrophil_logfc.svg")
logfc_plot
dev.off()
## png 
##   2
logfc_plot

cor_value <- cor.test(merged[["padj"]], merged[["deseq_adjp"]], method = "spearman")
## Warning in cor.test.default(merged[["padj"]], merged[["deseq_adjp"]], method =
## "spearman"): Cannot compute exact p-value with ties
cor_value
## 
##  Spearman's rank correlation rho
## 
## data:  merged[["padj"]] and merged[["deseq_adjp"]]
## S = 1e+10, p-value <2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##    rho 
## 0.9202
adjp_plotter <- plot_linear_scatter(merged[, c("padj", "deseq_adjp")])
adjp_plot <- adjp_plotter[["scatter"]] +
  xlab("DESeq2 adjp: Visit explicitly in model") +
  ylab("DESeq2 adjp: Default pairwise comparison")
pp(file = "images/compare_cf_and_visit_in_model_neutrophil_adjp.svg")
adjp_plot
dev.off()
## png 
##   2
adjp_plot

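## abs() wraps only the logFC here. The earlier form, abs(x >= 1.0), kept only genes
## with logFC >= 1, and the counts recorded below still reflect it.
previous_sig_idx <- big_table[["deseq_adjp"]] <= 0.05 &
  abs(big_table[["deseq_logfc"]]) >= 1.0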
summary(previous_sig_idx)
##    Mode   FALSE    TRUE 
## logical    8971     130
previous_genes <- rownames(big_table)[previous_sig_idx]

new_sig_idx <- abs(deseq_table[["log2FoldChange"]]) >= 1.0 &
  deseq_table[["padj"]] < 0.05
new_genes <- rownames(deseq_table)[new_sig_idx]
na_idx <- is.na(new_genes)
new_genes <- new_genes[!na_idx]

Vennerable::Venn(list("previous" = previous_genes, "new" = new_genes))
## A Venn object on 2 sets named
## previous,new 
## 00 10 01 11 
##  0 51 92 79
test_new <- simple_gprofiler(new_genes)
test_new
## A set of ontologies produced by gprofiler using 171
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are: 
## 1 MF
## 12 BP
## 0 KEGG
## 2 REAC
## 0 WP
## 3 TF
## 2 MIRNA
## 0 HPA
## 0 CORUM
## 0 HP hits.
test_old <- simple_gprofiler(previous_genes)
test_old
## A set of ontologies produced by gprofiler using 130
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are: 
## 4 MF
## 67 BP
## 0 KEGG
## 5 REAC
## 2 WP
## 57 TF
## 0 MIRNA
## 0 HPA
## 0 CORUM
## 0 HP hits.
new_annotated <- merge(fData(t_neutrophils), deseq_table, by = "row.names")
rownames(new_annotated) <- new_annotated[["Row.names"]]
new_annotated[["Row.names"]] <- NULL
write_xlsx(data = new_annotated, excel = "excel/neutrophil_visit_in_model_sva_cf_new.xlsx")
## write_xlsx() wrote excel/neutrophil_visit_in_model_sva_cf_new.xlsx.
## The cursor is on sheet first, row: 17303 column: 23.
old_annotated <- merge(fData(t_neutrophils), big_table, by = "row.names")
rownames(old_annotated) <- old_annotated[["Row.names"]]
old_annotated[["Row.names"]] <- NULL
write_xlsx(data = old_annotated, excel = "excel/neutrophil_visit_in_model_sva_cf_old.xlsx")
## write_xlsx() wrote excel/neutrophil_visit_in_model_sva_cf_old.xlsx.
## The cursor is on sheet first, row: 9104 column: 101.

Once again, see how many of our favorite genes are here

sum(new_genes %in% expected_ensg)
## [1] 8

11 Mixed linear models

When the above work was reviewed for publication, one concern arose: we are not considering the variance of each person in the contrasts above, and are therefore potentially over-representing the significance/power of the results, because the models we are using do not include the donor. My previous understanding was that it is sufficient to include visit in the model because that would result in a model matrix which separates samples from each person; but I am now reasonably certain this is incorrect.

Therefore, I now think the previous couple of blocks do not approach this problem correctly. We spent some time talking with Neal and discussing the various models and methods we employed. He made a series of suggestions about approaches which might prove more correct. It seems that a mixed linear model is the most appropriate method for this type of query. I think I can perform that with limma, via voom. Let us try and see what happens. After doing some reading, I think the most appropriate way to perform this is to use dream() from variancePartition, which is cool because I really like it.

As I write this, we are reasonably certain that a mixed linear model provides a statistically correct framework for representing our expression data as a function of finaloutcome, visit, and person, e.g.:

exprs ~ finaloutcome + visit + (1|donor)
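
Written out for a single gene, and assuming the usual random-intercept reading of `(1|donor)` (the notation here is an assumption made for illustration, not taken from the dream documentation), the formula above corresponds to:

```latex
y_{ij} = \beta_0
       + \beta_{\mathrm{outcome}}\, x^{\mathrm{outcome}}_{i}
       + \beta_{\mathrm{visit}(j)}
       + b_i + \varepsilon_{ij},
\qquad b_i \sim \mathcal{N}(0, \sigma^2_{\mathrm{donor}}),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $y_{ij}$ is the (log-scale) expression of the gene for donor $i$ at visit $j$, the $\beta$ terms are fixed effects shared by every donor, and $b_i$ is a donor-specific shift of the intercept. Samples from the same donor share $b_i$, which is how the model accounts for their correlation.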

In our discussions surrounding the various ways to compare/contrast the results with and without the mixed linear model, Maria Adelaida and Neal laid out a few primary goals. The overarching goal is to observe if/how well our previous analyses agree with results obtained using a mixed linear model. There are a couple of caveats:

  1. The mlm is not available for data in a negative binomial distribution. Ergo, DESeq2/edgeR are out a priori. This is a little sad because we have generally relied upon DESeq2 results. However, I do routinely compare DESeq2 to voom->limma and am usually impressed at the degree of similarity.
  2. mlm analyses are significantly more computationally expensive. When I have played with them via variancePartition in the past I have run my very nice machine out of memory (OoM) on more than a few occasions. This is important, because I want to have everything in my container, but I cannot expect anyone else’s computer to have > 200G RAM. I can definitely lower the parallel processing requirements to save memory, but then these will take forever (well, probably a couple of days to a week).

So, with that in mind, Maria Adelaida, Najib, and Neal focused on repeating a useful subset of the analyses using the mlm and comparing them to our extant results rather than re-implementing everything. The following are the things they suggested are the most important comparison points:

  1. Repeat this process, clean it up for: monocytes/neutrophils
  2. Compare the results when using the following models (note that this way of writing fixes the slope of each donor’s model but allows the intercept to change):
    a. ~ finaloutcome + visitnumber + (1|donor)
    b. ~ finaloutcome + visitnumber
    c. ~ finaloutcome + (1|donor)
  3. Compare the results from limma for a, b, c (really, they asked me to only focus on a, b; I wanted to compare c as well)
  4. Extract the set of ‘significant’ genes via logFC/pvalue for all of the above and see the shared/unique genes.
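
To illustrate what the models with and without `(1|donor)` disagree about, here is a small simulation. It uses `nlme::lme()` (nlme is a recommended package distributed with R) purely as a stand-in for dream, and every number in it is invented. When the outcome is constant within a donor and donors differ in their baseline, the model without the donor term tends to report a smaller standard error for the outcome coefficient than the random-intercept model does.

```r
library(nlme)

set.seed(1)
n_donor <- 12
n_visit <- 3
donor <- factor(rep(seq_len(n_donor), each = n_visit))
visitnumber <- factor(rep(seq_len(n_visit), times = n_donor))
## Outcome is a property of the donor: odd donors cure, even donors fail.
finaloutcome <- factor(ifelse(as.integer(donor) %% 2 == 0, "failure", "cure"))
## Each donor gets a baseline shift which is shared by all of their samples.
donor_shift <- rnorm(n_donor, sd = 1.5)[as.integer(donor)]
toy <- data.frame(
  donor, visitnumber, finaloutcome,
  expr_value = 5 + 1 * (finaloutcome == "failure") + donor_shift +
    rnorm(n_donor * n_visit, sd = 0.5))

## Random intercept per donor vs. no donor term at all.
fit_mixed <- lme(expr_value ~ finaloutcome + visitnumber,
                 random = ~ 1 | donor, data = toy)
fit_fixed <- lm(expr_value ~ finaloutcome + visitnumber, data = toy)

se_mixed <- summary(fit_mixed)[["tTable"]]["finaloutcomefailure", "Std.Error"]
se_fixed <- summary(fit_fixed)[["coefficients"]]["finaloutcomefailure", "Std. Error"]
c(mixed = se_mixed, fixed = se_fixed)
```

The fixed-only model treats all 36 samples as independent, which is the over-representation of power that the reviewers were concerned about.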

I have already written a skeleton function ‘dream_pairwise()’ as a sibling to my other *_pairwise() functions. I think that with some minor modifications (or maybe none at all, when I wrote it I was thinking about fun models that variancePartition supports) it can accept the mixed linear model of interest.

11.1 Using a mixed linear model with dream

In the following block, the mixed formula will get passed to dream. I set the code to use the first element (after the intercept) as the ‘condition’ factor. Thus if I had made the model ‘~ 0 + visitnumber + finaloutcome + (1|donor)’, it would compare visits.

The dream_pairwise() function is responsible for making sure the variancePartition replacement functions are used for things like voom, lmFit, eBayes, and topTable. Strangely, some of them will automatically fall back to limma’s functions if there is no random effect in the model, but others will not. As a result, I have a check and invoke the appropriate functions explicitly in dream_pairwise().

mixed_fstring <- "~ 0 + finaloutcome + visitnumber + (1|donor)"
mixed_form <- as.formula(mixed_fstring)
get_formula_factors(mixed_form)
## $type
## [1] "cellmeans"
## 
## $interaction
## [1] FALSE
## 
## $mixed
## [1] TRUE
## 
## $mixers
## [1] "1"     "donor"
## 
## $cellmeans_intercept
## [1] "0"
## 
## $factors
## [1] "finaloutcome" "visitnumber"  "donor"       
## 
## $contrast
## [1] "finaloutcome"
mixed_eosinophil_de <- dream_pairwise(t_eosinophils, alt_model = mixed_form)
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_eosinophil_de_xlsx <- write_de_table(mixed_eosinophil_de, type = "limma",
                                           excel = glue("excel/mixed_eosinophil_table-v{ver}.xlsx"))

mixed_monocyte_de <- dream_pairwise(t_monocytes, alt_model = mixed_form)
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_monocyte_de_xlsx <- write_de_table(mixed_monocyte_de, type = "limma",
                                         excel = glue("excel/mixed_monocyte_table-v{ver}.xlsx"))

mixed_neutrophil_de <- dream_pairwise(t_neutrophils, alt_model = mixed_form)
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_neutrophil_de_xlsx <- write_de_table(mixed_neutrophil_de, type = "limma",
                                           excel = glue("excel/mixed_neutrophil_table-v{ver}.xlsx"))
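
As a rough illustration of how a formula can be inspected for a random effect and for its first fixed term, here is a sketch in base R. The helper name `sketch_formula_info()` is hypothetical and is not how get_formula_factors() or dream_pairwise() are implemented; it only mirrors the idea that the first variable after the intercept is treated as the condition.

```r
sketch_formula_info <- function(form) {
  form_string <- paste(deparse(form), collapse = " ")
  ## all.vars() returns variable names in order of appearance; the constants
  ## 0 and 1 are not variables, so they are skipped.
  vars <- all.vars(form)
  list(
    mixed = grepl("|", form_string, fixed = TRUE),
    factors = vars,
    contrast = vars[1])
}

with_donor <- sketch_formula_info(~ 0 + finaloutcome + visitnumber + (1 | donor))
visit_first <- sketch_formula_info(~ 0 + visitnumber + finaloutcome + (1 | donor))
no_donor <- sketch_formula_info(~ 0 + finaloutcome + visitnumber)
with_donor[["contrast"]]
visit_first[["contrast"]]
no_donor[["mixed"]]
```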

11.2 Using the same method without the mixed model

Because ‘mixed_fstring_fv’ does not have a random effect, the following invocations will go much faster and will likely be nearly (or completely) identical to the results from limma using the same model.

mixed_fstring_fv <- "~ 0 + finaloutcome + visitnumber"
mixed_form_fv <- as.formula(mixed_fstring_fv)
get_formula_factors(mixed_form_fv)
## $type
## [1] "cellmeans"
## 
## $interaction
## [1] FALSE
## 
## $mixed
## [1] FALSE
## 
## $cellmeans_intercept
## [1] "0"
## 
## $factors
## [1] "finaloutcome" "visitnumber" 
## 
## $contrast
## [1] "finaloutcome"
mixed_eosinophil_fv_de <- dream_pairwise(t_eosinophils, alt_model = mixed_form_fv)
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_eosinophil_de_nodonor_xlsx <- write_de_table(mixed_eosinophil_fv_de, type = "limma",
                                                   excel = glue("excel/mixed_eosinophil_nodonor_table-v{ver}.xlsx"))

mixed_monocyte_fv_de <- dream_pairwise(t_monocytes, alt_model = mixed_form_fv)
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_monocyte_de_nodonor_xlsx <- write_de_table(mixed_monocyte_fv_de, type = "limma",
                                                 excel = glue("excel/mixed_monocyte_nodonor_table-v{ver}.xlsx"))

mixed_neutrophil_fv_de <- dream_pairwise(t_neutrophils, alt_model = mixed_form_fv)
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_neutrophil_de_nodonor_xlsx <- write_de_table(mixed_neutrophil_fv_de, type = "limma",
                                                   excel = glue("excel/mixed_neutrophil_nodonor_table-v{ver}.xlsx"))

11.3 Comparing the results

There are a few observations here which are important and/or troubling:

  1. Using the mlm results in no genes with a significant FDR adjusted p-value. I think this supports, in a pretty compelling fashion, the hypothesis that we over-represented the significance of the data in our original analysis.
  2. However, there is this interesting note from the dream documentation: “Since dream uses an estimated degrees of freedom value for each hypothesis test, the degrees of freedom is different for each gene here. Therefore, the t-statistics are not directly comparable since they have different degrees of freedom. In order to be able to compare test statistics, we report z.std which is the p-value transformed into a signed z-score. This can be used for downstream analysis.”
  3. I spent some time reading the R markdown documents at https://github.com/GabrielHoffman/dream_analysis.git which accompany the paper (Hoffman and Roussos, 2021) and found that there is only one instance in which they make use of adjusted p-values: at the very end of the iPSC data. In addition, they only use the z.std metric when pulling gene sets for comparing against GO categories. In all other instances, the metric used for significance is the ‘raw’ p-value.
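
For intuition about the z.std column described in the quoted passage, a p-value can be turned into a signed z-score by taking the two-sided normal quantile and attaching the sign of the effect. This is a sketch of the general idea with invented inputs; I have not checked it against variancePartition's exact implementation.

```r
## Hypothetical inputs: two-sided p-values and the matching effect estimates.
toy_p <- c(0.05, 0.001, 0.5)
toy_logfc <- c(1.4, -2.2, 0.1)

## Two-sided p-value -> magnitude of the z-score, then restore the direction.
toy_z <- sign(toy_logfc) * qnorm(toy_p / 2, lower.tail = FALSE)
round(toy_z, 3)

## Going back the other way recovers the original p-values.
toy_p_again <- 2 * pnorm(-abs(toy_z))
```

Because every gene's score is then on the same standard normal scale, the scores can be compared across genes even when the underlying tests had different degrees of freedom.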

11.3.1 A little function to print overlaps

Najib asked if I would compare the set of overlapping genes observed with the various significance metrics provided. I think I should write a little function to do this because there are ample opportunities for typos.

deseq_df <- t_cf_monocyte_table_sva[["data"]][["outcome"]]
deseq_gene_idx <- abs(deseq_df[["deseq_logfc"]]) >= 1.0 &
  deseq_df[["deseq_adjp"]] <= 0.05
deseq_symb <- annot[deseq_gene_idx, "hgnc_symbol"]
deseq_symb
##   [1] "SYN1"            "TENM1"           "LTF"             "PHLDB1"         
##   [5] "SLAMF7"          "SCML1"           "BCAR1"           "TRIP13"         
##   [9] "SLC12A1"         "ADAMTS2"         "GP6"             "SIGLEC1"        
##  [13] "SIRPG"           "CHKB"            "IL2RB"           "CTSG"           
##  [17] "PLEK2"           "NTSR1"           "MSLN"            "FZD3"           
##  [21] "TULP2"           "HAS1"            "GSDME"           "PRUNE2"         
##  [25] "PALD1"           "UNC5B"           "CCL8"            "FOLR1"          
##  [29] "RAD51AP1"        "PRLR"            "OTOF"            "IL1R2"          
##  [33] "IL1R1"           "CD274"           "PRB2"            "MAPK8IP1"       
##  [37] "CXCR4"           "IL1B"            "LAMP5"           "MTUS1"          
##  [41] "IDO1"            "TMTC1"           "RSAD2"           "HRK"            
##  [45] "IL6"             "THBS1"           "IFI44L"          "ADAMTS10"       
##  [49] "GPR174"          "LCN2"            "TENM4"           "CD8A"           
##  [53] "PGM5"            "TBC1D24"         "TRIM58"          "HESX1"          
##  [57] "CAMP"            "SAP30"           "CFAP47"          "AQP3"           
##  [61] "HECTD2"          "IFI27"           "C15orf48"        "LAIR2"          
##  [65] "ANGPTL4"         "RAB3IL1"         "DDIT4"           "KIF5C"          
##  [69] "COL3A1"          "RNF150"          "HTRA3"           "S1PR1"          
##  [73] "LGALS4"          "OLR1"            "JUP"             "HOXB2"          
##  [77] "SH3PXD2B"        "FBXW8"           "FBXO39"          "HLA-DQB1"       
##  [81] "OR6C2"           "CSMD1"           "EFHC2"           "OLFML1"         
##  [85] "USP18"           "RGPD2"           "PRR5"            "RHCE"           
##  [89] "AKR1C3"          "AFAP1"           "MMP1"            "HLA-DQA1"       
##  [93] "SCAMP5"          "SUCNR1"          "OOEP"            "GLYATL3"        
##  [97] "HLA-DMA"         "C9orf129"        "NT5M"            "POU5F1B"        
## [101] "SMTNL1"          "HLA-DQB2"        "DEFA3"           "UPK3B"          
## [105] "PRR5-ARHGAP8"    "TRNP1"           "MGAM"            "RNASE4"         
## [109] "MRC1"            "LINC02210-CRHR1" "FCGBP"           "CCL3"
deseq_genes <- rownames(annot)[deseq_gene_idx]

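## Compare the genes passing a cutoff in a dream table against the DESeq2+sva
## gene set, plot the Venn diagram, and print the shared gene symbols.
##   mixed: a dream result table with a logFC column and gene IDs as rownames.
##   direction: "lt" keeps rows at or below mixed_cutoff (p-values); any other
##     value keeps rows at or above it (z-scores).
overlap_sig <- function(mixed, deseq = deseq_genes, mixed_pcol = "P.Value",
                        annot = fData(t_monocytes), mixed_cutoff = 0.05, direction = "lt",
                        expected = expected_genes) {
  if (direction == "lt") {
    mixed_sig_idx <- abs(mixed[["logFC"]]) >= 1.0 &
      mixed[[mixed_pcol]] <= mixed_cutoff
  } else {
    mixed_sig_idx <- abs(mixed[["logFC"]]) >= 1.0 &
      mixed[[mixed_pcol]] >= mixed_cutoff
  }
  mixed_genes <- rownames(mixed)[mixed_sig_idx]
  venn_lst <- list(
    "mixed_model" = mixed_genes,
    "DESeq_sva" = deseq)
  mixed_deseq_comp <- Vennerable::Venn(venn_lst)
  Vennerable::plot(mixed_deseq_comp)
  ## The "11" region holds the IDs present in both sets.
  mixed_ensg <- mixed_deseq_comp@IntersectionSets[["11"]]
  overlap_genes <- annot[mixed_ensg, "hgnc_symbol"]
  message("The set of all overlapping genes:")
  print(overlap_genes)
  ## Report which of the expected (favorite) genes are in the overlap.
  found_idx <- expected %in% overlap_genes
  message("Overlapping genes in the 10 favorites:")
  print(expected[found_idx])
}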
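
The overlap itself is ordinary set arithmetic. The following is a minimal base-R sketch of the three Venn regions, assuming the region labels follow the order of the input sets ("10" is first set only, "01" is second set only, "11" is both). The gene identifiers are invented for illustration and do not come from the data.

```r
# Toy gene ID vectors; these identifiers are hypothetical.
mixed_genes <- c("gene_a", "gene_b", "gene_c", "gene_d")
deseq_genes <- c("gene_c", "gene_d", "gene_e")

# The three non-empty regions of a two-set Venn diagram.
regions <- list(
  "10" = setdiff(mixed_genes, deseq_genes),   # only in the first set
  "01" = setdiff(deseq_genes, mixed_genes),   # only in the second set
  "11" = intersect(mixed_genes, deseq_genes)) # in both sets
region_sizes <- lengths(regions)
print(region_sizes)
```

The "11" element plays the same role as the IntersectionSets[["11"]] slot used in overlap_sig().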

11.3.2 Monocytes

In this block I am looking at the similarities between the mixed model with donor and the same model without donor. The latter is no longer a mixed model; it still uses the dream functions, which I am pretty sure just fall back to limma when there is not a random effect.

monocyte_visit_with_donor <- mixed_monocyte_de[["all_tables"]][["failure_vs_cure"]]
monocyte_visit_without_donor <- mixed_monocyte_fv_de[["all_tables"]][["failure_vs_cure"]]
donor_aucc <- calculate_aucc(monocyte_visit_with_donor, monocyte_visit_without_donor,
                             px = "adj.P.Val", py = "adj.P.Val",
                             lx = "logFC", ly = "logFC")
donor_aucc
## These two tables have an aucc value of: 0.66601108014503 and correlation:
## 
##  Pearson's product-moment correlation
## 
## data:  tbl[[lx]] and tbl[[ly]]
## t = 384, df = 10860, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9637 0.9663
## sample estimates:
##   cor 
## 0.965
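
The aucc value summarizes how well the top-ranked genes of two tables agree. As a rough sketch of one common definition (the area under the concordance curve over the top-k genes, divided by its maximum possible value), and not necessarily what calculate_aucc() does internally, the idea can be written in base R. The function name aucc_sketch, its top_n argument, and the toy p-values are all hypothetical.

```r
# Sketch of an area-under-the-concordance-curve score for two named
# p-value vectors. For each k, count the genes shared by both top-k lists,
# then divide the summed counts by the maximum possible sum.
aucc_sketch <- function(p_x, p_y, top_n = 100) {
  stopifnot(!is.null(names(p_x)), !is.null(names(p_y)))
  top_n <- min(top_n, length(p_x), length(p_y))
  rank_x <- names(sort(p_x))
  rank_y <- names(sort(p_y))
  shared <- vapply(seq_len(top_n), function(k) {
    length(intersect(rank_x[seq_len(k)], rank_y[seq_len(k)]))
  }, integer(1))
  sum(shared) / (top_n * (top_n + 1) / 2)
}

# Toy p-values: p_b ranks the genes in the opposite order to p_a.
p_a <- c(g1 = 0.001, g2 = 0.01, g3 = 0.02, g4 = 0.5)
p_b <- c(g1 = 0.5, g2 = 0.02, g3 = 0.01, g4 = 0.001)
same_score <- aucc_sketch(p_a, p_a, top_n = 4)
reversed_score <- aucc_sketch(p_a, p_b, top_n = 4)
print(c(same = same_score, reversed = reversed_score))
```

Identical rankings score 1; the score drops as the top of the two lists diverges.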

with_donor_genes <- abs(monocyte_visit_with_donor[["logFC"]]) >= 1.0 &
  monocyte_visit_with_donor[["P.Value"]] <= 0.05
without_donor_genes <- abs(monocyte_visit_without_donor[["logFC"]]) >= 1.0 &
  monocyte_visit_without_donor[["P.Value"]] <= 0.05
donor_genes <- rownames(monocyte_visit_with_donor)[with_donor_genes]
donor_z_idx <- abs(monocyte_visit_with_donor[["logFC"]]) >= 1.0 &
  monocyte_visit_with_donor[["z.std"]] >= 1.0
donor_z_genes <- rownames(monocyte_visit_with_donor)[donor_z_idx]
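
Logical filters built from two different result tables only line up if both tables list the genes in the same order. A minimal sketch of aligning by row name before filtering, using toy tables whose values and gene IDs are invented for illustration:

```r
# Two toy result tables with the same genes in a different row order.
with_donor <- data.frame(
  logFC = c(1.5, -2.0, 0.2),
  P.Value = c(0.01, 0.20, 0.03),
  row.names = c("gene_a", "gene_b", "gene_c"))
without_donor <- data.frame(
  logFC = c(0.3, 1.2, -1.8),
  P.Value = c(0.04, 0.02, 0.01),
  row.names = c("gene_c", "gene_a", "gene_b"))

# Put both tables into one shared row order before comparing them.
shared_ids <- intersect(rownames(with_donor), rownames(without_donor))
with_donor <- with_donor[shared_ids, ]
without_donor <- without_donor[shared_ids, ]
stopifnot(identical(rownames(with_donor), rownames(without_donor)))

# Each filter takes both of its columns from a single table.
sig_with <- abs(with_donor[["logFC"]]) >= 1.0 &
  with_donor[["P.Value"]] <= 0.05
sig_without <- abs(without_donor[["logFC"]]) >= 1.0 &
  without_donor[["P.Value"]] <= 0.05
print(rownames(with_donor)[sig_with])
print(rownames(without_donor)[sig_without])
```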

overlap_sig(monocyte_visit_with_donor)
## The set of all overlapping genes:
##  [1] "DEFA3"           "CTSG"            "HRK"             "TENM1"          
##  [5] "PLEK2"           "TMTC1"           "HLA-DQB2"        "IDO1"           
##  [9] "FBXO39"          "PRR5-ARHGAP8"    "TRIM58"          "CCL3"           
## [13] "IL6"             "LAMP5"           "CD274"           "NTSR1"          
## [17] "ANGPTL4"         "DDIT4"           "EFHC2"           "SH3PXD2B"       
## [21] "HECTD2"          "MAPK8IP1"        "SMTNL1"          "PGM5"           
## [25] "OLFML1"          "TBC1D24"         "ADAMTS10"        "LINC02210-CRHR1"
## [29] "FZD3"            "CHKB"            "CXCR4"           "GPR174"         
## [33] "PRLR"            "OOEP"            "NT5M"            "IL1R1"          
## [37] "CAMP"            "FBXW8"
## Overlapping genes in the 10 favorites:

## [1] "PRR5-ARHGAP8" "FBXO39"       "SMTNL1"
overlap_sig(monocyte_visit_with_donor,
            mixed_pcol = "z.std", direction = "gt", mixed_cutoff = 1.5)
## The set of all overlapping genes:
##  [1] "POU5F1B"         "HRK"             "COL3A1"          "RGPD2"          
##  [5] "RNF150"          "HLA-DQB2"        "IDO1"            "FBXO39"         
##  [9] "PRR5-ARHGAP8"    "CCL3"            "SUCNR1"          "PRR5"           
## [13] "IL6"             "CD274"           "UPK3B"           "EFHC2"          
## [17] "CSMD1"           "HECTD2"          "MAPK8IP1"        "SMTNL1"         
## [21] "PGM5"            "OLFML1"          "TBC1D24"         "LINC02210-CRHR1"
## [25] "FZD3"            "CHKB"            "IFI44L"          "GPR174"         
## [29] "PRLR"            "OOEP"            "HLA-DMA"         "PRB2"           
## [33] "FBXW8"
## Overlapping genes in the 10 favorites:

## [1] "IFI44L"       "PRR5"         "PRR5-ARHGAP8" "FBXO39"       "SMTNL1"

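I would have sworn that the z-score set was much larger than the p-value set and included all of the 10 genes. Apparently I was very wrong: with the 1.5 cutoff used above it has 33 overlapping genes against 38 for the p-value set, and it recovers only 5 of the favorites.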
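
One thing that may shrink the z-score set is that the "gt" direction is one-sided. The sketch below assumes that z.std is a signed z-score (negative for decreased genes), which I believe is the case for dream output but have not confirmed here; the toy values and gene IDs are invented.

```r
# Toy table: gene_b has a strong negative z-score.
toy <- data.frame(
  logFC = c(2.1, -1.7, 1.2, -0.4),
  z.std = c(2.5, -2.2, 0.8, -3.0),
  row.names = c("gene_a", "gene_b", "gene_c", "gene_d"))

# One-sided filter, as used when direction = "gt".
one_sided <- abs(toy[["logFC"]]) >= 1.0 & toy[["z.std"]] >= 1.5
# Two-sided variant: take the absolute value of the z-score first.
two_sided <- abs(toy[["logFC"]]) >= 1.0 & abs(toy[["z.std"]]) >= 1.5

one_sided_genes <- rownames(toy)[one_sided]
two_sided_genes <- rownames(toy)[two_sided]
print(one_sided_genes)
print(two_sided_genes)
```

If z.std is signed, the one-sided form drops every gene with a strongly negative score, regardless of its fold change.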

11.3.3 Neutrophils

Now examine the various models for the neutrophil samples.

neutrophil_visit_with_donor <- mixed_neutrophil_de[["all_tables"]][["failure_vs_cure"]]
neutrophil_visit_without_donor <- mixed_neutrophil_fv_de[["all_tables"]][["failure_vs_cure"]]
donor_aucc <- calculate_aucc(neutrophil_visit_with_donor, neutrophil_visit_without_donor,
                             px = "adj.P.Val", py = "adj.P.Val",
                             lx = "logFC", ly = "logFC")
donor_aucc
## These two tables have an aucc value of: 0.544934636423573 and correlation:
## 
##  Pearson's product-moment correlation
## 
## data:  tbl[[lx]] and tbl[[ly]]
## t = 742, df = 19950, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9819 0.9828
## sample estimates:
##    cor 
## 0.9824

with_donor_genes <- abs(neutrophil_visit_with_donor[["logFC"]]) >= 1.0 &
  neutrophil_visit_with_donor[["P.Value"]] <= 0.05
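## Only the logFC column comes from the without-donor table; the p-value filter
## reuses the with-donor table, so the two sets below can differ only by logFC.
without_donor_genes <- abs(neutrophil_visit_without_donor[["logFC"]]) >= 1.0 &
  neutrophil_visit_with_donor[["P.Value"]] <= 0.05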
donor_genes <- rownames(neutrophil_visit_with_donor)[with_donor_genes]
visit_genes <- rownames(neutrophil_visit_with_donor)[without_donor_genes]
venn_lst <- list(
  "with_donor" = donor_genes,
  "with_visit" = visit_genes)
Vennerable::Venn(venn_lst)
## A Venn object on 2 sets named
## with_donor,with_visit 
##  00  10  01  11 
##   0   0  15 214
overlap_sig(neutrophil_visit_with_donor)
## The set of all overlapping genes:
##  [1] "PRR5-ARHGAP8" "TENM1"        "AFAP1"        "PRR5"         "HRK"         
##  [6] "IFI44L"       "CHKB"         "FBXW8"        "IDO1"         "SMTNL1"      
## [11] "POU5F1B"      "OLR1"         "AKR1C3"
## Overlapping genes in the 10 favorites:

## [1] "IFI44L"       "PRR5"         "PRR5-ARHGAP8" "SMTNL1"       "AFAP1"
overlap_sig(neutrophil_visit_with_donor,
            mixed_pcol = "z.std", direction = "gt", mixed_cutoff = 1.5)
## The set of all overlapping genes:
##  [1] "PRR5-ARHGAP8" "PRR5"         "OTOF"         "SIGLEC1"      "HRK"         
##  [6] "IFI44L"       "USP18"        "JUP"          "RSAD2"        "CHKB"        
## [11] "FBXW8"        "FBXO39"       "TBC1D24"      "IDO1"         "SMTNL1"      
## [16] "POU5F1B"
## Overlapping genes in the 10 favorites:

## [1] "IFI44L"       "PRR5"         "PRR5-ARHGAP8" "FBXO39"       "RSAD2"       
## [6] "SMTNL1"       "USP18"

11.3.4 Eosinophils

Finally, compare for the eosinophil samples.

eosinophil_visit_with_donor <- mixed_eosinophil_de[["all_tables"]][["failure_vs_cure"]]
eosinophil_visit_without_donor <- mixed_eosinophil_fv_de[["all_tables"]][["failure_vs_cure"]]
donor_aucc <- calculate_aucc(eosinophil_visit_with_donor, eosinophil_visit_without_donor,
                             px = "adj.P.Val", py = "adj.P.Val",
                             lx = "logFC", ly = "logFC")
donor_aucc
## These two tables have an aucc value of: 0.90324615179282 and correlation:
## 
##  Pearson's product-moment correlation
## 
## data:  tbl[[lx]] and tbl[[ly]]
## t = 746, df = 19950, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.982 0.983
## sample estimates:
##    cor 
## 0.9825

with_donor_genes <- abs(eosinophil_visit_with_donor[["logFC"]]) >= 1.0 &
  eosinophil_visit_with_donor[["P.Value"]] <= 0.05
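## Only the logFC column comes from the without-donor table; the p-value filter
## reuses the with-donor table, so the two sets below can differ only by logFC.
without_donor_genes <- abs(eosinophil_visit_without_donor[["logFC"]]) >= 1.0 &
  eosinophil_visit_with_donor[["P.Value"]] <= 0.05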
donor_genes <- rownames(eosinophil_visit_with_donor)[with_donor_genes]
visit_genes <- rownames(eosinophil_visit_with_donor)[without_donor_genes]
venn_lst <- list(
  "with_donor" = donor_genes,
  "with_visit" = visit_genes)
Vennerable::Venn(venn_lst)
## A Venn object on 2 sets named
## with_donor,with_visit 
##   00   10   01   11 
##    0    0   26 3709
overlap_sig(eosinophil_visit_with_donor)
## The set of all overlapping genes:
##  [1] "HLA-DQB1" "IFI27"    "SIGLEC1"  "CFAP47"   "OOEP"     "SMTNL1"  
##  [7] "MGAM"     "RNASE4"   "CD274"    "MSLN"
## Overlapping genes in the 10 favorites:

## [1] "IFI27"  "SMTNL1"
overlap_sig(eosinophil_visit_with_donor,
            mixed_pcol = "z.std", direction = "gt", mixed_cutoff = 1.5)
## The set of all overlapping genes:
##  [1] "IFI27"        "PRR5-ARHGAP8" "IFI44L"       "SIGLEC1"      "OTOF"        
##  [6] "CFAP47"       "FBXO39"       "OOEP"         "SMTNL1"       "USP18"       
## [11] "RSAD2"        "CD274"        "HESX1"
## Overlapping genes in the 10 favorites:

## [1] "IFI44L"       "IFI27"        "PRR5-ARHGAP8" "FBXO39"       "RSAD2"       
## [6] "SMTNL1"       "USP18"

Compare back to deseq with SVA and with SVA+visit and see how they look with respect to the dream invocation without the random donor effect.

deseq_aucc <- calculate_aucc(merged, monocyte_visit_without_donor,
                             px = "deseq_adjp", py = "P.Value",
                             lx = "deseq_logfc", ly = "logFC")
deseq_aucc
## These two tables have an aucc value of: 0.163279700348566 and correlation:
## 
##  Pearson's product-moment correlation
## 
## data:  tbl[[lx]] and tbl[[ly]]
## t = 41, df = 8577, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3869 0.4223
## sample estimates:
##    cor 
## 0.4048

deseq_genes_idx <- abs(merged[["deseq_logfc"]]) >= 1.0 &
  merged[["deseq_adjp"]] <= 0.05
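## The logFC filter uses the without-donor table, while the p-values and the
## rownames are taken from the with-donor table.
without_donor_genes_idx <- abs(monocyte_visit_without_donor[["logFC"]]) >= 1.0 &
  monocyte_visit_with_donor[["P.Value"]] <= 0.05
deseq_genes <- rownames(merged)[deseq_genes_idx]
visit_genes <- rownames(monocyte_visit_with_donor)[without_donor_genes_idx]
## The list names are carried over from the earlier blocks: "with_donor" holds
## the DESeq2 genes and "with_visit" holds the dream genes.
venn_lst <- list(
  "with_donor" = deseq_genes,
  "with_visit" = visit_genes)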
Vennerable::Venn(venn_lst)
## A Venn object on 2 sets named
## with_donor,with_visit 
##  00  10  01  11 
##   0 152  44   8

This time we are comparing back to the monocyte results which did not include the random donor effect.

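## In this call px names a fold-change column and lx a p-value column, the
## reverse of the other calls, so the correlation printed below is between
## adj.P.Val and logFC.
deseq_aucc <- calculate_aucc(merged, monocyte_visit_without_donor,
                             px = "log2FoldChange", py = "padj",
                             lx = "adj.P.Val", ly = "logFC")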
deseq_aucc
## These two tables have an aucc value of: 0.359950025998036 and correlation:
## 
##  Pearson's product-moment correlation
## 
## data:  tbl[[lx]] and tbl[[ly]]
## t = -32, df = 8577, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3471 -0.3093
## sample estimates:
##     cor 
## -0.3283

deseq_genes_idx <- abs(merged[["log2FoldChange"]]) >= 1.0 &
  merged[["padj"]] <= 0.05
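## The logFC filter uses the without-donor table, while the p-values and the
## rownames are taken from the with-donor table.
without_donor_genes_idx <- abs(monocyte_visit_without_donor[["logFC"]]) >= 1.0 &
  monocyte_visit_with_donor[["P.Value"]] <= 0.05
deseq_genes <- rownames(merged)[deseq_genes_idx]
visit_genes <- rownames(monocyte_visit_with_donor)[without_donor_genes_idx]
## The list names are carried over from the earlier blocks: "with_donor" holds
## the DESeq2 genes and "with_visit" holds the dream genes.
venn_lst <- list(
  "with_donor" = deseq_genes,
  "with_visit" = visit_genes)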
Vennerable::Venn(venn_lst)
## A Venn object on 2 sets named
## with_donor,with_visit 
##  00  10  01  11 
##   0 106  45   7

This is the complementary approach: include a random effect for donor and ignore the visit effect.

mixed_fstring_fd <- "~ 0 + finaloutcome + (1|donor)"
mixed_form_fd <- as.formula(mixed_fstring_fd)
get_formula_factors(mixed_form_fd)
## $type
## [1] "cellmeans"
## 
## $interaction
## [1] FALSE
## 
## $mixed
## [1] TRUE
## 
## $mixers
## [1] "1"     "donor"
## 
## $cellmeans_intercept
## [1] "0"
## 
## $factors
## [1] "finaloutcome" "donor"       
## 
## $contrast
## [1] "finaloutcome"
mixed_eosinophil_fd_de <- dream_pairwise(t_eosinophils, alt_model = mixed_form_fd)
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_monocyte_fd_de <- dream_pairwise(t_monocytes, alt_model = mixed_form_fd)
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
mixed_neutrophil_fd_de <- dream_pairwise(t_neutrophils, alt_model = mixed_form_fd)
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
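
The same formula can be inspected with base R alone, which is a quick way to confirm that a model string really contains a random-effect term before handing it to dream. This is only a sketch of the idea; it is not how get_formula_factors() is implemented, and the variable names has_random and has_intercept are mine.

```r
# Build the formula from the same string used above.
mixed_form <- as.formula("~ 0 + finaloutcome + (1|donor)")
form_terms <- terms(mixed_form)
term_labels <- attr(form_terms, "term.labels")

# A term containing "|" is a random-effect (grouping) term.
has_random <- any(grepl("|", term_labels, fixed = TRUE))
# The leading "0 +" removes the intercept, giving a cell-means model.
has_intercept <- attr(form_terms, "intercept") == 1
form_vars <- all.vars(mixed_form)

print(term_labels)
print(c(random = has_random, intercept = has_intercept))
print(form_vars)
```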

11.3.5 Compare monocytes

Now see how these results compare against our previous results…

monocyte_dream_result <- mixed_monocyte_de[["all_tables"]][["failure_vs_cure"]]

big_table <- t_cf_monocyte_table_sva[["data"]][["outcome"]]
merged <- merge(big_table, monocyte_dream_result, by = "row.names")
rownames(merged) <- merged[["Row.names"]]
merged[["Row.names"]] <- NULL
cor_value <- cor.test(merged[["logFC"]], merged[["deseq_logfc"]])
cor_value
## 
##  Pearson's product-moment correlation
## 
## data:  merged[["logFC"]] and merged[["deseq_logfc"]]
## t = 184, df = 10860, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8655 0.8746
## sample estimates:
##    cor 
## 0.8701
t_cf_monocyte_de_sva[["dream"]] <- mixed_monocyte_de
test <- combine_de_tables(
  t_cf_monocyte_de_sva, scale_p = TRUE,
  excel = "excel/test_monocyte_combined.xlsx")
test_aucc <- calculate_aucc(big_table, tbl2 = monocyte_dream_result,
                            px = "deseq_adjp", py = "adj.P.Val",
                            lx = "deseq_logfc", ly = "logFC")

logfc_plotter <- plot_linear_scatter(merged[, c("logFC", "deseq_logfc")])
logfc_plot <- logfc_plotter[["scatter"]] +
  xlab("Dream log2FC with (1|donor) and visit in model") +
  ylab("DESeq2 log2FC: Default pairwise comparison")
pp(file = "figures/compare_cf_and_visit_in_model_monocyte_logfc.svg")
logfc_plot
dev.off()
## png 
##   2
logfc_plot

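## abs() wraps the whole comparison here, so this keeps only the genes with a
## positive fold change of at least 1.0.
previous_sig_idx <- merged[["deseq_adjp"]] <= 0.05 & abs(merged[["deseq_logfc"]] >= 1.0)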
summary(previous_sig_idx)
##    Mode   FALSE    TRUE 
## logical   10802      60
previous_genes <- rownames(merged)[previous_sig_idx]
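
Where the closing parenthesis of abs() sits changes which genes pass a fold-change filter. A small base-R illustration with invented values:

```r
# Toy fold changes and adjusted p-values.
logfc <- c(2.3, -1.8, 0.4, -0.2, 1.1)
adjp <- c(0.01, 0.02, 0.03, 0.5, 0.2)

# abs() applied to the comparison: the test is done on the signed values,
# so only positive fold changes can pass.
inside <- abs(logfc >= 1.0)
# abs() applied to the values: both directions can pass.
outside <- abs(logfc) >= 1.0

n_inside <- sum(adjp <= 0.05 & inside)
n_outside <- sum(adjp <= 0.05 & outside)
print(c(inside = n_inside, outside = n_outside))
```

With abs() around the comparison, the gene at -1.8 is lost even though its adjusted p-value passes.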

new_sig_idx <- abs(merged[["logFC"]]) >= 1.0 & merged[["P.Value"]] < 0.05
summary(new_sig_idx)
##    Mode   FALSE    TRUE 
## logical   10812      50
new_genes <- rownames(merged)[new_sig_idx]
na_idx <- is.na(new_genes)
new_genes <- new_genes[!na_idx]

annot <- fData(t_monocytes)
compare <- Vennerable::Venn(list("previous" = previous_genes, "new" = new_genes))
shared_genes <- compare@IntersectionSets[["11"]]
name_idx <- rownames(annot) %in% shared_genes
annot[name_idx, ]
##                 ensembl_gene_id ensembl_transcript_id version
## ENSG00000100288 ENSG00000100288       ENST00000479003      19
## ENSG00000104290 ENSG00000104290       ENST00000537916      11
## ENSG00000113494 ENSG00000113494       ENST00000231423      17
## ENSG00000120217 ENSG00000120217       ENST00000492923      14
## ENSG00000121653 ENSG00000121653       ENST00000395629      11
## ENSG00000131203 ENSG00000131203       ENST00000519154      13
## ENSG00000135116 ENSG00000135116       ENST00000586941       9
## ENSG00000136244 ENSG00000136244       ENST00000401630      12
## ENSG00000147138 ENSG00000147138       ENST00000645147       2
## ENSG00000154330 ENSG00000154330       ENST00000396392      13
## ENSG00000162065 ENSG00000162065       ENST00000627285      14
## ENSG00000165338 ENSG00000165338       ENST00000498446      16
## ENSG00000174989 ENSG00000174989       ENST00000455858      13
## ENSG00000177294 ENSG00000177294       ENST00000572251       7
## ENSG00000183690 ENSG00000183690       ENST00000343571      13
## ENSG00000183801 ENSG00000183801       ENST00000329293       8
## ENSG00000203907 ENSG00000203907       ENST00000441145       9
## ENSG00000214872 ENSG00000214872       ENST00000399154       8
## ENSG00000232629 ENSG00000232629       ENST00000427449       9
## ENSG00000248405 ENSG00000248405       ENST00000361473      10
## ENSG00000263715 ENSG00000263715       ENST00000587305       7
## ENSG00000277632 ENSG00000277632       ENST00000613922       2
##                 transcript_version
## ENSG00000100288                  5
## ENSG00000104290                  2
## ENSG00000113494                  7
## ENSG00000120217                  1
## ENSG00000121653                  2
## ENSG00000131203                  5
## ENSG00000135116                  1
## ENSG00000136244                  7
## ENSG00000147138                  1
## ENSG00000154330                  5
## ENSG00000162065                  1
## ENSG00000165338                  1
## ENSG00000174989                  2
## ENSG00000177294                  1
## ENSG00000183690                  3
## ENSG00000183801                  4
## ENSG00000203907                  1
## ENSG00000214872                  3
## ENSG00000232629                  1
## ENSG00000248405                  9
## ENSG00000263715                  1
## ENSG00000277632                  2
##                                                                                                 description
## ENSG00000100288                                      choline kinase beta [Source:HGNC Symbol;Acc:HGNC:1938]
## ENSG00000104290                                frizzled class receptor 3 [Source:HGNC Symbol;Acc:HGNC:4041]
## ENSG00000113494                                       prolactin receptor [Source:HGNC Symbol;Acc:HGNC:9446]
## ENSG00000120217                                          CD274 molecule [Source:HGNC Symbol;Acc:HGNC:17635]
## ENSG00000121653 mitogen-activated protein kinase 8 interacting protein 1 [Source:HGNC Symbol;Acc:HGNC:6882]
## ENSG00000131203                            indoleamine 2,3-dioxygenase 1 [Source:HGNC Symbol;Acc:HGNC:6059]
## ENSG00000135116                       harakiri, BCL2 interacting protein [Source:HGNC Symbol;Acc:HGNC:5185]
## ENSG00000136244                                            interleukin 6 [Source:HGNC Symbol;Acc:HGNC:6018]
## ENSG00000147138                          G protein-coupled receptor 174 [Source:HGNC Symbol;Acc:HGNC:30245]
## ENSG00000154330                                     phosphoglucomutase 5 [Source:HGNC Symbol;Acc:HGNC:8908]
## ENSG00000162065                            TBC1 domain family member 24 [Source:HGNC Symbol;Acc:HGNC:29203]
## ENSG00000165338               HECT domain E3 ubiquitin protein ligase 2 [Source:HGNC Symbol;Acc:HGNC:26736]
## ENSG00000174989                 F-box and WD repeat domain containing 8 [Source:HGNC Symbol;Acc:HGNC:13597]
## ENSG00000177294                                        F-box protein 39 [Source:HGNC Symbol;Acc:HGNC:28565]
## ENSG00000183690                             EF-hand domain containing 2 [Source:HGNC Symbol;Acc:HGNC:26233]
## ENSG00000183801                                     olfactomedin like 1 [Source:HGNC Symbol;Acc:HGNC:24473]
## ENSG00000203907                                oocyte expressed protein [Source:HGNC Symbol;Acc:HGNC:21382]
## ENSG00000214872                                       smoothelin like 1 [Source:HGNC Symbol;Acc:HGNC:32394]
## ENSG00000232629    major histocompatibility complex, class II, DQ beta 2 [Source:HGNC Symbol;Acc:HGNC:4945]
## ENSG00000248405                                PRR5-ARHGAP8 readthrough [Source:HGNC Symbol;Acc:HGNC:34512]
## ENSG00000263715                             LINC02210-CRHR1 readthrough [Source:HGNC Symbol;Acc:HGNC:51483]
## ENSG00000277632                            C-C motif chemokine ligand 3 [Source:HGNC Symbol;Acc:HGNC:10627]
##                   gene_biotype cds_length chromosome_name strand start_position
## ENSG00000100288 protein_coding  undefined              22      -       50578949
## ENSG00000104290 protein_coding       2001               8      +       28494205
## ENSG00000113494 protein_coding       1131               5      -       35048756
## ENSG00000120217 protein_coding  undefined               9      +        5450503
## ENSG00000121653 protein_coding       2106              11      +       45885651
## ENSG00000131203 protein_coding        540               8      +       39902275
## ENSG00000135116 protein_coding  undefined              12      -      116856144
## ENSG00000136244 protein_coding        570               7      +       22725884
## ENSG00000147138 protein_coding       1002               X      +       79144663
## ENSG00000154330 protein_coding       1164               9      +       68328308
## ENSG00000162065 protein_coding       1662              16      +        2475051
## ENSG00000165338 protein_coding  undefined              10      +       91409280
## ENSG00000174989 protein_coding       1599              12      +      116910950
## ENSG00000177294 protein_coding        151              17      +        6776215
## ENSG00000183690 protein_coding  undefined               X      -       44147872
## ENSG00000183801 protein_coding       1209              11      +        7485388
## ENSG00000203907 protein_coding        201               6      -       73368555
## ENSG00000214872 protein_coding       1374              11      +       57542641
## ENSG00000232629 protein_coding        656               6      -       32756098
## ENSG00000248405 protein_coding       1695              22      +       44702233
## ENSG00000263715 protein_coding  undefined              17      +       45620344
## ENSG00000277632 protein_coding        279              17      -       36088256
##                 end_position     hgnc_symbol uniprot_gn_symbol
## ENSG00000100288     50601455            CHKB              CHKB
## ENSG00000104290     28574267            FZD3              FZD3
## ENSG00000113494     35230487            PRLR              PRLR
## ENSG00000120217      5470566           CD274             CD274
## ENSG00000121653     45906465        MAPK8IP1          MAPK8IP1
## ENSG00000131203     39928790            IDO1              IDO1
## ENSG00000135116    116881441             HRK               HRK
## ENSG00000136244     22732002             IL6               IL6
## ENSG00000147138     79175315          GPR174            GPR174
## ENSG00000154330     68531061            PGM5              PGM5
## ENSG00000162065      2509560         TBC1D24           TBC1D24
## ENSG00000165338     91514829          HECTD2            HECTD2
## ENSG00000174989    117031148           FBXW8             FBXW8
## ENSG00000177294      6797101          FBXO39            FBXO39
## ENSG00000183690     44343672           EFHC2             EFHC2
## ENSG00000183801      7511377          OLFML1            OLFML1
## ENSG00000203907     73395133            OOEP              OOEP
## ENSG00000214872     57550274          SMTNL1            SMTNL1
## ENSG00000232629     32763532        HLA-DQB2          HLA-DQB2
## ENSG00000248405     44862706    PRR5-ARHGAP8      PRR5-ARHGAP8
## ENSG00000263715     45835826 LINC02210-CRHR1             CRHR1
## ENSG00000277632     36090169            CCL3              CCL3
##                        transcript     mean_cds_len
## ENSG00000100288 ENSG00000100288.5             1188
## ENSG00000104290 ENSG00000104290.2 1369.33333333333
## ENSG00000113494 ENSG00000113494.7 753.421052631579
## ENSG00000120217 ENSG00000120217.1              702
## ENSG00000121653 ENSG00000121653.2             2121
## ENSG00000131203 ENSG00000131203.5 642.166666666667
## ENSG00000135116 ENSG00000135116.1            166.5
## ENSG00000136244 ENSG00000136244.7 526.142857142857
## ENSG00000147138 ENSG00000147138.1             1002
## ENSG00000154330 ENSG00000154330.5 1161.33333333333
## ENSG00000162065 ENSG00000162065.1             1332
## ENSG00000165338 ENSG00000165338.1           1329.8
## ENSG00000174989 ENSG00000174989.2             1627
## ENSG00000177294 ENSG00000177294.1 530.333333333333
## ENSG00000183690 ENSG00000183690.3             2250
## ENSG00000183801 ENSG00000183801.4            698.5
## ENSG00000203907 ENSG00000203907.1              312
## ENSG00000214872 ENSG00000214872.3           1429.5
## ENSG00000232629 ENSG00000232629.1           740.75
## ENSG00000248405 ENSG00000248405.9 1713.66666666667
## ENSG00000263715 ENSG00000263715.1              723
## ENSG00000277632 ENSG00000277632.2              279
Vennerable::plot(compare)

11.3.6 Compare neutrophils

neutrophil_dream_result <- mixed_neutrophil_de[["all_tables"]][["failure_vs_cure"]]

big_table <- t_cf_neutrophil_table_sva[["data"]][["outcome"]]
merged <- merge(big_table, neutrophil_dream_result, by = "row.names")
rownames(merged) <- merged[["Row.names"]]
merged[["Row.names"]] <- NULL
cor_value <- cor.test(merged[["logFC"]], merged[["deseq_logfc"]])
cor_value
## 
##  Pearson's product-moment correlation
## 
## data:  merged[["logFC"]] and merged[["deseq_logfc"]]
## t = 175, df = 9099, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8727 0.8821
## sample estimates:
##    cor 
## 0.8775
t_cf_neutrophil_de_sva[["dream"]] <- mixed_neutrophil_de
test <- combine_de_tables(
  t_cf_neutrophil_de_sva, scale_p = TRUE,
  excel = "excel/test_neutrophil_combined.xlsx")
test_aucc <- calculate_aucc(big_table, tbl2 = neutrophil_dream_result,
                            px = "deseq_adjp", py = "adj.P.Val",
                            lx = "deseq_logfc", ly = "logFC")

logfc_plotter <- plot_linear_scatter(merged[, c("logFC", "deseq_logfc")])
logfc_plot <- logfc_plotter[["scatter"]] +
  xlab("Dream log2FC with (1|donor) and visit in model") +
  ylab("DESeq2 log2FC: Default pairwise comparison")
pp(file = "figures/compare_cf_and_visit_in_model_neutrophil_logfc.svg")
logfc_plot
dev.off()
## png 
##   2
logfc_plot

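## abs() wraps the whole comparison here, so this keeps only the genes with a
## positive fold change of at least 1.0.
previous_sig_idx <- merged[["deseq_adjp"]] <= 0.05 & abs(merged[["deseq_logfc"]] >= 1.0)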
summary(previous_sig_idx)
##    Mode   FALSE    TRUE 
## logical    8971     130
previous_genes <- rownames(merged)[previous_sig_idx]

new_sig_idx <- abs(merged[["logFC"]]) >= 1.0 & merged[["P.Value"]] < 0.05
summary(new_sig_idx)
##    Mode   FALSE    TRUE 
## logical    9025      76
new_genes <- rownames(merged)[new_sig_idx]
na_idx <- is.na(new_genes)
new_genes <- new_genes[!na_idx]

annot <- fData(t_neutrophils)
compare <- Vennerable::Venn(list("previous" = previous_genes, "new" = new_genes))
shared_genes <- compare@IntersectionSets[["11"]]
name_idx <- rownames(annot) %in% shared_genes
annot[name_idx, ]
##                 ensembl_gene_id ensembl_transcript_id version
## ENSG00000020129 ENSG00000020129       ENST00000356090      16
## ENSG00000023171 ENSG00000023171       ENST00000532581      18
## ENSG00000078098 ENSG00000078098       ENST00000480044      14
## ENSG00000101342 ENSG00000101342       ENST00000602922      10
## ENSG00000106804 ENSG00000106804       ENST00000480188       8
## ENSG00000108387 ENSG00000108387       ENST00000321691      15
## ENSG00000108771 ENSG00000108771       ENST00000590637      13
## ENSG00000120008 ENSG00000120008       ENST00000605178      16
## ENSG00000120675 ENSG00000120675       ENST00000379221       6
## ENSG00000122729 ENSG00000122729       ENST00000309951      19
## ENSG00000131203 ENSG00000131203       ENST00000519154      13
## ENSG00000133106 ENSG00000133106       ENST00000313640      14
## ENSG00000134326 ENSG00000134326       ENST00000458098      11
## ENSG00000134809 ENSG00000134809       ENST00000257245       9
## ENSG00000135116 ENSG00000135116       ENST00000586941       9
## ENSG00000136514 ENSG00000136514       ENST00000259030       3
## ENSG00000137628 ENSG00000137628       ENST00000513997      17
## ENSG00000137959 ENSG00000137959       ENST00000450498      16
## ENSG00000138646 ENSG00000138646       ENST00000502913       9
## ENSG00000141664 ENSG00000141664       ENST00000585873      10
## ENSG00000145244 ENSG00000145244       ENST00000610355      12
## ENSG00000146205 ENSG00000146205       ENST00000481071      13
## ENSG00000151692 ENSG00000151692       ENST00000432850      15
## ENSG00000152056 ENSG00000152056       ENST00000444408      17
## ENSG00000155158 ENSG00000155158       ENST00000512701      20
## ENSG00000155363 ENSG00000155363       ENST00000468624      18
## ENSG00000160469 ENSG00000160469       ENST00000585418      17
## ENSG00000160932 ENSG00000160932       ENST00000521182      11
## ENSG00000162772 ENSG00000162772       ENST00000492118      17
## ENSG00000163644 ENSG00000163644       ENST00000514204      15
## ENSG00000164125 ENSG00000164125       ENST00000592057      15
## ENSG00000164136 ENSG00000164136       ENST00000296545      17
## ENSG00000166257 ENSG00000166257       ENST00000392770       9
## ENSG00000167014 ENSG00000167014       ENST00000557864      11
## ENSG00000170448 ENSG00000170448       ENST00000464756      12
## ENSG00000171365 ENSG00000171365       ENST00000642885      17
## ENSG00000172159 ENSG00000172159       ENST00000376438      16
## ENSG00000172716 ENSG00000172716       ENST00000591682      16
## ENSG00000174989 ENSG00000174989       ENST00000455858      13
## ENSG00000179044 ENSG00000179044       ENST00000564324      16
## ENSG00000180061 ENSG00000180061       ENST00000585918      10
## ENSG00000186654 ENSG00000186654       ENST00000403696      21
## ENSG00000187608 ENSG00000187608       ENST00000624697      10
## ENSG00000188157 ENSG00000188157       ENST00000620552      15
## ENSG00000188290 ENSG00000188290       ENST00000304952      10
## ENSG00000196141 ENSG00000196141       ENST00000421573      14
## ENSG00000196369 ENSG00000196369       ENST00000494534      11
## ENSG00000196405 ENSG00000196405       ENST00000555048      13
## ENSG00000198087 ENSG00000198087       ENST00000479857       7
## ENSG00000214872 ENSG00000214872       ENST00000399154       8
## ENSG00000228696 ENSG00000228696       ENST00000575960       9
## ENSG00000248405 ENSG00000248405       ENST00000361473      10
## ENSG00000269720 ENSG00000269720       ENST00000597169       2
##                 transcript_version
## ENSG00000020129                  8
## ENSG00000023171                  5
## ENSG00000078098                  5
## ENSG00000101342                  5
## ENSG00000106804                  1
## ENSG00000108387                  3
## ENSG00000108771                  1
## ENSG00000120008                  5
## ENSG00000120675                  4
## ENSG00000122729                  8
## ENSG00000131203                  5
## ENSG00000133106                 11
## ENSG00000134326                  5
## ENSG00000134809                  9
## ENSG00000135116                  1
## ENSG00000136514                  3
## ENSG00000137628                  1
## ENSG00000137959                  1
## ENSG00000138646                  1
## ENSG00000141664                  5
## ENSG00000145244                  4
## ENSG00000146205                  1
## ENSG00000151692                  1
## ENSG00000152056                  1
## ENSG00000155158                  6
## ENSG00000155363                  5
## ENSG00000160469                  1
## ENSG00000160932                  5
## ENSG00000162772                  2
## ENSG00000163644                  1
## ENSG00000164125                  1
## ENSG00000164136                 11
## ENSG00000166257                  6
## ENSG00000167014                  1
## ENSG00000170448                  6
## ENSG00000171365                  1
## ENSG00000172159                  5
## ENSG00000172716                  5
## ENSG00000174989                  2
## ENSG00000179044                  5
## ENSG00000180061                  5
## ENSG00000186654                  5
## ENSG00000187608                  4
## ENSG00000188157                  4
## ENSG00000188290                 10
## ENSG00000196141                  5
## ENSG00000196369                  1
## ENSG00000196405                  5
## ENSG00000198087                  1
## ENSG00000214872                  3
## ENSG00000228696                  5
## ENSG00000248405                  9
## ENSG00000269720                  1
##                                                                                                      description
## ENSG00000020129                                                neurochondrin [Source:HGNC Symbol;Acc:HGNC:17597]
## ENSG00000023171                                    GRAM domain containing 1B [Source:HGNC Symbol;Acc:HGNC:29214]
## ENSG00000078098                           fibroblast activation protein alpha [Source:HGNC Symbol;Acc:HGNC:3590]
## ENSG00000101342                      TBC/LysM-associated domain containing 2 [Source:HGNC Symbol;Acc:HGNC:16112]
## ENSG00000106804                                                 complement C5 [Source:HGNC Symbol;Acc:HGNC:1331]
## ENSG00000108387                                                      septin 4 [Source:HGNC Symbol;Acc:HGNC:9165]
## ENSG00000108771                                         DExH-box helicase 58 [Source:HGNC Symbol;Acc:HGNC:29517]
## ENSG00000120008                                          WD repeat domain 11 [Source:HGNC Symbol;Acc:HGNC:13831]
## ENSG00000120675            DnaJ heat shock protein family (Hsp40) member C15 [Source:HGNC Symbol;Acc:HGNC:20325]
## ENSG00000122729                                                    aconitase 1 [Source:HGNC Symbol;Acc:HGNC:117]
## ENSG00000131203                                 indoleamine 2,3-dioxygenase 1 [Source:HGNC Symbol;Acc:HGNC:6059]
## ENSG00000133106                             epithelial stromal interaction 1 [Source:HGNC Symbol;Acc:HGNC:16465]
## ENSG00000134326                      cytidine/uridine monophosphate kinase 2 [Source:HGNC Symbol;Acc:HGNC:27015]
## ENSG00000134809               translocase of inner mitochondrial membrane 10 [Source:HGNC Symbol;Acc:HGNC:11814]
## ENSG00000135116                            harakiri, BCL2 interacting protein [Source:HGNC Symbol;Acc:HGNC:5185]
## ENSG00000136514                               receptor transporter protein 4 [Source:HGNC Symbol;Acc:HGNC:23992]
## ENSG00000137628                                       DExD/H-box helicase 60 [Source:HGNC Symbol;Acc:HGNC:25942]
## ENSG00000137959                           interferon induced protein 44 like [Source:HGNC Symbol;Acc:HGNC:17817]
## ENSG00000138646 HECT and RLD domain containing E3 ubiquitin protein ligase 5 [Source:HGNC Symbol;Acc:HGNC:24368]
## ENSG00000141664                           zinc finger CCHC-type containing 2 [Source:HGNC Symbol;Acc:HGNC:22916]
## ENSG00000145244                                      corin, serine peptidase [Source:HGNC Symbol;Acc:HGNC:19012]
## ENSG00000146205                                                  anoctamin 7 [Source:HGNC Symbol;Acc:HGNC:31677]
## ENSG00000151692                                     ring finger protein 144A [Source:HGNC Symbol;Acc:HGNC:20457]
## ENSG00000152056            adaptor related protein complex 1 subunit sigma 3 [Source:HGNC Symbol;Acc:HGNC:18971]
## ENSG00000155158                          tetratricopeptide repeat domain 39B [Source:HGNC Symbol;Acc:HGNC:23704]
## ENSG00000155363                               Mov10 RISC complex RNA helicase [Source:HGNC Symbol;Acc:HGNC:7200]
## ENSG00000160469                                 BR serine/threonine kinase 1 [Source:HGNC Symbol;Acc:HGNC:18994]
## ENSG00000160932                          lymphocyte antigen 6 family member E [Source:HGNC Symbol;Acc:HGNC:6727]
## ENSG00000162772                              activating transcription factor 3 [Source:HGNC Symbol;Acc:HGNC:785]
## ENSG00000163644                  protein phosphatase, Mg2+/Mn2+ dependent 1K [Source:HGNC Symbol;Acc:HGNC:25415]
## ENSG00000164125                                   golgi associated kinase 1B [Source:HGNC Symbol;Acc:HGNC:25312]
## ENSG00000164136                                                interleukin 15 [Source:HGNC Symbol;Acc:HGNC:5977]
## ENSG00000166257                  sodium voltage-gated channel beta subunit 3 [Source:HGNC Symbol;Acc:HGNC:20665]
## ENSG00000167014          telomere repeat binding bouquet formation protein 2 [Source:HGNC Symbol;Acc:HGNC:28520]
## ENSG00000170448           nuclear transcription factor, X-box binding like 1 [Source:HGNC Symbol;Acc:HGNC:18726]
## ENSG00000171365                              chloride voltage-gated channel 5 [Source:HGNC Symbol;Acc:HGNC:2023]
## ENSG00000172159                                     FERM domain containing 3 [Source:HGNC Symbol;Acc:HGNC:24125]
## ENSG00000172716                                    schlafen family member 11 [Source:HGNC Symbol;Acc:HGNC:26633]
## ENSG00000174989                      F-box and WD repeat domain containing 8 [Source:HGNC Symbol;Acc:HGNC:13597]
## ENSG00000179044                           exocyst complex component 3 like 1 [Source:HGNC Symbol;Acc:HGNC:27540]
## ENSG00000180061                                   transmembrane protein 150B [Source:HGNC Symbol;Acc:HGNC:34415]
## ENSG00000186654                                               proline rich 5 [Source:HGNC Symbol;Acc:HGNC:31682]
## ENSG00000187608                                 ISG15 ubiquitin like modifier [Source:HGNC Symbol;Acc:HGNC:4053]
## ENSG00000188157                                                          agrin [Source:HGNC Symbol;Acc:HGNC:329]
## ENSG00000188290                       hes family bHLH transcription factor 4 [Source:HGNC Symbol;Acc:HGNC:24149]
## ENSG00000196141                spermatogenesis associated serine rich 2 like [Source:HGNC Symbol;Acc:HGNC:24574]
## ENSG00000196369                   SLIT-ROBO Rho GTPase activating protein 2B [Source:HGNC Symbol;Acc:HGNC:35237]
## ENSG00000196405                                               Enah/Vasp-like [Source:HGNC Symbol;Acc:HGNC:20234]
## ENSG00000198087                                       CD2 associated protein [Source:HGNC Symbol;Acc:HGNC:14258]
## ENSG00000214872                                            smoothelin like 1 [Source:HGNC Symbol;Acc:HGNC:32394]
## ENSG00000228696                      ADP ribosylation factor like GTPase 17B [Source:HGNC Symbol;Acc:HGNC:32387]
## ENSG00000248405                                     PRR5-ARHGAP8 readthrough [Source:HGNC Symbol;Acc:HGNC:34512]
## ENSG00000269720                            coiled-coil domain containing 194 [Source:HGNC Symbol;Acc:HGNC:53438]
##                   gene_biotype cds_length chromosome_name strand start_position
## ENSG00000020129 protein_coding       2190               1      +       35557473
## ENSG00000023171 protein_coding  undefined              11      +      123358428
## ENSG00000078098 protein_coding  undefined               2      -      162170684
## ENSG00000101342 protein_coding        648              20      +       36876121
## ENSG00000106804 protein_coding  undefined               9      -      120952335
## ENSG00000108387 protein_coding       1713              17      -       58520250
## ENSG00000108771 protein_coding  undefined              17      -       42101404
## ENSG00000120008 protein_coding  undefined              10      +      120851305
## ENSG00000120675 protein_coding        453              13      +       43023203
## ENSG00000122729 protein_coding       2670               9      +       32384603
## ENSG00000131203 protein_coding        540               8      +       39902275
## ENSG00000133106 protein_coding       1233              13      -       42886388
## ENSG00000134326 protein_coding       1101               2      -        6840570
## ENSG00000134809 protein_coding        273              11      -       57528464
## ENSG00000135116 protein_coding  undefined              12      -      116856144
## ENSG00000136514 protein_coding        741               3      +      187368385
## ENSG00000137628 protein_coding        314               4      -      168216294
## ENSG00000137959 protein_coding        699               1      +       78619922
## ENSG00000138646 protein_coding  undefined               4      +       88457119
## ENSG00000141664 protein_coding       3295              18      +       62523025
## ENSG00000145244 protein_coding       2817               4      -       47593999
## ENSG00000146205 protein_coding  undefined               2      +      241188509
## ENSG00000151692 protein_coding        794               2      +        6917412
## ENSG00000152056 protein_coding        170               2      -      223751686
## ENSG00000155158 protein_coding       2049               9      -       15163622
## ENSG00000155363 protein_coding  undefined               1      +      112673141
## ENSG00000160469 protein_coding       1032              19      +       55282072
## ENSG00000160932 protein_coding        126               8      +      143017982
## ENSG00000162772 protein_coding  undefined               1      +      212565334
## ENSG00000163644 protein_coding        549               4      -       88257620
## ENSG00000164125 protein_coding        996               4      -      158124474
## ENSG00000164136 protein_coding        489               4      +      141636583
## ENSG00000166257 protein_coding        648              11      -      123629187
## ENSG00000167014 protein_coding        131              15      +       44956687
## ENSG00000170448 protein_coding       2202               4      -       47847233
## ENSG00000171365 protein_coding       2241               X      +       49922596
## ENSG00000172159 protein_coding       1671               9      -       83242990
## ENSG00000172716 protein_coding        369              17      -       35350305
## ENSG00000174989 protein_coding       1599              12      +      116910950
## ENSG00000179044 protein_coding        243              16      -       67184379
## ENSG00000180061 protein_coding        457              19      -       55312801
## ENSG00000186654 protein_coding        532              22      +       44668547
## ENSG00000187608 protein_coding        474               1      +        1001138
## ENSG00000188157 protein_coding       5793               1      +        1020120
## ENSG00000188290 protein_coding        666               1      -         998962
## ENSG00000196141 protein_coding        463               2      +      200305881
## ENSG00000196369 protein_coding  undefined               1      -      144887265
## ENSG00000196405 protein_coding        852              14      +       99971449
## ENSG00000198087 protein_coding  undefined               6      +       47477789
## ENSG00000214872 protein_coding       1374              11      +       57542641
## ENSG00000228696 protein_coding        546              17      -       46274784
## ENSG00000248405 protein_coding       1695              22      +       44702233
## ENSG00000269720 protein_coding  undefined              19      -       17390509
##                 end_position  hgnc_symbol uniprot_gn_symbol         transcript
## ENSG00000020129     35567274         NCDN              NCDN  ENSG00000020129.8
## ENSG00000023171    123627774      GRAMD1B           GRAMD1B  ENSG00000023171.5
## ENSG00000078098    162245151          FAP               FAP  ENSG00000078098.5
## ENSG00000101342     36894235        TLDC2             TLDC2  ENSG00000101342.5
## ENSG00000106804    121050275           C5                C5  ENSG00000106804.1
## ENSG00000108387     58544368      SEPTIN4          C17orf47  ENSG00000108387.3
## ENSG00000108771     42112714        DHX58             DHX58  ENSG00000108771.1
## ENSG00000120008    120909524        WDR11             WDR11  ENSG00000120008.5
## ENSG00000120675     43114213      DNAJC15           DNAJC15  ENSG00000120675.4
## ENSG00000122729     32454769         ACO1              ACO1  ENSG00000122729.8
## ENSG00000131203     39928790         IDO1              IDO1  ENSG00000131203.5
## ENSG00000133106     42992271       EPSTI1            EPSTI1 ENSG00000133106.11
## ENSG00000134326      6866635        CMPK2             CMPK2  ENSG00000134326.5
## ENSG00000134809     57530803       TIMM10            TIMM10  ENSG00000134809.9
## ENSG00000135116    116881441          HRK               HRK  ENSG00000135116.1
## ENSG00000136514    187372076         RTP4              RTP4  ENSG00000136514.3
## ENSG00000137628    168318804        DDX60             DDX60  ENSG00000137628.1
## ENSG00000137959     78646145       IFI44L            IFI44L  ENSG00000137959.1
## ENSG00000138646     88506163        HERC5             HERC5  ENSG00000138646.1
## ENSG00000141664     62587709       ZCCHC2            ZCCHC2  ENSG00000141664.5
## ENSG00000145244     47838106        CORIN             CORIN  ENSG00000145244.4
## ENSG00000146205    241225377         ANO7              ANO7  ENSG00000146205.1
## ENSG00000151692      7068286      RNF144A           RNF144A  ENSG00000151692.1
## ENSG00000152056    223838027        AP1S3             AP1S3  ENSG00000152056.1
## ENSG00000155158     15307360       TTC39B            TTC39B  ENSG00000155158.6
## ENSG00000155363    112700746        MOV10             MOV10  ENSG00000155363.5
## ENSG00000160469     55312562        BRSK1             BRSK1  ENSG00000160469.1
## ENSG00000160932    143023832         LY6E              LY6E  ENSG00000160932.5
## ENSG00000162772    212620777         ATF3              ATF3  ENSG00000162772.2
## ENSG00000163644     88284769        PPM1K             PPM1K  ENSG00000163644.1
## ENSG00000164125    158173318       GASK1B            GASK1B  ENSG00000164125.1
## ENSG00000164136    141733987         IL15              IL15 ENSG00000164136.11
## ENSG00000166257    123655244        SCN3B             SCN3B  ENSG00000166257.6
## ENSG00000167014     44979229        TERB2             TERB2  ENSG00000167014.1
## ENSG00000170448     47914667        NFXL1             NFXL1  ENSG00000170448.6
## ENSG00000171365     50099235        CLCN5             CLCN5  ENSG00000171365.1
## ENSG00000172159     83538546        FRMD3             FRMD3  ENSG00000172159.5
## ENSG00000172716     35373701       SLFN11            SLFN11  ENSG00000172716.5
## ENSG00000174989    117031148        FBXW8             FBXW8  ENSG00000174989.2
## ENSG00000179044     67190185      EXOC3L1           EXOC3L1  ENSG00000179044.5
## ENSG00000180061     55334048     TMEM150B          TMEM150B  ENSG00000180061.5
## ENSG00000186654     44737681         PRR5              PRR5  ENSG00000186654.5
## ENSG00000187608      1014540        ISG15             ISG15  ENSG00000187608.4
## ENSG00000188157      1056118         AGRN              AGRN  ENSG00000188157.4
## ENSG00000188290      1000172         HES4              HES4 ENSG00000188290.10
## ENSG00000196141    200482264      SPATS2L           SPATS2L  ENSG00000196141.5
## ENSG00000196369    145095528      SRGAP2B           SRGAP2B  ENSG00000196369.1
## ENSG00000196405    100144236          EVL               EVL  ENSG00000196405.5
## ENSG00000198087     47627263        CD2AP             CD2AP  ENSG00000198087.1
## ENSG00000214872     57550274       SMTNL1            SMTNL1  ENSG00000214872.3
## ENSG00000228696     46361797       ARL17B            ARL17A  ENSG00000228696.5
## ENSG00000248405     44862706 PRR5-ARHGAP8      PRR5-ARHGAP8  ENSG00000248405.9
## ENSG00000269720     17394158      CCDC194           CCDC194  ENSG00000269720.1
##                     mean_cds_len
## ENSG00000020129           1566.4
## ENSG00000023171           1518.2
## ENSG00000078098 1117.57142857143
## ENSG00000101342              460
## ENSG00000106804             5031
## ENSG00000108387 1062.89473684211
## ENSG00000108771              773
## ENSG00000120008          1374.75
## ENSG00000120675              453
## ENSG00000122729             2670
## ENSG00000131203 642.166666666667
## ENSG00000133106            718.4
## ENSG00000134326             1227
## ENSG00000134809              273
## ENSG00000135116            166.5
## ENSG00000136514              741
## ENSG00000137628          1596.75
## ENSG00000137959 783.333333333333
## ENSG00000138646             2532
## ENSG00000141664 1908.16666666667
## ENSG00000145244           2801.5
## ENSG00000146205          1637.25
## ENSG00000151692           500.75
## ENSG00000152056           345.25
## ENSG00000155158           1536.5
## ENSG00000155363             2970
## ENSG00000160469           1293.5
## ENSG00000160932 306.642857142857
## ENSG00000162772 440.555555555556
## ENSG00000163644 512.142857142857
## ENSG00000164125            898.5
## ENSG00000164136            448.5
## ENSG00000166257 560.428571428571
## ENSG00000167014 313.666666666667
## ENSG00000170448           2602.5
## ENSG00000171365 1620.11111111111
## ENSG00000172159           1213.5
## ENSG00000172716            749.8
## ENSG00000174989             1627
## ENSG00000179044           1423.6
## ENSG00000180061 366.833333333333
## ENSG00000186654 833.727272727273
## ENSG00000187608 467.666666666667
## ENSG00000188157           5911.5
## ENSG00000188290              660
## ENSG00000196141 950.952380952381
## ENSG00000196369            951.5
## ENSG00000196405 702.090909090909
## ENSG00000198087             1920
## ENSG00000214872           1429.5
## ENSG00000228696 391.888888888889
## ENSG00000248405 1713.66666666667
## ENSG00000269720              705
Vennerable::plot(compare)
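
The overlap drawn by the Venn diagram can also be tabulated with base R set operations, which is a handy cross-check on the "11" intersection set pulled from the Venn object. This is a minimal sketch on invented gene lists; `toy_previous` and `toy_new` are hypothetical stand-ins and not objects from this analysis.

```r
# Invented gene lists standing in for the two sets of significant genes.
toy_previous <- c("geneA", "geneB", "geneC", "geneD")
toy_new <- c("geneC", "geneD", "geneE")

# Genes present in both lists, and genes unique to each list.
toy_shared <- intersect(toy_previous, toy_new)
toy_only_previous <- setdiff(toy_previous, toy_new)
toy_only_new <- setdiff(toy_new, toy_previous)

# Size of each partition of the two-set Venn diagram.
lengths(list(shared = toy_shared,
             only_previous = toy_only_previous,
             only_new = toy_only_new))
```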

11.3.7 Eosinophils

eosinophil_dream_result <- mixed_eosinophil_de[["all_tables"]][["failure_vs_cure"]]

big_table <- t_cf_eosinophil_table_sva[["data"]][["outcome"]]
merged <- merge(big_table, eosinophil_dream_result, by = "row.names")
rownames(merged) <- merged[["Row.names"]]
merged[["Row.names"]] <- NULL
cor_value <- cor.test(merged[["logFC"]], merged[["deseq_logfc"]])
cor_value
## 
##  Pearson's product-moment correlation
## 
## data:  merged[["logFC"]] and merged[["deseq_logfc"]]
## t = 177, df = 10530, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8601 0.8697
## sample estimates:
##   cor 
## 0.865
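
The merge-and-correlate step above can be tried on toy data. This is a minimal sketch, not the code used for the analysis: the data frames `deseq_tbl` and `dream_tbl` and all of their values are invented, and only the column names `deseq_logfc` and `logFC` mirror the real tables.

```r
# Toy stand-ins for the DESeq2 and dream result tables (values are invented).
deseq_tbl <- data.frame(
  deseq_logfc = c(1.2, -0.4, 2.1, 0.1, 0.3),
  row.names = c("geneA", "geneB", "geneC", "geneD", "geneF"))
dream_tbl <- data.frame(
  logFC = c(1.0, -0.5, 1.8, 0.2, 0.6),
  row.names = c("geneA", "geneB", "geneC", "geneD", "geneE"))

# merge(by = "row.names") keeps only the genes present in both tables and
# moves the row names into a column called "Row.names".
toy_merged <- merge(deseq_tbl, dream_tbl, by = "row.names")
rownames(toy_merged) <- toy_merged[["Row.names"]]
toy_merged[["Row.names"]] <- NULL

# Pearson correlation between the two sets of log fold changes.
toy_cor <- cor.test(toy_merged[["logFC"]], toy_merged[["deseq_logfc"]])
toy_cor[["estimate"]]
```

Because the default merge is an inner join, genes filtered out of either table (here `geneE` and `geneF`) drop out of the comparison silently; comparing `nrow()` before and after the merge shows how many were lost.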
t_cf_eosinophil_de_sva[["dream"]] <- mixed_eosinophil_de
test <- combine_de_tables(
  t_cf_eosinophil_de_sva, scale_p = TRUE,
  excel = "excel/test_eosinophil_combined.xlsx")
test_aucc <- calculate_aucc(big_table, tbl2 = eosinophil_dream_result,
                            px = "deseq_adjp", py = "adj.P.Val",
                            lx = "deseq_logfc", ly = "logFC")

logfc_plotter <- plot_linear_scatter(merged[, c("logFC", "deseq_logfc")])
logfc_plot <- logfc_plotter[["scatter"]] +
  xlab("Dream log2FC with (1|donor) and visit in model") +
  ylab("DESeq2 log2FC: Default pairwise comparison")
pp(file = "figures/compare_cf_and_visit_in_model_eosinophil_logfc.svg")
logfc_plot
dev.off()
## png 
##   2
logfc_plot

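## Previous (DESeq2) hits: adjusted p <= 0.05 and |log2FC| >= 1.
## abs() must wrap the fold change itself. The summary printed below came from a
## run with abs() wrapped around the comparison, which keeps only log2FC >= 1.
previous_sig_idx <- merged[["deseq_adjp"]] <= 0.05 & abs(merged[["deseq_logfc"]]) >= 1.0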
summary(previous_sig_idx)
##    Mode   FALSE    TRUE 
## logical   10416     116
previous_genes <- rownames(merged)[previous_sig_idx]

new_sig_idx <- abs(merged[["logFC"]]) >= 1.0 & merged[["P.Value"]] < 0.05
summary(new_sig_idx)
##    Mode   FALSE    TRUE 
## logical   10467      65
new_genes <- rownames(merged)[new_sig_idx]
na_idx <- is.na(new_genes)
new_genes <- new_genes[!na_idx]
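
In filters like the two above, the placement of the closing parenthesis of `abs()` decides whether decreased genes are retained. The sketch below shows the difference on invented values; the data frame `toy` and its columns `lfc` and `adjp` are hypothetical.

```r
# Invented log2 fold changes and adjusted p-values for six genes.
toy <- data.frame(
  lfc = c(2.0, -2.0, 0.5, -0.5, 1.5, -1.5),
  adjp = c(0.01, 0.01, 0.01, 0.01, 0.20, 0.04),
  row.names = paste0("gene", 1:6))

# abs() wraps the fold change: increased and decreased genes both pass.
both_dirs <- toy[["adjp"]] <= 0.05 & abs(toy[["lfc"]]) >= 1.0

# abs() wraps the comparison: abs(TRUE) is 1 and abs(FALSE) is 0, so this
# behaves like a plain lfc >= 1 test and the decreased genes are dropped.
up_only <- toy[["adjp"]] <= 0.05 & abs(toy[["lfc"]] >= 1.0)

rownames(toy)[both_dirs]
rownames(toy)[up_only]
```

Comparing `sum(both_dirs)` with `sum(up_only)` is a quick way to see whether a filter is one-sided.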

annot <- fData(t_eosinophils)
compare <- Vennerable::Venn(list("previous" = previous_genes, "new" = new_genes))
shared_genes <- compare@IntersectionSets[["11"]]
name_idx <- rownames(annot) %in% shared_genes
annot[name_idx, ]
##                 ensembl_gene_id ensembl_transcript_id version
## ENSG00000011105 ENSG00000011105       ENST00000649909      14
## ENSG00000087237 ENSG00000087237       ENST00000379780      12
## ENSG00000089127 ENSG00000089127       ENST00000540589      13
## ENSG00000106211 ENSG00000106211       ENST00000248553       9
## ENSG00000108679 ENSG00000108679       ENST00000592255      13
## ENSG00000108774 ENSG00000108774       ENST00000547517      14
## ENSG00000115267 ENSG00000115267       ENST00000464129       8
## ENSG00000117228 ENSG00000117228       ENST00000495131      10
## ENSG00000118785 ENSG00000118785       ENST00000513981      14
## ENSG00000123689 ENSG00000123689       ENST00000367029       6
## ENSG00000126709 ENSG00000126709       ENST00000339145      15
## ENSG00000130203 ENSG00000130203       ENST00000446996      10
## ENSG00000135047 ENSG00000135047       ENST00000495822      15
## ENSG00000136235 ENSG00000136235       ENST00000479625      16
## ENSG00000136689 ENSG00000136689       ENST00000472292      18
## ENSG00000137965 ENSG00000137965       ENST00000476911      11
## ENSG00000138646 ENSG00000138646       ENST00000502913       9
## ENSG00000141574 ENSG00000141574       ENST00000269389       8
## ENSG00000145431 ENSG00000145431       ENST00000511985      11
## ENSG00000152689 ENSG00000152689       ENST00000490150      18
## ENSG00000152778 ENSG00000152778       ENST00000371795       9
## ENSG00000154451 ENSG00000154451       ENST00000481145      14
## ENSG00000157680 ENSG00000157680       ENST00000424189      15
## ENSG00000160932 ENSG00000160932       ENST00000521182      11
## ENSG00000161055 ENSG00000161055       ENST00000292641       4
## ENSG00000162645 ENSG00000162645       ENST00000370466      13
## ENSG00000162654 ENSG00000162654       ENST00000355754       9
## ENSG00000165949 ENSG00000165949       ENST00000611954      12
## ENSG00000173762 ENSG00000173762       ENST00000312648       8
## ENSG00000174276 ENSG00000174276       ENST00000310597       7
## ENSG00000177989 ENSG00000177989       ENST00000405135      13
## ENSG00000179455 ENSG00000179455       ENST00000649065       9
## ENSG00000187569 ENSG00000187569       ENST00000345088       3
## ENSG00000187608 ENSG00000187608       ENST00000624697      10
## ENSG00000196141 ENSG00000196141       ENST00000421573      14
## ENSG00000198178 ENSG00000198178       ENST00000537530      10
## ENSG00000198286 ENSG00000198286       ENST00000355508       9
## ENSG00000203907 ENSG00000203907       ENST00000441145       9
## ENSG00000204291 ENSG00000204291       ENST00000471477      11
## ENSG00000204632 ENSG00000204632       ENST00000428701      11
## ENSG00000214872 ENSG00000214872       ENST00000399154       8
## ENSG00000221963 ENSG00000221963       ENST00000409652       6
## ENSG00000231767 ENSG00000231767       ENST00000443181       4
## ENSG00000269720 ENSG00000269720       ENST00000597169       2
##                 transcript_version
## ENSG00000011105                  1
## ENSG00000087237                  6
## ENSG00000089127                  2
## ENSG00000106211                  7
## ENSG00000108679                  1
## ENSG00000108774                  5
## ENSG00000115267                  1
## ENSG00000117228                  1
## ENSG00000118785                  5
## ENSG00000123689                  5
## ENSG00000126709                  8
## ENSG00000130203                  5
## ENSG00000135047                  1
## ENSG00000136235                  1
## ENSG00000136689                  1
## ENSG00000137965                  1
## ENSG00000138646                  1
## ENSG00000141574                  8
## ENSG00000145431                  1
## ENSG00000152689                  1
## ENSG00000152778                  5
## ENSG00000154451                  1
## ENSG00000157680                  6
## ENSG00000160932                  5
## ENSG00000161055                  4
## ENSG00000162645                  4
## ENSG00000162654                  7
## ENSG00000165949                  4
## ENSG00000173762                  8
## ENSG00000174276                  6
## ENSG00000177989                  5
## ENSG00000179455                  1
## ENSG00000187569                  3
## ENSG00000187608                  4
## ENSG00000196141                  5
## ENSG00000198178                  1
## ENSG00000198286                  3
## ENSG00000203907                  1
## ENSG00000204291                  1
## ENSG00000204632                  5
## ENSG00000214872                  3
## ENSG00000221963                  5
## ENSG00000231767                  2
## ENSG00000269720                  1
##                                                                                                      description
## ENSG00000011105                                                tetraspanin 9 [Source:HGNC Symbol;Acc:HGNC:21640]
## ENSG00000087237                            cholesteryl ester transfer protein [Source:HGNC Symbol;Acc:HGNC:1869]
## ENSG00000089127                             2'-5'-oligoadenylate synthetase 1 [Source:HGNC Symbol;Acc:HGNC:8086]
## ENSG00000106211                  heat shock protein family B (small) member 1 [Source:HGNC Symbol;Acc:HGNC:5246]
## ENSG00000108679                                    galectin 3 binding protein [Source:HGNC Symbol;Acc:HGNC:6564]
## ENSG00000108774                             RAB5C, member RAS oncogene family [Source:HGNC Symbol;Acc:HGNC:9785]
## ENSG00000115267                  interferon induced with helicase C domain 1 [Source:HGNC Symbol;Acc:HGNC:18873]
## ENSG00000117228                                   guanylate binding protein 1 [Source:HGNC Symbol;Acc:HGNC:4182]
## ENSG00000118785                                    secreted phosphoprotein 1 [Source:HGNC Symbol;Acc:HGNC:11255]
## ENSG00000123689                                               G0/G1 switch 2 [Source:HGNC Symbol;Acc:HGNC:30229]
## ENSG00000126709                          interferon alpha inducible protein 6 [Source:HGNC Symbol;Acc:HGNC:4054]
## ENSG00000130203                                               apolipoprotein E [Source:HGNC Symbol;Acc:HGNC:613]
## ENSG00000135047                                                   cathepsin L [Source:HGNC Symbol;Acc:HGNC:2537]
## ENSG00000136235                                              glycoprotein nmb [Source:HGNC Symbol;Acc:HGNC:4462]
## ENSG00000136689                             interleukin 1 receptor antagonist [Source:HGNC Symbol;Acc:HGNC:6000]
## ENSG00000137965                                interferon induced protein 44 [Source:HGNC Symbol;Acc:HGNC:16938]
## ENSG00000138646 HECT and RLD domain containing E3 ubiquitin protein ligase 5 [Source:HGNC Symbol;Acc:HGNC:24368]
## ENSG00000141574                                 secreted and transmembrane 1 [Source:HGNC Symbol;Acc:HGNC:10707]
## ENSG00000145431                              platelet derived growth factor C [Source:HGNC Symbol;Acc:HGNC:8801]
## ENSG00000152689                               RAS guanyl releasing protein 3 [Source:HGNC Symbol;Acc:HGNC:14545]
## ENSG00000152778  interferon induced protein with tetratricopeptide repeats 5 [Source:HGNC Symbol;Acc:HGNC:13328]
## ENSG00000154451                                  guanylate binding protein 5 [Source:HGNC Symbol;Acc:HGNC:19895]
## ENSG00000157680                                    diacylglycerol kinase iota [Source:HGNC Symbol;Acc:HGNC:2855]
## ENSG00000160932                          lymphocyte antigen 6 family member E [Source:HGNC Symbol;Acc:HGNC:6727]
## ENSG00000161055                             secretoglobin family 3A member 1 [Source:HGNC Symbol;Acc:HGNC:18384]
## ENSG00000162645                                   guanylate binding protein 2 [Source:HGNC Symbol;Acc:HGNC:4183]
## ENSG00000162654                                  guanylate binding protein 4 [Source:HGNC Symbol;Acc:HGNC:20480]
## ENSG00000165949                         interferon alpha inducible protein 27 [Source:HGNC Symbol;Acc:HGNC:5397]
## ENSG00000173762                                                  CD7 molecule [Source:HGNC Symbol;Acc:HGNC:1695]
## ENSG00000174276                             zinc finger HIT-type containing 2 [Source:HGNC Symbol;Acc:HGNC:1177]
## ENSG00000177989                          outer dense fiber of sperm tails 3B [Source:HGNC Symbol;Acc:HGNC:34388]
## ENSG00000179455                                 makorin ring finger protein 3 [Source:HGNC Symbol;Acc:HGNC:7114]
## ENSG00000187569                      developmental pluripotency associated 3 [Source:HGNC Symbol;Acc:HGNC:19199]
## ENSG00000187608                                 ISG15 ubiquitin like modifier [Source:HGNC Symbol;Acc:HGNC:4053]
## ENSG00000196141                spermatogenesis associated serine rich 2 like [Source:HGNC Symbol;Acc:HGNC:24574]
## ENSG00000198178                       C-type lectin domain family 4 member C [Source:HGNC Symbol;Acc:HGNC:13258]
## ENSG00000198286                  caspase recruitment domain family member 11 [Source:HGNC Symbol;Acc:HGNC:16393]
## ENSG00000203907                                     oocyte expressed protein [Source:HGNC Symbol;Acc:HGNC:21382]
## ENSG00000204291                                collagen type XV alpha 1 chain [Source:HGNC Symbol;Acc:HGNC:2192]
## ENSG00000204632                  major histocompatibility complex, class I, G [Source:HGNC Symbol;Acc:HGNC:4964]
## ENSG00000214872                                            smoothelin like 1 [Source:HGNC Symbol;Acc:HGNC:32394]
## ENSG00000221963                                            apolipoprotein L6 [Source:HGNC Symbol;Acc:HGNC:14870]
## ENSG00000231767                                           novel protein similar to ribosomal protein S27a RPS27A
## ENSG00000269720                            coiled-coil domain containing 194 [Source:HGNC Symbol;Acc:HGNC:53438]
##                   gene_biotype cds_length chromosome_name strand start_position
## ENSG00000011105 protein_coding        395              12      +        3077355
## ENSG00000087237 protein_coding       1302              16      +       56961923
## ENSG00000089127 protein_coding         68              12      +      112906783
## ENSG00000106211 protein_coding        618               7      +       76302673
## ENSG00000108679 protein_coding  undefined              17      -       78971238
## ENSG00000108774 protein_coding        750              17      -       42124976
## ENSG00000115267 protein_coding  undefined               2      -      162267074
## ENSG00000117228 protein_coding  undefined               1      -       89052319
## ENSG00000118785 protein_coding  undefined               4      +       87975650
## ENSG00000123689 protein_coding        312               1      +      209675412
## ENSG00000126709 protein_coding        417               1      -       27666064
## ENSG00000130203 protein_coding        648              19      +       44905791
## ENSG00000135047 protein_coding  undefined               9      +       87726109
## ENSG00000136235 protein_coding  undefined               7      +       23235967
## ENSG00000136689 protein_coding  undefined               2      +      113107214
## ENSG00000137965 protein_coding  undefined               1      +       78649796
## ENSG00000138646 protein_coding  undefined               4      +       88457119
## ENSG00000141574 protein_coding        747              17      -       82321024
## ENSG00000145431 protein_coding  undefined               4      -      156760454
## ENSG00000152689 protein_coding  undefined               2      +       33436324
## ENSG00000152778 protein_coding       1449              10      +       89414568
## ENSG00000154451 protein_coding  undefined               1      -       89258950
## ENSG00000157680 protein_coding       3237               7      -      137381037
## ENSG00000160932 protein_coding        126               8      +      143017982
## ENSG00000161055 protein_coding        315               5      -      180590105
## ENSG00000162645 protein_coding       1776               1      -       89106132
## ENSG00000162654 protein_coding       1923               1      -       89181144
## ENSG00000165949 protein_coding        180              14      +       94104836
## ENSG00000173762 protein_coding        723              17      -       82314868
## ENSG00000174276 protein_coding       1212              11      -       65116403
## ENSG00000177989 protein_coding        846              22      -       50529710
## ENSG00000179455 protein_coding        486              15      +       23565674
## ENSG00000187569 protein_coding        480              12      +        7711433
## ENSG00000187608 protein_coding        474               1      +        1001138
## ENSG00000196141 protein_coding        463               2      +      200305881
## ENSG00000198178 protein_coding        267              12      -        7729415
## ENSG00000198286 protein_coding        815               7      -        2906141
## ENSG00000203907 protein_coding        201               6      -       73368555
## ENSG00000204291 protein_coding  undefined               9      +       98943179
## ENSG00000204632 protein_coding       1017               6      +       29826967
## ENSG00000214872 protein_coding       1374              11      +       57542641
## ENSG00000221963 protein_coding       1032              22      +       35648446
## ENSG00000231767 protein_coding        468               1      +      192716132
## ENSG00000269720 protein_coding  undefined              19      -       17390509
##                 end_position hgnc_symbol uniprot_gn_symbol        transcript
## ENSG00000011105      3286564      TSPAN9            TSPAN9 ENSG00000011105.1
## ENSG00000087237     56983845        CETP              CETP ENSG00000087237.6
## ENSG00000089127    112933222        OAS1              OAS1 ENSG00000089127.2
## ENSG00000106211     76304295       HSPB1             HSPB1 ENSG00000106211.7
## ENSG00000108679     78979947    LGALS3BP          LGALS3BP ENSG00000108679.1
## ENSG00000108774     42155044       RAB5C             RAB5C ENSG00000108774.5
## ENSG00000115267    162318684       IFIH1             IFIH1 ENSG00000115267.1
## ENSG00000117228     89065230        GBP1              GBP1 ENSG00000117228.1
## ENSG00000118785     87983426        SPP1              SPP1 ENSG00000118785.5
## ENSG00000123689    209676390        G0S2              G0S2 ENSG00000123689.5
## ENSG00000126709     27672198        IFI6              IFI6 ENSG00000126709.8
## ENSG00000130203     44909393        APOE              APOE ENSG00000130203.5
## ENSG00000135047     87731469        CTSL              CTSL ENSG00000135047.1
## ENSG00000136235     23275108       GPNMB             GPNMB ENSG00000136235.1
## ENSG00000136689    113134016       IL1RN             IL1RN ENSG00000136689.1
## ENSG00000137965     78664078       IFI44             IFI44 ENSG00000137965.1
## ENSG00000138646     88506163       HERC5             HERC5 ENSG00000138646.1
## ENSG00000141574     82334074      SECTM1            SECTM1 ENSG00000141574.8
## ENSG00000145431    156971799       PDGFC             PDGFC ENSG00000145431.1
## ENSG00000152689     33564750     RASGRP3           RASGRP3 ENSG00000152689.1
## ENSG00000152778     89420997       IFIT5             IFIT5 ENSG00000152778.5
## ENSG00000154451     89272804        GBP5              GBP5 ENSG00000154451.1
## ENSG00000157680    137847092        DGKI              DGKI ENSG00000157680.6
## ENSG00000160932    143023832        LY6E              LY6E ENSG00000160932.5
## ENSG00000161055    180591499     SCGB3A1           SCGB3A1 ENSG00000161055.4
## ENSG00000162645     89150456        GBP2              GBP2 ENSG00000162645.4
## ENSG00000162654     89198942        GBP4              GBP4 ENSG00000162654.7
## ENSG00000165949     94116698       IFI27             IFI27 ENSG00000165949.4
## ENSG00000173762     82317608         CD7               CD7 ENSG00000173762.8
## ENSG00000174276     65117701      ZNHIT2            ZNHIT2 ENSG00000174276.6
## ENSG00000177989     50532580       ODF3B             ODF3B ENSG00000177989.5
## ENSG00000179455     23630075       MKRN3             MKRN3 ENSG00000179455.1
## ENSG00000187569      7717559       DPPA3             DPPA3 ENSG00000187569.3
## ENSG00000187608      1014540       ISG15             ISG15 ENSG00000187608.4
## ENSG00000196141    200482264     SPATS2L           SPATS2L ENSG00000196141.5
## ENSG00000198178      7751605      CLEC4C            CLEC4C ENSG00000198178.1
## ENSG00000198286      3043945      CARD11            CARD11 ENSG00000198286.3
## ENSG00000203907     73395133        OOEP              OOEP ENSG00000203907.1
## ENSG00000204291     99070792     COL15A1           COL15A1 ENSG00000204291.1
## ENSG00000204632     29831125       HLA-G             HLA-G ENSG00000204632.5
## ENSG00000214872     57550274      SMTNL1            SMTNL1 ENSG00000214872.3
## ENSG00000221963     35668404       APOL6             APOL6 ENSG00000221963.5
## ENSG00000231767    192716653                               ENSG00000231767.2
## ENSG00000269720     17394158     CCDC194           CCDC194 ENSG00000269720.1
##                     mean_cds_len
## ENSG00000011105 559.833333333333
## ENSG00000087237             1357
## ENSG00000089127            682.8
## ENSG00000106211              431
## ENSG00000108679 445.666666666667
## ENSG00000108774 482.142857142857
## ENSG00000115267             2235
## ENSG00000117228             1779
## ENSG00000118785              863
## ENSG00000123689              312
## ENSG00000126709              405
## ENSG00000130203           766.75
## ENSG00000135047              894
## ENSG00000136235           1447.5
## ENSG00000136689            484.2
## ENSG00000137965 716.666666666667
## ENSG00000138646             2532
## ENSG00000141574          400.375
## ENSG00000145431           550.25
## ENSG00000152689 906.555555555556
## ENSG00000152778             1449
## ENSG00000154451           1605.5
## ENSG00000157680           2916.6
## ENSG00000160932 306.642857142857
## ENSG00000161055              315
## ENSG00000162645             1776
## ENSG00000162654             1923
## ENSG00000165949 287.454545454545
## ENSG00000173762              461
## ENSG00000174276            885.5
## ENSG00000177989            632.5
## ENSG00000179455 547.166666666667
## ENSG00000187569              480
## ENSG00000187608 467.666666666667
## ENSG00000196141 950.952380952381
## ENSG00000198178            510.5
## ENSG00000198286 1499.66666666667
## ENSG00000203907              312
## ENSG00000204291             4146
## ENSG00000204632            835.5
## ENSG00000214872           1429.5
## ENSG00000221963             1032
## ENSG00000231767            469.5
## ENSG00000269720              705
Vennerable::plot(compare)

12 Perform dream with all samples together and a model with all factors

Now that I have performed all of the above, I think it should be possible to have a working analysis using dream that includes celltype, visitnumber, finaloutcome, donor, and perhaps SVs.

mixed_fstring <- "~ 0 + finaloutcome + typeofcells + visitnumber + (1|donor)"
mixed_formula <- as.formula(mixed_fstring)
mixed_fstring_svs <- "~ 0 + finaloutcome + typeofcells + visitnumber + (1|donor) + svaseq_SV1 + svaseq_SV2 + svaseq_SV3 + svaseq_SV4"
mixed_formula_svs <- as.formula(mixed_fstring_svs)
all_dream_de <- dream_pairwise(t_clinical_nobiop, alt_model = mixed_formula)
mixed_all_celltypes_de_xlsx <- write_de_table(all_dream_de, type = "limma", excel = glue("excel/mixed_all_celltypes_nobiop_table-v{ver}.xlsx"))
all_dream_result <- all_dream_de[["all_tables"]][["failure_vs_cure"]] %>%
  arrange(desc(logFC))
fc_sig_idx <- all_dream_result[["logFC"]] >= 1.0 & all_dream_result[["z.std"]] >= 2.0
dream_sig <- rownames(all_dream_result[fc_sig_idx, ])
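
The two model strings above differ only in the trailing surrogate variable terms. Here is a minimal base-R sketch of building the second string from the first, so that the two cannot drift apart. The names `n_svs` and `sv_terms` are introduced here purely for illustration, and the count of four SVs is simply copied from the string above.

```r
## Base model string, copied from the chunk above.
mixed_fstring <- "~ 0 + finaloutcome + typeofcells + visitnumber + (1|donor)"
## Hypothetical names: n_svs and sv_terms are not part of the original analysis.
n_svs <- 4
sv_terms <- paste0("svaseq_SV", seq_len(n_svs))
## Append the SV terms to the base string and convert to a formula.
mixed_fstring_svs <- paste(c(mixed_fstring, sv_terms), collapse = " + ")
mixed_formula_svs <- as.formula(mixed_fstring_svs)
## Every variable the formula expects to find in the sample metadata.
formula_vars <- all.vars(mixed_formula_svs)
formula_vars
```

Checking `formula_vars` against the column names of the metadata before calling dream is a cheap way to catch a misspelled factor.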

svs_all_dream_de <- dream_pairwise(t_clinical_nobiop, alt_model = mixed_formula_svs)
test <- hpgl_padjust(svs_all_dream_de[["all_tables"]][["failure_vs_cure"]],
                     mean_column = "AveExpr", method = "ihw", type = "limma")
t_clinical_outcomecell_fact <- paste0(pData(t_clinical_nobiop)[["finaloutcome"]], "_",
                                      pData(t_clinical_nobiop)[["typeofcells"]])
t_clinical_outcomecell <- t_clinical_nobiop
pData(t_clinical_outcomecell)[["outcomecell"]] <- t_clinical_outcomecell_fact
t_clinical_outcomecell <- set_expt_conditions(t_clinical_outcomecell, fact = "outcomecell")
## The numbers of samples by condition are:
## 
##    cure_eosinophils      cure_monocytes    cure_neutrophils failure_eosinophils 
##                  17                  21                  20                   9 
##   failure_monocytes failure_neutrophils 
##                  21                  21
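
The combined outcome/celltype factor above is just two metadata columns pasted together. Here is a small sketch of the same idea on invented metadata; `toy_meta` stands in for pData(t_clinical_nobiop) and its values are made up.

```r
## Toy metadata; the rows are invented and only mimic the column names used above.
toy_meta <- data.frame(
  finaloutcome = c("cure", "cure", "failure", "failure", "cure"),
  typeofcells = c("monocytes", "neutrophils", "monocytes", "monocytes", "monocytes"),
  stringsAsFactors = FALSE)
## Paste the two factors into one condition per sample.
toy_meta[["outcomecell"]] <- paste0(toy_meta[["finaloutcome"]], "_",
                                    toy_meta[["typeofcells"]])
## Tabulate samples per combined condition, as in the printed summary.
toy_counts <- table(toy_meta[["outcomecell"]])
toy_counts
```

Combinations with no samples simply do not appear in the table, so it is worth confirming that every contrast one intends to ask for has samples on both sides.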
t_clinical_outcomecell_de <- all_pairwise(t_clinical_outcomecell, keepers = outcometype_contrasts,
                                          model_batch = "svaseq")
## 
##    cure_eosinophils      cure_monocytes    cure_neutrophils failure_eosinophils 
##                  17                  21                  20                   9 
##   failure_monocytes failure_neutrophils 
##                  21                  21
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, cure_monocytes_vs_cure_eosinophils and
## edger, cure_monocytes_vs_cure_eosinophils failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, cure_neutrophils_vs_cure_eosinophils and
## edger, cure_neutrophils_vs_cure_eosinophils failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, cure_monocytes_vs_cure_eosinophils and
## limma, cure_monocytes_vs_cure_eosinophils failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, cure_neutrophils_vs_cure_eosinophils and
## limma, cure_neutrophils_vs_cure_eosinophils failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, cure_monocytes_vs_cure_eosinophils and
## noiseq, cure_monocytes_vs_cure_eosinophils failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, cure_neutrophils_vs_cure_eosinophils and
## noiseq, cure_neutrophils_vs_cure_eosinophils failed.

mixed_fstring <- "~ 0 + condition + visitnumber + (1|donor)"
t_clinical_outcomecell_dream <- dream_pairwise(t_clinical_outcomecell,
                                               alt_model = as.formula(mixed_fstring),
                                               keepers = outcometype_contrasts)
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
t_clinical_outcomecell_table <- write_de_table(t_clinical_outcomecell_dream,
                                               type = "limma",
                                               excel = glue("excel/mixed_clinical_outcomecell-v{ver}.xlsx"))
big_table <- t_cf_clinicalnb_table_sva[["data"]][["outcome"]]
merged <- merge(big_table, all_dream_result, by = "row.names")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'y' in selecting a method for function 'merge': object 'all_dream_result' not found
rownames(merged) <- merged[["Row.names"]]
merged[["Row.names"]] <- NULL
cor_value <- cor.test(merged[["logFC"]], merged[["deseq_logfc"]])
cor_value
## 
##  Pearson's product-moment correlation
## 
## data:  merged[["logFC"]] and merged[["deseq_logfc"]]
## t = 177, df = 10530, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8601 0.8697
## sample estimates:
##   cor 
## 0.865
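
The merge above keeps only the genes present in both tables, which is why the test reports df = 10530 (10532 shared genes minus 2). Here is a toy sketch of the same merge-by-rownames and correlation pattern, with invented numbers:

```r
## Two toy tables with overlapping but not identical genes, in different orders.
genes <- paste0("gene", 1:6)
toy_deseq <- data.frame(deseq_logfc = c(2.0, -1.5, 0.3, 0.8, -0.2, 1.1),
                        row.names = genes)
toy_dream <- data.frame(logFC = c(-0.1, 1.8, 0.9, 0.2, -1.2),
                        row.names = c("gene5", "gene1", "gene4", "gene3", "gene2"))
## merge() on row.names is an inner join by default: gene6 is dropped.
toy_merged <- merge(toy_deseq, toy_dream, by = "row.names")
## Restore the gene IDs as rownames and drop the helper column.
rownames(toy_merged) <- toy_merged[["Row.names"]]
toy_merged[["Row.names"]] <- NULL
toy_cor <- cor.test(toy_merged[["logFC"]], toy_merged[["deseq_logfc"]])
toy_cor[["estimate"]]
```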
test_aucc <- calculate_aucc(big_table, tbl2 = monocyte_dream_result,
                            px = "deseq_adjp", py = "adj.P.Val",
                            lx = "deseq_logfc", ly = "logFC")
test_aucc
## These two tables have an aucc value of: 0.215501890577542 and correlation:
## 
##  Pearson's product-moment correlation
## 
## data:  tbl[[lx]] and tbl[[ly]]
## t = 80, df = 10860, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5952 0.6190
## sample estimates:
##    cor 
## 0.6072

logfc_plotter <- plot_linear_scatter(merged[, c("logFC", "deseq_logfc")])
logfc_plot <- logfc_plotter[["scatter"]] +
  xlab("Dream log2FC with (1|donor) and visit in model") +
  ylab("DESeq2 log2FC: Default pairwise comparison")
pp(file = "images/compare_cf_and_dream_clinical_samples.png")
logfc_plot
dev.off()
## png 
##   2
logfc_plot

cor_value <- cor.test(merged[["P.Value"]], merged[["deseq_adjp"]], method = "spearman")
## Warning in cor.test.default(merged[["P.Value"]], merged[["deseq_adjp"]], :
## Cannot compute exact p-value with ties
cor_value
## 
##  Spearman's rank correlation rho
## 
## data:  merged[["P.Value"]] and merged[["deseq_adjp"]]
## S = 8.3e+10, p-value <2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##    rho 
## 0.5745
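
The ties warning above is harmless: with tied values cor.test() cannot compute an exact Spearman p-value and falls back to the asymptotic one. Here is a small sketch, on invented values, showing that asking for the asymptotic p-value explicitly with `exact = FALSE` gives the same estimate without the warning.

```r
## Toy p-value-like vectors; toy_x contains ties on purpose.
toy_x <- c(0.01, 0.01, 0.2, 0.5, 0.5, 0.9)
toy_y <- c(0.02, 0.03, 0.03, 0.6, 0.7, 1.0)
## exact = FALSE requests the asymptotic p-value up front.
quiet <- cor.test(toy_x, toy_y, method = "spearman", exact = FALSE)
## The default call warns about ties; capture whether it does.
default_warns <- tryCatch({
  cor.test(toy_x, toy_y, method = "spearman")
  FALSE
}, warning = function(w) TRUE)
quiet[["estimate"]]
default_warns
```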
adjp_plotter <- plot_linear_scatter(merged[, c("P.Value", "deseq_adjp")])
## Warning in lmrob.S(x, y, control = control): S refinements did not converge (to
## refine.tol=1e-07) in 200 (= k.max) steps
## Warning in lmrob.fit(x, y, control, init = init): initial estim. 'init' not
## converged -- will be return()ed basically unchanged
adjp_plot <- adjp_plotter[["scatter"]] +
  xlab("Dream: not-adjusted p-value") +
  ylab("DESeq2 adjp: Default pairwise comparison")
pp(file = "images/compare_cf_and_visit_in_model_monocyte_adjp.svg")
adjp_plot
dev.off()
## png 
##   2
adjp_plot

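## abs() wraps only the logFC so that both directions are kept. The counts printed
## below were rendered with abs() around the whole comparison, which kept only logFC >= 1.
previous_sig_idx <- merged[["deseq_adjp"]] <= 0.05 & abs(merged[["deseq_logfc"]]) >= 1.0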
summary(previous_sig_idx)
##    Mode   FALSE    TRUE 
## logical   10416     116
previous_genes <- rownames(merged)[previous_sig_idx]

new_sig_idx <- abs(merged[["logFC"]]) >= 1.0 & merged[["P.Value"]] < 0.05
summary(new_sig_idx)
##    Mode   FALSE    TRUE 
## logical   10467      65
new_genes <- rownames(merged)[new_sig_idx]
na_idx <- is.na(new_genes)
new_genes <- new_genes[!na_idx]
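
When filtering on fold change in both directions, the placement of the closing parenthesis of abs() matters. Here is a small sketch on invented values showing how the two forms differ:

```r
## Toy log2 fold changes, two of them strongly negative.
toy_logfc <- c(2.5, -3.0, 0.4, -1.2, 1.0)
## Comparison inside abs(): the logical result is coerced to 0/1,
## so only values >= 1 pass and the negative side is lost.
inside <- as.logical(abs(toy_logfc >= 1.0))
## Comparison outside abs(): both directions pass.
outside <- abs(toy_logfc) >= 1.0
sum(inside)
sum(outside)
```

Both forms run without complaint, so the only symptom of the first is a shorter gene list than expected.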

annot <- fData(t_monocytes)
compare <- Vennerable::Venn(list("previous" = previous_genes, "new" = new_genes))
shared_genes <- compare@IntersectionSets[["11"]]
name_idx <- rownames(annot) %in% shared_genes
annot[name_idx, ]
##  [1] ensembl_gene_id       ensembl_transcript_id version              
##  [4] transcript_version    description           gene_biotype         
##  [7] cds_length            chromosome_name       strand               
## [10] start_position        end_position          hgnc_symbol          
## [13] uniprot_gn_symbol     transcript            mean_cds_len         
## <0 rows> (or 0-length row.names)

Let us use the overlap_sig() from above to see how similar this result is to our DESeq2+SVA.

all_dream_table <- all_dream_de[["all_tables"]][["failure_vs_cure"]]
## Error in eval(expr, envir, enclos): object 'all_dream_de' not found
overlap_sig(all_dream_table)
## Error in eval(expr, envir, enclos): object 'all_dream_table' not found
overlap_sig(all_dream_table, direction = "gt", mixed_pcol = "z.std", mixed_cutoff = 1.5)
## Error in eval(expr, envir, enclos): object 'all_dream_table' not found
all_dream_table_svs <- svs_all_dream_de[["all_tables"]][["failure_vs_cure"]]
## Error in eval(expr, envir, enclos): object 'svs_all_dream_de' not found
overlap_sig(all_dream_table_svs)
## Error in eval(expr, envir, enclos): object 'all_dream_table_svs' not found
overlap_sig(all_dream_table_svs, direction = "gt", mixed_pcol = "z.std", mixed_cutoff = 1.5)
## Error in eval(expr, envir, enclos): object 'all_dream_table_svs' not found

12.1 Recapitulating the 10 genes of interest

One figure I did not create is a Venn diagram showing the overlap of the eosinophil, neutrophil, and monocyte results and the 10 genes shared among them all. At least in theory I should be able to create a similar or identical plot easily.

observed_eosinophils <- c(
  rownames(t_cf_eosinophil_sig_sva[["deseq"]][["ups"]][["outcome"]]),
  rownames(t_cf_eosinophil_sig_sva[["deseq"]][["downs"]][["outcome"]]))
observed_monocytes <- c(
  rownames(t_cf_monocyte_sig_sva[["deseq"]][["ups"]][["outcome"]]),
  rownames(t_cf_monocyte_sig_sva[["deseq"]][["downs"]][["outcome"]]))
observed_neutrophils <- c(
  rownames(t_cf_neutrophil_sig_sva[["deseq"]][["ups"]][["outcome"]]),
  rownames(t_cf_neutrophil_sig_sva[["deseq"]][["downs"]][["outcome"]]))
venn_input <- list(
  "eosinophil" = observed_eosinophils,
  "monocyte" = observed_monocytes,
  "neutrophils" = observed_neutrophils)
shared <- Vennerable::Venn(venn_input)
shared
## A Venn object on 3 sets named
## eosinophil,monocyte,neutrophils 
## 000 100 010 110 001 101 011 111 
##   0 136  81  10 106  33   9  12
Vennerable::plot(shared)

intersect <- "eosinophil:monocyte:neutrophils"
celltype_upset <- UpSetR::upset(UpSetR::fromList(venn_input), text.scale = 2)
celltype_upset

celltype_shared_genes <- overlap_groups(venn_input)
celltype_geneids <- overlap_geneids(celltype_shared_genes, intersect)
ids <- attr(celltype_shared_genes, "elements")[celltype_shared_genes[[intersect]]]
ids
##       eosinophil4       eosinophil6       eosinophil7       eosinophil9 
## "ENSG00000089012" "ENSG00000137959" "ENSG00000115155" "ENSG00000165949" 
##      eosinophil23      eosinophil24      eosinophil28      eosinophil41 
## "ENSG00000186654" "ENSG00000248405" "ENSG00000188672" "ENSG00000177294" 
##      eosinophil46      eosinophil52      eosinophil54     eosinophil120 
## "ENSG00000134321" "ENSG00000214872" "ENSG00000184979" "ENSG00000196526"
rows <- fData(t_monocytes)[ids, ]
rows[["hgnc_symbol"]]
##  [1] "SIRPG"        "IFI44L"       "OTOF"         "IFI27"        "PRR5"        
##  [6] "PRR5-ARHGAP8" "RHCE"         "FBXO39"       "RSAD2"        "SMTNL1"      
## [11] "USP18"        "AFAP1"
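
As a cross-check on the Venn counts, the same membership codes ("100", "111", etc.) can be computed with base R alone. This is a sketch on invented gene sets; `toy_sets` stands in for venn_input, and the first digit of each code corresponds to the first set in the list.

```r
## Toy gene sets; the IDs are invented.
toy_sets <- list(
  eosinophil = c("g1", "g2", "g3", "g4"),
  monocyte = c("g2", "g3", "g5"),
  neutrophils = c("g3", "g4", "g5", "g6"))
## Genes shared by all three sets.
shared_all <- Reduce(intersect, toy_sets)
## Membership matrix: one row per gene, one logical column per set.
universe <- sort(unique(unlist(toy_sets)))
membership <- sapply(toy_sets, function(s) universe %in% s)
rownames(membership) <- universe
## Collapse each row into a code such as "111".
codes <- apply(membership, 1, function(r) paste0(as.integer(r), collapse = ""))
code_counts <- table(codes)
shared_all
code_counts
```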

Note to self: when I rendered the html, stupid R ran out of temp files and so did not actually print the darn html document. As a result I modified the render function to try to make sure there is a clean directory in which to work; testing now. If it continues to fail, I will need to remove some of the images created in this document.

13 A question of p-values

Maria Adelaida has asked about the distribution of (non)adjusted p-values produced by the various methods we employed. I use BH by default, so let's take a moment to examine the distribution of p-values and how they get adjusted by BH and a few of the other methods.

dream_pvalues <- all_dream_table[["P.Value"]]
## Error in eval(expr, envir, enclos): object 'all_dream_table' not found
names(dream_pvalues) <- rownames(all_dream_table)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'all_dream_table' not found
deseq_pvalues <- t_cf_clinicalnb_table_sva[["data"]][["outcome"]][["deseq_p"]]
names(deseq_pvalues) <- rownames(t_cf_clinicalnb_table_sva[["data"]][["outcome"]])

## Note, my xlsx files provide these images.
plot_histogram(dream_pvalues)
## Error in eval(expr, envir, enclos): object 'dream_pvalues' not found
plot_histogram(deseq_pvalues)

Immediately we see that the values produced have very different distributions and that, though there are many low p-values produced by dream, they are far fewer than observed by deseq.

Now consider the BH correction; using it, we rank order the p-values from lowest to highest. Then we choose a denominator for every p-value, its rank, which ranges from 1 to the number of elements in the set of p-values. Finally we take the minimum between 1 and the cumulative minimum, taken from the largest p-value downward, of (#pvalues/denominator) * that-pvalue. Written out the process looks like this:

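test_pvalues <- deseq_pvalues
## Drop missing values and sort from lowest to highest (names are kept).
test_pvalues <- sort(test_pvalues[!is.na(test_pvalues)])
num_pvalues <- length(test_pvalues)
## Scale each p-value by (number of p-values / its rank).
scaled_pvalues <- (num_pvalues / seq_along(test_pvalues)) * test_pvalues
## Take the cumulative minimum from the largest rank downward so the
## adjusted values stay monotonic, then cap everything at 1.
new_pvalues <- pmin(1, rev(cummin(rev(scaled_pvalues))))
test_against <- p.adjust(test_pvalues, method = "BH")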
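
To convince myself that the cumulative minimum step matters, here is a toy sketch with four invented p-values where the scaled values are not monotonic. The function `bh_by_hand()` is a hypothetical helper written for this illustration and assumes there are no missing values; it is checked against p.adjust().

```r
## Hypothetical helper for illustration; assumes no NA values in the input.
bh_by_hand <- function(pvalues) {
  n <- length(pvalues)
  idx <- order(pvalues)
  ## Scale each sorted p-value by n / rank.
  scaled <- (n / seq_len(n)) * pvalues[idx]
  ## Cumulative minimum from the largest rank downward, capped at 1.
  adjusted_sorted <- pmin(1, rev(cummin(rev(scaled))))
  ## Return the adjusted values in the original input order.
  adjusted <- numeric(n)
  adjusted[idx] <- adjusted_sorted
  adjusted
}
## Sorted, these scale to 0.04, 0.04, 0.028, 0.5: the third is below the first two.
toy_p <- c(0.021, 0.5, 0.01, 0.02)
by_hand <- bh_by_hand(toy_p)
reference <- p.adjust(toy_p, method = "BH")
all.equal(by_hand, reference)
```

Without the cumulative minimum, the two smallest p-values would receive 0.04, which is larger than the adjusted value of the p-value ranked after them.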

So, consider for a moment the first p-values produced by deseq: 1.195e-24, 3.489e-22, 9.612e-22, 4.853e-18, 9.864e-15, 3.275e-14

The new p-values will be the (number of genes / the current position) * the current element

  • (11910 / 1) * 1.195e-24 which is 1.423e-20
  • (11910 / 2) * 3.489e-22 which is 2.078e-18
  • (11910 / 3) * 9.612e-22 which is 3.816e-18
  • (11910 / 4) * 4.853e-18 which is 1.445e-14
  • (11910 / 5) * 9.864e-15 which is 2.350e-11
  • (11910 / 6) * 3.275e-14 which is 6.501e-11

In contrast, consider the first few values from dream ordered in the same fashion: 2.162e-07, 3.757e-05, 8.119e-05, 1.664e-04, 3.123e-04, 5.600e-04

These start at values which are ~1e17 higher than those from DESeq and so we can expect the resulting values to end up starting at ~ 5e11 higher than similar values. Thus when we do the math (and be amused at the fact that the number of p-values in the table is divisible by 2, 3, 5, and 6):

  • 11910 * 2.16e-07: 0.002573
  • 5955 * 3.757e-5: 0.223711
  • 3970 * 8.119e-5: 0.322297
  • 2978 * 1.664e-4: 0.4955
  • 2382 * 3.123e-4: 0.743836
  • 1985 * 5.600e-4: 1.112 which is caught by pmin() and reset to 1.
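
The arithmetic above can be replayed in a few lines. This sketch uses only the six leading p-values quoted in the text for each method and the table size of 11910; it applies the scaling and the cap at 1, but not the cumulative minimum over the remaining p-values, so the real adjusted values could be slightly smaller.

```r
## First six p-values from each method, copied from the text above.
deseq_first <- c(1.195e-24, 3.489e-22, 9.612e-22, 4.853e-18, 9.864e-15, 3.275e-14)
dream_first <- c(2.162e-07, 3.757e-05, 8.119e-05, 1.664e-04, 3.123e-04, 5.600e-04)
num_genes <- 11910
ranks <- seq_along(deseq_first)
## Scale each p-value by (number of genes / rank).
deseq_scaled <- (num_genes / ranks) * deseq_first
dream_scaled <- (num_genes / ranks) * dream_first
## Cap at 1, as pmin() does in the BH procedure.
dream_capped <- pmin(1, dream_scaled)
signif(deseq_scaled, 4)
signif(dream_capped, 4)
```

The scaling factor is identical for both methods, so the difference in adjusted values comes entirely from the raw p-values.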

14 Print some volcano plots

Having performed all of the above, let us plot some of the results with a few labels of the top-10 genes on each side of the contrasts.

num_color <- color_choices[["clinic_cf"]][["tumaco_failure"]]
den_color <- color_choices[["clinic_cf"]][["tumaco_cure"]]

cf_monocyte_table <- t_cf_monocyte_table_sva[["data"]][["outcome"]]
cf_monocyte_volcano <- plot_volcano_condition_de(
  cf_monocyte_table, "outcome", label = expected_genes,
  fc_col = "deseq_logfc", p_col = "deseq_adjp", line_position = NULL,
  color_high = num_color, color_low = den_color, label_size = 6)
pp(file = "figures/cf_monocyte_volcano_labeled.svg")
cf_monocyte_volcano[["plot"]]
dev.off()
## png 
##   2
cf_monocyte_volcano[["plot"]]

cf_monocyte_volcano_top10 <- plot_volcano_condition_de(
  cf_monocyte_table, "outcome", label = 10,
  fc_col = "deseq_logfc", p_col = "deseq_adjp", line_position = NULL,
  color_high = num_color, color_low = den_color, label_size = 6)
pp(file = glue("images/cf_monocyte_volcano_labeled_top10-v{ver}.svg"))
cf_monocyte_volcano_top10[["plot"]]
dev.off()
## png 
##   2
cf_monocyte_volcano_top10[["plot"]]

cf_eosinophil_table <- t_cf_eosinophil_table_sva[["data"]][["outcome"]]
cf_eosinophil_volcano <- plot_volcano_condition_de(
  cf_eosinophil_table, "outcome", label = expected_genes,
  fc_col = "deseq_logfc", p_col = "deseq_adjp", line_position = NULL,
  color_high = num_color, color_low = den_color, label_size = 6)
pp(file = "figures/cf_eosinophil_volcano_labeled.svg")
cf_eosinophil_volcano[["plot"]]
dev.off()
## png 
##   2
cf_eosinophil_volcano[["plot"]]

cf_eosinophil_volcano_top10 <- plot_volcano_condition_de(
  cf_eosinophil_table, "outcome", label = 10,
  fc_col = "deseq_logfc", p_col = "deseq_adjp", line_position = NULL,
  color_high = num_color, color_low = den_color, label_size = 6)
pp(file = glue("images/cf_eosinophil_volcano_labeled_top10-v{ver}.svg"))
cf_eosinophil_volcano_top10[["plot"]]
dev.off()
## png 
##   2
cf_eosinophil_volcano_top10[["plot"]]

cf_neutrophil_table <- t_cf_neutrophil_table_sva[["data"]][["outcome"]]
cf_neutrophil_volcano <- plot_volcano_condition_de(
  cf_neutrophil_table, "outcome", label = expected_genes,
  fc_col = "deseq_logfc", p_col = "deseq_adjp", line_position = NULL,
  color_high = num_color, color_low = den_color, label_size = 6)
pp(file = "figures/cf_neutrophil_volcano_labeled.svg")
cf_neutrophil_volcano[["plot"]]
dev.off()
## png 
##   2
cf_neutrophil_volcano[["plot"]]

cf_neutrophil_volcano_top10 <- plot_volcano_condition_de(
  cf_neutrophil_table, "outcome", label = 10,
  fc_col = "deseq_logfc", p_col = "deseq_adjp", line_position = NULL,
  color_high = num_color, color_low = den_color, label_size = 6)
pp(file = glue("images/cf_neutrophil_volcano_labeled_top10-v{ver}.svg"))
cf_neutrophil_volcano_top10[["plot"]]
dev.off()
## png 
##   2
cf_neutrophil_volcano_top10[["plot"]]

15 Eosinophil time comparisons

15.1 Visit 1

t_cf_eosinophil_v1_de_sva <- all_pairwise(tv1_eosinophils, model_batch = "svaseq",
                                          filter = TRUE,
                                          methods = methods)
## 
##    tumaco_cure tumaco_failure 
##              5              3
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

t_cf_eosinophil_v1_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 tmc_flr___
## basic_vs_deseq      0.8703
## basic_vs_dream      0.8892
## basic_vs_ebseq      0.8820
## basic_vs_edger      0.8716
## basic_vs_limma      0.9207
## basic_vs_noiseq     0.8842
## deseq_vs_dream      0.8326
## deseq_vs_ebseq      0.8647
## deseq_vs_edger      0.9996
## deseq_vs_limma      0.8464
## deseq_vs_noiseq     0.8575
## dream_vs_ebseq      0.8401
## dream_vs_edger      0.8359
## dream_vs_limma      0.9865
## dream_vs_noiseq     0.8565
## ebseq_vs_edger      0.8681
## ebseq_vs_limma      0.8493
## ebseq_vs_noiseq     0.9975
## edger_vs_limma      0.8498
## edger_vs_noiseq     0.8613
## limma_vs_noiseq     0.8644
t_cf_eosinophil_v1_table_sva <- combine_de_tables(
  t_cf_eosinophil_v1_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
  excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_v1_cf_table_sva-v{ver}.xlsx"))
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_eosinophil_v1_table_sva
## A set of combined differential expression results.
##                           table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure          13            19          11
##   edger_sigdown limma_sigup limma_sigdown
## 1            10           0             0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

t_cf_eosinophil_v1_sig_sva <- extract_significant_genes(
  t_cf_eosinophil_v1_table_sva,
  excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_v1_cf_sig_sva-v{ver}.xlsx"))
t_cf_eosinophil_v1_table_sva
## A set of combined differential expression results.
##                           table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure          13            19          11
##   edger_sigdown limma_sigup limma_sigdown
## 1            10           0             0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

dim(t_cf_eosinophil_v1_sig_sva[["deseq"]][["ups"]][[1]])
## [1] 13 84
dim(t_cf_eosinophil_v1_sig_sva[["deseq"]][["downs"]][[1]])
## [1] 19 84
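
The agreement table printed above has 21 rows because it covers every pair among the 7 methods it names (choose(7, 2) = 21). Here is a sketch of how such a table could be assembled from per-method logFC columns. The numbers are simulated, and treating agreement as a Pearson correlation of logFC is my assumption for this illustration, not a statement of what all_pairwise() does internally.

```r
## Method names as they appear in the agreement table above.
method_names <- c("basic", "deseq", "dream", "ebseq", "edger", "limma", "noiseq")
## Every unordered pair of methods, one pair per column.
method_pairs <- combn(method_names, 2)
ncol(method_pairs)
## Simulated logFC values: a shared signal plus method-specific noise.
set.seed(20250102)
truth <- rnorm(50)
toy_logfc <- sapply(method_names, function(m) truth + rnorm(50, sd = 0.3))
## Assumption: agreement is taken to be the Pearson correlation of logFC.
agreement <- apply(method_pairs, 2, function(p) {
  cor(toy_logfc[, p[1]], toy_logfc[, p[2]])
})
names(agreement) <- paste0(method_pairs[1, ], "_vs_", method_pairs[2, ])
round(agreement, 4)
```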

15.2 Visit 2

t_cf_eosinophil_v2_de_sva <- all_pairwise(tv2_eosinophils, model_batch = "svaseq",
                                          filter = TRUE,
                                          methods = methods)
## 
##    tumaco_cure tumaco_failure 
##              6              3
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

t_cf_eosinophil_v2_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 tmc_flr___
## basic_vs_deseq      0.8446
## basic_vs_dream      0.8578
## basic_vs_ebseq      0.8616
## basic_vs_edger      0.8465
## basic_vs_limma      0.8936
## basic_vs_noiseq     0.8899
## deseq_vs_dream      0.8310
## deseq_vs_ebseq      0.8802
## deseq_vs_edger      0.9996
## deseq_vs_limma      0.8540
## deseq_vs_noiseq     0.8632
## dream_vs_ebseq      0.7282
## dream_vs_edger      0.8348
## dream_vs_limma      0.9758
## dream_vs_noiseq     0.8017
## ebseq_vs_edger      0.8815
## ebseq_vs_limma      0.7589
## ebseq_vs_noiseq     0.9116
## edger_vs_limma      0.8581
## edger_vs_noiseq     0.8662
## limma_vs_noiseq     0.8158
t_cf_eosinophil_v2_table_sva <- combine_de_tables(
  t_cf_eosinophil_v2_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
  excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_v2_cf_table_sva-v{ver}.xlsx"))
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_eosinophil_v2_table_sva
## A set of combined differential expression results.
##                           table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure           9             4          10
##   edger_sigdown limma_sigup limma_sigdown
## 1             1           0             0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

t_cf_eosinophil_v2_sig_sva <- extract_significant_genes(
  t_cf_eosinophil_v2_table_sva,
  excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_v2_cf_sig_sva-v{ver}.xlsx"))
t_cf_eosinophil_v2_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##         limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome        0          0       10          1        9          4        9
##         ebseq_down basic_up basic_down
## outcome         17        0          0

dim(t_cf_eosinophil_v2_sig_sva[["deseq"]][["ups"]][[1]])
## [1]  9 84
dim(t_cf_eosinophil_v2_sig_sva[["deseq"]][["downs"]][[1]])
## [1]  4 84

15.3 Visit 3

t_cf_eosinophil_v3_de_sva <- all_pairwise(tv3_eosinophils, model_batch = "svaseq",
                                          filter = TRUE,
                                          methods = methods)
## 
##    tumaco_cure tumaco_failure 
##              6              3
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function
##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

t_cf_eosinophil_v3_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 tmc_flr___
## basic_vs_deseq      0.8019
## basic_vs_dream      0.8287
## basic_vs_ebseq      0.8927
## basic_vs_edger      0.8021
## basic_vs_limma      0.8620
## basic_vs_noiseq     0.8925
## deseq_vs_dream      0.8661
## deseq_vs_ebseq      0.8832
## deseq_vs_edger      1.0000
## deseq_vs_limma      0.9137
## deseq_vs_noiseq     0.8126
## dream_vs_ebseq      0.7588
## dream_vs_edger      0.8663
## dream_vs_limma      0.9683
## dream_vs_noiseq     0.8097
## ebseq_vs_edger      0.8833
## ebseq_vs_limma      0.7984
## ebseq_vs_noiseq     0.9402
## edger_vs_limma      0.9138
## edger_vs_noiseq     0.8129
## limma_vs_noiseq     0.7990
t_cf_eosinophil_v3_table_sva <- combine_de_tables(
  t_cf_eosinophil_v3_de_sva, keepers = t_cf_contrast, scale_p = TRUE,
  excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_v3_cf_table_sva-v{ver}.xlsx"))
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cf_eosinophil_v3_table_sva
## A set of combined differential expression results.
##                           table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure          68            29          73
##   edger_sigdown limma_sigup limma_sigdown
## 1            10           0             0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

t_cf_eosinophil_v3_sig_sva <- extract_significant_genes(
  t_cf_eosinophil_v3_table_sva,
  excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_v3_cf_sig_sva-v{ver}.xlsx"))
t_cf_eosinophil_v3_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##         limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## outcome        0          0       73         10       68         29        2
##         ebseq_down basic_up basic_down
## outcome          9        0          0

dim(t_cf_eosinophil_v3_sig_sva[["deseq"]][["ups"]][[1]])
## [1] 68 84
dim(t_cf_eosinophil_v3_sig_sva[["deseq"]][["downs"]][[1]])
## [1] 29 84

15.4 Eosinophils: Compare sva to batch-in-visit

sva_aucc <- calculate_aucc(t_cf_eosinophil_table_sva[["data"]][[1]],
                           tbl2 = t_cf_eosinophil_table_batchvisit[["data"]][[1]],
                           py = "deseq_adjp", ly = "deseq_logfc")
sva_aucc
## These two tables have an aucc value of: 0.576029928864987 and correlation:
## 
##  Pearson's product-moment correlation
## 
## data:  tbl[[lx]] and tbl[[ly]]
## t = 152, df = 10530, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.823 0.835
## sample estimates:
##    cor 
## 0.8291

shared_ids <- rownames(t_cf_eosinophil_table_sva[["data"]][[1]]) %in%
  rownames(t_cf_eosinophil_table_batchvisit[["data"]][[1]])
first <- t_cf_eosinophil_table_sva[["data"]][[1]][shared_ids, ]
second <- t_cf_eosinophil_table_batchvisit[["data"]][[1]][rownames(first), ]
cor.test(first[["deseq_logfc"]], second[["deseq_logfc"]])
## 
##  Pearson's product-moment correlation
## 
## data:  first[["deseq_logfc"]] and second[["deseq_logfc"]]
## t = 152, df = 10530, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.823 0.835
## sample estimates:
##    cor 
## 0.8291

15.5 Compare monocyte CF, neutrophil CF, eosinophil CF

t_mono_neut_sva_aucc <- calculate_aucc(t_cf_monocyte_table_sva[["data"]][["outcome"]],
                                       tbl2 = t_cf_neutrophil_table_sva[["data"]][["outcome"]],
                                       py = "deseq_adjp", ly = "deseq_logfc")
t_mono_neut_sva_aucc
## These two tables have an aucc value of: 0.204316386168083 and correlation:
## 
##  Pearson's product-moment correlation
## 
## data:  tbl[[lx]] and tbl[[ly]]
## t = 43, df = 8577, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4028 0.4376
## sample estimates:
##    cor 
## 0.4203

t_mono_eo_sva_aucc <- calculate_aucc(t_cf_monocyte_table_sva[["data"]][["outcome"]],
                                     tbl2 = t_cf_eosinophil_table_sva[["data"]][["outcome"]],
                                     py = "deseq_adjp", ly = "deseq_logfc")
t_mono_eo_sva_aucc
## These two tables have an aucc value of: 0.0963678364630121 and correlation:
## 
##  Pearson's product-moment correlation
## 
## data:  tbl[[lx]] and tbl[[ly]]
## t = 22, df = 9765, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2015 0.2393
## sample estimates:
##    cor 
## 0.2205

t_neut_eo_sva_aucc <- calculate_aucc(t_cf_neutrophil_table_sva[["data"]][["outcome"]],
                                     tbl2 = t_cf_eosinophil_table_sva[["data"]][["outcome"]],
                                     py = "deseq_adjp", ly = "deseq_logfc")
t_neut_eo_sva_aucc
## These two tables have an aucc value of: 0.20148477670576 and correlation:
## 
##  Pearson's product-moment correlation
## 
## data:  tbl[[lx]] and tbl[[ly]]
## t = 42, df = 8571, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3973 0.4323
## sample estimates:
##   cor 
## 0.415

16 By visit

For these contrasts, we want to see fail_v1 vs. cure_v1, fail_v2 vs. cure_v2 etc. As a result, we will need to juggle the data slightly and add another set of contrasts.

16.1 Cure/Fail by visits, all cell types

t_visit_cf_all_de_sva <- all_pairwise(t_visitcf, model_batch = "svaseq",
                                      filter = TRUE,
                                      methods = methods)
## 
##    v1_cure v1_failure    v2_cure v2_failure    v3_cure v3_failure 
##         30         24         20         15         17         17
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

t_visit_cf_all_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_visit_cf_all_table_sva <- combine_de_tables(
  t_visit_cf_all_de_sva, keepers = visitcf_contrasts, scale_p = TRUE,
  excel = glue("{cf_prefix}/t_all_visitcf_table_sva-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_visit_cf_all_table_sva
## A set of combined differential expression results.
##                   table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 v1_failure_vs_v1_cure          26            76          26            58
## 2 v2_failure_vs_v2_cure          51            41          43            28
## 3 v3_failure_vs_v3_cure          77            32          33            25
##   limma_sigup limma_sigdown
## 1           9            17
## 2           3             0
## 3           3             0
## Plot describing unique/shared genes in a differential expression table.

t_visit_cf_all_sig_sva <- extract_significant_genes(
  t_visit_cf_all_table_sva,
  excel = glue("{cf_prefix}/t_all_visitcf_sig_sva-v{ver}.xlsx"))
t_visit_cf_all_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##      limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## v1cf        9         17       26         58       26         76        0
## v2cf        3          0       43         28       51         41        0
## v3cf        3          0       33         25       77         32        1
##      ebseq_down basic_up basic_down
## v1cf         37        0          0
## v2cf          0        0          0
## v3cf          0        0          0

16.2 Cure/Fail by visit, Monocytes

In the following block, I am including all samples for the monocytes and splitting them up by visit and then comparing v1 cure/fail, v2 cure/fail, v3 cure/fail.

I expect that this should be more robust than the datasets of only visit 1.

visitcf_factor <- paste0("v", pData(t_monocytes)[["visitnumber"]], "_",
                         pData(t_monocytes)[["finaloutcome"]])
t_monocytes_visitcf <- set_expt_conditions(t_monocytes, fact = visitcf_factor)
## The numbers of samples by condition are:
## 
##    v1_cure v1_failure    v2_cure v2_failure    v3_cure v3_failure 
##          8          8          7          6          6          7
t_visit_cf_monocyte_de_sva <- all_pairwise(t_monocytes_visitcf, model_batch = "svaseq",
                                           filter = TRUE,
                                           methods = methods)
## 
##    v1_cure v1_failure    v2_cure v2_failure    v3_cure v3_failure 
##          8          8          7          6          6          7
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

t_visit_cf_monocyte_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_visit_cf_monocyte_table_sva <- combine_de_tables(
  t_visit_cf_monocyte_de_sva, keepers = visitcf_contrasts, scale_p = TRUE,
  excel = glue("{cf_prefix}/Monocytes/t_monocyte_visitcf_table_sva-v{ver}.xlsx"))
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_visit_cf_monocyte_table_sva
## A set of combined differential expression results.
##                   table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 v1_failure_vs_v1_cure          15            10          10            13
## 2 v2_failure_vs_v2_cure           0             0           0             0
## 3 v3_failure_vs_v3_cure           0             0           0             0
##   limma_sigup limma_sigdown
## 1           1             1
## 2           0             0
## 3           0             0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

t_visit_cf_monocyte_sig_sva <- extract_significant_genes(
  t_visit_cf_monocyte_table_sva,
  excel = glue("{cf_prefix}/Monocytes/t_monocyte_visitcf_sig_sva-v{ver}.xlsx"))
t_visit_cf_monocyte_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##      limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## v1cf        1          1       10         13       15         10        0
## v2cf        0          0        0          0        0          0        1
## v3cf        0          0        0          0        0          0        0
##      ebseq_down basic_up basic_down
## v1cf         15        0          0
## v2cf          5        0          0
## v3cf          1        0          0

t_v1fc_deseq_ma <- t_visit_cf_monocyte_table_sva[["plots"]][["v1cf"]][["deseq_ma_plots"]]
dev <- pp(file = "images/monocyte_cf_de_v1_maplot.png")
t_v1fc_deseq_ma
closed <- dev.off()
t_v1fc_deseq_ma

t_v2fc_deseq_ma <- t_visit_cf_monocyte_table_sva[["plots"]][["v2cf"]][["deseq_ma_plots"]]
dev <- pp(file = "images/monocyte_cf_de_v2_maplot.png")
t_v2fc_deseq_ma
closed <- dev.off()
t_v2fc_deseq_ma

t_v3fc_deseq_ma <- t_visit_cf_monocyte_table_sva[["plots"]][["v3cf"]][["deseq_ma_plots"]]
dev <- pp(file = "images/monocyte_cf_de_v3_maplot.png")
t_v3fc_deseq_ma
closed <- dev.off()
t_v3fc_deseq_ma

One query from Alejandro is to look at the genes shared up/down across visits. I am not entirely certain we have enough samples for this to work, but let us find out.

I am thinking this is a good place to use the AUCC curves I learned about thanks to Julie Cridland.

Note that the following uses all monocyte samples; perhaps it should be moved up, with a version using only the Tumaco samples placed here?

v1cf <- t_visit_cf_monocyte_table_sva[["data"]][["v1cf"]]
v2cf <- t_visit_cf_monocyte_table_sva[["data"]][["v2cf"]]
v3cf <- t_visit_cf_monocyte_table_sva[["data"]][["v3cf"]]

v1_sig <- c(
  rownames(t_visit_cf_monocyte_sig_sva[["deseq"]][["ups"]][["v1cf"]]),
  rownames(t_visit_cf_monocyte_sig_sva[["deseq"]][["downs"]][["v1cf"]]))
length(v1_sig)
## [1] 25
v2_sig <- c(
  rownames(t_visit_cf_monocyte_sig_sva[["deseq"]][["ups"]][["v2cf"]]),
  rownames(t_visit_cf_monocyte_sig_sva[["deseq"]][["downs"]][["v2cf"]]))
length(v2_sig)
## [1] 0
v3_sig <- c(
  rownames(t_visit_cf_monocyte_sig_sva[["deseq"]][["ups"]][["v3cf"]]),
  rownames(t_visit_cf_monocyte_sig_sva[["deseq"]][["downs"]][["v3cf"]]))
length(v3_sig)
## [1] 0
t_monocyte_visit_aucc_v2v1 <- calculate_aucc(v1cf, tbl2 = v2cf,
                                             py = "deseq_adjp", ly = "deseq_logfc")
dev <- pp(file = "images/monocyte_visit_v2v1_aucc.png")
t_monocyte_visit_aucc_v2v1[["plot"]]
closed <- dev.off()
t_monocyte_visit_aucc_v2v1[["plot"]]

t_monocyte_visit_aucc_v3v1 <- calculate_aucc(v1cf, tbl2 = v3cf,
                                             py = "deseq_adjp", ly = "deseq_logfc")
dev <- pp(file = "images/monocyte_visit_v3v1_aucc.png")
t_monocyte_visit_aucc_v3v1[["plot"]]
closed <- dev.off()
t_monocyte_visit_aucc_v3v1[["plot"]]
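For readers unfamiliar with the AUCC, the following is a minimal base-R sketch of the general idea: rank the genes in each table, count how many of the top-i genes are shared for i = 1..k, and normalize the area under that curve by its maximum possible value. This is not the implementation used by calculate_aucc(); the function name aucc_sketch(), the default k = 100, and the trapezoidal normalization by k^2/2 are all assumptions made for illustration.

```r
## Hypothetical helper: area under the concordance curve for two gene rankings.
## rank1 and rank2 are character vectors of gene IDs, most significant first
## (for example, ordered by adjusted p-value).
aucc_sketch <- function(rank1, rank2, k = 100) {
  k <- min(k, length(rank1), length(rank2))
  ## Number of genes shared among the top-i of each ranking, for i = 1..k.
  shared <- vapply(seq_len(k), function(i) {
    length(intersect(rank1[seq_len(i)], rank2[seq_len(i)]))
  }, numeric(1))
  ## Trapezoidal area under the curve, starting from the point (0, 0).
  y <- c(0, shared)
  area <- sum((y[-1] + y[-length(y)]) / 2)
  ## Two identical rankings give an area of k^2 / 2, so this scales to [0, 1].
  area / (k^2 / 2)
}

## Toy rankings: identical lists are fully concordant, disjoint lists are not.
same <- aucc_sketch(letters, letters, k = 10)
none <- aucc_sketch(letters[1:10], letters[11:20], k = 10)
```

Under these assumptions, a value near 1 means the two contrasts agree on which genes are most significant, while a value near 0 means their top genes barely overlap.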

16.3 Cure/Fail by visit, Neutrophils

visitcf_factor <- paste0("v", pData(t_neutrophils)[["visitnumber"]], "_",
                         pData(t_neutrophils)[["finaloutcome"]])
t_neutrophil_visitcf <- set_expt_conditions(t_neutrophils, fact = visitcf_factor)
## The numbers of samples by condition are:
## 
##    v1_cure v1_failure    v2_cure v2_failure    v3_cure v3_failure 
##          8          8          7          6          5          7
t_visit_cf_neutrophil_de_sva <- all_pairwise(t_neutrophil_visitcf, model_batch = "svaseq",
                                             filter = TRUE,
                                             methods = methods)
## 
##    v1_cure v1_failure    v2_cure v2_failure    v3_cure v3_failure 
##          8          8          7          6          5          7
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

t_visit_cf_neutrophil_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_visit_cf_neutrophil_table_sva <- combine_de_tables(
  t_visit_cf_neutrophil_de_sva, keepers = visitcf_contrasts, scale_p = TRUE,
  excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_visitcf_table_sva-v{ver}.xlsx"))
## Deleting the file analyses/4_tumaco/DE_Cure_Fail/Neutrophils/t_neutrophil_visitcf_table_sva-v202501.xlsx before writing the tables.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_visit_cf_neutrophil_table_sva
## A set of combined differential expression results.
##                   table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 v1_failure_vs_v1_cure          12             6           6             6
## 2 v2_failure_vs_v2_cure           2             6           2             3
## 3 v3_failure_vs_v3_cure           2             2           0             2
##   limma_sigup limma_sigdown
## 1           1             0
## 2           0             0
## 3           0             0
## Plot describing unique/shared genes in a differential expression table.

t_visit_cf_neutrophil_sig_sva <- extract_significant_genes(
  t_visit_cf_neutrophil_table_sva,
  excel = glue("{cf_prefix}/Neutrophils/t_neutrophil_visitcf_sig_sva-v{ver}.xlsx"))
## Deleting the file analyses/4_tumaco/DE_Cure_Fail/Neutrophils/t_neutrophil_visitcf_sig_sva-v202501.xlsx before writing the tables.
t_visit_cf_neutrophil_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##      limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## v1cf        1          0        6          6       12          6        0
## v2cf        0          0        2          3        2          6        1
## v3cf        0          0        0          2        2          2        2
##      ebseq_down basic_up basic_down
## v1cf          2        0          0
## v2cf          1        0          0
## v3cf          3        0          0

16.4 Cure/Fail by visit, Eosinophils

visitcf_factor <- paste0("v", pData(t_eosinophils)[["visitnumber"]], "_",
                         pData(t_eosinophils)[["finaloutcome"]])
t_eosinophil_visitcf <- set_expt_conditions(t_eosinophils, fact = visitcf_factor)
## The numbers of samples by condition are:
## 
##    v1_cure v1_failure    v2_cure v2_failure    v3_cure v3_failure 
##          5          3          6          3          6          3
t_visit_cf_eosinophil_de_sva <- all_pairwise(t_eosinophil_visitcf, model_batch = "svaseq",
                                             filter = TRUE,
                                             methods = methods, keepers = visitcf_contrasts)
## 
##    v1_cure v1_failure    v2_cure v2_failure    v3_cure v3_failure 
##          5          3          6          3          6          3
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, v2_cure_vs_v1_cure and edger,
## v2_cure_vs_v1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, v2_failure_vs_v1_cure and edger,
## v2_failure_vs_v1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, v2_cure_vs_v1_cure and limma,
## v2_cure_vs_v1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, v2_failure_vs_v1_cure and limma,
## v2_failure_vs_v1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, v2_cure_vs_v1_cure and noiseq,
## v2_cure_vs_v1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, v2_failure_vs_v1_cure and noiseq,
## v2_failure_vs_v1_cure failed.

t_visit_cf_eosinophil_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_visit_cf_eosinophil_table_sva <- combine_de_tables(
  t_visit_cf_eosinophil_de_sva, keepers = visitcf_contrasts, scale_p = TRUE,
  excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_visitcf_table_sva-v{ver}.xlsx"))
## Deleting the file analyses/4_tumaco/DE_Cure_Fail/Eosinophils/t_eosinophil_visitcf_table_sva-v202501.xlsx before writing the tables.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_visit_cf_eosinophil_table_sva
## A set of combined differential expression results.
##                   table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 v1_failure_vs_v1_cure           9            11           2             3
## 2 v2_failure_vs_v2_cure           4             3           5             2
## 3 v3_failure_vs_v3_cure          14             7          17             2
##   limma_sigup limma_sigdown
## 1           0             1
## 2           0             0
## 3           0             0
## Plot describing unique/shared genes in a differential expression table.

t_visit_cf_eosinophil_sig_sva <- extract_significant_genes(
  t_visit_cf_eosinophil_table_sva,
  excel = glue("{cf_prefix}/Eosinophils/t_eosinophil_visitcf_sig_sva-v{ver}.xlsx"))
## Deleting the file analyses/4_tumaco/DE_Cure_Fail/Eosinophils/t_eosinophil_visitcf_sig_sva-v202501.xlsx before writing the tables.
t_visit_cf_eosinophil_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##      limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## v1cf        0          1        2          3        9         11        4
## v2cf        0          0        5          2        4          3       11
## v3cf        0          0       17          2       14          7        3
##      ebseq_down basic_up basic_down
## v1cf         86        0          0
## v2cf         18        0          0
## v3cf         10        0          0

17 Shared genes in visit 1

Let us see how many genes are shared across the three cell types using only the visit 1 data.

observed_v1_eosinophils <- c(
  rownames(t_cf_eosinophil_v1_sig_sva[["deseq"]][["ups"]][["outcome"]]),
  rownames(t_cf_eosinophil_v1_sig_sva[["deseq"]][["downs"]][["outcome"]]))
observed_v1_monocytes <- c(
  rownames(t_cf_monocyte_v1_sig_sva[["deseq"]][["ups"]][["outcome"]]),
  rownames(t_cf_monocyte_v1_sig_sva[["deseq"]][["downs"]][["outcome"]]))
observed_v1_neutrophils <- c(
  rownames(t_cf_neutrophil_v1_sig_sva[["deseq"]][["ups"]][["outcome"]]),
  rownames(t_cf_neutrophil_v1_sig_sva[["deseq"]][["downs"]][["outcome"]]))
venn_input <- list(
  "eosinophil" = observed_v1_eosinophils,
  "monocyte" = observed_v1_monocytes,
  "neutrophils" = observed_v1_neutrophils)
shared <- Vennerable::Venn(venn_input)
shared
## A Venn object on 3 sets named
## eosinophil,monocyte,neutrophils 
## 000 100 010 110 001 101 011 111 
##   0  30  63   2  12   0   1   0
Vennerable::plot(shared)
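The printed Venn weights are labelled by membership pattern, one digit per set in the order eosinophil, monocyte, neutrophils; so '110' counts the genes found in both the eosinophil and monocyte lists but not the neutrophil list. The following base-R sketch reproduces that bookkeeping on toy gene lists; the helper name venn_pattern_counts() and the gene IDs are hypothetical, and this is not Vennerable's own code.

```r
## Hypothetical helper: count elements by set-membership pattern.
## 'sets' is a named list of character vectors; the pattern string has one
## digit per set, in list order (1 = present in that set, 0 = absent).
venn_pattern_counts <- function(sets) {
  universe <- unique(unlist(sets))
  membership <- do.call(cbind, lapply(sets, function(s) as.integer(universe %in% s)))
  pattern <- apply(membership, 1, paste0, collapse = "")
  counts <- table(pattern)
  setNames(as.integer(counts), names(counts))
}

## Toy gene lists standing in for the three visit 1 significance sets.
toy_sets <- list(
  "eosinophil" = c("g1", "g2", "g3"),
  "monocyte" = c("g2", "g3", "g4", "g5"),
  "neutrophils" = c("g5", "g6"))
toy_counts <- venn_pattern_counts(toy_sets)
toy_counts
```

Patterns with no members (here '101' and '111') are simply absent from the result, whereas the Venn object prints them with a weight of 0.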

Najib suggests that we should look at all cell types together at visit 1. Let us try and see what happens… Oh, I already did this in the block ‘Separate the Tumaco data by visit’ above.

Let us add a new block in which we test a concern: if we explicitly add visit to the model (with sva, and potentially without it as well), will that change the results we observe? My assumption is that it should change the results only minimally, but we should make certain that this is true. The neutrophils are the place to test this first because they show some of the highest variance observed in the data.

Therefore I want to have an instance of the pairwise contrast that has a model of ~ finaloutcome + visitnumber + SVs where the SVs come from an invocation of sva which also has finaloutcome + visitnumber before the null model.

In theory, all_pairwise() is able to do this via the argument alt_model, but it may be safer to do it manually in order to absolutely ensure that nothing unintended happens.
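A minimal sketch of what the manual route might look like, using simulated counts rather than the neutrophil data. The sample sheet, the simulated hidden batch, the choice of two surrogate variables, and the use of ~ visitnumber as the null model are all assumptions for illustration; the sva and DESeq2 steps only run when those packages are installed.

```r
set.seed(1)
## Hypothetical sample sheet; column names mirror those used in this document.
meta <- data.frame(
  finaloutcome = factor(rep(c("cure", "failure"), times = 6),
                        levels = c("cure", "failure")),
  visitnumber = factor(rep(c(1, 2, 3), each = 4)),
  row.names = paste0("s", 1:12))

## Full model protects outcome and visit; the null model (an assumption here)
## keeps visit and drops only the outcome.
mod <- model.matrix(~ finaloutcome + visitnumber, data = meta)
mod0 <- model.matrix(~ visitnumber, data = meta)

if (requireNamespace("sva", quietly = TRUE) &&
    requireNamespace("DESeq2", quietly = TRUE)) {
  ## Simulated counts with a hidden batch affecting the first 100 genes.
  batch <- rep(c(0, 0, 1, 1), times = 3)
  mu <- 100 * outer(c(rep(3, 100), rep(1, 400)), batch, `^`)
  counts <- matrix(rnbinom(length(mu), mu = as.vector(mu), size = 5),
                   nrow = 500,
                   dimnames = list(paste0("gene", 1:500), rownames(meta)))

  ## Estimate surrogate variables given both outcome and visit.
  sv <- sva::svaseq(counts, mod, mod0, n.sv = 2)[["sv"]]
  colnames(sv) <- paste0("SV", seq_len(ncol(sv)))

  ## Fit ~ finaloutcome + visitnumber + SVs and extract failure vs. cure.
  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = counts, colData = cbind(meta, sv),
    design = ~ finaloutcome + visitnumber + SV1 + SV2)
  dds <- DESeq2::DESeq(dds, quiet = TRUE)
  res <- DESeq2::results(dds, contrast = c("finaloutcome", "failure", "cure"))
}
```

The resulting log fold changes could then be correlated against the all_pairwise() result in the same way as the sva vs. batch-in-visit comparison above.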

18 Persistence in visit 3

Having put some SL read mapping information in the sample sheet, Maria Adelaida used it to add a new column with the putative persistence state on a per-sample basis. One question which arose from that: what differences are observable between the persistent yes vs. no samples on a per-cell-type basis among the visit 3 samples?

18.1 Setting up

First things first, create the datasets.

persistence_expt <- subset_expt(t_clinical, subset = "persistence=='Y'|persistence=='N'") %>%
  subset_expt(subset = 'visitnumber==3') %>%
  set_expt_conditions(fact = 'persistence')
## subset_expt(): There were 123, now there are 83 samples.
## subset_expt(): There were 83, now there are 30 samples.
## The numbers of samples by condition are:
## 
##  N  Y 
##  6 24
## persistence_biopsy <- subset_expt(persistence_expt, subset = "typeofcells=='biopsy'")
persistence_monocyte <- subset_expt(persistence_expt, subset = "typeofcells=='monocytes'")
## subset_expt(): There were 30, now there are 12 samples.
persistence_neutrophil <- subset_expt(persistence_expt, subset = "typeofcells=='neutrophils'")
## subset_expt(): There were 30, now there are 10 samples.
persistence_eosinophil <- subset_expt(persistence_expt, subset = "typeofcells=='eosinophils'")
## subset_expt(): There were 30, now there are 8 samples.

18.2 Take a look

See if there are any patterns which look usable.

## All
persistence_norm <- normalize_expt(persistence_expt, transform = "log2", convert = "cpm",
                                   norm = "quant", filter = TRUE)
## Removing 2767 low-count genes (11389 remaining).
## transform_counts: Found 15 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_norm)[["plot"]]

persistence_nb <- normalize_expt(persistence_expt, transform = "log2", convert = "cpm",
                                 batch = "svaseq", filter = TRUE)
## Removing 2767 low-count genes (11389 remaining).
## Setting 1544 low elements to zero.
## transform_counts: Found 1544 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_nb)[["plot"]]

## Biopsies
##persistence_biopsy_norm <- normalize_expt(persistence_biopsy, transform = "log2", convert = "cpm",
##                                   norm = "quant", filter = TRUE)
##plot_pca(persistence_biopsy_norm)[["plot"]]
## Insufficient data

## Monocytes
persistence_monocyte_norm <- normalize_expt(persistence_monocyte, transform = "log2", convert = "cpm",
                                            norm = "quant", filter = TRUE)
## Removing 3827 low-count genes (10329 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_monocyte_norm)[["plot"]]

persistence_monocyte_nb <- normalize_expt(persistence_monocyte, transform = "log2", convert = "cpm",
                                          batch = "svaseq", filter = TRUE)
## Removing 3827 low-count genes (10329 remaining).
## Setting 47 low elements to zero.
## transform_counts: Found 47 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_monocyte_nb)[["plot"]]

## Neutrophils
persistence_neutrophil_norm <- normalize_expt(persistence_neutrophil, transform = "log2", convert = "cpm",
                                              norm = "quant", filter = TRUE)
## Removing 5762 low-count genes (8394 remaining).
## transform_counts: Found 2 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_neutrophil_norm)[["plot"]]

persistence_neutrophil_nb <- normalize_expt(persistence_neutrophil, transform = "log2", convert = "cpm",
                                            batch = "svaseq", filter = TRUE)
## Removing 5762 low-count genes (8394 remaining).
## Setting 46 low elements to zero.
## transform_counts: Found 46 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_neutrophil_nb)[["plot"]]

## Eosinophils
persistence_eosinophil_norm <- normalize_expt(persistence_eosinophil, transform = "log2", convert = "cpm",
                                              norm = "quant", filter = TRUE)
## Removing 4126 low-count genes (10030 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_eosinophil_norm)[["plot"]]

persistence_eosinophil_nb <- normalize_expt(persistence_eosinophil, transform = "log2", convert = "cpm",
                                            batch = "svaseq", filter = TRUE)
## Removing 4126 low-count genes (10030 remaining).
## Setting 25 low elements to zero.
## transform_counts: Found 25 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_eosinophil_nb)[["plot"]]
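
For anyone reading without hpgltools at hand, here is a rough base-R sketch of what I understand the normalize-then-PCA steps above to be doing: filter low-count genes, quantile normalize, convert to CPM, log2 transform, then run a PCA on the samples. It uses simulated counts. The filter threshold, the helper quantile_normalize(), and the order of operations are my assumptions for illustration and are not taken from normalize_expt() itself.

```r
# Simulated counts: 200 genes x 8 samples (hypothetical data, not TMRC3).
set.seed(1)
counts <- matrix(rnbinom(200 * 8, mu = 50, size = 2), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:8)))

# 1. Filter: drop genes with a very low mean count (threshold is arbitrary).
keep <- rowMeans(counts) >= 2
filtered <- counts[keep, , drop = FALSE]

# 2. Quantile normalize: force every sample onto the same distribution.
quantile_normalize <- function(mat) {
  ranks <- apply(mat, 2, rank, ties.method = "first")
  sorted_means <- rowMeans(apply(mat, 2, sort))
  normed <- apply(ranks, 2, function(r) sorted_means[r])
  dimnames(normed) <- dimnames(mat)
  normed
}
qn <- quantile_normalize(filtered)

# 3. Convert to counts per million; 4. log2 transform with a pseudocount of 1.
cpm <- t(t(qn) / colSums(qn)) * 1e6
log_cpm <- log2(cpm + 1)

# 5. PCA with samples as the observations.
pca <- prcomp(t(log_cpm), center = TRUE, scale. = FALSE)
pct_var <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)
head(pca$x[, 1:2])
```

With real data one would color the first two columns of pca$x by condition, which is what plot_pca() does for me.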

18.3 persistence DE

This subset is pretty sparse (6 'N' vs. 24 'Y' samples across all cell types), so I think it is unlikely to yield any interesting results.

persistence_de_sva <- all_pairwise(persistence_expt, filter = TRUE, methods = methods,
                                   model_batch = "svaseq")
## 
##  N  Y 
##  6 24
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

persistence_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 Y_vs_N
## basic_vs_deseq  0.7178
## basic_vs_dream  0.7992
## basic_vs_ebseq  0.7451
## basic_vs_edger  0.7791
## basic_vs_limma  0.8217
## basic_vs_noiseq 0.9152
## deseq_vs_dream  0.8040
## deseq_vs_ebseq  0.7777
## deseq_vs_edger  0.9605
## deseq_vs_limma  0.8112
## deseq_vs_noiseq 0.7448
## dream_vs_ebseq  0.7899
## dream_vs_edger  0.8695
## dream_vs_limma  0.9789
## dream_vs_noiseq 0.7236
## ebseq_vs_edger  0.7900
## ebseq_vs_limma  0.7876
## ebseq_vs_noiseq 0.8327
## edger_vs_limma  0.8765
## edger_vs_noiseq 0.8002
## limma_vs_noiseq 0.7477
persistence_table_sva <- combine_de_tables(
  persistence_de_sva, scale_p = TRUE,
  excel = glue("{xlsx_prefix}/DE_Persistence/persistence_all_de_sva-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
persistence_table_sva
## A set of combined differential expression results.
##    table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup
## 1 Y_vs_N          55            44          26            49           7
##   limma_sigdown
## 1            22
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

persistence_monocyte_de_sva <- all_pairwise(persistence_monocyte, filter = TRUE,
                                            model_batch = "svaseq",
                                            methods = methods)
## 
##  N  Y 
##  2 10
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

persistence_monocyte_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 Y_vs_N
## basic_vs_deseq  0.9237
## basic_vs_dream  0.9683
## basic_vs_ebseq  0.9209
## basic_vs_edger  0.9245
## basic_vs_limma  0.9858
## basic_vs_noiseq 0.9405
## deseq_vs_dream  0.9268
## deseq_vs_ebseq  0.9808
## deseq_vs_edger  1.0000
## deseq_vs_limma  0.9260
## deseq_vs_noiseq 0.9677
## dream_vs_ebseq  0.9180
## dream_vs_edger  0.9277
## dream_vs_limma  0.9821
## dream_vs_noiseq 0.9421
## ebseq_vs_edger  0.9809
## ebseq_vs_limma  0.9239
## ebseq_vs_noiseq 0.9823
## edger_vs_limma  0.9270
## edger_vs_noiseq 0.9685
## limma_vs_noiseq 0.9426
persistence_monocyte_table_sva <- combine_de_tables(
  persistence_monocyte_de_sva, scale_p = TRUE,
  excel = glue("{xlsx_prefix}/DE_Persistence/persistence_monocyte_de_sva-v{ver}.xlsx"))
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
persistence_monocyte_table_sva
## A set of combined differential expression results.
##    table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup
## 1 Y_vs_N           1             0           0             1           0
##   limma_sigdown
## 1             0
## Only Y_vs_N_up has information, cannot create an UpSet.
## Plot describing unique/shared genes in a differential expression table.
## NULL
persistence_neutrophil_de_sva <- all_pairwise(persistence_neutrophil, filter = TRUE,
                                              model_batch = "svaseq",
                                              methods = methods)
## 
## N Y 
## 3 7
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

persistence_neutrophil_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 Y_vs_N
## basic_vs_deseq  0.8270
## basic_vs_dream  0.8581
## basic_vs_ebseq  0.9144
## basic_vs_edger  0.8283
## basic_vs_limma  0.8808
## basic_vs_noiseq 0.9393
## deseq_vs_dream  0.9564
## deseq_vs_ebseq  0.7485
## deseq_vs_edger  0.9985
## deseq_vs_limma  0.9407
## deseq_vs_noiseq 0.8211
## dream_vs_ebseq  0.7597
## dream_vs_edger  0.9558
## dream_vs_limma  0.9858
## dream_vs_noiseq 0.8212
## ebseq_vs_edger  0.7601
## ebseq_vs_limma  0.7776
## ebseq_vs_noiseq 0.9725
## edger_vs_limma  0.9408
## edger_vs_noiseq 0.8250
## limma_vs_noiseq 0.8296
persistence_neutrophil_table_sva <- combine_de_tables(
  persistence_neutrophil_de_sva, scale_p = TRUE,
  excel = glue("{xlsx_prefix}/DE_Persistence/persistence_neutrophil_de_sva-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
persistence_neutrophil_table_sva
## A set of combined differential expression results.
##    table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup
## 1 Y_vs_N          26            49          17            35           0
##   limma_sigdown
## 1             0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

## There are insufficient samples (1) in the 'N' category.
##persistence_eosinophil_de_sva <- all_pairwise(persistence_eosinophil, filter = TRUE,
##                                              model_batch = "svaseq",
##                                              methods = methods)
##persistence_eosinophil_de_sva
##persistence_eosinophil_table_sva <- combine_de_tables(
##  persistence_eosinophil_de_sva,
##  excel = glue("{xlsx_prefix}/DE_Persistence/persistence_eosinophil_de_sva-v{ver}.xlsx"))
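
The 'insufficient samples' check which led me to disable the eosinophil block can be sketched in a few lines. This is a minimal stand-alone illustration; enough_samples() and its min_n default are hypothetical and are not part of all_pairwise(). The group sizes come from the sample counts printed above (8 eosinophil samples, 1 of them 'N'; 12 monocyte samples split 2/10).

```r
# Hypothetical helper: report whether every condition has at least
# `min_n` samples before attempting a pairwise contrast.
enough_samples <- function(condition, min_n = 2) {
  counts <- table(condition)
  too_small <- names(counts)[counts < min_n]
  list(ok = length(too_small) == 0, counts = counts, too_small = too_small)
}

# Eosinophil persistence: 1 'N' and 7 'Y' samples.
eo_check <- enough_samples(c("N", rep("Y", 7)))
eo_check$ok         # FALSE
eo_check$too_small  # "N"

# Monocyte persistence: 2 'N' and 10 'Y' samples, which barely passes.
mono_check <- enough_samples(c(rep("N", 2), rep("Y", 10)))
mono_check$ok       # TRUE
```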

19 Comparing visits without regard to cure/fail

In the following, I am hoping to lower variance associated with factors other than visit via sva and therefore be able to see what genes are changing for everyone with respect to time.

This is the one instance where I think it would be really nice to have biopsy samples for all three visits; I presume that we would have a really nice signal of stuff like keratin and other wound-healing associated genes.
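
To make the 'lower variance via sva' idea concrete, here is a toy base-R sketch on simulated data. It is deliberately not the svaseq algorithm (which, as I understand it, works on counts and iteratively reweights genes); it only shows the core idea of estimating an unmodeled factor from the residuals of the visit model and then adding it to the design. All names and effect sizes are made up for the illustration.

```r
set.seed(42)
n_genes <- 500
n_samples <- 30
visit <- factor(rep(1:3, each = 10))
hidden <- rnorm(n_samples)           # unmodeled factor, e.g. donor or batch
loadings <- rnorm(n_genes, sd = 2)   # every gene responds to the hidden factor
expr <- outer(loadings, hidden) +
  matrix(rnorm(n_genes * n_samples), nrow = n_genes)
# Only the first 25 genes change with visit.
visit_effect <- c(0, 1, 2)[visit]
expr[1:25, ] <- sweep(expr[1:25, ], 2, visit_effect, "+")

# Residuals after removing the modeled (visit) effect, gene by gene.
design <- model.matrix(~ visit)
resid <- t(lm.fit(design, t(expr))$residuals)   # genes x samples

# The first right singular vector of the residuals serves as one surrogate variable.
sv <- svd(resid)$v[, 1]

# It should track the part of the hidden factor not explained by visit
# (the sign of a singular vector is arbitrary, hence abs()).
hidden_resid <- lm.fit(design, hidden)$residuals
abs(cor(sv, hidden_resid))

# Adding the surrogate to the design lowers the residual sum of squares.
design_sv <- cbind(design, sv = sv)
rss_without <- sum(resid^2)
rss_with <- sum(lm.fit(design_sv, t(expr))$residuals^2)
c(without = rss_without, with = rss_with)
```

The smaller residual variance is what should make the visit effect easier to see in the contrasts which follow.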

19.1 All cell types

t_visit_all_de_sva <- all_pairwise(t_visit, filter = TRUE, methods = methods,
                                   model_batch = "svaseq")
## 
##  3  2  1 
## 34 35 40
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## The contrast c2 is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast c3 is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast c3 is not in the results.
## If this is not an extra contrast, then this is an error.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of basic, c2_vs_c1 and ebseq, c2_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of basic, c3_vs_c1 and ebseq, c3_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of basic, c3_vs_c2 and ebseq, c3_vs_c2 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of deseq, c2_vs_c1 and ebseq, c2_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of deseq, c3_vs_c1 and ebseq, c3_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of deseq, c3_vs_c2 and ebseq, c3_vs_c2 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of dream, c2_vs_c1 and ebseq, c2_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of dream, c3_vs_c1 and ebseq, c3_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of dream, c3_vs_c2 and ebseq, c3_vs_c2 failed.
## Error in if (contr %in% extra_eval_names) {: argument is of length zero

t_visit_all_de_sva
## Error in eval(expr, envir, enclos): object 't_visit_all_de_sva' not found
t_visit_all_table_sva <- combine_de_tables(
  t_visit_all_de_sva, keepers = visit_contrasts, scale_p = TRUE,
  excel = glue("{xlsx_prefix}/DE_Visits/t_all_visit_table_sva-v{ver}.xlsx"))
## Error in eval(expr, envir, enclos): object 't_visit_all_de_sva' not found
t_visit_all_table_sva
## Error in eval(expr, envir, enclos): object 't_visit_all_table_sva' not found
t_visit_all_sig_sva <- extract_significant_genes(
  t_visit_all_table_sva,
  excel = glue("{xlsx_prefix}/DE_Visits/t_all_visit_sig_sva-v{ver}.xlsx"))
## Error in eval(expr, envir, enclos): object 't_visit_all_table_sva' not found
t_visit_all_sig_sva
## Error in eval(expr, envir, enclos): object 't_visit_all_sig_sva' not found

19.2 Monocyte samples

t_visit_monocytes <- set_expt_conditions(t_monocytes, fact = "visitnumber")
## The numbers of samples by condition are:
## 
##  3  2  1 
## 13 13 16
t_visit_monocyte_de_sva <- all_pairwise(t_visit_monocytes, filter = TRUE,
                                        model_batch = "svaseq",
                                        methods = methods)
## 
##  3  2  1 
## 13 13 16
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## The contrast c2 is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast c3 is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast c3 is not in the results.
## If this is not an extra contrast, then this is an error.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of basic, c2_vs_c1 and ebseq, c2_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of basic, c3_vs_c1 and ebseq, c3_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of basic, c3_vs_c2 and ebseq, c3_vs_c2 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of deseq, c2_vs_c1 and ebseq, c2_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of deseq, c3_vs_c1 and ebseq, c3_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of deseq, c3_vs_c2 and ebseq, c3_vs_c2 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of dream, c2_vs_c1 and ebseq, c2_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of dream, c3_vs_c1 and ebseq, c3_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of dream, c3_vs_c2 and ebseq, c3_vs_c2 failed.
## Error in if (contr %in% extra_eval_names) {: argument is of length zero

t_visit_monocyte_de_sva
## Error in eval(expr, envir, enclos): object 't_visit_monocyte_de_sva' not found
t_visit_monocyte_table_sva <- combine_de_tables(
  t_visit_monocyte_de_sva, keepers = visit_contrasts, scale_p = TRUE,
  excel = glue("{xlsx_prefix}/DE_Visits/Monocytes/t_monocyte_visit_table_sva-v{ver}.xlsx"))
## Error in eval(expr, envir, enclos): object 't_visit_monocyte_de_sva' not found
t_visit_monocyte_table_sva
## Error in eval(expr, envir, enclos): object 't_visit_monocyte_table_sva' not found
t_visit_monocyte_sig_sva <- extract_significant_genes(
  t_visit_monocyte_table_sva,
  excel = glue("{xlsx_prefix}/DE_Visits/Monocytes/t_monocyte_visit_sig_sva-v{ver}.xlsx"))
## Error in eval(expr, envir, enclos): object 't_visit_monocyte_table_sva' not found
t_visit_monocyte_sig_sva
## Error in eval(expr, envir, enclos): object 't_visit_monocyte_sig_sva' not found

19.3 Neutrophil samples

t_visit_neutrophils <- set_expt_conditions(t_neutrophils, fact = "visitnumber")
## The numbers of samples by condition are:
## 
##  3  2  1 
## 12 13 16
t_visit_neutrophil_de_sva <- all_pairwise(t_visit_neutrophils, filter = TRUE,
                                          model_batch = "svaseq",
                                          methods = methods)
## 
##  3  2  1 
## 12 13 16
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## The contrast c2 is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast c3 is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast c3 is not in the results.
## If this is not an extra contrast, then this is an error.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of basic, c2_vs_c1 and ebseq, c2_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of basic, c3_vs_c1 and ebseq, c3_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of basic, c3_vs_c2 and ebseq, c3_vs_c2 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of deseq, c2_vs_c1 and ebseq, c2_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of deseq, c3_vs_c1 and ebseq, c3_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of deseq, c3_vs_c2 and ebseq, c3_vs_c2 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of dream, c2_vs_c1 and ebseq, c2_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of dream, c3_vs_c1 and ebseq, c3_vs_c1 failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of dream, c3_vs_c2 and ebseq, c3_vs_c2 failed.
## Error in if (contr %in% extra_eval_names) {: argument is of length zero

t_visit_neutrophil_de_sva
## Error in eval(expr, envir, enclos): object 't_visit_neutrophil_de_sva' not found
t_visit_neutrophil_table_sva <- combine_de_tables(
  t_visit_neutrophil_de_sva, keepers = visit_contrasts, scale_p = TRUE,
  excel = glue("{xlsx_prefix}/DE_Visits/Neutrophils/t_neutrophil_visit_table_sva-v{ver}.xlsx"))
## Error in eval(expr, envir, enclos): object 't_visit_neutrophil_de_sva' not found
t_visit_neutrophil_table_sva
## Error in eval(expr, envir, enclos): object 't_visit_neutrophil_table_sva' not found
t_visit_neutrophil_sig_sva <- extract_significant_genes(
  t_visit_neutrophil_table_sva,
  excel = glue("{xlsx_prefix}/DE_Visits/Neutrophils/t_neutrophil_visit_sig_sva-v{ver}.xlsx"))
## Error in eval(expr, envir, enclos): object 't_visit_neutrophil_table_sva' not found
t_visit_neutrophil_sig_sva
## Error in eval(expr, envir, enclos): object 't_visit_neutrophil_sig_sva' not found

19.4 Eosinophil samples

t_visit_eosinophils <- set_expt_conditions(t_eosinophils, fact="visitnumber")

t_visit_eosinophil_de <- all_pairwise(t_visit_eosinophils, filter = TRUE,
                                      model_batch = "svaseq",
                                      methods = methods)
t_visit_eosinophil_de
t_visit_eosinophil_table <- combine_de_tables(
  t_visit_eosinophil_de, keepers = visit_contrasts, scale_p = TRUE,
  excel = glue("{xlsx_prefix}/DE_Visits/Eosinophils/t_eosinophil_visit_table_sva-v{ver}.xlsx"))
t_visit_eosinophil_table
t_visit_eosinophil_sig <- extract_significant_genes(
  t_visit_eosinophil_table,
  excel = glue("{xlsx_prefix}/DE_Visits/Eosinophils/t_eosinophil_visit_sig_sva-v{ver}.xlsx"))
## No significant genes observed.

20 Explore ROC

Alejandro showed some ROC curves for the eosinophil data, plotting sensitivity vs. specificity for a couple of genes which were observed in v1 eosinophils vs. all-times eosinophils across cure/fail. I am curious to better understand how this was done and what utility it might have in other contexts.

To that end, I want to try something similar myself. In order to properly perform the analysis with these various tools, I need to reconfigure the data in a pretty specific format:

  1. Single df with 1 row per set of observations (sample in this case I think)
  2. The outcome column(s) need to be 1 (or more?) metadata factor(s): cure/fail, or a paste0 of relevant queries (eo_v1_cure, eo_v123_cure, etc.)
  3. The predictor column(s) are the measurements (rpkm of 1 or more genes), 1 column each gene.

If I intend to use this for our tx data, I will likely need a utility function to create the properly formatted input df.
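
A minimal sketch of such a utility function follows, assuming a genes x samples matrix with Ensembl IDs as row names and a metadata data frame with sample IDs as row names. The function name make_roc_input(), the sample names, the 'finaloutcome' column, and the expression values are all hypothetical; only the three gene IDs come from this document.

```r
# Hypothetical helper: one row per sample, one outcome column, one column per gene.
make_roc_input <- function(expr_mat, meta, outcome_col, genes) {
  missing_genes <- setdiff(genes, rownames(expr_mat))
  if (length(missing_genes) > 0) {
    stop("Genes not found in the matrix: ", toString(missing_genes))
  }
  # Keep only the samples present in both the matrix and the metadata.
  shared <- intersect(colnames(expr_mat), rownames(meta))
  predictors <- as.data.frame(t(expr_mat[genes, shared, drop = FALSE]))
  out <- data.frame(sample = shared,
                    outcome = factor(meta[shared, outcome_col]),
                    predictors,
                    check.names = FALSE, stringsAsFactors = FALSE)
  rownames(out) <- NULL
  out
}

# Toy input: random values standing in for rpkm.
set.seed(7)
genes <- c("ENSG00000198178", "ENSG00000179344", "ENSG00000182628")
toy_expr <- matrix(runif(3 * 6, 0, 100), nrow = 3,
                   dimnames = list(genes, paste0("sample", 1:6)))
toy_meta <- data.frame(finaloutcome = rep(c("cure", "failure"), each = 3),
                       row.names = paste0("sample", 1:6))
roc_df <- make_roc_input(toy_expr, toy_meta, "finaloutcome", genes)
str(roc_df)
```

For a combined outcome like eo_v1_cure, one could paste0() the relevant metadata columns into a new column before calling the helper.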

For the purposes of my playing, I will choose three genes from the eosinophil C/F table: one which is significant, one which is not, and one chosen arbitrarily.

The input genes will therefore be chosen from the data structure: t_cf_eosinophil_table_sva:

ENSG00000198178, ENSG00000179344, ENSG00000182628

eo_rpkm <- normalize_expt(tv1_eosinophils, convert = "rpkm", column = "cds_length")
## There appear to be 5355 genes without a length.
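
To convince myself of what a ROC curve is measuring, here is a small base-R sketch which computes sensitivity and specificity at every threshold of a single predictor, plus the AUC via the rank-sum identity. I do not know that this is how Alejandro produced his curves (dedicated packages such as pROC exist for this); the scores and outcomes below are made up, and roc_points() and auc() are hypothetical helper names.

```r
# Sensitivity/specificity when calling 'positive' every sample with score >= threshold.
roc_points <- function(score, outcome, positive) {
  is_pos <- outcome == positive
  thresholds <- sort(unique(score), decreasing = TRUE)
  sens <- vapply(thresholds, function(th) mean(score[is_pos] >= th), numeric(1))
  spec <- vapply(thresholds, function(th) mean(score[!is_pos] < th), numeric(1))
  data.frame(threshold = thresholds, sensitivity = sens, specificity = spec)
}

# AUC via the rank-sum (Mann-Whitney) identity; ties receive average ranks.
auc <- function(score, outcome, positive) {
  is_pos <- outcome == positive
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  r <- rank(score)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data: expression of one gene in 8 samples.
toy_score <- c(9.1, 8.4, 7.7, 6.0, 5.2, 4.8, 3.3, 2.9)
toy_outcome <- c("failure", "failure", "cure", "failure",
                 "cure", "cure", "cure", "cure")

roc <- roc_points(toy_score, toy_outcome, positive = "failure")
roc
toy_auc <- auc(toy_score, toy_outcome, positive = "failure")
toy_auc  # 14 of the 15 failure/cure pairs are ordered correctly: 14/15
```

Plotting 1 - specificity against sensitivity from the roc data frame gives the usual curve.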

21 An external dataset

This paper is DOI:10.1126/scitranslmed.aax4204

Variable gene expression and parasite load predict treatment outcome in cutaneous leishmaniasis.

One query from Maria Adelaida is to see how this data fits with ours. I have read this paper a couple of times now and I get confused on a couple of points every time, which I will explain in a moment. The experimental design is key to my confusion and key to what I think is being missed in our interpretation of the results:

  1. The PCA is not cure vs. fail but healthy skin vs. CL lesion. It should be said that the text makes this perfectly clear, but I can never seem to remember that when I go to look at the data; presumably because I am thinking primarily about cure/fail.

21.1 Only the Scott data

external_norm <- normalize_expt(external_cf, filter = TRUE, norm = "quant",
                                convert = "cpm", transform = "log2")
## Removing 7327 low-count genes (14154 remaining).
plot_pca(external_norm)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure
## Shapes are defined by female, male.

external_nb <- normalize_expt(external_cf, filter = TRUE, batch = "svaseq",
                              convert = "cpm", transform = "log2")
## Removing 7327 low-count genes (14154 remaining).
## Setting 171 low elements to zero.
## transform_counts: Found 171 values equal to 0, adding 1 to the matrix.
plot_pca(external_nb)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure
## Shapes are defined by female, male.

external_de <- all_pairwise(external_cf, filter = TRUE, methods = methods,
                            model_batch = "svaseq")
## 
##    cure failure 
##      14       7
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

external_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 falr_vs_cr
## basic_vs_deseq      0.3908
## basic_vs_dream      0.4488
## basic_vs_ebseq      0.9027
## basic_vs_edger      0.3914
## basic_vs_limma      0.4180
## basic_vs_noiseq     0.9604
## deseq_vs_dream      0.8718
## deseq_vs_ebseq      0.4149
## deseq_vs_edger      0.9997
## deseq_vs_limma      0.8487
## deseq_vs_noiseq     0.4412
## dream_vs_ebseq      0.4304
## dream_vs_edger      0.8727
## dream_vs_limma      0.9654
## dream_vs_noiseq     0.4269
## ebseq_vs_edger      0.4177
## ebseq_vs_limma      0.3577
## ebseq_vs_noiseq     0.9407
## edger_vs_limma      0.8497
## edger_vs_noiseq     0.4418
## limma_vs_noiseq     0.3627
external_table <- combine_de_tables(
  external_de, scale_p = TRUE,
  excel = "excel/scott_table.xlsx")
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
external_table
## A set of combined differential expression results.
##             table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure           0             0           0             0
##   limma_sigup limma_sigdown
## 1           0             0
## Only  has information, cannot create an UpSet.
## Plot describing unique/shared genes in a differential expression table.
## NULL
external_sig <- extract_significant_genes(external_table, excel = "excel/scott_sig.xlsx")
external_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##                 limma_up limma_down edger_up edger_down deseq_up deseq_down
## failure_vs_cure        0          0        0          0        0          0
##                 ebseq_up ebseq_down basic_up basic_down
## failure_vs_cure        0          0        0          0
external_top100 <- extract_significant_genes(external_table, n = 100)
external_up <- external_top100[["deseq"]][["ups"]][["failure_vs_cure"]]
external_down <- external_top100[["deseq"]][["downs"]][["failure_vs_cure"]]

21.2 An explicit comparison of methods.

I think I am getting a significantly different result from Scott, so I am going to do an explicit side-by-side comparison of our results at each step. In order to do this, I am using the capsule they kindly provided with their publication.

I am copy/pasting material from their publication with some modification which I will note as I go.

Here is their block ‘r packages’

Note/Spoiler alert: It turns out our results are actually fairly similar; I just did not understand which comparisons are in the paper vs. those in which I have primary interest. In addition, we handled gene IDs differently (gene symbol vs. Ensembl ID), which has a surprisingly big effect.
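
One way the choice of gene identifier can change the result is that collapsing transcripts by symbol can merge rows which remain separate when collapsing by Ensembl gene ID. The toy table below is entirely made up (IDs, symbols, and counts are hypothetical) and is only meant to illustrate that mechanism; I have not checked how much of the observed difference it explains.

```r
# Toy transcript-level table; two distinct gene IDs share the symbol SYM2.
tx <- data.frame(
  tx_id     = paste0("tx", 1:6),
  gene_id   = c("ENSG_A", "ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D", "ENSG_D"),
  gene_name = c("SYM1",   "SYM1",   "SYM2",   "SYM2",   "SYM3",   "SYM3"),
  count     = c(10, 5, 20, 7, 3, 1))

# Collapse transcript counts to genes with each identifier.
by_id <- rowsum(tx$count, group = tx$gene_id)
by_symbol <- rowsum(tx$count, group = tx$gene_name)

nrow(by_id)           # 4 genes when collapsing by gene ID
nrow(by_symbol)       # 3 genes when collapsing by symbol
by_symbol["SYM2", 1]  # 27: the counts of ENSG_B and ENSG_C are merged
```

The number of rows and the per-gene counts both change, so filtering, normalization, and the DE tables downstream all see different input.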

Oh, I just realized that when I did these analyses, I did them in a completely separate tree and compared the results post facto. That assumption remains in this document, so these blocks are unlikely to work properly in the containerized environment I am attempting to create. Given that the primary goal of this section is to show myself that I compared the two datasets as thoroughly as I could, perhaps I should just disable these blocks for the container and allow the reader to perform the exercise de novo.

library(tidyverse)
library(ggthemes)
library(reshape2)
library(edgeR)
library(patchwork)
library(vegan)
library(DT)
library(tximport)
library(gplots)
library(FinCal)
library(ggrepel)
library(gt)
library(ggExtra)
library(EnsDb.Hsapiens.v86)
library(stringr)
library(cowplot)
library(ggpubr)

I have a separate tree in which I copied the capsule and data. I performed exactly their kallisto quant steps within it and put the output data into the same place within it. I did change the commands slightly because I downloaded the files from SRA and so don’t have them with names like ‘host_CL01’, but instead ‘PRJNA…’. The samples are in the same order, though, so I sent the output files to the same final filenames. Here is an example from the first sample:

cd preprocessing
module add kallisto
kallisto index -i Homo_sapiens.GRCh38.cdna.all.Index Homo_sapiens.GRCh38.cdna.all.fa
# Map reads to the indexed reference transcriptome for HOST
# first the healthy subjects (HS)
export LESS='--buffers 0 -B'
kallisto quant -i Homo_sapiens.GRCh38.cdna.all.Index -o host_HS01 -t 24 -b 60 \
         --single -l 250 -s 30 <(less SRR8668755/*-trimmed.fastq.xz) 2>host_HS01.log 1>&2 &

21.3 Block ‘sample_info’

I am going to change the path very slightly in the following block simply because I put the capsule in a separate directory and do not want to copy it here. Otherwise it is unmodified. Also, the function gt::tab_header() annoys the crap out of me.

import <- read_tsv("../scott_2019/capsule-6534016/data/studydesign.txt")
import %>% dplyr::filter(disease == "cutaneous") %>%
  dplyr::select(-2) %>%  gt() %>%
  tab_header(title = md("Clinical metadata from patients with cutaneous leishmaniasis (CL)"),
             subtitle = md("`(n=21)`")) %>%  cols_align(align = "center", columns = TRUE)
targets.lesion <- import
targets.onlypatients <- targets.lesion[8:28,] # only CL lesions (n=21)

# Making factors that will be used for pairwise comparisons:
# HS vs. CL lesions as a factor:
disease.lesion <- factor(targets.lesion$disease)
# Cure vs. Failure lesions as a factor:
treatment.lesion <- factor(targets.onlypatients$treatment_outcome)

21.4 Importing the data and annotations

They did use a slightly different annotation set, Ensembl revision 86. Once again I am modifying the paths slightly to reflect where I put the capsule.

# capturing Ensembl transcript IDs (tx) and gene symbols ("gene_name") from
# EnsDb.Hsapiens.v86 annotation package
Tx <- as.data.frame(transcripts(EnsDb.Hsapiens.v86,
                                columns=c(listColumns(EnsDb.Hsapiens.v86, "tx"),
                                          "gene_name")))

Tx <- dplyr::rename(Tx, target_id = tx_id)
row.names(Tx) <- NULL
Tx <- Tx[,c(6,12)]

# getting file paths for Kallisto outputs
paths.all <- file.path("../scott_2019/capsule-6534016/data/readMapping/human", targets.lesion$sample, "abundance.h5")
paths.patients <- file.path("../scott_2019/capsule-6534016/data/readMapping/human", targets.onlypatients$sample, "abundance.h5")

# importing .h5 Kallisto data and collapsing transcript-level data to genes
Txi.lesion.coding <- tximport(paths.all,
                              type = "kallisto",
                              tx2gene = Tx,
                              txOut = FALSE,
                              ignoreTxVersion = TRUE,
                              countsFromAbundance = "lengthScaledTPM")

# importing again, but this time just the CL patients
Txi.lesion.coding.onlypatients <- tximport(paths.patients,
                                           type = "kallisto",
                                           tx2gene = Tx,
                                           txOut = FALSE,
                                           ignoreTxVersion = TRUE,
                                           countsFromAbundance = "lengthScaledTPM")
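
To make the "collapsing transcript-level data to genes" step concrete, here is a minimal base-R sketch of the core idea: look up each transcript's gene and sum the rows that share one. The matrix, the IDs, and the `tx2gene` table are made up for illustration. The real tximport call with countsFromAbundance = "lengthScaledTPM" also rescales by transcript length and library size, which this sketch deliberately leaves out.

```r
# Toy transcript-level counts: 5 transcripts x 3 samples (made-up numbers).
tx_counts <- matrix(c(10, 0, 5,
                      20, 4, 1,
                       3, 3, 3,
                       0, 8, 2,
                       7, 7, 7),
                    nrow = 5, byrow = TRUE,
                    dimnames = list(paste0("ENST", 1:5), c("s1", "s2", "s3")))

# Hypothetical mapping table; its two columns play the same roles as those of Tx.
tx2gene <- data.frame(target_id = paste0("ENST", 1:5),
                      gene_name = c("GENEA", "GENEA", "GENEB", "GENEC", "GENEC"),
                      stringsAsFactors = FALSE)

# Look up the gene for every transcript row, then sum rows sharing a gene.
gene_of_tx <- tx2gene$gene_name[match(rownames(tx_counts), tx2gene$target_id)]
gene_counts <- rowsum(tx_counts, group = gene_of_tx)
gene_counts
```

Transcripts missing from the mapping table would get an NA gene here; tximport handles that case itself, so a real pipeline should not rely on this sketch for it.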

21.5 Filtering and normalization

The block ‘visualizationDatasets’ follows unchanged. In the next block I will add another plot, or perhaps two.

# First make a DGEList from the counts:
Txi.lesion.coding.DGEList <- DGEList(Txi.lesion.coding$counts)
colnames(Txi.lesion.coding.DGEList$counts) <- targets.lesion$sample
colnames(Txi.lesion.coding$counts) <- targets.lesion$sample

Txi.lesion.coding.DGEList.OP <- DGEList(Txi.lesion.coding.onlypatients$counts)
colnames(Txi.lesion.coding.DGEList.OP) <- targets.onlypatients$sample

# Convert to counts per million:
Txi.lesion.coding.DGEList.cpm <- edgeR::cpm(Txi.lesion.coding.DGEList, log = TRUE)
Txi.lesion.coding.DGEList.OP.cpm <- edgeR::cpm(Txi.lesion.coding.DGEList.OP, log = TRUE)

keepers.coding <- rowSums(Txi.lesion.coding.DGEList.cpm>1)>=7
keepers.coding.OP <- rowSums(Txi.lesion.coding.DGEList.OP.cpm>1)>=7

Txi.lesion.coding.DGEList.filtered <- Txi.lesion.coding.DGEList[keepers.coding,]
Txi.lesion.coding.DGEList.OP.filtered <- Txi.lesion.coding.DGEList.OP[keepers.coding.OP,]

# convert back to cpm:
Txi.lesion.coding.DGEList.LogCPM.filtered <- edgeR::cpm(Txi.lesion.coding.DGEList.filtered,
                                                        log=TRUE)
Txi.lesion.coding.DGEList.LogCPM.OP.filtered <- edgeR::cpm(Txi.lesion.coding.DGEList.OP.filtered,
                                                           log=TRUE)

# Normalizing data:
calcNorm1 <- calcNormFactors(Txi.lesion.coding.DGEList.filtered, method = "TMM")
calcNorm2 <- calcNormFactors(Txi.lesion.coding.DGEList.OP.filtered, method = "TMM")

Txi.lesion.coding.DGEList.LogCPM.filtered.norm <- edgeR::cpm(calcNorm1, log=TRUE)
colnames(Txi.lesion.coding.DGEList.LogCPM.filtered.norm) <- targets.lesion$sample
Txi.lesion.coding.DGEList.OP.LogCPM.filtered.norm <- edgeR::cpm(calcNorm2, log=TRUE)
colnames(Txi.lesion.coding.DGEList.OP.LogCPM.filtered.norm) <- targets.onlypatients$sample
# Raw dataset:
V1 <- as.data.frame(Txi.lesion.coding.DGEList.cpm)
colnames(V1) <- targets.lesion$sample
V1 <- melt(V1)
colnames(V1) <- c("sample","expression")

# Filtered dataset:
V1.1 <- as.data.frame(Txi.lesion.coding.DGEList.LogCPM.filtered)
colnames(V1.1) <- targets.lesion$sample
V1.1 <- melt(V1.1)
colnames(V1.1) <- c("sample","expression")

# Filtered-normalized dataset:
V1.1.1 <- as.data.frame(Txi.lesion.coding.DGEList.LogCPM.filtered.norm)
colnames(V1.1.1) <- targets.lesion$sample
V1.1.1 <- melt(V1.1.1)
colnames(V1.1.1) <- c("sample","expression")

# plotting:
ggplot(V1, aes(x=sample, y=expression, fill=sample)) +
  geom_violin(trim = TRUE, show.legend = TRUE) +
  stat_summary(fun.y = "median", geom = "point", shape = 95, size = 10, color = "black") +
  theme_bw() +
  theme(legend.position = "none", axis.title=element_text(size=7),
        axis.title.x=element_blank(), axis.text=element_text(size=5),
        axis.text.x = element_text(angle = 90, hjust = 1),
        plot.title = element_text(size = 7)) +
  ggtitle("Raw dataset") +
  ggplot(V1.1, aes(x=sample, y=expression, fill=sample)) +
  geom_violin(trim = TRUE, show.legend = TRUE) +
  stat_summary(fun.y = "median", geom = "point", shape = 95, size = 10, color = "black") +
  theme_bw() +
  theme(legend.position = "none", axis.title=element_text(size=7),
        axis.title.x=element_blank(), axis.text=element_text(size=5),
        axis.text.x = element_text(angle = 90, hjust = 1),
        plot.title = element_text(size = 7)) +
  ggtitle("Filtered dataset") +
  ggplot(V1.1.1, aes(x=sample, y=expression, fill=sample)) +
  geom_violin(trim = TRUE, show.legend = TRUE) +
  stat_summary(fun.y = "median", geom = "point", shape = 95, size = 10, color = "black") +
  theme_bw() +
  theme(legend.position = "none", axis.title=element_text(size=7),
        axis.title.x=element_blank(), axis.text=element_text(size=5),
        axis.text.x = element_text(angle = 90, hjust = 1),
        plot.title = element_text(size = 7)) +
  ggtitle("Filtered and normalized dataset")
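
One detail of the filter above is worth spelling out: as I read the code, the `> 1` threshold is applied to the result of cpm(log = TRUE), so it is a cutoff on the log2 scale rather than on CPM itself. The following base-R sketch shows how the two readings differ for a borderline gene. The counts are invented, and the `+ 0.5` pseudo-count is a simplification of my own, not edgeR's actual prior-count calculation.

```r
n_samples <- 8

# Made-up counts for four genes across eight samples.
counts <- rbind(
  off        = rep(0, n_samples),              # never detected
  borderline = rep(3, n_samples),              # 1.5 CPM at 2e6 reads
  sparse     = c(rep(200, 5), rep(0, 3)),      # present in only 5 samples
  expressed  = rep(200, n_samples))            # 100 CPM everywhere

# Pad every library to exactly 2e6 reads so the CPM values are easy to check.
counts <- rbind(counts, filler = 2e6 - colSums(counts))
colnames(counts) <- paste0("s", seq_len(n_samples))

# Counts per million on the linear scale and a simplified log2 version.
cpm_linear <- t(t(counts) / colSums(counts)) * 1e6
log_cpm <- log2(cpm_linear + 0.5)

# The same "at least 7 samples above 1" rule on each scale.
keep_linear <- rowSums(cpm_linear > 1) >= 7
keep_log <- rowSums(log_cpm > 1) >= 7
data.frame(keep_linear, keep_log)
```

The borderline gene passes on the linear scale but not on the log2 scale, which is the direction in which the two filters would be expected to disagree.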

21.6 The unfiltered data

The following block in their analysis recreates the matrix without filtering and uses that for differential expression. It is a little hard for me to follow because they subset by sample number (columns 8 to 28, which, if I am not mistaken, just drops the 7 healthy samples).

DataNotFiltered_Norm_OP <- calcNormFactors(Txi.lesion.coding.DGEList[,8:28],
                                           method = "TMM")
DataNotFiltered_Norm_log2CPM_OP <- edgeR::cpm(DataNotFiltered_Norm_OP, log=TRUE)
colnames(DataNotFiltered_Norm_log2CPM_OP) <- targets.onlypatients$sample
CPM_normData_notfiltered_OP <- 2^(DataNotFiltered_Norm_log2CPM_OP)
#uncomment the next line to produce raw data that was uploaded to the Gene Expression Omnibus (GEO) for publication.
#write.table(Txi.lesion.coding$counts, file = "Amorim_GEO_raw.txt", sep = "\t", quote = FALSE)

# Including all the individuals (HS and CL patients) for public domain submission:
DataNotFiltered_Norm <- calcNormFactors(Txi.lesion.coding.DGEList, method = "TMM")
DataNotFiltered_Norm_log2CPM <- edgeR::cpm(DataNotFiltered_Norm, log=TRUE)
colnames(DataNotFiltered_Norm_log2CPM) <- targets.lesion$sample
CPM_normData_notfiltered <- 2^(DataNotFiltered_Norm_log2CPM)
#uncomment the next line to produce the normalized data file that was uploaded to the Gene Expression Omnibus (GEO) for publication.
#write.table(DataNotFiltered_Norm_log2CPM, "Amorim_GEO_normalized.txt", sep = "\t", quote = FALSE)

21.7 The scott exploratory analysis

The following block generated a couple of the figures in the paper and comprises a pretty straightforward PCA. I will follow it with a block containing the same image with the cure/fail visualization using the same method/data.

pca.res <- prcomp(t(Txi.lesion.coding.DGEList.LogCPM.filtered.norm), scale.=F, retx=T)
pc.var <- pca.res$sdev^2
pc.per <- round(pc.var/sum(pc.var)*100, 1)
data.frame <- as.data.frame(pca.res$x)

# Calculate distance between samples by permanova:
allsamples.dist <- vegdist(t(2^Txi.lesion.coding.DGEList.LogCPM.filtered.norm),
                           method = "bray")

vegan <- adonis2(allsamples.dist~targets.lesion$disease,
                 data=targets.lesion,
                 permutations = 999, method="bray")

targets.lesion$disease
ggplot(data.frame, aes(x=PC1, y=PC2, color=factor(targets.lesion$disease))) +
  geom_point(size=5, shape=20) +
  theme_calc() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        axis.text.x = element_text(size = 15, vjust = 0.5),
        axis.text.y = element_text(size = 15), axis.title = element_text(size = 15),
        legend.position="none") +
  scale_color_manual(values = c("#073F80","#EB512C")) +
  annotate("text", x=-50, y=80, label=paste("Permanova Pr(>F) =",
                                            vegan[1,5]), size=3, fontface="bold") +
  xlab(paste("PC1 -",pc.per[1],"%")) +
  ylab(paste("PC2 -",pc.per[2],"%")) +
  xlim(-200,110)
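
For readers who want to see where the axis percentages come from, here is a small self-contained sketch of the same prcomp arithmetic on simulated data. The matrix, the group shift, and the variable names (`expr`, `scores`, `pc_per`) are invented for illustration; only the prcomp call and the variance calculation mirror the block above.

```r
set.seed(42)

# Toy log-expression matrix: 100 genes x 6 samples.
expr <- matrix(rnorm(600), nrow = 100,
               dimnames = list(paste0("g", 1:100), paste0("s", 1:6)))

# Shift 20 genes upward in samples 4-6 to create two artificial groups.
expr[1:20, 4:6] <- expr[1:20, 4:6] + 5

# prcomp expects samples in rows, hence the transpose.
pca <- prcomp(t(expr), scale. = FALSE, retx = TRUE)

# Variance explained per component, as a percentage.
pc_var <- pca$sdev^2
pc_per <- round(pc_var / sum(pc_var) * 100, 1)

# Sample coordinates; a distinct name avoids shadowing base::data.frame.
scores <- as.data.frame(pca$x)
pc_per[1:2]
```

With a group shift this large, the two groups end up on opposite sides of PC1, which is the pattern the disease-colored plot relies on.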

21.7.1 My most similar pca

I just realized that somewhere along the way in creating this container, I messed up this analysis pretty badly:

  1. I dropped the 7 control samples.
  2. I am comparing cure/fail but these analyses are all control/cutaneous.

When I originally did this on my workstation I had an actual 1:1 comparison and saw that our results were quite similar. I need to bring that back into this in order to show that neither we nor they are crazy people.

Either way, I think the main takeaway is that their dataset does not spend much time looking at cure/fail but instead control/infected for a reason.

Note, the fun aspects of the experiment (time to cure, size of lesion, etc.) are not annotated in the metadata provided by SRA, but instead may be found in the capsule kindly provided by the lab. As a result, I copied that file into the sample_sheets/ directory and have added it to the expressionset. There is an important caveat, though: I did not include the non-diseased samples for this comparison; as a result the disease metadata factor is boring (i.e. it is only cutaneous).

external_cf[["accession"]] <- pData(external_cf)[["sample"]]
disease_factor <- pData(external_cf)[["disease"]]
table(disease_factor)
## disease_factor
## cutaneous 
##        21
external_disease <- set_expt_conditions(external_cf, fact = disease_factor)
## The numbers of samples by condition are:
## 
## cutaneous 
##        21
external_l2cpm <- normalize_expt(external_cf, filter = TRUE,
                                convert = "cpm", transform = "log2")
## Removing 7327 low-count genes (14154 remaining).
## transform_counts: Found 165 values equal to 0, adding 1 to the matrix.
plot_pca(external_l2cpm, plot_labels = "repel")
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure
## Shapes are defined by female, male.

Use the following block if you wish to bring together SRA-downloaded data with the experimental design from the Scott paper. It requires running the blocks above in which I loaded the capsule-derived metadata.

test <- pData(external_cf)
test_import <- as.data.frame(import)
test_import[["accession"]] <- pData(external_cf)[["accession"]]
test_merged <- merge(test, test_import, by = "accession")

This is the real comparison point to their cure/fail analysis.

21.8 Cure/Fail PCA using the same prcomp result

I am just copy/pasting their code again, but changing the color factor so that cure is purple, failure is red, and na(uninfected) is black.

The following plot should be the first direct comparison point between the two analysis pipelines. Thus, if you look back a few blocks at my invocation of plot_pca(external_l2cpm), you will see a green/orange plot which is functionally identical if you note:

  1. The x and y axes are flipped, which ok whatever it is PCA.
  2. I excluded the healthy samples.
  3. I dropped to gene level and used hisat.

With those caveats in mind, it is trivial to find the same relationships among the samples. E.g. the bottom red/purple individual samples are in the same relative position as my top orange/green pair. The same 4 samples are relative x-axis outliers (my right green, their left purple). The last 6 samples (my orange, their red) are all in the same relative orientation.

I think I can further prove the similarity of our inputs via a direct comparison of the data structures: Txi.lesion.coding.DGEList.LogCPM.filtered.norm (ugh what a name) vs. external_cf. In order to make that comparison, I need to rename my rows to the gene symbols (HGNC) and rename the columns.

their_norm_exprs <- Txi.lesion.coding.DGEList.LogCPM.filtered.norm

my_hgnc_ids <- make.names(fData(external_cf)[["hgnc_symbol"]], unique = TRUE)
my_renamed <- set_expt_genenames(external_cf, ids = my_hgnc_ids)
my_norm <- normalize_expt(my_renamed, filter = TRUE, transform = "log2", convert = "cpm")
my_norm_exprs <- as.data.frame(exprs(my_norm))

our_exprs <- merge(their_norm_exprs, my_norm_exprs, by = "row.names")
rownames(our_exprs) <- our_exprs[["Row.names"]]
our_exprs[["Row.names"]] <- NULL
dim(our_exprs)

## I fully expected a correlation heatmap of the combined
## data to show a set of paired samples across the board.
## That is absolutely not true.
correlations <- plot_corheat(our_exprs)
correlations[["scatter"]]
correlations[["plot"]]
color_fact <- factor(targets.lesion$treatment_outcome)
levels(color_fact)
## Added by atb to see cure/fail on the same dataset
ggplot(data.frame, aes(x=PC1, y=PC2, color=color_fact)) +
  geom_point(size=5, shape=20) +
  theme_calc() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        axis.text.x = element_text(size = 15, vjust = 0.5),
        axis.text.y = element_text(size = 15), axis.title = element_text(size = 15),
        legend.position="none") +
  scale_color_manual(values = c("purple", "red","black")) +
  annotate("text", x=-50, y=80, label=paste("Permanova Pr(>F) =",
                                            vegan[1,5]), size=3, fontface="bold") +
  xlab(paste("PC1 -",pc.per[1],"%")) +
  ylab(paste("PC2 -",pc.per[2],"%")) +
  xlim(-200,110)
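
The "are the samples paired across pipelines?" question can be asked more directly than by eyeballing a heatmap: merge the two matrices on gene name, correlate every column of one against every column of the other, and check which partner each sample prefers. Below is a base-R sketch on simulated data. The two "pipelines", the noise levels, and all names are assumptions made for illustration, and this is not what plot_corheat() does internally.

```r
set.seed(7)
genes <- paste0("g", 1:300)

# Simulated truth: a per-gene baseline plus a sample-specific deviation.
gene_mean <- rnorm(300, mean = 5, sd = 2)
truth <- gene_mean + matrix(rnorm(300 * 4), nrow = 300,
                            dimnames = list(genes, paste0("sample", 1:4)))

# Two hypothetical pipelines measure the same samples with independent noise;
# the second one only reports a subset of the genes.
theirs <- truth + rnorm(length(truth), sd = 0.3)
mine <- truth[sample(genes, 250), ] + rnorm(250 * 4, sd = 0.3)
colnames(theirs) <- paste0("their_", colnames(truth))
colnames(mine) <- paste0("my_", colnames(truth))

# Merge on row names, as in the block above.
both <- merge(theirs, mine, by = "row.names")
rownames(both) <- both[["Row.names"]]
both[["Row.names"]] <- NULL

# Cross-correlation: rows are their samples, columns are my samples.
cross <- cor(both[, colnames(theirs)], both[, colnames(mine)], method = "spearman")
best_match <- colnames(mine)[apply(cross, 1, which.max)]
data.frame(theirs = rownames(cross), best_match)
```

If the real data behaved like this simulation, every row of that table would name its own partner; rows that do not are the ones worth chasing down (sample order, gene ID mapping, or filtering differences).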

21.9 DE comparisons

The following is their comparison of healthy tissue vs. CL lesion and Failure vs. Cure. I am going to follow it with my analogous examination using limma. Note, each of the pairs of variables created in the following block is xxx followed by xxx.treat; the former is healthy vs lesion and the latter is the fail vs cure set.

# Model matrices:
# CL lesions vs. HS:
design.lesion <- model.matrix(~0 + disease.lesion)
colnames(design.lesion) <- levels(disease.lesion)

# Failure vs. Cure:
design.lesion.treatment <- model.matrix(~0 + treatment.lesion)
colnames(design.lesion.treatment) <- levels(treatment.lesion)

myDGEList.lesion.coding <- DGEList(calcNorm1$counts)
myDGEList.OP.NotFil <- DGEList(CPM_normData_notfiltered_OP)

# Model mean-variance trend and fit linear model to data.
# Use VOOM function from Limma package to model the mean-variance relationship
normData.lesion.coding <- voom(myDGEList.lesion.coding, design.lesion)
normData.OP.NotFil <- voom(myDGEList.OP.NotFil, design.lesion.treatment)

colnames(normData.lesion.coding) <- targets.lesion$sample
colnames(normData.OP.NotFil) <- targets.onlypatients$sample

# fit a linear model to your data
fit.lesion.coding <- lmFit(normData.lesion.coding, design.lesion)
fit.lesion.coding.treatment <- lmFit(normData.OP.NotFil, design.lesion.treatment)

# contrast matrix
contrast.matrix.lesion <- makeContrasts(CL.vs.CON = cutaneous - control,
                                        levels=design.lesion)
contrast.matrix.lesion.treat <- makeContrasts(failure.vs.cure = failure - cure,
                                              levels=design.lesion.treatment)

# extract the linear model fit
fits.lesion.coding <- contrasts.fit(fit.lesion.coding,
                                    contrast.matrix.lesion)
fits.lesion.coding.treat <- contrasts.fit(fit.lesion.coding.treatment,
                                          contrast.matrix.lesion.treat)

# get bayesian stats for your linear model fit
ebFit.lesion.coding <- eBayes(fits.lesion.coding)
ebFit.lesion.coding.treat <- eBayes(fits.lesion.coding.treat)

# TopTable ----
allHits.lesion.coding <- topTable(ebFit.lesion.coding,
                                  adjust ="BH", coef=1,
                                  number=34935, sort.by="logFC")
allHits.lesion.coding.treat <- topTable(ebFit.lesion.coding.treat,
                                        adjust ="BH", coef=1,
                                        number=34776, sort.by="logFC")
myTopHits <- rownames_to_column(allHits.lesion.coding, "geneID")
myTopHits.treat <- rownames_to_column(allHits.lesion.coding.treat, "geneID")

# mutate the format of numeric values:
myTopHits <- mutate(myTopHits, log10Pval = round(-log10(adj.P.Val),2),
                    adj.P.Val = round(adj.P.Val, 2),
                    B = round(B, 2),
                    AveExpr = round(AveExpr, 2),
                    t = round(t, 2),
                    logFC = round(logFC, 2),
                    geneID = geneID)

myTopHits.treat <- mutate(myTopHits.treat, log10Pval = round(-log10(adj.P.Val),2),
                          adj.P.Val = round(adj.P.Val, 2),
                          B = round(B, 2),
                          AveExpr = round(AveExpr, 2),
                          t = round(t, 2),
                          logFC = round(logFC, 2),
                          geneID = geneID)
#save(myTopHits, file = "myTopHits")
#save(myTopHits.treat, file = "myTopHits.treat")
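
Since the `~0 + group` design plus a `failure - cure` contrast can look opaque, here is a base-R sketch of what that combination computes for a single gene when no weights are involved: one coefficient per group (the group mean), and a contrast that subtracts them. The data are simulated and this is plain least squares, not limma; voom's precision weights and eBayes moderation change the fitted values and test statistics, so treat this only as the underlying algebra.

```r
set.seed(3)

# Toy design: 5 cures and 4 failures.
outcome <- factor(c(rep("cure", 5), rep("failure", 4)))
design <- model.matrix(~ 0 + outcome)
colnames(design) <- levels(outcome)

# One simulated gene on the log2 scale; failures sit about 2 units higher.
y <- c(rnorm(5, mean = 4), rnorm(4, mean = 6))

# Least squares on the cell-means design gives one coefficient per group.
fit <- lm.fit(design, y)
group_means <- fit$coefficients

# The contrast failure - cure as weights over those coefficients.
contrast <- c(cure = -1, failure = 1)
log_fc <- sum(contrast * group_means[names(contrast)])
log_fc
```

The resulting value is simply mean(failure) minus mean(cure), which is why the sign of logFC in the tables above reads as "higher in failure" when positive.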

21.10 Perform my analogous limma analysis

my_filt <- normalize_expt(my_renamed, filter = "simple")
limma_cf <- limma_pairwise(my_filt, model_batch = FALSE)

my_table <- limma_cf[["all_tables"]][["failure_vs_cure"]]
their_table <- myTopHits.treat

dim(my_table)
dim(myTopHits.treat)
our_table <- merge(my_table, myTopHits.treat, by.x = "row.names", by.y = "geneID")
dim(our_table)
comparison <- plot_linear_scatter(our_table[, c("logFC.x", "logFC.y")])
comparison$scatter
comparison$correlation
comparison$lm_model

Ok, so there is a constitutive difference in our results, and it is significant. What does that mean for the set of genes observed?

With that said, in my most recent manual run of this, the results are quite good, I got a 0.75 correlation; I bet the primary outliers (on the axes) are just genes for which we got different gene<->tx mappings due to me using hisat and their usage of kallisto.

I guess I can test this hypothesis by just swapping in their counts into my data structure.

test_counts <- as.data.frame(myDGEList.lesion.coding[["counts"]])
test_counts[["host_HS01"]] <- NULL
test_counts[["host_HS02"]] <- NULL
test_counts[["host_HS03"]] <- NULL
test_counts[["host_HS04"]] <- NULL
test_counts[["host_HS05"]] <- NULL
test_counts[["host_HS06"]] <- NULL
test_counts[["host_HS07"]] <- NULL

dim(test_counts)
dim(exprs(my_test))
## Oh, that surprises me, the kallisto data has ~ 6k fewer genes?

21.11 See if there are shared DE genes

!!NOTE!! I am using a non-adjusted p-value filter here because I want to use the same filter they used for the volcano plot.

my_filter <- abs(my_table[["logFC"]]) > 1.0 & my_table[["P.Value"]] <= 0.05
sum(my_filter)
their_filter <- abs(their_table[["logFC"]]) > 1.0 & their_table[["P.Value"]] <= 0.05
sum(their_filter)

my_shared <- rownames(my_table)[my_filter] %in% their_table[their_filter, "geneID"]
sum(my_shared)

shared <- rownames(my_table)[my_filter]
shared[my_shared]

both <- list(
  "us" = rownames(my_table)[my_filter],
  "them" = their_table[their_filter, "geneID"])
tt <- UpSetR::fromList(both)
UpSetR::upset(tt)
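
The same filter-and-overlap logic can be written with plain set operations, which also makes the "only mine" and "only theirs" lists explicit. This is a sketch on two tiny invented tables; the gene names and numbers are placeholders, and the helper `passes()` is hypothetical, with thresholds that simply echo the ones used above.

```r
# Invented table shaped like mine: gene names in the row names.
mine <- data.frame(logFC = c(2.1, -1.5, 0.3, 1.2, -2.4),
                   P.Value = c(0.001, 0.04, 0.5, 0.2, 0.01),
                   row.names = c("geneA", "geneB", "geneC", "geneD", "geneE"))

# Invented table shaped like theirs: gene names in a geneID column.
theirs <- data.frame(geneID = c("geneA", "geneB", "geneC", "geneF", "geneE"),
                     logFC = c(1.8, -0.6, 0.1, 1.4, -1.9),
                     P.Value = c(0.002, 0.03, 0.7, 0.01, 0.02),
                     stringsAsFactors = FALSE)

# Hypothetical helper: |logFC| above the cutoff and raw p-value at or below it.
passes <- function(tbl, lfc = 1.0, p = 0.05) {
  abs(tbl[["logFC"]]) > lfc & tbl[["P.Value"]] <= p
}

my_hits <- rownames(mine)[passes(mine)]
their_hits <- theirs[passes(theirs), "geneID"]

shared <- intersect(my_hits, their_hits)
only_mine <- setdiff(my_hits, their_hits)
only_theirs <- setdiff(their_hits, my_hits)
list(shared = shared, only_mine = only_mine, only_theirs = only_theirs)
```

Note that a gene absent from one table entirely (geneD, geneF here) can only ever land in an "only" list, so differences in filtering and annotation inflate the apparent disagreement.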

21.12 Compare the two datasets directly

only_tmrc3 <- subset_expt(tmrc3_external, subset = "condition=='Colombia'") %>%
  set_expt_conditions(fact = "finaloutcome")
## subset_expt(): There were 39, now there are 18 samples.
## The numbers of samples by condition are:
## 
## failure    cure 
##       5      13
only_tmrc3_de <- all_pairwise(only_tmrc3, model_batch = "svaseq",
                              filter = TRUE,
                              methods = methods)
## 
## failure    cure 
##       5      13
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

only_tmrc3_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 falr_vs_cr
## basic_vs_deseq      0.7781
## basic_vs_dream      0.9001
## basic_vs_ebseq      0.7965
## basic_vs_edger      0.8963
## basic_vs_limma      0.9061
## basic_vs_noiseq     0.9366
## deseq_vs_dream      0.7191
## deseq_vs_ebseq      0.8921
## deseq_vs_edger      0.9247
## deseq_vs_limma      0.7154
## deseq_vs_noiseq     0.7816
## dream_vs_ebseq      0.7621
## dream_vs_edger      0.8361
## dream_vs_limma      0.9890
## dream_vs_noiseq     0.8748
## ebseq_vs_edger      0.9223
## ebseq_vs_limma      0.7494
## ebseq_vs_noiseq     0.8417
## edger_vs_limma      0.8311
## edger_vs_noiseq     0.8987
## limma_vs_noiseq     0.8629
only_tmrc3_table <- combine_de_tables(only_tmrc3_de, scale_p = TRUE)
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
only_tmrc3_table
## A set of combined differential expression results.
##             table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure          27            26          28            15
##   limma_sigup limma_sigdown
## 1           1             0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

only_tmrc3_top100 <- extract_significant_genes(only_tmrc3_table, n = 100)
only_tmrc3_up <- only_tmrc3_top100[["deseq"]][["ups"]][["failure_vs_cure"]]
only_tmrc3_down <- only_tmrc3_top100[["deseq"]][["downs"]][["failure_vs_cure"]]

tmrc3_external_de <- all_pairwise(tmrc3_external, model_batch = "svaseq",
                                  filter = "simple",
                                  methods = methods)
## 
##   Brazil Colombia 
##       21       18
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

tmrc3_external_table <- combine_de_tables(
  tmrc3_external_de, scale_p = TRUE,
  excel = "excel/tmrc3_scott_biopsies.xlsx")
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Error : colNames must be a unique vector (case sensitive)
tmrc3_external_sig <- extract_significant_genes(
  tmrc3_external_table, excel = "excel/tmrc3_scott_biopsies_sig.xlsx")

tmrc3_external_cf <- set_expt_conditions(tmrc3_external, fact = "finaloutcome")
## The numbers of samples by condition are:
## 
## failure    cure 
##      12      27
tmrc3_external_cf <-  set_expt_batches(tmrc3_external_cf, fact = "lab")
## The number of samples by batch are:
## 
##   Brazil Colombia 
##       21       18
tmrc3_external_cf_norm <- normalize_expt(tmrc3_external_cf, filter = TRUE,
                                         norm = "quant", convert = "cpm", transform = "log2")
## Removing 6904 low-count genes (14577 remaining).
## transform_counts: Found 18 values equal to 0, adding 1 to the matrix.
plot_pca(tmrc3_external_cf_norm)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by failure, cure
## Shapes are defined by Brazil, Colombia.

tmrc3_external_cf_nb <- normalize_expt(tmrc3_external_cf, filter = TRUE,
                                       batch = "svaseq", convert = "cpm", transform = "log2")
## Removing 6904 low-count genes (14577 remaining).
## Setting 1515 low elements to zero.
## transform_counts: Found 1515 values equal to 0, adding 1 to the matrix.
plot_pca(tmrc3_external_cf_nb)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by failure, cure
## Shapes are defined by Brazil, Colombia.

tmrc3_external_cf_de <- all_pairwise(tmrc3_external_cf, model_batch = "svaseq",
                                     filter = TRUE,
                                     methods = methods)
## 
## failure    cure 
##      12      27
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

tmrc3_external_cf_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
## The logFC agreement among the methods follows:
##                 falr_vs_cr
## basic_vs_deseq      0.7725
## basic_vs_dream      0.9080
## basic_vs_ebseq      0.8250
## basic_vs_edger      0.8165
## basic_vs_limma      0.9167
## basic_vs_noiseq     0.9416
## deseq_vs_dream      0.8238
## deseq_vs_ebseq      0.9092
## deseq_vs_edger      0.9500
## deseq_vs_limma      0.7961
## deseq_vs_noiseq     0.8259
## dream_vs_ebseq      0.8159
## dream_vs_edger      0.8854
## dream_vs_limma      0.9769
## dream_vs_noiseq     0.8648
## ebseq_vs_edger      0.9177
## ebseq_vs_limma      0.7869
## ebseq_vs_noiseq     0.9009
## edger_vs_limma      0.8568
## edger_vs_noiseq     0.8677
## limma_vs_noiseq     0.8497
tmrc3_external_cf_table <- combine_de_tables(
  tmrc3_external_cf_de, scale_p = TRUE,
  excel = "excel/tmrc3_scott_cf_table.xlsx")
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Error : colNames must be a unique vector (case sensitive)
tmrc3_external_cf_table
## A set of combined differential expression results.
##             table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure          37           127          38            91
##   limma_sigup limma_sigdown
## 1           7             0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

tmrc3_external_cf_sig <- extract_significant_genes(
  tmrc3_external_cf_table, excel = "excel/tmrc3_scott_cf_sig.xlsx")
tmrc3_external_cf_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##                 limma_up limma_down edger_up edger_down deseq_up deseq_down
## failure_vs_cure        7          0       38         91       37        127
##                 ebseq_up ebseq_down basic_up basic_down
## failure_vs_cure        3          6        0          0

tmrc3_external_species <- set_expt_conditions(tmrc3_external, fact = "ParasiteSpecies") %>%
  set_expt_colors(color_choices[["parasite"]])
## The numbers of samples by condition are:
## 
## lvbraziliensis   lvpanamensis  notapplicable 
##             22             14              3
## Warning in set_expt_colors(., color_choices[["parasite"]]): Colors for the
## following categories are not being used: lvguyanensis.

21.13 Compare the l2FC values

Let us look at the top/bottom 100 genes of these two datasets and see if they have any similarities.

Note to self, set up s4 dispatch on compare_de_tables!

compared <- compare_de_tables(only_tmrc3_table, external_table, first_table = 1, second_table = 1)
compared$scatter

compared$correlation
## 
##  Pearson's product-moment correlation
## 
## data:  df[[xcol]] and df[[ycol]]
## t = 14, df = 13240, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1033 0.1368
## sample estimates:
##    cor 
## 0.1201
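
When reading the output above, keep in mind that the degrees of freedom are the number of shared genes minus two, so with roughly 13,000 genes even a correlation near 0.12 yields a vanishingly small p-value; the estimate and its confidence interval are the informative parts. The following base-R sketch reproduces the merge-then-cor.test pattern on simulated tables. The tables, their noise levels, and the `.a`/`.b` suffixes are assumptions for illustration and not the internals of compare_de_tables().

```r
set.seed(11)
genes <- paste0("g", 1:500)

# Two simulated logFC tables sharing an underlying effect plus independent noise.
shared_effect <- rnorm(500)
table_a <- data.frame(logFC = shared_effect + rnorm(500, sd = 0.5), row.names = genes)
table_b <- data.frame(logFC = shared_effect + rnorm(500, sd = 0.5), row.names = genes)

# The second table only covers a subset of the genes.
table_b <- table_b[sample(genes, 400), , drop = FALSE]

# Keep genes present in both; suffixes disambiguate the two logFC columns.
merged <- merge(table_a, table_b, by = "row.names", suffixes = c(".a", ".b"))

result <- cor.test(merged[["logFC.a"]], merged[["logFC.b"]], method = "pearson")
result$estimate    # the correlation itself
result$conf.int    # 95 percent confidence interval
result$parameter   # degrees of freedom: shared genes minus 2
```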

22 Compare visits by celltype and C/F

I assume this request came out of the review process, but I am not quite sure where to put it. If I understand it correctly, the goal is to look across visits for combinations of cure and fail (not fail/cure, but v2/v1) and across cell types.

To do this, I will need to either combine those three parameters into a single factor or set up a more complex model.
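
As a sketch of the first option, the following base-R snippet builds a combined factor by pasting cell type, visit, and outcome together, then enumerates "later visit vs. visit 1" pairs within each cell type and outcome. The metadata frame is invented, and the c(numerator, denominator) list layout is an assumption on my part; the actual `cell_visit_cf` column and `visittype_contrasts` object used below are defined elsewhere and may be structured differently.

```r
# Invented metadata: one row per cell type / visit / outcome combination.
meta <- expand.grid(cell = c("eosinophils", "monocytes", "neutrophils"),
                    visit = 1:3,
                    outcome = c("cure", "failure"),
                    stringsAsFactors = FALSE)

# Combined factor with levels such as "monocytes_2_cure".
meta[["cell_visit_cf"]] <- paste(meta[["cell"]], meta[["visit"]],
                                 meta[["outcome"]], sep = "_")

# Hypothetical contrast list: visit 2 or 3 vs. visit 1, holding cell type
# and outcome fixed; each entry is c(numerator, denominator).
combos <- unique(meta[, c("cell", "outcome")])
visit_contrasts <- list()
for (i in seq_len(nrow(combos))) {
  cell <- combos[i, "cell"]
  outcome <- combos[i, "outcome"]
  for (later in 2:3) {
    name <- paste0(cell, "_", outcome, "_v", later, "v1")
    visit_contrasts[[name]] <- c(paste(cell, later, outcome, sep = "_"),
                                 paste(cell, 1, outcome, sep = "_"))
  }
}
length(visit_contrasts)
visit_contrasts[["monocytes_cure_v2v1"]]
```

Restricting the contrasts this way keeps the comparison count at 12 rather than every pairwise combination of the 18 levels, most of which (e.g. monocytes vs. eosinophils) are not the question being asked.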

t_cellvisitcf <- set_expt_conditions(t_clinical_nobiop, fact = "cell_visit_cf")
## The numbers of samples by condition are:
## 
##    eosinophils_1_cure eosinophils_1_failure    eosinophils_2_cure 
##                     5                     3                     6 
## eosinophils_2_failure    eosinophils_3_cure eosinophils_3_failure 
##                     3                     6                     3 
##      monocytes_1_cure   monocytes_1_failure      monocytes_2_cure 
##                     8                     8                     7 
##   monocytes_2_failure      monocytes_3_cure   monocytes_3_failure 
##                     6                     6                     7 
##    neutrophils_1_cure neutrophils_1_failure    neutrophils_2_cure 
##                     8                     8                     7 
## neutrophils_2_failure    neutrophils_3_cure neutrophils_3_failure 
##                     6                     5                     7
t_cellvisitcf_de <- all_pairwise(t_cellvisitcf, keepers = visittype_contrasts,
                                 model_batch = "svaseq", filter = TRUE,
                                 methods = methods)
## 
##    eosinophils_1_cure eosinophils_1_failure    eosinophils_2_cure 
##                     5                     3                     6 
## eosinophils_2_failure    eosinophils_3_cure eosinophils_3_failure 
##                     3                     6                     3 
##      monocytes_1_cure   monocytes_1_failure      monocytes_2_cure 
##                     8                     8                     7 
##   monocytes_2_failure      monocytes_3_cure   monocytes_3_failure 
##                     6                     6                     7 
##    neutrophils_1_cure neutrophils_1_failure    neutrophils_2_cure 
##                     8                     8                     7 
## neutrophils_2_failure    neutrophils_3_cure neutrophils_3_failure 
##                     6                     5                     7
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.

## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq,
## eosinophils_1_failure_vs_eosinophils_1_cure and edger,
## eosinophils_1_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq,
## eosinophils_2_failure_vs_eosinophils_1_cure and edger,
## eosinophils_2_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq,
## eosinophils_3_failure_vs_eosinophils_1_cure and edger,
## eosinophils_3_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_1_cure_vs_eosinophils_1_cure
## and edger, monocytes_1_cure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_1_failure_vs_eosinophils_1_cure
## and edger, monocytes_1_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_2_cure_vs_eosinophils_1_cure
## and edger, monocytes_2_cure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_2_failure_vs_eosinophils_1_cure
## and edger, monocytes_2_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_3_cure_vs_eosinophils_1_cure
## and edger, monocytes_3_cure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_3_failure_vs_eosinophils_1_cure
## and edger, monocytes_3_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, neutrophils_1_cure_vs_eosinophils_1_cure
## and edger, neutrophils_1_cure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq,
## eosinophils_1_failure_vs_eosinophils_1_cure and limma,
## eosinophils_1_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq,
## eosinophils_2_failure_vs_eosinophils_1_cure and limma,
## eosinophils_2_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq,
## eosinophils_3_failure_vs_eosinophils_1_cure and limma,
## eosinophils_3_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_1_cure_vs_eosinophils_1_cure
## and limma, monocytes_1_cure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_1_failure_vs_eosinophils_1_cure
## and limma, monocytes_1_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_2_cure_vs_eosinophils_1_cure
## and limma, monocytes_2_cure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_2_failure_vs_eosinophils_1_cure
## and limma, monocytes_2_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_3_cure_vs_eosinophils_1_cure
## and limma, monocytes_3_cure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_3_failure_vs_eosinophils_1_cure
## and limma, monocytes_3_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, neutrophils_1_cure_vs_eosinophils_1_cure
## and limma, neutrophils_1_cure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq,
## eosinophils_1_failure_vs_eosinophils_1_cure and noiseq,
## eosinophils_1_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq,
## eosinophils_2_failure_vs_eosinophils_1_cure and noiseq,
## eosinophils_2_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq,
## eosinophils_3_failure_vs_eosinophils_1_cure and noiseq,
## eosinophils_3_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_1_cure_vs_eosinophils_1_cure
## and noiseq, monocytes_1_cure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_1_failure_vs_eosinophils_1_cure
## and noiseq, monocytes_1_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_2_cure_vs_eosinophils_1_cure
## and noiseq, monocytes_2_cure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_2_failure_vs_eosinophils_1_cure
## and noiseq, monocytes_2_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_3_cure_vs_eosinophils_1_cure
## and noiseq, monocytes_3_cure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, monocytes_3_failure_vs_eosinophils_1_cure
## and noiseq, monocytes_3_failure_vs_eosinophils_1_cure failed.
## Warning in correlate_de_tables(results, annot_df = annot_df, extra_contrasts =
## extra_contrasts): The merge of ebseq, neutrophils_1_cure_vs_eosinophils_1_cure
## and noiseq, neutrophils_1_cure_vs_eosinophils_1_cure failed.

t_cellvisitcf_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 21 comparisons.
t_cellvisitcf_mono_table <- combine_de_tables(
  t_cellvisitcf_de, keepers = visittype_contrasts_mono, scale_p = TRUE,
  excel = glue("{xlsx_prefix}/DE_Visits/Cure_Fail/monocyte_visit_cf_combined_table_sva-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cellvisitcf_mono_table
## A set of combined differential expression results.
##                                        table deseq_sigup deseq_sigdown
## 1       monocytes_2_cure_vs_monocytes_1_cure           0             2
## 2 monocytes_2_failure_vs_monocytes_1_failure           2             2
## 3       monocytes_3_cure_vs_monocytes_1_cure           1             3
## 4 monocytes_3_failure_vs_monocytes_1_failure           1             3
##   edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1           0             0           0             0
## 2           1             1           1             0
## 3           0             0           0             0
## 4           0             0           0             0
## Plot describing unique/shared genes in a differential expression table.
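
The long table names in the summary above (for example monocytes_2_cure_vs_monocytes_1_cure) pair a numerator condition with a denominator condition. The following is a minimal sketch of how such a keepers list might be laid out, assuming each element is a named c(numerator, denominator) pair. The object keepers_sketch is hypothetical; the real visittype_contrasts_mono is defined elsewhere in this document and may differ.

```r
## Hypothetical reconstruction of a keepers list (an assumption, not the
## actual visittype_contrasts_mono object).
keepers_sketch <- list(
  "v2v1_mono_cure" = c("monocytes_2_cure", "monocytes_1_cure"),
  "v2v1_mono_failure" = c("monocytes_2_failure", "monocytes_1_failure"),
  "v3v1_mono_cure" = c("monocytes_3_cure", "monocytes_1_cure"),
  "v3v1_mono_failure" = c("monocytes_3_failure", "monocytes_1_failure"))

## Each element is assumed to be c(numerator, denominator); the long table
## name is then numerator_vs_denominator.
table_names <- vapply(keepers_sketch,
                      function(pair) paste0(pair[1], "_vs_", pair[2]),
                      character(1))
print(table_names)
```

Under this layout the short names (v2v1_mono_cure and friends) label the rows of the significance summaries, while the long names label the combined tables.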

t_cellvisitcf_mono_sig <- extract_significant_genes(
  t_cellvisitcf_mono_table,
  excel = glue("{xlsx_prefix}/DE_Visits/Cure_Fail/monocyte_visit_cf_combined_sig_sva-v{ver}.xlsx"))
t_cellvisitcf_mono_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##                   limma_up limma_down edger_up edger_down deseq_up deseq_down
## v2v1_mono_cure           0          0        0          0        0          2
## v2v1_mono_failure        1          0        1          1        2          2
## v3v1_mono_cure           0          0        0          0        1          3
## v3v1_mono_failure        0          0        0          0        1          3
##                   ebseq_up ebseq_down basic_up basic_down
## v2v1_mono_cure           0          0        0          0
## v2v1_mono_failure        3          3        0          0
## v3v1_mono_cure           2          1        0          0
## v3v1_mono_failure        0          1        0          0
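
The summary above reports an LFC cutoff of 1 and an adjusted p-value cutoff of 0.05. As a rough illustration of what such a filter does, here is a base R sketch on a toy table. The gene names, values, and column names are invented, and whether extract_significant_genes() uses inclusive or strict comparisons is an assumption here.

```r
## Toy table; gene names and values are invented for illustration only.
toy <- data.frame(
  logFC = c(2.3, -1.6, 0.4, 1.2, -3.0),
  adj_p = c(0.001, 0.03, 0.0001, 0.20, 0.049),
  row.names = c("geneA", "geneB", "geneC", "geneD", "geneE"))

lfc_cutoff <- 1
p_cutoff <- 0.05

## A gene is called 'up' or 'down' only when it passes both cutoffs.
## Inclusive comparisons are assumed.
passes_p <- toy[["adj_p"]] <= p_cutoff
up <- toy[passes_p & toy[["logFC"]] >= lfc_cutoff, ]
down <- toy[passes_p & toy[["logFC"]] <= -lfc_cutoff, ]

print(c(up = nrow(up), down = nrow(down)))
```

Note that geneC fails on fold change despite a very small adjusted p-value, and geneD fails on the adjusted p-value despite a sufficient fold change; both cutoffs must hold at once.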

t_cellvisitcf_neut_table <- combine_de_tables(
  t_cellvisitcf_de, keepers = visittype_contrasts_ne, scale_p = TRUE,
  excel = glue("{xlsx_prefix}/DE_Visits/Cure_Fail/neutrophil_visit_cf_combined_table_sva-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cellvisitcf_neut_table
## A set of combined differential expression results.
##                                            table deseq_sigup deseq_sigdown
## 1       neutrophils_2_cure_vs_neutrophils_1_cure          85           132
## 2 neutrophils_2_failure_vs_neutrophils_1_failure         127           150
## 3       neutrophils_3_cure_vs_neutrophils_1_cure         105           195
## 4 neutrophils_3_failure_vs_neutrophils_1_failure          87            24
##   edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1          75           118          90            31
## 2         116           175         105            42
## 3         110           157         114            39
## 4          77            20          56            36
## Plot describing unique/shared genes in a differential expression table.

t_cellvisitcf_neut_sig <- extract_significant_genes(
  t_cellvisitcf_neut_table,
  excel = glue("{xlsx_prefix}/DE_Visits/Cure_Fail/neutrophil_visit_cf_combined_sig_sva-v{ver}.xlsx"))
t_cellvisitcf_neut_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##                 limma_up limma_down edger_up edger_down deseq_up deseq_down
## v2v1_ne_cure          90         31       75        118       85        132
## v2v1_ne_failure      105         42      116        175      127        150
## v3v1_ne_cure         114         39      110        157      105        195
## v3v1_ne_failure       56         36       77         20       87         24
##                 ebseq_up ebseq_down basic_up basic_down
## v2v1_ne_cure          24         16        3          2
## v2v1_ne_failure       75         10        6          1
## v3v1_ne_cure          44          8        0          0
## v3v1_ne_failure       17          5        0          0
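
To compare the methods at a glance, the up and down counts printed above can be summed per method and contrast. This sketch re-enters the numbers from the neutrophil summary by hand; it does not read them from t_cellvisitcf_neut_sig, whose internal structure is not shown here. The names neut_counts, methods_used, and totals are introduced only for this illustration.

```r
## Counts copied by hand from the neutrophil significance summary above.
neut_counts <- data.frame(
  limma_up = c(90, 105, 114, 56),
  limma_down = c(31, 42, 39, 36),
  edger_up = c(75, 116, 110, 77),
  edger_down = c(118, 175, 157, 20),
  deseq_up = c(85, 127, 105, 87),
  deseq_down = c(132, 150, 195, 24),
  ebseq_up = c(24, 75, 44, 17),
  ebseq_down = c(16, 10, 8, 5),
  basic_up = c(3, 6, 0, 0),
  basic_down = c(2, 1, 0, 0),
  row.names = c("v2v1_ne_cure", "v2v1_ne_failure",
                "v3v1_ne_cure", "v3v1_ne_failure"))

## Sum up + down for each method, giving one column per method.
methods_used <- c("limma", "edger", "deseq", "ebseq", "basic")
totals <- sapply(methods_used, function(m) {
  neut_counts[[paste0(m, "_up")]] + neut_counts[[paste0(m, "_down")]]
})
rownames(totals) <- rownames(neut_counts)
print(totals)
```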

t_cellvisitcf_eo_table <- combine_de_tables(
  t_cellvisitcf_de, keepers = visittype_contrasts_eo,
  excel = glue("{xlsx_prefix}/DE_Visits/Cure_Fail/eosinophil_visit_cf_combined_table_sva-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'closure'
## Error : rowNames must be a logical vector with NAs
t_cellvisitcf_eo_table
## A set of combined differential expression results.
##                                            table deseq_sigup deseq_sigdown
## 1       eosinophils_2_cure_vs_eosinophils_1_cure           5             1
## 2 eosinophils_2_failure_vs_eosinophils_1_failure           1             5
## 3       eosinophils_3_cure_vs_eosinophils_1_cure           9             1
## 4 eosinophils_3_failure_vs_eosinophils_1_failure           0             8
##   edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1           0             0           0             0
## 2           0             0           0             0
## 3           0             0           0             1
## 4           0             0           0             0
## Plot describing unique/shared genes in a differential expression table.

t_cellvisitcf_eo_sig <- extract_significant_genes(
  t_cellvisitcf_eo_table,
  excel = glue("{xlsx_prefix}/DE_Visits/Cure_Fail/eosinophil_visit_cf_combined_sig_sva-v{ver}.xlsx"))
t_cellvisitcf_eo_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##                 limma_up limma_down edger_up edger_down deseq_up deseq_down
## v2v1_eo_cure           0          0        0          0        5          1
## v2v1_eo_failure        0          0        0          0        1          5
## v3v1_eo_cure           0          1        0          0        9          1
## v3v1_eo_failure        0          0        0          0        0          8
##                 ebseq_up ebseq_down basic_up basic_down
## v2v1_eo_cure           1          0        0          0
## v2v1_eo_failure        4          2        0          0
## v3v1_eo_cure           1          1        0          0
## v3v1_eo_failure       17          5        0          0

tmp <- loadme(filename = savefile)

Bibliography

Chung, Matthew, Vincent M. Bruno, David A. Rasko, Christina A. Cuomo, José F. Muñoz, Jonathan Livny, Amol C. Shetty, Anup Mahurkar, and Julie C. Dunning Hotopp. 2021. “Best Practices on the Differential Expression Analysis of Multi-Species RNA-seq.” Genome Biology 22 (April): 121. https://doi.org/10.1186/s13059-021-02337-8.
Hoffman, Gabriel E., and Panos Roussos. 2020. “Dream: Powerful Differential Expression Analysis for Repeated Measures Designs.” Bioinformatics 37 (2): 192–201. https://doi.org/10.1093/bioinformatics/btaa687.
Leng, Ning, John A. Dawson, James A. Thomson, Victor Ruotti, Anna I. Rissman, Bart M. G. Smits, Jill D. Haag, Michael N. Gould, Ron M. Stewart, and Christina Kendziorski. 2013. “EBSeq: An Empirical Bayes Hierarchical Model for Inference in RNA-seq Experiments.” Bioinformatics 29 (8): 1035–43. https://doi.org/10.1093/bioinformatics/btt087.
Love, Michael I., Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2.” bioRxiv. https://doi.org/10.1101/002832.
McCarthy, Davis J., Yunshun Chen, and Gordon K. Smyth. 2012. “Differential Expression Analysis of Multifactor RNA-Seq Experiments with Respect to Biological Variation.” Nucleic Acids Research 40 (10): 4288–97. https://doi.org/10.1093/nar/gks042.
Molania, Ramyar, Momeneh Foroutan, Johann A. Gagnon-Bartsch, Luke C. Gandolfo, Aryan Jain, Abhishek Sinha, Gavriel Olshansky, Alexander Dobrovic, Anthony T. Papenfuss, and Terence P. Speed. 2023. “Removing Unwanted Variation from Large-Scale RNA Sequencing Data with PRPS.” Nature Biotechnology 41 (1): 82–95. https://doi.org/10.1038/s41587-022-01440-w.
Risso, Davide, John Ngai, Terence P. Speed, and Sandrine Dudoit. 2014. “Normalization of RNA-seq Data Using Factor Analysis of Control Genes or Samples.” Nature Biotechnology 32 (9): 896–902. https://doi.org/10.1038/nbt.2931.
Ritchie, Matthew E., Belinda Phipson, Di Wu, Yifang Hu, Charity W. Law, Wei Shi, and Gordon K. Smyth. 2015. “Limma Powers Differential Expression Analyses for RNA-sequencing and Microarray Studies.” Nucleic Acids Research 43 (7): e47. https://doi.org/10.1093/nar/gkv007.
Tarazona, Sonia, Pedro Furió-Tarí, David Turrà, Antonio Di Pietro, María José Nueda, Alberto Ferrer, and Ana Conesa. 2015. “Data Quality Aware Analysis of Differential Expression in RNA-seq with NOISeq R/Bioc Package.” Nucleic Acids Research 43 (21): e140. https://doi.org/10.1093/nar/gkv711.
