TMRC3 202412: Differential Expression analyses

atb

2024-12-30

1 Changelog

  • 202309: Disabled GSVA analyses until/unless we get permission to include the mSigDB 7.5.1 release. I will simplify the filenames so that one may easily drop in a downloaded copy of the data and run those blocks. Until then, I guess you will have to trust me when I say those blocks all work?
  • 202309: Moved all GSEA analyses to 04lrt_gsea_gsva.Rmd
  • 202309 next day: Moving GSEA back because it adds too much complexity to save/reload the DE results for gProfiler and friends.
  • Still hunting for messed up colors, changed input data to match new version.

2 Introduction

The various differential expression analyses of the data generated in tmrc3_datasets will occur in this document.

2.1 Naming conventions

I am going to try to standardize how I name the various data structures created in this document. Most of the large data created are either sets of differential expression analyses, their combined results, or the set of results deemed ‘significant’.

Hopefully by now they all follow these guidelines:

{clinic(s)}sample-subset}{primary-question(s)}{datatype}{batch-method}

  • {clinic}: This is either tc or t for Tumaco and Cali, or just Tumaco.
  • {sample-subset}: Things like ‘all’ or ‘monocytes’.
  • {primary-question}: Shorthand name for the primary contrasts performed, thus ‘clinics’ would suggest a comparison of Tumaco vs. Cali. ‘visits’ would compare v2/v1, etc.
  • {datatype}: de, table, sig
  • {batch-type}: nobatch, batch{factor}, sva. {factor} in this instance should be a column from the metadata.

With this in mind, ‘tc_biopsies_clinic_de_sva’ should be the Tumaco+Cali biopsy data after performing the differential expression analyses comparing the clinics using sva.

I suspect there remain some exceptions and/or errors.

2.2 Define contrasts for DE analyses

Each of the following lists describes the set of contrasts that I think are interesting for the various ways one might consider the TMRC3 dataset. The variables are named according to the assumed data with which they will be used, thus tc_cf_contrasts is expected to be used for the Tumaco+Cali data and provide a series of cure/fail comparisons which (to the extent possible) across both locations. In every case, the name of the list element will be used as the contrast name, and will thus be seen as the sheet name in the output xlsx file(s); the two pieces of the character vector value are the numerator and denominator of the associated contrast.

2.3 GSEA

The GSEA analyses will follow each DE analysis during this document.

Most (all?) of the GSEA analyses used in this paper were done via gProfiler rather than goseq/clusterProfiler/topGO/GOstats. Primarily because it is so easy to invoke gprofiler.

clinic_contrasts <- list(
  "clinics" = c("cali", "tumaco"))
## In some cases we have no Cali failure samples, so there remain only 2
## contrasts that are likely of interest
tc_cf_contrasts <- list(
  "tumaco" = c("tumaco_failure", "tumaco_cure"),
  "cure" = c("tumaco_cure", "cali_cure"))
## In other cases, we have cure/fail for both places.
clinic_cf_contrasts <- list(
  "cali" = c("cali_failure", "cali_cure"),
  "tumaco" = c("tumaco_failure", "tumaco_cure"),
  "cure" = c("tumaco_cure", "cali_cure"),
  "fail" = c("tumaco_failure", "cali_failure"))
cf_contrast <- list(
  "outcome" = c("tumaco_failure", "tumaco_cure"))
t_cf_contrast <- list(
  "outcome" = c("failure", "cure"))
visitcf_contrasts <- list(
  "v1cf" = c("v1_failure", "v1_cure"),
  "v2cf" = c("v2_failure", "v2_cure"),
  "v3cf" = c("v3_failure", "v3_cure"))
visit_contrasts <- list(
  "v2v1" = c("c2", "c1"),
  "v3v1" = c("c3", "c1"),
  "v3v2" = c("c3", "c2"))
visit_v1later <- list(
  "later_vs_first" = c("later", "first"))
celltypes <- list(
  "eo_mono" = c("eosinophils", "monocytes"),
  "ne_mono" = c("neutrophils", "monocytes"),
  "eo_ne" = c("eosinophils", "neutrophils"))
ethnicity_contrasts <- list(
  "mestizo_indigenous" = c("mestiza", "indigena"),
  "mestizo_afrocol" = c("mestiza", "afrocol"),
  "indigenous_afrocol" = c("indigena", "afrocol"))

3 Compare samples by clinic

3.1 DE: Compare clinics, all samples

Perform a svaseq-guided comparison of the two clinics. Ideally this will give some clue about just how strong the clinic-based batch effect really is and what its causes are.

tc_clinic_type <- tc_valid %>%
  set_expt_conditions(fact = "clinic") %>%
  set_expt_batches(fact = "typeofcells")
## The numbers of samples by condition are:
## 
##   cali tumaco 
##     61    123
## The number of samples by batch are:
## 
##      biopsy eosinophils   monocytes neutrophils 
##          18          41          63          62
table(pData(tc_clinic_type)[["condition"]])
## 
##   cali tumaco 
##     61    123
tc_all_clinic_de_sva <- all_pairwise(tc_clinic_type, model_batch = "svaseq",
                                     filter = TRUE, methods = methods)
## 
##   cali tumaco 
##     61    123
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 3 comparisons.
## The provided conditions are:
## conditions
##   cali tumaco 
##     61    123
## Choosing among model matrix columns: conditioncali, conditiontumaco.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5 + SV6.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
##   cali tumaco 
##     61    123
## Choosing among model matrix columns: conditioncali, conditiontumaco.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5 + SV6.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
##   cali tumaco 
##     61    123
## Choosing among model matrix columns: conditioncali, conditiontumaco.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Error in cbind(L, Luni) : 
##   number of rows of matrices must match (see arg 2)
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##   cali tumaco 
##     61    123
## Choosing among model matrix columns: conditioncali, conditiontumaco.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
##   cali tumaco 
##     61    123
## Choosing among model matrix columns: conditioncali, conditiontumaco.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5 + SV6.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
##   cali tumaco 
##     61    123
## Choosing among model matrix columns: conditioncali, conditiontumaco.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/1: Creating table: tumaco_vs_cali.  Adjust = BH
## Limma step 6/6: 1/2: Creating table: cali.  Adjust = BH
## Limma step 6/6: 2/2: Creating table: tumaco.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##   cali tumaco 
##     61    123
## Choosing among model matrix columns: conditioncali, conditiontumaco.

tc_all_clinic_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 15 comparisons.
## The logFC agreement among the methods follows:
##                 tumc_vs_cl
## basic_vs_deseq      0.8086
## basic_vs_ebseq      0.7548
## basic_vs_edger      0.8735
## basic_vs_limma      0.9693
## basic_vs_noiseq     0.9114
## deseq_vs_ebseq      0.8504
## deseq_vs_edger      0.9377
## deseq_vs_limma      0.7998
## deseq_vs_noiseq     0.8770
## ebseq_vs_edger      0.8618
## ebseq_vs_limma      0.7772
## ebseq_vs_noiseq     0.8382
## edger_vs_limma      0.8621
## edger_vs_noiseq     0.9363
## limma_vs_noiseq     0.8895
tc_all_clinic_de_sva[["deseq"]][["contrasts_performed"]]
## [1] "tumaco_vs_cali"
tc_all_clinic_table_sva <- combine_de_tables(
  tc_all_clinic_de_sva, keepers = clinic_contrasts,
  excel = glue("{clinic_prefix}/tc_all_clinic_table_sva-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
tc_all_clinic_table_sva
## A set of combined differential expression results.
##                     table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 tumaco_vs_cali-inverted         273          1799         323          1667
##   limma_sigup limma_sigdown
## 1         392           606
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

tc_all_clinic_sig_sva <- extract_significant_genes(
  tc_all_clinic_table_sva,
  excel = glue("{clinic_prefix}/compare_clinics/tc_clinic_type_sig_sva-v{ver}.xlsx"))
tc_all_clinic_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##         limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## clinics      392        606      323       1667      273       1799      224
##         ebseq_down basic_up basic_down
## clinics        424     2670       2029

3.1.1 GSEA: comparing the clinics

increased_tumaco_categories_up <- simple_gprofiler(
  tc_all_clinic_sig_sva[["deseq"]][["ups"]][["clinics"]],
  excel = glue("{gsea_prefix}/tumaco_cateogies_up-v{ver}.xlsx"))
increased_tumaco_categories_up
## A set of ontologies produced by gprofiler using 273
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are: 
## 17 MF
## 12 BP
## 1 KEGG
## 1 REAC
## 0 WP
## 100 TF
## 0 MIRNA
## 0 HPA
## 0 CORUM
## 0 HP hits.
increased_tumaco_categories_up[["pvalue_plots"]][["BP"]]
## NULL
increased_cali_categories <- simple_gprofiler(
  tc_all_clinic_sig_sva[["deseq"]][["downs"]][["clinics"]],
  excel = glue("{gsea_prefix}/cali_cateogies_up-v{ver}.xlsx"))
increased_cali_categories
## A set of ontologies produced by gprofiler using 1799
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are: 
## 59 MF
## 686 BP
## 2 KEGG
## 20 REAC
## 7 WP
## 333 TF
## 2 MIRNA
## 16 HPA
## 0 CORUM
## 14 HP hits.
increased_cali_categories[["pvalue_plots"]][["BP"]]
## NULL

3.1.2 Visualize clinic differences

Let us take a quick look at the results of the comparison of Tumaco/Cali

Note: I keep re-introducing an error which causes these (volcano and MA) plots to be reversed with respect to the logFC values. Pay careful attention to these and make sure that they agree with the numbers of genes observed in the contrast.

## Check that up is up
summary(tc_all_clinic_table_sva[["data"]][["clinics"]][["deseq_logfc"]])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -20.280  -0.584  -0.155  -0.255   0.172   3.514
## I think we can assume that most genes are down when considering Tumaco/Cali.
sum(tc_all_clinic_table_sva$data$clinics$deseq_logfc < -1.0 &
      tc_all_clinic_table_sva$data$clinics$deseq_adjp < 0.05)
## [1] 1795
tc_all_clinic_table_sva[["plots"]][["clinics"]][["deseq_vol_plots"]]

## Ok, so it says 1794 up, but that is clearly the down side...  Something is definitely messed up.
## The points are on the correct sides of the plot, but the categories of up/down are reversed.
## Theresa noted that she colors differently, and I think better: left side gets called
## 'increased in denominator', right side gets called 'increased in numerator';
## these two groups are colored according to their condition colors, and everything else is gray.
## I am checking out Theresa's helper_functions.R to get a sense of how she handles this, I think
## I can use a variant of her idea pretty easily:
##  1.  Add a column 'Significance', which is a factor, and contains either 'Not enriched',
##      'Enriched in x', or 'Enriched in y' according to the logfc/adjp.
##  2.  use the significance column for the geom_point color/fill in the volcano plot.
## My change to this idea would be to extract the colors from the input expressionset.

There appear to be many more genes which are increased in the Tumaco samples with respect to the Cali samples.

3.2 DE: Compare clinics, eosinophil samples

The remaining cell types all have pretty strong clinic-based variance; but I am not certain if it is consistent across cell types.

table(pData(tc_eosinophils)[["condition"]])
## 
##      cali_cure    tumaco_cure tumaco_failure 
##             15             17              9
tc_eosinophils_clinic_de_nobatch <- all_pairwise(tc_eosinophils, parallel = parallel,
                                                 model_batch = FALSE, filter = TRUE,
                                                 methods = methods)
## 
##      cali_cure    tumaco_cure tumaco_failure 
##             15             17              9
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 6 comparisons.
## The provided conditions are:
## conditions
##      cali_cure    tumaco_cure tumaco_failure 
##             15             17              9
## Choosing among model matrix columns: conditioncali_cure, conditiontumaco_cure, conditiontumaco_failure.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
##      cali_cure    tumaco_cure tumaco_failure 
##             15             17              9
## Choosing among model matrix columns: conditioncali_cure, conditiontumaco_cure, conditiontumaco_failure.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
##      cali_cure    tumaco_cure tumaco_failure 
##             15             17              9
## Choosing among model matrix columns: conditioncali_cure, conditiontumaco_cure, conditiontumaco_failure.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Dream/limma step 5/6: Running eBayes.
## Dream/limma step 6/6: Creating tables.
## Varpart/limma step 6/6: 1/6: Creating table: tumaco_cure_vs_cali_cure.  Adjust = BH
## Varpart/limma step 6/6: 2/6: Creating table: tumaco_failure_vs_cali_cure.  Adjust = BH
## Varpart/limma step 6/6: 3/6: Creating table: tumaco_failure_vs_tumaco_cure.  Adjust = BH
## Varpart/limma step 6/6: 4/6: Creating table: conditioncali_cure.  Adjust = BH
## Varpart/limma step 6/6: 5/6: Creating table: conditiontumaco_cure.  Adjust = BH
## Varpart/limma step 6/6: 6/6: Creating table: conditiontumaco_failure.  Adjust = BH
## Limma step 6/6: 1/6: Creating table: cali_cure.  Adjust = BH
## Limma step 6/6: 2/6: Creating table: tumaco_cure.  Adjust = BH
## Limma step 6/6: 3/6: Creating table: tumaco_failure.  Adjust = BH
## Limma step 6/6: 4/6: Creating table: conditioncali_cure.  Adjust = BH
## Limma step 6/6: 5/6: Creating table: conditiontumaco_cure.  Adjust = BH
## Limma step 6/6: 6/6: Creating table: conditiontumaco_failure.  Adjust = BH
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##      cali_cure    tumaco_cure tumaco_failure 
##             15             17              9
## Choosing among model matrix columns: conditioncali_cure, conditiontumaco_cure, conditiontumaco_failure.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
##      cali_cure    tumaco_cure tumaco_failure 
##             15             17              9
## Choosing among model matrix columns: conditioncali_cure, conditiontumaco_cure, conditiontumaco_failure.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
##      cali_cure    tumaco_cure tumaco_failure 
##             15             17              9
## Choosing among model matrix columns: conditioncali_cure, conditiontumaco_cure, conditiontumaco_failure.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/3: Creating table: tumaco_cure_vs_cali_cure.  Adjust = BH
## Limma step 6/6: 2/3: Creating table: tumaco_failure_vs_cali_cure.  Adjust = BH
## Limma step 6/6: 3/3: Creating table: tumaco_failure_vs_tumaco_cure.  Adjust = BH
## Limma step 6/6: 1/3: Creating table: cali_cure.  Adjust = BH
## Limma step 6/6: 2/3: Creating table: tumaco_cure.  Adjust = BH
## Limma step 6/6: 3/3: Creating table: tumaco_failure.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##      cali_cure    tumaco_cure tumaco_failure 
##             15             17              9
## Choosing among model matrix columns: conditioncali_cure, conditiontumaco_cure, conditiontumaco_failure.

tc_eosinophils_clinic_de_nobatch
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: none.
## The primary analysis performed 21 comparisons.
tc_eosinophils_clinic_de_nobatch[["deseq"]][["contrasts_performed"]]
## [1] "tumaco_failure_vs_tumaco_cure" "tumaco_failure_vs_cali_cure"  
## [3] "tumaco_cure_vs_cali_cure"
tc_eosinophils_clinic_table_nobatch <- combine_de_tables(
  tc_eosinophils_clinic_de_nobatch, keepers = tc_cf_contrasts,
  excel = glue("{clinic_cf_prefix}/Eosinophils/tc_eosinophils_clinic_table_nobatch-v{ver}.xlsx"))
tc_eosinophils_clinic_table_nobatch
## A set of combined differential expression results.
##                           table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure         102            35         114
## 2      tumaco_cure_vs_cali_cure         834           814         856
##   edger_sigdown limma_sigup limma_sigdown
## 1            32          62            17
## 2           817         712           705
## Plot describing unique/shared genes in a differential expression table.

tc_eosinophils_clinic_sig_nobatch <- extract_significant_genes(
  tc_eosinophils_clinic_table_nobatch,
  excel = glue("{clinic_cf_prefix}/Eosinophils/tc_eosinophils_clinic_sig_nobatch-v{ver}.xlsx"))
tc_eosinophils_clinic_sig_nobatch
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##        limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## tumaco       62         17      114         32      102         35        9
## cure        712        705      856        817      834        814      698
##        ebseq_down basic_up basic_down
## tumaco         36        0          0
## cure          599     2768       2947

tc_eosinophils_clinic_de_sva <- all_pairwise(tc_eosinophils, model_batch = "svaseq",
                                             filter = TRUE, methods = methods)
## 
##      cali_cure    tumaco_cure tumaco_failure 
##             15             17              9
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 6 comparisons.
## The provided conditions are:
## conditions
##      cali_cure    tumaco_cure tumaco_failure 
##             15             17              9
## Choosing among model matrix columns: conditioncali_cure, conditiontumaco_cure, conditiontumaco_failure.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
##      cali_cure    tumaco_cure tumaco_failure 
##             15             17              9
## Choosing among model matrix columns: conditioncali_cure, conditiontumaco_cure, conditiontumaco_failure.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
##      cali_cure    tumaco_cure tumaco_failure 
##             15             17              9
## Choosing among model matrix columns: conditioncali_cure, conditiontumaco_cure, conditiontumaco_failure.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Error in cbind(L, Luni) : 
##   number of rows of matrices must match (see arg 2)
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##      cali_cure    tumaco_cure tumaco_failure 
##             15             17              9
## Choosing among model matrix columns: conditioncali_cure, conditiontumaco_cure, conditiontumaco_failure.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
##      cali_cure    tumaco_cure tumaco_failure 
##             15             17              9
## Choosing among model matrix columns: conditioncali_cure, conditiontumaco_cure, conditiontumaco_failure.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
##      cali_cure    tumaco_cure tumaco_failure 
##             15             17              9
## Choosing among model matrix columns: conditioncali_cure, conditiontumaco_cure, conditiontumaco_failure.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/3: Creating table: tumaco_cure_vs_cali_cure.  Adjust = BH
## Limma step 6/6: 2/3: Creating table: tumaco_failure_vs_cali_cure.  Adjust = BH
## Limma step 6/6: 3/3: Creating table: tumaco_failure_vs_tumaco_cure.  Adjust = BH
## Limma step 6/6: 1/3: Creating table: cali_cure.  Adjust = BH
## Limma step 6/6: 2/3: Creating table: tumaco_cure.  Adjust = BH
## Limma step 6/6: 3/3: Creating table: tumaco_failure.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##      cali_cure    tumaco_cure tumaco_failure 
##             15             17              9
## Choosing among model matrix columns: conditioncali_cure, conditiontumaco_cure, conditiontumaco_failure.

tc_eosinophils_clinic_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 15 comparisons.
tc_eosinophils_clinic_de_sva[["deseq"]][["contrasts_performed"]]
## [1] "tumaco_failure_vs_tumaco_cure" "tumaco_failure_vs_cali_cure"  
## [3] "tumaco_cure_vs_cali_cure"
tc_eosinophils_clinic_table_sva <- combine_de_tables(
  tc_eosinophils_clinic_de_sva, keepers = tc_cf_contrasts,
  excel = glue("{clinic_cf_prefix}/Eosinophils/tc_eosinophils_clinic_table_sva-v{ver}.xlsx"))
tc_eosinophils_clinic_table_sva
## A set of combined differential expression results.
##                           table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure          89            57          90
## 2      tumaco_cure_vs_cali_cure         777           808         781
##   edger_sigdown limma_sigup limma_sigdown
## 1            41          77            30
## 2           806         723           710
## Plot describing unique/shared genes in a differential expression table.

tc_eosinophils_clinic_sig_sva <- extract_significant_genes(
  tc_eosinophils_clinic_table_sva,
  excel = glue("{clinic_cf_prefix}/Eosinophils/tc_eosinophils_clinic_sig_sva-v{ver}.xlsx"))
tc_eosinophils_clinic_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##        limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## tumaco       77         30       90         41       89         57        9
## cure        723        710      781        806      777        808      698
##        ebseq_down basic_up basic_down
## tumaco         36        0          0
## cure          599     2768       2947

3.3 DE: Compare clinics, biopsy samples

Interestingly to me, the biopsy samples appear to have the least location-based variance. But we can perform an explicit DE and see how well that hypothesis holds up.

Note that these data include cure and fail samples for

table(pData(tc_biopsies)[["condition"]])
## 
##      cali_cure    tumaco_cure tumaco_failure 
##              4              9              5
tc_biopsies_clinic_de_sva <- all_pairwise(tc_biopsies, parallel = parallel,
                                          model_batch = "svaseq", filter = TRUE,
                                          methods = methods)
## 
##      cali_cure    tumaco_cure tumaco_failure 
##              4              9              5
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 6 comparisons.
## The provided conditions are:
## conditions
##      cali_cure    tumaco_cure tumaco_failure 
##              4              9              5
## Choosing among model matrix columns: conditioncali_cure, conditiontumaco_cure, conditiontumaco_failure.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition + SV1 + SV2.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
##      cali_cure    tumaco_cure tumaco_failure 
##              4              9              5
## Choosing among model matrix columns: conditioncali_cure, conditiontumaco_cure, conditiontumaco_failure.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
##      cali_cure    tumaco_cure tumaco_failure 
##              4              9              5
## Choosing among model matrix columns: conditioncali_cure, conditiontumaco_cure, conditiontumaco_failure.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Error in cbind(L, Luni) : 
##   number of rows of matrices must match (see arg 2)
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##      cali_cure    tumaco_cure tumaco_failure 
##              4              9              5
## Choosing among model matrix columns: conditioncali_cure, conditiontumaco_cure, conditiontumaco_failure.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
##      cali_cure    tumaco_cure tumaco_failure 
##              4              9              5
## Choosing among model matrix columns: conditioncali_cure, conditiontumaco_cure, conditiontumaco_failure.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
##      cali_cure    tumaco_cure tumaco_failure 
##              4              9              5
## Choosing among model matrix columns: conditioncali_cure, conditiontumaco_cure, conditiontumaco_failure.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/3: Creating table: tumaco_cure_vs_cali_cure.  Adjust = BH
## Limma step 6/6: 2/3: Creating table: tumaco_failure_vs_cali_cure.  Adjust = BH
## Limma step 6/6: 3/3: Creating table: tumaco_failure_vs_tumaco_cure.  Adjust = BH
## Limma step 6/6: 1/3: Creating table: cali_cure.  Adjust = BH
## Limma step 6/6: 2/3: Creating table: tumaco_cure.  Adjust = BH
## Limma step 6/6: 3/3: Creating table: tumaco_failure.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##      cali_cure    tumaco_cure tumaco_failure 
##              4              9              5
## Choosing among model matrix columns: conditioncali_cure, conditiontumaco_cure, conditiontumaco_failure.

tc_biopsies_clinic_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 15 comparisons.
tc_biopsies_clinic_de_sva[["deseq"]][["contrasts_performed"]]
## [1] "tumaco_failure_vs_tumaco_cure" "tumaco_failure_vs_cali_cure"  
## [3] "tumaco_cure_vs_cali_cure"
tc_biopsies_clinic_table_sva <- combine_de_tables(
  tc_biopsies_clinic_de_sva, keepers = tc_cf_contrasts,
  excel = glue("{clinic_cf_prefix}/Biopsies/tc_biopsies_clinic_table_sva-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
tc_biopsies_clinic_table_sva
## A set of combined differential expression results.
##                           table deseq_sigup deseq_sigdown edger_sigup
## 1 tumaco_failure_vs_tumaco_cure          14            11          18
## 2      tumaco_cure_vs_cali_cure           1             0           0
##   edger_sigdown limma_sigup limma_sigdown
## 1             6           0             0
## 2             0           0             0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

tc_biopsies_clinic_sig_sva <- extract_significant_genes(
  tc_biopsies_clinic_table_sva,
  excel = glue("{clinic_cf_prefix}/Biopsies/tc_biopsies_clinic_sig_sva-v{ver}.xlsx"))
tc_biopsies_clinic_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##        limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## tumaco        0          0       18          6       14         11       11
## cure          0          0        0          0        1          0       28
##        ebseq_down basic_up basic_down
## tumaco         60        0          0
## cure            1        0          0

3.4 DE: Compare clinics, monocyte samples

At least for the moment, I am only looking at the differences between no-batch vs. sva across clinics for the monocyte samples. This was chosen mostly arbitrarily.

3.4.1 DE: Compare clinics, monocytes without batch estimation

Our baseline is the comparison of the monocytes samples without batch in the model or surrogate estimation. In theory at least, this should correspond to the PCA plot above when no batch estimation was performed.

table(pData(tc_monocytes)[["condition"]])
## 
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             21             21
tc_monocytes_de_nobatch <- all_pairwise(tc_monocytes, model_batch = FALSE,
                                        filter = TRUE,
                                        methods = methods)
## 
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             21             21
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 10 comparisons.
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             21             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             21             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             21             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Dream/limma step 5/6: Running eBayes.
## Dream/limma step 6/6: Creating tables.
## Varpart/limma step 6/6: 1/10: Creating table: cali_failure_vs_cali_cure.  Adjust = BH
## Varpart/limma step 6/6: 2/10: Creating table: tumaco_cure_vs_cali_cure.  Adjust = BH
## Varpart/limma step 6/6: 3/10: Creating table: tumaco_failure_vs_cali_cure.  Adjust = BH
## Varpart/limma step 6/6: 4/10: Creating table: tumaco_cure_vs_cali_failure.  Adjust = BH
## Varpart/limma step 6/6: 5/10: Creating table: tumaco_failure_vs_cali_failure.  Adjust = BH
## Varpart/limma step 6/6: 6/10: Creating table: tumaco_failure_vs_tumaco_cure.  Adjust = BH
## Varpart/limma step 6/6: 7/10: Creating table: conditioncali_cure.  Adjust = BH
## Varpart/limma step 6/6: 8/10: Creating table: conditioncali_failure.  Adjust = BH
## Varpart/limma step 6/6: 9/10: Creating table: conditiontumaco_cure.  Adjust = BH
## Varpart/limma step 6/6: 10/10: Creating table: conditiontumaco_failure.  Adjust = BH
## Limma step 6/6: 1/8: Creating table: cali_cure.  Adjust = BH
## Limma step 6/6: 2/8: Creating table: cali_failure.  Adjust = BH
## Limma step 6/6: 3/8: Creating table: tumaco_cure.  Adjust = BH
## Limma step 6/6: 4/8: Creating table: tumaco_failure.  Adjust = BH
## Limma step 6/6: 5/8: Creating table: conditioncali_cure.  Adjust = BH
## Limma step 6/6: 6/8: Creating table: conditioncali_failure.  Adjust = BH
## Limma step 6/6: 7/8: Creating table: conditiontumaco_cure.  Adjust = BH
## Limma step 6/6: 8/8: Creating table: conditiontumaco_failure.  Adjust = BH
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             21             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             21             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             21             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/6: Creating table: cali_failure_vs_cali_cure.  Adjust = BH
## Limma step 6/6: 2/6: Creating table: tumaco_cure_vs_cali_cure.  Adjust = BH
## Limma step 6/6: 3/6: Creating table: tumaco_failure_vs_cali_cure.  Adjust = BH
## Limma step 6/6: 4/6: Creating table: tumaco_cure_vs_cali_failure.  Adjust = BH
## Limma step 6/6: 5/6: Creating table: tumaco_failure_vs_cali_failure.  Adjust = BH
## Limma step 6/6: 6/6: Creating table: tumaco_failure_vs_tumaco_cure.  Adjust = BH
## Limma step 6/6: 1/4: Creating table: cali_cure.  Adjust = BH
## Limma step 6/6: 2/4: Creating table: cali_failure.  Adjust = BH
## Limma step 6/6: 3/4: Creating table: tumaco_cure.  Adjust = BH
## Limma step 6/6: 4/4: Creating table: tumaco_failure.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             21             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.

tc_monocytes_de_nobatch
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: none.
## The primary analysis performed 21 comparisons.
tc_monocytes_table_nobatch <- combine_de_tables(
  tc_monocytes_de_nobatch, keepers = clinic_cf_contrasts,
  excel = glue("{clinic_cf_prefix}/Monocytes/tc_monocytes_clinic_table_nobatch-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
tc_monocytes_table_nobatch
## A set of combined differential expression results.
##                            table deseq_sigup deseq_sigdown edger_sigup
## 1      cali_failure_vs_cali_cure          16            20          32
## 2  tumaco_failure_vs_tumaco_cure          48           121          60
## 3       tumaco_cure_vs_cali_cure         786           729         778
## 4 tumaco_failure_vs_cali_failure         638           492         518
##   edger_sigdown limma_sigup limma_sigdown
## 1            13          38             5
## 2           139          24            37
## 3           784         646           716
## 4           540         395           570
## Plot describing unique/shared genes in a differential expression table.

tc_monocytes_sig_nobatch <- extract_significant_genes(
  tc_monocytes_table_nobatch,
  excel = glue("{clinic_cf_prefix}/Monocytes/tc_monocytes_clinic_sig_nobatch-v{ver}.xlsx"))
tc_monocytes_sig_nobatch
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##        limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## cali         38          5       32         13       16         20       92
## tumaco       24         37       60        139       48        121        0
## cure        646        716      778        784      786        729      646
## fail        395        570      518        540      638        492      165
##        ebseq_down basic_up basic_down
## cali           23      412        313
## tumaco         23      158        185
## cure          664     3198       3530
## fail          529     1642       2355

3.4.2 DE: Compare clinics, monocytes with svaseq

In contrast, the following comparison should give a view of the data corresponding to the svaseq PCA plot above. In the best case scenario, we should therefore be able to see some significane differences between the Tumaco cure and fail samples.

tc_monocytes_de_sva <- all_pairwise(tc_monocytes, model_batch = "svaseq",
                                    filter = TRUE,
                                    methods = methods)
## 
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             21             21
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 10 comparisons.
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             21             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             21             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             21             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Error in cbind(L, Luni) : 
##   number of rows of matrices must match (see arg 2)
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             21             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             21             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             21             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/6: Creating table: cali_failure_vs_cali_cure.  Adjust = BH
## Limma step 6/6: 2/6: Creating table: tumaco_cure_vs_cali_cure.  Adjust = BH
## Limma step 6/6: 3/6: Creating table: tumaco_failure_vs_cali_cure.  Adjust = BH
## Limma step 6/6: 4/6: Creating table: tumaco_cure_vs_cali_failure.  Adjust = BH
## Limma step 6/6: 5/6: Creating table: tumaco_failure_vs_cali_failure.  Adjust = BH
## Limma step 6/6: 6/6: Creating table: tumaco_failure_vs_tumaco_cure.  Adjust = BH
## Limma step 6/6: 1/4: Creating table: cali_cure.  Adjust = BH
## Limma step 6/6: 2/4: Creating table: cali_failure.  Adjust = BH
## Limma step 6/6: 3/4: Creating table: tumaco_cure.  Adjust = BH
## Limma step 6/6: 4/4: Creating table: tumaco_failure.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             21             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.

tc_monocytes_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 15 comparisons.
tc_monocytes_table_sva <- combine_de_tables(
  tc_monocytes_de_sva, keepers = clinic_cf_contrasts,
  excel = glue("{clinic_cf_prefix}/Monocytes/tc_monocytes_clinic_table_sva-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
tc_monocytes_table_sva
## A set of combined differential expression results.
##                            table deseq_sigup deseq_sigdown edger_sigup
## 1      cali_failure_vs_cali_cure          28            36          40
## 2  tumaco_failure_vs_tumaco_cure          34            86          29
## 3       tumaco_cure_vs_cali_cure         761           732         713
## 4 tumaco_failure_vs_cali_failure         684           584         583
##   edger_sigdown limma_sigup limma_sigdown
## 1            17          52             7
## 2            70          14            57
## 3           762         640           663
## 4           623         434           567
## Plot describing unique/shared genes in a differential expression table.

tc_monocytes_sig_sva <- extract_significant_genes(
  tc_monocytes_table_sva,
  excel = glue("{clinic_cf_prefix}/Monocytes/tc_monocytes_clinic_sig_sva-v{ver}.xlsx"))
tc_monocytes_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##        limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## cali         52          7       40         17       28         36       92
## tumaco       14         57       29         70       34         86        0
## cure        640        663      713        762      761        732      646
## fail        434        567      583        623      684        584      165
##        ebseq_down basic_up basic_down
## cali           23      412        313
## tumaco         23      158        185
## cure          664     3198       3530
## fail          529     1642       2355

3.4.3 DE Compare: How similar are the no-batch vs. SVA results?

The following block shows that these two results are exceedingly different, sugesting that the Cali cure/fail and Tumaco cure/fail cannot easily be considered in the same analysis. I did some playing around with my calculate_aucc function in this block and found that it is in some important way broken, at least if one expands the top-n genes to more than 20% of the number of genes in the data.

cali_table <- tc_monocytes_table_nobatch[["data"]][["cali"]]
table <- tc_monocytes_table_nobatch[["data"]][["tumaco"]]

cali_aucc <- calculate_aucc(cali_table, table, px = "deseq_adjp", py = "deseq_adjp",
                            lx = "deseq_logfc", ly = "deseq_logfc")
cali_aucc
## These two tables have an aucc value of: 0.0659267365585479 and correlation:
## 
##  Pearson's product-moment correlation
## 
## data:  tbl[[lx]] and tbl[[ly]]
## t = 1.2, df = 11106, p-value = 0.2
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.006843  0.030345
## sample estimates:
##     cor 
## 0.01175

cali_table_sva <- tc_monocytes_table_sva[["data"]][["cali"]]
tumaco_table_sva <- tc_monocytes_table_sva[["data"]][["tumaco"]]
cali_aucc_sva <- calculate_aucc(cali_table_sva, tumaco_table_sva, px = "deseq_adjp",
                                py = "deseq_adjp", lx = "deseq_logfc", ly = "deseq_logfc")
cali_aucc_sva
## These two tables have an aucc value of: 0.0842668799254026 and correlation:
## 
##  Pearson's product-moment correlation
## 
## data:  tbl[[lx]] and tbl[[ly]]
## t = 16, df = 11106, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1356 0.1719
## sample estimates:
##    cor 
## 0.1538

3.5 DE: Compare clinics, neutrophil samples

tc_neutrophils_de_nobatch <- all_pairwise(tc_neutrophils, parallel = parallel,
                                          model_batch = FALSE, filter = TRUE,
                                          methods = methods)
## 
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             20             21
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 10 comparisons.
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             20             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             20             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             20             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Dream/limma step 5/6: Running eBayes.
## Dream/limma step 6/6: Creating tables.
## Varpart/limma step 6/6: 1/10: Creating table: cali_failure_vs_cali_cure.  Adjust = BH
## Varpart/limma step 6/6: 2/10: Creating table: tumaco_cure_vs_cali_cure.  Adjust = BH
## Varpart/limma step 6/6: 3/10: Creating table: tumaco_failure_vs_cali_cure.  Adjust = BH
## Varpart/limma step 6/6: 4/10: Creating table: tumaco_cure_vs_cali_failure.  Adjust = BH
## Varpart/limma step 6/6: 5/10: Creating table: tumaco_failure_vs_cali_failure.  Adjust = BH
## Varpart/limma step 6/6: 6/10: Creating table: tumaco_failure_vs_tumaco_cure.  Adjust = BH
## Varpart/limma step 6/6: 7/10: Creating table: conditioncali_cure.  Adjust = BH
## Varpart/limma step 6/6: 8/10: Creating table: conditioncali_failure.  Adjust = BH
## Varpart/limma step 6/6: 9/10: Creating table: conditiontumaco_cure.  Adjust = BH
## Varpart/limma step 6/6: 10/10: Creating table: conditiontumaco_failure.  Adjust = BH
## Limma step 6/6: 1/8: Creating table: cali_cure.  Adjust = BH
## Limma step 6/6: 2/8: Creating table: cali_failure.  Adjust = BH
## Limma step 6/6: 3/8: Creating table: tumaco_cure.  Adjust = BH
## Limma step 6/6: 4/8: Creating table: tumaco_failure.  Adjust = BH
## Limma step 6/6: 5/8: Creating table: conditioncali_cure.  Adjust = BH
## Limma step 6/6: 6/8: Creating table: conditioncali_failure.  Adjust = BH
## Limma step 6/6: 7/8: Creating table: conditiontumaco_cure.  Adjust = BH
## Limma step 6/6: 8/8: Creating table: conditiontumaco_failure.  Adjust = BH
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             20             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             20             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             20             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/6: Creating table: cali_failure_vs_cali_cure.  Adjust = BH
## Limma step 6/6: 2/6: Creating table: tumaco_cure_vs_cali_cure.  Adjust = BH
## Limma step 6/6: 3/6: Creating table: tumaco_failure_vs_cali_cure.  Adjust = BH
## Limma step 6/6: 4/6: Creating table: tumaco_cure_vs_cali_failure.  Adjust = BH
## Limma step 6/6: 5/6: Creating table: tumaco_failure_vs_cali_failure.  Adjust = BH
## Limma step 6/6: 6/6: Creating table: tumaco_failure_vs_tumaco_cure.  Adjust = BH
## Limma step 6/6: 1/4: Creating table: cali_cure.  Adjust = BH
## Limma step 6/6: 2/4: Creating table: cali_failure.  Adjust = BH
## Limma step 6/6: 3/4: Creating table: tumaco_cure.  Adjust = BH
## Limma step 6/6: 4/4: Creating table: tumaco_failure.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             20             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.

tc_neutrophils_de_nobatch
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: none.
## The primary analysis performed 21 comparisons.
tc_neutrophils_table_nobatch <- combine_de_tables(
  tc_neutrophils_de_nobatch, keepers = clinic_cf_contrasts,
  excel = glue("{clinic_cf_prefix}/Neutrophils/tc_neutrophils_table_nobatch-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
tc_neutrophils_table_nobatch
## A set of combined differential expression results.
##                            table deseq_sigup deseq_sigdown edger_sigup
## 1      cali_failure_vs_cali_cure          33            83          42
## 2  tumaco_failure_vs_tumaco_cure          95            50         112
## 3       tumaco_cure_vs_cali_cure         910           337         934
## 4 tumaco_failure_vs_cali_failure         984           257         808
##   edger_sigdown limma_sigup limma_sigdown
## 1            32          37            10
## 2            57           7            12
## 3           342         630           520
## 4           283         380           462
## Plot describing unique/shared genes in a differential expression table.

tc_neutrophils_sig_nobatch <- extract_significant_genes(
  tc_neutrophils_table_nobatch,
  excel = glue("{clinic_cf_prefix}/Neutrophils/tc_neutrophils_sig_nobatch-v{ver}.xlsx"))
tc_neutrophils_sig_nobatch
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##        limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## cali         37         10       42         32       33         83       91
## tumaco        7         12      112         57       95         50        7
## cure        630        520      934        342      910        337      687
## fail        380        462      808        283      984        257      113
##        ebseq_down basic_up basic_down
## cali           38      404        274
## tumaco          7        7          2
## cure          299     2219       2496
## fail          312     1193       1681

tc_neutrophils_de_sva <- all_pairwise(tc_neutrophils, parallel = parallel,
                                      model_batch = "svaseq", filter = TRUE,
                                      methods = methods)
## 
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             20             21
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 10 comparisons.
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             20             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5 + SV6.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             20             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5 + SV6.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             20             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Error in cbind(L, Luni) : 
##   number of rows of matrices must match (see arg 2)
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             20             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             20             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5 + SV6.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             20             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/6: Creating table: cali_failure_vs_cali_cure.  Adjust = BH
## Limma step 6/6: 2/6: Creating table: tumaco_cure_vs_cali_cure.  Adjust = BH
## Limma step 6/6: 3/6: Creating table: tumaco_failure_vs_cali_cure.  Adjust = BH
## Limma step 6/6: 4/6: Creating table: tumaco_cure_vs_cali_failure.  Adjust = BH
## Limma step 6/6: 5/6: Creating table: tumaco_failure_vs_cali_failure.  Adjust = BH
## Limma step 6/6: 6/6: Creating table: tumaco_failure_vs_tumaco_cure.  Adjust = BH
## Limma step 6/6: 1/4: Creating table: cali_cure.  Adjust = BH
## Limma step 6/6: 2/4: Creating table: cali_failure.  Adjust = BH
## Limma step 6/6: 3/4: Creating table: tumaco_cure.  Adjust = BH
## Limma step 6/6: 4/4: Creating table: tumaco_failure.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##      cali_cure   cali_failure    tumaco_cure tumaco_failure 
##             18              3             20             21
## Choosing among model matrix columns: conditioncali_cure, conditioncali_failure, conditiontumaco_cure, conditiontumaco_failure.

tc_neutrophils_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 15 comparisons.
tc_neutrophils_table_sva <- combine_de_tables(
  tc_neutrophils_de_sva, keepers = clinic_cf_contrasts,
  excel = glue("{clinic_cf_prefix}/Neutrophils/tc_neutrophils_table_sva-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
tc_neutrophils_table_sva
## A set of combined differential expression results.
##                            table deseq_sigup deseq_sigdown edger_sigup
## 1      cali_failure_vs_cali_cure          91           181         103
## 2  tumaco_failure_vs_tumaco_cure          86            38          72
## 3       tumaco_cure_vs_cali_cure         844           379         843
## 4 tumaco_failure_vs_cali_failure         696           197         608
##   edger_sigdown limma_sigup limma_sigdown
## 1           122          75            50
## 2            23          37            47
## 3           367         645           481
## 4           214         310           325
## Plot describing unique/shared genes in a differential expression table.

tc_neutrophils_sig_sva <- extract_significant_genes(
  tc_neutrophils_table_sva,
  excel = glue("{clinic_cf_prefix}/Neutrophils/tc_neutrophils_sig_sva-v{ver}.xlsx"))
tc_neutrophils_sig_sva
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##        limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## cali         75         50      103        122       91        181       91
## tumaco       37         47       72         23       86         38        7
## cure        645        481      843        367      844        379      687
## fail        310        325      608        214      696        197      113
##        ebseq_down basic_up basic_down
## cali           38      404        274
## tumaco          7        7          2
## cure          299     2219       2496
## fail          312     1193       1681

3.6 GSEA: Extract clinic-specific genes

Given the above comparisons, we can extract some gene sets which resulted from those DE analyses and eventually perform some ontology/KEGG/reactome/etc searches. This reminds me, I want to make my extract_significant_ functions to return gene-set data structures and my various ontology searches to take them as inputs. This should help avoid potential errors when extracting up/down genes.

clinic_sigenes_up <- rownames(tc_all_clinic_sig_sva[["deseq"]][["ups"]][["clinics"]])
clinic_sigenes_down <- rownames(tc_all_clinic_sig_sva[["deseq"]][["downs"]][["clinics"]])
clinic_sigenes <- c(clinic_sigenes_up, clinic_sigenes_down)

tc_eosinophils_sigenes_up <- rownames(tc_eosinophils_clinic_sig_sva[["deseq"]][["ups"]][["cure"]])
tc_eosinophils_sigenes_down <- rownames(tc_eosinophils_clinic_sig_sva[["deseq"]][["downs"]][["cure"]])
tc_monocytes_sigenes_up <- rownames(tc_monocytes_sig_sva[["deseq"]][["ups"]][["cure"]])
tc_monocytes_sigenes_down <- rownames(tc_monocytes_sig_sva[["deseq"]][["downs"]][["cure"]])
tc_neutrophils_sigenes_up <- rownames(tc_neutrophils_sig_sva[["deseq"]][["ups"]][["cure"]])
tc_neutrophils_sigenes_down <- rownames(tc_neutrophils_sig_sva[["deseq"]][["downs"]][["cure"]])

tc_eosinophils_sigenes <- c(tc_eosinophils_sigenes_up,
                            tc_eosinophils_sigenes_down)
tc_monocytes_sigenes <- c(tc_monocytes_sigenes_up,
                          tc_monocytes_sigenes_down)
tc_neutrophils_sigenes <- c(tc_neutrophils_sigenes_up,
                            tc_neutrophils_sigenes_down)

3.7 GSEA: gProfiler of genes deemed up/down when comparing Cali and Tumaco

I was curious to try to understand why the two clinics appear to be so different vis a vis their PCA/DE; so I thought that gProfiler might help boil those results down to something more digestible.

3.7.1 GSEA: Compare clinics, all samples

Note that in the following block I used the function simple_gprofiler(), but later in this document I will use all_gprofiler(). The first invocation limits the search to a single table, while the second will iterate over every result in a pairwise differential expression analysis.

In this instance, we are looking at the vector of gene IDs deemed significantly different between the two clinics in either the up or down direction.

One other thing worth noting, the new version of gProfiler provides some fun interactive plots. I will add an example here.

tc_eosinophil_gprofiler <- simple_gprofiler(
  tc_eosinophils_sigenes_up,
  excel = glue("{gsea_prefix}/eosinophil_clinics_tumaco_up-v{ver}.xlsx"))
tc_eosinophil_gprofiler
## A set of ontologies produced by gprofiler using 777
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are: 
## 20 MF
## 218 BP
## 0 KEGG
## 2 REAC
## 0 WP
## 540 TF
## 6 MIRNA
## 0 HPA
## 2 CORUM
## 0 HP hits.
clinic_gp <- simple_gprofiler(
  clinic_sigenes,
  excel = glue("{gsea_prefix}/both_clinics_cali_up-v{ver}.xlsx"))
clinic_gp$pvalue_plots$REAC

clinic_gp$pvalue_plots$BP
## NULL
clinic_gp$pvalue_plots$TF

clinic_gp$interactive_plots$GO
## NULL

3.7.2 GSEA: Compare clinics, Eosinophil samples

In the following block, I am looking at the gProfiler over represented groups observed across clinics in only the Eosinophils. First I do so for all genes(up or down), followed by only the up and down groups. Each of the following will include only the Reactome and GO:BP plots. These searches did not have too many other hits, excepting the transcription factor database.

tc_eosinophils_gp <- simple_gprofiler(
  tc_eosinophils_sigenes,
  excel = glue("{gsea_prefix}/eosinophil_clinics-v{ver}.xlsx"))
tc_eosinophils_gp
## A set of ontologies produced by gprofiler using 1585
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are: 
## 39 MF
## 276 BP
## 0 KEGG
## 5 REAC
## 0 WP
## 563 TF
## 10 MIRNA
## 0 HPA
## 5 CORUM
## 0 HP hits.
tc_eosinophils_gp$pvalue_plots$REAC

tc_eosinophils_gp$pvalue_plots$BP
## NULL
tc_eosinophils_up_gp <- simple_gprofiler(
  tc_eosinophils_sigenes_up,
  excel = glue("{gsea_prefix}/eosinophil_clinics_tumaco_up-v{ver}.xlsx"))
tc_eosinophils_up_gp
## A set of ontologies produced by gprofiler using 777
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are: 
## 20 MF
## 218 BP
## 0 KEGG
## 2 REAC
## 0 WP
## 540 TF
## 6 MIRNA
## 0 HPA
## 2 CORUM
## 0 HP hits.
tc_eosinophils_up_gp$pvalue_plots$REAC

tc_eosinophils_down_gp <- simple_gprofiler(
  tc_eosinophils_sigenes_down,
  excel = glue("{gsea_prefix}/eosinophil_clinics_cali_up-v{ver}.xlsx"))
tc_eosinophils_down_gp
## A set of ontologies produced by gprofiler using 808
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are: 
## 14 MF
## 94 BP
## 2 KEGG
## 9 REAC
## 2 WP
## 77 TF
## 0 MIRNA
## 0 HPA
## 0 CORUM
## 0 HP hits.
tc_eosinophils_down_gp$pvalue_plots$REAC

3.7.3 GSEA: Compare clinics, Monocyte samples

In the following block I repeated the above query, but this time looking at the monocyte samples.

tc_monocytes_up_gp <- simple_gprofiler(
  tc_monocytes_sigenes,
  excel = glue("{gsea_prefix}/monocyte_clinics-v{ver}.xlsx"))
tc_monocytes_up_gp
## A set of ontologies produced by gprofiler using 1493
## genes against the hsapiens annotations and significance cutoff 0.05.
## There are: 
## 55 MF
## 476 BP
## 0 KEGG
## 6 REAC
## 4 WP
## 495 TF
## 2 MIRNA
## 0 HPA
## 1 CORUM
## 0 HP hits.
tc_monocytes_up_gp$pvalue_plots$REAC

tc_monocytes_up_gp$pvalue_plots$BP
## NULL
tc_monocytes_down_gp <- simple_gprofiler(
  tc_monocytes_sigenes_down,
  excel = glue("{gsea_prefix}/monocyte_clinics_cali_up-v{ver}.xlsx"))
tc_monocytes_down_gp$pvalue_plots$REAC

tc_monocytes_down_gp$pvalue_plots$BP
## NULL

3.7.3.1 GSEA: Compare clinics, Neutrophil samples

Ibid. This time looking at the Neutrophils. Thus the first two images should be a superset of the second and third pairs of images; assuming that the genes in the up/down list do not cause the groups to no longer be significant. Interestingly, the reactome search did not return any hits for the increased search.

tc_neutrophils_gp <- simple_gprofiler(
  tc_neutrophils_sigenes,
  excel = glue("{gsea_prefix}/neutrophil_clinics-v{ver}.xlsx"))
## tc_neutrophils_gp$pvalue_plots$REAC ## no hits
tc_neutrophils_gp$pvalue_plots$BP
## NULL
tc_neutrophils_gp$pvalue_plots$TF

tc_neutrophils_up_gp <- simple_gprofiler(
  tc_neutrophils_sigenes_up,
  excel = glue("{gsea_prefix}/neutrophil_clinics_tumaco_up-v{ver}.xlsx"))
## tc_neutrophils_up_gp$pvalue_plots$REAC ## No hits
tc_neutrophils_up_gp$pvalue_plots$BP
## NULL
tc_neutrophils_down_gp <- simple_gprofiler(
  tc_neutrophils_sigenes_down,
  excel = glue("{gsea_prefix}/neutrophil_clinics_cali_up-v{ver}.xlsx"))
tc_neutrophils_down_gp$pvalue_plots$REAC

tc_neutrophils_down_gp$pvalue_plots$BP
## NULL

4 Compare DE: How similar are Tumaco C/F vs. Cali C/F

The following expands the cross-clinic query above to also test the neutrophils. Once again, I think it will pretty strongly support the hypothesis that the two clinics are not compatible.

We are concerned that the clinic-based batch effect may make our results essentially useless. One way to test this concern is to compare the set of genes observed different between the Cali Cure/Fail vs. the Tumaco Cure/Fail.

cali_table_nobatch <- tc_neutrophils_table_nobatch[["data"]][["cali"]]
tumaco_table_nobatch <- tc_neutrophils_table_nobatch[["data"]][["tumaco"]]

cali_merged_nobatch <- merge(cali_table_nobatch, tumaco_table_nobatch, by="row.names")
cor.test(cali_merged_nobatch[, "deseq_logfc.x"], cali_merged_nobatch[, "deseq_logfc.y"])
## 
##  Pearson's product-moment correlation
## 
## data:  cali_merged_nobatch[, "deseq_logfc.x"] and cali_merged_nobatch[, "deseq_logfc.y"]
## t = -16, df = 9242, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1798 -0.1401
## sample estimates:
##   cor 
## -0.16
cali_aucc_nobatch <- calculate_aucc(cali_table_nobatch, tumaco_table_nobatch, px = "deseq_adjp",
                                    py = "deseq_adjp", lx = "deseq_logfc", ly = "deseq_logfc")
cali_aucc_nobatch$plot

5 Tumaco and Cali, cure vs. fail

In all of the above, we are looking to understand the differences between the two location. Let us now step back and perform the original question: fail/cure without regard to location.

I performed this query with a few different parameters, notably with(out) sva and again using each cell type, including biopsies. The main reasion I am keeping these comparisons is in the relatively weak hope that there will be sufficient signal in the full dataset that it might be able to overcome the apparently ridiculous batch effect from the two clinics.

5.1 All cell types together, with(out) SVA

table(pData(tc_valid)[["condition"]])
## 
##    cure failure 
##     122      62
tc_all_cf_de_sva <- all_pairwise(tc_valid, filter = TRUE, methods = methods,
                                 model_batch = "svaseq")
## 
##    cure failure 
##     122      62
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 3 comparisons.
## The provided conditions are:
## conditions
##    cure failure 
##     122      62
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5 + SV6.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
##    cure failure 
##     122      62
## Choosing among model matrix columns: conditioncure, conditionfailure.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5 + SV6.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
##    cure failure 
##     122      62
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Error in cbind(L, Luni) : 
##   number of rows of matrices must match (see arg 2)
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##    cure failure 
##     122      62
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
##    cure failure 
##     122      62
## Choosing among model matrix columns: conditioncure, conditionfailure.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5 + SV6.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
##    cure failure 
##     122      62
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/1: Creating table: failure_vs_cure.  Adjust = BH
## Limma step 6/6: 1/2: Creating table: cure.  Adjust = BH
## Limma step 6/6: 2/2: Creating table: failure.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##    cure failure 
##     122      62
## Choosing among model matrix columns: conditioncure, conditionfailure.

tc_all_cf_table_sva <- combine_de_tables(
  tc_all_cf_de_sva, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/All_Samples/tc_valid_cf_table_sva-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
tc_all_cf_sig_sva <- extract_significant_genes(
  tc_all_cf_table_sva,
  excel = glue("{cf_prefix}/All_Samples/tc_valid_cf_sig_sva-v{ver}.xlsx"))

tc_all_cf_de_batch <- all_pairwise(tc_valid, filter = TRUE, methods = methods,
                                   model_batch = TRUE)
## 
##    cure failure 
##     122      62 
## 
##  1  2  3 
## 83 50 51
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 3 comparisons.
## The provided conditions are:
## conditions
##    cure failure 
##     122      62
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition + batch.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
##    cure failure 
##     122      62
## Choosing among model matrix columns: conditioncure, conditionfailure.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + batch.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
##    cure failure 
##     122      62
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Error in cbind(L, Luni) : 
##   number of rows of matrices must match (see arg 2)
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##    cure failure 
##     122      62
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
##    cure failure 
##     122      62
## Choosing among model matrix columns: conditioncure, conditionfailure.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + batch.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
##    cure failure 
##     122      62
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/1: Creating table: failure_vs_cure.  Adjust = BH
## Limma step 6/6: 1/2: Creating table: cure.  Adjust = BH
## Limma step 6/6: 2/2: Creating table: failure.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##    cure failure 
##     122      62
## Choosing among model matrix columns: conditioncure, conditionfailure.

tc_all_cf_table_batch <- combine_de_tables(
  tc_all_cf_de_batch, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/All_Samples/tc_valid_cf_table_batch-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
tc_all_cf_sig_batch <- extract_significant_genes(
  tc_all_cf_table_batch,
  excel = glue("{cf_prefix}/All_Samples/tc_valid_cf_sig_batch-v{ver}.xlsx"))

5.2 All cell types except biopsies

I am not sure if this is the best choice, but I call the set of all samples excluding biopsies ‘clinical’.

table(pData(tc_clinical_nobiop)[["condition"]])
## 
##    cure failure 
##     109      57
tc_clinical_cf_de_sva <- all_pairwise(tc_clinical_nobiop, filter = TRUE,
                                      model_batch = "svaseq",
                                      methods = methods)
## 
##    cure failure 
##     109      57
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 3 comparisons.
## The provided conditions are:
## conditions
##    cure failure 
##     109      57
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
##    cure failure 
##     109      57
## Choosing among model matrix columns: conditioncure, conditionfailure.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
##    cure failure 
##     109      57
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Error in cbind(L, Luni) : 
##   number of rows of matrices must match (see arg 2)
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##    cure failure 
##     109      57
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
##    cure failure 
##     109      57
## Choosing among model matrix columns: conditioncure, conditionfailure.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
##    cure failure 
##     109      57
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/1: Creating table: failure_vs_cure.  Adjust = BH
## Limma step 6/6: 1/2: Creating table: cure.  Adjust = BH
## Limma step 6/6: 2/2: Creating table: failure.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##    cure failure 
##     109      57
## Choosing among model matrix columns: conditioncure, conditionfailure.

tc_clinical_cf_de_sva
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 15 comparisons.
## The logFC agreement among the methods follows:
##                 falr_vs_cr
## basic_vs_deseq      0.8865
## basic_vs_ebseq      0.7583
## basic_vs_edger      0.9008
## basic_vs_limma      0.9413
## basic_vs_noiseq     0.9206
## deseq_vs_ebseq      0.8258
## deseq_vs_edger      0.9918
## deseq_vs_limma      0.8939
## deseq_vs_noiseq     0.9403
## ebseq_vs_edger      0.8175
## ebseq_vs_limma      0.8180
## ebseq_vs_noiseq     0.8311
## edger_vs_limma      0.8974
## edger_vs_noiseq     0.9512
## limma_vs_noiseq     0.8734
tc_clinical_cf_table_sva <- combine_de_tables(
  tc_clinical_cf_de_sva, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/Clinical_Samples/tc_clinical_cf_table_sva-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
tc_clinical_cf_table_sva
## A set of combined differential expression results.
##             table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure         186            96         214            93
##   limma_sigup limma_sigdown
## 1          97            79
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

tc_clinical_cf_sig_sva <- extract_significant_genes(
  tc_clinical_cf_table_sva, according_to = "deseq",
  excel = glue("{cf_prefix}/Clinical_Samples/tc_clinical_cf_sig_sva-v{ver}.xlsx"))
tc_clinical_cf_sig_sva
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##         deseq_up deseq_down
## outcome      186         96

tc_clinical_cf_de_batch <- all_pairwise(tc_clinical_nobiop, filter = TRUE,
                                        model_batch = TRUE,
                                        methods = methods)
## 
##    cure failure 
##     109      57 
## 
## eosinophils   monocytes neutrophils 
##          41          63          62
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 3 comparisons.
## The provided conditions are:
## conditions
##    cure failure 
##     109      57
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition + batch.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
##    cure failure 
##     109      57
## Choosing among model matrix columns: conditioncure, conditionfailure.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + batch.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
##    cure failure 
##     109      57
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Error in cbind(L, Luni) : 
##   number of rows of matrices must match (see arg 2)
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##    cure failure 
##     109      57
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
##    cure failure 
##     109      57
## Choosing among model matrix columns: conditioncure, conditionfailure.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + batch.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
##    cure failure 
##     109      57
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/1: Creating table: failure_vs_cure.  Adjust = BH
## Limma step 6/6: 1/2: Creating table: cure.  Adjust = BH
## Limma step 6/6: 2/2: Creating table: failure.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##    cure failure 
##     109      57
## Choosing among model matrix columns: conditioncure, conditionfailure.

tc_clinical_cf_de_batch
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: batch in model/limma.
## The primary analysis performed 15 comparisons.
## The logFC agreement among the methods follows:
##                 falr_vs_cr
## basic_vs_deseq      0.6416
## basic_vs_ebseq      0.7583
## basic_vs_edger      0.6418
## basic_vs_limma      0.7121
## basic_vs_noiseq     0.9206
## deseq_vs_ebseq      0.7638
## deseq_vs_edger      0.9991
## deseq_vs_limma      0.8070
## deseq_vs_noiseq     0.7624
## ebseq_vs_edger      0.7635
## ebseq_vs_limma      0.6891
## ebseq_vs_noiseq     0.8311
## edger_vs_limma      0.8115
## edger_vs_noiseq     0.7625
## limma_vs_noiseq     0.6686
tc_clinical_cf_table_batch <- combine_de_tables(
  tc_clinical_cf_de_batch, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/Clinical_Samples/tc_clinical_cf_table_batch-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
tc_clinical_cf_table_batch
## A set of combined differential expression results.
##             table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 failure_vs_cure         106            68         116            74
##   limma_sigup limma_sigdown
## 1          83            45
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

tc_clinical_cf_sig_batch <- extract_significant_genes(
  tc_clinical_cf_table_batch, according_to = "deseq",
  excel = glue("{cf_prefix}/Clinical_Samples/tc_clinical_cf_sig_batch-v{ver}.xlsx"))
tc_clinical_cf_sig_batch
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##         deseq_up deseq_down
## outcome      106         68

5.2.1 A portion of Supplemental Figure 11.

num_color <- color_choices[["cf"]][["cure"]]
den_color <- color_choices[["cf"]][["failure"]]
tc_clinical_cf_table <- tc_clinical_cf_table_sva[["data"]][["outcome"]]
tc_clinical_cf_volcano_top10 <- plot_volcano_condition_de(
  tc_clinical_cf_table, "outcome", label = 10,
  fc_col = "deseq_logfc", p_col = "deseq_adjp", line_position = NULL,
  color_high = num_color, color_low = den_color, label_size = 6)
pp(file = "figures/s11c_tc_clinical_cf_volcano_labeled_top10.svg")
tc_clinical_cf_volcano_top10[["plot"]]
dev.off()
## png 
##   2
tc_clinical_cf_volcano_top10[["plot"]]

5.3 Biopsies, with(out) SVA

In the following block, we repeat the same question, but using only the biopsy samples from both clinics.

tc_biopsies_cf <- set_expt_conditions(tc_biopsies, fact = "finaloutcome")
## The numbers of samples by condition are:
## 
##    cure failure 
##      13       5
tc_biopsies_cf_de_sva <- all_pairwise(tc_biopsies_cf, filter = TRUE, methods = methods,
                                      model_batch = "svaseq")
## 
##    cure failure 
##      13       5
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 3 comparisons.
## The provided conditions are:
## conditions
##    cure failure 
##      13       5
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition + SV1 + SV2.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
##    cure failure 
##      13       5
## Choosing among model matrix columns: conditioncure, conditionfailure.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
##    cure failure 
##      13       5
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Error in cbind(L, Luni) : 
##   number of rows of matrices must match (see arg 2)
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##    cure failure 
##      13       5
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
##    cure failure 
##      13       5
## Choosing among model matrix columns: conditioncure, conditionfailure.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
##    cure failure 
##      13       5
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/1: Creating table: failure_vs_cure.  Adjust = BH
## Limma step 6/6: 1/2: Creating table: cure.  Adjust = BH
## Limma step 6/6: 2/2: Creating table: failure.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##    cure failure 
##      13       5
## Choosing among model matrix columns: conditioncure, conditionfailure.

tc_biopsies_cf_table_sva <- combine_de_tables(
  tc_biopsies_cf_de_sva, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/Biopsies/tc_biopsies_cf_table_sva-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
tc_biopsies_cf_sig_sva <- extract_significant_genes(
  tc_biopsies_cf_table_sva,
  excel = glue("{cf_prefix}/All_Samples/tc_biopsies_cf_sig_sva-v{ver}.xlsx"))

tc_biopsies_cf_de_batch <- all_pairwise(tc_biopsies_cf, filter = TRUE, methods = methods,
                                        model_batch = TRUE)
## 
##    cure failure 
##      13       5 
## 
##  1 
## 18
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 3 comparisons.
## The provided conditions are:
## conditions
##    cure failure 
##      13       5
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The condition+batch model failed. Does your experimental design support both condition and batch? Using only a conditional model.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
##    cure failure 
##      13       5
## Choosing among model matrix columns: conditioncure, conditionfailure.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## The condition+batch model failed. Does your experimental design support both condition and batch? Using only a conditional model.
## Getting factors from: ~ 0 + condition.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
##    cure failure 
##      13       5
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Dream/limma step 5/6: Running eBayes.
## Dream/limma step 6/6: Creating tables.
## Varpart/limma step 6/6: 1/3: Creating table: failure_vs_cure.  Adjust = BH
## Varpart/limma step 6/6: 2/3: Creating table: conditioncure.  Adjust = BH
## Varpart/limma step 6/6: 3/3: Creating table: conditionfailure.  Adjust = BH
## Limma step 6/6: 1/4: Creating table: cure.  Adjust = BH
## Limma step 6/6: 2/4: Creating table: failure.  Adjust = BH
## Limma step 6/6: 3/4: Creating table: conditioncure.  Adjust = BH
## Limma step 6/6: 4/4: Creating table: conditionfailure.  Adjust = BH
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##    cure failure 
##      13       5
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The condition+batch model failed. Does your experimental design support both condition and batch? Using only a conditional model.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
##    cure failure 
##      13       5
## Choosing among model matrix columns: conditioncure, conditionfailure.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## The condition+batch model failed. Does your experimental design support both condition and batch? Using only a conditional model.
## Getting factors from: ~ 0 + condition.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
##    cure failure 
##      13       5
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/1: Creating table: failure_vs_cure.  Adjust = BH
## Limma step 6/6: 1/2: Creating table: cure.  Adjust = BH
## Limma step 6/6: 2/2: Creating table: failure.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The condition+batch model failed. Does your experimental design support both condition and batch? Using only a conditional model.
## The provided conditions are:
## conditions
##    cure failure 
##      13       5
## Choosing among model matrix columns: conditioncure, conditionfailure.

tc_biopsies_cf_table_batch <- combine_de_tables(
  tc_biopsies_cf_de_batch, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/All_Samples/tc_biopsies_cf_table_batch-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
tc_biopsies_cf_sig_batch <- extract_significant_genes(
  tc_biopsies_cf_table_batch,
  excel = glue("{cf_prefix}/All_Samples/tc_biopsies_cf_sig_batch-v{ver}.xlsx"))

5.4 Eosinophils, with(out) SVA

In the following block, we repeat the same question, but using only the Eosinophil samples from both clinics.

tc_eosinophils_cf <- set_expt_conditions(tc_eosinophils, fact = "finaloutcome")
## The numbers of samples by condition are:
## 
##    cure failure 
##      32       9
tc_eosinophils_cf_de_sva <- all_pairwise(tc_eosinophils_cf, filter = TRUE, methods = methods,
                                         model_batch = "svaseq")
## 
##    cure failure 
##      32       9
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 3 comparisons.
## The provided conditions are:
## conditions
##    cure failure 
##      32       9
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
##    cure failure 
##      32       9
## Choosing among model matrix columns: conditioncure, conditionfailure.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
##    cure failure 
##      32       9
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Error in cbind(L, Luni) : 
##   number of rows of matrices must match (see arg 2)
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##    cure failure 
##      32       9
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
##    cure failure 
##      32       9
## Choosing among model matrix columns: conditioncure, conditionfailure.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
##    cure failure 
##      32       9
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/1: Creating table: failure_vs_cure.  Adjust = BH
## Limma step 6/6: 1/2: Creating table: cure.  Adjust = BH
## Limma step 6/6: 2/2: Creating table: failure.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##    cure failure 
##      32       9
## Choosing among model matrix columns: conditioncure, conditionfailure.

tc_eosinophils_cf_table_sva <- combine_de_tables(
  tc_eosinophils_cf_de_sva, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/Eosinophils/tc_eosinophils_cf_table_sva-v{ver}.xlsx"))
tc_eosinophils_cf_sig_sva <- extract_significant_genes(
  tc_eosinophils_cf_table_sva,
  excel = glue("{cf_prefix}/All_Samples/tc_eosinophils_cf_sig_sva-v{ver}.xlsx"))

tc_eosinophils_cf_de_batch <- all_pairwise(tc_eosinophils_cf, filter = TRUE,
                                           model_batch = TRUE,
                                           methods = methods)
## 
##    cure failure 
##      32       9 
## 
##  3  2  1 
## 13 14 14
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 3 comparisons.
## The provided conditions are:
## conditions
##    cure failure 
##      32       9
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition + batch.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
##    cure failure 
##      32       9
## Choosing among model matrix columns: conditioncure, conditionfailure.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + batch.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
##    cure failure 
##      32       9
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Error in cbind(L, Luni) : 
##   number of rows of matrices must match (see arg 2)
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##    cure failure 
##      32       9
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
##    cure failure 
##      32       9
## Choosing among model matrix columns: conditioncure, conditionfailure.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + batch.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
##    cure failure 
##      32       9
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/1: Creating table: failure_vs_cure.  Adjust = BH
## Limma step 6/6: 1/2: Creating table: cure.  Adjust = BH
## Limma step 6/6: 2/2: Creating table: failure.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##    cure failure 
##      32       9
## Choosing among model matrix columns: conditioncure, conditionfailure.

tc_eosinophils_cf_table_batch <- combine_de_tables(
  tc_eosinophils_cf_de_batch, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/All_Samples/tc_eosinophils_cf_table_batch-v{ver}.xlsx"))
tc_eosinophils_cf_sig_batch <- extract_significant_genes(
  tc_eosinophils_cf_table_batch,
  excel = glue("{cf_prefix}/All_Samples/tc_eosinophils_cf_sig_batch-v{ver}.xlsx"))

5.5 Monocytes, with(out) SVA

Repeat yet again, this time with the monocyte samples. The idea is to see if there is a cell type which is particularly good (or bad) at discriminating the two clinics.

tc_monocytes_cf <- set_expt_conditions(tc_monocytes, fact = "finaloutcome")
## The numbers of samples by condition are:
## 
##    cure failure 
##      39      24
tc_monocytes_cf_de_sva <- all_pairwise(tc_monocytes_cf, filter = TRUE, methods = methods,
                                       model_batch = "svaseq")
## 
##    cure failure 
##      39      24
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 3 comparisons.
## The provided conditions are:
## conditions
##    cure failure 
##      39      24
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
##    cure failure 
##      39      24
## Choosing among model matrix columns: conditioncure, conditionfailure.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
##    cure failure 
##      39      24
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Error in cbind(L, Luni) : 
##   number of rows of matrices must match (see arg 2)
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##    cure failure 
##      39      24
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
##    cure failure 
##      39      24
## Choosing among model matrix columns: conditioncure, conditionfailure.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
##    cure failure 
##      39      24
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/1: Creating table: failure_vs_cure.  Adjust = BH
## Limma step 6/6: 1/2: Creating table: cure.  Adjust = BH
## Limma step 6/6: 2/2: Creating table: failure.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##    cure failure 
##      39      24
## Choosing among model matrix columns: conditioncure, conditionfailure.

tc_monocytes_cf_table_sva <- combine_de_tables(
  tc_monocytes_cf_de_sva, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/Monocytes/tc_monocytes_cf_table_sva-v{ver}.xlsx"))
tc_monocytes_cf_sig_sva <- extract_significant_genes(
  tc_monocytes_cf_table_sva,
  excel = glue("{cf_prefix}/All_Samples/tc_monocytes_cf_sig_sva-v{ver}.xlsx"))

tc_monocytes_cf_de_batch <- all_pairwise(tc_monocytes_cf, filter = TRUE, methods = methods,
                                         model_batch = TRUE)
## 
##    cure failure 
##      39      24 
## 
##  3  2  1 
## 19 18 26
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 3 comparisons.
## The provided conditions are:
## conditions
##    cure failure 
##      39      24
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition + batch.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
##    cure failure 
##      39      24
## Choosing among model matrix columns: conditioncure, conditionfailure.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + batch.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
##    cure failure 
##      39      24
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Error in cbind(L, Luni) : 
##   number of rows of matrices must match (see arg 2)
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##    cure failure 
##      39      24
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
##    cure failure 
##      39      24
## Choosing among model matrix columns: conditioncure, conditionfailure.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + batch.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
##    cure failure 
##      39      24
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/1: Creating table: failure_vs_cure.  Adjust = BH
## Limma step 6/6: 1/2: Creating table: cure.  Adjust = BH
## Limma step 6/6: 2/2: Creating table: failure.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##    cure failure 
##      39      24
## Choosing among model matrix columns: conditioncure, conditionfailure.

tc_monocytes_cf_table_batch <- combine_de_tables(
  tc_monocytes_cf_de_batch, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/All_Samples/tc_monocytes_cf_table_batch-v{ver}.xlsx"))
tc_monocytes_cf_sig_batch <- extract_significant_genes(
  tc_monocytes_cf_table_batch,
  excel = glue("{cf_prefix}/All_Samples/tc_monocytes_cf_sig_batch-v{ver}.xlsx"))

5.6 Neutrophils, with(out) SVA

Last try, this time using the Neutrophil samples.

tc_neutrophils_cf <- set_expt_conditions(tc_neutrophils, fact = "finaloutcome")
## The numbers of samples by condition are:
## 
##    cure failure 
##      38      24
tc_neutrophils_cf_de_sva <- all_pairwise(tc_neutrophils_cf, parallel = parallel,
                                         filter = TRUE, model_batch = "svaseq",
                                         methods = methods)
## 
##    cure failure 
##      38      24
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 3 comparisons.
## The provided conditions are:
## conditions
##    cure failure 
##      38      24
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5 + SV6.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
##    cure failure 
##      38      24
## Choosing among model matrix columns: conditioncure, conditionfailure.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5 + SV6.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
##    cure failure 
##      38      24
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Error in cbind(L, Luni) : 
##   number of rows of matrices must match (see arg 2)
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##    cure failure 
##      38      24
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
##    cure failure 
##      38      24
## Choosing among model matrix columns: conditioncure, conditionfailure.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5 + SV6.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
##    cure failure 
##      38      24
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/1: Creating table: failure_vs_cure.  Adjust = BH
## Limma step 6/6: 1/2: Creating table: cure.  Adjust = BH
## Limma step 6/6: 2/2: Creating table: failure.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##    cure failure 
##      38      24
## Choosing among model matrix columns: conditioncure, conditionfailure.

tc_neutrophils_cf_table_sva <- combine_de_tables(
  tc_neutrophils_cf_de_sva, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/Neutrophils/tc_neutrophils_cf_table_sva-v{ver}.xlsx"))
tc_neutrophils_cf_sig_sva <- extract_significant_genes(
  tc_neutrophils_cf_table_sva,
  excel = glue("{cf_prefix}/All_Samples/tc_neutrophils_cf_sig_sva-v{ver}.xlsx"))

tc_neutrophils_cf_de_batch <- all_pairwise(tc_neutrophils_cf, filter = TRUE,
                                           model_batch = TRUE,
                                           methods = methods)
## 
##    cure failure 
##      38      24 
## 
##  3  2  1 
## 19 18 25
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 3 comparisons.
## The provided conditions are:
## conditions
##    cure failure 
##      38      24
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition + batch.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
##    cure failure 
##      38      24
## Choosing among model matrix columns: conditioncure, conditionfailure.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + batch.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
##    cure failure 
##      38      24
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Error in cbind(L, Luni) : 
##   number of rows of matrices must match (see arg 2)
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##    cure failure 
##      38      24
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
##    cure failure 
##      38      24
## Choosing among model matrix columns: conditioncure, conditionfailure.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + batch.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
##    cure failure 
##      38      24
## Choosing among model matrix columns: conditioncure, conditionfailure.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/1: Creating table: failure_vs_cure.  Adjust = BH
## Limma step 6/6: 1/2: Creating table: cure.  Adjust = BH
## Limma step 6/6: 2/2: Creating table: failure.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##    cure failure 
##      38      24
## Choosing among model matrix columns: conditioncure, conditionfailure.

tc_neutrophils_cf_table_batch <- combine_de_tables(
  tc_neutrophils_cf_de_batch, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/All_Samples/tc_neutrophils_cf_table_batch-v{ver}.xlsx"))
tc_neutrophils_cf_sig_batch <- extract_significant_genes(
  tc_neutrophils_cf_table_batch,
  excel = glue("{cf_prefix}/All_Samples/tc_neutrophils_cf_sig_batch-v{ver}.xlsx"))

6 Visit comparisons

Later in this document I do a bunch of visit/cf comparisons. In this block I want to explicitly only compare v1 to other visits. This is something I did quite a lot in the 2019 datasets, but never actually moved to this document.

v1_vs_later <- all_pairwise(tc_v1vs, model_batch = "svaseq", methods = methods,
                            filter = TRUE)
## 
## first later 
##    65   101
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 3 comparisons.
## The provided conditions are:
## conditions
## first later 
##    65   101
## Choosing among model matrix columns: conditionfirst, conditionlater.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
## first later 
##    65   101
## Choosing among model matrix columns: conditionfirst, conditionlater.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
## first later 
##    65   101
## Choosing among model matrix columns: conditionfirst, conditionlater.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Error in cbind(L, Luni) : 
##   number of rows of matrices must match (see arg 2)
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
## first later 
##    65   101
## Choosing among model matrix columns: conditionfirst, conditionlater.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
## first later 
##    65   101
## Choosing among model matrix columns: conditionfirst, conditionlater.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
## first later 
##    65   101
## Choosing among model matrix columns: conditionfirst, conditionlater.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/1: Creating table: later_vs_first.  Adjust = BH
## Limma step 6/6: 1/2: Creating table: first.  Adjust = BH
## Limma step 6/6: 2/2: Creating table: later.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
## first later 
##    65   101
## Choosing among model matrix columns: conditionfirst, conditionlater.

v1_vs_later_table <- combine_de_tables(
  v1_vs_later, keepers = visit_v1later,
  excel = glue("{visit_prefix}/v1_vs_later_tables-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
v1_vs_later_sig <- extract_significant_genes(
  v1_vs_later_table,
  excel = glue("{visit_prefix}/v1_vs_later_sig-v{ver}.xlsx"))

6.0.0.1 GSEA: V1 vs other visits.

v1later_gp <- all_gprofiler(v1_vs_later_sig)
v1later_gp[[1]]$pvalue_plots$REAC

v1later_gp[[2]]$pvalue_plots$REAC

7 Sex comparison

tc_sex_de <- all_pairwise(tc_sex, model_batch = "svaseq", methods = methods,
                          filter = TRUE)
## 
## female   male 
##     28    156
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 3 comparisons.
## The provided conditions are:
## conditions
## female   male 
##     28    156
## Choosing among model matrix columns: conditionfemale, conditionmale.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5 + SV6.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
## female   male 
##     28    156
## Choosing among model matrix columns: conditionfemale, conditionmale.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5 + SV6.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
## female   male 
##     28    156
## Choosing among model matrix columns: conditionfemale, conditionmale.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Error in cbind(L, Luni) : 
##   number of rows of matrices must match (see arg 2)
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
## female   male 
##     28    156
## Choosing among model matrix columns: conditionfemale, conditionmale.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
## female   male 
##     28    156
## Choosing among model matrix columns: conditionfemale, conditionmale.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5 + SV6.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
## female   male 
##     28    156
## Choosing among model matrix columns: conditionfemale, conditionmale.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/1: Creating table: male_vs_female.  Adjust = BH
## Limma step 6/6: 1/2: Creating table: female.  Adjust = BH
## Limma step 6/6: 2/2: Creating table: male.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
## female   male 
##     28    156
## Choosing among model matrix columns: conditionfemale, conditionmale.

tc_sex_table <- combine_de_tables(
  tc_sex_de, excel = glue("{sex_prefix}/tc_sex_table-v{ver}.xlsx"))
tc_sex_sig <- extract_significant_genes(
  tc_sex_table, excel = glue("{sex_prefix}/tc_sex_sig-v{ver}.xlsx"))
tc_sex_gp <- all_gprofiler(tc_sex_sig)
tc_sex_cure <- subset_expt(tc_sex, subset = "finaloutcome=='cure'")
## subset_expt(): There were 184, now there are 122 samples.
tc_sex_cure_de <- all_pairwise(tc_sex_cure, model_batch = "svaseq",
                               filter = TRUE,
                               methods = methods)
## 
## female   male 
##     19    103
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 3 comparisons.
## The provided conditions are:
## conditions
## female   male 
##     19    103
## Choosing among model matrix columns: conditionfemale, conditionmale.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
## female   male 
##     19    103
## Choosing among model matrix columns: conditionfemale, conditionmale.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
## female   male 
##     19    103
## Choosing among model matrix columns: conditionfemale, conditionmale.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Error in cbind(L, Luni) : 
##   number of rows of matrices must match (see arg 2)
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
## female   male 
##     19    103
## Choosing among model matrix columns: conditionfemale, conditionmale.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
## female   male 
##     19    103
## Choosing among model matrix columns: conditionfemale, conditionmale.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
## female   male 
##     19    103
## Choosing among model matrix columns: conditionfemale, conditionmale.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/1: Creating table: male_vs_female.  Adjust = BH
## Limma step 6/6: 1/2: Creating table: female.  Adjust = BH
## Limma step 6/6: 2/2: Creating table: male.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
## female   male 
##     19    103
## Choosing among model matrix columns: conditionfemale, conditionmale.

tc_sex_cure_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 15 comparisons.
## The logFC agreement among the methods follows:
##                 mal_vs_fml
## basic_vs_deseq      0.6740
## basic_vs_ebseq      0.5398
## basic_vs_edger      0.8003
## basic_vs_limma      0.9099
## basic_vs_noiseq     0.8159
## deseq_vs_ebseq      0.6490
## deseq_vs_edger      0.8641
## deseq_vs_limma      0.6481
## deseq_vs_noiseq     0.7503
## ebseq_vs_edger      0.6887
## ebseq_vs_limma      0.5899
## ebseq_vs_noiseq     0.6346
## edger_vs_limma      0.7769
## edger_vs_noiseq     0.8670
## limma_vs_noiseq     0.7196
tc_sex_cure_table <- combine_de_tables(
  tc_sex_cure_de, excel = glue("{sex_prefix}/tc_sex_cure_table-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
tc_sex_cure_table
## A set of combined differential expression results.
##            table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 male_vs_female          70            74          64            80
##   limma_sigup limma_sigdown
## 1          40            74
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

tc_sex_cure_sig <- extract_significant_genes(
  tc_sex_cure_table, excel = glue("{sex_prefix}/tc_sex_cure_sig-v{ver}.xlsx"))
tc_sex_cure_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##                limma_up limma_down edger_up edger_down deseq_up deseq_down
## male_vs_female       40         74       64         80       70         74
##                ebseq_up ebseq_down basic_up basic_down
## male_vs_female       10          8       12          5

7.0.0.1 GSEA: Sex comparisons both clinics

tc_sex_cure_gp <- all_gprofiler(tc_sex_cure_sig)
tc_sex_cure_gp
## Running gProfiler on every set of significant genes found:
##                     BP CORUM HP HPA KEGG MIRNA MF REAC TF WP
## male_vs_female_up    3     0  1   0    1     0  1    0  3  0
## male_vs_female_down  3     0  0   0    0     0  0    1  0  0
tc_sex_cure_gp[[1]][["pvalue_plots"]][["BP"]]
## NULL
tc_sex_cure_gp[[2]][["pvalue_plots"]][["BP"]]
## NULL

8 Ethnicity comparisons

tc_ethnicity_de <- all_pairwise(tc_etnia_expt, model_batch = "svaseq",
                                filter = TRUE,
                                methods = methods)
## 
##  afrocol indigena  mestiza 
##       91       46       47
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 6 comparisons.
## The provided conditions are:
## conditions
##  afrocol indigena  mestiza 
##       91       46       47
## Choosing among model matrix columns: conditionafrocol, conditionindigena, conditionmestiza.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5 + SV6.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
##  afrocol indigena  mestiza 
##       91       46       47
## Choosing among model matrix columns: conditionafrocol, conditionindigena, conditionmestiza.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5 + SV6.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
##  afrocol indigena  mestiza 
##       91       46       47
## Choosing among model matrix columns: conditionafrocol, conditionindigena, conditionmestiza.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Error in cbind(L, Luni) : 
##   number of rows of matrices must match (see arg 2)
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##  afrocol indigena  mestiza 
##       91       46       47
## Choosing among model matrix columns: conditionafrocol, conditionindigena, conditionmestiza.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
##  afrocol indigena  mestiza 
##       91       46       47
## Choosing among model matrix columns: conditionafrocol, conditionindigena, conditionmestiza.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5 + SV6.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
##  afrocol indigena  mestiza 
##       91       46       47
## Choosing among model matrix columns: conditionafrocol, conditionindigena, conditionmestiza.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/3: Creating table: indigena_vs_afrocol.  Adjust = BH
## Limma step 6/6: 2/3: Creating table: mestiza_vs_afrocol.  Adjust = BH
## Limma step 6/6: 3/3: Creating table: mestiza_vs_indigena.  Adjust = BH
## Limma step 6/6: 1/3: Creating table: afrocol.  Adjust = BH
## Limma step 6/6: 2/3: Creating table: indigena.  Adjust = BH
## Limma step 6/6: 3/3: Creating table: mestiza.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##  afrocol indigena  mestiza 
##       91       46       47
## Choosing among model matrix columns: conditionafrocol, conditionindigena, conditionmestiza.

tc_ethnicity_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 15 comparisons.
tc_ethnicity_table <- combine_de_tables(
  tc_ethnicity_de, keepers = ethnicity_contrasts,
  excel = glue("{eth_prefix}/tc_ethnicity_table-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
tc_ethnicity_table
## A set of combined differential expression results.
##                 table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 mestiza_vs_indigena          48            22          53            23
## 2  mestiza_vs_afrocol          51           171          58           180
## 3 indigena_vs_afrocol          67           269          72           280
##   limma_sigup limma_sigdown
## 1          24            14
## 2          44            90
## 3          78           144
## Plot describing unique/shared genes in a differential expression table.

tc_ethnicity_table[["plots"]][["mestizo_indigenous"]][["deseq_ma_plots"]]

tc_ethnicity_table[["plots"]][["mestizo_afrocol"]][["deseq_ma_plots"]]

tc_ethnicity_table[["plots"]][["indigenous_afrocol"]][["deseq_ma_plots"]]

tc_ethnicity_sig <- extract_significant_genes(
  tc_ethnicity_table, excel = glue("{eth_prefix}/tc_ethnicity_sig-v{ver}.xlsx"))

ethnicity_cure <- subset_expt(tc_etnia_expt, subset = "finaloutcome=='cure'")
## subset_expt(): There were 184, now there are 122 samples.
ethnicity_cure_de <- all_pairwise(ethnicity_cure, model_batch = "svaseq",
                                  filter = TRUE,
                                  methods = methods)
## 
##  afrocol indigena  mestiza 
##       39       36       47
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 6 comparisons.
## The provided conditions are:
## conditions
##  afrocol indigena  mestiza 
##       39       36       47
## Choosing among model matrix columns: conditionafrocol, conditionindigena, conditionmestiza.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The provided conditions are:
## conditions
##  afrocol indigena  mestiza 
##       39       36       47
## Choosing among model matrix columns: conditionafrocol, conditionindigena, conditionmestiza.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Dream/limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5.
## Dream/limma 2/6: Attempting voomWithDreamWeights.

## Dream/limma step 3/6: running dream.
## The provided conditions are:
## conditions
##  afrocol indigena  mestiza 
##       39       36       47
## Choosing among model matrix columns: conditionafrocol, conditionindigena, conditionmestiza.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## Error in cbind(L, Luni) : 
##   number of rows of matrices must match (see arg 2)
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##  afrocol indigena  mestiza 
##       39       36       47
## Choosing among model matrix columns: conditionafrocol, conditionindigena, conditionmestiza.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## The provided conditions are:
## conditions
##  afrocol indigena  mestiza 
##       39       36       47
## Choosing among model matrix columns: conditionafrocol, conditionindigena, conditionmestiza.

## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Getting factors from: ~ 0 + condition + SV1 + SV2 + SV3 + SV4 + SV5.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## The provided conditions are:
## conditions
##  afrocol indigena  mestiza 
##       39       36       47
## Choosing among model matrix columns: conditionafrocol, conditionindigena, conditionmestiza.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Creating tables without an intercept.
## Limma step 6/6: 1/3: Creating table: indigena_vs_afrocol.  Adjust = BH
## Limma step 6/6: 2/3: Creating table: mestiza_vs_afrocol.  Adjust = BH
## Limma step 6/6: 3/3: Creating table: mestiza_vs_indigena.  Adjust = BH
## Limma step 6/6: 1/3: Creating table: afrocol.  Adjust = BH
## Limma step 6/6: 2/3: Creating table: indigena.  Adjust = BH
## Limma step 6/6: 3/3: Creating table: mestiza.  Adjust = BH
## Starting noiseq pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## The provided conditions are:
## conditions
##  afrocol indigena  mestiza 
##       39       36       47
## Choosing among model matrix columns: conditionafrocol, conditionindigena, conditionmestiza.

ethnicity_cure_table <- combine_de_tables(
  ethnicity_cure_de, keepers = ethnicity_contrasts,
  excel = glue("{eth_prefix}/ethnicity_cure_table-v{ver}.xlsx"))
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
## Could not create a linear model of the data.
## Going to perform a scatter plot without linear model.
ethnicity_cure_table
## A set of combined differential expression results.
##                 table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 mestiza_vs_indigena          64            24          60            26
## 2  mestiza_vs_afrocol          68           168          78           167
## 3 indigena_vs_afrocol          87           352          97           340
##   limma_sigup limma_sigdown
## 1          36            15
## 2          63           100
## 3         109           180
## Plot describing unique/shared genes in a differential expression table.

ethnicity_cure_table[["plots"]][["mestizo_indigenous"]][["deseq_ma_plots"]]

ethnicity_cure_table[["plots"]][["mestizo_afrocol"]][["deseq_ma_plots"]]

ethnicity_cure_table[["plots"]][["indigenous_afrocol"]][["deseq_ma_plots"]]

ethnicity_cure_sig <- extract_significant_genes(
  ethnicity_cure_table, excel = glue("{eth_prefix}/ethnicity_cure_sig-v{ver}.xlsx"))
ethnicity_cure_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##                    limma_up limma_down edger_up edger_down deseq_up deseq_down
## mestizo_indigenous       36         15       60         26       64         24
## mestizo_afrocol          63        100       78        167       68        168
## indigenous_afrocol      109        180       97        340       87        352
##                    ebseq_up ebseq_down basic_up basic_down
## mestizo_indigenous       11          0        0          1
## mestizo_afrocol           6         16       12         20
## indigenous_afrocol       10         38      182        133

8.0.0.1 GSEA: Ethnicity differences

Performed once with both clinics and again with only Tumaco.

tc_ethnicity_gp <- all_gprofiler(tc_ethnicity_sig)
pander::pander(sessionInfo())

R version 4.4.1 (2024-06-14)

Platform: x86_64-conda-linux-gnu

locale: C

attached base packages: stats4, stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: edgeR(v.4.0.16), ruv(v.0.9.7.1), DOSE(v.3.28.2), forcats(v.1.0.0), dplyr(v.1.1.4), hpgltools(v.1.0), Matrix(v.1.6-5), glue(v.1.7.0), SummarizedExperiment(v.1.32.0), GenomicRanges(v.1.54.1), GenomeInfoDb(v.1.38.6), IRanges(v.2.36.0), S4Vectors(v.0.40.2), MatrixGenerics(v.1.14.0), matrixStats(v.1.2.0), Biobase(v.2.62.0) and BiocGenerics(v.0.48.1)

loaded via a namespace (and not attached): fs(v.1.6.3), bitops(v.1.0-7), enrichplot(v.1.22.0), blockmodeling(v.1.1.5), HDO.db(v.0.99.1), httr(v.1.4.7), RColorBrewer(v.1.1-3), numDeriv(v.2016.8-1.1), tools(v.4.4.1), backports(v.1.4.1), utf8(v.1.2.4), R6(v.2.5.1), lazyeval(v.0.2.2), mgcv(v.1.9-1), withr(v.3.0.0), gridExtra(v.2.3), preprocessCore(v.1.64.0), fdrtool(v.1.2.18), cli(v.3.6.2), scatterpie(v.0.2.1), labeling(v.0.4.3), slam(v.0.1-53), EBSeq(v.2.0.0), sass(v.0.4.8), mvtnorm(v.1.2-4), robustbase(v.0.99-4), genefilter(v.1.84.0), yulab.utils(v.0.1.7), ggupset(v.0.4.0), gson(v.0.1.0), R.utils(v.2.12.3), limma(v.3.58.1), RSQLite(v.2.3.5), gridGraphics(v.0.5-1), generics(v.0.1.3), gtools(v.3.9.5), crosstalk(v.1.2.1), zip(v.2.3.1), GO.db(v.3.18.0), fansi(v.1.0.6), abind(v.1.4-5), R.methodsS3(v.1.8.2), lifecycle(v.1.0.4), yaml(v.2.3.8), gplots(v.3.1.3.1), qvalue(v.2.34.0), SparseArray(v.1.2.4), grid(v.4.4.1), blob(v.1.2.4), promises(v.1.2.1), crayon(v.1.5.2), lattice(v.0.22-5), cowplot(v.1.1.3), annotate(v.1.80.0), KEGGREST(v.1.42.0), pillar(v.1.9.0), knitr(v.1.45), varhandle(v.2.0.6), fgsea(v.1.28.0), boot(v.1.3-29), corpcor(v.1.6.10), codetools(v.0.2-19), fastmatch(v.1.1-4), ggfun(v.0.1.8), data.table(v.1.15.0), Vennerable(v.3.1.0.9000), treeio(v.1.29.1), vctrs(v.0.6.5), png(v.0.1-8), Rdpack(v.2.6), testthat(v.3.2.1), gtable(v.0.3.4), cachem(v.1.0.8), xfun(v.0.42), openxlsx(v.4.2.5.2), rbibutils(v.2.2.16), S4Arrays(v.1.2.0), mime(v.0.12), RcppEigen(v.0.3.3.9.4), tidygraph(v.1.3.1), survival(v.3.5-8), iterators(v.1.0.14), NOISeq(v.2.46.0), statmod(v.1.5.0), ellipsis(v.0.3.2), nlme(v.3.1-164), pbkrtest(v.0.5.2), ggtree(v.3.15.0), bit64(v.4.0.5), EnvStats(v.2.8.1), UpSetR(v.1.4.0), rprojroot(v.2.0.4), bslib(v.0.6.1), KernSmooth(v.2.23-22), colorspace(v.2.1-0), DBI(v.1.2.2), DESeq2(v.1.42.0), tidyselect(v.1.2.0), bit(v.4.0.5), compiler(v.4.4.1), graph(v.1.80.0), desc(v.1.4.3), DelayedArray(v.0.28.0), plotly(v.4.10.4), shadowtext(v.0.1.3), scales(v.1.3.0), caTools(v.1.18.2), DEoptimR(v.1.1-3), remaCor(v.0.0.18), RBGL(v.1.78.0), stringr(v.1.5.1), digest(v.0.6.34), minqa(v.1.2.6), variancePartition(v.1.32.5), rmarkdown(v.2.25), aod(v.1.3.3), XVector(v.0.42.0), RhpcBLASctl(v.0.23-42), htmltools(v.0.5.7), pkgconfig(v.2.0.3), lme4(v.1.1-35.1), lpsymphony(v.1.30.0), highr(v.0.10), fastmap(v.1.1.1), rlang(v.1.1.3), htmlwidgets(v.1.6.4), shiny(v.1.8.0), farver(v.2.1.1), jquerylib(v.0.1.4), IHW(v.1.30.0), jsonlite(v.1.8.8), BiocParallel(v.1.36.0), GOSemSim(v.2.28.1), R.oo(v.1.26.0), RCurl(v.1.98-1.14), magrittr(v.2.0.3), GenomeInfoDbData(v.1.2.11), ggplotify(v.0.1.2), patchwork(v.1.2.0), munsell(v.0.5.0), Rcpp(v.1.0.12), ape(v.5.8), viridis(v.0.6.5), stringi(v.1.8.3), ggraph(v.2.1.0), brio(v.1.1.4), zlibbioc(v.1.48.0), MASS(v.7.3-60.0.1), plyr(v.1.8.9), parallel(v.4.4.1), ggrepel(v.0.9.5), Biostrings(v.2.70.2), graphlayouts(v.1.1.0), splines(v.4.4.1), pander(v.0.6.5), locfit(v.1.5-9.8), igraph(v.2.0.2), reshape2(v.1.4.4), pkgload(v.1.3.4), gprofiler2(v.0.2.3), XML(v.3.99-0.16.1), evaluate(v.0.23), BiocManager(v.1.30.25), nloptr(v.2.0.3), foreach(v.1.5.2), tweenr(v.2.0.2), httpuv(v.1.6.14), tidyr(v.1.3.1), purrr(v.1.0.2), polyclip(v.1.10-6), ggplot2(v.3.5.0), ggforce(v.0.4.2), broom(v.1.0.5), xtable(v.1.8-4), tidytree(v.0.4.6), fANCOVA(v.0.6-1), later(v.1.3.2), viridisLite(v.0.4.2), tibble(v.3.2.1), lmerTest(v.3.1-3), clusterProfiler(v.4.10.1), aplot(v.0.2.2), memoise(v.2.0.1), AnnotationDbi(v.1.64.1), sva(v.3.50.0) and GSEABase(v.1.64.0)

message("This is hpgltools commit: ", get_git_commit())
## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset 350d6b6a9d0cd6b7bc08660fd18ec46866ad4bd3
## This is hpgltools commit: Mon Dec 30 10:41:22 2024 -0500: 350d6b6a9d0cd6b7bc08660fd18ec46866ad4bd3
message("Saving to ", savefile)
## Saving to 03differential_expression_both.rda.xz
# tmp <- sm(saveme(filename = savefile))
tmp <- loadme(filename = savefile)
---
title: "TMRC3 `r Sys.getenv('VERSION')`: Differential Expression analyses"
author: "atb abelew@gmail.com"
date: "`r Sys.Date()`"
runtime: shiny
output:
  html_document:
    code_download: true
    code_folding: show
    fig_caption: true
    fig_height: 7
    fig_width: 7
    highlight: zenburn
    keep_md: false
    mode: selfcontained
    number_sections: true
    self_contained: true
    theme: readable
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
---

<style type="text/css">
body .main-container {
  max-width: 1600px;
}
body, td {
  font-size: 16px;
}
code.r{
  font-size: 16px;
}
pre {
  font-size: 16px
}
</style>

```{r options, include=FALSE}
library(hpgltools)
library(dplyr)
library(forcats)
library(glue)
library(DOSE)

knitr::opts_knit$set(progress = TRUE, verbose = TRUE, width = 90, echo = TRUE)
knitr::opts_chunk$set(
  error = TRUE, fig.width = 8, fig.height = 8, fig.retina = 2,
  out.width = "100%", dev = "png",
  dev.args = list(png = list(type = "cairo-png")))
old_options <- options(digits = 4, stringsAsFactors = FALSE, knitr.duplicate.label = "allow")
ggplot2::theme_set(ggplot2::theme_bw(base_size = 12))
ver <- Sys.getenv("VERSION")
parallel <- toupper(Sys.getenv("PARALLEL"))
if (parallel == "" || parallel == "TRUE") {
  parallel <- TRUE
} else {
  parallel <- FALSE
}
rundate <- format(Sys.Date(), format = "%Y%m%d")

rmd_file <- glue("03differential_expression_both.Rmd")
savefile <- gsub(pattern = "\\.Rmd", replace = "\\.rda\\.xz", x = rmd_file)
loaded <- load(file = glue("rda/tmrc3_data_structures-v{ver}.rda"))
xlsx_prefix <- "analyses/3_cali_and_tumaco"
clinic_prefix <- glue("{xlsx_prefix}/DE_Clinic")
clinic_cf_prefix <- glue("{xlsx_prefix}/DE_Clinic_Cure_Fail")
cf_prefix <- glue("{xlsx_prefix}/DE_Cure_Fail")
visit_prefix <- glue("{xlsx_prefix}/DE_Visits")
sex_prefix <- glue("{xlsx_prefix}/Sex")
eth_prefix <- glue("{xlsx_prefix}/Ethnicity")
gsea_prefix <- glue("{xlsx_prefix}/GSEA")
```

# Changelog

* 202309: Disabled GSVA analyses until/unless we get permission to
  include the mSigDB 7.5.1 release.  I will simplify the filenames so
  that one may easily drop in a downloaded copy of the data and run
  those blocks.  Until then, I guess you will have to trust me when I
  say those blocks all work?
* 202309: Moved all GSEA analyses to 04lrt_gsea_gsva.Rmd
* 202309 next day: Moving GSEA back because it adds too much
  complexity to save/reload the DE results for gProfiler and friends.
* Still hunting for messed up colors, changed input data to match new version.

# Introduction

The various differential expression analyses of the data generated in tmrc3_datasets
will occur in this document.

## Naming conventions

I am going to try to standardize how I name the various data
structures created in this document.  Most of the large data created
are either sets of differential expression analyses, their combined
results, or the set of results deemed 'significant'.

Hopefully by now they all follow these guidelines:

{clinic(s)}_sample-subset}_{primary-question(s)}_{datatype}_{batch-method}

* {clinic}: This is either tc or t for Tumaco and Cali, or just
Tumaco.
* {sample-subset}: Things like 'all' or 'monocytes'.
* {primary-question}: Shorthand name for the primary contrasts
performed, thus 'clinics' would suggest a comparison of Tumaco
vs. Cali.  'visits' would compare v2/v1, etc.
* {datatype}: de, table, sig
* {batch-type}: nobatch, batch{factor}, sva.  {factor} in this
instance should be a column from the metadata.

With this in mind, 'tc_biopsies_clinic_de_sva' should be the Tumaco+Cali
biopsy data after performing the differential expression analyses
comparing the clinics using sva.

I suspect there remain some exceptions and/or errors.

## Define contrasts for DE analyses

Each of the following lists describes the set of contrasts that I
think are interesting for the various ways one might consider the
TMRC3 dataset.  The variables are named according to the assumed data
with which they will be used, thus tc_cf_contrasts is expected to be
used for the Tumaco+Cali data and provide a series of cure/fail
comparisons which (to the extent possible) across both locations.  In
every case, the name of the list element will be used as the contrast
name, and will thus be seen as the sheet name in the output xlsx
file(s); the two pieces of the character vector value are the
numerator and denominator of the associated contrast.

## GSEA

The GSEA analyses will follow each DE analysis during this document.

Most (all?) of the GSEA analyses used in this paper were done via
gProfiler rather than goseq/clusterProfiler/topGO/GOstats.  Primarily
because it is so easy to invoke gprofiler.

```{r}
clinic_contrasts <- list(
  "clinics" = c("cali", "tumaco"))
## In some cases we have no Cali failure samples, so there remain only 2
## contrasts that are likely of interest
tc_cf_contrasts <- list(
  "tumaco" = c("tumaco_failure", "tumaco_cure"),
  "cure" = c("tumaco_cure", "cali_cure"))
## In other cases, we have cure/fail for both places.
clinic_cf_contrasts <- list(
  "cali" = c("cali_failure", "cali_cure"),
  "tumaco" = c("tumaco_failure", "tumaco_cure"),
  "cure" = c("tumaco_cure", "cali_cure"),
  "fail" = c("tumaco_failure", "cali_failure"))
cf_contrast <- list(
  "outcome" = c("tumaco_failure", "tumaco_cure"))
t_cf_contrast <- list(
  "outcome" = c("failure", "cure"))
visitcf_contrasts <- list(
  "v1cf" = c("v1_failure", "v1_cure"),
  "v2cf" = c("v2_failure", "v2_cure"),
  "v3cf" = c("v3_failure", "v3_cure"))
visit_contrasts <- list(
  "v2v1" = c("c2", "c1"),
  "v3v1" = c("c3", "c1"),
  "v3v2" = c("c3", "c2"))
visit_v1later <- list(
  "later_vs_first" = c("later", "first"))
celltypes <- list(
  "eo_mono" = c("eosinophils", "monocytes"),
  "ne_mono" = c("neutrophils", "monocytes"),
  "eo_ne" = c("eosinophils", "neutrophils"))
ethnicity_contrasts <- list(
  "mestizo_indigenous" = c("mestiza", "indigena"),
  "mestizo_afrocol" = c("mestiza", "afrocol"),
  "indigenous_afrocol" = c("indigena", "afrocol"))
```

# Compare samples by clinic

## DE: Compare clinics, all samples

Perform a svaseq-guided comparison of the two clinics.  Ideally this
will give some clue about just how strong the clinic-based batch
effect really is and what its causes are.

```{r}
tc_clinic_type <- tc_valid %>%
  set_expt_conditions(fact = "clinic") %>%
  set_expt_batches(fact = "typeofcells")

table(pData(tc_clinic_type)[["condition"]])
tc_all_clinic_de_sva <- all_pairwise(tc_clinic_type, model_batch = "svaseq",
                                     filter = TRUE, methods = methods)
tc_all_clinic_de_sva
tc_all_clinic_de_sva[["deseq"]][["contrasts_performed"]]

tc_all_clinic_table_sva <- combine_de_tables(
  tc_all_clinic_de_sva, keepers = clinic_contrasts,
  excel = glue("{clinic_prefix}/tc_all_clinic_table_sva-v{ver}.xlsx"))
tc_all_clinic_table_sva
tc_all_clinic_sig_sva <- extract_significant_genes(
  tc_all_clinic_table_sva,
  excel = glue("{clinic_prefix}/compare_clinics/tc_clinic_type_sig_sva-v{ver}.xlsx"))
tc_all_clinic_sig_sva
```

### GSEA: comparing the clinics

```{r}
increased_tumaco_categories_up <- simple_gprofiler(
  tc_all_clinic_sig_sva[["deseq"]][["ups"]][["clinics"]],
  excel = glue("{gsea_prefix}/tumaco_cateogies_up-v{ver}.xlsx"))
increased_tumaco_categories_up
increased_tumaco_categories_up[["pvalue_plots"]][["BP"]]

increased_cali_categories <- simple_gprofiler(
  tc_all_clinic_sig_sva[["deseq"]][["downs"]][["clinics"]],
  excel = glue("{gsea_prefix}/cali_cateogies_up-v{ver}.xlsx"))
increased_cali_categories
increased_cali_categories[["pvalue_plots"]][["BP"]]
```

### Visualize clinic differences

Let us take a quick look at the results of the comparison of
Tumaco/Cali

Note: I keep re-introducing an error which causes these (volcano and MA) plots to be
reversed with respect to the logFC values.  Pay careful attention to
these and make sure that they agree with the numbers of genes observed
in the contrast.

```{r}
## Check that up is up
summary(tc_all_clinic_table_sva[["data"]][["clinics"]][["deseq_logfc"]])
## I think we can assume that most genes are down when considering Tumaco/Cali.
sum(tc_all_clinic_table_sva$data$clinics$deseq_logfc < -1.0 &
      tc_all_clinic_table_sva$data$clinics$deseq_adjp < 0.05)
tc_all_clinic_table_sva[["plots"]][["clinics"]][["deseq_vol_plots"]]
## Ok, so it says 1794 up, but that is clearly the down side...  Something is definitely messed up.
## The points are on the correct sides of the plot, but the categories of up/down are reversed.
## Theresa noted that she colors differently, and I think better: left side gets called
## 'increased in denominator', right side gets called 'increased in numerator';
## these two groups are colored according to their condition colors, and everything else is gray.
## I am checking out Theresa's helper_functions.R to get a sense of how she handles this, I think
## I can use a variant of her idea pretty easily:
##  1.  Add a column 'Significance', which is a factor, and contains either 'Not enriched',
##      'Enriched in x', or 'Enriched in y' according to the logfc/adjp.
##  2.  use the significance column for the geom_point color/fill in the volcano plot.
## My change to this idea would be to extract the colors from the input expressionset.
```

There appear to be many more genes which are increased in the Tumaco
samples with respect to the Cali samples.

## DE: Compare clinics, eosinophil samples

The remaining cell types all have pretty strong clinic-based variance;
but I am not certain if it is consistent across cell types.

```{r}
table(pData(tc_eosinophils)[["condition"]])
tc_eosinophils_clinic_de_nobatch <- all_pairwise(tc_eosinophils, parallel = parallel,
                                                 model_batch = FALSE, filter = TRUE,
                                                 methods = methods)
tc_eosinophils_clinic_de_nobatch
tc_eosinophils_clinic_de_nobatch[["deseq"]][["contrasts_performed"]]

tc_eosinophils_clinic_table_nobatch <- combine_de_tables(
  tc_eosinophils_clinic_de_nobatch, keepers = tc_cf_contrasts,
  excel = glue("{clinic_cf_prefix}/Eosinophils/tc_eosinophils_clinic_table_nobatch-v{ver}.xlsx"))
tc_eosinophils_clinic_table_nobatch
tc_eosinophils_clinic_sig_nobatch <- extract_significant_genes(
  tc_eosinophils_clinic_table_nobatch,
  excel = glue("{clinic_cf_prefix}/Eosinophils/tc_eosinophils_clinic_sig_nobatch-v{ver}.xlsx"))
tc_eosinophils_clinic_sig_nobatch

tc_eosinophils_clinic_de_sva <- all_pairwise(tc_eosinophils, model_batch = "svaseq",
                                             filter = TRUE, methods = methods)
tc_eosinophils_clinic_de_sva
tc_eosinophils_clinic_de_sva[["deseq"]][["contrasts_performed"]]

tc_eosinophils_clinic_table_sva <- combine_de_tables(
  tc_eosinophils_clinic_de_sva, keepers = tc_cf_contrasts,
  excel = glue("{clinic_cf_prefix}/Eosinophils/tc_eosinophils_clinic_table_sva-v{ver}.xlsx"))
tc_eosinophils_clinic_table_sva
tc_eosinophils_clinic_sig_sva <- extract_significant_genes(
  tc_eosinophils_clinic_table_sva,
  excel = glue("{clinic_cf_prefix}/Eosinophils/tc_eosinophils_clinic_sig_sva-v{ver}.xlsx"))
tc_eosinophils_clinic_sig_sva
```

## DE: Compare clinics, biopsy samples

Interestingly to me, the biopsy samples appear to have the least
location-based variance.  But we can perform an explicit DE and see
how well that hypothesis holds up.

Note that these data include cure and fail samples for

```{r}
table(pData(tc_biopsies)[["condition"]])
tc_biopsies_clinic_de_sva <- all_pairwise(tc_biopsies, parallel = parallel,
                                          model_batch = "svaseq", filter = TRUE,
                                          methods = methods)
tc_biopsies_clinic_de_sva
tc_biopsies_clinic_de_sva[["deseq"]][["contrasts_performed"]]

tc_biopsies_clinic_table_sva <- combine_de_tables(
  tc_biopsies_clinic_de_sva, keepers = tc_cf_contrasts,
  excel = glue("{clinic_cf_prefix}/Biopsies/tc_biopsies_clinic_table_sva-v{ver}.xlsx"))
tc_biopsies_clinic_table_sva
tc_biopsies_clinic_sig_sva <- extract_significant_genes(
  tc_biopsies_clinic_table_sva,
  excel = glue("{clinic_cf_prefix}/Biopsies/tc_biopsies_clinic_sig_sva-v{ver}.xlsx"))
tc_biopsies_clinic_sig_sva
```

## DE: Compare clinics, monocyte samples

At least for the moment, I am only looking at the differences between
no-batch vs. sva across clinics for the monocyte samples.  This
was chosen mostly arbitrarily.

### DE: Compare clinics, monocytes without batch estimation

Our baseline is the comparison of the monocytes samples without batch
in the model or surrogate estimation.  In theory at least, this should
correspond to the PCA plot above when no batch estimation was performed.

```{r}
table(pData(tc_monocytes)[["condition"]])
tc_monocytes_de_nobatch <- all_pairwise(tc_monocytes, model_batch = FALSE,
                                        filter = TRUE,
                                        methods = methods)
tc_monocytes_de_nobatch

tc_monocytes_table_nobatch <- combine_de_tables(
  tc_monocytes_de_nobatch, keepers = clinic_cf_contrasts,
  excel = glue("{clinic_cf_prefix}/Monocytes/tc_monocytes_clinic_table_nobatch-v{ver}.xlsx"))
tc_monocytes_table_nobatch
tc_monocytes_sig_nobatch <- extract_significant_genes(
  tc_monocytes_table_nobatch,
  excel = glue("{clinic_cf_prefix}/Monocytes/tc_monocytes_clinic_sig_nobatch-v{ver}.xlsx"))
tc_monocytes_sig_nobatch
```

### DE: Compare clinics, monocytes with svaseq

In contrast, the following comparison should give a view of the data
corresponding to the svaseq PCA plot above.  In the best case
scenario, we should therefore be able to see some significane
differences between the Tumaco cure and fail samples.

```{r}
tc_monocytes_de_sva <- all_pairwise(tc_monocytes, model_batch = "svaseq",
                                    filter = TRUE,
                                    methods = methods)
tc_monocytes_de_sva

tc_monocytes_table_sva <- combine_de_tables(
  tc_monocytes_de_sva, keepers = clinic_cf_contrasts,
  excel = glue("{clinic_cf_prefix}/Monocytes/tc_monocytes_clinic_table_sva-v{ver}.xlsx"))
tc_monocytes_table_sva
tc_monocytes_sig_sva <- extract_significant_genes(
  tc_monocytes_table_sva,
  excel = glue("{clinic_cf_prefix}/Monocytes/tc_monocytes_clinic_sig_sva-v{ver}.xlsx"))
tc_monocytes_sig_sva
```

### DE Compare: How similar are the no-batch vs. SVA results?

The following block shows that these two results are exceedingly
different, sugesting that the Cali cure/fail and Tumaco cure/fail
cannot easily be considered in the same analysis.  I did some playing
around with my calculate_aucc function in this block and found that it
is in some important way broken, at least if one expands the top-n
genes to more than 20% of the number of genes in the data.

```{r}
cali_table <- tc_monocytes_table_nobatch[["data"]][["cali"]]
table <- tc_monocytes_table_nobatch[["data"]][["tumaco"]]

cali_aucc <- calculate_aucc(cali_table, table, px = "deseq_adjp", py = "deseq_adjp",
                            lx = "deseq_logfc", ly = "deseq_logfc")
cali_aucc

cali_table_sva <- tc_monocytes_table_sva[["data"]][["cali"]]
tumaco_table_sva <- tc_monocytes_table_sva[["data"]][["tumaco"]]
cali_aucc_sva <- calculate_aucc(cali_table_sva, tumaco_table_sva, px = "deseq_adjp",
                                py = "deseq_adjp", lx = "deseq_logfc", ly = "deseq_logfc")
cali_aucc_sva
```

## DE: Compare clinics, neutrophil samples

```{r}
tc_neutrophils_de_nobatch <- all_pairwise(tc_neutrophils, parallel = parallel,
                                          model_batch = FALSE, filter = TRUE,
                                          methods = methods)
tc_neutrophils_de_nobatch

tc_neutrophils_table_nobatch <- combine_de_tables(
  tc_neutrophils_de_nobatch, keepers = clinic_cf_contrasts,
  excel = glue("{clinic_cf_prefix}/Neutrophils/tc_neutrophils_table_nobatch-v{ver}.xlsx"))
tc_neutrophils_table_nobatch
tc_neutrophils_sig_nobatch <- extract_significant_genes(
  tc_neutrophils_table_nobatch,
  excel = glue("{clinic_cf_prefix}/Neutrophils/tc_neutrophils_sig_nobatch-v{ver}.xlsx"))
tc_neutrophils_sig_nobatch

tc_neutrophils_de_sva <- all_pairwise(tc_neutrophils, parallel = parallel,
                                      model_batch = "svaseq", filter = TRUE,
                                      methods = methods)
tc_neutrophils_de_sva

tc_neutrophils_table_sva <- combine_de_tables(
  tc_neutrophils_de_sva, keepers = clinic_cf_contrasts,
  excel = glue("{clinic_cf_prefix}/Neutrophils/tc_neutrophils_table_sva-v{ver}.xlsx"))
tc_neutrophils_table_sva
tc_neutrophils_sig_sva <- extract_significant_genes(
  tc_neutrophils_table_sva,
  excel = glue("{clinic_cf_prefix}/Neutrophils/tc_neutrophils_sig_sva-v{ver}.xlsx"))
tc_neutrophils_sig_sva
```

## GSEA: Extract clinic-specific genes

Given the above comparisons, we can extract some gene sets which
resulted from those DE analyses and eventually perform some
ontology/KEGG/reactome/etc searches.  This reminds me, I want to make
my extract_significant_ functions to return gene-set data structures
and my various ontology searches to take them as inputs.  This should
help avoid potential errors when extracting up/down genes.

```{r}
clinic_sigenes_up <- rownames(tc_all_clinic_sig_sva[["deseq"]][["ups"]][["clinics"]])
clinic_sigenes_down <- rownames(tc_all_clinic_sig_sva[["deseq"]][["downs"]][["clinics"]])
clinic_sigenes <- c(clinic_sigenes_up, clinic_sigenes_down)

tc_eosinophils_sigenes_up <- rownames(tc_eosinophils_clinic_sig_sva[["deseq"]][["ups"]][["cure"]])
tc_eosinophils_sigenes_down <- rownames(tc_eosinophils_clinic_sig_sva[["deseq"]][["downs"]][["cure"]])
tc_monocytes_sigenes_up <- rownames(tc_monocytes_sig_sva[["deseq"]][["ups"]][["cure"]])
tc_monocytes_sigenes_down <- rownames(tc_monocytes_sig_sva[["deseq"]][["downs"]][["cure"]])
tc_neutrophils_sigenes_up <- rownames(tc_neutrophils_sig_sva[["deseq"]][["ups"]][["cure"]])
tc_neutrophils_sigenes_down <- rownames(tc_neutrophils_sig_sva[["deseq"]][["downs"]][["cure"]])

tc_eosinophils_sigenes <- c(tc_eosinophils_sigenes_up,
                            tc_eosinophils_sigenes_down)
tc_monocytes_sigenes <- c(tc_monocytes_sigenes_up,
                          tc_monocytes_sigenes_down)
tc_neutrophils_sigenes <- c(tc_neutrophils_sigenes_up,
                            tc_neutrophils_sigenes_down)
```

## GSEA: gProfiler of genes deemed up/down when comparing Cali and Tumaco

I was curious to try to understand why the two clinics appear to be so
different vis a vis their PCA/DE; so I thought that gProfiler might
help boil those results down to something more digestible.

### GSEA: Compare clinics, all samples

Note that in the following block I used the function
simple_gprofiler(), but later in this document I will use
all_gprofiler().  The first invocation limits the search to a single
table, while the second will iterate over every result in a pairwise
differential expression analysis.

In this instance, we are looking at the vector of gene IDs deemed
significantly different between the two clinics in either the up or
down direction.

One other thing worth noting, the new version of gProfiler provides
some fun interactive plots.  I will add an example here.

```{r}
tc_eosinophil_gprofiler <- simple_gprofiler(
  tc_eosinophils_sigenes_up,
  excel = glue("{gsea_prefix}/eosinophil_clinics_tumaco_up-v{ver}.xlsx"))
tc_eosinophil_gprofiler

clinic_gp <- simple_gprofiler(
  clinic_sigenes,
  excel = glue("{gsea_prefix}/both_clinics_cali_up-v{ver}.xlsx"))
clinic_gp$pvalue_plots$REAC
clinic_gp$pvalue_plots$BP
clinic_gp$pvalue_plots$TF
clinic_gp$interactive_plots$GO
```

### GSEA: Compare clinics, Eosinophil samples

In the following block, I am looking at the gProfiler over represented
groups observed across clinics in only the Eosinophils.  First I do so
for all genes(up or down), followed by only the up and down groups.
Each of the following will include only the Reactome and GO:BP plots.
These searches did not have too many other hits, excepting the
transcription factor database.

```{r}
tc_eosinophils_gp <- simple_gprofiler(
  tc_eosinophils_sigenes,
  excel = glue("{gsea_prefix}/eosinophil_clinics-v{ver}.xlsx"))
tc_eosinophils_gp
tc_eosinophils_gp$pvalue_plots$REAC
tc_eosinophils_gp$pvalue_plots$BP

tc_eosinophils_up_gp <- simple_gprofiler(
  tc_eosinophils_sigenes_up,
  excel = glue("{gsea_prefix}/eosinophil_clinics_tumaco_up-v{ver}.xlsx"))
tc_eosinophils_up_gp
tc_eosinophils_up_gp$pvalue_plots$REAC

tc_eosinophils_down_gp <- simple_gprofiler(
  tc_eosinophils_sigenes_down,
  excel = glue("{gsea_prefix}/eosinophil_clinics_cali_up-v{ver}.xlsx"))
tc_eosinophils_down_gp
tc_eosinophils_down_gp$pvalue_plots$REAC
```

### GSEA: Compare clinics, Monocyte samples

In the following block I repeated the above query, but this time
looking at the monocyte samples.

```{r}
tc_monocytes_up_gp <- simple_gprofiler(
  tc_monocytes_sigenes,
  excel = glue("{gsea_prefix}/monocyte_clinics-v{ver}.xlsx"))
tc_monocytes_up_gp
tc_monocytes_up_gp$pvalue_plots$REAC
tc_monocytes_up_gp$pvalue_plots$BP

tc_monocytes_down_gp <- simple_gprofiler(
  tc_monocytes_sigenes_down,
  excel = glue("{gsea_prefix}/monocyte_clinics_cali_up-v{ver}.xlsx"))
tc_monocytes_down_gp$pvalue_plots$REAC
tc_monocytes_down_gp$pvalue_plots$BP
```

#### GSEA: Compare clinics, Neutrophil samples

Ibid.  This time looking at the Neutrophils.  Thus the first two
images should be a superset of the second and third pairs of images;
assuming that the genes in the up/down list do not cause the groups to
no longer be significant.  Interestingly, the reactome search did not
return any hits for the increased search.

```{r}
tc_neutrophils_gp <- simple_gprofiler(
  tc_neutrophils_sigenes,
  excel = glue("{gsea_prefix}/neutrophil_clinics-v{ver}.xlsx"))
## tc_neutrophils_gp$pvalue_plots$REAC ## no hits
tc_neutrophils_gp$pvalue_plots$BP
tc_neutrophils_gp$pvalue_plots$TF

tc_neutrophils_up_gp <- simple_gprofiler(
  tc_neutrophils_sigenes_up,
  excel = glue("{gsea_prefix}/neutrophil_clinics_tumaco_up-v{ver}.xlsx"))
## tc_neutrophils_up_gp$pvalue_plots$REAC ## No hits
tc_neutrophils_up_gp$pvalue_plots$BP

tc_neutrophils_down_gp <- simple_gprofiler(
  tc_neutrophils_sigenes_down,
  excel = glue("{gsea_prefix}/neutrophil_clinics_cali_up-v{ver}.xlsx"))
tc_neutrophils_down_gp$pvalue_plots$REAC
tc_neutrophils_down_gp$pvalue_plots$BP
```

# Compare DE: How similar are Tumaco C/F vs. Cali C/F

The following expands the cross-clinic query above to also test the
neutrophils.  Once again, I think it will pretty strongly support the
hypothesis that the two clinics are not compatible.

We are concerned that the clinic-based batch effect may make our
results essentially useless.  One way to test this concern is to
compare the set of genes observed different between the Cali Cure/Fail
vs. the Tumaco Cure/Fail.

```{r}
cali_table_nobatch <- tc_neutrophils_table_nobatch[["data"]][["cali"]]
tumaco_table_nobatch <- tc_neutrophils_table_nobatch[["data"]][["tumaco"]]

cali_merged_nobatch <- merge(cali_table_nobatch, tumaco_table_nobatch, by="row.names")
cor.test(cali_merged_nobatch[, "deseq_logfc.x"], cali_merged_nobatch[, "deseq_logfc.y"])
cali_aucc_nobatch <- calculate_aucc(cali_table_nobatch, tumaco_table_nobatch, px = "deseq_adjp",
                                    py = "deseq_adjp", lx = "deseq_logfc", ly = "deseq_logfc")
cali_aucc_nobatch$plot
```

# Tumaco and Cali, cure vs. fail

In all of the above, we are looking to understand the differences between the two location.
Let us now step back and perform the original question: fail/cure without regard to location.

I performed this query with a few different parameters, notably with(out)
sva and again using each cell type, including biopsies. The main
reasion I am keeping these comparisons is in the relatively weak hope
that there will be sufficient signal in the full dataset that it might
be able to overcome the apparently ridiculous batch effect from the
two clinics.

## All cell types together, with(out) SVA

```{r}
table(pData(tc_valid)[["condition"]])
tc_all_cf_de_sva <- all_pairwise(tc_valid, filter = TRUE, methods = methods,
                                 model_batch = "svaseq")
tc_all_cf_table_sva <- combine_de_tables(
  tc_all_cf_de_sva, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/All_Samples/tc_valid_cf_table_sva-v{ver}.xlsx"))
tc_all_cf_sig_sva <- extract_significant_genes(
  tc_all_cf_table_sva,
  excel = glue("{cf_prefix}/All_Samples/tc_valid_cf_sig_sva-v{ver}.xlsx"))

tc_all_cf_de_batch <- all_pairwise(tc_valid, filter = TRUE, methods = methods,
                                   model_batch = TRUE)
tc_all_cf_table_batch <- combine_de_tables(
  tc_all_cf_de_batch, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/All_Samples/tc_valid_cf_table_batch-v{ver}.xlsx"))
tc_all_cf_sig_batch <- extract_significant_genes(
  tc_all_cf_table_batch,
  excel = glue("{cf_prefix}/All_Samples/tc_valid_cf_sig_batch-v{ver}.xlsx"))
```

## All cell types except biopsies

I am not sure if this is the best choice, but I call the set of all
samples excluding biopsies 'clinical'.

```{r}
table(pData(tc_clinical_nobiop)[["condition"]])
tc_clinical_cf_de_sva <- all_pairwise(tc_clinical_nobiop, filter = TRUE,
                                      model_batch = "svaseq",
                                      methods = methods)
tc_clinical_cf_de_sva

tc_clinical_cf_table_sva <- combine_de_tables(
  tc_clinical_cf_de_sva, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/Clinical_Samples/tc_clinical_cf_table_sva-v{ver}.xlsx"))
tc_clinical_cf_table_sva

tc_clinical_cf_sig_sva <- extract_significant_genes(
  tc_clinical_cf_table_sva, according_to = "deseq",
  excel = glue("{cf_prefix}/Clinical_Samples/tc_clinical_cf_sig_sva-v{ver}.xlsx"))
tc_clinical_cf_sig_sva

tc_clinical_cf_de_batch <- all_pairwise(tc_clinical_nobiop, filter = TRUE,
                                        model_batch = TRUE,
                                        methods = methods)
tc_clinical_cf_de_batch

tc_clinical_cf_table_batch <- combine_de_tables(
  tc_clinical_cf_de_batch, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/Clinical_Samples/tc_clinical_cf_table_batch-v{ver}.xlsx"))
tc_clinical_cf_table_batch

tc_clinical_cf_sig_batch <- extract_significant_genes(
  tc_clinical_cf_table_batch, according_to = "deseq",
  excel = glue("{cf_prefix}/Clinical_Samples/tc_clinical_cf_sig_batch-v{ver}.xlsx"))
tc_clinical_cf_sig_batch
```

### A portion of Supplemental Figure 11.

```{r}
num_color <- color_choices[["cf"]][["cure"]]
den_color <- color_choices[["cf"]][["failure"]]
tc_clinical_cf_table <- tc_clinical_cf_table_sva[["data"]][["outcome"]]
tc_clinical_cf_volcano_top10 <- plot_volcano_condition_de(
  tc_clinical_cf_table, "outcome", label = 10,
  fc_col = "deseq_logfc", p_col = "deseq_adjp", line_position = NULL,
  color_high = num_color, color_low = den_color, label_size = 6)
pp(file = "figures/s11c_tc_clinical_cf_volcano_labeled_top10.svg")
tc_clinical_cf_volcano_top10[["plot"]]
dev.off()
tc_clinical_cf_volcano_top10[["plot"]]
```

## Biopsies, with(out) SVA

In the following block, we repeat the same question, but using only
the biopsy samples from both clinics.

```{r}
tc_biopsies_cf <- set_expt_conditions(tc_biopsies, fact = "finaloutcome")
tc_biopsies_cf_de_sva <- all_pairwise(tc_biopsies_cf, filter = TRUE, methods = methods,
                                      model_batch = "svaseq")
tc_biopsies_cf_table_sva <- combine_de_tables(
  tc_biopsies_cf_de_sva, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/Biopsies/tc_biopsies_cf_table_sva-v{ver}.xlsx"))
tc_biopsies_cf_sig_sva <- extract_significant_genes(
  tc_biopsies_cf_table_sva,
  excel = glue("{cf_prefix}/All_Samples/tc_biopsies_cf_sig_sva-v{ver}.xlsx"))

tc_biopsies_cf_de_batch <- all_pairwise(tc_biopsies_cf, filter = TRUE, methods = methods,
                                        model_batch = TRUE)
tc_biopsies_cf_table_batch <- combine_de_tables(
  tc_biopsies_cf_de_batch, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/All_Samples/tc_biopsies_cf_table_batch-v{ver}.xlsx"))
tc_biopsies_cf_sig_batch <- extract_significant_genes(
  tc_biopsies_cf_table_batch,
  excel = glue("{cf_prefix}/All_Samples/tc_biopsies_cf_sig_batch-v{ver}.xlsx"))
```

## Eosinophils, with(out) SVA

In the following block, we repeat the same question, but using only
the Eosinophil samples from both clinics.

```{r}
tc_eosinophils_cf <- set_expt_conditions(tc_eosinophils, fact = "finaloutcome")
tc_eosinophils_cf_de_sva <- all_pairwise(tc_eosinophils_cf, filter = TRUE, methods = methods,
                                         model_batch = "svaseq")
tc_eosinophils_cf_table_sva <- combine_de_tables(
  tc_eosinophils_cf_de_sva, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/Eosinophils/tc_eosinophils_cf_table_sva-v{ver}.xlsx"))
tc_eosinophils_cf_sig_sva <- extract_significant_genes(
  tc_eosinophils_cf_table_sva,
  excel = glue("{cf_prefix}/All_Samples/tc_eosinophils_cf_sig_sva-v{ver}.xlsx"))

tc_eosinophils_cf_de_batch <- all_pairwise(tc_eosinophils_cf, filter = TRUE,
                                           model_batch = TRUE,
                                           methods = methods)
tc_eosinophils_cf_table_batch <- combine_de_tables(
  tc_eosinophils_cf_de_batch, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/All_Samples/tc_eosinophils_cf_table_batch-v{ver}.xlsx"))
tc_eosinophils_cf_sig_batch <- extract_significant_genes(
  tc_eosinophils_cf_table_batch,
  excel = glue("{cf_prefix}/All_Samples/tc_eosinophils_cf_sig_batch-v{ver}.xlsx"))
```

## Monocytes, with(out) SVA

Repeat yet again, this time with the monocyte samples.  The idea is to
see if there is a cell type which is particularly good (or bad) at
discriminating the two clinics.

```{r}
tc_monocytes_cf <- set_expt_conditions(tc_monocytes, fact = "finaloutcome")
tc_monocytes_cf_de_sva <- all_pairwise(tc_monocytes_cf, filter = TRUE, methods = methods,
                                       model_batch = "svaseq")
tc_monocytes_cf_table_sva <- combine_de_tables(
  tc_monocytes_cf_de_sva, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/Monocytes/tc_monocytes_cf_table_sva-v{ver}.xlsx"))
tc_monocytes_cf_sig_sva <- extract_significant_genes(
  tc_monocytes_cf_table_sva,
  excel = glue("{cf_prefix}/All_Samples/tc_monocytes_cf_sig_sva-v{ver}.xlsx"))

tc_monocytes_cf_de_batch <- all_pairwise(tc_monocytes_cf, filter = TRUE, methods = methods,
                                         model_batch = TRUE)
tc_monocytes_cf_table_batch <- combine_de_tables(
  tc_monocytes_cf_de_batch, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/All_Samples/tc_monocytes_cf_table_batch-v{ver}.xlsx"))
tc_monocytes_cf_sig_batch <- extract_significant_genes(
  tc_monocytes_cf_table_batch,
  excel = glue("{cf_prefix}/All_Samples/tc_monocytes_cf_sig_batch-v{ver}.xlsx"))
```

## Neutrophils, with(out) SVA

Last try, this time using the Neutrophil samples.

```{r}
tc_neutrophils_cf <- set_expt_conditions(tc_neutrophils, fact = "finaloutcome")
tc_neutrophils_cf_de_sva <- all_pairwise(tc_neutrophils_cf, parallel = parallel,
                                         filter = TRUE, model_batch = "svaseq",
                                         methods = methods)
tc_neutrophils_cf_table_sva <- combine_de_tables(
  tc_neutrophils_cf_de_sva, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/Neutrophils/tc_neutrophils_cf_table_sva-v{ver}.xlsx"))
tc_neutrophils_cf_sig_sva <- extract_significant_genes(
  tc_neutrophils_cf_table_sva,
  excel = glue("{cf_prefix}/All_Samples/tc_neutrophils_cf_sig_sva-v{ver}.xlsx"))

tc_neutrophils_cf_de_batch <- all_pairwise(tc_neutrophils_cf, filter = TRUE,
                                           model_batch = TRUE,
                                           methods = methods)
tc_neutrophils_cf_table_batch <- combine_de_tables(
  tc_neutrophils_cf_de_batch, keepers = t_cf_contrast,
  excel = glue("{cf_prefix}/All_Samples/tc_neutrophils_cf_table_batch-v{ver}.xlsx"))
tc_neutrophils_cf_sig_batch <- extract_significant_genes(
  tc_neutrophils_cf_table_batch,
  excel = glue("{cf_prefix}/All_Samples/tc_neutrophils_cf_sig_batch-v{ver}.xlsx"))
```

# Visit comparisons

Later in this document I do a bunch of visit/cf comparisons.  In this
block I want to explicitly only compare v1 to other visits.  This is
something I did quite a lot in the 2019 datasets, but never actually
moved to this document.

```{r}
v1_vs_later <- all_pairwise(tc_v1vs, model_batch = "svaseq", methods = methods,
                            filter = TRUE)
v1_vs_later_table <- combine_de_tables(
  v1_vs_later, keepers = visit_v1later,
  excel = glue("{visit_prefix}/v1_vs_later_tables-v{ver}.xlsx"))
v1_vs_later_sig <- extract_significant_genes(
  v1_vs_later_table,
  excel = glue("{visit_prefix}/v1_vs_later_sig-v{ver}.xlsx"))
```

#### GSEA: V1 vs other visits.

```{r}
v1later_gp <- all_gprofiler(v1_vs_later_sig)
v1later_gp[[1]]$pvalue_plots$REAC
v1later_gp[[2]]$pvalue_plots$REAC
```

# Sex comparison

```{r}
tc_sex_de <- all_pairwise(tc_sex, model_batch = "svaseq", methods = methods,
                          filter = TRUE)
tc_sex_table <- combine_de_tables(
  tc_sex_de, excel = glue("{sex_prefix}/tc_sex_table-v{ver}.xlsx"))
tc_sex_sig <- extract_significant_genes(
  tc_sex_table, excel = glue("{sex_prefix}/tc_sex_sig-v{ver}.xlsx"))
tc_sex_gp <- all_gprofiler(tc_sex_sig)
```

```{r}
tc_sex_cure <- subset_expt(tc_sex, subset = "finaloutcome=='cure'")
tc_sex_cure_de <- all_pairwise(tc_sex_cure, model_batch = "svaseq",
                               filter = TRUE,
                               methods = methods)
tc_sex_cure_de
tc_sex_cure_table <- combine_de_tables(
  tc_sex_cure_de, excel = glue("{sex_prefix}/tc_sex_cure_table-v{ver}.xlsx"))
tc_sex_cure_table
tc_sex_cure_sig <- extract_significant_genes(
  tc_sex_cure_table, excel = glue("{sex_prefix}/tc_sex_cure_sig-v{ver}.xlsx"))
tc_sex_cure_sig
```

#### GSEA: Sex comparisons both clinics

```{r}
tc_sex_cure_gp <- all_gprofiler(tc_sex_cure_sig)
tc_sex_cure_gp
tc_sex_cure_gp[[1]][["pvalue_plots"]][["BP"]]
tc_sex_cure_gp[[2]][["pvalue_plots"]][["BP"]]
```

# Ethnicity comparisons

```{r}
tc_ethnicity_de <- all_pairwise(tc_etnia_expt, model_batch = "svaseq",
                                filter = TRUE,
                                methods = methods)
tc_ethnicity_de
tc_ethnicity_table <- combine_de_tables(
  tc_ethnicity_de, keepers = ethnicity_contrasts,
  excel = glue("{eth_prefix}/tc_ethnicity_table-v{ver}.xlsx"))
tc_ethnicity_table
tc_ethnicity_table[["plots"]][["mestizo_indigenous"]][["deseq_ma_plots"]]
tc_ethnicity_table[["plots"]][["mestizo_afrocol"]][["deseq_ma_plots"]]
tc_ethnicity_table[["plots"]][["indigenous_afrocol"]][["deseq_ma_plots"]]

tc_ethnicity_sig <- extract_significant_genes(
  tc_ethnicity_table, excel = glue("{eth_prefix}/tc_ethnicity_sig-v{ver}.xlsx"))

ethnicity_cure <- subset_expt(tc_etnia_expt, subset = "finaloutcome=='cure'")
ethnicity_cure_de <- all_pairwise(ethnicity_cure, model_batch = "svaseq",
                                  filter = TRUE,
                                  methods = methods)
ethnicity_cure_table <- combine_de_tables(
  ethnicity_cure_de, keepers = ethnicity_contrasts,
  excel = glue("{eth_prefix}/ethnicity_cure_table-v{ver}.xlsx"))
ethnicity_cure_table
ethnicity_cure_table[["plots"]][["mestizo_indigenous"]][["deseq_ma_plots"]]
ethnicity_cure_table[["plots"]][["mestizo_afrocol"]][["deseq_ma_plots"]]
ethnicity_cure_table[["plots"]][["indigenous_afrocol"]][["deseq_ma_plots"]]
ethnicity_cure_sig <- extract_significant_genes(
  ethnicity_cure_table, excel = glue("{eth_prefix}/ethnicity_cure_sig-v{ver}.xlsx"))
ethnicity_cure_sig
```

#### GSEA: Ethnicity differences

Performed once with both clinics and again with only Tumaco.

```{r}
tc_ethnicity_gp <- all_gprofiler(tc_ethnicity_sig)
```

```{r}
pander::pander(sessionInfo())
message("This is hpgltools commit: ", get_git_commit())
message("Saving to ", savefile)
# tmp <- sm(saveme(filename = savefile))
```

```{r loadme_after, eval=FALSE}
tmp <- loadme(filename = savefile)
```
