1 Changelog

  • 202401-202405: Cleanups, formatting, ensuring that everything works in the container.
  • 202310: Cleaning up to make everything pass within a containerized environment.
  • 202310: Received a set of colors and contrasts of interest for a barplot of significance.
  • 20230410: Making some changes to improve the differential expression plots as well as prepare for some different pathway/GSEA/GSVA analyses on the data.

2 Introduction

Having established that the TMRC2 macrophage data looks robust and illustrates a couple of interesting questions, let us perform some differential expression analyses of it.

Also note that as of 202212, we received a new set of samples, some of which are a completely different cell type, U937. As their ATCC page states, these are malignant cells taken from the pleural effusion of a 37-year-old white male with histiocytic lymphoma, and they exhibit the morphology of monocytes. Thus, this document now includes some comparisons of the cell types as well as of the various macrophage donors (given that there are now more donors too).

2.1 Human data

I am moving the dataset manipulations here so that I can look at them all together before running the various DE analyses.

2.2 Create sets focused on drug, celltype, strain, and combinations

Let us start by playing with the metadata a little and creating sets with the condition set to:

  • Drug treatment
  • Cell type (macrophage or U937)
  • Donor
  • Infection Strain
  • Some useful combinations thereof

In addition, keep mental track of which datasets contain all samples vs. only macrophages vs. only U937 (hence the use of all_human vs. hs_macr vs. u937 as prefixes for the data structures).

Ideally, these derived datasets should probably be created in the datastructures worksheet.

all_human <- sanitize_metadata(hs_macrophage, columns = "drug") %>%
  set_conditions(fact = "drug", colors = color_choices[["drug"]]) %>%
  set_batches(fact = "typeofcells")
## Recasting the data.frame to DataFrame.
## The numbers of samples by condition are:
## 
## antimony     none 
##       34       34
## The number of samples by batch are:
## 
## Macrophages        U937 
##          54          14
## The following 3 lines were copy/pasted to datastructures and should be removed soon.
no_strain_idx <- colData(all_human)[["strainid"]] == "none"
##colData(all_human)[["strainid"]] <- paste0("s", colData(all_human)[["strainid"]],
##                                         "_", colData(all_human)[["macrophagezymodeme"]])
colData(all_human)[no_strain_idx, "strainid"] <- "none"
table(colData(all_human)[["strainid"]])
## 
## 10763 10772 10977 11026 11075 11126 12251 12309 12355 12367  2169  7158  none 
##     2     8     2     2     2     8     7     8     2     7     8     2    10
all_human_types <- set_conditions(all_human, fact = "typeofcells") %>%
  set_batches(fact = "drug")
## The numbers of samples by condition are:
## 
## Macrophages        U937 
##          54          14
## The number of samples by batch are:
## 
## antimony     none 
##       34       34
type_zymo_fact <- paste0(colData(all_human_types)[["condition"]], "_",
                         colData(all_human_types)[["macrophagezymodeme"]])
type_zymo <- set_conditions(all_human_types, fact = type_zymo_fact)
## The numbers of samples by condition are:
## 
## Macrophages_none  Macrophages_z22  Macrophages_z23        U937_none 
##                8               23               23                2 
##         U937_z22         U937_z23 
##                6                6
type_drug_fact <- paste0(colData(all_human_types)[["condition"]], "_",
                         colData(all_human_types)[["drug"]])
type_drug <- set_conditions(all_human_types, fact = type_drug_fact)
## The numbers of samples by condition are:
## 
## Macrophages_antimony     Macrophages_none        U937_antimony 
##                   27                   27                    7 
##            U937_none 
##                    7
strain_fact <- colData(all_human_types)[["strainid"]]
table(strain_fact)
## strain_fact
## 10763 10772 10977 11026 11075 11126 12251 12309 12355 12367  2169  7158  none 
##     2     8     2     2     2     8     7     8     2     7     8     2    10
new_conditions <- paste0(colData(hs_macrophage)[["macrophagetreatment"]], "_",
                         colData(hs_macrophage)[["macrophagezymodeme"]])
## Note the sanitize() call is redundant with the addition of sanitize() in the
## datastructures file, but I don't want to wait to rerun that.
hs_macr <- set_conditions(hs_macrophage, fact = new_conditions) %>%
  sanitize_metadata(columns = "drug") %>%
  subset_se(subset = "typeofcells!='U937'") %>%
  set_se_colors(color_choices[["treatment_zymo"]])
## The numbers of samples by condition are:
## 
##    inf_sb_z22    inf_sb_z23       inf_z22       inf_z23    uninf_none 
##            15            14            14            15             5 
## uninf_sb_none 
##             5
## Recasting the data.frame to DataFrame.
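
The combined conditions above (type_zymo_fact, type_drug_fact, new_conditions) all follow the same pattern: paste two metadata columns together and use the result as the condition. The following is a minimal base R sketch of that pattern on an invented data.frame; it does not use hpgltools, and the toy values are assumptions for illustration only.

```r
## Toy metadata standing in for colData(); the values are invented.
meta <- data.frame(
  typeofcells = c("Macrophages", "Macrophages", "U937", "U937"),
  drug = c("antimony", "none", "antimony", "none"),
  stringsAsFactors = FALSE)
## Paste two columns together to form one combined condition per sample.
combined_fact <- paste0(meta[["typeofcells"]], "_", meta[["drug"]])
meta[["condition"]] <- factor(combined_fact)
## Count the samples in each combined condition.
condition_counts <- table(meta[["condition"]])
print(condition_counts)
```

Each distinct pair of values becomes one level of the new factor, which is why the tables above list names such as Macrophages_antimony and U937_none.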

2.2.1 Separate Macrophage samples

Once again, we should reconsider where the following block is placed, but these datastructures are likely to be used in many of the following analyses.

hs_macr_drug_se <- set_conditions(hs_macr, fact = "drug", colors = color_choices[["drug"]])
## The numbers of samples by condition are:
## 
## antimony     none 
##       27       27
hs_macr_strain_se <- set_conditions(hs_macr, fact = "macrophagezymodeme",
                                      colors = color_choices[["zymo"]]) %>%
  subset_se(subset = "macrophagezymodeme != 'none'")
## The numbers of samples by condition are:
## 
## none  z22  z23 
##    8   23   23
table(colData(hs_macr)[["strainid"]])
## 
## 10763 10772 10977 11026 11075 11126 12251 12309 12355 12367  2169  7158  none 
##     2     6     2     2     2     6     5     6     2     5     6     2     8

2.2.2 Refactor U937 samples

The U937 samples were separated in the datastructures file, but we want to use the combination of drug/zymodeme with them pretty much exclusively.

new_conditions <- paste0(colData(hs_u937)[["macrophagetreatment"]], "_",
                         colData(hs_u937)[["macrophagezymodeme"]])
u937_se <- set_conditions(hs_u937, fact = new_conditions,
                            colors = color_choices[["treatment_zymo"]])
## The numbers of samples by condition are:
## 
##    inf_sb_z22    inf_sb_z23       inf_z22       inf_z23    uninf_none 
##             3             3             3             3             1 
## uninf_sb_none 
##             1

2.3 Contrasts used in this document

Given the various ways we have chopped up this dataset, there are a few general types of contrasts we will perform, which will then be combined into greater complexity:

  • drug treatment: Antimonial treated or not.
  • strains used: Uninfected, z2.3, and z2.2.
  • cell types: U937 or macrophage.
  • donors: The person from whom the macrophages were taken.

In the end, our actual goal is to consider the variable effects of drug+strain and see if we can discern patterns which lead to better or worse drug treatment outcome.

The contrasts in which we are primarily interested follow. I created one ratio of ratios contrast which I think has the potential to ask our biggest question.
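
To make the ratio of ratios idea concrete, here is a small base R sketch for a single hypothetical gene. The expression values are invented; the point is only that a difference of differences on the log2 scale is the same quantity as a ratio of ratios on the linear scale.

```r
## Hypothetical mean expression for one gene on the linear scale (invented values).
expr <- c(inf_sb_z23 = 80, inf_z23 = 20, inf_sb_z22 = 40, inf_z22 = 20)
log_expr <- log2(expr)
## Difference of differences on the log2 scale:
## (inf_sb_z23 - inf_z23) - (inf_sb_z22 - inf_z22)
diff_of_diffs <- unname((log_expr["inf_sb_z23"] - log_expr["inf_z23"]) -
                        (log_expr["inf_sb_z22"] - log_expr["inf_z22"]))
## The same quantity written as a ratio of ratios on the linear scale.
ratio_of_ratios <- unname((expr["inf_sb_z23"] / expr["inf_z23"]) /
                          (expr["inf_sb_z22"] / expr["inf_z22"]))
print(diff_of_diffs)
print(log2(ratio_of_ratios))
```

Both printed values are 1 (up to floating point rounding): the drug changes this toy gene 4-fold in z2.3 infections and 2-fold in z2.2 infections, so the drug effect differs by a factor of 2 between the strains.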

## Each of the following lists has the name of the contrast as the key
## followed by a two element vector comprised of the numerator and
## denominator as the value.  In the case of this first contrast, that
## is comprised of a string which manually defines a series of more
## complex contrasts than the usual/simple pairwise.
tmrc2_human_extra <- "z23drugnodrug_vs_z22drugnodrug = (conditioninf_sb_z23 - conditioninf_z23) - (conditioninf_sb_z22 - conditioninf_z22), z23z22drug_vs_z23z22nodrug = (conditioninf_sb_z23 - conditioninf_sb_z22) - (conditioninf_z23 - conditioninf_z22)"
tmrc2_human_keepers <- list(
  "z23nosb_vs_uninf" = c("inf_z23", "uninf_none"),
  "z22nosb_vs_uninf" = c("inf_z22", "uninf_none"),
  "z23nosb_vs_z22nosb" = c("inf_z23", "inf_z22"),
  "z23sb_vs_z22sb" = c("inf_sb_z23", "inf_sb_z22"),
  "z23sb_vs_z23nosb" = c("inf_sb_z23", "inf_z23"),
  "z22sb_vs_z22nosb" = c("inf_sb_z22", "inf_z22"),
  "z23sb_vs_sb" = c("inf_sb_z23", "uninf_sb_none"),
  "z22sb_vs_sb" = c("inf_sb_z22", "uninf_sb_none"),
  "z23sb_vs_uninf" = c("inf_sb_z23", "uninf_none"),
  "z22sb_vs_uninf" = c("inf_sb_z22", "uninf_none"),
  "sb_vs_uninf" = c("uninf_sb_none", "uninf_none"),
  "extra_z2322" = c("z23drugnodrug", "z22drugnodrug"),
  "extra_drugnodrug" = c("z23z22drug", "z23z22nodrug"))
single_tmrc2_keeper <- list(
  "z22sb_vs_sb" = c("inf_sb_z22", "uninf_sb_none"))
tmrc2_drug_keepers <- list(
  "drug" = c("antimony", "none"))
tmrc2_type_keepers <- list(
  "type" = c("U937", "Macrophages"))
tmrc2_strain_keepers <- list(
  "strain" = c("z23", "z22"))
type_zymo_extra <- "zymos_vs_types = (conditionU937_z23 - conditionU937_z22) - (conditionMacrophages_z23 - conditionMacrophages_z22)"
tmrc2_typezymo_keepers <- list(
  "u937_macr" = c("Macrophages_none", "U937_none"),
  "zymo_macr" = c("Macrophages_z23", "Macrophages_z22"),
  "zymo_u937" = c("U937_z23", "U937_z22"),
  "z23_types" = c("U937_z23", "Macrophages_z23"),
  "z22_types" = c("U937_z22", "Macrophages_z22"),
  "zymos_types" = c("zymos_vs_types"))
tmrc2_typedrug_keepers <- list(
  "type_nodrug" = c("U937_none", "Macrophages_none"),
  "type_drug" = c("U937_antimony", "Macrophages_antimony"),
  "macr_drugs" = c("Macrophages_antimony", "Macrophages_none"),
  "u937_drugs" = c("U937_antimony", "U937_none"))
u937_keepers <- list(
  "z23nosb_vs_uninf" = c("inf_z23", "uninf_none"),
  "z22nosb_vs_uninf" = c("inf_z22", "uninf_none"),
  "z23nosb_vs_z22nosb" = c("inf_z23", "inf_z22"),
  "z23sb_vs_z22sb" = c("inf_sb_z23", "inf_sb_z22"),
  "z23sb_vs_z23nosb" = c("inf_sb_z23", "inf_z23"),
  "z22sb_vs_z22nosb" = c("inf_sb_z22", "inf_z22"),
  "z23sb_vs_sb" = c("inf_sb_z23", "uninf_sb_none"),
  "z22sb_vs_sb" = c("inf_sb_z22", "uninf_sb_none"),
  "z23sb_vs_uninf" = c("inf_sb_z23", "uninf_none"),
  "z22sb_vs_uninf" = c("inf_sb_z22", "uninf_none"),
  "sb_vs_uninf" = c("uninf_sb_none", "uninf_none"))
## In some cases, when the set of significant genes was chosen, an
## additional filter was added to exclude genes with expression values
## less than 'high_expression' according to the
## 'high_expression_column' in the table.
high_expression <- 128
high_expression_column <- "deseq_basemean"

combined_to_tsv <- function(combined, celltype = "all") {
  keepers <- combined[["keepers"]]
  for (k in seq_along(keepers)) {
    kname <- names(keepers)[k]
    numerator <- keepers[[k]][1]
    denominator <- keepers[[k]][2]
    filename <- glue("analyses/macrophage_de/tsv_tables/tmrc2_{celltype}_{kname}_n{numerator}_d{denominator}-v{ver}.xlsx")
    kdata <- combined[["data"]][[kname]]
    if (is.null(kdata[["basic_num"]])) {
      next
    }
    wanted <- c("hgnc_symbol", "deseq_logfc", "deseq_adjp",
                "deseq_basemean", "deseq_num", "deseq_den")
    wanted_data <- kdata[, wanted]
    colnames(wanted_data) <- c("hgnc_symbol", "deseq_logfc", "deseq_adjp",
                               "deseq_mean", "deseq_numerator", "deseq_denominator")
    write_xlsx(data = wanted_data, excel = filename)
  }
}

write_all_gp <- function(all_gp, suffix = NULL) {
  all_written <- list()
  for (g in seq_along(all_gp)) {
    name <- names(all_gp)[g]
    datum <- all_gp[[name]]
    filename <- glue("analyses/macrophage_de/gprofiler/{name}_gprofiler-v{ver}.xlsx")
    if (!is.null(suffix)) {
      filename <- glue("analyses/macrophage_de/gprofiler/{name}_gprofiler{suffix}-v{ver}.xlsx")
    }
    written <- sm(write_gprofiler_data(datum, excel = filename))
    all_written[[g]] <- written
  }
  return(all_written)
}
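
As an illustration of how the loop in combined_to_tsv() walks a keepers list, the sketch below builds the per-contrast filenames using only base R. It is a simplified stand-in, not the function itself: sprintf() replaces glue(), the directory prefix is dropped, and the value of 'ver' is a hypothetical placeholder because the real version string is defined elsewhere.

```r
## A two-entry keepers list in the same shape as those defined above.
keepers <- list(
  "z23nosb_vs_uninf" = c("inf_z23", "uninf_none"),
  "sb_vs_uninf" = c("uninf_sb_none", "uninf_none"))
## 'ver' is a hypothetical version string; the real value is set elsewhere.
ver <- "example"
filenames <- character(0)
for (kname in names(keepers)) {
  ## The first element is the numerator, the second the denominator.
  numerator <- keepers[[kname]][1]
  denominator <- keepers[[kname]][2]
  ## sprintf() is used instead of glue() so the sketch needs only base R.
  filenames[kname] <- sprintf("tmrc2_%s_%s_n%s_d%s-v%s.xlsx",
                              "all", kname, numerator, denominator, ver)
}
print(filenames)
```

The contrast name, numerator, and denominator all end up in the filename, so each output file is self-describing.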

2.4 Primary queries

There is a series of initial questions which make some sense to me, but these do not necessarily match the set of questions which are most pressing. I am hoping to address both sets of queries together.

Before extracting these groups of queries, let us invoke the all_pairwise() function and get all of the likely contrasts along with one or more extras that might prove useful (the ‘extra_contrasts’ argument).

The structure of these blocks will all basically be identical:

  • Perform a set of pairwise contrasts of all the conditions against each other. Optionally use sva.
  • Given that result, dump it in its entirety to an xlsx file in the analyses/ directory.
  • Given those combined tables, extract from them the set deemed ‘significant’ by whatever criteria we want to try. (Usually |lfc| >= 1.0, adjusted p <= 0.05; but potentially also expression >= x and sometimes a set of less stringent values (|lfc| >= 0.6))
  • Given one or more gene sets deemed ‘significant’, pass them to gProfiler2 and see what pops out.
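
The significance criteria in the list above can be sketched in base R as follows. This is not the logic of extract_significant_genes(); simple_sig() is a hypothetical helper, and the table values are invented. Only the column names and the default cutoffs (|lfc| >= 1.0, adjusted p <= 0.05, optional minimum expression of 128, relaxed |lfc| >= 0.6) are taken from this document.

```r
## Toy DE table; column names mirror those used in this document, values are invented.
de_table <- data.frame(
  hgnc_symbol = c("GENE1", "GENE2", "GENE3", "GENE4", "GENE5"),
  deseq_logfc = c(2.1, -1.4, 0.7, 1.8, -0.2),
  deseq_adjp = c(0.001, 0.02, 0.01, 0.20, 0.5),
  deseq_basemean = c(500, 60, 300, 900, 1000))
## Hypothetical helper: split a table into up/down sets by |lfc|, adjusted p,
## and optionally a minimum mean expression.
simple_sig <- function(tbl, lfc = 1.0, adjp = 0.05, min_mean = NULL) {
  keep <- tbl[["deseq_adjp"]] <= adjp
  if (!is.null(min_mean)) {
    keep <- keep & tbl[["deseq_basemean"]] >= min_mean
  }
  list(
    up = tbl[keep & tbl[["deseq_logfc"]] >= lfc, ],
    down = tbl[keep & tbl[["deseq_logfc"]] <= -lfc, ])
}
sig <- simple_sig(de_table)
highsig <- simple_sig(de_table, min_mean = 128)
lesssig <- simple_sig(de_table, lfc = 0.6)
```

With these toy values, the default cutoffs keep one gene up and one down, the expression filter removes the weakly expressed down gene, and the relaxed fold-change cutoff adds a second up gene.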

2.4.1 Combined U937 and Macrophages: Compare drug effects

When we have the U937 cells in the same dataset as the macrophages, that provides an interesting opportunity to see if we can observe drug-dependent effects which are shared across both cell types.

Note to self: given the changes to hpgltools I may need to specify the statistical model string when I am using svaseq for some/many/all of these comparisons.
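
For reference, here is what a model string of the form "~ 0 + condition" produces in base R. This is only a sketch of the design matrix with invented samples; how hpgltools combines it with the surrogate variables from svaseq is not shown and is not assumed here.

```r
## Toy condition factor; the samples are invented.
condition <- factor(c("antimony", "antimony", "none", "none"))
## '~ 0 + condition' drops the intercept, giving one column per condition level.
design <- model.matrix(~ 0 + condition)
print(colnames(design))
## A simple pairwise contrast is then a difference between two of those columns.
contrast <- c(conditionantimony = 1, conditionnone = -1)
```

The column names are the factor name followed by the level, which matches the form of the names used in the extra contrast strings earlier in this document (conditioninf_sb_z23, conditionU937_z22, and so on).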

drug_de <- all_pairwise(all_human, filter = TRUE, model_svs = "svaseq",
                        model_fstring = "~ 0 + condition")
## antimony     none 
##       34       34
## Running normalize_se.
## Removing 9198 low-count genes (12283 remaining).
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## I think this is failing? SummarizedExperiment
## Basic step 0/3: Transforming data.
## Running normalize_se.
## Setting 85798 entries to zero.
## This received a matrix of SVs.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## conditions
## antimony     none 
##       34       34
## conditions
## antimony     none 
##       34       34
## conditions
## antimony     none 
##       34       34
drug_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 1 comparisons.
## The logFC agreement among the methods follows:
##                 nn_vs_ntmn
## basic_vs_deseq      0.8651
## basic_vs_dream      0.8738
## basic_vs_ebseq      0.8367
## basic_vs_edger      0.8671
## basic_vs_limma      0.8771
## basic_vs_noiseq     0.8726
## deseq_vs_dream      0.9719
## deseq_vs_ebseq      0.9344
## deseq_vs_edger      0.9988
## deseq_vs_limma      0.9665
## deseq_vs_noiseq     0.9738
## dream_vs_ebseq      0.9770
## dream_vs_edger      0.9741
## dream_vs_limma      0.9953
## dream_vs_noiseq     0.9616
## ebseq_vs_edger      0.9383
## ebseq_vs_limma      0.9728
## ebseq_vs_noiseq     0.9486
## edger_vs_limma      0.9688
## edger_vs_noiseq     0.9745
## limma_vs_noiseq     0.9584
drug_table <- combine_de_tables(
  drug_de, keepers = tmrc2_drug_keepers,
  excel = glue("analyses/macrophage_de/de_tables/macrophage_drug_comparison-v{ver}.xlsx"))
drug_table
## A set of combined differential expression results.
##                       table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 none_vs_antimony-inverted         480           764         480           759
##   limma_sigup limma_sigdown
## 1         471           700
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

combined_to_tsv(drug_table, celltype = "all")

drug_sig <- extract_significant_genes(
  drug_table,
  excel = glue("analyses/macrophage_de/sig_tables/macrophage_drug_sig-v{ver}.xlsx"))
drug_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##      limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## drug      471        700      480        759      480        764      323
##      ebseq_down basic_up basic_down
## drug        577      256        393

drug_highsig <- extract_significant_genes(
  drug_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/sig_tables/macrophage_drug_highsig-v{ver}.xlsx"))
drug_highsig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##      limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## drug      222        388      233        427      231        429      162
##      ebseq_down basic_up basic_down
## drug        346      208        343

drug_lesssig <- extract_significant_genes(
  drug_table, lfc = 0.6,
  excel = glue("analyses/macrophage_de/sig_tables/macrophage_drug_lesssig-v{ver}.xlsx"))
drug_lesssig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 0.6 adj P cutoff: 0.05
##      limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## drug     1120       1326     1098       1451     1082       1461      647
##      ebseq_down basic_up basic_down
## drug        983      772        945
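
The logFC agreement table printed for drug_de earlier in this section compares the methods pairwise. The metric is not stated in this document; assuming it is a correlation between the per-gene log fold changes of two methods, the idea can be sketched in base R with simulated values. The method names and numbers below are invented.

```r
## Invented per-gene log fold changes from three hypothetical methods.
set.seed(1)
truth <- rnorm(200)
lfc <- data.frame(
  method_a = truth + rnorm(200, sd = 0.1),
  method_b = truth + rnorm(200, sd = 0.1),
  method_c = truth + rnorm(200, sd = 0.5))
## Pairwise Pearson correlation between the methods' log fold changes.
agreement <- cor(lfc)
print(round(agreement, 4))
```

Under that assumption, methods which add little noise to the shared signal (method_a and method_b) agree closely, while the noisier method_c agrees less well with both.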

2.4.1.1 gProfiler2 results of the significant drug genes

all_drug_gp <- all_gprofiler(drug_sig, enrich_id_column = "hgnc_symbol")
all_drug_gp
##            BP CC CORUM HP HPA KEGG MIRNA MF REAC  TF WP
## drug_up    88 32     0  0   0    5     0 38   78  26  9
## drug_down 320 61     0  0   0    1     1 32    2 297  2
written <- write_all_gp(all_drug_gp)

all_drug_lesssig <- all_gprofiler(drug_lesssig, enrich_id_column = "hgnc_symbol")
written <- write_all_gp(all_drug_lesssig, suffix = "_lfc0.6_")

2.4.2 Combined U937 and Macrophages: compare cell types

There are a couple of ways one might want to directly compare the two cell types.

  • Given that the variance between the two celltypes is so huge, just compare all samples.
  • One might want to compare them with the interaction effects of drug/zymodeme.

type_de <- all_pairwise(all_human_types, filter = TRUE, model_fstring = "~ 0 + condition",
                        model_svs = "svaseq")
## Macrophages        U937 
##          54          14
## Running normalize_se.
## Removing 9198 low-count genes (12283 remaining).
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## I think this is failing? SummarizedExperiment
## Basic step 0/3: Transforming data.
## Running normalize_se.
## Setting 85798 entries to zero.
## This received a matrix of SVs.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## conditions
## Macrophages        U937 
##          54          14
## conditions
## Macrophages        U937 
##          54          14
## conditions
## Macrophages        U937 
##          54          14
type_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 1 comparisons.
## The logFC agreement among the methods follows:
##                 U937_vs_Mc
## basic_vs_deseq      0.8587
## basic_vs_dream      0.8853
## basic_vs_ebseq      0.8125
## basic_vs_edger      0.8576
## basic_vs_limma      0.8982
## basic_vs_noiseq     0.9114
## deseq_vs_dream      0.9938
## deseq_vs_ebseq      0.9805
## deseq_vs_edger      0.9974
## deseq_vs_limma      0.9849
## deseq_vs_noiseq     0.9835
## dream_vs_ebseq      0.9682
## dream_vs_edger      0.9947
## dream_vs_limma      0.9932
## dream_vs_noiseq     0.9910
## ebseq_vs_edger      0.9836
## ebseq_vs_limma      0.9490
## ebseq_vs_noiseq     0.9652
## edger_vs_limma      0.9859
## edger_vs_noiseq     0.9829
## limma_vs_noiseq     0.9817
type_table <- combine_de_tables(
  type_de, keepers = tmrc2_type_keepers,
  excel = glue("analyses/macrophage_de/de_tables/macrophage_type_comparison-v{ver}.xlsx"))
type_table
## A set of combined differential expression results.
##                 table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 U937_vs_Macrophages        2105          2436        2076          2460
##   limma_sigup limma_sigdown
## 1        2247          2129
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

combined_to_tsv(type_table, celltype = "all")

type_sig <- extract_significant_genes(
  type_table,
  excel = glue("analyses/macrophage_de/sig_tables/macrophage_type_sig-v{ver}.xlsx"))
type_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##      limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## type     2247       2129     2076       2460     2105       2436     1880
##      ebseq_down basic_up basic_down
## type       2485     1972       1784

type_highsig <- extract_significant_genes(
  type_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/sig_tables/macrophage_type_highsig-v{ver}.xlsx"))
type_highsig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##      limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## type     1365       1632     1297       1762     1322       1736     1181
##      ebseq_down basic_up basic_down
## type       1789     1345       1613

type_lesssig <- extract_significant_genes(
  type_table, lfc = 0.6,
  excel = glue("analyses/macrophage_de/sig_tables/macrophage_type_lesssig-v{ver}.xlsx"))
type_lesssig

2.4.2.1 Combined factors of interest: celltype+zymodeme

Given the above explicit comparison of all samples comprising the two cell types, now let us look at the cell type+zymodeme status with all samples, macrophages and U937.

type_zymo_de <- all_pairwise(type_zymo, filter = TRUE, model_svs = "svaseq",
                             model_fstring = "~ 0 + condition",
                             extra_contrasts = type_zymo_extra)
## Macrophages_none  Macrophages_z22  Macrophages_z23        U937_none 
##                8               23               23                2 
##         U937_z22         U937_z23 
##                6                6
## Running normalize_se.
## Removing 9198 low-count genes (12283 remaining).
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## I think this is failing? SummarizedExperiment
## Basic step 0/3: Transforming data.
## Running normalize_se.
## Setting 85798 entries to zero.
## This received a matrix of SVs.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The contrast zymos is not in the results.
## If this is not an extra contrast, then this is an error.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## conditions
## Macrophages_none  Macrophages_z22  Macrophages_z23        U937_none 
##                8               23               23                2 
##         U937_z22         U937_z23 
##                6                6
## conditions
## Macrophages_none  Macrophages_z22  Macrophages_z23        U937_none 
##                8               23               23                2 
##         U937_z22         U937_z23 
##                6                6
## conditions
## Macrophages_none  Macrophages_z22  Macrophages_z23        U937_none 
##                8               23               23                2 
##         U937_z22         U937_z23 
##                6                6

type_zymo_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 15 comparisons.

Strangely, as of 20250903, the following call reports an error when run in the container: “Error: subscript contains invalid names.”

When I run it manually, I get no error. I assume this means that the hpgltools version in the container has fallen behind HEAD?

In addition, the function did return successfully despite the message.
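
One way to track down a discrepancy like this is to capture the error message rather than letting it pass by, and to record versions in both environments. The sketch below uses only base R; try_step() is a hypothetical helper, and the error is raised by hand purely to show what is recorded.

```r
## Hypothetical wrapper: evaluate an expression, returning its value and
## recording the error message if one is raised.
try_step <- function(expr) {
  tryCatch(
    list(value = expr, error = NULL),
    error = function(e) list(value = NULL, error = conditionMessage(e)))
}
ok <- try_step(sum(1:3))
## The error here is raised manually for illustration.
bad <- try_step(stop("subscript contains invalid names"))
print(bad$error)
## Recording versions helps compare the container with an interactive session.
r_version <- as.character(getRversion())
stats_version <- as.character(utils::packageVersion("stats"))
```

The same packageVersion() call with "hpgltools" in place of "stats", run in the container and in the interactive session, would show whether the two installations actually differ.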

type_zymo_table <- combine_de_tables(
  type_zymo_de, keepers = tmrc2_typezymo_keepers,
  excel = glue("analyses/macrophage_de/de_tables/macrophage_type_zymo_comparison-v{ver}.xlsx"))
## Error : subscript contains invalid names
## coefficient limma did not find NA or zymos_vs_types.
## coefficient edger did not find conditionNA or conditionzymos_vs_types.
## coefficient limma did not find NA or zymos_vs_types.
type_zymo_table
## A set of combined differential expression results.
##                                    table deseq_sigup deseq_sigdown edger_sigup
## 1 U937_none_vs_Macrophages_none-inverted        2353          2081        2360
## 2     Macrophages_z23_vs_Macrophages_z22         300           459         297
## 3                   U937_z23_vs_U937_z22           1             2           1
## 4            U937_z23_vs_Macrophages_z23        2153          2468        2122
## 5            U937_z22_vs_Macrophages_z22        2024          2539        2005
## 6                         zymos_vs_types           0             0         334
##   edger_sigdown limma_sigup limma_sigdown
## 1          2088        2017          2206
## 2           460         382           318
## 3             3           0             0
## 4          2498        2294          2182
## 5          2558        2272          2154
## 6           219         185           222
## Plot describing unique/shared genes in a differential expression table.

combined_to_tsv(type_zymo_table, celltype = "all")

type_zymo_sig <- extract_significant_genes(
  type_zymo_table,
  excel = glue("analyses/macrophage_de/sig_tables/macrophage_type_zymo_sig-v{ver}.xlsx"))
## There is no deseq_logfc column in the table.
## The columns are: ensembl_gene_id, ensembl_transcript_id, version, transcript_version, description, gene_biotype, cds_length, chromosome_name, strand, start_position, end_position, hgnc_symbol, transcript, dream_logfc, dream_adjp, edger_logfc, edger_adjp, limma_logfc, limma_adjp, dream_ave, dream_t, dream_p, dream_b, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_p, limma_b, limma_adjp_fdr, dream_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
## There is no ebseq_logfc column in the table.
## The columns are: ensembl_gene_id, ensembl_transcript_id, version, transcript_version, description, gene_biotype, cds_length, chromosome_name, strand, start_position, end_position, hgnc_symbol, transcript, dream_logfc, dream_adjp, edger_logfc, edger_adjp, limma_logfc, limma_adjp, dream_ave, dream_t, dream_p, dream_b, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_p, limma_b, limma_adjp_fdr, dream_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
## There is no basic_logfc column in the table.
## The columns are: ensembl_gene_id, ensembl_transcript_id, version, transcript_version, description, gene_biotype, cds_length, chromosome_name, strand, start_position, end_position, hgnc_symbol, transcript, dream_logfc, dream_adjp, edger_logfc, edger_adjp, limma_logfc, limma_adjp, dream_ave, dream_t, dream_p, dream_b, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_p, limma_b, limma_adjp_fdr, dream_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
type_zymo_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##             limma_up limma_down edger_up edger_down deseq_up deseq_down
## u937_macr       2017       2206     2360       2088     2353       2081
## zymo_macr        382        318      297        460      300        459
## zymo_u937          0          0        1          3        1          2
## z23_types       2294       2182     2122       2498     2153       2468
## z22_types       2272       2154     2005       2558     2024       2539
## zymos_types      185        222      334        219        0          0
##             ebseq_up ebseq_down basic_up basic_down
## u937_macr       1720       1867        0          0
## zymo_macr        211        255      213        113
## zymo_u937          0          1        0          0
## z23_types       1971       2021     1997       1808
## z22_types       1899       2423     2001       1804
## zymos_types        0          0        0          0

type_zymo_highsig <- extract_significant_genes(
  type_zymo_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/sig_tables/macrophage_type_zymo_highsig-v{ver}.xlsx"))
## Warning in get_sig_genes(this_table, lfc = lfc, p = p, z = z, n = n, column =
## this_fc_column, : The column deseq_basemean does not appears to be in the
## table, cannot filter by expression.
## Warning in get_sig_genes(this_table, lfc = lfc, p = p, z = z, n = n, column =
## this_fc_column, : The column deseq_basemean does not appears to be in the
## table, cannot filter by expression.
## There is no deseq_logfc column in the table.
## The columns are: ensembl_gene_id, ensembl_transcript_id, version, transcript_version, description, gene_biotype, cds_length, chromosome_name, strand, start_position, end_position, hgnc_symbol, transcript, dream_logfc, dream_adjp, edger_logfc, edger_adjp, limma_logfc, limma_adjp, dream_ave, dream_t, dream_p, dream_b, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_p, limma_b, limma_adjp_fdr, dream_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
## There is no ebseq_logfc column in the table.
## The columns are: ensembl_gene_id, ensembl_transcript_id, version, transcript_version, description, gene_biotype, cds_length, chromosome_name, strand, start_position, end_position, hgnc_symbol, transcript, dream_logfc, dream_adjp, edger_logfc, edger_adjp, limma_logfc, limma_adjp, dream_ave, dream_t, dream_p, dream_b, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_p, limma_b, limma_adjp_fdr, dream_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
## There is no basic_logfc column in the table.
## The columns are: ensembl_gene_id, ensembl_transcript_id, version, transcript_version, description, gene_biotype, cds_length, chromosome_name, strand, start_position, end_position, hgnc_symbol, transcript, dream_logfc, dream_adjp, edger_logfc, edger_adjp, limma_logfc, limma_adjp, dream_ave, dream_t, dream_p, dream_b, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_p, limma_b, limma_adjp_fdr, dream_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
type_zymo_lesssig <- extract_significant_genes(
  type_zymo_table, lfc = 0.6,
  excel = glue("analyses/macrophage_de/sig_tables/macrophage_type_zymo_lesssig-v{ver}.xlsx"))
## There is no deseq_logfc column in the table.
## The columns are: ensembl_gene_id, ensembl_transcript_id, version, transcript_version, description, gene_biotype, cds_length, chromosome_name, strand, start_position, end_position, hgnc_symbol, transcript, dream_logfc, dream_adjp, edger_logfc, edger_adjp, limma_logfc, limma_adjp, dream_ave, dream_t, dream_p, dream_b, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_p, limma_b, limma_adjp_fdr, dream_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
## There is no ebseq_logfc column in the table.
## The columns are: ensembl_gene_id, ensembl_transcript_id, version, transcript_version, description, gene_biotype, cds_length, chromosome_name, strand, start_position, end_position, hgnc_symbol, transcript, dream_logfc, dream_adjp, edger_logfc, edger_adjp, limma_logfc, limma_adjp, dream_ave, dream_t, dream_p, dream_b, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_p, limma_b, limma_adjp_fdr, dream_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
## There is no basic_logfc column in the table.
## The columns are: ensembl_gene_id, ensembl_transcript_id, version, transcript_version, description, gene_biotype, cds_length, chromosome_name, strand, start_position, end_position, hgnc_symbol, transcript, dream_logfc, dream_adjp, edger_logfc, edger_adjp, limma_logfc, limma_adjp, dream_ave, dream_t, dream_p, dream_b, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_p, limma_b, limma_adjp_fdr, dream_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
type_zymo_lesssig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 0.6 adj P cutoff: 0.05
##             limma_up limma_down edger_up edger_down deseq_up deseq_down
## u937_macr       2865       3266     3296       3115     3292       3121
## zymo_macr        737        704      624        946      622        936
## zymo_u937          1          1        1          3        1          2
## z23_types       3432       3109     3197       3550     3242       3501
## z22_types       3465       3072     3141       3536     3181       3509
## zymos_types      338        331      527        374        0          0
##             ebseq_up ebseq_down basic_up basic_down
## u937_macr       2157       2696        0          0
## zymo_macr        343        418      442        389
## zymo_u937          2          1        0          0
## z23_types       2933       2687     3151       2748
## z22_types       2949       3250     3231       2760
## zymos_types        0          0        0          0

2.4.2.2 Combined factors of interest: celltype+drug

The ‘type_drug’ data structure is the same as above, but its condition is the concatenation of the cell type and the drug treatment.
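As a minimal sketch of what that concatenation amounts to, assuming the metadata has columns named typeofcells and drug (the toy data frame below is invented and is not the real sample sheet):

```r
# Toy metadata; the column names and values are illustrative assumptions.
meta <- data.frame(
  typeofcells = c("Macrophages", "Macrophages", "U937", "U937"),
  drug = c("antimony", "none", "antimony", "none"),
  stringsAsFactors = FALSE)

# Concatenate the two factors so that each combination becomes one condition level.
meta[["condition"]] <- factor(paste(meta[["typeofcells"]], meta[["drug"]], sep = "_"))

# Tabulate samples per condition, analogous to the counts printed by all_pairwise().
condition_counts <- table(meta[["condition"]])
condition_counts
```

With four levels in the condition there are six possible pairwise comparisons, which matches the count reported for this analysis.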

type_drug_de <- all_pairwise(type_drug, filter = TRUE, model_svs = "svaseq",
                             model_fstring = "~ 0 + condition")
## Macrophages_antimony     Macrophages_none        U937_antimony 
##                   27                   27                    7 
##            U937_none 
##                    7
## Running normalize_se.
## Removing 9198 low-count genes (12283 remaining).
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## I think this is failing? SummarizedExperiment
## Basic step 0/3: Transforming data.
## Running normalize_se.
## Setting 85798 entries to zero.
## This received a matrix of SVs.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## conditions
## Macrophages_antimony     Macrophages_none        U937_antimony 
##                   27                   27                    7 
##            U937_none 
##                    7
## conditions
## Macrophages_antimony     Macrophages_none        U937_antimony 
##                   27                   27                    7 
##            U937_none 
##                    7
## conditions
## Macrophages_antimony     Macrophages_none        U937_antimony 
##                   27                   27                    7 
##            U937_none 
##                    7

type_drug_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 6 comparisons.
type_drug_table <- combine_de_tables(
  type_drug_de, keepers = tmrc2_typedrug_keepers,
  excel = glue("analyses/macrophage_de/de_tables/macrophage_type_drug_comparison-v{ver}.xlsx"))
type_drug_table
## A set of combined differential expression results.
##                                               table deseq_sigup deseq_sigdown
## 1                     U937_none_vs_Macrophages_none        2094          2644
## 2             U937_antimony_vs_Macrophages_antimony        2102          2375
## 3 Macrophages_none_vs_Macrophages_antimony-inverted         606           966
## 4               U937_none_vs_U937_antimony-inverted         421           167
##   edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1        2059          2668        2295          2194
## 2        2083          2385        2254          2133
## 3         605           960         672           910
## 4         442           176         211           162
## Plot describing unique/shared genes in a differential expression table.

#combined_to_tsv(type_drug_table, celltype = "all")

type_drug_sig <- extract_significant_genes(
  type_drug_table,
  excel = glue("analyses/macrophage_de/sig_tables/macrophage_type_drug_sig-v{ver}.xlsx"))
type_drug_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##             limma_up limma_down edger_up edger_down deseq_up deseq_down
## type_nodrug     2295       2194     2059       2668     2094       2644
## type_drug       2254       2133     2083       2385     2102       2375
## macr_drugs       672        910      605        960      606        966
## u937_drugs       211        162      442        176      421        167
##             ebseq_up ebseq_down basic_up basic_down
## type_nodrug     1956       2465     2041       1852
## type_drug       2008       2312     1989       1856
## macr_drugs       482        881      369        569
## u937_drugs       359        157      168        146

type_drug_highsig <- extract_significant_genes(
  type_drug_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/sig_tables/macrophage_type_drug_highsig-v{ver}.xlsx"))
type_drug_highsig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##             limma_up limma_down edger_up edger_down deseq_up deseq_down
## type_nodrug     1394       1690     1312       1895     1343       1869
## type_drug       1386       1649     1302       1734     1321       1721
## macr_drugs       330        520      300        564      303        565
## u937_drugs       102         84      210        105      202        101
##             ebseq_up ebseq_down basic_up basic_down
## type_nodrug     1275       1789     1402       1659
## type_drug       1266       1719     1379       1657
## macr_drugs       243        517      294        471
## u937_drugs       168        100       99         85

type_drug_lesssig <- extract_significant_genes(
  type_drug_table, lfc = 0.6,
  excel = glue("analyses/macrophage_de/sig_tables/macrophage_type_drug_lesssig-v{ver}.xlsx"))
type_drug_lesssig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 0.6 adj P cutoff: 0.05
##             limma_up limma_down edger_up edger_down deseq_up deseq_down
## type_nodrug     3497       3150     3169       3694     3220       3659
## type_drug       3387       3114     3181       3438     3210       3414
## macr_drugs      1422       1592     1311       1747     1303       1743
## u937_drugs       459        429      800        417      770        414
##             ebseq_up ebseq_down basic_up basic_down
## type_nodrug     3010       3258     3246       2824
## type_drug       3048       3198     3187       2886
## macr_drugs       999       1467     1071       1179
## u937_drugs       720        454      470        449
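The three significance sets above differ only in their cutoffs. The following is a rough sketch of such a filter, assuming it keeps genes with an adjusted p-value at or below 0.05 and an absolute log2 fold change at or above the cutoff. The helper count_sig() is hypothetical and is not an hpgltools function; the exact inequalities used by extract_significant_genes() may differ.

```r
# Toy table; column names follow the <method>_logfc / <method>_adjp pattern seen in
# the output above, and every value is invented for illustration.
toy <- data.frame(
  row.names = paste0("gene", 1:6),
  deseq_logfc = c(1.4, 0.8, -0.7, -2.1, 0.3, 1.1),
  deseq_adjp = c(0.001, 0.02, 0.04, 0.0005, 0.01, 0.2))

# Hypothetical helper: count up/down genes passing both cutoffs.
count_sig <- function(tbl, lfc = 1.0, p = 0.05) {
  keep <- tbl[["deseq_adjp"]] <= p
  c(up = sum(keep & tbl[["deseq_logfc"]] >= lfc),
    down = sum(keep & tbl[["deseq_logfc"]] <= -lfc))
}

sig_default <- count_sig(toy)             # LFC cutoff 1, i.e. 2-fold
sig_relaxed <- count_sig(toy, lfc = 0.6)  # the 'lesssig' cutoff
fold_relaxed <- 2^0.6                     # a log2 cutoff of 0.6 is about 1.5-fold
```

Relaxing the cutoff from 1 to 0.6 can only add genes, which is why every 'lesssig' count above is at least as large as its default counterpart.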

3 Individual cell types

At this point, I think it is fair to say that the two cell types are sufficiently different that they do not really belong together in a single analysis.

3.1 Drug or strain effects, single cell type

One of Najib's queries, which I think I misinterpreted, was to look at drug and/or strain effects. My first interpretation appears somewhere below and was not what he was looking for. Instead, he wanted to see the drug/nodrug contrast across all macrophage samples and the z23/z22 contrast across all macrophage samples, and then compare the two to each other. This may still be the wrong interpretation; if so, the most likely comparison is either:

  • (z23drug/z22drug) / (z23nodrug/z22nodrug), or perhaps
  • (z23drug/z23nodrug) / (z22drug/z22nodrug),

I am not sure; those confuse me, and at least one of them is below.
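One thing which may reduce the confusion: on the log scale, the two ratios of ratios listed above are the same quantity, just grouped differently. A small worked example with invented log2 means for a single gene (the group names are shorthand and the numbers are hypothetical):

```r
# Hypothetical mean log2 expression of one gene in the four infected groups.
log2_mean <- c(z23drug = 7.5, z22drug = 6.0, z23nodrug = 5.0, z22nodrug = 4.5)

# (z23drug/z22drug) / (z23nodrug/z22nodrug): the strain effect with drug vs. without.
strain_by_drug <- (log2_mean[["z23drug"]] - log2_mean[["z22drug"]]) -
  (log2_mean[["z23nodrug"]] - log2_mean[["z22nodrug"]])

# (z23drug/z23nodrug) / (z22drug/z22nodrug): the drug effect in z23 vs. in z22.
drug_by_strain <- (log2_mean[["z23drug"]] - log2_mean[["z23nodrug"]]) -
  (log2_mean[["z22drug"]] - log2_mean[["z22nodrug"]])

c(strain_by_drug = strain_by_drug, drug_by_strain = drug_by_strain)
```

Both expand to z23drug - z22drug - z23nodrug + z22nodrug, so either phrasing asks for the same drug-by-strain interaction.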

3.2 Macrophages

In these blocks we will explicitly query only one factor at a time: first drug, then strain. The eventual goal, as I understand it, is to find effects which are shared between drug treatment and infecting strain.

3.2.1 Macrophage Drug only

Thus we will start with the pure drug query. In this block we will look only at the drug/nodrug effect.

hs_macr_drug_de <- all_pairwise(hs_macr_drug_se, filter = TRUE, model_svs = "svaseq",
                                model_fstring = "~ 0 + condition")
## antimony     none 
##       27       27
## Running normalize_se.
## Removing 9725 low-count genes (11756 remaining).
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## I think this is failing? SummarizedExperiment
## Basic step 0/3: Transforming data.
## Running normalize_se.
## Setting 40036 entries to zero.
## This received a matrix of SVs.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## conditions
## antimony     none 
##       27       27
## conditions
## antimony     none 
##       27       27
## conditions
## antimony     none 
##       27       27
hs_macr_drug_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 1 comparisons.
## The logFC agreement among the methods follows:
##                 nn_vs_ntmn
## basic_vs_deseq      0.8930
## basic_vs_dream      0.8901
## basic_vs_ebseq      0.8651
## basic_vs_edger      0.8942
## basic_vs_limma      0.8948
## basic_vs_noiseq     0.9048
## deseq_vs_dream      0.9951
## deseq_vs_ebseq      0.9647
## deseq_vs_edger      0.9997
## deseq_vs_limma      0.9911
## deseq_vs_noiseq     0.9837
## dream_vs_ebseq      0.9725
## dream_vs_edger      0.9952
## dream_vs_limma      0.9961
## dream_vs_noiseq     0.9880
## ebseq_vs_edger      0.9643
## ebseq_vs_limma      0.9694
## ebseq_vs_noiseq     0.9891
## edger_vs_limma      0.9911
## edger_vs_noiseq     0.9837
## limma_vs_noiseq     0.9852
hs_macr_drug_table <- combine_de_tables(
  hs_macr_drug_de, keepers = tmrc2_drug_keepers,
  excel = glue("analyses/macrophage_de/de_tables/macrophage_onlydrug_table-v{ver}.xlsx"))
hs_macr_drug_table
## A set of combined differential expression results.
##                       table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 none_vs_antimony-inverted         519           862         525           852
##   limma_sigup limma_sigdown
## 1         556           808
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

#combined_to_tsv(hs_macr_drug_table, celltype = "macrophage")

hs_macr_drug_sig <- extract_significant_genes(
  hs_macr_drug_table,
  excel = glue("analyses/macrophage_de/sig_tables/macrophageonly_drug_sig-v{ver}.xlsx"))
hs_macr_drug_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##      limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## drug      556        808      525        852      519        862      425
##      ebseq_down basic_up basic_down
## drug        821      366        562

hs_macr_drug_highsig <- extract_significant_genes(
  hs_macr_drug_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/sig_tables/macrophageonly_drug_highsig-v{ver}.xlsx"))
hs_macr_drug_highsig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##      limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## drug      283        492      273        539      268        548      225
##      ebseq_down basic_up basic_down
## drug        511      285        479

## Creating the following to see how it affects gProfiler.
hs_macr_drug_lesssig <- extract_significant_genes(
  hs_macr_drug_table, lfc = 0.6,
  excel = glue("analyses/macrophage_de/sig_tables/macrophageonly_drug_sig_lfc0.6-v{ver}.xlsx"))

3.2.2 Macrophage Strain only

In a similar fashion, let us look for effects which are observed when we consider only the strain used during infection.

hs_macr_strain_de <- all_pairwise(hs_macr_strain_se, filter = TRUE, model_svs = "svaseq",
                                  model_fstring = "~ 0 + condition")
## z22 z23 
##  23  23
## Running normalize_se.
## Removing 9761 low-count genes (11720 remaining).
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## I think this is failing? SummarizedExperiment
## Basic step 0/3: Transforming data.
## Running normalize_se.
## Setting 32467 entries to zero.
## This received a matrix of SVs.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## conditions
## z22 z23 
##  23  23
## conditions
## z22 z23 
##  23  23
## conditions
## z22 z23 
##  23  23
hs_macr_strain_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 1 comparisons.
## The logFC agreement among the methods follows:
##                 z23_vs_z22
## basic_vs_deseq      0.8273
## basic_vs_dream      0.8408
## basic_vs_ebseq      0.8021
## basic_vs_edger      0.8313
## basic_vs_limma      0.8629
## basic_vs_noiseq     0.1082
## deseq_vs_dream      0.9850
## deseq_vs_ebseq      0.9721
## deseq_vs_edger      0.9991
## deseq_vs_limma      0.9637
## deseq_vs_noiseq     0.2183
## dream_vs_ebseq      0.9734
## dream_vs_edger      0.9876
## dream_vs_limma      0.9782
## dream_vs_noiseq     0.2520
## ebseq_vs_edger      0.9726
## ebseq_vs_limma      0.9614
## ebseq_vs_noiseq     0.4180
## edger_vs_limma      0.9668
## edger_vs_noiseq     0.2127
## limma_vs_noiseq     0.2754
hs_macr_strain_table <- combine_de_tables(
  hs_macr_strain_de, keepers = tmrc2_strain_keepers,
  excel = glue("analyses/macrophage_de/de_tables/macrophage_onlystrain_table-v{ver}.xlsx"))
hs_macr_strain_table
## A set of combined differential expression results.
##        table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup
## 1 z23_vs_z22         291           371         288           363         337
##   limma_sigdown
## 1           275
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.

combined_to_tsv(hs_macr_strain_table, celltype = "macrophage")

hs_macr_strain_sig <- extract_significant_genes(
  hs_macr_strain_table,
  excel = glue("analyses/macrophage_de/sig_tables/macrophageonly_onlystrain_sig-v{ver}.xlsx"))
hs_macr_strain_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##        limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## strain      337        275      288        363      291        371      199
##        ebseq_down basic_up basic_down
## strain        216      210        112

hs_macr_strain_highsig <- extract_significant_genes(
  hs_macr_strain_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/sig_tables/macrophageonly_onlystrain_highsig-v{ver}.xlsx"))
hs_macr_strain_highsig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##        limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## strain      193        101      194        110      194        112      156
##        ebseq_down basic_up basic_down
## strain         51      184         93

hs_macr_strain_lesssig <- extract_significant_genes(
  hs_macr_strain_table, lfc = 0.6,
  excel = glue("analyses/macrophage_de/sig_tables/macrophageonly_onlystrain_lesssig-v{ver}.xlsx"))
hs_macr_strain_lesssig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 0.6 adj P cutoff: 0.05
##        limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## strain      667        662      608        822      607        830      331
##        ebseq_down basic_up basic_down
## strain        370      446        387

3.2.3 Compare Drug and Strain Effects

Now let us consider the above two comparisons together. First, I will plot their logFC values against each other (drug on the x-axis and strain on the y-axis). Then we can extract the significant genes in a few combined categories of interest. I assume these will focus exclusively on the categories which include the introduction of the drug.

drug_strain_comp_df <- merge(hs_macr_drug_table[["data"]][["drug"]],
                             hs_macr_strain_table[["data"]][["strain"]],
                             by = "row.names")
drug_strain_comp_plot <- plot_linear_scatter(
  drug_strain_comp_df[, c("deseq_logfc.x", "deseq_logfc.y")])
## Contrasts: antimony/none, z23/z22; x-axis: drug, y-axis: strain
## top left: higher no drug, z23; top right: higher drug z23
## bottom left: higher no drug, z22; bottom right: higher drug z22
drug_strain_comp_plot[["scatter"]]
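For anyone unsure where the .x and .y suffixes come from, here is a minimal sketch of the same merge on two invented tables. It assumes the contrast directions given in the comments above (antimony/none on x, z23/z22 on y); the gene names and values are made up.

```r
# Two toy result tables sharing a column name, keyed by rownames.
drug_toy <- data.frame(row.names = c("geneA", "geneB", "geneC"),
                       deseq_logfc = c(2.0, -1.5, 0.2))
strain_toy <- data.frame(row.names = c("geneB", "geneC", "geneD"),
                         deseq_logfc = c(1.2, -0.4, 3.0))

# merge() keeps only shared genes and suffixes the clashing columns with .x/.y.
merged_toy <- merge(drug_toy, strain_toy, by = "row.names")
colnames(merged_toy)

# Label each gene with its quadrant: x is the drug contrast, y the strain contrast.
merged_toy[["quadrant"]] <- paste0(
  ifelse(merged_toy[["deseq_logfc.x"]] > 0, "higher drug", "higher no drug"), ", ",
  ifelse(merged_toy[["deseq_logfc.y"]] > 0, "z23", "z22"))
merged_toy
```

Note that the default merge is an inner join, so genes filtered out of either analysis drop out of the scatter plot.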

As I noted in the comments above, some quadrants of the scatter plot are likely to be of greater interest to us than others (the right side). Because I get confused sometimes, the following block will explicitly name the categories of likely interest, then ask which genes are shared among them, and finally use UpSetR to extract the various gene intersection/union categories.

higher_drug <- hs_macr_drug_sig[["deseq"]][["downs"]][[1]]
higher_nodrug <- hs_macr_drug_sig[["deseq"]][["ups"]][[1]]
higher_z23 <- hs_macr_strain_sig[["deseq"]][["ups"]][[1]]
higher_z22 <- hs_macr_strain_sig[["deseq"]][["downs"]][[1]]
sum(rownames(higher_drug) %in% rownames(higher_z23))
## [1] 94
sum(rownames(higher_drug) %in% rownames(higher_z22))
## [1] 87
sum(rownames(higher_nodrug) %in% rownames(higher_z23))
## [1] 26
sum(rownames(higher_nodrug) %in% rownames(higher_z22))
## [1] 73
drug_z23_lst <- list("drug" = rownames(higher_drug),
                     "z23" = rownames(higher_z23))
upset_input <- UpSetR::fromList(drug_z23_lst)
higher_drug_z23 <- UpSetR::upset(upset_input, text.scale = 2)
higher_drug_z23

drug_z23_shared_genes <- overlap_groups(drug_z23_lst)
shared_genes_drug_z23 <- overlap_geneids(drug_z23_shared_genes, "drug:z23")
shared_genes_drug_z23 <- attr(drug_z23_shared_genes, "elements")[drug_z23_shared_genes[["drug:z23"]]]

drug_z22_lst <- list("drug" = rownames(higher_drug),
                     "z22" = rownames(higher_z22))
higher_drug_z22 <- UpSetR::upset(UpSetR::fromList(drug_z22_lst), text.scale = 2)
higher_drug_z22

drug_z22_shared_genes <- overlap_groups(drug_z22_lst)
shared_genes_drug_z22 <- overlap_geneids(drug_z22_shared_genes, "drug:z22")
shared_genes_drug_z22 <- attr(drug_z22_shared_genes, "elements")[drug_z22_shared_genes[["drug:z22"]]]
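The shared-gene bookkeeping above can be sanity checked with base R set operations. A minimal sketch on invented gene IDs (these vectors stand in for the rownames of the significant tables):

```r
# Hypothetical gene ID vectors standing in for rownames(higher_drug) and rownames(higher_z23).
drug_ids <- c("g1", "g2", "g3", "g4")
z23_ids <- c("g3", "g4", "g5")

# Genes in both sets; the length should match the sum(... %in% ...) count.
shared <- intersect(drug_ids, z23_ids)
n_shared <- sum(drug_ids %in% z23_ids)

# Genes significant for the drug contrast only.
only_drug <- setdiff(drug_ids, z23_ids)
```

If the intersection size disagrees with the bar drawn by UpSetR, the most likely cause is duplicated IDs in one of the inputs.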

3.2.4 Perform gProfiler on drug/strain effect shared genes

Now that we have some populations of genes which are shared across the drug/strain effects, let us pass them to gProfiler and see which categories, if any, are over-represented.

wanted <- drug_z23_shared_genes[["drug:z23"]]
shared_genes_drug_z23 <- attr(drug_z23_shared_genes, "elements")[wanted]
shared_drug_z23_gp <- simple_gprofiler(shared_genes_drug_z23)
shared_drug_z23_gp[["pvalue_plots"]][["MF"]]
## NULL
shared_drug_z23_gp[["pvalue_plots"]][["BP"]]
## NULL
shared_drug_z23_gp[["pvalue_plots"]][["REAC"]]

wanted <- drug_z22_shared_genes[["drug:z22"]]
shared_genes_drug_z22 <- attr(drug_z22_shared_genes, "elements")[wanted]
shared_drug_z22_gp <- simple_gprofiler(shared_genes_drug_z22)
shared_drug_z22_gp[["pvalue_plots"]][["BP"]]
## NULL
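For orientation, the statistic behind this kind of query is a hypergeometric test. The sketch below uses invented numbers and is not what simple_gprofiler() runs internally; gProfiler also applies its own multiple-testing correction on top. I assume a NULL plot above simply means that no category in that source passed the threshold.

```r
# All numbers are hypothetical.
universe_size <- 10000  # genes considered expressed
category_size <- 200    # genes annotated to one category
query_size <- 94        # genes in the shared set
overlap <- 8            # query genes which are in the category

# P(X >= overlap) when drawing query_size genes at random from the universe.
p_enrich <- phyper(overlap - 1, category_size, universe_size - category_size,
                   query_size, lower.tail = FALSE)

# Expected overlap under the null, for comparison with the observed 8.
expected_overlap <- query_size * category_size / universe_size
```

With gene lists this short, only categories with a strong overlap will survive correction, so empty results for some sources are not surprising.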

4 Our main question of interest

The data structure hs_macr contains our primary macrophages, which are, as shown above, the data we can really sink our teeth into.

Note that we expect some errors when running combine_de_tables(), because not all of the methods I use can handle the ratio-of-ratios contrasts added via the extra_contrasts argument. As a result, those peculiar contrasts fail when the results are combined into the larger output tables. This does not stop the rest of the results from being written, however.
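To make the ratio-of-ratios idea concrete, here is a base R sketch of how such a contrast falls out of a cell-means model like "~ 0 + condition". It uses lm.fit() on invented log2 values for a single gene rather than any of the DE methods above, and assumes 'sb' marks the antimony-treated samples.

```r
# Four infected conditions with three invented replicates each.
condition <- factor(rep(c("inf_sb_z22", "inf_sb_z23", "inf_z22", "inf_z23"), each = 3))
log2_expr <- c(6.0, 6.1, 5.9,  7.5, 7.4, 7.6,  4.5, 4.4, 4.6,  5.0, 5.1, 4.9)

# With no intercept, each coefficient is simply the mean of one condition.
design <- model.matrix(~ 0 + condition)
colnames(design) <- levels(condition)
group_means <- lm.fit(design, log2_expr)[["coefficients"]]

# (inf_sb_z23 - inf_z23) - (inf_sb_z22 - inf_z22): the drug effect in z23 vs. in z22.
contrast <- c(inf_sb_z22 = -1, inf_sb_z23 = 1, inf_z22 = 1, inf_z23 = -1)
interaction_lfc <- sum(contrast[names(group_means)] * group_means)
```

A method which only compares pairs of conditions has no place for a four-term contrast like this one, which is presumably why some of the extra tables are reported as missing.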

hs_macr_de_noextra <- all_pairwise(hs_macr, model_svs = "svaseq",
                                   model_fstring = "~ 0 + condition", filter = TRUE)
##    inf_sb_z22    inf_sb_z23       inf_z22       inf_z23    uninf_none 
##            12            11            11            12             4 
## uninf_sb_none 
##             4
## Running normalize_se.
## Removing 9725 low-count genes (11756 remaining).
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## I think this is failing? SummarizedExperiment
## Basic step 0/3: Transforming data.
## Running normalize_se.
## Setting 40036 entries to zero.
## This received a matrix of SVs.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## conditions
##    inf_sb_z22    inf_sb_z23       inf_z22       inf_z23    uninf_none 
##            12            11            11            12             4 
## uninf_sb_none 
##             4
## conditions
##    inf_sb_z22    inf_sb_z23       inf_z22       inf_z23    uninf_none 
##            12            11            11            12             4 
## uninf_sb_none 
##             4
## conditions
##    inf_sb_z22    inf_sb_z23       inf_z22       inf_z23    uninf_none 
##            12            11            11            12             4 
## uninf_sb_none 
##             4

hs_macr_de <- all_pairwise(hs_macr, model_svs = "svaseq", model_fstring = "~ 0 + condition",
                           filter = TRUE, extra_contrasts = tmrc2_human_extra)
##    inf_sb_z22    inf_sb_z23       inf_z22       inf_z23    uninf_none 
##            12            11            11            12             4 
## uninf_sb_none 
##             4
## Running normalize_se.
## Removing 9725 low-count genes (11756 remaining).
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## I think this is failing? SummarizedExperiment
## Basic step 0/3: Transforming data.
## Running normalize_se.
## Setting 40036 entries to zero.
## This received a matrix of SVs.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The contrast z23drugnodrug is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast z23z22drug is not in the results.
## If this is not an extra contrast, then this is an error.
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## conditions
##    inf_sb_z22    inf_sb_z23       inf_z22       inf_z23    uninf_none 
##            12            11            11            12             4 
## uninf_sb_none 
##             4
## conditions
##    inf_sb_z22    inf_sb_z23       inf_z22       inf_z23    uninf_none 
##            12            11            11            12             4 
## uninf_sb_none 
##             4
## conditions
##    inf_sb_z22    inf_sb_z23       inf_z22       inf_z23    uninf_none 
##            12            11            11            12             4 
## uninf_sb_none 
##             4
hs_macr_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 15 comparisons.

Write out the results.

hs_single_table <- combine_de_tables(
  hs_macr_de_noextra, keepers = single_tmrc2_keeper,
  excel = glue("analyses/macrophage_de/de_tables/hs_macr_drug_zymo_z22sb_sb-v{ver}.xlsx"))
hs_single_table
## A set of combined differential expression results.
##                                  table deseq_sigup deseq_sigdown edger_sigup
## 1 uninf_sb_none_vs_inf_sb_z22-inverted          33             0          32
##   edger_sigdown limma_sigup limma_sigdown
## 1             0           2             0
## Only z22sb_vs_sb_up has information, cannot create an UpSet.
## Plot describing unique/shared genes in a differential expression table.
## NULL
hs_macr_table <- combine_de_tables(
  hs_macr_de, keepers = tmrc2_human_keepers,
  excel = glue("analyses/macrophage_de/de_tables/hs_macr_drug_zymo_table_macr_only-v{ver}.xlsx"))
## Warning in extract_keepers(extracted, keepers, table_names, all_coefficients, :
## The table for extra_z2322 using basic does not appear in the pairwise data.
## Warning in extract_keepers(extracted, keepers, table_names, all_coefficients, :
## The table for extra_z2322 using ebseq does not appear in the pairwise data.
## Warning in extract_keepers(extracted, keepers, table_names, all_coefficients, :
## The table for extra_z2322 using noiseq does not appear in the pairwise data.
## Error : subscript contains invalid names
## coefficient limma did not find z22drugnodrug or z23drugnodrug.
## coefficient edger did not find conditionz22drugnodrug or conditionz23drugnodrug.
## coefficient limma did not find z22drugnodrug or z23drugnodrug.
## Warning in extract_keepers(extracted, keepers, table_names, all_coefficients, :
## The table for extra_drugnodrug using basic does not appear in the pairwise
## data.
## Warning in extract_keepers(extracted, keepers, table_names, all_coefficients, :
## The table for extra_drugnodrug using ebseq does not appear in the pairwise
## data.
## Warning in extract_keepers(extracted, keepers, table_names, all_coefficients, :
## The table for extra_drugnodrug using noiseq does not appear in the pairwise
## data.
## Error : subscript contains invalid names
## coefficient limma did not find z23z22nodrug or z23z22drug.
## coefficient edger did not find conditionz23z22nodrug or conditionz23z22drug.
## coefficient limma did not find z23z22nodrug or z23z22drug.
hs_macr_table
## A set of combined differential expression results.
##                                   table deseq_sigup deseq_sigdown edger_sigup
## 1        uninf_none_vs_inf_z23-inverted         478           265         470
## 2        uninf_none_vs_inf_z22-inverted         359             6         340
## 3                    inf_z23_vs_inf_z22         349           539         359
## 4              inf_sb_z23_vs_inf_sb_z22         343           252         339
## 5        inf_z23_vs_inf_sb_z23-inverted         619           828         625
## 6        inf_z22_vs_inf_sb_z22-inverted         505          1040         520
## 7  uninf_sb_none_vs_inf_sb_z23-inverted         461           247         461
## 8  uninf_sb_none_vs_inf_sb_z22-inverted          33             0          32
## 9     uninf_none_vs_inf_sb_z23-inverted         839           923         854
## 10    uninf_none_vs_inf_sb_z22-inverted         660           746         672
## 11          uninf_sb_none_vs_uninf_none         561           748         563
## 12                                FALSE           0             0         329
## 13                                FALSE           0             0         329
##    edger_sigdown limma_sigup limma_sigdown
## 1            270         392           251
## 2              6         264            71
## 3            528         450           390
## 4            253         377           215
## 5            821         571           746
## 6           1009         671           925
## 7            249         374           232
## 8              0           2             0
## 9            906         805           914
## 10           733         555           744
## 11           742         513           696
## 12            63         243           135
## 13            63         243           135
## Plot describing unique/shared genes in a differential expression table.

combined_to_tsv(hs_macr_table, "macrophage")

hs_macr_sig <- extract_significant_genes(
  hs_macr_table,
  excel = glue("analyses/macrophage_de/sig_tables/hs_macr_drug_zymo_sig-v{ver}.xlsx"))
## There is no deseq_logfc column in the table.
## The columns are: ensembl_gene_id, ensembl_transcript_id, version, transcript_version, description, gene_biotype, cds_length, chromosome_name, strand, start_position, end_position, hgnc_symbol, transcript, dream_logfc, dream_adjp, edger_logfc, edger_adjp, limma_logfc, limma_adjp, dream_ave, dream_t, dream_p, dream_b, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_p, limma_b, limma_adjp_fdr, dream_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
## There is no deseq_logfc column in the table.
## The columns are: ensembl_gene_id, ensembl_transcript_id, version, transcript_version, description, gene_biotype, cds_length, chromosome_name, strand, start_position, end_position, hgnc_symbol, transcript, dream_logfc, dream_adjp, edger_logfc, edger_adjp, limma_logfc, limma_adjp, dream_ave, dream_t, dream_p, dream_b, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_p, limma_b, limma_adjp_fdr, dream_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
## There is no ebseq_logfc column in the table.
## The columns are: ensembl_gene_id, ensembl_transcript_id, version, transcript_version, description, gene_biotype, cds_length, chromosome_name, strand, start_position, end_position, hgnc_symbol, transcript, dream_logfc, dream_adjp, edger_logfc, edger_adjp, limma_logfc, limma_adjp, dream_ave, dream_t, dream_p, dream_b, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_p, limma_b, limma_adjp_fdr, dream_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
## There is no basic_logfc column in the table.
## The columns are: ensembl_gene_id, ensembl_transcript_id, version, transcript_version, description, gene_biotype, cds_length, chromosome_name, strand, start_position, end_position, hgnc_symbol, transcript, dream_logfc, dream_adjp, edger_logfc, edger_adjp, limma_logfc, limma_adjp, dream_ave, dream_t, dream_p, dream_b, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_p, limma_b, limma_adjp_fdr, dream_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
hs_macr_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##                    limma_up limma_down edger_up edger_down deseq_up deseq_down
## z23nosb_vs_uninf        392        251      470        270      478        265
## z22nosb_vs_uninf        264         71      340          6      359          6
## z23nosb_vs_z22nosb      450        390      359        528      349        539
## z23sb_vs_z22sb          377        215      339        253      343        252
## z23sb_vs_z23nosb        571        746      625        821      619        828
## z22sb_vs_z22nosb        671        925      520       1009      505       1040
## z23sb_vs_sb             374        232      461        249      461        247
## z22sb_vs_sb               2          0       32          0       33          0
## z23sb_vs_uninf          805        914      854        906      839        923
## z22sb_vs_uninf          555        744      672        733      660        746
## sb_vs_uninf             513        696      563        742      561        748
## extra_z2322             243        135      329         63        0          0
## extra_drugnodrug        243        135      329         63        0          0
##                    ebseq_up ebseq_down basic_up basic_down
## z23nosb_vs_uninf        111        112        0          0
## z22nosb_vs_uninf        160          2        0          0
## z23nosb_vs_z22nosb      257        408      281        259
## z23sb_vs_z22sb          106        108      117         44
## z23sb_vs_z23nosb        412        699      371        540
## z22sb_vs_z22nosb        458        886      437        680
## z23sb_vs_sb              33         58        0          0
## z22sb_vs_sb              25          0        0          0
## z23sb_vs_uninf          280        767      350        489
## z22sb_vs_uninf          444        551      276        396
## sb_vs_uninf             316        495        0          0
## extra_z2322               0          0        0          0
## extra_drugnodrug          0          0        0          0
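
As a reminder to myself of what the two cutoffs in the header mean, here is a minimal sketch of counting up/down genes in a single table. It is not how extract_significant_genes() is implemented; the toy values, the count_sig() helper, and the choice to return zeros when a column is missing are all my own assumptions for illustration.

```r
## Toy table: the column names mimic the *_logfc / *_adjp convention,
## but the values are invented.
toy <- data.frame(
  row.names = paste0("gene", 1:6),
  deseq_logfc = c(2.3, -1.4, 0.7, 1.1, -0.2, -3.0),
  deseq_adjp = c(0.001, 0.03, 0.01, 0.20, 0.50, 0.04))

## Hypothetical helper: count genes passing the |logFC| and adjusted p cutoffs.
count_sig <- function(tbl, fc_column, p_column, lfc = 1, p = 0.05) {
  ## A method without its columns contributes zeros (my choice for this sketch).
  if (!all(c(fc_column, p_column) %in% colnames(tbl))) {
    return(c(up = 0, down = 0))
  }
  passes_p <- !is.na(tbl[[p_column]]) & tbl[[p_column]] <= p
  up <- passes_p & tbl[[fc_column]] >= lfc
  down <- passes_p & tbl[[fc_column]] <= -lfc
  c(up = sum(up, na.rm = TRUE), down = sum(down, na.rm = TRUE))
}

toy_counts <- count_sig(toy, "deseq_logfc", "deseq_adjp")
toy_missing <- count_sig(toy, "ebseq_logfc", "ebseq_adjp")
toy_counts
```

Passing lfc = 0.6 to the same helper relaxes only the fold-change requirement; the adjusted p-value cutoff stays at 0.05.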

hs_macr_highsig <- extract_significant_genes(
  hs_macr_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/sig_tables/hs_macr_drug_zymo_highsig-v{ver}.xlsx"))
## Warning in get_sig_genes(this_table, lfc = lfc, p = p, z = z, n = n, column =
## this_fc_column, : The column deseq_basemean does not appears to be in the
## table, cannot filter by expression.
## There is no deseq_logfc column in the table.
## The columns are: ensembl_gene_id, ensembl_transcript_id, version, transcript_version, description, gene_biotype, cds_length, chromosome_name, strand, start_position, end_position, hgnc_symbol, transcript, dream_logfc, dream_adjp, edger_logfc, edger_adjp, limma_logfc, limma_adjp, dream_ave, dream_t, dream_p, dream_b, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_p, limma_b, limma_adjp_fdr, dream_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
## There is no ebseq_logfc column in the table.
## The columns are: ensembl_gene_id, ensembl_transcript_id, version, transcript_version, description, gene_biotype, cds_length, chromosome_name, strand, start_position, end_position, hgnc_symbol, transcript, dream_logfc, dream_adjp, edger_logfc, edger_adjp, limma_logfc, limma_adjp, dream_ave, dream_t, dream_p, dream_b, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_p, limma_b, limma_adjp_fdr, dream_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
## There is no basic_logfc column in the table.
## The columns are: ensembl_gene_id, ensembl_transcript_id, version, transcript_version, description, gene_biotype, cds_length, chromosome_name, strand, start_position, end_position, hgnc_symbol, transcript, dream_logfc, dream_adjp, edger_logfc, edger_adjp, limma_logfc, limma_adjp, dream_ave, dream_t, dream_p, dream_b, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_p, limma_b, limma_adjp_fdr, dream_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
hs_macr_highsig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##                    limma_up limma_down edger_up edger_down deseq_up deseq_down
## z23nosb_vs_uninf        269        139      317        139      314        138
## z22nosb_vs_uninf        103          4      110          0      115          0
## z23nosb_vs_z22nosb      221        154      247        174      238        178
## z23sb_vs_z22sb          211        105      210         86      211         84
## z23sb_vs_z23nosb        305        482      306        566      303        570
## z22sb_vs_z22nosb        330        545      301        572      288        598
## z23sb_vs_sb             250        130      278        140      274        140
## z22sb_vs_sb               2          0        9          0       13          0
## z23sb_vs_uninf          499        603      491        605      482        618
## z22sb_vs_uninf          310        479      318        501      303        513
## sb_vs_uninf             291        459      294        495      291        498
## extra_z2322             243        135      329         63        0          0
## extra_drugnodrug        243        135      329         63        0          0
##                    ebseq_up ebseq_down basic_up basic_down
## z23nosb_vs_uninf         87         64        0          0
## z22nosb_vs_uninf         41          0        0          0
## z23nosb_vs_z22nosb      207        140      214        169
## z23sb_vs_z22sb           80         37       99         35
## z23sb_vs_z23nosb        212        529      293        465
## z22sb_vs_z22nosb        276        519      326        540
## z23sb_vs_sb              21         28        0          0
## z22sb_vs_sb               5          0        0          0
## z23sb_vs_uninf          177        550      295        401
## z22sb_vs_uninf          235        393      216        321
## sb_vs_uninf             191        352        0          0
## extra_z2322               0          0        0          0
## extra_drugnodrug          0          0        0          0

hs_macr_lesssig <- extract_significant_genes(
  hs_macr_table, lfc = 0.6,
  excel = glue("analyses/macrophage_de/sig_tables/hs_macr_drug_zymo_sig_lfc0.6-v{ver}.xlsx"))
## There is no deseq_logfc column in the table.
## The columns are: ensembl_gene_id, ensembl_transcript_id, version, transcript_version, description, gene_biotype, cds_length, chromosome_name, strand, start_position, end_position, hgnc_symbol, transcript, dream_logfc, dream_adjp, edger_logfc, edger_adjp, limma_logfc, limma_adjp, dream_ave, dream_t, dream_p, dream_b, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_p, limma_b, limma_adjp_fdr, dream_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
## There is no ebseq_logfc column in the table.
## The columns are: ensembl_gene_id, ensembl_transcript_id, version, transcript_version, description, gene_biotype, cds_length, chromosome_name, strand, start_position, end_position, hgnc_symbol, transcript, dream_logfc, dream_adjp, edger_logfc, edger_adjp, limma_logfc, limma_adjp, dream_ave, dream_t, dream_p, dream_b, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_p, limma_b, limma_adjp_fdr, dream_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
## There is no basic_logfc column in the table.
## The columns are: ensembl_gene_id, ensembl_transcript_id, version, transcript_version, description, gene_biotype, cds_length, chromosome_name, strand, start_position, end_position, hgnc_symbol, transcript, dream_logfc, dream_adjp, edger_logfc, edger_adjp, limma_logfc, limma_adjp, dream_ave, dream_t, dream_p, dream_b, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_p, limma_b, limma_adjp_fdr, dream_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
hs_macr_lesssig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 0.6 adj P cutoff: 0.05
##                    limma_up limma_down edger_up edger_down deseq_up deseq_down
## z23nosb_vs_uninf        701        587      856        594      865        587
## z22nosb_vs_uninf        378        128      464         21      505         22
## z23nosb_vs_z22nosb      867        786      746        964      728        981
## z23sb_vs_z22sb          670        587      649        645      655        641
## z23sb_vs_z23nosb       1237       1395     1297       1536     1279       1545
## z22sb_vs_z22nosb       1492       1542     1287       1692     1211       1747
## z23sb_vs_sb             622        643      761        610      772        614
## z22sb_vs_sb               2          0       33          0       34          0
## z23sb_vs_uninf         1516       1656     1595       1671     1557       1700
## z22sb_vs_uninf         1127       1297     1246       1293     1222       1340
## sb_vs_uninf            1037       1148     1055       1262     1042       1291
## extra_z2322             381        288      482        210        0          0
## extra_drugnodrug        381        288      482        210        0          0
##                    ebseq_up ebseq_down basic_up basic_down
## z23nosb_vs_uninf        141        196        0          0
## z22nosb_vs_uninf        186          4        0          0
## z23nosb_vs_z22nosb      463        602      593        580
## z23sb_vs_z22sb          144        237      193        144
## z23sb_vs_z23nosb        774       1166     1016       1047
## z22sb_vs_z22nosb       1044       1349     1228       1264
## z23sb_vs_sb              39        115        0          0
## z22sb_vs_sb              30          0        0          0
## z23sb_vs_uninf          458       1259      665        812
## z22sb_vs_uninf          746        907      535        686
## sb_vs_uninf             495        788        0          0
## extra_z2322               0          0        0          0
## extra_drugnodrug          0          0        0          0
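
For reference, the two LFC cutoffs used in these tables translate to linear fold changes as follows. This assumes the logFC columns are on the log2 scale, which I believe is the case for all of the methods here.

```r
## Convert the log2 fold-change cutoffs into linear fold changes.
lfc_cutoffs <- c(strict = 1, relaxed = 0.6)
fold_changes <- 2^lfc_cutoffs
round(fold_changes, 2)
```

Thus the relaxed cutoff asks for roughly a 1.5-fold change in either direction rather than a doubling or halving.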

4.1 Gene group upset

4.1.1 2.3 vs 2.2 up and down vs. uninfected

This is my version of the Venn diagram which includes the text:

“Differentially expressed genes in macrophages infected with subpopulations 2.2 or 2.3. Volcano plots contrast of: A. Venn diagram for upregulated and downregulated genes by infection with 2.3 and 2.2 strains. B. infected cells with 2.3 strains and uninfected cells; C. infected cells with 2.2 strains and uninfected cells; D. infected cells with 2.3 strains and infected cells with 2.2 strains”

The following upset plot is currently Figure 2E.

nodrug_upset <- upsetr_combined_de(hs_macr_table,
                                   desired_contrasts = c("z22nosb_vs_uninf", "z23nosb_vs_uninf"))
pp(file = "images/nodrug_upset.svg")
nodrug_upset[["plot"]]
dev.off()
## png 
##   2
nodrug_upset
## Plot describing unique/shared genes in a differential expression table.

4.1.1.1 A point of interest while Olga visits Umd

Najib and Olga asked about pulling the 9 gene IDs which are in the peculiar situation of increased expression in z2.2/uninf and decreased expression in z2.3/uninf. In the previous upset plot, these are visible in the 6th bar. They are accessible via the attr() function, which I admit I can never remember how to use, so I am reusing the code under the ‘Compare(no)Sb z2.3/z2.2 treatment’ heading to extract these genes.

all_groups <- nodrug_upset[["groups"]]
wanted_group <- "z23nosb_vs_uninf_down:z22nosb_vs_uninf_up"
gene_idx <- all_groups[[wanted_group]]
wanted_genes <- attr(all_groups, "elements")[gene_idx]
wanted_genes
## [1] "ENSG00000004846" "ENSG00000111783" "ENSG00000118298" "ENSG00000120738"
## [5] "ENSG00000126217" "ENSG00000163687" "ENSG00000170345" "ENSG00000244242"
## [9] "ENSG00000277481"
gene_symbol_idx <- rownames(rowData(hs_macr)) %in% as.character(wanted_genes)
rowData(hs_macr)[gene_symbol_idx, "hgnc_symbol"]
## [1] "ABCB5"    "RFX4"     "CA14"     "EGR1"     "MCF2L"    "DNASE1L3" "FOS"     
## [8] "IFITM10"  "PKD1L3"
  • ABCB5: ATP Binding Cassette Subfamily B Member #5: member of a diverse paralogous family with a wide range of functions; participates in ATP-dependent transmembrane transport. Associated with skin diseases (melanoma and epidermolysis bullosa).
  • RFX4: Regulatory Factor X #4: transcription factor.
  • CA14: Carbonic anhydrase #14: Zinc metalloenzyme which catalyzes the reversible hydration of CO2. This gene looks pretty neat, but not really relevant to anything we are likely to care about.
  • EGR1: Early Growth Response Protein #1: Another Tx factor (zinc-finger) – important for cell survival/proliferation/cell death. Presumably important for healing?
  • MCF2L: MCF.2 Cell Line Derived Transforming Sequence Like: guanine nucleotide exchange factor interacting with GTP-bound Rac1. Apparently associated with osteoarthritis; potentially relevant to regulation of RHOA and CDC42 signalling.
  • DNASE1L3: Deoxyribonuclease I family member: not inhibited by actin, breaks down DNA during apoptosis. Important during necrosis.
  • FOS: Proto-Oncogene, AP-1 Transcription Factor: leucine zipper dimerizes with JUN family proteins, forming tx factor complex AP-1. Important for cell proliferation, differentiation, and transformation.
  • IFITM10: Interferon-Induced Transmembrane Protein #10
  • PKD1L3: Polycystin 1 Like #3, Transient Receptor Potential Channel Interacting: 11 transmembrane domain protein which might help create cation channels.
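
Since I never remember the attr() incantation, here is a minimal base R sketch of the same idea using plain set operations. The gene IDs, symbols, and the sig_sets/annot objects are invented for illustration; this is not what upsetr_combined_de() does internally. With only two contrasts, a gene cannot be both up and down in the same contrast, so a simple intersect() of the two sets is the same as the exclusive upset group.

```r
## Invented gene IDs: each element is the set of genes significant
## in one direction of one contrast.
sig_sets <- list(
  z23nosb_vs_uninf_up = c("g1", "g2", "g3"),
  z23nosb_vs_uninf_down = c("g4", "g5", "g6"),
  z22nosb_vs_uninf_up = c("g2", "g5", "g6", "g7"),
  z22nosb_vs_uninf_down = c("g8"))

## Genes decreased in z2.3/uninf and increased in z2.2/uninf.
discordant <- intersect(sig_sets[["z23nosb_vs_uninf_down"]],
                        sig_sets[["z22nosb_vs_uninf_up"]])

## Toy annotation table standing in for rowData(); match() returns the
## symbols in the same order as the IDs, whereas %in% returns row order.
annot <- data.frame(row.names = paste0("g", 1:8),
                    hgnc_symbol = paste0("SYM", 1:8))
discordant_symbols <- annot[match(discordant, rownames(annot)), "hgnc_symbol"]
discordant_symbols
```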

As some comparison points, the Venn in the current figure has:

  • 387 up z2.3
  • 259 up z2.2
  • 83 shared up z2.3 and z2.2
  • 247 down z2.3
  • 3 down z2.2
  • 3 shared down z2.3 and z2.2

4.1.2 2.2 and 2.3 with SbV vs 2.2 and 2.3 without SbV

This is my version of the Venn with the text:

“Differentially expressed genes in macrophages infected with subpopulations 2.2 or 2.3, in presence of SbV. Volcano plots contrast of: A. infected cells with 2.3 strains + SbV and infected cells with 2.3 strains; B. infected cells with 2.2 strains + SbV and infected cells with 2.2 strains; C. infected cells with 2.3 strains + SbV and infected cells with 2.2 strains + SbV. D. Venn diagram for upregulated and downregulated genes by infection with 2.3+SbV and 2.2+SbV strains.”

A query from Olga (20240801): Please include in the upset in figure 3 the contrast of uninfected cells + SbV vs uninfected without SbV.

## I keep mis-interpreting this text; the contrasts are z2.3SbV/z2.3 and z2.2SbV/z2.2
drugnodrug_upset <- upsetr_combined_de(hs_macr_table,
                                       desired_contrasts = c("z23sb_vs_z23nosb", "z22sb_vs_z22nosb"))
pp(file = "images/drugnodrug_upset.pdf")
drugnodrug_upset[["plot"]]
dev.off()
## png 
##   2
drugnodrug_upset
## Plot describing unique/shared genes in a differential expression table.

drugnodrug_uninf_contrasts <- c("z23sb_vs_z23nosb", "z22sb_vs_z22nosb", "sb_vs_uninf")
drugnodrug_upset_with_uninf <- upsetr_combined_de(hs_macr_table,
                                       desired_contrasts = drugnodrug_uninf_contrasts)
pp(file = "figures/drugnodrug_with_uninf_upset.svg")
drugnodrug_upset_with_uninf[["plot"]]
dev.off()
## png 
##   2
drugnodrug_upset_with_uninf
## Plot describing unique/shared genes in a differential expression table.

For some comparison points, the Venn image has:

  • 222 up z2.3 SbV
  • 134 up z2.2 SbV
  • 182 down z2.3 SbV
  • 396 down z2.2 SbV
  • 605 shared down z2.2 and z2.3 SbV
  • 34 shared down z2.2 SbV and up z2.3 SbV
  • 363 shared up z2.2 SbV and z2.3 SbV

4.1.3 Compare z2.2SbV vs SbV and z2.3SbV vs SbV

drug_upset <- upsetr_combined_de(hs_macr_table,
                                 desired_contrasts = c("z22sb_vs_sb", "z23sb_vs_sb"))
pp(file = "images/drug_upset.pdf")
drug_upset[["plot"]]
dev.off()
## png 
##   2
drug_upset
## Plot describing unique/shared genes in a differential expression table.

4.2 Significance barplot of interest

Olga kindly sent a set of particularly interesting contrasts and colors for a significance barplot. They include the following:

  • z2.3 vs uninfected
  • z2.2 vs uninfected
  • z2.3 vs z2.2
  • z2.3SbV vs z2.3
  • z2.2SbV vs z2.2
  • z2.3SbV vs z2.2SbV
  • SbV vs uninfected

The set of ‘keepers’ restricted to these contrasts is taken from the extant ‘tmrc2_human_keepers’ and is as follows:

barplot_keepers <- list(
  ## z2.3 vs uninfected
  "z23nosb_vs_uninf" = c("inf_z23", "uninf_none"),
  ## z2.2 vs uninfected
  "z22nosb_vs_uninf" = c("inf_z22", "uninf_none"),
  ## z2.3 vs z2.2
  "z23nosb_vs_z22nosb" = c("inf_z23", "inf_z22"),
  ## z2.3SbV vs z2.3
  "z23sb_vs_z23nosb" = c("inf_sb_z23", "inf_z23"),
  ## z2.2SbV vs z2.2
  "z22sb_vs_z22nosb" = c("inf_sb_z22", "inf_z22"),
  ## z2.3SbV vs z2.2SbV
  "z23sb_vs_z22sb" = c("inf_sb_z23", "inf_sb_z22"),
  ## SbV vs uninfected
  "sb_vs_uninf" = c("uninf_sb_none", "uninf_none"))
barplot_combined <- combine_de_tables(
  hs_macr_de, keepers = barplot_keepers,
  excel = glue("analyses/macrophage_de/de_tables/hs_macr_drug_zymo_7contrasts-v{ver}.xlsx"))
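
To make the shape of a keepers list explicit: each element is named for the contrast and holds a pair of condition names, which I read as numerator first and denominator second. The following sketch just tabulates a keepers list and checks that every condition named in it actually exists. The describe_keepers() helper and the toy list are hypothetical; the known condition names are copied from the sample counts printed elsewhere in this document.

```r
## A two-element excerpt in the same shape as the keepers lists used here.
toy_keepers <- list(
  "z23nosb_vs_uninf" = c("inf_z23", "uninf_none"),
  "sb_vs_uninf" = c("uninf_sb_none", "uninf_none"))

## Condition names as they appear in the experimental design.
known_conditions <- c("inf_sb_z22", "inf_sb_z23", "inf_z22", "inf_z23",
                      "uninf_none", "uninf_sb_none")

## Hypothetical helper: one row per contrast with its numerator/denominator.
describe_keepers <- function(keepers) {
  data.frame(
    contrast = names(keepers),
    numerator = vapply(keepers, function(x) x[[1]], character(1)),
    denominator = vapply(keepers, function(x) x[[2]], character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE)
}

keeper_summary <- describe_keepers(toy_keepers)
## TRUE when no keeper refers to a condition which does not exist.
keepers_valid <- all(unlist(toy_keepers) %in% known_conditions)
keeper_summary
```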

Now let us use the colors suggested by Olga to make a barplot of these…

color_list <- c("#de8bf9", "#ad07e3", "#410257", "#ffa0a0", "#f94040", "#a00000")
barplot_sig <- extract_significant_genes(
  barplot_combined, color_list = color_list, according_to = "deseq",
  excel = glue("analyses/macrophage_de/sig_tables/hs_macr_drug_zymo_7contrasts_sig-v{ver}.xlsx"))
barplot_sig
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##                    deseq_up deseq_down
## z23nosb_vs_uninf        478        265
## z22nosb_vs_uninf        359          6
## z23nosb_vs_z22nosb      349        539
## z23sb_vs_z23nosb        619        828
## z22sb_vs_z22nosb        505       1040
## z23sb_vs_z22sb          343        252
## sb_vs_uninf             561        748
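
If one only wants a quick look at these numbers without the Excel output, a plain base R barplot of the deseq counts printed above is sufficient. This is a sketch and not the plot written by extract_significant_genes(); the two colors are borrowed from Olga's list, and which one marks up vs. down here is my arbitrary choice.

```r
## The deseq up/down counts, copied from the printed summary.
deseq_counts <- rbind(
  up = c(478, 359, 349, 619, 505, 343, 561),
  down = c(265, 6, 539, 828, 1040, 252, 748))
colnames(deseq_counts) <- c(
  "z23nosb_vs_uninf", "z22nosb_vs_uninf", "z23nosb_vs_z22nosb",
  "z23sb_vs_z23nosb", "z22sb_vs_z22nosb", "z23sb_vs_z22sb", "sb_vs_uninf")

## Side-by-side bars per contrast; las = 2 rotates the long contrast names.
bar_positions <- barplot(
  deseq_counts, beside = TRUE, col = c("#ad07e3", "#f94040"),
  las = 2, cex.names = 0.7, legend.text = TRUE,
  ylab = "Significant genes (deseq)")
```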

5 PROPER

In our last meeting there were some questions about the statistical power of different future experimental designs. One thing I can do is to use PROPER to estimate the power of an extant dataset and infer from that the likely power of other designs.

In order to use PROPER, one must feed it one or more DE tables.

power_estimate <- simple_proper(hs_single_table)
## Error in if (all_coverage < cutoff) {: missing value where TRUE/FALSE needed
power_estimate[[1]][["power_plot"]]
## Error: object 'power_estimate' not found
power_estimate[[1]][["powertd_plot"]]
## Error: object 'power_estimate' not found
power_estimate[[1]][["powerfd_plot"]]
## Error: object 'power_estimate' not found
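
Until the call above is working, the general shape of the question (how does power change with the number of replicates?) can be illustrated with a much cruder stand-in. To be clear, this is not PROPER, which simulates RNA-seq count data; it is a plain two-sample t-test power calculation via stats::power.t.test(), and the effect size (delta = 1 on the log2 scale) and standard deviation (0.75) are invented for illustration.

```r
## Power of a two-sample t-test for one gene at several replicate counts.
## delta and sd are assumptions, not estimates from the TMRC2 data.
replicates <- c(3, 5, 8, 12)
toy_power <- vapply(replicates, function(n) {
  power.t.test(n = n, delta = 1, sd = 0.75, sig.level = 0.05)$power
}, numeric(1))
names(toy_power) <- replicates
round(toy_power, 2)
```

The absolute values mean little here since there is no multiple-testing correction or count model involved; the useful part is only the trend across replicate counts.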

6 Our main questions in U937

Let us do the same comparisons in the U937 samples, though I will not do the extra contrasts, primarily because I think the dataset is less likely to support them.

u937_de <- all_pairwise(u937_se, model_svs = "svaseq",
                        filter = TRUE, model_fstring = "~ 0 + condition")
##    inf_sb_z22    inf_sb_z23       inf_z22       inf_z23    uninf_none 
##             3             3             3             3             1 
## uninf_sb_none 
##             1
## Running normalize_se.
## Removing 10730 low-count genes (10751 remaining).
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## I think this is failing? SummarizedExperiment
## Basic step 0/3: Transforming data.
## Running normalize_se.
## Setting 938 entries to zero.
## This received a matrix of SVs.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## conditions
##    inf_sb_z22    inf_sb_z23       inf_z22       inf_z23    uninf_none 
##             3             3             3             3             1 
## uninf_sb_none 
##             1
## conditions
##    inf_sb_z22    inf_sb_z23       inf_z22       inf_z23    uninf_none 
##             3             3             3             3             1 
## uninf_sb_none 
##             1
## conditions
##    inf_sb_z22    inf_sb_z23       inf_z22       inf_z23    uninf_none 
##             3             3             3             3             1 
## uninf_sb_none 
##             1 
## Error in NOISeq::noiseqbio(norm_input, k = k, norm = norm, factor = condition_column,  : 
##   ERROR: To run NOISeqBIO at least two replicates per condition are needed.
##          Please, run NOISeq if there are not enough replicates in your experiment.
u937_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 15 comparisons.
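
The NOISeqBIO error above follows directly from the design: both uninfected conditions have a single sample. A quick check of the replicate counts (copied from the printed table of conditions) makes this explicit and also accounts for the 15 comparisons. The object names are my own.

```r
## Samples per condition, copied from the printed design.
condition_counts <- c(inf_sb_z22 = 3, inf_sb_z23 = 3, inf_z22 = 3,
                      inf_z23 = 3, uninf_none = 1, uninf_sb_none = 1)

## Rebuild a per-sample condition vector and tabulate it, as one would
## with the real metadata column.
conditions <- rep(names(condition_counts), times = condition_counts)
replicate_table <- table(conditions)

## Conditions with fewer than two replicates.
unreplicated <- names(replicate_table)[replicate_table < 2]
unreplicated

## All pairwise comparisons among six conditions.
n_comparisons <- choose(length(condition_counts), 2)
n_comparisons
```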
u937_table <- combine_de_tables(
  u937_de, keepers = u937_keepers,
  excel = glue("analyses/macrophage_de/de_tables/u937_drug_zymo_table-v{ver}.xlsx"))
u937_table
## A set of combined differential expression results.
##                                   table deseq_sigup deseq_sigdown edger_sigup
## 1        uninf_none_vs_inf_z23-inverted           0             5           2
## 2        uninf_none_vs_inf_z22-inverted           0             0           0
## 3                    inf_z23_vs_inf_z22           1             0          17
## 4              inf_sb_z23_vs_inf_sb_z22           0             0           0
## 5        inf_z23_vs_inf_sb_z23-inverted         256           171         311
## 6        inf_z22_vs_inf_sb_z22-inverted         298           154         305
## 7  uninf_sb_none_vs_inf_sb_z23-inverted           0             0           2
## 8  uninf_sb_none_vs_inf_sb_z22-inverted           0             0           2
## 9     uninf_none_vs_inf_sb_z23-inverted         296           151         306
## 10    uninf_none_vs_inf_sb_z22-inverted         294           169         300
## 11          uninf_sb_none_vs_uninf_none         239           119         261
##    edger_sigdown limma_sigup limma_sigdown
## 1              5           0             3
## 2              5           0             3
## 3              6           3             3
## 4              1           0             2
## 5            176         221           192
## 6            149         220           190
## 7              0           0             0
## 8              5           1             3
## 9            155         233           181
## 10           175         227           210
## 11           127         192           154
## Plot describing unique/shared genes in a differential expression table.
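
Several of the tables above carry an ‘-inverted’ suffix. My understanding is that the pairwise contrast was computed in the opposite direction from the one requested in the keepers, so the log fold changes are flipped in sign while the p-values are left alone. A toy sketch of that operation follows; the invert_contrast() helper, column names, and values are all invented and this is not the combine_de_tables() code.

```r
## Toy result for the contrast A_vs_B.
toy <- data.frame(
  row.names = c("geneA", "geneB", "geneC"),
  logfc = c(1.8, -2.2, 0.1),
  adjp = c(0.01, 0.02, 0.90))

## Hypothetical helper: turn A_vs_B into B_vs_A by negating the logFC.
invert_contrast <- function(tbl, fc_column = "logfc") {
  tbl[[fc_column]] <- -1 * tbl[[fc_column]]
  tbl
}

inverted <- invert_contrast(toy)
inverted
```

After inversion, genes which were up become down and vice versa, which is why the up and down counts trade places between a table and its inverse.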

combined_to_tsv(u937_table, celltype = "u937")

u937_sig <- extract_significant_genes(
  u937_table,
  excel = glue("analyses/macrophage_de/sig_tables/u937_drug_zymo_sig-v{ver}.xlsx"))
u937_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##                    limma_up limma_down edger_up edger_down deseq_up deseq_down
## z23nosb_vs_uninf          0          3        2          5        0          5
## z22nosb_vs_uninf          0          3        0          5        0          0
## z23nosb_vs_z22nosb        3          3       17          6        1          0
## z23sb_vs_z22sb            0          2        0          1        0          0
## z23sb_vs_z23nosb        221        192      311        176      256        171
## z22sb_vs_z22nosb        220        190      305        149      298        154
## z23sb_vs_sb               0          0        2          0        0          0
## z22sb_vs_sb               1          3        2          5        0          0
## z23sb_vs_uninf          233        181      306        155      296        151
## z22sb_vs_uninf          227        210      300        175      294        169
## sb_vs_uninf             192        154      261        127      239        119
##                    ebseq_up ebseq_down basic_up basic_down
## z23nosb_vs_uninf          5         14        0          0
## z22nosb_vs_uninf          0          7        0          0
## z23nosb_vs_z22nosb        8         42        0          0
## z23sb_vs_z22sb            0          0        0          0
## z23sb_vs_z23nosb        328        179        0          0
## z22sb_vs_z22nosb        279        150        0          0
## z23sb_vs_sb               5          4        0          0
## z22sb_vs_sb               7          6        0          0
## z23sb_vs_uninf          267        122        0          0
## z22sb_vs_uninf          226        163        0          0
## sb_vs_uninf             152        175        0          0

u937_highsig <- extract_significant_genes(
  u937_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/sig_tables/u937_drug_zymo_highsig-v{ver}.xlsx"))
u937_highsig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##                    limma_up limma_down edger_up edger_down deseq_up deseq_down
## z23nosb_vs_uninf          0          3        0          4        0          4
## z22nosb_vs_uninf          0          2        0          4        0          0
## z23nosb_vs_z22nosb        2          3        6          4        1          0
## z23sb_vs_z22sb            0          0        0          0        0          0
## z23sb_vs_z23nosb        149        125      174        116      160        120
## z22sb_vs_z22nosb        130        111      152        104      149        107
## z23sb_vs_sb               0          0        0          0        0          0
## z22sb_vs_sb               0          1        0          1        0          0
## z23sb_vs_uninf          145         99      155         97      154         96
## z22sb_vs_uninf          143        119      155        115      155        116
## sb_vs_uninf             126         91      137         89      136         89
##                    ebseq_up ebseq_down basic_up basic_down
## z23nosb_vs_uninf          2          4        0          0
## z22nosb_vs_uninf          0          2        0          0
## z23nosb_vs_z22nosb        0         25        0          0
## z23sb_vs_z22sb            0          0        0          0
## z23sb_vs_z23nosb        182        119        0          0
## z22sb_vs_z22nosb        139        103        0          0
## z23sb_vs_sb               0          0        0          0
## z22sb_vs_sb               0          0        0          0
## z23sb_vs_uninf          161         83        0          0
## z22sb_vs_uninf          136        112        0          0
## sb_vs_uninf              89        139        0          0

u937_lesssig <- extract_significant_genes(
  u937_table, lfc = 0.6,
  excel = glue("analyses/macrophage_de/sig_tables/u937_drug_zymo_lesssig-v{ver}.xlsx"))
u937_lesssig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 0.6 adj P cutoff: 0.05
##                    limma_up limma_down edger_up edger_down deseq_up deseq_down
## z23nosb_vs_uninf          1          5        6         13        2          8
## z22nosb_vs_uninf          0          7        0         15        0          2
## z23nosb_vs_z22nosb       17         10       40         23        2          2
## z23sb_vs_z22sb            1          3        7          3        1          0
## z23sb_vs_z23nosb        478        433      627        416      499        421
## z22sb_vs_z22nosb        568        506      739        451      678        467
## z23sb_vs_sb               0          1        4          3        0          2
## z22sb_vs_sb               1         16        7         26        2         11
## z23sb_vs_uninf          487        472      600        439      566        409
## z22sb_vs_uninf          517        568      641        522      596        500
## sb_vs_uninf             430        400      535        373      466        345
##                    ebseq_up ebseq_down basic_up basic_down
## z23nosb_vs_uninf         14        113        0          0
## z22nosb_vs_uninf          0         27        0          0
## z23nosb_vs_z22nosb       78        427        0          0
## z23sb_vs_z22sb            5          1        0          0
## z23sb_vs_z23nosb        765        656        0          0
## z22sb_vs_z22nosb        582        442        0          0
## z23sb_vs_sb              20         10        0          0
## z22sb_vs_sb              26         18        0          0
## z23sb_vs_uninf          488        332        0          0
## z22sb_vs_uninf          406        488        0          0
## sb_vs_uninf             160        209        0          0

7 Compare (no)Sb z2.3/z2.2 treatments among macrophages

In the following block, I jump back to the macrophage samples and look for genes which are shared between, or unique to, the z2.3/z2.2 comparison in the drug-treated samples and the same comparison in the untreated samples.

upset_plots_hs_macr <- upsetr_sig(
  hs_macr_sig, both = TRUE,
  contrasts = c("z23sb_vs_z22sb", "z23nosb_vs_z22nosb"))
upset_plots_hs_macr[["both"]]
## [1] TRUE
groups <- upset_plots_hs_macr[["both_groups"]]
shared_genes <- attr(groups, "elements")[groups[[2]]] %>%
  gsub(pattern = "^gene:", replacement = "")
length(shared_genes)
## [1] 387
shared_gp <- simple_gprofiler(shared_genes)
shared_gp[["pvalue_plots"]][["MF"]]
## NULL
shared_gp[["pvalue_plots"]][["BP"]]
## NULL
shared_gp[["pvalue_plots"]][["REAC"]]

drug_genes <- attr(groups, "elements")[groups[["z23sb_vs_z22sb"]]] %>%
  gsub(pattern = "^gene:", replacement = "")
drugonly_gp <- simple_gprofiler(drug_genes)
drugonly_gp[["pvalue_plots"]][["BP"]]
## NULL
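
The shared/unique bookkeeping performed by the upset groups can be illustrated with plain set operations. The following is a minimal sketch in base R, assuming two made-up gene sets named after the contrasts above; the gene IDs and the strip_prefix() helper are hypothetical and are not part of the analysis.

```r
# Toy significant-gene sets for the two contrasts (hypothetical gene IDs).
sig_sets <- list(
  z23sb_vs_z22sb = c("gene:ENSG01", "gene:ENSG02", "gene:ENSG03"),
  z23nosb_vs_z22nosb = c("gene:ENSG02", "gene:ENSG03", "gene:ENSG04", "gene:ENSG05"))

# Genes significant in both contrasts.
shared <- intersect(sig_sets[["z23sb_vs_z22sb"]], sig_sets[["z23nosb_vs_z22nosb"]])
# Genes significant only with, or only without, the drug.
drug_only <- setdiff(sig_sets[["z23sb_vs_z22sb"]], sig_sets[["z23nosb_vs_z22nosb"]])
nodrug_only <- setdiff(sig_sets[["z23nosb_vs_z22nosb"]], sig_sets[["z23sb_vs_z22sb"]])

# Strip the "gene:" prefix before handing the IDs to an enrichment tool.
strip_prefix <- function(ids) gsub(pattern = "^gene:", replacement = "", x = ids)
shared_ids <- strip_prefix(shared)
drug_only_ids <- strip_prefix(drug_only)
nodrug_only_ids <- strip_prefix(nodrug_only)
```

Selecting a group by name rather than by position (as with groups[["z23sb_vs_z22sb"]] above) is less fragile if the order of the groups ever changes.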

I want to try something: include the U937 data directly in this comparison. In the following block I repeat the same logic, prefixing the macrophage and U937 contrast names so that both sets of DESeq2 results fit in one upset plot.

both_sig <- hs_macr_sig
names(both_sig[["deseq"]][["ups"]]) <- paste0("macr_", names(both_sig[["deseq"]][["ups"]]))
names(both_sig[["deseq"]][["downs"]]) <- paste0("macr_", names(both_sig[["deseq"]][["downs"]]))
u937_deseq <- u937_sig[["deseq"]]
names(u937_deseq[["ups"]]) <- paste0("u937_", names(u937_deseq[["ups"]]))
names(u937_deseq[["downs"]]) <- paste0("u937_", names(u937_deseq[["downs"]]))
both_sig[["deseq"]][["ups"]] <- c(both_sig[["deseq"]][["ups"]], u937_deseq[["ups"]])
both_sig[["deseq"]][["downs"]] <- c(both_sig[["deseq"]][["downs"]], u937_deseq[["downs"]])
summary(both_sig[["deseq"]][["ups"]])
##                         Length Class      Mode
## macr_z23nosb_vs_uninf   73     DFrame     S4  
## macr_z22nosb_vs_uninf   73     DFrame     S4  
## macr_z23nosb_vs_z22nosb 73     DFrame     S4  
## macr_z23sb_vs_z22sb     73     DFrame     S4  
## macr_z23sb_vs_z23nosb   73     DFrame     S4  
## macr_z22sb_vs_z22nosb   73     DFrame     S4  
## macr_z23sb_vs_sb        73     DFrame     S4  
## macr_z22sb_vs_sb        73     DFrame     S4  
## macr_z23sb_vs_uninf     73     DFrame     S4  
## macr_z22sb_vs_uninf     73     DFrame     S4  
## macr_sb_vs_uninf        73     DFrame     S4  
## macr_extra_z2322         0     data.frame list
## macr_extra_drugnodrug    0     data.frame list
## u937_z23nosb_vs_uninf   64     DFrame     S4  
## u937_z22nosb_vs_uninf   64     DFrame     S4  
## u937_z23nosb_vs_z22nosb 64     DFrame     S4  
## u937_z23sb_vs_z22sb     64     DFrame     S4  
## u937_z23sb_vs_z23nosb   64     DFrame     S4  
## u937_z22sb_vs_z22nosb   64     DFrame     S4  
## u937_z23sb_vs_sb        64     DFrame     S4  
## u937_z22sb_vs_sb        64     DFrame     S4  
## u937_z23sb_vs_uninf     64     DFrame     S4  
## u937_z22sb_vs_uninf     64     DFrame     S4  
## u937_sb_vs_uninf        64     DFrame     S4
upset_plots_both <- upsetr_sig(
  both_sig, both = TRUE,
  contrasts = c("macr_z23sb_vs_z22sb", "macr_z23nosb_vs_z22nosb",
                "u937_z23sb_vs_z22sb", "u937_z23nosb_vs_z22nosb"))
upset_plots_both[["both"]]
## [1] TRUE
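
The prefix-and-concatenate step can also be written as a loop over the two directions, which makes it harder to cross the ups and downs by accident. This is only a sketch on toy lists; prefix_names() is a hypothetical helper, and the single-letter values stand in for the real per-contrast tables.

```r
# Hypothetical helper: prefix every element name of a list.
prefix_names <- function(x, prefix) {
  names(x) <- paste0(prefix, names(x))
  x
}

# Toy stand-ins for the per-contrast tables of each cell type.
macr <- list(ups = list(z23sb_vs_z22sb = "a", sb_vs_uninf = "b"),
             downs = list(z23sb_vs_z22sb = "c", sb_vs_uninf = "d"))
u937 <- list(ups = list(z23sb_vs_z22sb = "e", sb_vs_uninf = "f"),
             downs = list(z23sb_vs_z22sb = "g", sb_vs_uninf = "h"))

# Combine each direction separately so ups and downs never mix.
combined <- list()
for (direction in c("ups", "downs")) {
  combined[[direction]] <- c(prefix_names(macr[[direction]], "macr_"),
                             prefix_names(u937[[direction]], "u937_"))
}
names(combined[["ups"]])
```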

7.1 Compare DE results from macrophages and U937 samples

Looking a bit more closely at these, I think the U937 data are too sparse to compare effectively.

macr_u937_comparison <- compare_de_results(hs_macr_table, u937_table)

macr_u937_comparison[["lfc_heat"]]

macr_u937_venns <- compare_significant_contrasts(hs_macr_sig, second_sig_tables = u937_sig,
                                                 contrasts = "z23sb_vs_z23nosb")

macr_u937_venns[["up_plot"]]

macr_u937_venns[["down_plot"]]

macr_u937_venns_v2 <- compare_significant_contrasts(
  hs_macr_sig, second_sig_tables = u937_sig, contrasts = "z22sb_vs_z22nosb")

macr_u937_venns_v2[["up_plot"]]

macr_u937_venns_v2[["down_plot"]]

macr_u937_venns_v3 <- compare_significant_contrasts(
  hs_macr_sig, second_sig_tables = u937_sig, contrasts = "sb_vs_uninf")

macr_u937_venns_v3[["up_plot"]]

macr_u937_venns_v3[["down_plot"]]

7.2 Compare macrophage/u937 with respect to z2.3/z2.2

comparison_df <- merge(hs_macr_table[["data"]][["z23sb_vs_z22sb"]],
                       u937_table[["data"]][["z23sb_vs_z22sb"]],
                       by = "row.names")
macru937_z23z22_plot <- plot_linear_scatter(comparison_df[, c("deseq_logfc.x", "deseq_logfc.y")])
macru937_z23z22_plot[["scatter"]]

comparison_df <- merge(hs_macr_table[["data"]][["z23nosb_vs_z22nosb"]],
                       u937_table[["data"]][["z23nosb_vs_z22nosb"]],
                       by = "row.names")
macru937_z23z22_plot <- plot_linear_scatter(comparison_df[, c("deseq_logfc.x", "deseq_logfc.y")])
macru937_z23z22_plot[["scatter"]]
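
For reference, here is a small base R sketch of what the merge-then-compare step does, assuming two toy tables with a deseq_logfc column; the values are invented, and cor()/lm() stand in for whatever plot_linear_scatter() computes internally.

```r
# Toy logFC tables keyed by gene ID (hypothetical values, not real results).
macr_tbl <- data.frame(deseq_logfc = c(1.2, -0.5, 0.3, 2.0, -1.1),
                       row.names = c("g1", "g2", "g3", "g4", "g5"))
u937_tbl <- data.frame(deseq_logfc = c(-0.2, 0.6, 1.5, -0.9, 0.4),
                       row.names = c("g2", "g3", "g4", "g5", "g6"))

# merge() on row names keeps only genes present in both tables and
# suffixes the clashing column names with .x (first) and .y (second).
both <- merge(macr_tbl, u937_tbl, by = "row.names")
colnames(both)

# Pearson correlation and a linear fit of the two sets of fold changes.
lfc_cor <- cor(both[["deseq_logfc.x"]], both[["deseq_logfc.y"]])
lfc_fit <- lm(deseq_logfc.y ~ deseq_logfc.x, data = both)
summary(lfc_fit)[["r.squared"]]
```

Note that genes filtered out of either table are silently dropped by the merge, so the scatter only reflects genes which survived filtering in both cell types.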

8 Add donor to the contrasts, no sva

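In the following block, I will change the sample condition to include the donor. This leaves only 1 to 3 samples per donor/condition group (hence 'no power'). Because each new condition name already ends in the zymodeme, a batch factor with the levels z2.2/z2.3 appears to be a linear combination of the conditions, which is presumably why the methods that include batch report that 'batchz23' is not estimable.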

no_power_fact <- paste0(colData(hs_macr)[["donor"]], "_",
                        colData(hs_macr)[["condition"]])
table(colData(hs_macr)[["donor"]])
## 
## d01 d02 d09 d81 
##  13  14  13  14
table(no_power_fact)
## no_power_fact
##    d01_inf_sb_z22    d01_inf_sb_z23       d01_inf_z22       d01_inf_z23 
##                 3                 3                 2                 3 
##    d01_uninf_none d01_uninf_sb_none    d02_inf_sb_z22    d02_inf_sb_z23 
##                 1                 1                 3                 3 
##       d02_inf_z22       d02_inf_z23    d02_uninf_none d02_uninf_sb_none 
##                 3                 3                 1                 1 
##    d09_inf_sb_z22    d09_inf_sb_z23       d09_inf_z22       d09_inf_z23 
##                 3                 2                 3                 3 
##    d09_uninf_none d09_uninf_sb_none    d81_inf_sb_z22    d81_inf_sb_z23 
##                 1                 1                 3                 3 
##       d81_inf_z22       d81_inf_z23    d81_uninf_none d81_uninf_sb_none 
##                 3                 3                 1                 1
hs_nopower <- set_conditions(hs_macr, fact = no_power_fact)
## The numbers of samples by condition are:
## 
##    d01_inf_sb_z22    d01_inf_sb_z23       d01_inf_z22       d01_inf_z23 
##                 3                 3                 2                 3 
##    d01_uninf_none d01_uninf_sb_none    d02_inf_sb_z22    d02_inf_sb_z23 
##                 1                 1                 3                 3 
##       d02_inf_z22       d02_inf_z23    d02_uninf_none d02_uninf_sb_none 
##                 3                 3                 1                 1 
##    d09_inf_sb_z22    d09_inf_sb_z23       d09_inf_z22       d09_inf_z23 
##                 3                 2                 3                 3 
##    d09_uninf_none d09_uninf_sb_none    d81_inf_sb_z22    d81_inf_sb_z23 
##                 1                 1                 3                 3 
##       d81_inf_z22       d81_inf_z23    d81_uninf_none d81_uninf_sb_none 
##                 3                 3                 1                 1
hs_nopower <- subset_se(hs_nopower, subset = "macrophagezymodeme!='none'")
hs_nopower_nosva_de <- all_pairwise(hs_nopower, model_svs = FALSE, filter = TRUE)
## d01_inf_sb_z22 d01_inf_sb_z23    d01_inf_z22    d01_inf_z23 d02_inf_sb_z22 
##              3              3              2              3              3 
## d02_inf_sb_z23    d02_inf_z22    d02_inf_z23 d09_inf_sb_z22 d09_inf_sb_z23 
##              3              3              3              3              2 
##    d09_inf_z22    d09_inf_z23 d81_inf_sb_z22 d81_inf_sb_z23    d81_inf_z22 
##              3              3              3              3              3 
##    d81_inf_z23 
##              3 
## z2.2 z2.3 
##   23   23
## Warning: attributes are not identical across measure variables; they will be
## dropped
## Running normalize_se.
## Removing 9761 low-count genes (11720 remaining).
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## I think this is failing? SummarizedExperiment
## Basic step 0/3: Transforming data.
## Running normalize_se.
## Setting 32467 entries to zero.
## converting counts to integer mode
## Error in checkFullRank(modelMatrix) : 
##   the model matrix is not full rank, so the model cannot be fit as specified.
##   One or more variables or interaction terms in the design formula are linear
##   combinations of the others and must be removed.
## 
##   Please read the vignette section 'Model matrix not full rank':
## 
##   vignette('DESeq2')
## Coefficients not estimable: batchz23
## Warning: Partial NA coefficients for 11720 probe(s)
## Error in variancePartition::dream(exprObj = voom_result, formula = model_fstring,  : 
##   Design matrix is singular, covariates are very correlated
## conditions
## d01_inf_sb_z22 d01_inf_sb_z23    d01_inf_z22    d01_inf_z23 d02_inf_sb_z22 
##              3              3              2              3              3 
## d02_inf_sb_z23    d02_inf_z22    d02_inf_z23 d09_inf_sb_z22 d09_inf_sb_z23 
##              3              3              3              3              2 
##    d09_inf_z22    d09_inf_z23 d81_inf_sb_z22 d81_inf_sb_z23    d81_inf_z22 
##              3              3              3              3              3 
##    d81_inf_z23 
##              3 
## Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset,  : 
##   Design matrix not of full rank.  The following coefficients not estimable:
##  batchz23
## Warning in edger_pairwise(...): estimateGLMCommonDisp() failed.  Trying again
## with estimateDisp().
## Warning in edger_pairwise(...): There was a failure when doing the estimations.
## There was a failure when doing the estimations, using estimateDisp().
## Error in glmFit.default(sely, design, offset = seloffset, dispersion = 0.05,  : 
##   Design matrix not of full rank.  The following coefficients not estimable:
##  batchz23
## conditions
## d01_inf_sb_z22 d01_inf_sb_z23    d01_inf_z22    d01_inf_z23 d02_inf_sb_z22 
##              3              3              2              3              3 
## d02_inf_sb_z23    d02_inf_z22    d02_inf_z23 d09_inf_sb_z22 d09_inf_sb_z23 
##              3              3              3              3              2 
##    d09_inf_z22    d09_inf_z23 d81_inf_sb_z22 d81_inf_sb_z23    d81_inf_z22 
##              3              3              3              3              3 
##    d81_inf_z23 
##              3 
## Coefficients not estimable: batchz23
## Warning: Partial NA coefficients for 11720 probe(s)
## Coefficients not estimable: batchz23
## Warning: Partial NA coefficients for 11720 probe(s)
## conditions
## d01_inf_sb_z22 d01_inf_sb_z23    d01_inf_z22    d01_inf_z23 d02_inf_sb_z22 
##              3              3              2              3              3 
## d02_inf_sb_z23    d02_inf_z22    d02_inf_z23 d09_inf_sb_z22 d09_inf_sb_z23 
##              3              3              3              3              2 
##    d09_inf_z22    d09_inf_z23 d81_inf_sb_z22 d81_inf_sb_z23    d81_inf_z22 
##              3              3              3              3              3 
##    d81_inf_z23 
##              3
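
The 'not full rank' messages can be reproduced on a toy design. The sketch below assumes a design in which the condition name encodes the zymodeme and the batch factor is the zymodeme itself; the data frame is invented for illustration and is not the real metadata.

```r
# Toy design mimicking the situation above: the condition name already
# encodes the zymodeme, and the batch factor is the zymodeme itself.
design <- data.frame(
  condition = factor(rep(c("d01_inf_z22", "d01_inf_z23",
                           "d02_inf_z22", "d02_inf_z23"), each = 3)),
  batch = factor(rep(c("z22", "z23", "z22", "z23"), each = 3)))

# With condition and batch together, batchz23 is the sum of the two
# *_z23 condition columns, so the matrix loses a rank.
mm_full <- model.matrix(~ 0 + condition + batch, data = design)
rank_full <- qr(mm_full)$rank
is_full_rank <- rank_full == ncol(mm_full)

# Dropping batch leaves one column per condition and restores full rank.
mm_cond <- model.matrix(~ 0 + condition, data = design)
cond_full_rank <- qr(mm_cond)$rank == ncol(mm_cond)
```

Comparing qr()$rank against the number of columns before fitting is a cheap way to catch this kind of confounding ahead of the DE methods.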

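## The keeper elements must match the condition names exactly
## (donor_inf[_sb]_zymodeme); the run recorded below used names
## without underscores and therefore found no matching contrasts.
nopower_keepers <- list(
  "d01_zymo" = c("d01_inf_z23", "d01_inf_z22"),
  "d01_sbzymo" = c("d01_inf_sb_z23", "d01_inf_sb_z22"),
  "d02_zymo" = c("d02_inf_z23", "d02_inf_z22"),
  "d02_sbzymo" = c("d02_inf_sb_z23", "d02_inf_sb_z22"),
  "d09_zymo" = c("d09_inf_z23", "d09_inf_z22"),
  "d09_sbzymo" = c("d09_inf_sb_z23", "d09_inf_sb_z22"),
  "d81_zymo" = c("d81_inf_z23", "d81_inf_z22"),
  "d81_sbzymo" = c("d81_inf_sb_z23", "d81_inf_sb_z22"))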
hs_nopower_nosva_table <- combine_de_tables(
  hs_nopower_nosva_de, keepers = nopower_keepers,
  excel = glue("analyses/macrophage_de/de_tables/hs_nopower_table-v{ver}.xlsx"))
## The keepers has no elements in the coefficients.
## Here are the keepers: d01infz23, d01infz22, d01infsbz23, d01infsbz22, d02infz23, d02infz22, d02infsbz23, d02infsbz22, d09infz23, d09infz22, d09infsbz23, d09infsbz22, d81infz23, d81infz22, d81infsbz23, d81infsbz22
## Here are the coefficients: d81_inf_z23, d81_inf_z22, d81_inf_z23, d81_inf_sb_z23, d81_inf_z22, d81_inf_sb_z23, d81_inf_z23, d81_inf_sb_z22, d81_inf_z22, d81_inf_sb_z22, d81_inf_sb_z23, d81_inf_sb_z22, d81_inf_z23, d09_inf_z23, d81_inf_z22, d09_inf_z23, d81_inf_sb_z23, d09_inf_z23, d81_inf_sb_z22, d09_inf_z23, d81_inf_z23, d09_inf_z22, d81_inf_z22, d09_inf_z22, d81_inf_sb_z23, d09_inf_z22, d81_inf_sb_z22, d09_inf_z22, d09_inf_z23, d09_inf_z22, d81_inf_z23, d09_inf_sb_z23, d81_inf_z22, d09_inf_sb_z23, d81_inf_sb_z23, d09_inf_sb_z23, d81_inf_sb_z22, d09_inf_sb_z23, d09_inf_z23, d09_inf_sb_z23, d09_inf_z22, d09_inf_sb_z23, d81_inf_z23, d09_inf_sb_z22, d81_inf_z22, d09_inf_sb_z22, d81_inf_sb_z23, d09_inf_sb_z22, d81_inf_sb_z22, d09_inf_sb_z22, d09_inf_z23, d09_inf_sb_z22, d09_inf_z22, d09_inf_sb_z22, d09_inf_sb_z23, d09_inf_sb_z22, d81_inf_z23, d02_inf_z23, d81_inf_z22, d02_inf_z23, d81_inf_sb_z23, d02_inf_z23, d81_inf_sb_z22, d02_inf_z23, d09_inf_z23, d02_inf_z23, d09_inf_z22, d02_inf_z23, d09_inf_sb_z23, d02_inf_z23, d09_inf_sb_z22, d02_inf_z23, d81_inf_z23, d02_inf_z22, d81_inf_z22, d02_inf_z22, d81_inf_sb_z23, d02_inf_z22, d81_inf_sb_z22, d02_inf_z22, d09_inf_z23, d02_inf_z22, d09_inf_z22, d02_inf_z22, d09_inf_sb_z23, d02_inf_z22, d09_inf_sb_z22, d02_inf_z22, d02_inf_z23, d02_inf_z22, d81_inf_z23, d02_inf_sb_z23, d81_inf_z22, d02_inf_sb_z23, d81_inf_sb_z23, d02_inf_sb_z23, d81_inf_sb_z22, d02_inf_sb_z23, d09_inf_z23, d02_inf_sb_z23, d09_inf_z22, d02_inf_sb_z23, d09_inf_sb_z23, d02_inf_sb_z23, d09_inf_sb_z22, d02_inf_sb_z23, d02_inf_z23, d02_inf_sb_z23, d02_inf_z22, d02_inf_sb_z23, d81_inf_z23, d02_inf_sb_z22, d81_inf_z22, d02_inf_sb_z22, d81_inf_sb_z23, d02_inf_sb_z22, d81_inf_sb_z22, d02_inf_sb_z22, d09_inf_z23, d02_inf_sb_z22, d09_inf_z22, d02_inf_sb_z22, d09_inf_sb_z23, d02_inf_sb_z22, d09_inf_sb_z22, d02_inf_sb_z22, d02_inf_z23, d02_inf_sb_z22, d02_inf_z22, d02_inf_sb_z22, d02_inf_sb_z23, d02_inf_sb_z22, d81_inf_z23, d01_inf_z23, d81_inf_z22, d01_inf_z23, 
d81_inf_sb_z23, d01_inf_z23, d81_inf_sb_z22, d01_inf_z23, d09_inf_z23, d01_inf_z23, d09_inf_z22, d01_inf_z23, d09_inf_sb_z23, d01_inf_z23, d09_inf_sb_z22, d01_inf_z23, d02_inf_z23, d01_inf_z23, d02_inf_z22, d01_inf_z23, d02_inf_sb_z23, d01_inf_z23, d02_inf_sb_z22, d01_inf_z23, d81_inf_z23, d01_inf_z22, d81_inf_z22, d01_inf_z22, d81_inf_sb_z23, d01_inf_z22, d81_inf_sb_z22, d01_inf_z22, d09_inf_z23, d01_inf_z22, d09_inf_z22, d01_inf_z22, d09_inf_sb_z23, d01_inf_z22, d09_inf_sb_z22, d01_inf_z22, d02_inf_z23, d01_inf_z22, d02_inf_z22, d01_inf_z22, d02_inf_sb_z23, d01_inf_z22, d02_inf_sb_z22, d01_inf_z22, d01_inf_z23, d01_inf_z22, d81_inf_z23, d01_inf_sb_z23, d81_inf_z22, d01_inf_sb_z23, d81_inf_sb_z23, d01_inf_sb_z23, d81_inf_sb_z22, d01_inf_sb_z23, d09_inf_z23, d01_inf_sb_z23, d09_inf_z22, d01_inf_sb_z23, d09_inf_sb_z23, d01_inf_sb_z23, d09_inf_sb_z22, d01_inf_sb_z23, d02_inf_z23, d01_inf_sb_z23, d02_inf_z22, d01_inf_sb_z23, d02_inf_sb_z23, d01_inf_sb_z23, d02_inf_sb_z22, d01_inf_sb_z23, d01_inf_z23, d01_inf_sb_z23, d01_inf_z22, d01_inf_sb_z23, d81_inf_z23, d01_inf_sb_z22, d81_inf_z22, d01_inf_sb_z22, d81_inf_sb_z23, d01_inf_sb_z22, d81_inf_sb_z22, d01_inf_sb_z22, d09_inf_z23, d01_inf_sb_z22, d09_inf_z22, d01_inf_sb_z22, d09_inf_sb_z23, d01_inf_sb_z22, d09_inf_sb_z22, d01_inf_sb_z22, d02_inf_z23, d01_inf_sb_z22, d02_inf_z22, d01_inf_sb_z22, d02_inf_sb_z23, d01_inf_sb_z22, d02_inf_sb_z22, d01_inf_sb_z22, d01_inf_z23, d01_inf_sb_z22, d01_inf_z22, d01_inf_sb_z22, d01_inf_sb_z23, d01_inf_sb_z22
## Error in extract_keepers(extracted, keepers, table_names, all_coefficients, : Unable to find the set of contrasts to keep, fix this and try again.
## extra_contrasts = extra)
hs_nopower_nosva_sig <- extract_significant_genes(
  hs_nopower_nosva_table,
  excel = glue("analyses/macrophage_de/sig_tables/hs_nopower_nosva_sig-v{ver}.xlsx"))
## Error: object 'hs_nopower_nosva_table' not found
d01d02_zymo_nosva_comp <- merge(hs_nopower_nosva_table[["data"]][["d01_zymo"]],
                                hs_nopower_nosva_table[["data"]][["d02_zymo"]],
                                by = "row.names")
## Error: object 'hs_nopower_nosva_table' not found
d0102_zymo_nosva_plot <- plot_linear_scatter(d01d02_zymo_nosva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'd01d02_zymo_nosva_comp' not found
d0102_zymo_nosva_plot[["scatter"]]
## Error: object 'd0102_zymo_nosva_plot' not found
d0102_zymo_nosva_plot[["correlation"]]
## Error: object 'd0102_zymo_nosva_plot' not found
d0102_zymo_nosva_plot[["lm_rsq"]]
## Error: object 'd0102_zymo_nosva_plot' not found
d09d81_zymo_nosva_comp <- merge(hs_nopower_nosva_table[["data"]][["d09_zymo"]],
                                hs_nopower_nosva_table[["data"]][["d81_zymo"]],
                                by = "row.names")
## Error: object 'hs_nopower_nosva_table' not found
d0981_zymo_nosva_plot <- plot_linear_scatter(d09d81_zymo_nosva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'd09d81_zymo_nosva_comp' not found
d0981_zymo_nosva_plot[["scatter"]]
## Error: object 'd0981_zymo_nosva_plot' not found
d0981_zymo_nosva_plot[["correlation"]]
## Error: object 'd0981_zymo_nosva_plot' not found
d0981_zymo_nosva_plot[["lm_rsq"]]
## Error: object 'd0981_zymo_nosva_plot' not found
d01d81_zymo_nosva_comp <- merge(hs_nopower_nosva_table[["data"]][["d01_zymo"]],
                                hs_nopower_nosva_table[["data"]][["d81_zymo"]],
                                by = "row.names")
## Error: object 'hs_nopower_nosva_table' not found
d0181_zymo_nosva_plot <- plot_linear_scatter(d01d81_zymo_nosva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'd01d81_zymo_nosva_comp' not found
d0181_zymo_nosva_plot[["scatter"]]
## Error: object 'd0181_zymo_nosva_plot' not found
d0181_zymo_nosva_plot[["correlation"]]
## Error: object 'd0181_zymo_nosva_plot' not found
d0181_zymo_nosva_plot[["lm_rsq"]]
## Error: object 'd0181_zymo_nosva_plot' not found
upset_plots_nosva <- upsetr_sig(hs_nopower_nosva_sig, both = TRUE,
                                contrasts = c("d01_zymo", "d02_zymo", "d09_zymo", "d81_zymo"))
## Error: object 'hs_nopower_nosva_sig' not found
upset_plots_nosva[["up"]]
## Error: object 'upset_plots_nosva' not found
upset_plots_nosva[["down"]]
## Error: object 'upset_plots_nosva' not found
upset_plots_nosva[["both"]]
## Error: object 'upset_plots_nosva' not found
## The 7th element in the both groups list is the set shared among all donors.
## I don't feel like writing out x:y:z:a
groups <- upset_plots_nosva[["both_groups"]]
## Error: object 'upset_plots_nosva' not found
shared_genes <- attr(groups, "elements")[groups[[7]]] %>%
  gsub(pattern = "^gene:", replacement = "")
## Error in groups[[7]]: subscript out of bounds
shared_gp <- simple_gprofiler(shared_genes)
shared_gp[["pvalue_plots"]][["MF"]]
## NULL
shared_gp[["pvalue_plots"]][["BP"]]
## NULL
shared_gp[["pvalue_plots"]][["REAC"]]

shared_gp[["pvalue_plots"]][["WP"]]
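
Every 'object not found' error in this section follows from the failed keepers lookup. A quick check of the keeper names against the coefficient names would surface the problem before any table is built. The sketch below uses a small subset of the names printed in the 'Here are the keepers' and 'Here are the coefficients' messages; missing_keepers() is a hypothetical helper, not an hpgltools function.

```r
# Condition names as they appear in the coefficients (a small subset).
coefficients <- c("d01_inf_z23", "d01_inf_z22", "d01_inf_sb_z23", "d01_inf_sb_z22")

# Keepers written without underscores, as listed in the error message.
keepers <- list(
  "d01_zymo" = c("d01infz23", "d01infz22"),
  "d01_sbzymo" = c("d01infsbz23", "d01infsbz22"))

# Hypothetical helper: report every keeper element absent from the coefficients.
missing_keepers <- function(keepers, coefficients) {
  setdiff(unique(unlist(keepers, use.names = FALSE)), coefficients)
}
missing_before <- missing_keepers(keepers, coefficients)

# Rewriting the names with the same separators resolves every element.
fixed <- list(
  "d01_zymo" = c("d01_inf_z23", "d01_inf_z22"),
  "d01_sbzymo" = c("d01_inf_sb_z23", "d01_inf_sb_z22"))
missing_after <- missing_keepers(fixed, coefficients)
```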

9 Add donor to the contrasts, sva

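Same deal as the last block, but this time add SVA into the mix! Surrogate variables come from svaseq, and the model string "~ 0 + condition" leaves out the batch term.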

hs_nopower_sva_de <- all_pairwise(hs_nopower, model_svs = "svaseq",
                                  model_fstring = "~ 0 + condition", filter = TRUE)
## d01_inf_sb_z22 d01_inf_sb_z23    d01_inf_z22    d01_inf_z23 d02_inf_sb_z22 
##              3              3              2              3              3 
## d02_inf_sb_z23    d02_inf_z22    d02_inf_z23 d09_inf_sb_z22 d09_inf_sb_z23 
##              3              3              3              3              2 
##    d09_inf_z22    d09_inf_z23 d81_inf_sb_z22 d81_inf_sb_z23    d81_inf_z22 
##              3              3              3              3              3 
##    d81_inf_z23 
##              3
## Running normalize_se.
## Removing 9761 low-count genes (11720 remaining).
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## I think this is failing? SummarizedExperiment
## Basic step 0/3: Transforming data.
## Running normalize_se.
## Setting 32467 entries to zero.
## This received a matrix of SVs.
## converting counts to integer mode
##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function
##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## conditions
## d01_inf_sb_z22 d01_inf_sb_z23    d01_inf_z22    d01_inf_z23 d02_inf_sb_z22 
##              3              3              2              3              3 
## d02_inf_sb_z23    d02_inf_z22    d02_inf_z23 d09_inf_sb_z22 d09_inf_sb_z23 
##              3              3              3              3              2 
##    d09_inf_z22    d09_inf_z23 d81_inf_sb_z22 d81_inf_sb_z23    d81_inf_z22 
##              3              3              3              3              3 
##    d81_inf_z23 
##              3
## conditions
## d01_inf_sb_z22 d01_inf_sb_z23    d01_inf_z22    d01_inf_z23 d02_inf_sb_z22 
##              3              3              2              3              3 
## d02_inf_sb_z23    d02_inf_z22    d02_inf_z23 d09_inf_sb_z22 d09_inf_sb_z23 
##              3              3              3              3              2 
##    d09_inf_z22    d09_inf_z23 d81_inf_sb_z22 d81_inf_sb_z23    d81_inf_z22 
##              3              3              3              3              3 
##    d81_inf_z23 
##              3
## conditions
## d01_inf_sb_z22 d01_inf_sb_z23    d01_inf_z22    d01_inf_z23 d02_inf_sb_z22 
##              3              3              2              3              3 
## d02_inf_sb_z23    d02_inf_z22    d02_inf_z23 d09_inf_sb_z22 d09_inf_sb_z23 
##              3              3              3              3              2 
##    d09_inf_z22    d09_inf_z23 d81_inf_sb_z22 d81_inf_sb_z23    d81_inf_z22 
##              3              3              3              3              3 
##    d81_inf_z23 
##              3

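## As in the previous section, the keeper elements must match the
## condition names exactly; the run recorded below used names
## without underscores and therefore found no matching contrasts.
nopower_keepers <- list(
  "d01_zymo" = c("d01_inf_z23", "d01_inf_z22"),
  "d01_sbzymo" = c("d01_inf_sb_z23", "d01_inf_sb_z22"),
  "d02_zymo" = c("d02_inf_z23", "d02_inf_z22"),
  "d02_sbzymo" = c("d02_inf_sb_z23", "d02_inf_sb_z22"),
  "d09_zymo" = c("d09_inf_z23", "d09_inf_z22"),
  "d09_sbzymo" = c("d09_inf_sb_z23", "d09_inf_sb_z22"),
  "d81_zymo" = c("d81_inf_z23", "d81_inf_z22"),
  "d81_sbzymo" = c("d81_inf_sb_z23", "d81_inf_sb_z22"))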
hs_nopower_sva_table <- combine_de_tables(
  hs_nopower_sva_de, keepers = nopower_keepers,
  excel = glue("analyses/macrophage_de/de_tables/hs_nopower_sva_table-v{ver}.xlsx"))
## The keepers has no elements in the coefficients.
## Here are the keepers: d01infz23, d01infz22, d01infsbz23, d01infsbz22, d02infz23, d02infz22, d02infsbz23, d02infsbz22, d09infz23, d09infz22, d09infsbz23, d09infsbz22, d81infz23, d81infz22, d81infsbz23, d81infsbz22
## Here are the coefficients: d81_inf_z23, d81_inf_z22, d81_inf_z23, d81_inf_sb_z23, d81_inf_z22, d81_inf_sb_z23, d81_inf_z23, d81_inf_sb_z22, d81_inf_z22, d81_inf_sb_z22, d81_inf_sb_z23, d81_inf_sb_z22, d81_inf_z23, d09_inf_z23, d81_inf_z22, d09_inf_z23, d81_inf_sb_z23, d09_inf_z23, d81_inf_sb_z22, d09_inf_z23, d81_inf_z23, d09_inf_z22, d81_inf_z22, d09_inf_z22, d81_inf_sb_z23, d09_inf_z22, d81_inf_sb_z22, d09_inf_z22, d09_inf_z23, d09_inf_z22, d81_inf_z23, d09_inf_sb_z23, d81_inf_z22, d09_inf_sb_z23, d81_inf_sb_z23, d09_inf_sb_z23, d81_inf_sb_z22, d09_inf_sb_z23, d09_inf_z23, d09_inf_sb_z23, d09_inf_z22, d09_inf_sb_z23, d81_inf_z23, d09_inf_sb_z22, d81_inf_z22, d09_inf_sb_z22, d81_inf_sb_z23, d09_inf_sb_z22, d81_inf_sb_z22, d09_inf_sb_z22, d09_inf_z23, d09_inf_sb_z22, d09_inf_z22, d09_inf_sb_z22, d09_inf_sb_z23, d09_inf_sb_z22, d81_inf_z23, d02_inf_z23, d81_inf_z22, d02_inf_z23, d81_inf_sb_z23, d02_inf_z23, d81_inf_sb_z22, d02_inf_z23, d09_inf_z23, d02_inf_z23, d09_inf_z22, d02_inf_z23, d09_inf_sb_z23, d02_inf_z23, d09_inf_sb_z22, d02_inf_z23, d81_inf_z23, d02_inf_z22, d81_inf_z22, d02_inf_z22, d81_inf_sb_z23, d02_inf_z22, d81_inf_sb_z22, d02_inf_z22, d09_inf_z23, d02_inf_z22, d09_inf_z22, d02_inf_z22, d09_inf_sb_z23, d02_inf_z22, d09_inf_sb_z22, d02_inf_z22, d02_inf_z23, d02_inf_z22, d81_inf_z23, d02_inf_sb_z23, d81_inf_z22, d02_inf_sb_z23, d81_inf_sb_z23, d02_inf_sb_z23, d81_inf_sb_z22, d02_inf_sb_z23, d09_inf_z23, d02_inf_sb_z23, d09_inf_z22, d02_inf_sb_z23, d09_inf_sb_z23, d02_inf_sb_z23, d09_inf_sb_z22, d02_inf_sb_z23, d02_inf_z23, d02_inf_sb_z23, d02_inf_z22, d02_inf_sb_z23, d81_inf_z23, d02_inf_sb_z22, d81_inf_z22, d02_inf_sb_z22, d81_inf_sb_z23, d02_inf_sb_z22, d81_inf_sb_z22, d02_inf_sb_z22, d09_inf_z23, d02_inf_sb_z22, d09_inf_z22, d02_inf_sb_z22, d09_inf_sb_z23, d02_inf_sb_z22, d09_inf_sb_z22, d02_inf_sb_z22, d02_inf_z23, d02_inf_sb_z22, d02_inf_z22, d02_inf_sb_z22, d02_inf_sb_z23, d02_inf_sb_z22, d81_inf_z23, d01_inf_z23, d81_inf_z22, d01_inf_z23, 
d81_inf_sb_z23, d01_inf_z23, d81_inf_sb_z22, d01_inf_z23, d09_inf_z23, d01_inf_z23, d09_inf_z22, d01_inf_z23, d09_inf_sb_z23, d01_inf_z23, d09_inf_sb_z22, d01_inf_z23, d02_inf_z23, d01_inf_z23, d02_inf_z22, d01_inf_z23, d02_inf_sb_z23, d01_inf_z23, d02_inf_sb_z22, d01_inf_z23, d81_inf_z23, d01_inf_z22, d81_inf_z22, d01_inf_z22, d81_inf_sb_z23, d01_inf_z22, d81_inf_sb_z22, d01_inf_z22, d09_inf_z23, d01_inf_z22, d09_inf_z22, d01_inf_z22, d09_inf_sb_z23, d01_inf_z22, d09_inf_sb_z22, d01_inf_z22, d02_inf_z23, d01_inf_z22, d02_inf_z22, d01_inf_z22, d02_inf_sb_z23, d01_inf_z22, d02_inf_sb_z22, d01_inf_z22, d01_inf_z23, d01_inf_z22, d81_inf_z23, d01_inf_sb_z23, d81_inf_z22, d01_inf_sb_z23, d81_inf_sb_z23, d01_inf_sb_z23, d81_inf_sb_z22, d01_inf_sb_z23, d09_inf_z23, d01_inf_sb_z23, d09_inf_z22, d01_inf_sb_z23, d09_inf_sb_z23, d01_inf_sb_z23, d09_inf_sb_z22, d01_inf_sb_z23, d02_inf_z23, d01_inf_sb_z23, d02_inf_z22, d01_inf_sb_z23, d02_inf_sb_z23, d01_inf_sb_z23, d02_inf_sb_z22, d01_inf_sb_z23, d01_inf_z23, d01_inf_sb_z23, d01_inf_z22, d01_inf_sb_z23, d81_inf_z23, d01_inf_sb_z22, d81_inf_z22, d01_inf_sb_z22, d81_inf_sb_z23, d01_inf_sb_z22, d81_inf_sb_z22, d01_inf_sb_z22, d09_inf_z23, d01_inf_sb_z22, d09_inf_z22, d01_inf_sb_z22, d09_inf_sb_z23, d01_inf_sb_z22, d09_inf_sb_z22, d01_inf_sb_z22, d02_inf_z23, d01_inf_sb_z22, d02_inf_z22, d01_inf_sb_z22, d02_inf_sb_z23, d01_inf_sb_z22, d02_inf_sb_z22, d01_inf_sb_z22, d01_inf_z23, d01_inf_sb_z22, d01_inf_z22, d01_inf_sb_z22, d01_inf_sb_z23, d01_inf_sb_z22
## Error in extract_keepers(extracted, keepers, table_names, all_coefficients, : Unable to find the set of contrasts to keep, fix this and try again.
## extra_contrasts = extra)
hs_nopower_sva_sig <- extract_significant_genes(
  hs_nopower_sva_table,
  excel = glue("analyses/macrophage_de/sig_tables/hs_nopower_sva_sig-v{ver}.xlsx"))
## Error: object 'hs_nopower_sva_table' not found
d01d02_zymo_sva_comp <- merge(hs_nopower_sva_table[["data"]][["d01_zymo"]],
                              hs_nopower_sva_table[["data"]][["d02_zymo"]],
                              by = "row.names")
## Error: object 'hs_nopower_sva_table' not found
d0102_zymo_sva_plot <- plot_linear_scatter(d01d02_zymo_sva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'd01d02_zymo_sva_comp' not found
d0102_zymo_sva_plot[["scatter"]]
## Error: object 'd0102_zymo_sva_plot' not found
d0102_zymo_sva_plot[["correlation"]]
## Error: object 'd0102_zymo_sva_plot' not found
d0102_zymo_sva_plot[["lm_rsq"]]
## Error: object 'd0102_zymo_sva_plot' not found
d09d81_zymo_sva_comp <- merge(hs_nopower_sva_table[["data"]][["d09_zymo"]],
                              hs_nopower_sva_table[["data"]][["d81_zymo"]],
                              by = "row.names")
## Error: object 'hs_nopower_sva_table' not found
d0981_zymo_sva_plot <- plot_linear_scatter(d09d81_zymo_sva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'd09d81_zymo_sva_comp' not found
d0981_zymo_sva_plot[["scatter"]]
## Error: object 'd0981_zymo_sva_plot' not found
d0981_zymo_sva_plot[["correlation"]]
## Error: object 'd0981_zymo_sva_plot' not found
d0981_zymo_sva_plot[["lm_rsq"]]
## Error: object 'd0981_zymo_sva_plot' not found
d01d81_zymo_sva_comp <- merge(hs_nopower_sva_table[["data"]][["d01_zymo"]],
                              hs_nopower_sva_table[["data"]][["d81_zymo"]],
                              by = "row.names")
## Error: object 'hs_nopower_sva_table' not found
d0181_zymo_sva_plot <- plot_linear_scatter(d01d81_zymo_sva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'd01d81_zymo_sva_comp' not found
d0181_zymo_sva_plot[["scatter"]]
## Error: object 'd0181_zymo_sva_plot' not found
d0181_zymo_sva_plot[["correlation"]]
## Error: object 'd0181_zymo_sva_plot' not found
d0181_zymo_sva_plot[["lm_rsq"]]
## Error: object 'd0181_zymo_sva_plot' not found
upset_plots_sva <- upsetr_sig(hs_nopower_sva_sig, both = TRUE,
                              contrasts = c("d01_zymo", "d02_zymo", "d09_zymo", "d81_zymo"))
## Error: object 'hs_nopower_sva_sig' not found
upset_plots_sva[["up"]]
## Error: object 'upset_plots_sva' not found
upset_plots_sva[["down"]]
## Error: object 'upset_plots_sva' not found
upset_plots_sva[["both"]]
## Error: object 'upset_plots_sva' not found
## The 7th element in the both groups list is the set shared among all donors.
## I don't feel like writing out x:y:z:a
groups <- upset_plots_sva[["both_groups"]]
## Error: object 'upset_plots_sva' not found
shared_genes <- attr(groups, "elements")[groups[[7]]] %>%
  gsub(pattern = "^gene:", replacement = "")
## Error in groups[[7]]: subscript out of bounds
shared_gp <- simple_gprofiler(shared_genes)
shared_gp[["pvalue_plots"]][["MF"]]
## NULL
shared_gp[["pvalue_plots"]][["BP"]]
## NULL
shared_gp[["pvalue_plots"]][["REAC"]]

shared_gp[["pvalue_plots"]][["WP"]]
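
The shared set above is pulled out by position (the 7th group), which is fragile if the grouping order ever changes. A minimal cross-check, assuming one already has a named list of gene IDs per contrast, is to intersect the sets directly. The gene IDs below are invented, and sig_sets is a hypothetical stand-in for the per-donor significant genes.

```r
## Invented significant-gene sets for the four donor-specific contrasts.
sig_sets <- list(
  d01_zymo = c("g1", "g2", "g3", "g7"),
  d02_zymo = c("g2", "g3", "g4", "g7"),
  d09_zymo = c("g2", "g3", "g5"),
  d81_zymo = c("g2", "g3", "g6", "g7"))

## Genes present in every set, regardless of how the groups are ordered.
shared_all <- Reduce(intersect, sig_sets)
shared_all
```

If the result disagrees with the positional lookup, the index is the more likely culprit.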

10 Donor comparison

Now compare the donors to each other directly.

hs_donors <- set_conditions(hs_macr, fact = "donor")
## The numbers of samples by condition are:
## 
## d01 d02 d09 d81 
##  13  14  13  14
donor_de <- all_pairwise(hs_donors, model_svs = "svaseq",
                         model_fstring = "~ 0 + condition", filter = TRUE)
## d01 d02 d09 d81 
##  13  14  13  14
## Running normalize_se.
## Removing 9725 low-count genes (11756 remaining).
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## I think this is failing? SummarizedExperiment
## Basic step 0/3: Transforming data.
## Running normalize_se.
## Setting 40036 entries to zero.
## This received a matrix of SVs.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## conditions
## d01 d02 d09 d81 
##  13  14  13  14
## conditions
## d01 d02 d09 d81 
##  13  14  13  14
## conditions
## d01 d02 d09 d81 
##  13  14  13  14

donor_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 6 comparisons.
donor_table <- combine_de_tables(
  donor_de,
  excel = glue("analyses/macrophage_de/de_tables/donor_tables-v{ver}.xlsx"))
donor_table
## A set of combined differential expression results.
##        table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup
## 1 d02_vs_d01         310           389         318           381         350
## 2 d09_vs_d01         532           457         533           451         513
## 3 d81_vs_d01         668           753         669           744         663
## 4 d09_vs_d02         414           267         412           272         373
## 5 d81_vs_d02         572           650         561           658         532
## 6 d81_vs_d09         221           421         212           423         218
##   limma_sigdown
## 1           359
## 2           467
## 3           753
## 4           309
## 5           672
## 6           416
## Plot describing unique/shared genes in a differential expression table.

donor_sig <- extract_significant_genes(
  donor_table,
  excel = glue("analyses/macrophage_de/sig_tables/donor_sig-v{ver}.xlsx"))
donor_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##            limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## d02_vs_d01      350        359      318        381      310        389      242
## d09_vs_d01      513        467      533        451      532        457      485
## d81_vs_d01      663        753      669        744      668        753      576
## d09_vs_d02      373        309      412        272      414        267      211
## d81_vs_d02      532        672      561        658      572        650      299
## d81_vs_d09      218        416      212        423      221        421       86
##            ebseq_down basic_up basic_down
## d02_vs_d01        136      169        185
## d09_vs_d01        190      334        257
## d81_vs_d01        385      435        446
## d09_vs_d02        115      165        105
## d81_vs_d02        378      235        291
## d81_vs_d09        200       73        150
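
The six comparisons reported above are simply every pair among the four donors. A small base R sketch of that enumeration, using only the donor names printed above, follows. The naming convention (later donor as numerator) is inferred from the table names and is an assumption about how all_pairwise() orders them.

```r
donors <- c("d01", "d02", "d09", "d81")
donor_pairs <- combn(donors, 2)
## Name each contrast as numerator_vs_denominator, later donor first,
## to match the table names printed above.
contrast_names <- paste0(donor_pairs[2, ], "_vs_", donor_pairs[1, ])
contrast_names
length(contrast_names)
```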

11 Primary query contrasts

The final contrast in this list is interesting because it depends on the extra contrasts applied to the all_pairwise() above. In my way of thinking, the primary comparisons to consider are either cross-drug or cross-strain, but not both. However, I think that in at least a few instances Olga is interested in strain+drug / uninfected+nodrug.

11.1 Write contrast results

Now let us write out the xlsx file containing the above contrasts. The file with the suffix _table-version will therefore contain all genes and the file with the suffix _sig-version will contain only those deemed significant via our default criteria of DESeq2 |logFC| >= 1.0 and adjusted p-value <= 0.05.
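
As a reminder of what those default criteria mean in practice, here is a minimal base R sketch of the filter. Only the column names deseq_logfc and deseq_adjp come from this document; the toy table and its values are invented, and this is not the code extract_significant_genes() actually runs.

```r
## Toy table standing in for one contrast of the combined DE results.
toy_table <- data.frame(
  row.names = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  deseq_logfc = c(2.4, -1.3, 0.4, 1.0, -3.1),
  deseq_adjp = c(0.001, 0.03, 0.0001, 0.05, 0.2))

lfc_cutoff <- 1.0
adjp_cutoff <- 0.05

## Significant in either direction: |logFC| >= 1.0 and adjusted p <= 0.05.
passes <- abs(toy_table[["deseq_logfc"]]) >= lfc_cutoff &
  toy_table[["deseq_adjp"]] <= adjp_cutoff
toy_sig <- toy_table[passes, ]
toy_up <- toy_sig[toy_sig[["deseq_logfc"]] > 0, ]
toy_down <- toy_sig[toy_sig[["deseq_logfc"]] < 0, ]
rownames(toy_up)
rownames(toy_down)
```

Note that both cutoffs are inclusive, so a gene sitting exactly at logFC 1.0 and adjusted p 0.05 is kept.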

11.2 Over representation searches

I decided to make one change to the organization of this document which starts small but will, I think, quickly grow: I am moving the GSEA searches up to immediately after the DE. I will then move the plots of the gprofiler results to immediately after the various volcano plots so that it is easier to interpret them.

I am reasonably certain this is the place to check that z23no drug / uninfected has the expected set of genes and that there is or is not a reactome result.

Reproducibility note: Given that this is entirely dependent on an online service, I must assume that the results will change over time; in addition their web servers undergo maintenance regularly, which may result in systematic failure of these analyses. I like gProfiler quite a lot for this type of stuff, but this is an important caveat.

Conversely, the clusterProfiler results later depend on a consistent orgdb annotation set (or reactome or whatever); those versions are fixed by the container installation.
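
One way to soften the dependence on an online service is to save each successful result and reuse it when the service is unavailable. The following is only a sketch: cached_result() and fake_query() are hypothetical helpers written for illustration, not functions from hpgltools or gprofiler2.

```r
## Hypothetical helper: run a network-dependent call once and reuse the
## saved result afterwards.
cached_result <- function(cache_file, fun, ...) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  result <- try(fun(...), silent = TRUE)
  if (inherits(result, "try-error")) {
    warning("The query failed and no cached copy exists: ", cache_file)
    return(NULL)
  }
  saveRDS(result, file = cache_file)
  result
}

## Stand-in for an online query; a real call would go in its place.
fake_query <- function(genes) {
  list(queried = genes, retrieved = as.character(Sys.Date()))
}

cache_file <- file.path(tempdir(), "example_query.rds")
unlink(cache_file)
first <- cached_result(cache_file, fake_query, c("geneA", "geneB"))
second <- cached_result(cache_file, fake_query, c("geneA", "geneB"))
identical(first, second)
```

The cached copy records the retrieval date, which makes it at least possible to say when a given enrichment result was obtained.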

all_gp <- all_gprofiler(hs_macr_sig, enrich_id_column = "hgnc_symbol")
for (g in seq_len(length(all_gp))) {
  name <- names(all_gp)[g]
  datum <- all_gp[[name]]
  filename <- glue("analyses/macrophage_de/gprofiler/{name}_gprofiler-v{ver}.xlsx")
  written <- sm(write_gprofiler_data(datum, excel = filename))
}
lesssig_all_gp <- all_gprofiler(hs_macr_lesssig, enrich_id_column = "hgnc_symbol")
for (g in seq_len(length(lesssig_all_gp))) {
  name <- names(lesssig_all_gp)[g]
  datum <- lesssig_all_gp[[name]]
  filename <- glue("analyses/macrophage_de/gprofiler/{name}_gprofiler_lesssig-v{ver}.xlsx")
  written <- sm(write_gprofiler_data(datum, excel = filename))
}

11.3 Explicit GSEA search via clusterProfiler

all_cp <- all_cprofiler(hs_macr_sig, hs_macr_table)
## Reading KEGG annotation online: "https://rest.kegg.jp/link/hsa/pathway"...
## Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/hsa"...
## ReactomePA v1.52.0 Learn more at https://yulab-smu.top/contribution-knowledge-mining/
## 
## Please cite:
## 
## Guangchuang Yu, Qing-Yu He. ReactomePA: an R/Bioconductor package for
## reactome pathway analysis and visualization. Molecular BioSystems.
## 2016, 12(2):477-479
## Warning in simple_clusterprofiler(up, table, orgdb = orgdb, orgdb_from =
## orgdb_from, : I do not know this DOSE organism, leaving it as human.
## Warning in simple_clusterprofiler(up, table, orgdb = orgdb, orgdb_from =
## orgdb_from, : I do not know this mesh organism, leaving it as human.
## Warning in simple_clusterprofiler(down, table, orgdb = orgdb, orgdb_from =
## orgdb_from, : I do not know this DOSE organism, leaving it as human.
## Warning in simple_clusterprofiler(down, table, orgdb = orgdb, orgdb_from =
## orgdb_from, : I do not know this mesh organism, leaving it as human.

11.4 Specific desires in Reactome results

In previous analyses (I think by Dr. Colmenares), a specific Tryptophan biosynthesis pathway was observed, particularly in the 2.3/uninfected comparison. I think my gprofiler analysis is too stringent and is therefore not observing this. Olga asked if I could look at that and see if there are trivial settings I can change to highlight this pathway. The two most likely things I can change are the stringencies of the DE analysis and/or gProfiler.
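
To get a feel for why the DE stringency matters for a single pathway, here is a rough sketch using a plain hypergeometric test. This is an assumption for illustration only: gProfiler applies its own multiple-testing correction, which is not reproduced here, and every number below (universe, pathway size, overlaps, the relaxed query size) is invented apart from the 478 up genes reported in this document.

```r
## All numbers except the query size of 478 are invented for illustration.
universe <- 12000      ## genes considered expressed
pathway_size <- 40     ## genes annotated to the pathway of interest

## P(overlap >= observed) under the hypergeometric distribution.
ora_pvalue <- function(overlap, query_size) {
  phyper(overlap - 1, pathway_size, universe - pathway_size, query_size,
         lower.tail = FALSE)
}

strict <- ora_pvalue(overlap = 3, query_size = 478)
relaxed <- ora_pvalue(overlap = 7, query_size = 900)
c(strict = strict, relaxed = relaxed)
```

Whether a relaxed gene list actually picks up more members of the pathway depends entirely on the data; the sketch only shows that the p-value responds to both the list size and the overlap.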

test_z23_uninf_up <- hs_macr_sig[["deseq"]][["ups"]][["z23nosb_vs_uninf"]]
nrow(test_z23_uninf_up)
## [1] 478
test_z23_uninf_down <- hs_macr_sig[["deseq"]][["downs"]][["z23nosb_vs_uninf"]]
nrow(test_z23_uninf_down)
## [1] 265
test_gp_up <- simple_gprofiler(test_z23_uninf_up, enrich_id_column = "hgnc_symbol",
                               threshold = 1.0)
test_gp_up
written_up <- write_gprofiler_data(test_gp_up, excel = "excel/z23_uninf_gp_up_all.xlsx")

test_gp_down <- simple_gprofiler(test_z23_uninf_down, enrich_id_column = "hgnc_symbol",
                                 threshold = 1.0)
test_gp_down
written_down <- write_gprofiler_data(test_gp_down, excel = "excel/z23_uninf_gp_down_all.xlsx")

11.5 Plot contrasts of interest

One suggestion I received recently was to set the axes for these volcano plots to be static rather than let ggplot choose its own. I am assuming this is only relevant for pairs of contrasts, but that might not be true.
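
A minimal sketch of how static axes might be chosen for a pair of contrasts: compute one set of limits from both tables and apply it to each plot. The helper shared_volcano_limits() is hypothetical and the toy values are invented; only the column names deseq_logfc and deseq_adjp come from this document.

```r
## Toy logFC/adjusted p-value columns for a pair of contrasts; values invented.
contrast_a <- data.frame(deseq_logfc = c(-3.2, 0.1, 5.8),
                         deseq_adjp = c(2e-8, 0.6, 3e-20))
contrast_b <- data.frame(deseq_logfc = c(-1.1, 2.4, 3.0),
                         deseq_adjp = c(0.01, 2e-5, 0.04))

## Hypothetical helper: symmetric x limits and a common y maximum for a pair.
shared_volcano_limits <- function(...) {
  tables <- list(...)
  max_lfc <- max(sapply(tables, function(x) max(abs(x[["deseq_logfc"]]))))
  max_y <- max(sapply(tables, function(x) max(-log10(x[["deseq_adjp"]]))))
  list(x = c(-ceiling(max_lfc), ceiling(max_lfc)), y = c(0, ceiling(max_y)))
}

limits <- shared_volcano_limits(contrast_a, contrast_b)
limits
## With ggplot2 loaded, each plot of the pair could then take:
##   + ggplot2::coord_cartesian(xlim = limits[["x"]], ylim = limits[["y"]])
```

Using coord_cartesian() rather than scale limits keeps points outside the window from being dropped before plotting, though it may interact with the ggbreak axes used below, which I have not checked.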

11.6 Individual zymodemes vs. uninfected

The following blocks will be a lot of repetition. In each case I am yanking out the volcano plot for a specific contrast and showing the original followed by a version with different colors/labelling.

11.6.1 Infected with z2.3 no Antimonial vs. Uninfected

plot_colors <- get_se_colors(hs_macr_table[["input"]][["input"]])
## Error in get_se_colors(hs_macr_table[["input"]][["input"]]): could not find function "get_se_colors"
## The original plot from my xlsx file
hs_macr_table[["plots"]][["z23nosb_vs_uninf"]][["deseq_vol_plots"]]

z23nosb_vs_uninf_volcano <- plot_volcano_condition_de(
  input = hs_macr_table[["data"]][["z23nosb_vs_uninf"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgnc_symbol",
  color_low = plot_colors[["uninfnone"]], color_high = plot_colors[["infz23"]])
## Error: object 'plot_colors' not found
labeled <- z23nosb_vs_uninf_volcano[["plot"]] +
  scale_x_continuous(limits = c(-6, 21), breaks = c(-6, -4, -2, 0, 2, 4, 6, 8, 10, 20)) +
  ggbreak::scale_x_break(c(10, 19), scales = 0.2, space = 0.02)
## Error: object 'z23nosb_vs_uninf_volcano' not found
pp(file = "figures/fig2a_labeled_with_break.svg")
labeled
## Error: object 'labeled' not found
dev.off()
## png 
##   2
labeled
## Error: object 'labeled' not found
plotly::ggplotly(z23nosb_vs_uninf_volcano[["plot"]])
## Error: object 'z23nosb_vs_uninf_volcano' not found

The following provides some of the over-representation plots from gProfiler2.

all_gp[["z23nosb_vs_uninf_up"]][["pvalue_plots"]][["REAC"]]

## Reactome, zymodeme2.3 without drug vs. uninfected without drug, up.
all_gp[["z23nosb_vs_uninf_up"]][["pvalue_plots"]][["KEGG"]]

## KEGG, zymodeme2.3 without drug vs. uninfected without drug, up.
##all_gp[["z23nosb_vs_uninf_up"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.3 without drug vs. uninfected without drug, up.
all_gp[["z23nosb_vs_uninf_up"]][["pvalue_plots"]][["TF"]]

## TF, zymodeme2.3 without drug vs. uninfected without drug, up.
all_gp[["z23nosb_vs_uninf_up"]][["pvalue_plots"]][["WP"]]

## WikiPathways, zymodeme2.3 without drug vs. uninfected without drug, up.
all_gp[["z23nosb_vs_uninf_up"]][["interactive_plots"]][["WP"]]
message("Olga received a query about the following result, I think it is null.")
## Olga received a query about the following result, I think it is null.
all_gp[["z23nosb_vs_uninf_down"]][["pvalue_plots"]][["REAC"]]
## NULL
message("Is the previous plot null?")
## Is the previous plot null?
## Reactome, zymodeme2.3 without drug vs. uninfected without drug, down.
all_gp[["z23nosb_vs_uninf_down"]][["pvalue_plots"]][["MF"]]
## NULL
## MF, zymodeme2.3 without drug vs. uninfected without drug, down.
all_gp[["z23nosb_vs_uninf_down"]][["pvalue_plots"]][["TF"]]

## TF, zymodeme2.3 without drug vs. uninfected without drug, down.

We have some other categorical enrichment plots available via enrichplot, let us try a few out for contrasts of interest and see if any of them prove helpful.

First, as a reminder, here are the contrasts which are available to examine, in each case there is an _up and _down enrichment object in the data. Thus in the following list I am going to arbitrarily print out some invocations which extract putatively interesting bits of data.

  • z23nosb_vs_uninf: all_gp[["z23nosb_vs_uninf_up"]][["BP_enrich"]]
  • z22nosb_vs_uninf.
  • z23nosb_vs_z22nosb.
  • z23sb_vs_z22sb.
  • z23sb_vs_z23nosb.
  • z22sb_vs_z22nosb.
  • z23sb_vs_sb.
  • z22sb_vs_sb.
  • z23sb_vs_uninf.
  • z22sb_vs_uninf.
  • sb_vs_uninf.
  • extra_z2322.
  • extra_drugnodrug.
z23nosb_uninf_up_go <- all_gp[["z23nosb_vs_uninf_up"]][["BP_enrich"]]
z23nosb_uninf_up_go_pair <- pairwise_termsim(z23nosb_uninf_up_go)
dotplot(z23nosb_uninf_up_go)

emapplot(z23nosb_uninf_up_go_pair)

##ssplot(z23nosb_uninf_up_go_pair)
treeplot(z23nosb_uninf_up_go_pair)

upsetplot(z23nosb_uninf_up_go)

cnetplot(z23nosb_uninf_up_go)
## Warning: ggrepel: 5 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

11.6.2 Repeat, but using a less strict set of ‘significant genes’

I am not entirely certain if the Reactome results Olga showed me included both up and down genes? I am going to assume for the moment that it was just up/down, but if that proves intractable I will go back to the manuscript and read more carefully (e.g. I just remembered where the picture came from!)

11.6.2.1 Add a little topgo

In the process of exploring the various parameters used with gProfiler2, I found myself thinking that it would be nice to have some topgo results to compare against. The following block is the result of that thought.

test_genes_up <- hs_macr_lesssig[["deseq"]][["ups"]][["z23nosb_vs_uninf"]]
test_query_up <- simple_gprofiler(test_genes_up, threshold = 0.1)
test_query_up[["pvalue_plots"]][["REAC"]]

pdf(file = "images/test_query_biological_process_z23_vs_uninf_up.pdf", height = 12, width = 9)
test_query_up[["pvalue_plots"]][["BP"]]
## NULL
dev.off()
## png 
##   2
enrichplot::dotplot(test_query_up[["BP_enrich"]])

test_genes_down <- hs_macr_lesssig[["deseq"]][["downs"]][["z23nosb_vs_uninf"]]
test_query_down <- simple_gprofiler(test_genes_down)
test_query_down[["pvalue_plots"]][["REAC"]]
## NULL
## I keep getting all sorts of annoying biomart errors.
hs_go <- try(load_biomart_go(archive = FALSE, overwrite = TRUE))
## Using mart: ENSEMBL_MART_ENSEMBL from host: useast.ensembl.org.
## Successfully connected to the hsapiens_gene_ensembl database.
## Error in httr2::req_perform(req) : Failed to perform HTTP request.
## Caused by error in `curl::curl_fetch_memory()` at httr2/R/req-perform.R:201:5:
## ! Timeout was reached [useast.ensembl.org]:
## Operation timed out after 300002 milliseconds with 10486132 bytes received
## Unable to download annotation data.
if ("try-error" %in% class(hs_go)) {
  hs_go <- load_biomart_go(archive = TRUE, month = "04", year = "2020", overwrite = TRUE)
}
test_topgo_up <- simple_topgo(test_genes_up, go_db = hs_go[["go"]], parallel = FALSE)
## Error in go_db[, c("L1", "value")]: incorrect number of dimensions
written_topgo <- write_topgo_data(
  test_topgo_up,
  excel = glue("analyses/macrophage_de/ontology_topgo/topgo_z23_uninf_less_strict.xlsx"))
## Error: object 'test_topgo_up' not found

11.6.3 Infected with z2.2 no Antimonial vs. Uninfected

Here is where things will get most repetitive. In each instance I am creating a couple of volcano plots followed by printing some of the gProfiler2 results (when I get the itch).

The following should be a slightly improved version of our extant figure 2B.

## The original plot
hs_macr_table[["plots"]][["z22nosb_vs_uninf"]][["deseq_vol_plots"]]

z22nosb_vs_uninf_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z22nosb_vs_uninf"]], "z22nosb_vs_uninf",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgnc_symbol",
  color_low = plot_colors[["uninfnone"]], color_high = plot_colors[["infz22"]])
## Error: object 'plot_colors' not found
labeled <- z22nosb_vs_uninf_volcano[["plot"]] +
  scale_x_continuous(limits = c(-2, 21), breaks = c(-2, 0, 2, 4, 6, 8, 10, 21, 22)) +
  ggbreak::scale_x_break(c(11, 20), scales = 0.2, space = 0.02)
## Error: object 'z22nosb_vs_uninf_volcano' not found
pp(file = "figures/fig2b_labeled_with_break.svg")
labeled
## Error: object 'labeled' not found
dev.off()
## png 
##   2
labeled
## Error: object 'labeled' not found
plotly::ggplotly(z22nosb_vs_uninf_volcano[["plot"]])
## Error: object 'z22nosb_vs_uninf_volcano' not found

Add some pvalue barplots from gProfiler for this contrast.

all_gp[["z22nosb_vs_uninf_up"]][["pvalue_plots"]][["REAC"]]

## Reactome, zymodeme2.2 without drug vs. uninfected without drug, up.
all_gp[["z22nosb_vs_uninf_up"]][["pvalue_plots"]][["MF"]]
## NULL
## MF, zymodeme2.2 without drug vs. uninfected without drug, up.
all_gp[["z22nosb_vs_uninf_up"]][["pvalue_plots"]][["TF"]]

## TF, zymodeme2.2 without drug vs. uninfected without drug, up.
all_gp[["z22nosb_vs_uninf_up"]][["pvalue_plots"]][["WP"]]

## WikiPathways, zymodeme2.2 without drug vs. uninfected without drug, up.

all_gp[["z22nosb_vs_uninf_down"]][["pvalue_plots"]][["REAC"]]
## NULL
## Reactome, zymodeme2.2 without drug vs. uninfected without drug, down.
all_gp[["z22nosb_vs_uninf_down"]][["pvalue_plots"]][["MF"]]
## NULL
## MF, zymodeme2.2 without drug vs. uninfected without drug, down.
all_gp[["z22nosb_vs_uninf_down"]][["pvalue_plots"]][["TF"]]
## NULL
## TF, zymodeme2.2 without drug vs. uninfected without drug, down.

11.6.4 Infected with z2.3 treated vs. Uninfected treated

I do not think this plot is used at this time.

## The original plot
hs_macr_table[["plots"]][["z23sb_vs_sb"]][["deseq_vol_plots"]]

z23sb_vs_uninfsb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z23sb_vs_sb"]], "z23sb_vs_sb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgnc_symbol",
  color_low = plot_colors[["infsbz23"]], color_high = plot_colors[["uninfsbnone"]])
## Error: object 'plot_colors' not found
z23sb_vs_uninfsb_volcano[["plot"]]
## Error: object 'z23sb_vs_uninfsb_volcano' not found
plotly::ggplotly(z23sb_vs_uninfsb_volcano[["plot"]])
## Error: object 'z23sb_vs_uninfsb_volcano' not found

11.6.5 Infected with z2.3 untreated vs. z2.2 untreated

This is figure 2C at this time.

## The original plot
hs_macr_table[["plots"]][["z23nosb_vs_z22nosb"]][["deseq_vol_plots"]]

z23nosb_vs_z22nosb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z23nosb_vs_z22nosb"]], "z23nosb_vs_z22nosb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgnc_symbol",
  color_low = plot_colors[["infz23"]], color_high = plot_colors[["infz22"]])
## Error: object 'plot_colors' not found
labeled <- z23nosb_vs_z22nosb_volcano[["plot"]] +
  scale_x_continuous(breaks = c(-10, -8, -6, -4, -2, 0, 2, 4, 6))
## Error: object 'z23nosb_vs_z22nosb_volcano' not found
pp(file = "figures/fig2c_labeled.svg")
labeled
## Error: object 'labeled' not found
dev.off()
## png 
##   2
labeled
## Error: object 'labeled' not found

11.6.6 Infected with z2.3 treated vs. z2.2 treated

This is currently figure 3C.

FIXME: The axis label isn’t quite right for the ggbreak.

## The original plot
hs_macr_table[["plots"]][["z23sb_vs_z22sb"]][["deseq_vol_plots"]]

z23sb_vs_z22sb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z23sb_vs_z22sb"]], "z23sb_vs_z22sb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgnc_symbol",
  color_high = plot_colors[["infsbz23"]], color_low = plot_colors[["infsbz22"]])
## Error: object 'plot_colors' not found
labeled <- z23sb_vs_z22sb_volcano[["plot"]] +
  scale_x_continuous(breaks = c(-23, -6, -4, -2, 0, 2, 4, 6)) +
  ggbreak::scale_x_break(c(-5, -22.5), scales = 10, space = 0.02)
## Error: object 'z23sb_vs_z22sb_volcano' not found
pp(file = "figures/fig3c_labeled_breaks.svg")
labeled
## Error: object 'labeled' not found
dev.off()
## png 
##   2
labeled
## Error: object 'labeled' not found

11.6.7 Infected with z2.3 SB treated vs. z2.3 untreated

I think this is currently figure 3A.

FIXME: The axis label for the ggbreak isn’t quite right.

## The original plot
hs_macr_table[["plots"]][["z23sb_vs_z23nosb"]][["deseq_vol_plots"]]

z23sb_vs_z23nosb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z23sb_vs_z23nosb"]], "z23sb_vs_z23nosb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgnc_symbol",
  color_high = plot_colors[["infsbz23"]], color_low = plot_colors[["infz23"]])
## Error: object 'plot_colors' not found
labeled <- z23sb_vs_z23nosb_volcano[["plot"]] +
  scale_x_continuous(limits = c(-19, 6),
                     breaks = c(-20, -18, -16, -14, -12, -10, -6, -4, -2, 0, 2, 4, 6)) +
  ggbreak::scale_x_break(c(-17, -8), scales = 17, space = 0.02)
## Error: object 'z23sb_vs_z23nosb_volcano' not found
pp(file = "figures/fig3a_labeled_with_break.svg")
labeled
## Error: object 'labeled' not found
dev.off()
## png 
##   2
labeled
## Error: object 'labeled' not found

11.6.8 Infected with z2.2 SB treated vs. z2.2 untreated

## The original plot
hs_macr_table[["plots"]][["z22sb_vs_z22nosb"]][["deseq_vol_plots"]]

z22sb_vs_z22nosb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z22sb_vs_z22nosb"]], "z22sb_vs_z22nosb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgnc_symbol",
  color_high = plot_colors[["infsbz22"]], color_low = plot_colors[["infz22"]])
## Error: object 'plot_colors' not found
labeled <- z22sb_vs_z22nosb_volcano[["plot"]] +
  scale_x_continuous(breaks = c(-6, -4, -2, 0, 2, 4, 6))
## Error: object 'z22sb_vs_z22nosb_volcano' not found
pp(file = "figures/fig3b_labeled.svg")
labeled
## Error: object 'labeled' not found
dev.off()
## png 
##   2
labeled
## Error: object 'labeled' not found

11.6.9 Infected with z2.3 SB treated vs. uninfected treated

x_limits <- c(-6, 6)
## The original plot
hs_macr_table[["plots"]][["z23sb_vs_sb"]][["deseq_vol_plots"]]

z23sb_vs_sb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z23sb_vs_sb"]], "z23sb_vs_sb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgnc_symbol", invert = TRUE,
  color_low = plot_colors[["infsbz23"]], color_high = plot_colors[["uninfsbnone"]])
## Error: object 'plot_colors' not found
z23sb_vs_sb_volcano[["plot"]]
## Error: object 'z23sb_vs_sb_volcano' not found

11.6.10 Infected with z2.2 SB treated vs. uninfected treated

## The original plot
hs_macr_table[["plots"]][["z22sb_vs_sb"]][["deseq_vol_plots"]]

z22sb_vs_sb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z22sb_vs_sb"]], "z22sb_vs_sb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgnc_symbol", invert = TRUE,
  color_low = plot_colors[["infsbz22"]], color_high = plot_colors[["uninfsbnone"]])
## Error: object 'plot_colors' not found
z22sb_vs_sb_volcano[["plot"]]
## Error: object 'z22sb_vs_sb_volcano' not found

11.6.11 Uninfected+SbV vs. Uninfected-SbV

This is currently figure 3D.

FIXME: This needs the BOLA2B ggbreak.

## The original plot
hs_macr_table[["plots"]][["sb_vs_uninf"]][["deseq_vol_plots"]]

sb_vs_uninf_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["sb_vs_uninf"]], "sb_vs_uninf",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgnc_symbol",
  color_high = plot_colors[["uninfsbnone"]], color_low = plot_colors[["uninfnone"]])
## Error: object 'plot_colors' not found
labeled <- sb_vs_uninf_volcano[["plot"]] +
  scale_x_continuous(breaks = c(-23, -6, -4, -2, 0, 2, 4, 6)) +
  ggbreak::scale_x_break(c(-5, -22.5), scales = 10, space = 0.02)
## Error: object 'sb_vs_uninf_volcano' not found
pp(file = "figures/fig3d_labeled_breaks.svg")
labeled
## Error: object 'labeled' not found
dev.off()
## png 
##   2
labeled
## Error: object 'labeled' not found

11.7 Double-check that gene counts match my perceptions

Check that my perception of the number of significant up/down genes matches what the table/venn says. In the following block I am performing some venn/upset analyses to see if the numbers of genes match what we have in the current version of the manuscript (plus or minus a gene) and thus if my interpretation of the figure/legend text matches what I think it means.

shared <- Vennerable::Venn(list(
  "drug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z23sb_vs_uninf"]]),
  "nodrug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z23nosb_vs_uninf"]])))
pp(file = "images/z23_vs_uninf_venn_up.png")
Vennerable::plot(shared)
dev.off()
## png 
##   2
Vennerable::plot(shared)

## I see 910 z23sb/uninf and 670 z23nosb/uninf genes in the venn diagram.
length(shared@IntersectionSets[["10"]]) + length(shared@IntersectionSets[["11"]])
## [1] 839
dim(hs_macr_sig[["deseq"]][["ups"]][["z23sb_vs_uninf"]])
## [1] 839  73
shared <- Vennerable::Venn(list(
  "drug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z22sb_vs_uninf"]]),
  "nodrug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z22nosb_vs_uninf"]])))
pp(file = "images/z22_vs_uninf_venn_up.png")
Vennerable::plot(shared)
dev.off()
## png 
##   2
Vennerable::plot(shared)

length(shared@IntersectionSets[["10"]]) + length(shared@IntersectionSets[["11"]])
## [1] 660
dim(hs_macr_sig[["deseq"]][["ups"]][["z22sb_vs_uninf"]])
## [1] 660  73
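
The check above relies on how a two-set Venn diagram partitions its inputs. As the output suggests, the region labeled "10" holds genes found only in the first set, "01" those found only in the second, and "11" those found in both, so "10" + "11" should equal the size of the first set. A minimal base-R sketch of the same bookkeeping follows; the gene IDs are invented and the names `drug_genes` and `nodrug_genes` are hypothetical stand-ins, not objects from this analysis.

```r
# Toy gene IDs; hypothetical stand-ins for the rownames of two 'ups' tables.
drug_genes <- c("g1", "g2", "g3", "g4", "g5")
nodrug_genes <- c("g4", "g5", "g6")

# The three regions of a two-set Venn diagram.
only_drug <- setdiff(drug_genes, nodrug_genes)    # region "10"
only_nodrug <- setdiff(nodrug_genes, drug_genes)  # region "01"
both <- intersect(drug_genes, nodrug_genes)       # region "11"

# "10" + "11" recovers the first set; "01" + "11" recovers the second.
region_sizes <- c("10" = length(only_drug), "01" = length(only_nodrug),
                  "11" = length(both))
print(region_sizes)
stopifnot(region_sizes[["10"]] + region_sizes[["11"]] == length(drug_genes))
stopifnot(region_sizes[["01"]] + region_sizes[["11"]] == length(nodrug_genes))
```

If the two sums ever disagree with the table dimensions, the likely culprits are duplicated rownames or a table which was filtered differently from the one given to the Venn.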

Note to self: There is an error in my volcano plot code which takes effect when the numerator and denominator of the all_pairwise contrasts differ from those in combine_de_tables. It puts the ups/downs on the correct sides of the plot, but calls the down genes ‘up’ and vice versa. I did add a check for this situation, but used the wrong argument to handle it.
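
To illustrate the kind of mismatch described in this note: inverting a contrast negates every logFC, so the up/down labels need to be assigned after any inversion rather than before. The following is only a toy with invented values and a hypothetical helper (`call_direction`); it is not the actual plotting code.

```r
# Toy logFC values for a contrast computed as b/a (invented data).
de <- data.frame(gene = c("g1", "g2", "g3"),
                 logfc = c(2.5, -1.8, 0.1))

# Hypothetical helper: assign up/down/neutral from the sign and size of logFC.
call_direction <- function(logfc, cutoff = 1.0) {
  ifelse(logfc >= cutoff, "up", ifelse(logfc <= -cutoff, "down", "neutral"))
}

# Labels for the contrast as computed (b/a).
de[["state_b_vs_a"]] <- call_direction(de[["logfc"]])

# If the table should instead be reported as a/b, negate logFC first and
# only then assign labels; reusing the old labels would swap 'up' and 'down'.
de[["logfc_inverted"]] <- -1 * de[["logfc"]]
de[["state_a_vs_b"]] <- call_direction(de[["logfc_inverted"]])
print(de)
```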

A likely bit of text for these volcano plots:

The set of genes differentially expressed between the zymodeme 2.3 and uninfected samples without drug treatment was quantified with DESeq2 and included surrogate estimates from SVA. Given the significance criteria of abs(logFC) >= 1.0 and false discovery rate adjusted p-value <= 0.05, 670 genes were observed as significantly increased between the infected and uninfected samples and 386 were observed as decreased. The genes most increased relative to the uninfected samples include some which are potentially indicative of a strong innate immune response and the inflammatory response.

In contrast, when the set of genes differentially expressed between the zymodeme 2.2 and uninfected samples was visualized, only 7 genes were observed as decreased and 435 increased. The inflammatory response was significantly less apparent in this set, but instead included genes related to transporter activity and oxidoreductases.
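
The significance criteria quoted in the text above (abs(logFC) >= 1.0 and adjusted p-value <= 0.05) can be sketched as a simple filter. The column names `deseq_logfc` and `deseq_adjp` match those passed to the volcano plots in this document. The table itself is invented toy data, and `count_sig` is a hypothetical helper, not the function which produced the reported numbers.

```r
# Invented table shaped like one contrast of a combined DE table.
toy_table <- data.frame(
  row.names = c("g1", "g2", "g3", "g4", "g5"),
  deseq_logfc = c(2.1, -1.4, 0.6, 1.0, -3.2),
  deseq_adjp = c(0.001, 0.04, 0.0001, 0.05, 0.2))

# Hypothetical helper: split the significant genes into ups and downs.
count_sig <- function(tbl, lfc = 1.0, adjp = 0.05) {
  passed_p <- tbl[["deseq_adjp"]] <= adjp
  ups <- rownames(tbl)[passed_p & tbl[["deseq_logfc"]] >= lfc]
  downs <- rownames(tbl)[passed_p & tbl[["deseq_logfc"]] <= -lfc]
  list("ups" = ups, "downs" = downs)
}

sig <- count_sig(toy_table)
print(lengths(sig))
```

Note that both cutoffs are inclusive here, so a gene sitting exactly on a threshold is counted; an exclusive comparison could account for a plus-or-minus-one-gene difference between tallies.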

11.8 Direct zymodeme comparisons

An orthogonal comparison to that performed above is to directly compare the zymodeme 2.3 and 2.2 samples with and without antimonial treatment.

11.8.1 z2.3 / z2.2 without drug

z23nosb_vs_z22nosb_volcano <- plot_volcano_de(
  table = hs_macr_table[["data"]][["z23nosb_vs_z22nosb"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  shapes_by_state = FALSE, color_by = "fc",  label = 10, label_column = "hgnc_symbol")
## Error in plot_volcano_de(table = hs_macr_table[["data"]][["z23nosb_vs_z22nosb"]], : could not find function "plot_volcano_de"
plotly::ggplotly(z23nosb_vs_z22nosb_volcano[["plot"]])
## Error: object 'z23nosb_vs_z22nosb_volcano' not found
z23sb_vs_z22sb_volcano <- plot_volcano_de(
  table = hs_macr_table[["data"]][["z23sb_vs_z22sb"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  shapes_by_state = FALSE, color_by = "fc",  label = 10, label_column = "hgnc_symbol")
## Error in plot_volcano_de(table = hs_macr_table[["data"]][["z23sb_vs_z22sb"]], : could not find function "plot_volcano_de"
plotly::ggplotly(z23sb_vs_z22sb_volcano[["plot"]])
## Error: object 'z23sb_vs_z22sb_volcano' not found
z23nosb_vs_z22nosb_volcano[["plot"]] +
  xlim(-10, 10) +
  ylim(0, 60)
## Error: object 'z23nosb_vs_z22nosb_volcano' not found
pp(file = "images/z23nosb_vs_z22nosb_reactome_up.svg",
   image = all_gp[["z23nosb_vs_z22nosb_up"]][["pvalue_plots"]][["REAC"]],
   height = 12, width = 9)
## Warning: ImageMagick was built without librsvg which causes poor quality of SVG rendering.
## For better results use image_read_svg() which uses the rsvg package.
## Error in eval(expr, envir): R: geometry does not contain image `/lab/singularity/tmrc2_macrophage_deb/202509081525_outputs/images/z23nosb_vs_z22nosb_reactome_up.svg' @ warning/attribute.c/GetImageBoundingBox/554
all_gp[["z23nosb_vs_z22nosb_up"]][["pvalue_plots"]][["REAC"]]

## Reactome, zymodeme 2.3 vs. zymodeme 2.2, both without drug, up.
all_gp[["z23nosb_vs_z22nosb_up"]][["pvalue_plots"]][["KEGG"]]

## KEGG, zymodeme 2.3 vs. zymodeme 2.2, both without drug, up.
all_gp[["z23nosb_vs_z22nosb_up"]][["pvalue_plots"]][["MF"]]
## NULL
## MF, zymodeme 2.3 vs. zymodeme 2.2, both without drug, up.
all_gp[["z23nosb_vs_z22nosb_up"]][["pvalue_plots"]][["TF"]]

## TF, zymodeme 2.3 vs. zymodeme 2.2, both without drug, up.
all_gp[["z23nosb_vs_z22nosb_up"]][["pvalue_plots"]][["WP"]]

## WikiPathways, zymodeme 2.3 vs. zymodeme 2.2, both without drug, up.
all_gp[["z23nosb_vs_z22nosb_up"]][["interactive_plots"]][["WP"]]
pp(file = "images/z23nosb_vs_z22nosb_reactome_down.svg",
   image = all_gp[["z23nosb_vs_z22nosb_down"]][["pvalue_plots"]][["REAC"]],
   height = 12, width = 9)
## Warning: ImageMagick was built without librsvg which causes poor quality of SVG rendering.
## For better results use image_read_svg() which uses the rsvg package.
## Error in eval(expr, envir): R: geometry does not contain image `/lab/singularity/tmrc2_macrophage_deb/202509081525_outputs/images/z23nosb_vs_z22nosb_reactome_down.svg' @ warning/attribute.c/GetImageBoundingBox/554
all_gp[["z23nosb_vs_z22nosb_down"]][["pvalue_plots"]][["REAC"]]

## Reactome, zymodeme 2.3 vs. zymodeme 2.2, both without drug, down.
all_gp[["z23nosb_vs_z22nosb_down"]][["pvalue_plots"]][["MF"]]
## NULL
## MF, zymodeme 2.3 vs. zymodeme 2.2, both without drug, down.
all_gp[["z23nosb_vs_z22nosb_down"]][["pvalue_plots"]][["TF"]]

## TF, zymodeme 2.3 vs. zymodeme 2.2, both without drug, down.

11.8.2 z2.3 / z2.2 with drug

z23sb_vs_z22sb_volcano[["plot"]] +
  xlim(-10, 10) +
  ylim(0, 60)
## Error: object 'z23sb_vs_z22sb_volcano' not found
pp(
  file = "images/z23sb_vs_z22sb_reactome_up.png",
  image = all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["REAC"]],
  height = 12, width = 9)
## Warning in pp(file = "images/z23sb_vs_z22sb_reactome_up.png", image =
## all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["REAC"]], : There is no device
## to shut down.
## Error in eval(expr, envir): R: improper image header `/lab/singularity/tmrc2_macrophage_deb/202509081525_outputs/images/z23sb_vs_z22sb_reactome_up.png' @ error/png.c/ReadPNGImage/3941
all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["REAC"]]
## Reactome, zymodeme 2.3 vs. zymodeme 2.2, both with drug, up.
all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["KEGG"]]
## KEGG, zymodeme 2.3 vs. zymodeme 2.2, both with drug, up.
all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["MF"]]
## NULL
## MF, zymodeme 2.3 vs. zymodeme 2.2, both with drug, up.
all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme 2.3 vs. zymodeme 2.2, both with drug, up.
all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["WP"]]
## WikiPathways, zymodeme 2.3 vs. zymodeme 2.2, both with drug, up.
all_gp[["z23sb_vs_z22sb_up"]][["interactive_plots"]][["WP"]]
all_gp[["z23sb_vs_z22sb_down"]][["pvalue_plots"]][["REAC"]]
## Reactome, zymodeme 2.3 vs. zymodeme 2.2, both with drug, down.
all_gp[["z23sb_vs_z22sb_down"]][["pvalue_plots"]][["MF"]]
## NULL
## MF, zymodeme 2.3 vs. zymodeme 2.2, both with drug, down.
all_gp[["z23sb_vs_z22sb_down"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme 2.3 vs. zymodeme 2.2, both with drug, down.

11.8.3 Venn to see shared/unique genes

Once again I wish to pull out the significant genes and see how my numbers match against the text.

shared <- Vennerable::Venn(list(
  "drug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z23sb_vs_z22sb"]]),
  "nodrug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z23nosb_vs_z22nosb"]])))
pp(file = "images/drug_nodrug_venn_up.png")
Vennerable::plot(shared)
dev.off()
## png 
##   2
Vennerable::plot(shared)

shared <- Vennerable::Venn(
  list("drug" = rownames(hs_macr_sig[["deseq"]][["downs"]][["z23sb_vs_z22sb"]]),
       "nodrug" = rownames(hs_macr_sig[["deseq"]][["downs"]][["z23nosb_vs_z22nosb"]])))
pp(file = "images/drug_nodrug_venn_down.png")
Vennerable::plot(shared)
dev.off()
## png 
##   2
Vennerable::plot(shared)

A slightly different way of looking at the differences between the two zymodeme infections is to compare the infected samples directly, both with and without drug. When a volcano plot showing the comparison of the zymodeme 2.3 vs. 2.2 samples was plotted, 484 genes were observed as increased and 422 decreased; these groups include many of the same inflammatory (up) and membrane (down) genes.

Similar patterns were observed when the antimonial was included. Thus, when Venn diagrams of the increased and decreased gene sets were plotted, a substantial number of genes were shared between the untreated and antimonial-treated samples: 313 increased and 244 decreased.

11.9 Drug effects on each zymodeme infection

Another likely question concerns the effect of the antimonial on each zymodeme infection, which may be visualized by directly comparing the treated vs. untreated samples.

11.9.1 z2.3 with and without drug

z23sb_vs_z23nosb_volcano <- plot_volcano_de(
  table = hs_macr_table[["data"]][["z23sb_vs_z23nosb"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  shapes_by_state = FALSE, color_by = "fc",  label = 10, label_column = "hgnc_symbol")
## Error in plot_volcano_de(table = hs_macr_table[["data"]][["z23sb_vs_z23nosb"]], : could not find function "plot_volcano_de"
plotly::ggplotly(z23sb_vs_z23nosb_volcano[["plot"]])
## Error: object 'z23sb_vs_z23nosb_volcano' not found
z22sb_vs_z22nosb_volcano <- plot_volcano_de(
  table = hs_macr_table[["data"]][["z22sb_vs_z22nosb"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  shapes_by_state = FALSE, color_by = "fc",  label = 10, label_column = "hgnc_symbol")
## Error in plot_volcano_de(table = hs_macr_table[["data"]][["z22sb_vs_z22nosb"]], : could not find function "plot_volcano_de"
plotly::ggplotly(z22sb_vs_z22nosb_volcano[["plot"]])
## Error: object 'z22sb_vs_z22nosb_volcano' not found
z23sb_vs_z23nosb_volcano[["plot"]] +
  xlim(-8, 8) +
  ylim(0, 210)
## Error: object 'z23sb_vs_z23nosb_volcano' not found
pp(file = "images/z23sb_vs_z23nosb_reactome_up.png",
   image = all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["REAC"]],
   height = 12, width = 9)
## Warning in pp(file = "images/z23sb_vs_z23nosb_reactome_up.png", image =
## all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["REAC"]], : There is no
## device to shut down.
## Error in eval(expr, envir): R: improper image header `/lab/singularity/tmrc2_macrophage_deb/202509081525_outputs/images/z23sb_vs_z23nosb_reactome_up.png' @ error/png.c/ReadPNGImage/3941
all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["REAC"]]
## Reactome, zymodeme 2.3 with drug vs. zymodeme 2.3 without drug, up.
all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["KEGG"]]
## KEGG, zymodeme 2.3 with drug vs. zymodeme 2.3 without drug, up.
all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["MF"]]
## NULL
## MF, zymodeme 2.3 with drug vs. zymodeme 2.3 without drug, up.
all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme 2.3 with drug vs. zymodeme 2.3 without drug, up.
all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["WP"]]
## WikiPathways, zymodeme 2.3 with drug vs. zymodeme 2.3 without drug, up.
all_gp[["z23sb_vs_z23nosb_up"]][["interactive_plots"]][["WP"]]
all_gp[["z23sb_vs_z23nosb_down"]][["pvalue_plots"]][["REAC"]]
## Reactome, zymodeme 2.3 with drug vs. zymodeme 2.3 without drug, down.
all_gp[["z23sb_vs_z23nosb_down"]][["pvalue_plots"]][["MF"]]
## NULL
## MF, zymodeme 2.3 with drug vs. zymodeme 2.3 without drug, down.
all_gp[["z23sb_vs_z23nosb_down"]][["pvalue_plots"]][["TF"]]
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_col()`).
## TF, zymodeme 2.3 with drug vs. zymodeme 2.3 without drug, down.

11.9.2 z2.2 with and without drug

z22sb_vs_z22nosb_volcano[["plot"]] +
  xlim(-8, 8) +
  ylim(0, 210)
## Error: object 'z22sb_vs_z22nosb_volcano' not found
pp(file = "images/z22sb_vs_z22nosb_reactome_up.png",
   image = all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["REAC"]],
   height = 12, width = 9)
## Warning in pp(file = "images/z22sb_vs_z22nosb_reactome_up.png", image =
## all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["REAC"]], : There is no
## device to shut down.
## Error in eval(expr, envir): R: improper image header `/lab/singularity/tmrc2_macrophage_deb/202509081525_outputs/images/z22sb_vs_z22nosb_reactome_up.png' @ error/png.c/ReadPNGImage/3941
all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["REAC"]]
## Reactome, zymodeme 2.2 with drug vs. zymodeme 2.2 without drug, up.
all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["KEGG"]]
## KEGG, zymodeme 2.2 with drug vs. zymodeme 2.2 without drug, up.
all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["MF"]]
## NULL
## MF, zymodeme 2.2 with drug vs. zymodeme 2.2 without drug, up.
all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme 2.2 with drug vs. zymodeme 2.2 without drug, up.
all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["WP"]]
## WikiPathways, zymodeme 2.2 with drug vs. zymodeme 2.2 without drug, up.
all_gp[["z22sb_vs_z22nosb_up"]][["interactive_plots"]][["WP"]]
all_gp[["z22sb_vs_z22nosb_down"]][["pvalue_plots"]][["REAC"]]
## Reactome, zymodeme 2.2 with drug vs. zymodeme 2.2 without drug, down.
all_gp[["z22sb_vs_z22nosb_down"]][["pvalue_plots"]][["MF"]]
## NULL
## MF, zymodeme 2.2 with drug vs. zymodeme 2.2 without drug, down.
all_gp[["z22sb_vs_z22nosb_down"]][["pvalue_plots"]][["TF"]]
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_col()`).
## TF, zymodeme 2.2 with drug vs. zymodeme 2.2 without drug, down.

11.9.3 Shared and unique genes after/before drug

shared <- Vennerable::Venn(list(
  "z23" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z23sb_vs_z23nosb"]]),
  "z22" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z22sb_vs_z22nosb"]])))
pp(file = "images/z23_z22_drug_venn_up.png")
Vennerable::plot(shared)
dev.off()
## png 
##   2
Vennerable::plot(shared)

shared <- Vennerable::Venn(list(
  "z23" = rownames(hs_macr_sig[["deseq"]][["downs"]][["z23sb_vs_z23nosb"]]),
  "z22" = rownames(hs_macr_sig[["deseq"]][["downs"]][["z22sb_vs_z22nosb"]])))
pp(file = "images/z23_z22_drug_venn_down.png")
Vennerable::plot(shared)
dev.off()
## png 
##   2
Vennerable::plot(shared)

Note: I am setting the x and y-axis boundaries by allowing the plotter to pick its own axes the first time, writing down the ranges I observe, and then setting them to the largest of the pair. It is therefore possible that I missed one or more genes which lie outside that range.
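
One way to guard against the risk mentioned in this note is to compute the shared limits from both tables and then count how many points would fall outside them. The sketch below uses invented data and hypothetical names (`shared_limits`, `count_outside`); it does not touch the actual plotting functions.

```r
# Invented logFC and -log10(adjusted p) values for two contrasts.
tbl_a <- data.frame(logfc = c(-3.2, 0.4, 7.5), neglogp = c(12, 1, 180))
tbl_b <- data.frame(logfc = c(-6.1, 2.2, 4.0), neglogp = c(40, 3, 205))

# Symmetric x limits and a shared y maximum covering both tables.
shared_limits <- function(a, b) {
  x_max <- ceiling(max(abs(c(a[["logfc"]], b[["logfc"]]))))
  y_max <- ceiling(max(c(a[["neglogp"]], b[["neglogp"]])))
  list("x" = c(-x_max, x_max), "y" = c(0, y_max))
}

# Number of points which a plot with these limits would drop.
count_outside <- function(tbl, limits) {
  x <- tbl[["logfc"]]
  y <- tbl[["neglogp"]]
  outside_x <- x < limits[["x"]][1] | x > limits[["x"]][2]
  outside_y <- y < limits[["y"]][1] | y > limits[["y"]][2]
  sum(outside_x | outside_y)
}

limits <- shared_limits(tbl_a, tbl_b)
# Limits derived from the data drop nothing; hand-picked limits may.
dropped_derived <- count_outside(tbl_a, limits)
dropped_manual <- count_outside(tbl_b, list("x" = c(-5, 5), "y" = c(0, 210)))
print(c(derived = dropped_derived, manual = dropped_manual))
```

In ggplot2, xlim() and ylim() remove observations outside the range (with a warning), whereas coord_cartesian() zooms without discarding data, so the latter may be the safer choice when the limits are chosen by hand.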

The contrasts plotted in the earlier section on direct zymodeme comparisons sought to show changes between the two strains z2.3 and z2.2. In contrast, the volcano plots in this section directly compare each strain before/after drug treatment.

12 LRT of the Human Macrophage

A slightly different tack to examine the data is to perform a likelihood ratio test in order to look for trends which are shared among genes when examining different conditions in the data.

tmrc2_lrt_strain_drug <- deseq_lrt(hs_macr, interactor_column = "drug",
                                   interest_column = "macrophagezymodeme",
                                   factors = c("drug", "macrophagezymodeme"))
## converting counts to integer mode
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## -- replacing outliers and refitting for 38 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)
## estimating dispersions
## fitting model and testing
## rlog() may take a long time with 50 or more samples,
## vst() is a much faster transformation
## Working with 858 genes.
## Working with 855 genes after filtering: minc > 3
## Joining with `by = join_by(merge)`
## Joining with `by = join_by(merge)`

tmrc2_lrt_strain_drug[["cluster_data"]][["plot"]]
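
For readers less familiar with the likelihood ratio test itself, the following is a minimal single-gene sketch in base R. It is not what deseq_lrt() does internally: DESeq2 fits negative binomial models across all genes, while this toy uses a Poisson GLM on simulated counts purely to show the comparison of a full model against a reduced one. All names and values are invented.

```r
set.seed(1)
# Simulated design: two strains crossed with two drug states, 5 replicates each.
design <- expand.grid(rep = 1:5, strain = c("z22", "z23"), drug = c("none", "sb"))
# Simulated counts for one gene whose drug response differs by strain.
mu <- with(design, ifelse(strain == "z23" & drug == "sb", 200, 50))
design[["count"]] <- rpois(nrow(design), lambda = mu)

# The full model includes the interaction; the reduced model omits it.
full <- glm(count ~ strain * drug, family = poisson(), data = design)
reduced <- glm(count ~ strain + drug, family = poisson(), data = design)

# The LRT compares the drop in deviance to a chi-squared distribution
# with degrees of freedom equal to the number of dropped coefficients.
lrt <- anova(reduced, full, test = "LRT")
lrt_stat <- lrt[["Deviance"]][2]
lrt_p <- lrt[["Pr(>Chi)"]][2]
manual_p <- pchisq(deviance(reduced) - deviance(full), df = 1, lower.tail = FALSE)
print(c(statistic = lrt_stat, p = lrt_p, manual_p = manual_p))
```

A small p-value here says that the interaction term explains the counts better than the additive model alone, which is the per-gene question the LRT asks before the significant genes are clustered by pattern.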

13 Parasite

Let us consider for a moment differences among the parasite transcriptomes for the samples which were not drug treated.

One thing I did in the initial implementation of this document was to repeat the variable ‘up_genes’ for each comparison; I think this time I will make a different variable for each comparison so I can play with them a little further.

comparison <- "z23_vs_z22"
lp_macrophage_de <- all_pairwise(lp_macrophage_nosb, model_svs = "svaseq",
                                 model_fstring = "~ 0 + condition", filter = TRUE)
## z2.2 z2.3 
##   14   15
## Running normalize_se.
## Removing 119 low-count genes (8591 remaining).
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## I think this is failing? SummarizedExperiment
## Basic step 0/3: Transforming data.
## Running normalize_se.
## Setting 2387 entries to zero.
## This received a matrix of SVs.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## Warning in createContrastL(objFlt$formula, objFlt$data, L): Contrasts with only
## a single non-zero term are already evaluated by default.
## conditions
## z22 z23 
##  14  15
## conditions
## z22 z23 
##  14  15
## conditions
## z22 z23 
##  14  15
tmrc2_parasite_keepers <- list(
  "z23_vs_z22" = c("z23", "z22"))
lp_macrophage_table <- combine_de_tables(
  lp_macrophage_de, keepers = tmrc2_parasite_keepers,
  excel = glue("analyses/macrophage_de/de_tables/parasite_infection_de-v{ver}.xlsx"))
lp_macrophage_sig <- extract_significant_genes(
  lp_macrophage_table,
  excel = glue("analyses/macrophage_de/sig_tables/parasite_sig-v{ver}.xlsx"))

lp_macrophage_table[["plots"]][[comparison]][["deseq_vol_plots"]]

lp_macrophage_table[["plots"]][[comparison]][["deseq_ma_plots"]]

up_genes_z23z22 <- lp_macrophage_sig[["deseq"]][["ups"]][[comparison]]
dim(up_genes_z23z22)
## [1] 48 69
down_genes_z23z22 <- lp_macrophage_sig[["deseq"]][["downs"]][[comparison]]
dim(down_genes_z23z22)
## [1] 91 69
lp_z23sb_vs_z22sb_volcano <- plot_volcano_de(
  table = lp_macrophage_table[["data"]][["z23_vs_z22"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  shapes_by_state = FALSE, color_by = "fc",  label = 10, label_column = "hgnc_symbol")
## Error in plot_volcano_de(table = lp_macrophage_table[["data"]][["z23_vs_z22"]], : could not find function "plot_volcano_de"
plotly::ggplotly(lp_z23sb_vs_z22sb_volcano[["plot"]])
## Error: object 'lp_z23sb_vs_z22sb_volcano' not found
lp_z23sb_vs_z22sb_volcano[["plot"]]
## Error: object 'lp_z23sb_vs_z22sb_volcano' not found

14 GSVA

Note: The following block assumes one is able to download a fresh copy of msigDB, which I am not sure is possible within the constraints of a container (I mean it is trivial to do, but I am not sure if it is ok due to licensing). However, Broad provides a data package of a msigdb release. As a result, the following block will be repeated using that.

hs_infected <- subset_se(hs_macrophage, subset = "macrophagetreatment!='uninf'") %>%
  subset_se(subset = "macrophagetreatment!='uninf_sb'")
hs_gsva_c2 <- simple_gsva(hs_infected)
hs_gsva_c2_meta <- get_msigdb_metadata(hs_gsva_c2, msig_xml = "reference/msigdb_v7.2.xml")
hs_gsva_c2_sig <- get_sig_gsva_categories(
  hs_gsva_c2_meta,
  excel = glue("analyses/macrophage_de/gsva/hs_macrophage_gsva_c2_sig.xlsx"))
hs_gsva_c2_sig[["raw_plot"]]

hs_gsva_c7 <- simple_gsva(hs_infected, signature_category = "c7")
hs_gsva_c7_meta <- get_msigdb_metadata(hs_gsva_c7, msig_xml = "reference/msigdb_v7.2.xml")
hs_gsva_c7_sig <- get_sig_gsva_categories(
  hs_gsva_c7_meta,
  excel = glue("analyses/macrophage_de/gsva/hs_macrophage_gsva_c7_sig.xlsx"))
hs_gsva_c7_sig[["raw_plot"]]

14.1 Repeat using the GSVAdata package.

hs_infected <- subset_se(hs_macrophage, subset = "macrophagetreatment!='uninf'") %>%
  subset_se(subset = "macrophagetreatment!='uninf_sb'")
hs_gsva_c2 <- simple_gsva(hs_infected)
## Error: unable to find an inherited method for function 'annotation' for signature 'object = "SummarizedExperiment"'
##hs_gsva_c2_meta <- get_msigdb_metadata(hs_gsva_c2, msig_xml="reference/msigdb_v7.2.xml")
hs_gsva_c2_sig <- get_sig_gsva_categories(
  hs_gsva_c2,
  excel = glue("analyses/macrophage_de/gsva/hs_macrophage_gsva_c2_sig.xlsx"))
## Error: object 'hs_gsva_c2' not found
hs_gsva_c2_sig[["raw_plot"]]
## Error: object 'hs_gsva_c2_sig' not found
hs_gsva_c7 <- simple_gsva(hs_infected, signature_category = "c7")
## Error: unable to find an inherited method for function 'annotation' for signature 'object = "SummarizedExperiment"'
##hs_gsva_c7_meta <- get_msigdb_metadata(hs_gsva_c7, msig_xml="reference/msigdb_v7.2.xml")
hs_gsva_c7_sig <- get_sig_gsva_categories(
  hs_gsva_c7,
  excel = glue("analyses/macrophage_de/gsva/hs_macrophage_gsva_c7_sig.xlsx"))
## Error: object 'hs_gsva_c7' not found
hs_gsva_c7_sig[["raw_plot"]]
## Error: object 'hs_gsva_c7_sig' not found

15 Try out a new tool

Two reasons: Najib loves him some PCA, and this uses WikiPathways, which is something I think is neat.

Ok, I spent some time looking through the code and I have some problems with some of the design decisions.

Most importantly, it requires a data.frame() which has the following format:

  1. No rownames, instead column #1 is the sample ID.
  2. Columns 2-m are the categorical/survival/etc metrics.
  3. Columns m-n are 1 gene-per-column with log2 values.

But when I think about it, I think I get the idea: they want to be able to do modelling stuff more easily with response factors.
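
The layout described above can be sketched with a toy genes-by-samples matrix. This is a minimal base-R illustration with invented sample IDs and values; `expr`, `meta`, and `omics_input` are hypothetical names, and the input checks performed by pathwayPCA itself are not reproduced here.

```r
# Toy log2 expression matrix: genes in rows, samples in columns.
expr <- matrix(c(5.1, 6.3, 2.2, 2.9, 8.0, 7.4), nrow = 3, byrow = TRUE,
               dimnames = list(c("TLR4", "IL2", "IRF1"), c("s1", "s2")))
# Toy sample metadata with one categorical response.
meta <- data.frame(SampleID = c("s1", "s2"), zymodeme = c("z22", "z23"))

# Transpose so that each gene is a column, then move the sample ID from the
# rownames into the first column.
genes_df <- as.data.frame(t(expr))
genes_df <- cbind(SampleID = rownames(genes_df), genes_df)
rownames(genes_df) <- NULL

# Join on SampleID: ID first, response column(s) next, genes last.
omics_input <- merge(meta, genes_df, by = "SampleID")
print(omics_input)
```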

library(pathwayPCA)
library(rWikiPathways)
## 
## Attaching package: 'rWikiPathways'
## The following object is masked from 'package:edgeR':
## 
##     getCounts
downloaded <- downloadPathwayArchive(organism = "Homo sapiens", format = "gmt")
data_path <- system.file("extdata", package = "pathwayPCA")
wikipathways <- read_gmt(paste0(data_path, "/wikipathways_human_symbol.gmt"),
                         description = TRUE)

se <- subset_se(hs_macrophage, subset = "macrophagetreatment!='uninf'") %>%
  subset_se(subset = "macrophagetreatment!='uninf_sb'")
se <- set_conditions(se, fact = "macrophagezymodeme")
## The numbers of samples by condition are:
## 
## none  z22  z23 
##    0   29   29
symbol_column <- "hgnc_symbol"
symbol_vector <- rowData(se)[[symbol_column]]
names(symbol_vector) <- rownames(rowData(se))
symbol_df <- as.data.frame(symbol_vector)

assay_df <- merge(symbol_df, as.data.frame(assay(se)), by = "row.names")
assay_df[["Row.names"]] <- NULL
rownames(assay_df) <- make.names(assay_df[["symbol_vector"]], unique = TRUE)
assay_df[["symbol_vector"]] <- NULL
assay_df <- as.data.frame(t(assay_df))
assay_df[["SampleID"]] <- rownames(assay_df)
assay_df <- dplyr::select(assay_df, "SampleID", everything())
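
One side effect of make.names() in the block above is worth keeping in mind: it rewrites any character which is not valid in an R variable name, so hyphenated symbols gain a dot and empty symbols become "X". This may be why names such as HLA.DQB1 and X.1 appear in the messages printed later in this section. A small demonstration with a few example symbols follows; using make.unique() instead is only a suggestion and has not been tested against pathwayPCA.

```r
symbols <- c("HLA-A", "NKX2-5", "", "", "TLR4", "TLR4")

# make.names() replaces invalid characters with '.' and turns "" into "X";
# unique = TRUE then appends numeric suffixes to duplicates.
mangled <- make.names(symbols, unique = TRUE)
print(mangled)

# make.unique() only de-duplicates and leaves the original characters alone.
kept <- make.unique(symbols)
print(kept)
```

Genes with an empty symbol would still need separate handling, for example by dropping them before the table is transposed.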

factor_df <- as.data.frame(colData(se))
factor_df[["SampleID"]] <- rownames(factor_df)
factor_df <- dplyr::select(factor_df, "SampleID", everything())
factor_df <- factor_df[, c("SampleID", factors)]
## Error: object 'factors' not found
tt <- CreateOmics(
  assayData_df = assay_df,
  pathwayCollection_ls = wikipathways,
  response = factor_df,
  respType = "categorical",
  minPathSize = 5)
## 3190 genes have variance < epsilon and will be removed. These gene(s) include:
##   [1] "TNMD"         "CYP51A1"      "KRIT1"        "MAD1L1"       "ARF5"        
##   [6] "REXO5"        "FBXL3"        "REX1BD"       "KRT33A"       "TAC1"        
##  [11] "LGALS14"      "SLC13A2"      "TRAPPC6A"     "SELE"         "TFAP2B"      
##  [16] "SS18L2"       "IDS"          "SLC7A14"      "CLDN11"       "MDH1"        
##  [21] "COX15"        "MATR3"        "ISL1"         "INSRR"        "EFCAB1"      
##  [26] "TMSB10"       "OTC"          "HOXC8"        "XK"           "NOP16"       
##  [31] "TNFRSF17"     "GUCA1A"       "NNAT"         "NRIP2"        "MCOLN3"      
##  [36] "SERPINB3"     "MRPS24"       "SEZ6"         "AHRR"         "BORCS8.MEF2B"
##  [41] "KDM4A"        "THUMPD1"      "IFT80"        "ERLEC1"       "PAGE1"       
##  [46] "FRMPD1"       "LNX1"         "IPCEF1"       "ZNF37A"       "TUBA3D"      
##  [51] "SPAG5"        "EXOSC5"       "TIGAR"        "TP53INP2"     "LXN"         
##  [56] "AFM"          "CFHR2"        "UBA5"         "JMJD4"        "PCDHA6"      
##  [61] "PCDHGA2"      "C1QTNF3"      "RNF13"        "ZNF671"       "RRN3"        
##  [66] "CHERP"        "DIMT1"        "NME8"         "PIGS"         "DEFB127"     
##  [71] "FXYD3"        "CMTM1"        "FLT3LG"       "RBM27"        "ANGPT2"      
##  [76] "RNF31"        "SEMA4G"       "NUBP2"        "KCNK16"       "MAGEB2"      
##  [81] "MTAP"         "SERPIND1"     "DDT"          "SEC14L2"      "GGT1"        
##  [86] "PRODH"        "SOX10"        "TIMP3"        "PSMA3"        "SNW1"        
##  [91] "SERPINA4"     "PCK2"         "PRORP"        "TM9SF1"       "RAB5IF"      
##  [96] "CST9L"        "CST4"         "SPINT3"       "EPPIN"        "RBFA"        
## [101] "CEP76"        "H2BW2"        "MCTS2P"       "SRPX"         "F9"          
## [106] "PPP1R2C"      "BRS3"         "TIMP1"        "GLA"          "ACP5"        
## [111] "DHRS12"       "ZNF821"       "CMC2"         "ZNF174"       "CORO7.PAM16" 
## [116] "SALL1"        "AQP9"         "OIP5"         "ARHGEF10"     "CGB2"        
## [121] "CGB3"         "PPP1R37"      "RNASEH2A"     "OAZ1"         "C19orf44"    
## [126] "MED26"        "ZNF419"       "LGALS13"      "CEACAM5"      "BABAM1"      
## [131] "ATP1A3"       "ZNRF4"        "TMEM205"      "WDR83OS"      "PIK3R2"      
## [136] "PDE4C"        "DDX49"        "ERF"          "RASA4"        "TFPI2"       
## [141] "DUS4L"        "NPVF"         "WNT2"         "HOXA5"        "HOXA6"       
## [146] "CHN2"         "MINDY4"       "PSMA2"        "OGN"          "ASPN"        
## [151] "ECM2"         "EXOSC3"       "VSIR"         "RNF43"        "ASPA"        
## [156] "HOXB6"        "SLC16A6"      "RANGRF"       "VTN"          "FOXN1"       
## [161] "UNC119"       "ALDOC"        "ODAM"         "SMR3A"        "CHIC2"       
## [166] "IL2"          "CPZ"          "DBX1"         "SNX15"        "APOC3"       
## [171] "SCGB2A2"      "PTPMT1"       "CALCA"        "MYF6"         "MYF5"        
## [176] "PRR4"         "AKAP3"        "GSG1"         "OGFOD2"       "ART4"        
## [181] "MGP"          "FZD10"        "LPCAT3"       "GYS2"         "MAK"         
## [186] "ASF1A"        "IL17A"        "TSPO2"        "CCNC"         "HDGFL1"      
## [191] "OR12D3"       "MRPL2"        "TMCO6"        "PDE8B"        "IL5"         
## [196] "SMC4"         "NPHP3"        "BCHE"         "ABHD14B"      "ABHD14A.ACY1"
## [201] "TP53I3"       "INO80B"       "REG1A"        "KCNJ13"       "NEU2"        
## [206] "HSPE1"        "ABCB6"        "PNO1"         "ATP6V1B1"     "ANGPTL1"     
## [211] "NCF2"         "PRAMEF1"      "AGMAT"        "TNNI3K"       "TSNAX"       
## [216] "CRYGD"        "ZC2HC1B"      "CCND2"        "FGF23"        "TRIM32"      
## [221] "TGFB3"        "ZNF410"       "GPR75"        "IFIT3"        "NKX2.3"      
## [226] "IFIT2"        "HOXB8"        "HOXB5"        "CRHR1"        "HOXB1"       
## [231] "MLANA"        "IFNA6"        "GRIA2"        "LRP11"        "MAGEB4"      
## [236] "SLC25A2"      "GPR31"        "TNFSF11"      "TRIM6"        "TAS2R10"     
## [241] "IAPP"         "SLITRK3"      "CLCC1"        "GPSM2"        "OBP2A"       
## [246] "TBX22"        "PRM2"         "RWDD3"        "MRM2"         "SLC25A51"    
## [251] "DCAF10"       "WDR83"        "ACTRT1"       "TSFM"         "ORMDL2"      
## [256] "CDK2"         "KBTBD4"       "COL10A1"      "SERPINA7"     "H2BW1"       
## [261] "ESX1"         "B9D2"         "MC3R"         "GCNT7"        "ANKRD60"     
## [266] "C20orf85"     "TP53TG5"      "MAGEA10"      "FASTKD3"      "CRISP2"      
## [271] "AARS2"        "RPS10"        "APOBEC2"      "GCM2"         "TRIM51"      
## [276] "SCGB2A1"      "GPR18"        "IRF1"         "AMELX"        "NPBWR2"      
## [281] "BHLHE23"      "FOSB"         "DEFB126"      "FOXA2"        "NKX2.4"      
## [286] "NKX2.2"       "CSTL1"        "FLRT3"        "MGME1"        "TMEM74B"     
## [291] "CITED1"       "MMP24"        "TMEM115"      "KIRREL2"      "X.5"         
## [296] "STATH"        "HTN1"         "TIMM17B"      "EVI2A"        "OMG"         
## [301] "AVPR2"        "OMD"          "TAS2R3"       "TAS2R4"       "OR7A10"      
## [306] "SLC35E1"      "OR7C2"        "GNG13"        "EMC6"         "OR1E2"       
## [311] "FGL2"         "GNAZ"         "ADORA2A"      "TAS2R16"      "ATP6V1F"     
## [316] "LRRC4"        "LRRC17"       "FEZF1"        "MRPS12"       "HOXD1"       
## [321] "HAT1"         "HOXD9"        "HOXD13"       "ELL3"         "CALML4"      
## [326] "THAP10"       "ACKR4"        "SOX15"        "KLK8"         "NEDD8"       
## [331] "VCY1B"        "VCY"          "CDY2B"        "INS.IGF2"     "KCNA5"       
## [336] "ANGPTL8"      "ACE2"         "GDF1"         "MRPL34"       "LSM7"        
## [341] "ACSBG2"       "BMP15"        "ARPC1B"       "OR11H1"       "CALY"        
## [346] "PPAN"         "HSD17B3"      "PRRG1"        "BPIFA3"       "DEFB118"     
## [351] "GJA9"         "CDX4"         "NAPSA"        "PDLIM4"       "TMEM204"     
## [356] "KRT33B"       "FSHB"         "USP29"        "NR0B2"        "ACTR10"      
## [361] "ABHD12B"      "RTBDN"        "TRIM22"       "TIMM10B"      "SCLY"        
## [366] "FTHL17"       "NIP7"         "VPS4A"        "SCP2D1"       "SSTR4"       
## [371] "APCS"         "TOE1"         "PPP1R3D"      "BHMT2"        "ZBED3"       
## [376] "ANGPTL3"      "STOML3"       "IRS4"         "ERG28"        "GSC"         
## [381] "CAMK1"        "GSTM1"        "TSHB"         "GSTM3"        "FKBP11"      
## [386] "GRP"          "PRH2"         "SOX3"         "BIVM"         "ERCC5"       
## [391] "UGT2A3"       "CSN2"         "LACRT"        "GLS2"         "FAM186B"     
## [396] "BLOC1S1"      "ZC3H10"       "SLC26A10"     "MIP"          "CHST5"       
## [401] "HTR2B"        "TMBIM1"       "RCBTB2"       "KDELR2"       "CIDEB"       
## [406] "NKX2.8"       "NKX2.1"       "SRSF1"        "CHAD"         "GH2"         
## [411] "LIMD2"        "CFC1"         "OR13C9"       "ANGPTL2"      "TLR4"        
## [416] "HINT2"        "YIPF3"        "CLPS"         "FXYD6"        "SLTM"        
## [421] "LHCGR"        "SLC3A1"       "LBX1"         "CUZD1"        "RBP4"        
## [426] "GPR87"        "MSTN"         "CARF"         "IL21"         "GSTCD"       
## [431] "ZCRB1"        "NDUFA9"       "KERA"         "SYCP3"        "CCDC65"      
## [436] "NPFF"         "LPAR6"        "RNF113B"      "SSTR1"        "TSSK4"       
## [441] "RAB15"        "SERF2"        "FGF7"         "CELF6"        "IGSF6"       
## [446] "CHST4"        "ZSCAN32"      "CTRL"         "KIF2B"        "ADCYAP1"     
## [451] "MEP1B"        "SAT2"         "ZNF750"       "ELAC1"        "SLC39A3"     
## [456] "CCDC97"       "TMEM91"       "ZNF593"       "EVA1B"        "DMRTB1"      
## [461] "BARHL2"       "PIGM"         "CRNN"         "JTB"          "CREB3L4"     
## [466] "PYCR2"        "CHAC2"        "ANKRD53"      "TEX261"       "LIPT1"       
## [471] "LYG1"         "GPR17"        "PHOSPHO2"     "RPL32"        "LRTM1"       
## [476] "ZNF660"       "EIF2A"        "AMT"          "ECE2"         "SLC26A1"     
## [481] "CABS1"        "BHMT"         "KCNMB1"       "LRRTM2"       "SLC17A4"     
## [486] "H2BC1"        "HIGD2A"       "TCTE1"        "CLVS2"        "TAAR2"       
## [491] "TAAR6"        "TAAR8"        "WTAP"         "RBAK"         "FERD3L"      
## [496] "TMEM140"      "CLTRN"        "LANCL3"       "SYTL5"        "AKAP4"
## 1103 gene name(s) are invalid. Invalid name(s) include:
##   [1] "NME1.NME2"       "RTEL1.TNFRSF6B"  "STON1.GTF2A1L"   "X.1"            
##   [5] "PTGES3L.AARSD1"  "NKX3.2"          "X.2"             "TMEM189.UBE2V1" 
##   [9] "H1.3"            "X.3"             "H1.1"            "X.4"            
##  [13] "CHURC1.FNTB"     "X.6"             "H3.3B"           "ZNF670.ZNF695"  
##  [17] "X.7"             "ERVK3.1"         "X.8"             "X.9"            
##  [21] "NKX6.2"          "X.10"            "H3.3A"           "NKX6.1"         
##  [25] "NKX6.3"          "NKX3.1"          "X.11"            "X.12"           
##  [29] "H3.4"            "H1.4"            "JMJD7.PLA2G4B"   "X.14"           
##  [33] "KRTAP4.4"        "RAB4B.EGLN2"     "X.15"            "X.16"           
##  [37] "H1.8"            "HLA.DQB1"        "X.17"            "NKX2.6"         
##  [41] "KRTAP9.7"        "KRTAP11.1"       "NKX2.5"          "KRTAP8.1"       
##  [45] "X.19"            "KRTAP19.1"       "H1.5"            "KRTAP6.1"       
##  [49] "H1.10"           "KRTAP5.5"        "KRTAP17.1"       "KRTAP21.2"      
##  [53] "H1.7"            "X.20"            "H1.6"            "H1.2"           
##  [57] "X.21"            "H3.5"            "X.22"            "H1.0"           
##  [61] "HLA.DRB1"        "KRTAP5.3"        "HLA.DQA1"        "X.23"           
##  [65] "H4.16"           "X.25"            "KRTAP4.1"        "HLA.DRB5"       
##  [69] "MT.ND6"          "MT.CO2"          "MT.CYB"          "MT.ND2"         
##  [73] "MT.ND5"          "MT.CO1"          "MT.ND3"          "MT.ND4"         
##  [77] "MT.ND1"          "MT.ATP6"         "MT.CO3"          "X.26"           
##  [81] "HLA.DOA"         "HLA.DMA"         "HLA.DRA"         "X.28"           
##  [85] "HLA.C"           "KRTAP5.11"       "KRTAP5.10"       "HLA.E"          
##  [89] "HLA.G"           "HLA.F"           "X.29"            "CLLU1.AS1"      
##  [93] "X.30"            "TRBV20OR9.2"     "KRTAP5.6"        "KRTAP5.2"       
##  [97] "KRTAP5.1"        "KRTAP19.8"       "HLA.A"           "X.31"           
## [101] "IGKV4.1"         "IGKV6.21"        "IGKV3D.20"       "IGLV10.54"      
## [105] "IGLV5.52"        "IGLV1.51"        "IGLV1.50"        "IGLV1.47"       
## [109] "IGLV7.46"        "IGLV1.44"        "IGLV7.43"        "IGLV1.40"       
## [113] "IGLV3.25"        "IGLV2.23"        "IGLV3.21"        "IGLV3.19"       
## [117] "IGLV3.16"        "IGLV2.14"        "IGLV2.11"        "IGLV3.9"        
## [121] "IGLV3.1"         "TRBV7.3"         "TRBV5.3"         "TRBV10.1"       
## [125] "TRBV6.5"         "TRBV6.6"         "TRBV7.6"         "TRBV5.1"        
## [129] "TRBV20.1"        "TRBV24.1"        "TRBJ2.1"         "TRBJ2.2P"       
## [133] "TRBJ2.3"         "TRBJ2.6"         "TRBJ2.7"         "TRAV12.3"       
## [137] "TRAV8.7"         "IGHD3.10"        "IGHV6.1"         "IGHV1.2"        
## [141] "IGHV1.3"         "IGHV2.5"         "IGHV3.7"         "IGHV3.11"       
## [145] "IGHV3.13"        "IGHV3.15"        "IGHV1.18"        "IGHV3.20"       
## [149] "IGHV3.21"        "IGHV3.23"        "IGHV1.24"        "IGHV2.26"       
## [153] "IGHV4.28"        "IGHV3.33"        "IGHV4.34"        "IGHV4.39"       
## [157] "IGHV3.49"        "IGHV5.51"        "IGHV3.66"        "IGHV3.73"       
## [161] "KRTAP16.1"       "KRTAP3.3"        "KRTAP3.2"        "MT.ND4L"        
## [165] "X.32"            "ZNF625.ZNF20"    "ERV3.1"          "X.33"           
## [169] "RPL17.C18orf32"  "KRTAP1.5"        "ZNF816.ZNF321P"  "IGHV3.64"       
## [173] "HLA.DPB1"        "IGHV4.59"        "IGHV3.74"        "APOC4.APOC2"    
## [177] "X.36"            "X.38"            "ERVMER34.1"      "X.39"           
## [181] "MT.ATP8"         "IGKV3D.7"        "TRBV5.4"         "X.40"           
## [185] "IGKV1OR2.108"    "HLA.DPA1"        "IGHV3.43"        "HLA.DQB2"       
## [189] "TRBV29.1"        "X.41"            "IGKV3OR2.268"    "HLA.B"          
## [193] "HNRNPUL2.BSCL2"  "NKX1.1"          "X.44"            "HLA.DQA2"       
## [197] "IGKV2D.30"       "IGKV1D.8"        "IGKV1.6"         "X.47"           
## [201] "IGKV3.20"        "IGKV1D.33"       "IGKV1.17"        "IGKV1.8"        
## [205] "IGKV1.16"        "HLA.DOB"         "KRTAP5.8"        "IGKV2.24"       
## [209] "IGKV3.11"        "X.48"            "KRTAP5.4"        "IGKV1.9"        
## [213] "X.50"            "IGKV1.33"        "IGKV1.39"        "IGKV2D.28"      
## [217] "HLA.DMB"         "IGKV1D.17"       "ERVW.1"          "PPAN.P2RY11"    
## [221] "IGKV2.30"        "IGKV2D.29"       "IGKV1.12"        "IGKV1.5"        
## [225] "X.51"            "X.52"            "DNAJC25.GNG10"   "KRTAP5.7"       
## [229] "IGKV3.15"        "KRTAP4.2"        "IGKV1.27"        "TRIM39.RPP21"   
## [233] "X.54"            "PRR5.ARHGAP8"    "STIMATE.MUSTN1"  "RBM14.RBM4"     
## [237] "LY75.CD302"      "X.55"            "X.56"            "TNFSF12.TNFSF13"
## [241] "ATP5MF.PTCD1"    "X.57"            "EPPIN.WFDC6"     "X.58"           
## [245] "X.59"            "X.60"            "X.61"            "X.62"           
## [249] "X.63"            "X.64"            "RNF103.CHMP3"    "X.66"           
## [253] "ARPIN.AP3S2"     "ARPC4.TTLL3"     "X.67"            "X.68"           
## [257] "X.69"            "LY6G6F.LY6G6D"   "X.70"            "CCDC169.SOHLH2" 
## [261] "NT5C1B.RDH14"    "X.71"            "X.72"            "X.73"           
## [265] "TMED7.TICAM2"    "X.74"            "MSANTD3.TMEFF1"  "X.75"           
## [269] "CENPS.CORT"      "X.76"            "X.77"            "TRBV7.4"        
## [273] "CHKB.CPT1B"      "X.78"            "X.79"            "X.80"           
## [277] "X.81"            "X.82"            "X.83"            "CKLF.CMTM1"     
## [281] "ATP6V1G2.DDX39B" "INMT.MINDY4"     "X.85"            "STX16.NPEPL1"   
## [285] "KRTAP5.9"        "X.86"            "SAA2.SAA4"       "ZFP91.CNTF"     
## [289] "X.87"            "MSH5.SAPCD1"     "FXYD6.FXYD2"     "X.88"           
## [293] "X.89"            "X.90"            "X.91"            "X.92"           
## [297] "X.93"            "NEDD8.MDP1"      "TRAV1.1"         "X.94"           
## [301] "X.96"            "X.97"            "KLRC4.KLRK1"     "X.98"           
## [305] "X.99"            "X.100"           "X.101"           "X.102"          
## [309] "X.103"           "X.104"           "TRAV1.2"         "X.107"          
## [313] "X.108"           "X.109"           "X.110"           "X.111"          
## [317] "X.112"           "X.113"           "SLCO1B3.SLCO1B7" "X.114"          
## [321] "X.115"           "X.117"           "X.118"           "X.120"          
## [325] "X.121"           "X.122"           "RPL36A.HNRNPH2"  "X.123"          
## [329] "X.124"           "P2RX5.TAX1BP3"   "X.125"           "X.126"          
## [333] "X.127"           "PPT2.EGFL8"      "X.129"           "X.130"          
## [337] "X.131"           "X.132"           "X.133"           "X.134"          
## [341] "SPECC1L.ADORA2A" "BCL2L2.PABPN1"   "X.137"           "X.138"          
## [345] "PINX1.1"         "X.140"           "X.141"           "X.142"          
## [349] "X.143"           "UBE2F.SCLY"      "X.144"           "FPGT.TNNI3K"    
## [353] "BLOC1S5.TXNDC5"  "X.146"           "POC1B.GALNT4"    "NDUFC2.KCTD14"  
## [357] "X.147"           "ZHX1.C8orf76"    "X.150"           "ST20.MTHFS"     
## [361] "X.151"           "TGIF2.RAB5IF"    "X.152"           "X.153"          
## [365] "X.155"           "X.156"           "X.158"           "X.159"          
## [369] "X.160"           "X.161"           "X.162"           "PMF1.BGLAP"     
## [373] "X.163"           "X.164"           "X.165"           "X.166"          
## [377] "X.169"           "X.170"           "X.171"           "X.172"          
## [381] "X.173"           "X.174"           "X.175"           "X.176"          
## [385] "TEN1.CDK3"       "X.177"           "X.178"           "X.179"          
## [389] "X.180"           "X.181"           "X.182"           "ISY1.RAB43"     
## [393] "X.183"           "X.184"           "X.185"           "X.186"          
## [397] "TMEM256.PLSCR3"  "X.191"           "X.192"           "X.193"          
## [401] "X.195"           "X.196"           "LINC02210.CRHR1" "X.197"          
## [405] "X.198"           "X.200"           "X.201"           "X.202"          
## [409] "X.203"           "X.205"           "CFAP298.TCP10L"  "X.206"          
## [413] "EEF1E1.BLOC1S5"  "X.207"           "X.208"           "X.209"          
## [417] "X.210"           "X.211"           "X.213"           "X.214"          
## [421] "X.215"           "X.216"           "X.217"           "X.218"          
## [425] "X.221"           "X.222"           "X.223"           "X.224"          
## [429] "X.225"           "X.226"           "X.227"           "X.228"          
## [433] "X.229"           "X.230"           "X.231"           "X.232"          
## [437] "X.233"           "X.234"           "X.235"           "X.236"          
## [441] "X.237"           "X.238"           "X.239"           "X.240"          
## [445] "X.241"           "X.242"           "X.244"           "X.245"          
## [449] "X.246"           "X.247"           "X.248"           "X.249"          
## [453] "X.250"           "X.251"           "X.252"           "X.253"          
## [457] "X.254"           "X.255"           "X.256"           "X.257"          
## [461] "X.258"           "X.259"           "X.260"           "X.261"          
## [465] "X.263"           "X.264"           "X.266"           "X.267"          
## [469] "X.268"           "X.269"           "X.270"           "MIA.RAB4B"      
## [473] "X.271"           "X.272"           "X.273"           "X.274"          
## [477] "X.275"           "X.276"           "X.277"           "X.279"          
## [481] "X.280"           "X.281"           "X.283"           "X.285"          
## [485] "X.286"           "X.288"           "ARHGAP19.SLIT1"  "COMMD3.BMI1"    
## [489] "ZNF559.ZNF177"   "X.289"           "TSNAX.DISC1"     "X.290"          
## [493] "X.291"           "X.292"           "BORCS7.ASMT"     "IGHV3.30"       
## [497] "URGCP.MRPS24"    "RPS10.NUDT3"     "TLCD4.RWDD3"     "X.293"
## These genes may be excluded from analysis. Proper gene names
## contain alphanumeric characters only, and start with a letter.
## Warning in CheckSampleIDs(assayData_df): Row names will be ignored. Sample IDs must be in the first column of the data
##   frame.
## Error in .convertPhenoDF(response, type = respType): Regression and categorical data must be a data frame with two columns, sample ID
##   and response, in exactly that order.
super <- AESPCA_pVals(
  object = tt,
  numPCs = 2,
  parallel = FALSE,
  numCores = 8,
  numReps = 2,
  adjustment = "BH")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'AESPCA_pVals': object 'tt' not found
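
The messages above suggest three changes before the input will be accepted: sample IDs in the first column rather than in the row names, a two-column response data frame (sample ID, then response), and gene names that are purely alphanumeric and start with a letter. The following is a minimal sketch of that reshaping in base R. The helper `shape_for_pathway_pca()` and its argument names are hypothetical (they are not part of pathwayPCA or hpgltools), and it has not been run against the real data.

```r
## Hypothetical helper: reshape a genes x samples matrix into the layout the
## messages above ask for (samples in rows, sample IDs in the first column).
shape_for_pathway_pca <- function(expr_mtrx, response) {
  ## Keep only gene names that are alphanumeric and start with a letter.
  valid_idx <- grepl(pattern = "^[A-Za-z][A-Za-z0-9]*$", x = rownames(expr_mtrx))
  kept <- expr_mtrx[valid_idx, , drop = FALSE]
  ## Transpose so that samples are rows, and put the sample IDs first.
  assay_df <- data.frame(sampleID = colnames(kept), t(kept),
                         check.names = FALSE, stringsAsFactors = FALSE)
  rownames(assay_df) <- NULL
  ## Two columns, sample ID and response, in exactly that order.
  response_df <- data.frame(sampleID = colnames(kept),
                            response = unname(response[colnames(kept)]),
                            stringsAsFactors = FALSE)
  rownames(response_df) <- NULL
  list(assay = assay_df, response = response_df,
       dropped = rownames(expr_mtrx)[!valid_idx])
}

## Toy input; the counts and sample names are made up for illustration.
toy_mtrx <- matrix(1:8, nrow = 4,
                   dimnames = list(c("IFI27", "HLA.A", "X.1", "OASL"),
                                   c("s1", "s2")))
toy_response <- c(s1 = "z23", s2 = "z22")
shaped <- shape_for_pathway_pca(toy_mtrx, toy_response)
shaped[["dropped"]]
```

Dropping the invalid names is the bluntest option; renaming them would keep genes such as the HLA and MT families in the analysis.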

16 Evaluating a log2FC barplot

Figure 2E now consists of a plot showing log2FC values with error bars for selected genes; it seeks to show the differences between 2.3/uninfected and 2.2/uninfected.

Here is the table Olga used to generate it:

I went looking in the xlsx files produced in 202405 and found that these are the log2FC values and standard errors produced by DESeq2.

It should be noted that in my most recent version of these analyses, these numbers did shift slightly. I am looking into that now.

16.1 Data without drug

| Gene | Mean (2.3 vs Uninfected MØ) | SEM | n | Mean (2.2 vs Uninfected MØ) | SEM | n |
|------|------|------|------|------|------|------|
| IFI27 | 7.224 | 0.5662 | 6 | 2.702 | 0.5669 | 6 |
| RSAD2 | 6.29 | 0.7312 | 6 | 1.623 | 0.7303 | 6 |
| CCL8 | 6.225 | 0.928 | 6 | -0.314 | 0.941 | 6 |
| IFI44L | 5.895 | 0.612 | 6 | 2.06 | 0.611 | 6 |
| OASL | 4.726 | 0.4974 | 6 | 1.392 | 0.4973 | 6 |
| USP18 | 3.644 | 0.483 | 6 | 0.999 | 0.4826 | 6 |
| IDO1 | 7.145 | 1.107 | 6 | 1.257 | 1.141 | 6 |
| IDO2 | 3.935 | 1.3 | 6 | 2.557 | 1.341 | 6 |
| KYNU | 1.07 | 0.2186 | 6 | 0.0207 | 0.2184 | 6 |
| AHR | 0.9382 | 0.2236 | 6 | 0.5032 | 0.2239 | 6 |
| IL4I1 | 2.593 | 0.4623 | 6 | 0.039 | 0.4618 | 6 |
| SOD2 | 2.76 | 0.349 | 6 | 0.4241 | 0.3528 | 6 |
| NOTCH1 | 0.7572 | 0.275 | 6 | 1.495 | 0.2744 | 6 |
| DLL1 | 0.8268 | 0.5285 | 6 | 3.455 | 0.5228 | 6 |
| DLL4 | 1.116 | 0.737 | 6 | 4.243 | 0.71 | 6 |
| HES1 | -0.0183 | 0.8599 | 6 | 6.536 | 0.7973 | 6 |
| HEY1 | 0.5533 | 0.5789 | 6 | 4.181 | 0.6273 | 6 |

Ok, I think I found a problem: The NOTCH1 value is actually the adjusted p-value.

16.2 Transporters without drug

| Gene | Mean (2.3 vs Uninfected MØ) | SEM | n | Mean (2.2 vs Uninfected MØ) | SEM | n |
|------|------|------|------|------|------|------|
| ABCB1 | -2.354 | 0.442 | 6 | -0.406 | 0.431 | 6 |
| ABCG4 | -3.715 | 0.648 | 6 | -0.653 | 0.630 | 6 |
| ABCB5 | -1.192 | 0.380 | 6 | 1.351 | 0.363 | 6 |
| ABCA9 | 1.880 | 0.648 | 6 | 3.444 | 0.637 | 6 |
| ABCC2 | 0.454 | 0.321 | 6 | 1.818 | 0.314 | 6 |
| AQP2 | -1.191 | 0.529 | 6 | 0.745 | 0.514 | 6 |
| AQP3 | -0.940 | 0.402 | 6 | 0.431 | 0.395 | 6 |

16.3 Transporters with drug

| Gene | Mean (2.3 vs Uninfected MØ) | SEM | n | Mean (2.2 vs Uninfected MØ) | SEM | n |
|------|------|------|------|------|------|------|
| ABCB1 | -0.697 | 0.349 | 6 | -1.255 | 0.337 | 6 |
| ABCG4 | 1.231 | 0.503 | 6 | 0.547 | 0.484 | 6 |
| AQP2 | 0.816 | 0.399 | 6 | 0.043 | 0.387 | 6 |
| AQP3 | -1.286 | 0.320 | 6 | -1.613 | 0.309 | 6 |
| AQP8 | 0.634 | 0.370 | 6 | 0.943 | 0.365 | 6 |

Let us now see if I can recapitulate the plot…

nodrug_contrasts <- c("z23nosb_vs_uninf", "z22nosb_vs_uninf")
genes_no_drug <- c("IFI27", "RSAD2", "CCL8", "IFI44L", "OASL", "USP18", "IDO1", "IDO2", "KYNU", "AHR", "IL4I1", "SOD2", "NOTCH1", "DLL1", "DLL4", "HES1", "HEY1")
transporters_no_drug <- c("ABCB1", "ABCG4", "ABCB5", "ABCA9", "ABCC2", "AQP2", "AQP3")
drug_contrasts <- c("z23sb_vs_sb", "z22sb_vs_sb")
transporters_drug <- c("ABCB1", "ABCG4", "AQP2", "AQP3", "AQP8")

These values came out of the data structure called ‘hs_macr_table’.

z23nosb_uninf_values <- hs_macr_table[["data"]][["z23nosb_vs_uninf"]]
gene_idx <- z23nosb_uninf_values[["hgnc_symbol"]] %in% genes_no_drug
nodrug_rows <-  z23nosb_uninf_values[gene_idx, ]
rownames(nodrug_rows) <- nodrug_rows[["hgnc_symbol"]]
z23_nodrug_values <- nodrug_rows[, c("deseq_logfc", "deseq_lfcse")]
z23_nodrug_values
## DataFrame with 17 rows and 2 columns
##       deseq_logfc deseq_lfcse
##         <numeric>   <numeric>
## IL4I1     2.59300      0.4623
## AHR       0.93810      0.2236
## CCL8      6.22500      0.9280
## SOD2      2.76000      0.3490
## HES1     -0.01786      0.8599
## ...           ...         ...
## HEY1       0.5531      0.6520
## IFI27      7.2240      0.5662
## USP18      3.6440      0.4830
## IDO2       3.9340      1.2990
## DLL1       0.8268      0.5284
z22nosb_uninf_values <- hs_macr_table[["data"]][["z22nosb_vs_uninf"]]
gene_idx <- z22nosb_uninf_values[["hgnc_symbol"]] %in% genes_no_drug
nodrug_rows <-  z22nosb_uninf_values[gene_idx, ]
rownames(nodrug_rows) <- nodrug_rows[["hgnc_symbol"]]
z22_nodrug_values <- nodrug_rows[, c("deseq_logfc", "deseq_lfcse")]
z22_nodrug_values
## DataFrame with 17 rows and 2 columns
##       deseq_logfc deseq_lfcse
##         <numeric>   <numeric>
## IL4I1     0.03995      0.4618
## AHR       0.50310      0.2239
## CCL8     -0.31360      0.9406
## SOD2      0.42410      0.3528
## HES1      6.53600      0.7973
## ...           ...         ...
## HEY1        4.181      0.6273
## IFI27       2.702      0.5669
## USP18       0.999      0.4826
## IDO2        2.557      1.3410
## DLL1        3.455      0.5228
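z23_nodrug_values[["state"]] <- "z23_vs_uninfected"
z22_nodrug_values[["state"]] <- "z22_vs_uninfected"
## Record the gene names before binding: rbind() makes duplicated rownames
## unique (e.g. "AHR" becomes "AHR1"), which would give each gene two x positions.
z23_nodrug_values[["gene"]] <- rownames(z23_nodrug_values)
z22_nodrug_values[["gene"]] <- rownames(z22_nodrug_values)
plot_df <- rbind.data.frame(as.data.frame(z23_nodrug_values), as.data.frame(z22_nodrug_values))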

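## I just realized that this is actually just a comparison of z23/z22;
## we should just take the adjusted p-values from that contrast for this.
z23_z22_comparison <- hs_macr_table[["data"]][["z23nosb_vs_z22nosb"]]
## Recompute the index against this table rather than reusing the one from z22nosb_vs_uninf.
gene_idx <- z23_z22_comparison[["hgnc_symbol"]] %in% genes_no_drug
nodrug_rows <- z23_z22_comparison[gene_idx, ]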
nodrug_pvalues <- nodrug_rows[, c("deseq_p", "deseq_adjp")]
rownames(nodrug_pvalues) <- nodrug_rows[["hgnc_symbol"]]
nodrug_pvalues
## DataFrame with 17 rows and 2 columns
##         deseq_p deseq_adjp
##       <numeric>  <numeric>
## IL4I1 1.250e-13  3.949e-12
## AHR   8.308e-03  2.421e-02
## CCL8  3.677e-21  4.197e-19
## SOD2  6.181e-20  5.813e-18
## HES1  9.422e-38  2.215e-34
## ...         ...        ...
## HEY1  9.854e-17  5.410e-15
## IFI27 6.486e-28  2.310e-25
## USP18 1.772e-13  5.467e-12
## IDO2  1.047e-01  1.895e-01
## DLL1  4.352e-12  1.103e-10
ggplot(plot_df, aes(x = gene, y = deseq_logfc, fill = state)) +
  geom_bar(position = position_dodge(), stat = "identity") +
  geom_errorbar(aes(ymin = deseq_logfc - deseq_lfcse,
                    ymax = deseq_logfc + deseq_lfcse),
                width = 0.2, position = position_dodge(0.9)) +
  scale_fill_manual(values = c("#1B9E77", "#7570B3")) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

comparison <- c("z23_vs_uninfected", "z22_vs_uninfected")
comparisons <- rep(list(comparison), nrow(plot_df) / 2)
ggplot(plot_df, aes(x = gene, y = deseq_logfc, fill = state, add = deseq_lfcse, facet.by = "state")) +
  geom_bar(position = position_dodge(), stat = "identity") +
  geom_errorbar(aes(ymin = deseq_logfc - deseq_lfcse,
                    ymax = deseq_logfc + deseq_lfcse),
                width = 0.2, position = position_dodge(0.9)) +
  stat_compare_means() +
  stat_compare_means(comparisons = comparisons, label.y = rownames(z23_nodrug_values)) +
  scale_fill_manual(values = c("#1B9E77", "#7570B3")) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
## Error in stat_compare_means(): could not find function "stat_compare_means"
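
The error above arises because stat_compare_means() is provided by ggpubr, which has not been attached at this point. Separately, plot_df holds a single log2FC per gene and state, so there are no replicate values from which a test could be computed. One alternative, sketched below under those assumptions, is to label each gene using the adjusted p-values already taken from the z23/z22 contrast. The helper `adjp_to_label()`, the 0.05/0.01/0.001 cutoffs, and the 0.3 vertical offset are illustrative choices of mine; the numbers are copied from the printed output above for three genes.

```r
## Hypothetical helper: turn adjusted p-values into conventional star labels.
adjp_to_label <- function(adjp, cutoffs = c(0.05, 0.01, 0.001)) {
  labels <- rep("ns", length(adjp))
  labels[which(adjp < cutoffs[1])] <- "*"
  labels[which(adjp < cutoffs[2])] <- "**"
  labels[which(adjp < cutoffs[3])] <- "***"
  labels
}

## Three genes copied from the printed tables above.
plot_values <- data.frame(
  gene = rep(c("IL4I1", "AHR", "IDO2"), times = 2),
  state = rep(c("z23_vs_uninfected", "z22_vs_uninfected"), each = 3),
  deseq_logfc = c(2.593, 0.9381, 3.934, 0.03995, 0.5031, 2.557),
  deseq_lfcse = c(0.4623, 0.2236, 1.299, 0.4618, 0.2239, 1.341))
label_df <- data.frame(
  gene = c("IL4I1", "AHR", "IDO2"),
  adjp = c(3.949e-12, 2.421e-02, 1.895e-01))
label_df[["label"]] <- adjp_to_label(label_df[["adjp"]])

## Place each label just above the taller of the two error bars for its gene.
tops <- tapply(plot_values[["deseq_logfc"]] + plot_values[["deseq_lfcse"]],
               plot_values[["gene"]], max)
label_df[["y"]] <- as.numeric(tops[label_df[["gene"]]]) + 0.3
label_df

## With ggplot2 attached, the labels could then be added to the barplot via:
##   geom_text(data = label_df, aes(x = gene, y = y, label = label),
##             inherit.aes = FALSE)
```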

Excellent, the values now match up. Now I just need to figure out why the stupid hgnc IDs got lost… I can see them in the hs_annot data structure, so I must have messed up when I regenerated the input to the DE. OK, I have now reached the same starting point with identical values. As soon as I did that, I looked at the resulting plot and realized that we are actually just comparing z23 / z22.

Here is why: the plot as it stands is a comparison of the log2FC values of the following two contrasts: z23/uninfected and z22/uninfected; stated differently, this is (z23/uninf)/(z22/uninf) which of course cancels out to just z23/z22.

Therefore it is much more parsimonious to just use the values from z23/z22. I swear I have gone through this exact exercise on so so many occasions in the past it is terrible.
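
The cancellation is easiest to see on the log scale, where the ratio of ratios becomes a difference of log2FC values. A minimal numeric sketch follows. The three expression values are made up for illustration, and the last lines merely subtract the two IFI27 log2FC values from the table above; that difference is the z23/z22 log2FC implied by the table, not a value read from the z23/z22 contrast itself.

```r
## Hypothetical mean expression values for one gene; not taken from the data.
uninf <- 50
z23 <- 400
z22 <- 100

lfc_z23_uninf <- log2(z23 / uninf)
lfc_z22_uninf <- log2(z22 / uninf)
lfc_z23_z22 <- log2(z23 / z22)

## Subtracting the two uninfected-relative log2FCs removes the shared denominator.
difference <- lfc_z23_uninf - lfc_z22_uninf
all.equal(difference, lfc_z23_z22)

## Same arithmetic with the IFI27 entries of the table above.
ifi27_implied <- 7.224 - 2.702
ifi27_implied
```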

16.4 ggsignificance of the immune modulators

wanted_genes <- c("IFI27", "RSAD2", "CCL8", "IFI44L", "OASL",
                  "USP18", "IDO1", "IDO2", "KYNU", "AHR", "IL4I1",
                  "SOD2", "NOTCH1", "DLL1", "DLL4", "HES1", "HEY1")
modulator_plot <- ggsignif_paired_genes(
  hs_macr, conditions = c("inf_z23", "inf_z22"), genes = wanted_genes)
## Running normalize_se.
## Warning in normalize_se(exp, ...): Quantile normalization and sva do not always
## play well together.
## Removing 9725 low-count genes (11756 remaining).
## transform_counts: Found 2226 values less than 0.
## Warning in transform_counts(count_table, method = transform, ...): NaNs
## produced
## Setting 34233 entries to zero.
## Using Row.names, ensembl_gene_id, ensembl_transcript_id, description, gene_biotype, cds_length, chromosome_name, strand, hgnc_symbol, transcript as id variables
## Error in ggsignif_paired_genes(hs_macr, conditions = c("inf_z23", "inf_z22"), : object 'merged' not found
modulator_plot
## Error: object 'modulator_plot' not found

16.5 ggsignificance of the transporters

## First line is without drug
wanted_genes <- c("ABCB1", "ABCG4", "ABCB5", "AQP2", "AQP3",
                  ## with drug
                  "ABCB1", "ABCG4", "AQP2", "AQP3", "AQP8")
transporter_plot <- ggsignif_paired_genes(
  hs_macr, conditions = c("inf_z23", "inf_z22"), genes = wanted_genes)
## Running normalize_se.
## Warning in normalize_se(exp, ...): Quantile normalization and sva do not always
## play well together.
## Removing 9725 low-count genes (11756 remaining).
## transform_counts: Found 2226 values less than 0.
## Warning in transform_counts(count_table, method = transform, ...): NaNs
## produced
## Setting 34233 entries to zero.
## Error in `dplyr::arrange()` at magrittr/R/pipe.R:136:3:
## i In argument: `..1 = factor(hgnc_symbol, levels = genes)`.
## Caused by error in `levels<-`:
## ! factor level [6] is duplicated
transporter_plot
## Error: object 'transporter_plot' not found
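
The "factor level [6] is duplicated" error appears to come from wanted_genes itself: the with-drug genes repeat four of the without-drug genes, and factor() refuses duplicated levels. A minimal base R sketch of the failure and one possible workaround (deduplicating before the vector is used as levels) follows. Whether ggsignif_paired_genes() then proceeds is not something I have checked.

```r
wanted_genes <- c("ABCB1", "ABCG4", "ABCB5", "AQP2", "AQP3",
                  ## with drug
                  "ABCB1", "ABCG4", "AQP2", "AQP3", "AQP8")

## Reproduce the failure: the sixth entry repeats the first.
dup_error <- tryCatch(
  factor(wanted_genes, levels = wanted_genes),
  error = function(e) conditionMessage(e))
dup_error

## unique() keeps the first occurrence of each gene, so the order is preserved.
unique_genes <- unique(wanted_genes)
gene_factor <- factor(wanted_genes, levels = unique_genes)
levels(gene_factor)
```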
pander::pander(sessionInfo())
## Warning: Your system is mis-configured: '/etc/localtime' is not a symlink
## Warning: It is strongly recommended to set envionment variable TZ to
## 'America/New_York' (or equivalent)

R version 4.5.0 (2025-04-11)

Platform: x86_64-pc-linux-gnu

locale: C

attached base packages: grid, stats4, stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: rWikiPathways(v.1.28.0), pathwayPCA(v.1.24.0), Rgraphviz(v.2.52.0), graph(v.1.86.0), SparseM(v.1.84-2), topGO(v.2.60.1), GSVAdata(v.1.44.0), org.Hs.eg.db(v.3.21.0), AnnotationDbi(v.1.70.0), IRanges(v.2.42.0), S4Vectors(v.0.46.0), Biobase(v.2.68.0), BiocGenerics(v.0.54.0), generics(v.0.1.4), ReactomePA(v.1.52.0), edgeR(v.4.6.3), ruv(v.0.9.7.1), ggstatsplot(v.0.13.1), enrichplot(v.1.28.4), tidyr(v.1.3.1), tibble(v.3.3.0), UpSetR(v.1.4.0), hpgltools(v.1.2), Heatplus(v.3.16.0), glue(v.1.8.0), ggplot2(v.3.5.2) and ggbreak(v.0.1.6)

loaded via a namespace (and not attached): R.methodsS3(v.1.8.2), dichromat(v.2.0-0.1), GSEABase(v.1.70.0), progress(v.1.2.3), Biostrings(v.2.76.0), vctrs(v.0.6.5), ggtangle(v.0.0.7), shape(v.1.4.6.1), effectsize(v.1.0.1), digest(v.0.6.37), png(v.0.1-8), corpcor(v.1.6.10), DEGreport(v.1.44.0), ggrepel(v.0.9.6), bayestestR(v.0.17.0), correlation(v.0.8.8), magick(v.2.9.0), MASS(v.7.3-65), reshape(v.0.8.10), reshape2(v.1.4.4), httpuv(v.1.6.16), foreach(v.1.5.2), qvalue(v.2.40.0), withr(v.3.0.2), psych(v.2.5.6), xfun(v.0.53), ggfun(v.0.2.0), survival(v.3.8-3), memoise(v.2.0.1), clusterProfiler(v.4.16.0), gson(v.0.1.0), BiasedUrn(v.2.0.12), parameters(v.0.28.1), GlobalOptions(v.0.1.2), tidytree(v.0.4.6), gtools(v.3.9.5), logging(v.0.10-108), R.oo(v.1.27.1), DEoptimR(v.1.1-4), prettyunits(v.1.2.0), datawizard(v.1.2.0), rematch2(v.2.1.2), KEGGREST(v.1.48.1), promises(v.1.3.3), httr(v.1.4.7), restfulr(v.0.0.16), meshes(v.1.34.0), UCSC.utils(v.1.4.0), DOSE(v.4.2.0), reactome.db(v.1.92.0), curl(v.7.0.0), ggraph(v.2.2.2), polyclip(v.1.10-7), GenomeInfoDbData(v.1.2.14), SparseArray(v.1.8.1), RBGL(v.1.84.0), RcppEigen(v.0.3.4.0.2), doParallel(v.1.0.17), xtable(v.1.8-4), stringr(v.1.5.1), desc(v.1.4.3), evaluate(v.1.0.4), S4Arrays(v.1.8.1), BiocFileCache(v.2.16.1), preprocessCore(v.1.70.0), hms(v.1.1.3), GenomicRanges(v.1.60.0), colorspace(v.2.1-1), filelock(v.1.0.3), magrittr(v.2.0.3), later(v.1.4.3), viridis(v.0.6.5), ggtree(v.3.17.1.001), lattice(v.0.22-7), genefilter(v.1.90.0), robustbase(v.0.99-6), XML(v.3.99-0.19), cowplot(v.1.2.0), matrixStats(v.1.5.0), ggupset(v.0.4.1), pillar(v.1.11.0), nlme(v.3.1-168), iterators(v.1.0.14), caTools(v.1.18.3), compiler(v.4.5.0), stringi(v.1.8.7), minqa(v.1.2.8), SummarizedExperiment(v.1.38.1), GenomicAlignments(v.1.44.0), plyr(v.1.8.9), BiocIO(v.1.18.0), crayon(v.1.5.3), abind(v.1.4-8), ggdendro(v.0.2.0), gridGraphics(v.0.5-1), locfit(v.1.5-9.12), graphlayouts(v.1.2.2), bit(v.4.6.0), dplyr(v.1.1.4), fastmatch(v.1.1-6), 
codetools(v.0.2-20), crosstalk(v.1.2.1), bslib(v.0.9.0), paletteer(v.1.6.0), GetoptLong(v.1.0.5), plotly(v.4.11.0), remaCor(v.0.0.20), mime(v.0.13), splines(v.4.5.0), circlize(v.0.4.16), Rcpp(v.1.1.0), dbplyr(v.2.5.0), lars(v.1.3), knitr(v.1.50), blob(v.1.2.4), clue(v.0.3-66), BiocVersion(v.3.21.1), lme4(v.1.1-37), fs(v.1.6.6), Rdpack(v.2.6.4), EBSeq(v.2.6.0), openxlsx(v.4.2.8), ggplotify(v.0.1.2), Matrix(v.1.7-3), statmod(v.1.5.0), fANCOVA(v.0.6-1), tweenr(v.2.0.3), pkgconfig(v.2.0.3), tools(v.4.5.0), cachem(v.1.1.0), RhpcBLASctl(v.0.23-42), rbibutils(v.2.3), RSQLite(v.2.4.3), viridisLite(v.0.4.2), DBI(v.1.2.3), numDeriv(v.2016.8-1.1), graphite(v.1.54.0), fastmap(v.1.2.0), rmarkdown(v.2.29), scales(v.1.4.0), gprofiler2(v.0.2.3), Rsamtools(v.2.24.0), broom(v.1.0.9), AnnotationHub(v.3.16.1), sass(v.0.4.10), patchwork(v.1.3.2), BiocManager(v.1.30.26), insight(v.1.4.2), varhandle(v.2.0.6), farver(v.2.1.2), reformulas(v.0.4.1), aod(v.1.3.3), tidygraph(v.1.3.1), mgcv(v.1.9-3), yaml(v.2.3.10), MatrixGenerics(v.1.20.0), rtracklayer(v.1.68.0), cli(v.3.6.5), purrr(v.1.1.0), txdbmaker(v.1.4.2), lifecycle(v.1.0.4), mvtnorm(v.1.3-3), backports(v.1.5.0), Vennerable(v.3.1.0.9000), BiocParallel(v.1.42.1), annotate(v.1.86.1), MeSHDbi(v.1.44.0), rjson(v.0.2.23), gtable(v.0.3.6), parallel(v.4.5.0), ape(v.5.8-1), testthat(v.3.2.3), limma(v.3.64.3), jsonlite(v.2.0.0), bitops(v.1.0-9), NOISeq(v.2.52.0), bit64(v.4.6.0-1), brio(v.1.1.5), yulab.utils(v.0.2.1), zip(v.2.3.3), geneLenDataBase(v.1.44.0), RcppParallel(v.5.1.11-1), jquerylib(v.0.1.4), GOSemSim(v.2.34.0), zeallot(v.0.2.0), R.utils(v.2.13.0), pbkrtest(v.0.5.5), lazyeval(v.0.2.2), pander(v.0.6.6), ConsensusClusterPlus(v.1.72.0), shiny(v.1.11.1), htmltools(v.0.5.8.1), GO.db(v.3.21.0), rappdirs(v.0.3.3), blockmodeling(v.1.1.8), tinytex(v.0.57), httr2(v.1.2.1), XVector(v.0.48.0), RCurl(v.1.98-1.17), rprojroot(v.2.1.0), treeio(v.1.32.0), mnormt(v.2.1.1), gridExtra(v.2.3), ggsankey(v.0.0.99999), EnvStats(v.3.1.0), boot(v.1.3-31), 
igraph(v.2.1.4), variancePartition(v.1.38.1), R6(v.2.6.1), sva(v.3.56.0), DESeq2(v.1.48.1), gplots(v.3.2.0), labeling(v.0.4.3), GenomicFeatures(v.1.60.0), cluster(v.2.1.8.1), pkgload(v.1.4.0), aplot(v.0.2.8), GenomeInfoDb(v.1.44.2), nloptr(v.2.2.1), rstantools(v.2.5.0), DelayedArray(v.0.34.1), tidyselect(v.1.2.1), xml2(v.1.4.0), ggforce(v.0.5.0), statsExpressions(v.1.7.1), goseq(v.1.60.0), KernSmooth(v.2.23-26), data.table(v.1.17.8), ComplexHeatmap(v.2.24.1), htmlwidgets(v.1.6.4), fgsea(v.1.34.2), RColorBrewer(v.1.1-3), biomaRt(v.2.64.0), rlang(v.1.1.6), lmerTest(v.3.1-3) and ggnewscale(v.0.5.2)

message("This is hpgltools commit: ", get_git_commit())
## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset 2315e0db2bd684765eb5d29c71cfe08dc63b4322
## This is hpgltools commit: Mon Sep 8 13:30:32 2025 -0400: 2315e0db2bd684765eb5d29c71cfe08dc63b4322
tmp <- saveme(filename = savefile)
## The savefile is: /lab/singularity/tmrc2_macrophage_deb/202509081525_outputs/savefiles/03differential_expression.rda.xz
## The file does not yet exist.
## The save string is: con <- pipe(paste0('pxz > /lab/singularity/tmrc2_macrophage_deb/202509081525_outputs/savefiles/03differential_expression.rda.xz'), 'wb'); save(list = ls(all.names = TRUE, envir = globalenv()),
##      envir = globalenv(), file = con, compress = FALSE); close(con)
## Error in save(list = ls(all.names = TRUE, envir = globalenv()), envir = globalenv(), : ignoring SIGPIPE signal
tmp <- loadme(filename = savefile)
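
The SIGPIPE error above indicates that the process at the other end of the pipe exited early. One plausible cause, which is an assumption on my part and not confirmed by the log, is that pxz is not available inside the container. A minimal sketch of a fallback that uses the xz compression built into save() when pxz is missing follows. The function `save_with_fallback()` is hypothetical and is not how saveme() is implemented.

```r
## Hypothetical helper: pipe through pxz when it exists, otherwise let save()
## do single-threaded xz compression itself.
save_with_fallback <- function(filename, envir = globalenv()) {
  objects <- ls(all.names = TRUE, envir = envir)
  if (nzchar(Sys.which("pxz"))) {
    con <- pipe(paste0("pxz > ", shQuote(filename)), "wb")
    on.exit(close(con))
    save(list = objects, envir = envir, file = con, compress = FALSE)
    method <- "pxz"
  } else {
    save(list = objects, envir = envir, file = filename, compress = "xz")
    method <- "xz"
  }
  method
}

## Toy usage against a temporary file and a throwaway environment.
demo_env <- new.env()
assign("x", 42, envir = demo_env)
demo_file <- tempfile(fileext = ".rda.xz")
used_method <- save_with_fallback(demo_file, envir = demo_env)

restored_env <- new.env()
load(demo_file, envir = restored_env)
get("x", envir = restored_env)
```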

devtools::load_all("~/hpgltools")

