Note to self: I changed the clusterProfiler output list to store the enrichResults in [[xx_data]][[enrich]], or in the case of go: [[go_data]][[BP_enrich]] instead of enrich_results.
## LogFC cutoff when working on the inclusion sets.
lfc_cutoff <- 0.1
## Adjusted p-value cutoff when working on the inclusion sets.
adjp_cutoff <- 0.1
## Increase the maximum allowed group size when working with clusterProfiler
## This should bring out some of the more general groups like 'cellbody'
max_groupsize <- 2000
## Allow groups higher up in the tree for clusterProfiler results.
go_level <- 2
## Allow 14 GO categories to be displayed when plotting.
go_categories <- 14
## MA plot point outlines
outline <- FALSE
## Speed up clusterProfiler by choosing the correct keytypes
orgdb_from <- "ENSEMBL"
default_fstring <- "~ 0 + condition"
Previous papers did not perform an explicit subtraction; they instead compared against WT and kept the genes which are higher in delta/het vs. WT. There are multiple ways to deal with this, and that query has not yet been defined. Later, Theresa came to the conclusion that the subtraction method is not appropriate.
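To make the "compare to WT and keep what is higher" idea concrete, here is a minimal sketch using the two cutoffs defined above. The table, its column names (logFC, adj_p), and the gene names are invented for illustration; the real query has not been defined yet, so this is only one possible reading of it.

```r
## Toy het vs. wt result table; every value here is made up.
de_table <- data.frame(
  gene = c("gene_a", "gene_b", "gene_c", "gene_d"),
  logFC = c(1.2, 0.05, -0.8, 0.4),
  adj_p = c(0.01, 0.02, 0.03, 0.5))

lfc_cutoff <- 0.1
adjp_cutoff <- 0.1

## Keep genes which are higher in het than wt and pass the adjusted p-value cutoff.
kept_idx <- de_table[["logFC"]] > lfc_cutoff & de_table[["adj_p"]] <= adjp_cutoff
kept <- de_table[kept_idx, ]
kept[["gene"]]
```

Only gene_a survives here: gene_b has too small a fold change, gene_c goes the wrong direction, and gene_d fails the p-value cutoff.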
In this document I hope to explore the freshly processed samples and perform some comparisons to see that we have the expected similarities and differences from the prior analysis performed by Theresa.
There is one way in which I expect any/all of these analyses to be explicitly different: this should include the changes produced by April’s renaming of some samples.
My intention is to produce a sample sheet which includes one column with non-umi-deduplicated results and one with deduplicated results. With the exception of the previous point, I hope that the first will be identical (or at least very close to identical) to Theresa’s result while the second I expect will be subtly different – but I am hoping subtly enough that it will not significantly change the interpretation but be a little more precise.
Let’s see! I therefore need to change my metadata gathering function to include the umi deduplicated result. I am thinking of creating a separate specification for umi-barcoded samples, because looking through the logs for umi stuff when they are not used will be too much of a pain…
I have a couple pictures of RPL22 to help me remember the experimental design:
That second picture came from: (Li et al. (2022))
I would like to improve this document by comparing/contrasting the methodologies performed by other groups with those I perform here. I never fully appreciated the suite of computational methods applied by previous groups when examining TRAP data; I instead simply followed Theresa’s notebook without considering other possibilities.
I therefore spent a little time stepping through her thesis and pulling out the relevant papers in the hopes of learning these various methods. I should therefore be able soon to compare/contrast the various methods employed by other labs in addition to copying Theresa’s logic.
The following block assumes the full tree of preprocessed data with the logs from the trimmer, mapping, umi deduplication, counting, etc. As a result it cannot work in the container which has only the various count tables.
As a result, I am including a copy of this sheet after running the following block in my working tree. I suppose for the moment you will have to trust that it worked. (For right now, when testing out this container, I am just sending the R working directory to my tree for this block, then moving it back.)
I will need to manually edit one column though, the symlink column from Theresa has a series of paths which do not work in the container.
umi_spec <- make_rnaseq_spec(umi = TRUE)
iprgc_2022_meta <- gather_preprocessing_metadata("sample_sheets/20240606_only_umd_sequenced.xlsx",
spec = umi_spec, species = "mm39_112", verbose = FALSE,
basedir = "preprocessing/umd_sequenced")
colnames(iprgc_2022_meta[["new_meta"]])
head(iprgc_2022_meta[["new_meta"]])
sample_sheet <- "sample_sheets/20240606_only_umd_sequenced_modified.xlsx"
msigdb <- "reference/msigdb_v2024.1.Mm.db"
msig_data <- NULL
make_transparent <- function() {
ggplot2::theme(
panel.background = element_rect(fill = 'transparent'),
plot.background = element_rect(fill = 'transparent', color = NA),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.background = element_rect(fill = 'transparent'),
legend.box.background = element_rect(fill = 'transparent'))
}
I will figure out if I can leave mSigDB M2 in this image; if not, then any analyses depending on those gene sets will fail a priori.
m2_gsc <- try(load_gmt_signatures(signatures = msigdb,
signature_category = "M2"), silent = TRUE)
## I do not think I have permission to load the msigdb in the container
## So, if this fails, just load it from GSVAData, oh wait no, GSVAdata is human.
if ("try-error" %in% class(m2_gsc)) {
warning("Unable to load the M2 MsigDB data.")
}
## Warning: Unable to load the M2 MsigDB data.
From this point on, I am hoping/intending to pull liberally from Theresa’s notebook with a diversion to compare the three datasets:
Let’s find out! But first, annotations!
I am pulling this from Theresa’s anxontrapR_pipeline.Rmd, primarily because it looks similar to the other documents, but was modified more recently. I will change it slightly, primarily because I grabbed a new mmusculus assembly and therefore I will pull the mmusculus annotations from a specific biomart (Smedley et al. (2009)) archive that should match it.
A note from the future: multiple ensembl archive servers have been taken offline since last I ran this. Let us see if Feb. 2023 still works.
In the recent past, ensembl queries have become inconsistent, failing much more often than ever in the past. I do not think this is the fault of ensembl; but I think I need a fallback mechanism for collecting annotation information.
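The fallback idea can be sketched generically: try each annotation source in order and keep the first one that works. Everything named below (first_successful(), the loaders list, and the two stand-in functions) is hypothetical and exists only for this illustration; the stand-ins simulate an offline server and a local database rather than calling any real service.

```r
## Try a named list of zero-argument loader functions in order,
## returning the result of the first one which does not throw an error.
first_successful <- function(loaders) {
  for (name in names(loaders)) {
    result <- tryCatch(loaders[[name]](), error = function(e) NULL)
    if (!is.null(result)) {
      return(list(source = name, annotation = result))
    }
  }
  stop("None of the annotation sources could be loaded.")
}

## Stand-ins: the first simulates an unreachable server, the second a local copy.
loaders <- list(
  "remote" = function() stop("server unavailable"),
  "local" = function() data.frame(gene_id = c("g1", "g2"), symbol = c("a", "b")))

annot <- first_successful(loaders)
annot[["source"]]
```

Recording which source succeeded seems worthwhile, since the set of annotated genes may differ between sources.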
In the case of ensembl, it should be trivial (but less fun) to use a combination of the locally installed orgdb and txdb databases.
This does open a risk that the set of genes with annotations will be different depending on when the container is run due to differences between the orgdb/txdb instance and the Feb 2023 biomart. I am not sure there is much I can do about that except to bundle the set of annotations I downloaded in the container – since load_biomart_annotations() does save a rda copy of its download.
ok, I did both. If you, dear reader, wish to download your own annotations, and ensembl is having troubles, the following should work without a problem; in addition the rda annotations are in /data of the container and should get loaded.
tx_gene_map <- data.frame()
mm_annot <- try(load_biomart_annotations(species = "mmusculus", year = "2023", month = "02"))
## The biomart annotations file already exists, loading from it.
if ("try-error" %in% class(mm_annot)) {
fields <- c("ACCNUM", "ENSEMBL", "ENSEMBLTRANS", "ENTREZID", "GENENAME", "SYMBOL")
orgdb_annot <- load_orgdb_annotations("org.Mm.eg.db", fields = fields)
gene_info <- orgdb_annot[["genes"]]
## Note, there are a bunch of variants of the txdb package one might use.
## I do not think it matters a lot for our purposes, but I suspect that if we used
## a mismatched BSgenome and tried to pull CDS sequences, that might end badly.
pkg <- "TxDb.Mmusculus.UCSC.mm10.knownGene"
tx_annot <- load_txdb_annotations(pkg)
transcripts <- tx_annot[["TX"]]
transcripts[["tx"]] <- gsub(x = transcripts[["TXNAME"]],
pattern = "\\.\\d+$", replacement = "")
mm_annot <- merge(gene_info, transcripts, by.x = "ensembltrans", by.y = "tx")
rownames(mm_annot) <- make.names(mm_annot[["ensembl"]], unique = TRUE)
} else {
mm_annot <- mm_annot[["annotation"]]
mm_annot[["txid"]] <- paste0(mm_annot[["ensembl_transcript_id"]], ".", mm_annot[["version"]])
rownames(mm_annot) <- make.names(mm_annot[["ensembl_gene_id"]], unique=TRUE)
tx_gene_map <- mm_annot[, c("txid", "ensembl_gene_id")]
}
The primary differences between my block and Theresa’s are:
Given that we are excluding a bunch of the older samples, the set of colors I expect to find is different; so I will make explicit here the various colors used to denote location/genotype/time/etc.
April turned me onto this website ‘paletton.com’ for this kind of stuff and I will try and pick out palettes which basically match what I am getting with the original colors.
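As an aside, when I want a darker shade that still matches a given color, one option is to scale the RGB channels rather than pick by eye. The sketch below is only an illustration: darken() and its percent argument are names I made up here, not part of any package. Scaling the p08 colors to 60% appears to reproduce the p15 shades used in time_colors in this document, at least for the ones I checked.

```r
## Scale each RGB channel of a hex color to a percentage of its value.
## Integer arithmetic is used so the result is truncated, not rounded.
darken <- function(hex, percent = 60L) {
  channels <- (grDevices::col2rgb(hex) * as.integer(percent)) %/% 100L
  grDevices::rgb(channels["red", ], channels["green", ], channels["blue", ],
                 maxColorValue = 255)
}

p08 <- c("p08_het_dlgn" = "#E7298A", "p08_het_retina" = "#238B45")
p15 <- darken(p08)
names(p15) <- sub("^p08", "p15", names(p08))
p15
```

Note that rgb() returns upper-case hex strings, so the result differs in case from some hand-typed entries.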
color_choices <- list(
"all" = list(
"p08_het_dlgn" = "#E7298A",
"p15_het_dlgn" = "#E7298A",
"p08_het_retina" = "#238B45",
"p15_het_retina" = "#238B45",
"p08_het_scn" = "#4292C6",
"p15_het_scn" = "#4292C6",
"p08_ko_dlgn" = "#C994C7",
"p15_ko_dlgn" = "#C994C7",
"p08_ko_retina" = "#74c476",
"p15_ko_retina" = "#74c476",
"p08_ko_scn" = "#9BCAE1",
"p15_ko_scn" = "#9BCAE1",
"p08_wt_dlgn" = "#980043",
"p15_wt_dlgn" = "#980043",
"p08_wt_retina" = "#004008",
"p15_wt_retina" = "#004008",
"p08_wt_scn" = "#08519C",
"p15_wt_scn" = "#08519C",
"p60_wt_dlgn" = "#333333",
"p60_wt_retina" = "#222222",
"p60_wt_scn" = "#111111"),
"geno_loc" = list(
"het_dlgn" = "#E7298A",
"het_retina" = "#238B45",
"het_scn" = "#4292C6",
"ko_dlgn" = "#C994C7",
"ko_retina" = "#74c476",
"ko_scn" = "#9BCAE1",
"wt_dlgn" = "#980043",
"wt_retina" = "#004008",
"wt_scn" = "#08519C"),
"location" = list(
"retina" = "#004008",
"dlgn" = "#980043",
"scn" = "#08519C"),
"genotype" = list(
"wt" = "#74c476",
"het" = "#238B45",
"ko" = "#006D2C"),
"time" = list(
"p08" = "#5E104B",
"p15" = "#4E9231"))
label_column <- "mgi_symbol" ## Set the column used to extract gene symbols rather than ENSG.....
colors <- color_choices[["geno_loc"]]
time_colors <- list(
"p08_het_dlgn" = "#E7298A",
"p15_het_dlgn" = "#8a1852",
"p08_het_retina" = "#238B45",
"p15_het_retina" = "#155329",
"p08_het_scn" = "#4292C6",
"p15_het_scn" = "#275776",
"p08_ko_dlgn" = "#C994C7",
"p15_ko_dlgn" = "#785877",
"p08_ko_retina" = "#74C476",
"p15_ko_retina" = "#457546",
"p08_ko_scn" = "#9BCAE1",
"p15_ko_scn" = "#5d7987")
There is one noteworthy sample: iprgc_103. It was effectively replaced when April renamed the samples and so exists in the v1 data, but not v2/v3; they instead have the newly named samples which I called iprgc_123 to iprgc_130. As a result, I copied the annotations for iprgc_123 to my column so that there is no discrepancy in terms of genotype/location/time.
At the moment I have not included the original counts in this container because we made some changes to the mapping strategy and also found that a couple samples were mixed up in sequencing; as a result I documented all of the changes in the sample sheets and preprocessing documents and excluded the original files.
This is also why some columns in the sample sheet have suffixes like ‘adh’ and ‘atb’; those denote from whom the relevant metadata columns came.
In the following I make two more versions of the data, one remapped with the changes to the sample identities, and one with deduplication applied.
mm38_hisat_v2 <- create_se(sample_sheet, gene_info = mm_annot,
file_column = "hisat_count_table") %>%
set_conditions(fact = "geno_loc_atb") %>%
set_batches(fact = "time_atb") %>%
set_colors(color_choices[["geno_loc"]])
## Reading the sample metadata.
## Checking the state of the condition column.
## Checking the state of the batch column.
## Checking the condition factor.
## The sample definitions comprises: 69 rows(samples) and 76 columns(metadata fields).
## Warning in create_se(sample_sheet, gene_info = mm_annot, file_column =
## "hisat_count_table"): Some samples were removed when cross referencing the
## samples against the count data.
## Matched 25404 annotations and counts.
## Some annotations were lost in merging, setting them to 'undefined'.
## The final summarized experiment has 25425 rows and 76 columns.
## The numbers of samples by condition are:
##
## het_dlgn het_retina het_scn ko_dlgn ko_retina ko_scn wt_dlgn
## 7 7 7 6 6 6 11
## wt_retina wt_scn
## 11 7
## The number of samples by batch are:
##
## p08 p15 p60
## 31 34 3
## class: SummarizedExperiment
## dim: 25425 68
## metadata(7): notes title ... study researcher
## assays(1): ''
## rownames(25425): ENSMUSG00000000001 ENSMUSG00000000003 ...
## ENSMUSG00001074846 ENSMUSG00002076083
## rowData names(15): ensembl_gene_id ensembl_transcript_id ...
## uniprot_gn_symbol txid
## colnames(68): iprgc_62 iprgc_63 ... iprgc_129 iprgc_130
## colData names(76): rownames sampleid ... umi_dedup_mean_umi_per_pos
## umi_dedup_max_umi_per_pos
mm38_hisat_v3 <- create_se(sample_sheet, gene_info = mm_annot,
file_column = "umi_dedup_output_count") %>%
set_conditions(fact = "geno_loc_atb") %>%
set_batches(fact = "time_atb") %>%
set_colors(color_choices[["geno_loc"]])
## Reading the sample metadata.
## Checking the state of the condition column.
## Checking the state of the batch column.
## Checking the condition factor.
## The sample definitions comprises: 69 rows(samples) and 76 columns(metadata fields).
## Warning in create_se(sample_sheet, gene_info = mm_annot, file_column =
## "umi_dedup_output_count"): Some samples were removed when cross referencing the
## samples against the count data.
## Matched 25404 annotations and counts.
## Some annotations were lost in merging, setting them to 'undefined'.
## The final summarized experiment has 25425 rows and 76 columns.
## The numbers of samples by condition are:
##
## het_dlgn het_retina het_scn ko_dlgn ko_retina ko_scn wt_dlgn
## 7 7 7 6 6 6 11
## wt_retina wt_scn
## 11 7
## The number of samples by batch are:
##
## p08 p15 p60
## 31 34 3
## class: SummarizedExperiment
## dim: 25425 68
## metadata(7): notes title ... study researcher
## assays(1): ''
## rownames(25425): ENSMUSG00000000001 ENSMUSG00000000003 ...
## ENSMUSG00001074846 ENSMUSG00002076083
## rowData names(15): ensembl_gene_id ensembl_transcript_id ...
## uniprot_gn_symbol txid
## colnames(68): iprgc_62 iprgc_63 ... iprgc_129 iprgc_130
## colData names(76): rownames sampleid ... umi_dedup_mean_umi_per_pos
## umi_dedup_max_umi_per_pos
all_fact <- paste0(colData(mm38_hisat_v3)[["time_atb"]], "_",
colData(mm38_hisat_v3)[["geno_loc_atb"]])
colData(mm38_hisat_v3)[["time_geno_loc"]] <- all_fact
Note that at the end of the previous block I created a factor out of the combination of time, genotype, and location. In a future invocation of this notebook, I will change the pairwise comparisons to add each of these three factors to the statistical model instead of this. The code to do that is not quite ready yet.
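To illustrate the difference between the two approaches, here is a small base-R sketch on a made-up, fully crossed design. It is not the planned code; the data frame and its column names are invented, and the additive formula is just one way the three factors might enter the model.

```r
## A hypothetical fully crossed design: 2 times x 3 genotypes x 3 locations.
meta <- expand.grid(
  time = c("p08", "p15"),
  genotype = c("het", "ko", "wt"),
  location = c("dlgn", "retina", "scn"))

## Option 1: one combined factor, as done above with paste0().
meta[["time_geno_loc"]] <- factor(paste0(meta[["time"]], "_",
                                         meta[["genotype"]], "_",
                                         meta[["location"]]))
combined_design <- model.matrix(~ 0 + time_geno_loc, data = meta)

## Option 2: each factor added to the model separately.
additive_design <- model.matrix(~ 0 + time + genotype + location, data = meta)

ncol(combined_design)
ncol(additive_design)
```

The combined factor spends one coefficient per group (18 here), while the additive model uses 6 but assumes the effects of time, genotype, and location do not interact.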
Let’s look at the number of non-zero genes for all samples versus the coverage.
As above, this does not get run because I did not copy the count tables.
But these do!
## The colors used in the expressionset are: #004008, #08519C, #238B45, #4292C6, #74c476, #980043, #9BCAE1, #C994C7, #E7298A.
## The following samples have less than 16526.25 genes.
## [1] "iprgc_62" "iprgc_63" "iprgc_64" "iprgc_66" "iprgc_67" "iprgc_68"
## [7] "iprgc_70" "iprgc_71" "iprgc_72" "iprgc_73" "iprgc_74" "iprgc_75"
## [13] "iprgc_77" "iprgc_78" "iprgc_80" "iprgc_81" "iprgc_82" "iprgc_83"
## [19] "iprgc_84" "iprgc_85" "iprgc_86" "iprgc_87" "iprgc_88" "iprgc_89"
## [25] "iprgc_90" "iprgc_91" "iprgc_92" "iprgc_93" "iprgc_94" "iprgc_95"
## [31] "iprgc_96" "iprgc_97" "iprgc_98" "iprgc_100" "iprgc_102" "iprgc_104"
## [37] "iprgc_105" "iprgc_106" "iprgc_107" "iprgc_108" "iprgc_110" "iprgc_111"
## [43] "iprgc_112" "iprgc_113" "iprgc_114" "iprgc_115" "iprgc_117" "iprgc_118"
## [49] "iprgc_121" "iprgc_123" "iprgc_124" "iprgc_125" "iprgc_126" "iprgc_127"
## [55] "iprgc_128" "iprgc_129" "iprgc_130"
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## i Please use `linewidth` instead.
## i The deprecated feature was likely used in the hpgltools package.
## Please report the issue to the authors.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## A non-zero genes plot of 68 samples.
## These samples have an average 13.7 CPM coverage and 15744 genes observed, ranging from 13692 to
## 17083.
## Warning in pp(file = "01diagnostic_images/nonzero_v2_unfiltered.pdf"): The
## directory: 01diagnostic_images does not exist, will attempt to create it.
v2_nonzero[["plot"]]
plotted <- dev.off()
v3_nonzero <- plot_nonzero(mm38_hisat_v3, y_intercept = 0.65)
## The following samples have less than 16526.25 genes.
## [1] "iprgc_62" "iprgc_63" "iprgc_64" "iprgc_66" "iprgc_67" "iprgc_68"
## [7] "iprgc_70" "iprgc_71" "iprgc_72" "iprgc_73" "iprgc_74" "iprgc_75"
## [13] "iprgc_77" "iprgc_78" "iprgc_80" "iprgc_81" "iprgc_82" "iprgc_83"
## [19] "iprgc_84" "iprgc_85" "iprgc_86" "iprgc_87" "iprgc_88" "iprgc_89"
## [25] "iprgc_90" "iprgc_91" "iprgc_92" "iprgc_93" "iprgc_94" "iprgc_95"
## [31] "iprgc_96" "iprgc_97" "iprgc_98" "iprgc_100" "iprgc_102" "iprgc_104"
## [37] "iprgc_105" "iprgc_106" "iprgc_107" "iprgc_108" "iprgc_110" "iprgc_111"
## [43] "iprgc_112" "iprgc_113" "iprgc_114" "iprgc_115" "iprgc_117" "iprgc_118"
## [49] "iprgc_119" "iprgc_121" "iprgc_123" "iprgc_124" "iprgc_125" "iprgc_126"
## [55] "iprgc_127" "iprgc_128" "iprgc_129" "iprgc_130"
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## A non-zero genes plot of 68 samples.
## These samples have an average 4.803 CPM coverage and 15787 genes observed, ranging from 13868 to
## 17101.
pp(file = "01diagnostic_images/nonzero_v3_unfiltered.pdf")
v3_nonzero[["plot"]]
plotted <- dev.off()
Oh wow, I did not expect such a profound effect on the cpm values on the more saturated libraries. I guess in retrospect I should have?
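For reference, the two quantities in these plots can be computed by hand. The sketch below uses an invented count matrix, and it rests on two assumptions of mine: that the coverage axis reflects the total counts per sample (in millions), and that the gene threshold in the messages is y_intercept multiplied by the number of rows. If the first assumption holds, removing duplicate reads would shrink the totals and explain the drop in coverage.

```r
## Toy count matrix: rows are genes, columns are samples (values invented).
counts <- matrix(c(0, 5, 10, 0,
                   3, 0, 7, 2,
                   0, 0, 1, 9),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))

## Number of genes with at least one count in each sample.
nonzero_genes <- colSums(counts > 0)
## Total counts per sample, expressed in millions.
millions_of_counts <- colSums(counts) / 1e6

## Threshold implied by a y_intercept of 0.65 and 25425 rows.
threshold <- 0.65 * 25425
threshold
```

With 25425 rows and y_intercept = 0.65 this gives 16526.25, which matches the threshold printed in the messages above.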
Also note to self, we are not messing with p60.
## The following samples have less than 16526.25 genes.
## [1] "iprgc_62" "iprgc_63" "iprgc_64" "iprgc_66" "iprgc_67" "iprgc_68"
## [7] "iprgc_70" "iprgc_71" "iprgc_72" "iprgc_73" "iprgc_74" "iprgc_75"
## [13] "iprgc_77" "iprgc_81" "iprgc_82" "iprgc_83" "iprgc_84" "iprgc_85"
## [19] "iprgc_86" "iprgc_87" "iprgc_88" "iprgc_89" "iprgc_90" "iprgc_91"
## [25] "iprgc_92" "iprgc_93" "iprgc_94" "iprgc_95" "iprgc_96" "iprgc_97"
## [31] "iprgc_98" "iprgc_100" "iprgc_102" "iprgc_104" "iprgc_105" "iprgc_106"
## [37] "iprgc_107" "iprgc_108" "iprgc_110" "iprgc_111" "iprgc_112" "iprgc_113"
## [43] "iprgc_114" "iprgc_115" "iprgc_117" "iprgc_118" "iprgc_121" "iprgc_123"
## [49] "iprgc_124" "iprgc_125" "iprgc_126" "iprgc_127" "iprgc_128" "iprgc_129"
## [55] "iprgc_130"
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Not putting labels on the plot.
pp(file = "01diagnostic_images/nonzero_v2_filt.pdf")
v2_nonzero_filt[["plot"]]
plotted <- dev.off()
v3_nonzero_filt <- plot_nonzero(mm38_hisat_v3, plot_labels = FALSE)
## The following samples have less than 16526.25 genes.
## [1] "iprgc_62" "iprgc_63" "iprgc_64" "iprgc_66" "iprgc_67" "iprgc_68"
## [7] "iprgc_70" "iprgc_71" "iprgc_72" "iprgc_73" "iprgc_74" "iprgc_75"
## [13] "iprgc_77" "iprgc_81" "iprgc_82" "iprgc_83" "iprgc_84" "iprgc_85"
## [19] "iprgc_86" "iprgc_87" "iprgc_88" "iprgc_89" "iprgc_90" "iprgc_91"
## [25] "iprgc_92" "iprgc_93" "iprgc_94" "iprgc_95" "iprgc_96" "iprgc_97"
## [31] "iprgc_98" "iprgc_100" "iprgc_102" "iprgc_104" "iprgc_105" "iprgc_106"
## [37] "iprgc_107" "iprgc_108" "iprgc_110" "iprgc_111" "iprgc_112" "iprgc_113"
## [43] "iprgc_114" "iprgc_115" "iprgc_117" "iprgc_118" "iprgc_119" "iprgc_121"
## [49] "iprgc_123" "iprgc_124" "iprgc_125" "iprgc_126" "iprgc_127" "iprgc_128"
## [55] "iprgc_129" "iprgc_130"
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Not putting labels on the plot.
Once again, I do not want to lose the previous code, so here is the v1 invocation
v2_norm <- normalize(mm38_hisat_v2, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)
## Removing 10298 low-count genes (15127 remaining).
## transform_counts: Found 8465 values equal to 0, adding 1 to the matrix.
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by het_dlgn, het_retina, het_scn, ko_dlgn, ko_retina, ko_scn, wt_dlgn, wt_retina, wt_scn
## Shapes are defined by p08, p15.
pp(file = "01diagnostic_images/v2_norm_pca.pdf")
v2_norm_pca[["plot"]]
plotted <- dev.off()
v3_norm <- normalize(mm38_hisat_v3, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)
## Removing 10156 low-count genes (15269 remaining).
## transform_counts: Found 9347 values equal to 0, adding 1 to the matrix.
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by het_dlgn, het_retina, het_scn, ko_dlgn, ko_retina, ko_scn, wt_dlgn, wt_retina, wt_scn
## Shapes are defined by p08, p15.
Ibid.
v1_norm <- normalize(mm38_hisat_v1, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)
plot_pca(v1_norm)
To my eyes it looks like we just have 1 weirdo p15 sample? Deduplication had a minor but significant effect on the PCA.
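For anyone who wants to poke at this kind of PCA without my helper functions, here is a minimal base-R sketch on simulated data. The messages above report a fast_svd reduction; I am swapping in the plain svd() function here, so this is an approximation of the idea and not a copy of what plot_pca() does. All values are simulated.

```r
set.seed(1)
## Simulated log2 expression: 100 genes by 6 samples.
expr <- matrix(rnorm(600), nrow = 100,
               dimnames = list(NULL, paste0("s", 1:6)))
## Shift 30 genes upward in the last three samples to create two groups.
expr[1:30, 4:6] <- expr[1:30, 4:6] + 5

## Center each gene, then decompose; columns of v hold the sample coordinates.
centered <- expr - rowMeans(expr)
decomp <- svd(centered)
pct_var <- 100 * decomp[["d"]]^2 / sum(decomp[["d"]]^2)

pc_coords <- data.frame(sample = colnames(expr),
                        PC1 = decomp[["v"]][, 1],
                        PC2 = decomp[["v"]][, 2])
round(pct_var, 1)
pc_coords
```

In this toy case the first component separates the two simulated groups, which is the same kind of reading applied to the real plots.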
With that in mind, let us look at Theresa’s WORKING document and see what we can recapitulate.
Theresa’s document: The TRAP protocol has some variability which is introduced at different steps including homogenization, antibody labeling, pulldown efficiency/specificity, sample handling during cleanup, and library prep/sequencing. We know from Rashmi’s QC that there is variability at the level of pulldown efficiency (amount of RNA isolated). She is doing a good job of keeping track of this for all her samples and we have validated her P8 results (attached supplementary figure 3D). We consistently see clear differences between control and cre samples for the retina, which makes sense because the cell bodies are in the retina. The target tissue differences are smaller, which also makes sense for axon-TRAP. We think that some of her P15 samples are not good based on low amounts of isolated RNA from cre(+) retina samples. We plan to drop these samples and not perform additional isolations at this time point. Based on this (and the general lack of large developmental effects), we were planning to focus on presenting the P8 data only in the paper. Interested to hear your thoughts in this…
My notes: Theresa’s first operations in this notebook were to:
v3_loc_geno <- set_conditions(mm38_hisat_v3, fact = "location_atb",
colors = color_choices[["location"]]) %>%
set_batches(fact = "genotype_atb")
## The numbers of samples by condition are:
##
## dlgn retina scn
## 23 23 19
## The number of samples by batch are:
##
## het ko wt
## 21 18 26
At different times, it appears to me that Theresa has preferred slightly different normalization methods, primarily a mix of TMM and quantile.
Thus I will use different suffix letters to denote various normalizations employed, and if they turn out the same I will pick one arbitrarily.
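As a reminder to myself of what the quantile option is doing conceptually, here is a small base-R sketch on invented counts. quantile_normalize() is a hypothetical helper written for this note, ties are broken by order of appearance for simplicity, and the order of the steps (cpm, then log2, then quantile) is my own choice for the illustration and may not match what normalize() does internally. TMM is not shown because it needs more machinery than fits in a short sketch.

```r
## Force every column (sample) to share the same distribution of values.
quantile_normalize <- function(mat) {
  ranks <- apply(mat, 2, rank, ties.method = "first")
  sorted <- apply(mat, 2, sort)
  ## The reference distribution is the mean of the sorted columns.
  reference <- rowMeans(sorted)
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(mat)
  normalized
}

## Toy counts: rows are genes, columns are samples (values invented).
counts <- matrix(c(5, 2, 3, 4,
                   4, 1, 4, 2,
                   3, 4, 6, 8),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

## Counts per million, then log2 with a pseudocount of 1.
cpm <- t(t(counts) / colSums(counts)) * 1e6
log_cpm <- log2(cpm + 1)
qn_log_cpm <- quantile_normalize(log_cpm)
qn_log_cpm
```

After this step each sample contains the same set of values, just assigned to different genes.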
loc_geno_nq <- normalize(v3_loc_geno, transform = "log2", convert = "cpm",
filter = TRUE, norm = "quant")
## Removing 10156 low-count genes (15269 remaining).
## transform_counts: Found 9347 values equal to 0, adding 1 to the matrix.
location_genotype_pca <- plot_pca(loc_geno_nq)
pp(file = "01diagnostic_images/location_genotype_norm_pca.pdf")
location_genotype_pca[["plot"]]
plotted <- dev.off()
location_genotype_pca
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by dlgn, retina, scn
## Shapes are defined by het, ko, wt.
## ok, I have two weirdo samples which look very much like they are actually dlgn.
## These are sample IDs iprgc_66 and iprgc_130
loc_geno_nt <- normalize(v3_loc_geno, transform = "log2", convert = "cpm",
filter = TRUE, norm = "tmm")
## Removing 10156 low-count genes (15269 remaining).
## transform_counts: Found 42869 values equal to 0, adding 1 to the matrix.
location_genotype_tmm_pca <- plot_pca(loc_geno_nt)
pp(file = "01diagnostic_images/location_genotype_tmm_pca.pdf")
location_genotype_tmm_pca[["plot"]]
## Warning in MASS::cov.trob(data[, vars], wt = weight * nrow(data)): Probable
## convergence failure
## Warning in MASS::cov.trob(data[, vars], wt = weight * nrow(data)): Probable
## convergence failure
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by dlgn, retina, scn
## Shapes are defined by het, ko, wt.
## Warning in MASS::cov.trob(data[, vars], wt = weight * nrow(data)): Probable
## convergence failure
## Warning in MASS::cov.trob(data[, vars], wt = weight * nrow(data)): Probable
## convergence failure
A random thought about these PCA plots, it might be worth while to add a panel below the legend with the sample numbers per condition/batch.
Of course, the same information is provided in a more fun fashion via my silly sankey function:
sample_sankey <- plot_meta_sankey(v3_loc_geno, color_choices = color_choices,
factors = c("genotype_atb", "location_atb", "time_atb"))
## Warning: attributes are not identical across measure variables; they will be
## dropped
## Warning: The `size` argument of `element_rect()` is deprecated as of ggplot2 3.4.0.
## i Please use the `linewidth` argument instead.
## i The deprecated feature was likely used in the ggsankey package.
## Please report the issue at <https://github.com/davidsjoberg/ggsankey/issues>.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
pp(file = "01diagnostic_images/design_sankey.pdf")
sample_sankey[["ggplot"]]
plotted <- dev.off()
sample_sankey
## A sankey plot describing the metadata of 65 samples,
## including 30 out of 0 nodes and traversing metadata factors:
## genotype_atb, location_atb, time_atb.
Rashmi came by and we discussed the samples a little. She suggested that it is likely that we will need to exclude the 202205 samples; these may be identified in a few ways, most easily I think via the ‘project_ah’ column: they are the 021_1 samples.
My sense was that she concurred with my interpretation of the umi deduplication, so I will continue using the deduplicated results exclusively, at least for now.
One of Theresa’s first checks was wisely for melanopsin. Let us repeat a version of this:
An important note: Indrajeet Patil removed the groupedstats and its associated plotting library from CRAN/github/etc. I am not certain what happened, but that necessitates a change in how I plot this.
opn4_exprs <- data.frame(combined = colData(loc_geno_nt)[["geno_loc_atb"]],
location = colData(loc_geno_nt)[["location_atb"]],
genotype = colData(loc_geno_nt)[["genotype_atb"]],
opn = assay(loc_geno_nt)["ENSMUSG00000021799", ])
## groupedstats::grouped_summary(opn4_exprs, location, opn)
## opn4_location <- ggbetweenstats(data = opn4_exprs, x = location, y = opn)
## pp(file = "images/ggbetween_location.pdf")
## opn4_location
## plotted <- dev.off()
## opn4_location
## opn4_genotype <- ggbetweenstats(data = opn4_exprs, x = genotype, y = opn)
## pp(file = "images/ggbetween_location.pdf")
## opn4_genotype
## plotted <- dev.off()
## opn4_genotype
## opn4_combined <- ggbetweenstats(data = opn4_exprs, x = combined, y = opn)
## pp(file = "images/ggbetween_combined.pdf")
## opn4_combined
## plotted <- dev.off()
## opn4_combined
Ok, so I plotted the question a bit differently, but got the same answer.
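Since the grouped summary package is gone, the same question can be asked with base R. The sketch below is not what I ran; it uses an invented data frame shaped like opn4_exprs, and the numbers are placeholders and not real expression values.

```r
## Invented values standing in for the log2 expression of a single gene.
opn4_toy <- data.frame(
  location = rep(c("retina", "dlgn", "scn"), each = 4),
  genotype = rep(c("het", "het", "ko", "ko"), times = 3),
  opn = c(6.1, 5.8, 0.4, 0.2, 2.0, 1.7, 0.1, 0.0, 1.5, 1.9, 0.0, 0.3))

## Median expression by location.
by_location <- aggregate(opn ~ location, data = opn4_toy, FUN = median)
## Median expression by location and genotype together.
by_group <- aggregate(opn ~ location + genotype, data = opn4_toy, FUN = median)
by_location
by_group
```

A call like boxplot(opn ~ genotype + location, data = opn4_toy) would give a rough visual equivalent of the ggbetweenstats panels.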
Here is the text of Theresa’s notebook following this analysis:
“Ugh oh, looks like there is at least one retina KO sample that has some melanopsin expression in it. Turns out ipRGC_07 is a bad egg which is supposed to be a KO but has melanopsin expression. It’s friends which were pooled from the same mice are iprgc_06 and iprgc_08, so we need to exclude all these samples.”
I am also seeing some knockout expression with some caveats: I do not have the affected samples in my dataset (iprgc_07) and the levels I am seeing are quite low – I will look in IGV to double check, but I strongly suspect that these are some piddly reads near the UTRs.
Onward!
Theresa’s first pca was of log2 cpm values. I might add quantile/tmm to this?
v3_location <- set_conditions(mm38_hisat_v3, fact = "location_atb") %>%
set_batches(fact = "genotype_atb") %>%
set_colors(color_choices[["location"]])
## The numbers of samples by condition are:
##
## dlgn retina scn
## 23 23 19
## The number of samples by batch are:
##
## het ko wt
## 21 18 26
v3_location_norm <- normalize(v3_location, filter = TRUE, norm = "quant",
transform = "log2", convert = "cpm")
## Removing 10156 low-count genes (15269 remaining).
## transform_counts: Found 9347 values equal to 0, adding 1 to the matrix.
v3_location_pca <- plot_pca(v3_location_norm)
pp(file = "01diagnostic_images/v3_location_norm_pca.pdf")
v3_location_pca
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by dlgn, retina, scn
## Shapes are defined by het, ko, wt.
## png
## 2
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by dlgn, retina, scn
## Shapes are defined by het, ko, wt.
Once again we see that samples iprgc_66 and iprgc_130 are likely actually DLGN and not SCN. I am therefore going to add a column to the sample sheet noting this, and remove them from the expressionset.
I will thus replot the data after removing those two. If we want to see what it looks like with the re-attributed locations, we can do so.
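The bookkeeping I have in mind looks roughly like the following base-R sketch. The two stand-in sample IDs (sample_a, sample_b), the suspected_location column name, and the tiny data frame are all hypothetical; only the two flagged IDs and their current SCN label come from the text above.

```r
## A tiny stand-in for the sample sheet; sample_a and sample_b are invented.
sample_meta <- data.frame(
  sampleid = c("sample_a", "iprgc_66", "sample_b", "iprgc_130"),
  location_atb = c("scn", "scn", "scn", "scn"))

suspect <- c("iprgc_66", "iprgc_130")

## Record the suspected true location without overwriting the original column.
sample_meta[["suspected_location"]] <- ifelse(
  sample_meta[["sampleid"]] %in% suspect, "dlgn", sample_meta[["location_atb"]])

## Drop the suspect samples for the replot.
kept_meta <- sample_meta[!sample_meta[["sampleid"]] %in% suspect, ]
kept_meta[["sampleid"]]
```

Keeping the original column intact means the re-attributed locations can still be plotted later if we want to see them.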
Theresa has a nice change to the PCA plotter in which she sets the alpha channel as an additional visual cue for a metadata factor…
mm38_hisat_v3 <- subset_se(mm38_hisat_v3, subset="sampleid!='iprgc_130'") %>%
subset_se(subset="sampleid!='iprgc_66'")
v3_location <- set_conditions(mm38_hisat_v3, fact = "location_atb") %>%
set_batches(fact = "genotype_atb") %>%
set_colors(color_choices[["location"]])
## The numbers of samples by condition are:
##
## dlgn retina scn
## 23 23 17
## The number of samples by batch are:
##
## het ko wt
## 20 18 25
v3_location_norm <- normalize(v3_location, filter = TRUE, norm = "quant",
transform = "log2", convert = "cpm")
## Removing 10162 low-count genes (15263 remaining).
## transform_counts: Found 8867 values equal to 0, adding 1 to the matrix.
filtered_location_pca <- plot_pca(v3_location_norm)
pp(file = "02filtered_images/filtered_location_pca.pdf")
## Warning in pp(file = "02filtered_images/filtered_location_pca.pdf"): The
## directory: 02filtered_images does not exist, will attempt to create it.
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by dlgn, retina, scn
## Shapes are defined by het, ko, wt.
removed_sankey <- plot_meta_sankey(v3_location, color_choices = color_choices,
factors = c("genotype_atb", "location_atb", "time_atb"))
## Warning: attributes are not identical across measure variables; they will be
## dropped
pp(file = "02filtered_images/filtered_sankey.pdf")
removed_sankey[["ggplot"]]
plotted <- dev.off()
removed_sankey
## A sankey plot describing the metadata of 63 samples,
## including 30 out of 0 nodes and traversing metadata factors:
## genotype_atb, location_atb, time_atb.
Here is Theresa’s text, recall once again that I do not have some of these older samples (iprgc_62):
PC1 vs PC2 identifies retina vs axon is still the main component of variation. We do see though that in the PC2 direction, we see with the new samples added, we don’t see separation based on axonal targets (dLGN vs SCN). In the PC1 vs PC3 plot, we see that it’s PC3 where we start to see variation correlated with axonal compartment. Let’s look at PC1 vs PC2 colored by batch (when they were processed/sequenced) to see if that is what is contributing so much variation in PC2.
Side note: ipRGC 62 seems like an odd ball. This seems to me like it should have been a dLGN P08 sample. Is there any possibility this got mislabeled early on? I went back and double checked to see if all my processing is correct and it indeed was labeled an SCN P15 from the time I got the samples, and it is indeed.
I have now switched to Theresa’s document ‘WORKING_axonTRAP…’ and will start pulling sections from it. I am fairly confident that my sample distributions are similar, so I presume I can invoke similar or identical calls to DESeq2 and friends.
In the block immediately before the DE analyses, Theresa created a subset expressionset of only p08 retinas. I therefore assume this initial DE will be used as the subtraction set for the SCN/DLGN analyses that follow. (I guess I could read ahead and find out, but no! I want to be a blank slate)
Theresa’s primary workflow makes heavy use of DESeq2 (Love, Huber, and Anders (2014)) and sva (Leek et al. (2012)). In some (most?) of Theresa’s invocations of the all_pairwise() function, she excludes the other methods that it performs. In this workbook I left those methods on, so we can evaluate the performance of DESeq2 relative to some of the following (perhaps not all: I may have disabled EBSeq/dream because they were taking too long):
mm38_p8_retina <- subset_se(mm38_hisat_v3, subset = "time_atb=='p08' & location_atb=='retina'")
mm_normal_p8_ret_de <- all_pairwise(mm38_p8_retina, model_svs = "svaseq",
                                    model_fstring = "~ 0 + condition", filter = TRUE)
## het_retina ko_retina wt_retina
## 3 3 5
## Removing 12001 low-count genes (13424 remaining).
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## I think this is failing? SummarizedExperiment
## Basic step 0/3: Transforming data.
## Setting 2593 entries to zero.
## This received a matrix of SVs.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## conditions
## het_retina ko_retina wt_retina
## 3 3 5
## conditions
## het_retina ko_retina wt_retina
## 3 3 5
## conditions
## het_retina ko_retina wt_retina
## 3 3 5
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 3 comparisons.
There seems to be a discrepancy with previous iterations of this. Let us simplify to running only DESeq2 and find what is causing it. In my previous iteration, I got 3632 genes in the unique(c()) of het+ko.
deseq_only <- deseq_pairwise(mm38_p8_retina, model_svs = "svaseq",
                             model_fstring = default_fstring, filter = TRUE)
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
deseq_hetkeeper_genes <- deseq_only$all_tables$wt_retina_vs_het_retina %>%
  filter(logFC <= -0.25 & adj.P.Val <= 0.05)
deseq_kokeeper_genes <- deseq_only$all_tables$wt_retina_vs_ko_retina %>%
  filter(logFC <= -0.25 & adj.P.Val <= 0.05)
deseq_keepergenes <- unique(c(rownames(deseq_hetkeeper_genes),
                              rownames(deseq_kokeeper_genes)))
length(deseq_keepergenes)
## [1] 3632
deseq_pair_hetkeeper_genes <- mm_normal_p8_ret_de$deseq$all_tables$wt_retina_vs_het_retina %>%
  filter(logFC <= -0.25 & adj.P.Val <= 0.05)
deseq_pair_kokeeper_genes <- mm_normal_p8_ret_de$deseq$all_tables$wt_retina_vs_ko_retina %>%
  filter(logFC <= -0.25 & adj.P.Val <= 0.05)
deseq_pair_keepergenes <- unique(c(rownames(deseq_pair_hetkeeper_genes),
                                   rownames(deseq_pair_kokeeper_genes)))
length(deseq_pair_keepergenes)
## [1] 3632
The following invocation performed by Theresa filters the wt/het comparison for only those genes which increased in het by at least 0.25 logFC with a significant adjusted p-value. I assume that this is to use the wt samples as a translational control for the het/ko comparisons; for my purposes, I will therefore separate the contrasts from all_pairwise() and do this in a stepwise fashion…
The block of code immediately following Theresa’s all_pairwise() invocation is a little confusing for me and warrants some explanation by me to me in the hopes that I do not misunderstand what is happening and the goals therein.
I think I can safely assume that the goal here is to pull out the IDs which increased in het with respect to wild type; even if by a small margin, as long as it is statistically significant vis a vis the adjusted p-value.
I am going to perform what I think is the same thing in a slightly different fashion so that I can share a copy of the results with whomever is interested. I will also repeat Theresa’s invocation and prove to myself that I understood and got the same answer.
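To convince myself that the two routes should agree, here is a minimal sketch with a made-up table (hypothetical gene names and values): filtering wt_vs_het for logFC <= -0.25 selects the same IDs as flipping the contrast to het_vs_wt and keeping logFC >= 0.25, provided the same adjusted p-value cutoff is used.

```r
## Toy DE table in the wt_vs_het orientation (hypothetical values).
wt_vs_het <- data.frame(
  logFC = c(-1.2, -0.3, -0.1, 0.8, -0.5),
  adj.P.Val = c(0.001, 0.04, 0.01, 0.02, 0.20),
  row.names = paste0("gene", 1:5))

## Route 1: keep the negative side of wt_vs_het (higher in het).
taa_ids <- rownames(wt_vs_het[wt_vs_het$logFC <= -0.25 &
                              wt_vs_het$adj.P.Val <= 0.05, ])

## Route 2: flip the contrast to het_vs_wt and keep the positive side.
het_vs_wt <- transform(wt_vs_het, logFC = -logFC)
atb_ids <- rownames(het_vs_wt[het_vs_wt$logFC >= 0.25 &
                              het_vs_wt$adj.P.Val <= 0.05, ])

same_ids <- identical(sort(taa_ids), sort(atb_ids))
```

gene3 fails the fold-change cutoff and gene5 fails the adjusted p-value cutoff, so only gene1 and gene2 survive either route.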
wt_het_keeper <- list("het_vs_wt" = c("het_retina", "wt_retina"))
het_wt_table <- combine_de_tables(
  mm_normal_p8_ret_de, keepers = wt_het_keeper, label_column = label_column,
  excel = "03theresa_comparison_excel/het_retina_control.xlsx")
## Looking for subscript invalid names, start of extract_keepers.
## Looking for subscript invalid names, end of extract_keepers.
wanted_sig <- extract_significant_genes(
  het_wt_table, lfc = 0.25, according_to = "deseq",
  excel = "03theresa_comparison_excel/het_retina_control-sig.xlsx")
wanted_het_increased <- wanted_sig[["deseq"]][["ups"]][["het_vs_wt"]]
increased_het_genes <- rownames(wanted_het_increased)
Here are Theresa’s next lines:
mm_de_normal_p8_ret <- mm_normal_p8_ret_de
hetkeeper_genes <- mm_de_normal_p8_ret$deseq$all_tables$wt_retina_vs_het_retina %>%
  filter(logFC <= -0.25 & adj.P.Val <= 0.05)
kokeeper_genes <- mm_de_normal_p8_ret$deseq$all_tables$wt_retina_vs_ko_retina %>%
  filter(logFC <= -0.25 & adj.P.Val <= 0.05)
keepergenes <- unique(c(rownames(hetkeeper_genes),
                        rownames(kokeeper_genes)))
## We know a priori that Opn4 is ENSMUSG00000021799
## I do not expect to see it in this set, it should be higher in wt
## retina vs ko retina by a significant margin.
"ENSMUSG00000021799" %in% keepergenes
## [1] TRUE
I think Rashmi made a compelling point which illustrates why we likely should expect the expression of Opn4 to be significantly higher in the heterozygotes vs. wild-type:
This makes me wonder if any normalization methods exist which do something like multiply the values by some value related to the proportion of observed genes; and/or if this is a good/bad/indifferent idea.
Also, just a note for me to remember: RPL22, not RPS22, for some reason I keep thinking the small subunit.
hetkeeper_genes <- mm_normal_p8_ret_de$deseq$all_tables$wt_retina_vs_het_retina %>%
  filter(logFC <= -0.25 & adj.P.Val <= 0.05)
testthat::expect_true(nrow(hetkeeper_genes) == length(increased_het_genes))
taa_keepers <- sort(rownames(hetkeeper_genes))
atb_keepers <- sort(increased_het_genes)
testthat::expect_equal(taa_keepers, atb_keepers)
Yay! I can read! Now let us repeat for the KO vs. wt.
wt_ko_keeper <- list("ko_vs_wt" = c("ko_retina", "wt_retina"))
ko_wt_table <- combine_de_tables(
  mm_normal_p8_ret_de, keepers = wt_ko_keeper, label_column = label_column,
  excel = "03theresa_comparison_excel/ko_retina_control.xlsx")
## Looking for subscript invalid names, start of extract_keepers.
## Looking for subscript invalid names, end of extract_keepers.
wanted_sig <- extract_significant_genes(
  ko_wt_table, lfc = 0.25, according_to = "deseq",
  excel = "03theresa_comparison_excel/ko_retina_control-sig.xlsx")
wanted_ko_increased <- wanted_sig[["deseq"]][["ups"]][["ko_vs_wt"]]
increased_ko_genes <- rownames(wanted_ko_increased)
The next thing performed in Theresa’s document is a unique(concatenation of these two gene groups), thus sucking up every gene which was significantly higher in either the knockout or heterozygous samples with respect to wild-type.
This was followed by a couple of merge operations of a little bit of the annotation data; I am not sure I understand the goal yet…
Here is her code. I copied the annotation ‘mgi_symbol’ column to ‘external_gene_name’ so that I need not change any of her code. I am assuming this is the appropriate column of interest, I do not know this for certain, but it seems quite likely.
While I am at it, here is the set_sig_limma() function from Theresa’s helpers.R
set_sig_limma <- function(limma_tbl, factors = NULL) {
  if (is.null(factors)) {
    ## Set significance for plotting colors.
    limma_tbl$Significance <- NA
    limma_tbl[abs(limma_tbl$logFC) < 1 | limma_tbl$adj.P.Val > .05, "Significance"] <- "Not \nEnriched"
    ## Assigning by [rows, "Significance"] is a no-op when no rows match,
    ## whereas the [rows, ][["Significance"]] form fails on an empty selection.
    limma_tbl[limma_tbl$logFC >= 1 & limma_tbl$adj.P.Val <= .05, "Significance"] <- "Disease \nUpregulated"
    limma_tbl[limma_tbl$logFC <= -1 & limma_tbl$adj.P.Val <= .05, "Significance"] <- "Disease \nDownregulated"
    ## The levels must match the labels assigned above; the original levels
    ## ("Upregulated", "Downregulated") would have turned those rows into NA.
    limma_tbl$Significance <- factor(
      limma_tbl$Significance,
      levels = c("Disease \nUpregulated", "Disease \nDownregulated", "Not \nEnriched"))
  } else {
    limma_tbl$Significance <- NA
    limma_tbl[abs(limma_tbl$logFC) < 1 | limma_tbl$adj.P.Val > .05, "Significance"] <- "Not \nEnriched"
    if (nrow(limma_tbl[limma_tbl$logFC >= 1 & limma_tbl$adj.P.Val <= .05, ]) != 0) {
      limma_tbl[limma_tbl$logFC >= 1 & limma_tbl$adj.P.Val <= .05, ][["Significance"]] <- factors[1]
    }
    if (nrow(limma_tbl[limma_tbl$logFC <= -1 & limma_tbl$adj.P.Val <= .05, ]) != 0) {
      limma_tbl[limma_tbl$logFC <= -1 & limma_tbl$adj.P.Val <= .05, ][["Significance"]] <- factors[2]
    }
    limma_tbl$Significance <- factor(limma_tbl$Significance, levels = c(factors, "Not \nEnriched"))
  }
  return(limma_tbl)
}
mm_annot[["external_gene_name"]] <- mm_annot[["mgi_symbol"]]
keepergenes <- unique(c(rownames(hetkeeper_genes), rownames(kokeeper_genes)))
length(keepergenes)
## [1] 3632
annots_to_merge <- mm_annot %>%
  select(ensembl_gene_id, external_gene_name) %>%
  filter(ensembl_gene_id %in%
           rownames(mm_de_normal_p8_ret$deseq$all_tables$ko_retina_vs_het_retina)) %>%
  distinct()
mm_de_normal_p8_ret$deseq$all_tables$ko_retina_vs_het_retina <- merge(
  mm_de_normal_p8_ret$deseq$all_tables$ko_retina_vs_het_retina, annots_to_merge,
  by.x = 0, by.y = "ensembl_gene_id", all.x = TRUE)
df <- mm_de_normal_p8_ret$deseq$all_tables$ko_retina_vs_het_retina %>%
  dplyr::mutate(logFC = -logFC) %>%
  set_sig_limma(factors = c("Het Enriched", "KO Enriched"))
My version of the above task makes use of the excludes option of combine_de_tables(). Given the set of unique gene IDs increased in the het/ko, I can ask to exclude anything not in that set. I could also have more parsimoniously directly excluded any gene ID increased in the wt samples. But Theresa already provided the code to do the former, so it will be less typing, and less opportunity for silly mistakes, to just do that.
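The set logic here is small enough to sketch with made-up IDs (all names below are hypothetical): take the union of the two increased sets, then split the full gene list into the members of that union and everything else.

```r
## Hypothetical gene IDs standing in for the real Ensembl IDs.
increased_het <- c("geneA", "geneB", "geneC")
increased_ko <- c("geneB", "geneD")
all_ids <- paste0("gene", LETTERS[1:8])

## Union of the two sets; unique() drops the shared geneB.
both_increased <- unique(c(increased_het, increased_ko))

## Logical mask over the full gene list, then the two complementary subsets.
in_set <- all_ids %in% both_increased
kept_ids <- all_ids[in_set]
removed_ids <- all_ids[!in_set]
table(in_set)
```

The two subsets partition the gene list, so their sizes always sum to the total; that is the same bookkeeping as the FALSE/TRUE counts from summary() on the real mask.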
both_increased_genes <- unique(c(increased_het_genes, increased_ko_genes))
## Arbitrarily grab all genes from one of my data structures.
all_genes <- rownames(exprs(mm38_hisat_v3))
exclude_idx <- all_genes %in% both_increased_genes
summary(exclude_idx)
##    Mode   FALSE    TRUE
## logical   21793    3632
My April 2025 version of this shows 21,793 and 3,632 genes in this set. Is that still true? (As of 20260311, it is!)
exclude_increased_genes <- all_genes[exclude_idx]
retina_keepers <- list(
  "het_vs_wt" = c("het_retina", "wt_retina"),
  "ko_vs_wt" = c("ko_retina", "wt_retina"),
  "ko_vs_het" = c("ko_retina", "het_retina"))
## A reminder to myself: there is also a parameter 'wanted_genes'
## which does effectively the same thing as excludes in this context;
## excludes was originally written to allow flexible, keyword-based
## exclusion.
p8_retina_tables <- combine_de_tables(
  mm_normal_p8_ret_de, keepers = retina_keepers,
  wanted_genes = both_increased_genes, label_column = label_column,
  excel = glue("03theresa_comparison_excel/p8_retina_kept_genes_increased_in_wt_tables-v{ver}.xlsx"))
## Looking for subscript invalid names, start of extract_keepers.
## Looking for subscript invalid names, end of extract_keepers.
p8_retina_sig <- extract_significant_genes(
  p8_retina_tables, according_to = "deseq",
  excel = glue("03theresa_comparison_excel/p8_retina_kept_genes_increased_in_wt_sig-v{ver}.xlsx"))
opposite_p8_retina_tables <- combine_de_tables(
  mm_normal_p8_ret_de, keepers = retina_keepers,
  excludes = both_increased_genes, label_column = label_column,
  excel = glue("03theresa_comparison_excel/p8_retina_removed_genes_increased_in_wt_tables-v{ver}.xlsx"))
## Looking for subscript invalid names, start of extract_keepers.
## Looking for subscript invalid names, end of extract_keepers.
The following is a copy/paste from Theresa containing the remaining tasks she performed and will provide the template for implementation of the final tasks.
This picks up with the lines from her notebook immediately following the invocation of ‘set_sig_limma(factors = c(“Het Enriched” …’.
For all of the remaining blocks I will copy in her code, turn off its evaluation, run the blocks manually, compare them to her notebook output, then enable each block as I ensure I understand it.
I will likely therefore introduce some small formatting changes and add some additional GSEA/enrichment tasks once the non-specific filtering is complete.
df <- df %>%
  filter(Row.names %in% keepergenes)
labels_ups <- df %>%
  filter(adj.P.Val <= 0.05 & abs(logFC) > 1) %>%
  arrange(logFC) %>%
  head(n = 9)
labels_downs <- df %>%
  filter(adj.P.Val <= 0.05 & abs(logFC) > 1) %>%
  arrange(-logFC) %>%
  head(n = 11)
labels <- rbind(labels_ups, labels_downs)
res_tbl <- df
DEplot <- ggplot(res_tbl, aes(x = logFC, y = -log10(adj.P.Val), label = external_gene_name)) +
  geom_point(aes(colour = Significance), size = 4) +
  geom_vline(xintercept = c(-1, 1)) +
  geom_hline(yintercept = -log10(0.05)) +
  theme_classic(base_size = 20) +
  xlab("log2(FC)") +
  ylab("-log10(p-value)") +
  theme(legend.position = "right") +
  scale_color_manual(values = c("#F8766D", "#00BFC4", "Grey")) +
  geom_label_repel(
    data = filter(df,
                  ## c('s5_het_dlgn', 's5_het_ret', 's5_het_scn')),
                  external_gene_name %in% labels$external_gene_name),
    ## nudge_x = -0.5,
    nudge_y = 3, max.overlaps = 15) +
  xlim(c(-3, 6))
pp(file = "03theresa_comparison_images/p08_retina_DE_1312024.pdf")
## Warning in pp(file = "03theresa_comparison_images/p08_retina_DE_1312024.pdf"):
## The directory: 03theresa_comparison_images does not exist, will attempt to
## create it.
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_label_repel()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 2 rows containing missing values or values outside the scale range
## (`geom_label_repel()`).
## write_xlsx() wrote excel/retinahet_vs_retinako_WTfiltered.xlsx.
## The cursor is on sheet first, row: 3635 column: 13.
## [1] 21
## [1] 69
regulated_genes <- res_tbl %>%
  filter(adj.P.Val <= 0.05) %>%
  arrange(logFC) %>%
  select(Row.names, logFC, adj.P.Val, external_gene_name, Significance) %>%
  filter(abs(logFC) >= 1)
## gsea_result_ko <- gost(query = ko_genes$external_gene_name,
##                        organism = "mmusculus",
##                        evcodes = TRUE,
##                        ordered_query = TRUE)
gsea_result_het <- gost(query = het_enriched$external_gene_name,
                        organism = "mmusculus",
                        evcodes = TRUE,
                        ordered_query = TRUE)
## gsea_result_alldysregulated <- gost(query = alldysregulated_genes$external_gene_name,
##                                     organism = "mmusculus",
##                                     evcodes = TRUE,
##                                     ordered_query = TRUE)
I have a function in my package which seeks to make gProfiler queries a bit more complete and easy. Let us see how similar the result is…
rownames(alldysregulated_genes) <- alldysregulated_genes[["Row.names"]]
alldysregulated_genes[["Row.names"]] <- NULL
het_gp <- simple_gprofiler(rownames(alldysregulated_genes),
                           species = "mmusculus",
                           excel = glue("excel/het_gprofiler-v{ver}.xlsx"))
het_gp
enrichplot::dotplot(het_gp[["BP_enrich"]])
gp_pair <- enrichplot::pairwise_termsim(het_gp[["BP_enrich"]])
enrichplot::emapplot(gp_pair)
enrichplot::ssplot(gp_pair)
enrichplot::treeplot(gp_pair)
upsetplot(het_gp[["BP_enrich"]])
enrichplot::dotplot(het_gp[["REAC_enrich"]])
gp_pair <- enrichplot::pairwise_termsim(het_gp[["REAC_enrich"]])
enrichplot::emapplot(gp_pair)
enrichplot::ssplot(gp_pair)
enrichplot::treeplot(gp_pair)
upsetplot(het_gp[["REAC_enrich"]])
I make a somewhat arbitrary distinction between the concepts of over-enrichment analyses and GSEA: the former (as performed by gprofiler) (Raudvere et al. (2019)) seeks to find groups of genes overrepresented in GO/reactome/etc. These groups of genes are taken exclusively from the top-n/bottom-n genes with respect to fold-change between conditions of interest; in this case, the genes most different from wt in the p08 retina ko or het samples.
With that in mind, I can invoke a similar function using the full table of DE results to get what I call the GSEA result using clusterProfiler (Yu (n.d.)). In the following block I will use the ‘all_cprofiler’ function on the data structures named ‘p8_retina_tables’ and ‘opposite_p8_retina_tables’ in order to get these GSEA results for each contrast performed (het/wt, ko/wt, het/ko). I will follow that up with ‘all_gprofiler’ which does the same, but uses gProfiler’s enrichment analyses (it will therefore include what we just looked at).
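As a reminder to myself of what 'overrepresented' means, here is a minimal sketch of the generic hypergeometric test behind over-representation analyses. All of the numbers are made up, and this is the textbook calculation rather than gProfiler's exact procedure (which applies its own multiple-testing correction).

```r
## Hypothetical numbers: 15000 expressed genes in the universe, a category
## with 200 members, 300 genes of interest, 12 of which are in the category.
universe_size <- 15000
category_size <- 200
query_size <- 300
overlap <- 12

## Overlap expected by chance alone.
expected_overlap <- query_size * category_size / universe_size

## P(X >= overlap) under the hypergeometric null; phyper() gives P(X > q)
## with lower.tail = FALSE, hence the overlap - 1.
ora_p <- phyper(overlap - 1, category_size, universe_size - category_size,
                query_size, lower.tail = FALSE)

## The same test expressed as a one-sided Fisher exact test on the 2x2 table.
fisher_p <- fisher.test(
  matrix(c(overlap, category_size - overlap,
           query_size - overlap,
           universe_size - category_size - query_size + overlap),
         nrow = 2),
  alternative = "greater")$p.value
```

A GSEA, in contrast, does not start from a cutoff-defined gene list at all; it walks the entire ranked table, which is why the clusterProfiler call takes the full DE tables.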
p08_retina_all_cp <- all_cprofiler(
  p8_retina_sig, p8_retina_tables, orgdb = "org.Mm.eg.db", orgdb_from = orgdb_from,
  excel = "03theresa_comparison_excel/cprofiler_p08_retina.xlsx")
## Error : unable to find an inherited method for function 'metadata' for signature 'x = "NULL"'
## Error : unable to find an inherited method for function 'metadata' for signature 'x = "NULL"'
## Error in `simple_cl[["kegg_universe"]]`:
## ! subscript out of bounds
## Error in `h()`:
## ! error in evaluating the argument 'object' in selecting a method for function 'dotplot': object 'p08_retina_all_cp' not found
## Error in `h()`:
## ! error in evaluating the argument 'gse' in selecting a method for function 'plot_topn_gsea': object 'p08_retina_all_cp' not found
pp(file = "03theresa_comparison_images/gsea_p08_retina_ko_vs_het_top_hit.pdf")
p08_topn_gsea[["GO_ko_vs_het_up"]][[1]]
## Error:
## ! object 'p08_topn_gsea' not found
## Error:
## ! object 'p08_topn_gsea' not found
## Error:
## ! object 'p08_topn_gsea' not found
## Error:
## ! object 'p08_topn_gsea' not found
## Error:
## ! object 'p08_topn_gsea' not found
## Error:
## ! object 'p08_topn_gsea' not found
pp(file = "03theresa_comparison_images/gsea_p08_retina_het_vs_wt_top_hit.pdf")
p08_topn_gsea[["GO_het_vs_wt_up"]][[1]]
## Error:
## ! object 'p08_topn_gsea' not found
#gsea_ko <- gsea_result_ko[["result"]] %>%
# select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
# arrange(desc(recall)) %>%
# head(n = 10)
# gsea_plots_ko <- ggplot(gsea_ko, aes(x = recall, y = reorder(term_name, recall), fill = p_value)) +
# geom_bar(stat = "identity")+
# scale_fill_continuous(low = "blue", high = "red") +
# theme_bw()+
# ylab("") +
# xlab("GSEA Score")
gsea_het <- gsea_result_het[["result"]] %>%
  select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
  arrange(desc(recall)) %>%
  head(n = 10)
gsea_plots_het <- ggplot(gsea_het, aes(x = recall, y = reorder(term_name, recall), fill = p_value)) +
  geom_bar(stat = "identity") +
  scale_fill_continuous(low = "blue", high = "red") +
  theme_bw() +
  ylab("") +
  xlab("Over Representation Score")
pp(file = "03theresa_comparison_images/GSEA_p08_axontrap_retinahet_upregulated_vs_retinako.pdf")
gsea_plots_het
plotted <- dev.off()
gsea_plots_het
gsea_all <- gsea_result_alldysregulated[["result"]] %>%
  select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
  arrange(desc(recall)) %>%
  head(n = 10)
gsea_plots_all <- ggplot(gsea_all, aes(x = recall, y = reorder(term_name, recall), fill = p_value)) +
  geom_bar(stat = "identity") +
  scale_fill_continuous(low = "blue", high = "red") +
  theme_bw() +
  ylab("") +
  xlab("Over Representation Score")
pp(file = "images/GSEA_p08_retina_axontrap_alldysregulatedgenes.pdf")
gsea_plots_all
plotted <- dev.off()
It is only now that I realized we are splitting the data by location for each set of comparisons. I think that, left to my own devices, I would prefer to keep the input data structure intact, perform the somewhat larger number of contrasts, and then split up the results. Ideally this will slightly improve the fidelity of the results returned by DESeq2 and friends. But, I will run the state of Theresa’s notebook with as few changes as possible first, then add this.
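A minimal sketch of what I mean by keeping the data intact, using a made-up sample sheet (two hypothetical replicates per genotype/location pair): build one cell-means design over the combined condition, enumerate the within-location genotype contrasts, and only then group them by location. The contrast naming just mimics the tables above; nothing here calls DESeq2.

```r
## Hypothetical sample sheet: 2 replicates for each genotype/location pair.
meta <- expand.grid(replicate = 1:2,
                    genotype = c("wt", "het", "ko"),
                    location = c("retina", "scn", "dlgn"),
                    stringsAsFactors = FALSE)
meta$condition <- factor(paste(meta$genotype, meta$location, sep = "_"))

## One cell-means design for all samples, as in "~ 0 + condition".
design <- model.matrix(~ 0 + condition, data = meta)
colnames(design) <- levels(meta$condition)

## Enumerate the genotype contrasts within each location.
genotype_pairs <- list(c("het", "wt"), c("ko", "wt"), c("ko", "het"))
locations <- unique(meta$location)
contrast_names <- unlist(lapply(locations, function(loc) {
  vapply(genotype_pairs, function(pair) {
    paste0(pair[1], "_", loc, "_vs_", pair[2], "_", loc)
  }, character(1))
}))

## Split the contrast names up by location after the fact.
by_location <- split(contrast_names,
                     rep(locations, each = length(genotype_pairs)))
```

The appeal of this layout is that the dispersion estimates would come from all samples at once, while each location still gets its own set of three contrasts.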
I am going to skip this PCA plot for a couple of reasons: I already did a superset of it, and the subset Theresa performed is not valid given the set of samples included in my sample sheet, and figuring out the actually corresponding subset will take me forever… In addition, I want to use my mm38_hisat_v3 for everything…
mm38_subset <- subset_se(
  mm38_hisat,
  subset = "(batch == '4' | batch == '5' | batch == '6') & time == 'p08' & location == 'scn' | sampleid == 'iprgc_03'")
mm38_norm <- normalize(mm38_subset, filter = TRUE, convert = "cpm",
                       transform = "log2", batch = "svaseq")
mm38_norm <- set_batches(mm38_norm, fact = "location")
mm38_norm <- set_conditions(mm38_norm, fact = "genotype")
pca_norm <- plot_pca(mm38_norm, max_overlaps = 70)
pca_norm$plot
Instead I will simplify the subset and see what happens…
scn_samples <- subset_se(mm38_hisat_v3,
                         subset = "location_atb == 'scn'") %>%
  set_batches(fact = "location_atb") %>%
  set_conditions(fact = "genotype_atb", colors = color_choices[["genotype"]])
## The number of samples by batch are:
##
## scn
## 17
## The numbers of samples by condition are:
##
## het ko wt
## 6 6 5
scn_norm <- normalize(scn_samples, filter = TRUE, convert = "cpm",
                      transform = "log2", batch = "svaseq")
## Removing 11109 low-count genes (14316 remaining).
## transform_counts: Found 919 values less than 0.
## transform_counts: Found 919 values equal to 0, adding 1 to the matrix.
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by het, ko, wt
## Shapes are defined by scn.
Theresa’s next operation was to perform libsize/nonzero plots. I already did the pre/post deduplication nonzero; here is the analogous libsize.
v2 is pre-deduplication and v3 is post.
## Library sizes of 65 samples,
## ranging from 3,717,242 to 24,538,069.
post_filter_nonzero <- plot_libsize(mm38_hisat_v3, text = FALSE)
pp(file = "01diagnostic_images/post_all_filteres_nonzero.pdf")
post_filter_nonzero[["plot"]]
plotted <- dev.off()
post_filter_nonzero
## Library sizes of 63 samples,
## ranging from 1,264,475 to 10,979,038.
I am a bit concerned about some of these library sizes post-deduplication.
Let us look at the relationship between reads and duplication, which I assume will be relatively linear.
test <- colData(mm38_hisat_v3)[, c("hisat_genome_single_all", "umi_dedup_pct_reads")]
test_plot <- plot_linear_scatter(test, loess = TRUE)
test_plot[["scatter"]]
## `geom_smooth()` using formula = 'y ~ x'
## Warning: The following aesthetics were dropped during statistical transformation: label.
## i This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## i Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## png
## 2
Theresa also produced a density/sample plot, which might prove quite useful for these samples given their significantly larger variance across samples (due to deduplication).
pp(file = "01diagnostic_images/sample_density.pdf")
mm38_density <- plot_density(loc_geno_nt)
mm38_density[["plot"]] +
  theme(legend.position = "none")
plot_boxplot(loc_geno_nt)
## iprgc_62 iprgc_63 iprgc_64 iprgc_65 iprgc_66 iprgc_67 iprgc_68
## min 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## q1 1.689 1.527 2.480 2.558 1.894 2.186 1.469
## median 3.713 3.521 4.413 4.445 3.909 4.021 3.539
## mean 3.711 3.556 4.250 4.279 3.916 3.995 3.573
## q3 5.460 5.314 5.914 5.904 5.705 5.638 5.333
## max 15.929 16.020 14.188 14.054 14.741 15.920 16.168
## iqr 3.771 3.788 3.434 3.345 3.811 3.452 3.864
## iqr_high 11.115 10.995 11.064 10.922 11.423 10.817 11.129
## iqr_low -5.656 -5.681 -5.151 -5.018 -5.717 -5.178 -5.796
## sd 2.426 2.434 2.309 2.285 2.439 2.309 2.439
## var 5.885 5.926 5.330 5.219 5.948 5.333 5.948
## stdvar 1.586 1.667 1.254 1.220 1.519 1.335 1.665
## iprgc_69 iprgc_70 iprgc_71 iprgc_72 iprgc_73 iprgc_74 iprgc_75
## min 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## q1 2.543 2.182 1.968 1.990 1.996 2.404 2.419
## median 4.466 4.015 3.869 3.922 3.907 4.268 4.318
## mean 4.299 3.977 3.865 3.889 3.901 4.169 4.210
## q3 5.929 5.624 5.556 5.584 5.599 5.848 5.883
## max 12.381 15.767 16.057 15.939 15.386 13.760 13.110
## iqr 3.387 3.442 3.587 3.594 3.602 3.444 3.465
## iqr_high 11.009 10.786 10.937 10.975 11.002 11.014 11.080
## iqr_low -5.080 -5.162 -5.381 -5.391 -5.403 -5.166 -5.197
## sd 2.304 2.325 2.377 2.377 2.384 2.334 2.328
## var 5.308 5.404 5.651 5.648 5.684 5.448 5.421
## stdvar 1.235 1.359 1.462 1.452 1.457 1.307 1.288
## iprgc_76 iprgc_77 iprgc_81 iprgc_82 iprgc_83 iprgc_84 iprgc_85
## min 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## q1 2.518 2.264 2.084 2.178 2.503 2.531 2.246
## median 4.379 4.066 3.814 3.984 4.354 4.457 3.959
## mean 4.264 4.025 3.848 3.996 4.221 4.288 3.938
## q3 5.908 5.654 5.476 5.617 5.852 5.961 5.531
## max 13.649 15.864 16.157 14.893 13.932 13.337 16.483
## iqr 3.389 3.390 3.392 3.439 3.349 3.430 3.285
## iqr_high 10.992 10.739 10.565 10.776 10.875 11.106 10.459
## iqr_low -5.084 -5.085 -5.088 -5.159 -5.023 -5.145 -4.928
## sd 2.299 2.295 2.320 2.320 2.298 2.317 2.234
## var 5.287 5.266 5.382 5.384 5.283 5.368 4.989
## stdvar 1.240 1.308 1.399 1.348 1.252 1.252 1.267
## iprgc_86 iprgc_87 iprgc_88 iprgc_89 iprgc_90 iprgc_91 iprgc_92
## min 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## q1 2.470 1.526 1.858 2.199 2.303 2.420 2.044
## median 4.345 3.317 3.742 4.035 4.389 4.450 3.823
## mean 4.232 3.389 3.773 4.031 4.202 4.259 3.874
## q3 5.941 4.998 5.463 5.664 5.973 5.967 5.531
## max 14.490 16.910 16.308 15.157 14.315 13.393 15.881
## iqr 3.471 3.472 3.604 3.465 3.669 3.547 3.487
## iqr_high 11.148 10.206 10.869 10.862 11.477 11.288 10.761
## iqr_low -5.207 -5.208 -5.406 -5.198 -5.504 -5.321 -5.230
## sd 2.339 2.276 2.347 2.320 2.403 2.360 2.325
## var 5.471 5.179 5.510 5.381 5.773 5.571 5.404
## stdvar 1.293 1.528 1.460 1.335 1.374 1.308 1.395
## iprgc_93 iprgc_94 iprgc_95 iprgc_96 iprgc_97 iprgc_98 iprgc_99
## min 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## q1 2.194 1.455 2.093 2.550 2.101 1.934 2.604
## median 3.971 3.413 3.819 4.423 3.892 3.878 4.541
## mean 3.924 3.486 3.853 4.270 3.934 3.908 4.335
## q3 5.516 5.208 5.455 5.926 5.576 5.660 5.976
## max 16.492 16.585 16.084 13.933 15.433 14.700 12.844
## iqr 3.322 3.753 3.362 3.377 3.475 3.726 3.371
## iqr_high 10.498 10.837 10.498 10.991 10.789 11.249 11.033
## iqr_low -4.982 -5.629 -5.043 -5.065 -5.213 -5.589 -5.057
## sd 2.253 2.401 2.276 2.305 2.332 2.424 2.303
## var 5.075 5.766 5.181 5.314 5.440 5.876 5.302
## stdvar 1.293 1.654 1.344 1.245 1.383 1.504 1.223
## iprgc_100 iprgc_101 iprgc_102 iprgc_104 iprgc_105 iprgc_106 iprgc_107
## min 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## q1 1.824 2.562 2.159 2.260 2.261 1.787 1.599
## median 3.846 4.490 4.065 4.050 4.187 3.930 3.435
## mean 3.838 4.307 4.032 4.011 4.119 3.888 3.505
## q3 5.593 5.958 5.751 5.639 5.825 5.706 5.147
## max 15.708 13.878 14.894 15.914 14.621 15.187 17.169
## iqr 3.770 3.396 3.592 3.380 3.564 3.919 3.548
## iqr_high 11.248 11.053 11.138 10.709 11.171 11.585 10.469
## iqr_low -5.654 -5.095 -5.388 -5.069 -5.346 -5.879 -5.322
## sd 2.419 2.294 2.368 2.270 2.351 2.479 2.303
## var 5.852 5.263 5.610 5.155 5.526 6.147 5.305
## stdvar 1.525 1.222 1.391 1.285 1.342 1.581 1.514
## iprgc_108 iprgc_109 iprgc_110 iprgc_111 iprgc_112 iprgc_113 iprgc_114
## min 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## q1 1.838 2.491 2.183 2.361 2.193 2.358 2.163
## median 4.026 4.483 3.966 4.274 4.066 4.150 4.078
## mean 3.945 4.302 4.002 4.184 4.080 4.108 4.065
## q3 5.784 6.029 5.657 5.855 5.808 5.711 5.801
## max 14.921 12.859 15.079 14.025 13.023 15.794 13.261
## iqr 3.946 3.538 3.473 3.493 3.614 3.353 3.638
## iqr_high 11.703 11.335 10.867 11.095 11.229 10.741 11.257
## iqr_low -5.919 -5.306 -5.210 -5.240 -5.422 -5.030 -5.457
## sd 2.486 2.354 2.336 2.335 2.386 2.267 2.400
## var 6.179 5.543 5.457 5.450 5.695 5.139 5.760
## stdvar 1.566 1.289 1.364 1.303 1.396 1.251 1.417
## iprgc_115 iprgc_116 iprgc_117 iprgc_118 iprgc_119 iprgc_120 iprgc_121
## min 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## q1 2.511 2.620 2.396 2.491 2.525 2.346 2.352
## median 4.495 4.525 4.367 4.396 4.404 4.347 4.365
## mean 4.303 4.332 4.223 4.256 4.248 4.214 4.214
## q3 6.020 5.957 5.980 5.985 5.919 5.941 5.942
## max 12.739 12.644 13.541 13.650 13.796 14.752 14.560
## iqr 3.508 3.337 3.584 3.494 3.394 3.596 3.590
## iqr_high 11.282 10.963 11.356 11.225 11.010 11.335 11.327
## iqr_low -5.262 -5.006 -5.376 -5.241 -5.091 -5.393 -5.385
## sd 2.354 2.289 2.377 2.347 2.310 2.368 2.366
## var 5.543 5.242 5.649 5.509 5.335 5.609 5.596
## stdvar 1.288 1.210 1.338 1.294 1.256 1.331 1.328
## iprgc_122 iprgc_123 iprgc_124 iprgc_125 iprgc_126 iprgc_127 iprgc_128
## min 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## q1 2.567 1.556 2.038 2.211 2.215 2.323 2.352
## median 4.453 3.322 3.745 4.085 4.082 4.172 4.198
## mean 4.282 3.409 3.761 4.045 4.037 4.138 4.128
## q3 5.915 5.004 5.325 5.709 5.733 5.838 5.782
## max 13.284 17.222 17.196 13.801 15.040 14.076 15.026
## iqr 3.348 3.447 3.287 3.498 3.518 3.515 3.430
## iqr_high 10.937 10.174 10.255 10.956 11.011 11.110 10.926
## iqr_low -5.022 -5.171 -4.930 -5.247 -5.278 -5.272 -5.145
## sd 2.300 2.242 2.196 2.354 2.363 2.348 2.316
## var 5.291 5.026 4.821 5.543 5.583 5.511 5.364
## stdvar 1.236 1.474 1.282 1.370 1.383 1.332 1.300
## iprgc_129 iprgc_130
## min 0.000 0.000
## q1 2.462 2.286
## median 4.266 4.221
## mean 4.175 4.132
## q3 5.817 5.836
## max 15.200 13.939
## iqr 3.354 3.550
## iqr_high 10.848 11.162
## iqr_low -5.031 -5.325
## sd 2.293 2.373
## var 5.258 5.632
## stdvar 1.259 1.363
## Plot describing the gene distribution from a dataset.
## png
## 2
## Plot describing the gene distribution from a dataset.
There is some difference across sample densities, but it is not too crazytown.
At this point in the document I read ahead a bit and came to the conclusion that it repeats the above logic of taking the union of wt comparisons to remove genes from the appropriate het/ko or p15/p08 or location comparisons. This seems quite reasonable to me, but I would prefer to not separate all the data, so I will attempt to duplicate and slightly streamline this logic on the full dataset. Thus I am going to skip down to the end and attempt to implement this.
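The filtering logic described here can be sketched on toy tables. This is only an illustration under stated assumptions: the three data frames and their values are invented, `lower_in_wt()` is a hypothetical helper, and the cutoffs simply mirror the ones in the copied blocks below (WT as numerator, so a negative logFC means lower in WT).

```r
## Toy DE tables with hypothetical values; rows are genes.
wt_vs_het <- data.frame(
  logFC = c(-2.0, 0.5, -0.3, 1.2),
  adj.P.Val = c(0.01, 0.20, 0.04, 0.03),
  row.names = c("geneA", "geneB", "geneC", "geneD"))
wt_vs_ko <- data.frame(
  logFC = c(-1.5, -0.8, 0.1, 0.9),
  adj.P.Val = c(0.02, 0.01, 0.50, 0.04),
  row.names = c("geneA", "geneB", "geneC", "geneD"))
ko_vs_het <- data.frame(
  logFC = c(1.4, -1.1, 0.2, 2.0),
  adj.P.Val = c(0.01, 0.03, 0.60, 0.01),
  row.names = c("geneA", "geneB", "geneC", "geneD"))

## Hypothetical helper: genes significantly lower in WT for one WT contrast.
lower_in_wt <- function(tbl, lfc = -0.1, adjp = 0.05) {
  rownames(tbl)[tbl[["logFC"]] <= lfc & tbl[["adj.P.Val"]] <= adjp]
}

## Union across the WT contrasts, then restrict the comparison of interest.
keeper_genes <- union(lower_in_wt(wt_vs_het), lower_in_wt(wt_vs_ko))
filtered <- ko_vs_het[rownames(ko_vs_het) %in% keeper_genes, ]
```

Keeping the gene IDs as row names means the same `keeper_genes` vector can be reused against any contrast table from the full dataset without splitting the data first.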
Note: The following few blocks are all copy/pasted directly from Theresa’s notebook and are not evaluated because they are performed almost identically later but with slightly different logic/orders.
mm_de_normal_p8_scn <- all_pairwise(mm38_subset, model_batch = "svaseq",
parallel = FALSE, do_ebseq = FALSE, do_basic = FALSE,
do_dream = FALSE, do_noiseq = FALSE, do_edger = FALSE,
filter = TRUE)
annots_to_merge <- mm_annot %>%
select(ensembl_gene_id, external_gene_name) %>%
filter(ensembl_gene_id %in% rownames(mm_de_normal_p8_scn$deseq$all_tables$ko_scn_vs_het_scn)) %>%
distinct()
mm_de_normal_p8_scn$deseq$all_tables$ko_scn_vs_het_scn <- merge(
mm_de_normal_p8_scn$deseq$all_tables$ko_scn_vs_het_scn,
annots_to_merge, by.x = 0, by.y = "ensembl_gene_id", all.x = TRUE)
hetkeeper_genes <- mm_de_normal_p8_scn$deseq$all_tables$wt_scn_vs_het_scn %>%
filter(logFC <= -0.1 & adj.P.Val <= 0.05)
kokeeper_genes <- mm_de_normal_p8_scn$deseq$all_tables$wt_scn_vs_ko_scn %>%
filter(logFC <= -0.1 & adj.P.Val <= 0.05)
keepergenes <- unique(c(rownames(hetkeeper_genes), rownames(kokeeper_genes)))
df <- mm_de_normal_p8_scn$deseq$all_tables$ko_scn_vs_het_scn %>%
dplyr::mutate(logFC = -logFC) %>%
set_sig_limma(factors = c("Het Enriched",
"KO Enriched"))
df <- df %>%
filter(Row.names %in% keepergenes)
labels_ups <- df %>%
filter(abs(logFC) > 1) %>%
arrange(logFC) %>%
head(n = 1)
labels_downs <- df %>%
filter(abs(logFC) > 1) %>%
arrange(-logFC) %>%
head(n = 1)
labels <- rbind(labels_ups, labels_downs)
res_tbl <- df
DEplot <- ggplot(res_tbl, aes(x = logFC, y = -log10(adj.P.Val), label = external_gene_name)) +
geom_point(aes(colour = Significance), size = 4) +
geom_vline(xintercept = c(-1, 1)) +
geom_hline(yintercept = -log10(0.05)) +
theme_classic(base_size = 20) +
xlab("log2(FC)") +
ylab("-log10(p-value)") +
## ggtitle(title, subtitle = subtitle) +
theme(legend.position="right") +
scale_color_manual(values=c("Het Enriched" = "#F8766D",
"KO Enriched" = "#00BFC4",
"Not\n Enriched" = "Grey")) +
geom_label_repel(data=filter(df,
## c('s5_het_dlgn', 's5_het_ret', 's5_het_scn')),
external_gene_name %in% labels$external_gene_name),
## nudge_x = -0.5,
nudge_y = 3, max.overlaps = 15) +
ggtitle("SCN Het vs KO Translatome")
pp(file = "images/p08_scn_DE_1312024.pdf")
DEplot
plotted <- dev.off()
write_xlsx(df, excel = "excel/scnhet_vs_scnko_WTfiltered.xlsx")
ko_genes <- res_tbl %>%
filter(adj.P.Val <= 0.05) %>%
arrange(-abs(logFC)) %>%
select(Row.names, logFC, adj.P.Val, external_gene_name, Significance) %>%
filter(logFC <= -1)
het_genes <- res_tbl %>%
filter(adj.P.Val <= 0.05) %>%
arrange(-abs(logFC)) %>%
select(Row.names, logFC, adj.P.Val, external_gene_name, Significance) %>%
filter(logFC >= 1)
alldysregulated_genes <- res_tbl %>%
filter(adj.P.Val <= 0.05) %>%
arrange(logFC) %>%
select(Row.names, logFC, adj.P.Val, external_gene_name, Significance) %>%
filter(abs(logFC) >= 1)
gsea_result_ko <- gost(query = ko_genes$external_gene_name,
organism = "mmusculus",
evcodes = TRUE,
ordered_query = TRUE)
gsea_result_het <- gost(query = het_genes$external_gene_name,
organism = "mmusculus",
evcodes = TRUE,
ordered_query = TRUE)
gsea_result_alldysregulated <- gost(query = alldysregulated_genes$external_gene_name,
organism = "mmusculus",
evcodes = TRUE,
ordered_query = TRUE)
gsea_ko <- gsea_result_ko[["result"]] %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
head(n = 10)
gsea_plots_ko <- ggplot(gsea_ko, aes(x = recall, y = reorder(term_name, recall), fill = p_value)) +
geom_bar(stat = "identity") +
scale_fill_continuous(low = "blue", high = "red") +
theme_bw() +
ylab("") +
xlab("Over enrichment Score")
gsea_het <- gsea_result_het[["result"]] %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
head(n = 10)
gsea_plots_het <- ggplot(gsea_het, aes(x = recall, y = reorder(term_name, recall), fill = p_value)) +
geom_bar(stat = "identity") +
scale_fill_continuous(low = "blue", high = "red") +
theme_bw() +
ylab("") +
xlab("Over enrichment Score")
gsea_all <- gsea_result_alldysregulated[["result"]] %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
head(n = 10)
gsea_plots_all <- ggplot(gsea_all, aes(x = recall, y = reorder(term_name, recall), fill = p_value)) +
geom_bar(stat = "identity") +
scale_fill_continuous(low = "blue", high = "red") +
theme_bw() +
ylab("") +
xlab("Over enrichment Score")
pp(file = "images/GSEA_p08_retina_axontrap_alldysregulatedgenes.pdf")
gsea_plots_all
plotted <- dev.off()
mm38_subset2 <- subset_se(
mm38_hisat,
subset = "(batch == '4' | batch == '5' | batch == '6') & time == 'p08' & genotype != 'ko' & location != 'dlgn' | sampleid == 'iprgc_03'")
mm38_subset2 <- subset_se(mm38_subset2, subset = "sampleid != 'iprgc_89'")
mm38_subset2$design %>%
select(genotype, location) %>%
table()
mm38_norm2 <- normalize(mm38_subset2, filter=TRUE,
convert="cpm",
transform="log2", batch = "svaseq")
mm_de_subset2 <- all_pairwise(mm38_subset2,
model_batch="svaseq",
parallel=FALSE, do_ebseq=FALSE,
do_basic = FALSE, do_dream = FALSE,
do_noiseq = FALSE, do_edger = FALSE,
filter = TRUE)
retinakeeper_genes <- mm_de_subset2$deseq$all_tables$wt_retina_vs_het_retina %>%
filter(logFC <= -0.1 & adj.P.Val <= 0.05)
scnkeeper_genes <- mm_de_subset2$deseq$all_tables$wt_scn_vs_het_scn %>%
filter(logFC <= -0.1 & adj.P.Val <= 0.05)
keepergenes <- unique(c(rownames(retinakeeper_genes), rownames(scnkeeper_genes)))
annots_to_merge <- mm_annot %>%
select(ensembl_gene_id, external_gene_name) %>%
filter(ensembl_gene_id %in% rownames(mm_de_subset2$deseq$all_tables$het_scn_vs_het_retina)) %>%
distinct()
mm_de_subset2$deseq$all_tables$het_scn_vs_het_retina <- merge(
mm_de_subset2$deseq$all_tables$het_scn_vs_het_retina,
annots_to_merge, by.x = 0,
by.y = "ensembl_gene_id", all.x = TRUE)
df <- mm_de_subset2$deseq$all_tables$het_scn_vs_het_retina %>%
mutate(Significance = case_when(logFC <= -1.0 ~ "Retina Enriched",
logFC >= 1.0 ~ "SCN Enriched",
logFC > -1.0 & logFC < 1.0 ~ "Not\n Enriched"))
df <- df %>%
filter(Row.names %in% keepergenes)
scn_enriched <- df %>%
filter(adj.P.Val <= 0.05 & logFC >= 1.0) %>%
arrange(-logFC) %>%
select(Row.names, external_gene_name, logFC, adj.P.Val) %>%
mutate(Significance = "SCN Enriched") %>%
filter(Row.names %in% rownames(scnkeeper_genes))
retina_enriched <- df %>%
filter(adj.P.Val <= 0.05 & logFC <= -1.0) %>%
arrange(logFC) %>%
select(Row.names, external_gene_name, logFC, adj.P.Val) %>%
mutate(Significance = "Retina Enriched") %>%
filter(Row.names %in% rownames(retinakeeper_genes))
notenriched <- df %>%
select(Row.names, external_gene_name, logFC, adj.P.Val, Significance) %>%
filter(Row.names %in% c(rownames(retinakeeper_genes),
rownames(scnkeeper_genes))[duplicated(c(rownames(retinakeeper_genes),
rownames(scnkeeper_genes)))]) %>%
filter(Significance == "Not\n Enriched")
df <- rbind(scn_enriched, retina_enriched, notenriched)
df <- df %>%
distinct()
## writexl::write_xlsx(df, path = "axonTRAP_DE_results_20240202/retinahet_vs_scn_het_WTfiltered.xlsx")
labels_ups <- df %>%
filter(adj.P.Val <= 0.05 & abs(logFC) > 1.0) %>%
arrange(logFC) %>%
head(n = 10)
labels_downs <- df %>%
filter(adj.P.Val <= 0.05 & abs(logFC) > 1.0) %>%
arrange(-logFC) %>%
head(n = 10)
labels <- rbind(labels_ups, labels_downs)
labels_requested <- c("Cdh10","Cdh12","Cdh13","Cdh18",
"Cdh7","Cdh8","Cdh9","Cntn3",
"Cntn4","Cntn5","Cntn6","Kirrel3",
"Nrxn1","Nrxn3","Sema3c","Sema6d",
"Tenm1","Tenm2","Tenm4")
res_tbl <- df
DEplot <- ggplot(res_tbl, aes(x = logFC, y = -log10(adj.P.Val), label = external_gene_name)) +
geom_point(aes(colour = Significance), size = 4) +
geom_vline(xintercept = c(-1, 1)) +
geom_hline(yintercept = -log10(0.05)) +
theme_classic(base_size = 20) +
xlab("log2(FC)") +
ylab("-log10(p-value)") +
## ggtitle(title, subtitle = subtitle) +
theme(legend.position="right") +
scale_color_manual(values=c("Grey", "#F8766D", "#00BFC4")) +
geom_label_repel(data=filter(df,
external_gene_name %in% labels_requested),
## c(labels$external_gene_name, "Opn4")), #c('s5_het_dlgn', 's5_het_ret', 's5_het_scn')),
## nudge_x = -0.5,
nudge_y = 15, max.overlaps = 25)
#pp(file = "axonTRAP_Volcanoplots_20240202/p08_retinavsscnhet_DE_requested_genelabels_02052024.pdf")
DEplot
#dev.off()
scn_enriched <- df %>%
filter(adj.P.Val <= 0.05 & logFC >= 1.0) %>%
arrange(-logFC) %>%
select(Row.names, external_gene_name, logFC, adj.P.Val, Significance)
retina_enriched <- df %>%
filter(adj.P.Val <= 0.05 & logFC <= -1.0) %>%
arrange(logFC) %>%
select(Row.names, external_gene_name, logFC, adj.P.Val, Significance)
scn_enriched
retina_enriched
df %>%
filter(Significance == "Not\n Enriched")
gsea_result_scn <- gost(query = scn_enriched$external_gene_name,
organism = "mmusculus", evcodes = TRUE,
ordered_query = TRUE, sources = c("GO"))
gsea_result_ret <- gost(query = retina_enriched$external_gene_name,
organism = "mmusculus", evcodes = TRUE,
ordered_query = TRUE, sources = c("GO"))
gsea_scn <- gsea_result_scn[["result"]] %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
head(n = 20)
gsea_plots_scn <- ggplot(gsea_scn, aes(x = recall, y = reorder(term_name, recall), fill = p_value)) +
geom_bar(stat = "identity") +
scale_fill_continuous(low = "blue", high = "red") +
theme_bw() +
ylab("") +
xlab("Over enrichment Score")
pp(file = "images/GSEA_SCNhet_vs_retina_enriched_P08.pdf")
gsea_plots_scn
plotted <- dev.off()
gsea_ret <- gsea_result_ret[["result"]] %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
head(n = 20)
gsea_plots_ret <- ggplot(gsea_ret, aes(x = recall, y = reorder(term_name, recall), fill = p_value)) +
geom_bar(stat = "identity") +
scale_fill_continuous(low = "blue", high = "red") +
theme_bw() +
ylab("") +
xlab("Over enrichment Score")
pp(file = "images/GSEA_Retinahet_vs_SCN_enriched_P08.pdf")
gsea_plots_ret
plotted <- dev.off()
mm38_subset3 <- subset_se(
mm38_hisat,
subset = "(batch == '4' | batch == '5' | batch == '6') & time == 'p08' & genotype != 'het' & location != 'dlgn' | sampleid == 'iprgc_03'")
mm38_subset3 <- subset_se(mm38_subset3, subset = "sampleid != 'iprgc_86'")
mm38_subset3$design %>%
select(genotype, location) %>%
table()
mm38_norm3 <- normalize(mm38_subset3, filter=TRUE,
convert="cpm", transform="log2", batch = "svaseq")
mm_de_subset3 <- all_pairwise(mm38_subset3,
model_batch="svaseq",
parallel=FALSE, do_ebseq=FALSE,
do_basic = FALSE, do_dream = FALSE,
do_noiseq = FALSE, do_edger = FALSE,
filter = TRUE)
retinakeeper_genes <- mm_de_subset3$deseq$all_tables$wt_retina_vs_ko_retina %>%
filter(logFC <= -1.0 & adj.P.Val <= 0.05)
scnkeeper_genes <- mm_de_subset3$deseq$all_tables$wt_scn_vs_ko_scn %>%
filter(logFC <= -1.0 & adj.P.Val <= 0.05)
keepergenes <- unique(c(rownames(retinakeeper_genes), rownames(scnkeeper_genes)))
annots_to_merge <- mm_annot %>%
select(ensembl_gene_id, external_gene_name) %>%
filter(ensembl_gene_id %in% rownames(mm_de_subset3$deseq$all_tables$ko_scn_vs_ko_retina)) %>%
distinct()
mm_de_subset3$deseq$all_tables$ko_scn_vs_ko_retina <- merge(
mm_de_subset3$deseq$all_tables$ko_scn_vs_ko_retina,
annots_to_merge, by.x = 0,
by.y = "ensembl_gene_id", all.x = TRUE)
df <- mm_de_subset3$deseq$all_tables$ko_scn_vs_ko_retina %>%
mutate(Significance = case_when(logFC <= -1 ~ "Retina Enriched",
logFC >= 1 ~ "SCN Enriched",
logFC > -1 & logFC < 1 ~ "Not\n Enriched"))
df <- df %>%
filter(Row.names %in% keepergenes)
scn_enriched <- df %>%
filter(adj.P.Val <= 0.05 & logFC >= 1) %>%
arrange(-logFC) %>%
select(Row.names, external_gene_name, logFC, adj.P.Val) %>%
mutate(Significance = "SCN Enriched") %>%
filter(Row.names %in% rownames(scnkeeper_genes))
df %>%
filter(adj.P.Val <= 0.05 & logFC <= -1) %>%
arrange(logFC) %>%
select(Row.names, external_gene_name, logFC, adj.P.Val) %>%
mutate(Significance = "Retina Enriched") %>%
filter(Row.names %in% rownames(retinakeeper_genes)) -> retina_enriched
notenriched <- df %>%
select(Row.names, external_gene_name, logFC, adj.P.Val, Significance) %>%
filter(Row.names %in% c(rownames(retinakeeper_genes),
rownames(scnkeeper_genes))[duplicated(c(rownames(retinakeeper_genes),
rownames(scnkeeper_genes)))])
df <- rbind(scn_enriched, retina_enriched, notenriched)
labels_ups <- df %>%
filter(adj.P.Val <= 0.05 & abs(logFC) > 1) %>%
arrange(logFC) %>%
head(n = 10)
labels_downs <- df %>%
filter(adj.P.Val <= 0.05 & abs(logFC) > 1) %>%
arrange(-logFC) %>%
head(n = 10)
labels <- rbind(labels_ups, labels_downs)
## wanted_column <- "Significance"
res_tbl <- df
DEplot <- ggplot(res_tbl, aes(x = logFC, y = -log10(adj.P.Val), label = external_gene_name)) +
geom_point(aes(colour = Significance), size = 4) +
## geom_point(aes(colour = !!sym(wanted_column)), size = 4) +
geom_vline(xintercept = c(-1, 1)) +
geom_hline(yintercept = -log10(0.05)) +
theme_classic(base_size = 20) +
xlab("log2(FC)") +
ylab("-log10(p-value)") +
## ggtitle(title, subtitle = subtitle) +
theme(legend.position = "right") +
scale_color_manual(values = c("Grey", "#F8766D", "#00BFC4")) +
geom_label_repel(data = filter(
df, external_gene_name %in% c(labels$external_gene_name, "Opn4")),
## c('s5_het_dlgn', 's5_het_ret', 's5_het_scn')),
## nudge_x = -0.5,
nudge_y = 10, max.overlaps = 25)
pp(file = "images/p08_retinavsscnko_DE_1312024.pdf")
DEplot
plotted <- dev.off()
gsea_result_scn <- gost(query = scn_enriched$external_gene_name,
organism = "mmusculus",
evcodes = TRUE,
ordered_query = TRUE,
sources = c("GO"))
gsea_result_ret <- gost(query = retina_enriched$external_gene_name,
organism = "mmusculus",
evcodes = TRUE,
ordered_query = TRUE,
sources = c("GO"))
gsea_scn <- gsea_result_scn[["result"]] %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
head(n = 20)
gsea_plots_scn <- ggplot(gsea_scn, aes(x = recall, y = reorder(term_name, recall), fill = p_value)) +
geom_bar(stat = "identity") +
scale_fill_continuous(low = "blue", high = "red") +
theme_bw() +
ylab("") +
xlab("GSEA Score")
pp(file = "images/GSEA_SCNko_enriched_vs_retina_P08.pdf")
gsea_plots_scn
plotted <- dev.off()
gsea_ret <- gsea_result_ret[["result"]] %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
head(n = 20)
gsea_plots_ret <- ggplot(gsea_ret, aes(x = recall, y = reorder(term_name, recall), fill = p_value)) +
geom_bar(stat = "identity") +
scale_fill_continuous(low = "blue", high = "red") +
theme_bw() +
ylab("") +
xlab("GSEA Score")
pp(file = "images/GSEA_Retinako_enriched_vs_SCN_P08.pdf")
gsea_plots_ret
plotted <- dev.off()
I want to have an invocation of all_pairwise() which uses all samples. In the following block I will set that up using a set of ‘keepers’ which will be named by time, location, then 2 letters for the numerator/denominator: w for WT, h for het, d for delta; thus “p08_retina_hw” compares het/wt for the p08 retina samples.
If they are of interest, I will have a separate set which follows the same convention with names like “p08_ko_sr” to compare p08 deltas with SCN as the numerator and retina as the denominator.
The most peculiar aspect of this analysis lies in choosing which genes to consider when comparing the genotypes/locations/times. The general idea is pretty clear: find the genes which are non-specifically being pulled down in the WT samples and either exclude or discount them. The various potential methods for performing this are confusing:
Theresa’s current worksheet implements a version of 1b in which she separated the various input gene sets to define the exclusion genes. I am going to repeat this, but leave the starting data structure intact.
In this first iteration, I will do that by creating a simplified model of the data which combines the time/genotype/location and using sva. In my next iteration I will use a full statistical model containing each of those factors (and probably also using sva).
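The simplified model can be sketched as follows. This is a minimal illustration, assuming a hypothetical sample sheet with `time`, `genotype`, and `location` columns; it shows only how the three factors collapse into one condition and how a "~ 0 + condition" design matrix is built. The surrogate variables from sva would be appended as extra columns and are not shown.

```r
## Hypothetical sample sheet; column names mirror the factors discussed above.
meta <- data.frame(
  time = rep(c("p08", "p15"), each = 4),
  genotype = rep(c("wt", "het"), times = 4),
  location = rep(c("retina", "retina", "scn", "scn"), times = 2))

## Collapse the three factors into a single condition, e.g. "p08_het_scn".
meta[["condition"]] <- factor(
  paste(meta[["time"]], meta[["genotype"]], meta[["location"]], sep = "_"))

## Cell-means design: one column per condition, no intercept.
design <- model.matrix(~ 0 + condition, data = meta)
colnames(design) <- levels(meta[["condition"]])
```

With a cell-means design every contrast of interest is a simple difference between two columns, which is why the comparisons can be chosen later without refitting a differently parameterized model.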
Note: my color choices are kind of garbage.
In addition, the exclusion dataset is the same as the analysis dataset; it is really only the contrasts which will be different.
v3_pairwise_input <- set_conditions(mm38_hisat_v3, fact = "time_geno_loc",
colors = color_choices[["all"]])
## The numbers of samples by condition are:
##
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## Warning in set_se_colors(new_se, colors = colors): Colors for the following
## categories are not being used: p60_wt_dlgn, p60_wt_retina, p60_wt_scn.
all_cond_gene_heatmap_start <- normalize(v3_pairwise_input, filter = "simple",
length_column = "cds_length",
convert = "rpkm", transform = "log2")
## Removing 5517 low-count genes (19908 remaining).
## There appear to be 6387 genes without a length.
## transform_counts: Found 592846 values equal to 0, adding 1 to the matrix.
## The factor p08_het_dlgn has 3 rows.
## The factor p08_het_retina has 3 rows.
## The factor p08_het_scn has 3 rows.
## The factor p08_ko_dlgn has 3 rows.
## The factor p08_ko_retina has 3 rows.
## The factor p08_ko_scn has 3 rows.
## The factor p08_wt_dlgn has 5 rows.
## The factor p08_wt_retina has 5 rows.
## The factor p08_wt_scn has 3 rows.
## The factor p15_het_dlgn has 4 rows.
## The factor p15_het_retina has 4 rows.
## The factor p15_het_scn has 3 rows.
## The factor p15_ko_dlgn has 3 rows.
## The factor p15_ko_retina has 3 rows.
## The factor p15_ko_scn has 3 rows.
## The factor p15_wt_dlgn has 5 rows.
## The factor p15_wt_retina has 5 rows.
## The factor p15_wt_scn has 2 rows.
all_cond_mtrx <- all_cond_gene_heatmap_input[["medians"]]
color_order <- colnames(all_cond_mtrx)
na_idx <- is.na(all_cond_mtrx)
all_cond_mtrx[na_idx] <- 0
variances <- matrixStats::rowVars(as.matrix(all_cond_mtrx))
variant_genes <- variances > 6.2
input_mtrx <- all_cond_mtrx[variant_genes, ]
cond_colors <- get_colors_by_condition(v3_pairwise_input, levels = color_order)
dim(input_mtrx)
## [1] 104 18
## Warning in pp(file =
## "04inclusion_comparisons/top_104_variant_rpkm_genes_heatmap.pdf"): The
## directory: 04inclusion_comparisons does not exist, will attempt to create it.
gplots::heatmap.2(as.matrix(input_mtrx), scale = "none", trace = "none",
ColSideColors = cond_colors)
dev.off()
## png
## 2
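The variance filter behind this heatmap can be sketched on a toy matrix. This is only an illustration under stated assumptions: the matrix is simulated, base R's `apply(..., 1, var)` stands in for `matrixStats::rowVars()`, and the rank-based selection (`top_n`) is a hypothetical alternative rather than what was run above.

```r
set.seed(1)
## Toy matrix of per-condition medians: 50 genes by 6 conditions (simulated).
mtrx <- matrix(rnorm(300), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("cond", 1:6)))
## Make the first five genes strongly variable: low in cond1-3, high in cond4-6.
mtrx[1:5, ] <- mtrx[1:5, ] + rep(c(-5, 5), each = 15)

## Per-gene variance across conditions.
variances <- apply(mtrx, 1, var)

## Selection by a fixed variance cutoff, as in the block above.
cutoff <- 6.2
by_cutoff <- mtrx[variances > cutoff, , drop = FALSE]

## Hypothetical alternative: keep a fixed number of the most variable genes.
top_n <- 5
by_rank <- mtrx[order(variances, decreasing = TRUE)[seq_len(top_n)], , drop = FALSE]
```

A fixed cutoff depends on the scale of the values (log2 RPKM here), so the number of genes it returns changes with the normalization; selecting by rank keeps the heatmap size stable at the cost of an arbitrary gene count.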
Rashmi suggested we should do the above plot after subtracting the wt counts. This is a good idea, but it will have to wait until we finish the current set.
In the following few blocks I will set up the various comparisons of interest, starting with the set of genes to exclude because they were observed to bind non-specifically in the wt samples.
In each exclusion I will have the contrast first followed by the pair of contrasts which will be used to define the gene set to exclude.
Put slightly differently, for every term of interest I will create a contrast with the wt as numerator and the desired term as denominator, then pull out the genes increased in wt.
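The list of contrasts can also be generated rather than typed by hand. The following is a sketch, assuming each element holds the term of interest first and its matching wt second, which is the ordering suggested by table names such as p15_het_dlgn_vs_p15_wt_dlgn; the object names `generated` and `contrast_names` are hypothetical.

```r
times <- c("p15", "p08")
genotypes <- c("het", "ko")
locations <- c("dlgn", "retina", "scn")

## expand.grid() varies its first argument fastest, so time alternates first.
grid <- expand.grid(time = times, genotype = genotypes, location = locations,
                    stringsAsFactors = FALSE)
terms <- paste(grid[["time"]], grid[["genotype"]], grid[["location"]], sep = "_")
wt_terms <- paste(grid[["time"]], "wt", grid[["location"]], sep = "_")

## Named list: each element pairs a term with the wt of the same time/location.
generated <- Map(function(term, wt) c(term, wt), terms, wt_terms)

## The labels one would expect for the resulting tables.
contrast_names <- paste(terms, wt_terms, sep = "_vs_")
```

Generating the list guards against copy/paste slips such as pairing a p15 term with a p08 wt, and the same pattern extends to the time, location, and genotype comparisons.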
inclusions <- list(
## I like alphabetizing things, start with dlgn
"p15_het_dlgn" = c("p15_het_dlgn", "p15_wt_dlgn"),
"p08_het_dlgn" = c("p08_het_dlgn", "p08_wt_dlgn"),
"p15_ko_dlgn" = c("p15_ko_dlgn", "p15_wt_dlgn"),
"p08_ko_dlgn" = c("p08_ko_dlgn", "p08_wt_dlgn"),
## Then retinas
"p15_het_retina" = c("p15_het_retina", "p15_wt_retina"),
"p08_het_retina" = c("p08_het_retina", "p08_wt_retina"),
"p15_ko_retina" = c("p15_ko_retina", "p15_wt_retina"),
"p08_ko_retina" = c("p08_ko_retina", "p08_wt_retina"),
## Then scn
"p15_het_scn" = c("p15_het_scn", "p15_wt_scn"),
"p08_het_scn" = c("p08_het_scn", "p08_wt_scn"),
"p15_ko_scn" = c("p15_ko_scn", "p15_wt_scn"),
"p08_ko_scn" = c("p08_ko_scn", "p08_wt_scn"))
For each location/genotype of interest, let us compare p15/p08.
time_keepers <- list(
## DLGN
"t_het_dlgn" = c("p15_het_dlgn", "p08_het_dlgn"),
"t_ko_dlgn" = c("p15_ko_dlgn", "p08_ko_dlgn"),
## Retina
"t_het_retina" = c("p15_het_retina", "p08_het_retina"),
"t_ko_retina" = c("p15_ko_retina", "p08_ko_retina"),
## SCN
"t_het_scn" = c("p15_het_scn", "p08_het_scn"),
"t_ko_scn" = c("p15_ko_scn", "p08_ko_scn"))
Compare locations and keep time/genotype consistent. I will use the location initials to define numerator/denominator.
location_keepers <- list(
## dlgn/retina
"dr_p08_het" = c("p08_het_dlgn", "p08_het_retina"),
"dr_p15_het" = c("p15_het_dlgn", "p15_het_retina"),
"dr_p08_ko" = c("p08_ko_dlgn", "p08_ko_retina"),
"dr_p15_ko" = c("p15_ko_dlgn", "p15_ko_retina"),
## scn/retina
"sr_p08_het" = c("p08_het_scn", "p08_het_retina"),
"sr_p15_het" = c("p15_het_scn", "p15_het_retina"),
"sr_p08_ko" = c("p08_ko_scn", "p08_ko_retina"),
"sr_p15_ko" = c("p15_ko_scn", "p15_ko_retina"),
## dlgn/scn
"ds_p08_het" = c("p08_het_dlgn", "p08_het_scn"),
"ds_p15_het" = c("p15_het_dlgn", "p15_het_scn"),
"ds_p08_ko" = c("p08_ko_dlgn", "p08_ko_scn"),
"ds_p15_ko" = c("p15_ko_dlgn", "p15_ko_scn"))
Compare ko/het while keeping time/location constant. Similarly, use the initials to denote numerator/denominator, which will always be kh.
genotype_keepers <- list(
## DLGN
"kh_p08_dlgn" = c("p08_ko_dlgn", "p08_het_dlgn"),
"kh_p15_dlgn" = c("p15_ko_dlgn", "p15_het_dlgn"),
## Retina
"kh_p08_retina" = c("p08_ko_retina", "p08_het_retina"),
"kh_p15_retina" = c("p15_ko_retina", "p15_het_retina"),
## SCN
"kh_p08_scn" = c("p08_ko_scn", "p08_het_scn"),
"kh_p15_scn" = c("p15_ko_scn", "p15_het_scn"))
My all_pairwise() function now has a parameter which allows me to choose which contrasts to perform instead of literally doing every possible comparison. That is well suited for these operations.
In a container, the following appears to fail with:
“error code 1 from Lapack routine ‘dgesdd’”
Running it manually outside the container results in it working without error. I assume therefore that the problem lies in the compilation flags of LAPACK in the container.
Note: This problem was fixed by removing some parallelization.
inclusion_de <- all_pairwise(
v3_pairwise_input, filter = "simple", model_fstring = default_fstring,
keepers = inclusions, model_svs = "svaseq")
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## Removing 5517 low-count genes (19908 remaining).
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## I think this is failing? SummarizedExperiment
## Basic step 0/3: Transforming data.
## Setting 394442 entries to zero.
## This received a matrix of SVs.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## conditions
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## conditions
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## conditions
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 12 comparisons.
inclusion_tables <- combine_de_tables(
inclusion_de, keepers = inclusions, label_column = label_column,
excel = glue("04inclusion_comparisons/inclusion_tables-v{ver}.xlsx"))
## Looking for subscript invalid names, start of extract_keepers.
## Looking for subscript invalid names, end of extract_keepers.
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 p15_het_dlgn_vs_p15_wt_dlgn 536 971 668
## 2 p08_het_dlgn_vs_p08_wt_dlgn 15 79 57
## 3 p15_ko_dlgn_vs_p15_wt_dlgn 903 1333 1029
## 4 p08_ko_dlgn_vs_p08_wt_dlgn 117 282 133
## 5 p15_het_retina_vs_p15_wt_retina 151 58 193
## 6 p08_het_retina_vs_p08_wt_retina 432 110 464
## 7 p15_ko_retina_vs_p15_wt_retina 107 34 178
## 8 p08_ko_retina_vs_p08_wt_retina 533 155 551
## 9 p15_het_scn_vs_p15_wt_scn 11 8 39
## 10 p08_het_scn_vs_p08_wt_scn 48 8 57
## 11 p15_ko_scn_vs_p15_wt_scn 7 5 28
## 12 p08_ko_scn_vs_p08_wt_scn 31 26 83
## edger_sigdown limma_sigup limma_sigdown
## 1 1013 582 833
## 2 107 40 46
## 3 1346 837 1045
## 4 297 186 356
## 5 114 119 57
## 6 162 315 88
## 7 62 39 25
## 8 224 404 144
## 9 28 19 34
## 10 45 42 20
## 11 30 39 39
## 12 54 65 55
## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## i Please use tidy evaluation idioms with `aes()`.
## i See also `vignette("ggplot2-in-packages")` for more information.
## i The deprecated feature was likely used in the UpSetR package.
## Please report the issue to the authors.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
## i Please use the `linewidth` argument instead.
## i The deprecated feature was likely used in the UpSetR package.
## Please report the issue to the authors.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Plot describing unique/shared genes in a differential expression table.
## 202603: I successfully recapitulated previous non-container result.
inclusion_sig <- extract_significant_genes(
inclusion_tables, lfc = lfc_cutoff, p = adjp_cutoff, according_to = "deseq",
excel = glue("04inclusion_comparisons/inclusion_sig-v{ver}.xlsx"))
inclusion_sig
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 0.1 adj P cutoff: 0.1
## deseq_up deseq_down
## p15_het_dlgn 2067 2381
## p08_het_dlgn 180 349
## p15_ko_dlgn 2544 2633
## p08_ko_dlgn 397 572
## p15_het_retina 402 223
## p08_het_retina 976 380
## p15_ko_retina 347 93
## p08_ko_retina 1551 859
## p15_het_scn 15 9
## p08_het_scn 225 55
## p15_ko_scn 8 6
## p08_ko_scn 112 116
202505: Something strange happened in this iteration: the plot of the significant genes is exactly the same as in the previous iteration, but the table of gene counts looks different.
For example, the previous table showed p15_het_dlgn with 2067 up and 2381 down, and the plot shows exactly that; but the new table shows 607 up and 1229 down. Let us check the actual data structure and see what is going on.
I think I get it: when we call extract_significant_genes() above, we explicitly set a non-standard adjusted p-value and logFC, because we are deliberately using a very loose definition of the set of genes which are in greater abundance than in their most similar wild-type. However, when I create the barplot of significant genes, those values are explicitly set to logFCs of 0, 1 and 2 and a p-value of 0.05. Therefore, in order to check consistency, I need to repeat this call with the default FC/p values and see what the numbers look like.
test_inclusion <- extract_significant_genes(
inclusion_tables, according_to = "deseq", excel = "excel/default_inclusion_sig.xlsx")
test_inclusion
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## p15_het_dlgn 536 971
## p08_het_dlgn 15 79
## p15_ko_dlgn 903 1333
## p08_ko_dlgn 117 282
## p15_het_retina 151 58
## p08_het_retina 432 110
## p15_ko_retina 107 34
## p08_ko_retina 533 155
## p15_het_scn 11 8
## p08_het_scn 48 8
## p15_ko_scn 7 5
## p08_ko_scn 31 26
Yeah, I think this makes sense. What I need to do is change the significant-gene bar plot so that it uses the lfc cutoff argument as the second of its 3 cutoffs. That should ensure that these numbers are consistent across analyses and whatever parameters are provided.
## [1] 2067 75
test_all_up <- inclusion_tables$data$p15_het_dlgn[["deseq_logfc"]] > 0.1 &
inclusion_tables$data$p15_het_dlgn[["deseq_adjp"]] <= 0.1
summary(test_all_up)
## Mode FALSE TRUE
## logical 17841 2067
Ohh, I get it, when I was testing this out manually, I set the logFC to 1.0 instead of the very minimal 0.1 we have been using for this!
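To make the effect of the two cutoff pairs concrete, here is a minimal sketch on an invented six-gene table. The helper count_up() is hypothetical (it is not part of hpgltools) and simply mirrors the manual test above: logFC strictly greater than the cutoff and adjusted p-value less than or equal to the cutoff. The numbers in the toy table are made up for illustration only.

```r
## Toy table carrying the two columns used in the manual check; values are invented.
toy <- data.frame(
  deseq_logfc = c(0.05, 0.15, 0.8, 1.4, 2.2, -1.6),
  deseq_adjp = c(0.20, 0.08, 0.04, 0.01, 0.30, 0.001),
  row.names = paste0("gene", 1:6))

## Hypothetical helper: count genes passing an 'up' filter.
## Assumption: logFC > lfc and adjp <= adjp, as in the manual test.
count_up <- function(tbl, lfc, adjp) {
  keep <- tbl[["deseq_logfc"]] > lfc & tbl[["deseq_adjp"]] <= adjp
  sum(keep, na.rm = TRUE)
}

## The loose cutoffs used for the inclusion sets.
loose <- count_up(toy, lfc = 0.1, adjp = 0.1)
## The stricter values reported by the default call (LFC 1, adjp 0.05).
strict <- count_up(toy, lfc = 1.0, adjp = 0.05)
message("Loose: ", loose, " genes; strict: ", strict, " genes.")
```

The same table gives different counts under the two parameter sets, which is the kind of discrepancy seen between the table and the bar plot.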
Rashmi asked to see the comparisons against wt; I will prefix each file with xw to show that it is x vs. wt (hw for het vs. wt, kw for ko vs. wt), followed by whatever other parameters are being examined. It is likely that some colors will be wrong because this is my first time creating these plots and we are doing them manually.
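The naming scheme for the plots in this section can be sketched as a small helper. This is only an illustration: plot_plan() is a hypothetical name, and it assumes every table name has the form time_genotype_tissue (for example p15_het_dlgn), which holds for the twelve tables here.

```r
## Hypothetical helper: derive the wt denominator and output file names
## from a table name of the form <time>_<genotype>_<tissue>.
plot_plan <- function(table_name, outdir = "05inclusion_volcano_ma") {
  parts <- strsplit(table_name, "_")[[1]]          # time, genotype, tissue
  denom <- paste(parts[1], "wt", parts[3], sep = "_")
  prefix <- paste0(substr(parts[2], 1, 1), "w")    # "hw" or "kw"
  stem <- paste(prefix, parts[1], parts[3], sep = "_")
  data.frame(
    num = table_name,
    denom = denom,
    volcano = file.path(outdir, paste0(stem, "_volcano.pdf")),
    ma = file.path(outdir, paste0(stem, "_ma.pdf")),
    stringsAsFactors = FALSE)
}

table_names <- c("p15_het_dlgn", "p08_ko_retina", "p15_ko_scn")
plan <- do.call(rbind, lapply(table_names, plot_plan))
print(plan)
```

A table like this could in principle drive a loop over the plotting calls instead of repeating each block by hand, and it would keep the output paths consistent.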
allc <- color_choices[["all"]]
table_name <- "p15_het_dlgn"
table <- inclusion_tables[["data"]][[table_name]]
num <- "p15_het_dlgn"
denom <- "p15_wt_dlgn"
hw_p15_dlgn_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp", fill = "black",
color_low = allc[[denom]], color_high = allc[[num]],
label_column = "mgi_symbol", label = 10, alpha = 1.0,
size = 4)
pp(file = "05inclusion_volcano_ma/hw_p15_dlgn_volcano.pdf", width = 9, height = 9)
## Warning in pp(file = "05inclusion_volcano_ma/hw_p15_dlgn_volcano.pdf", width =
## 9, : The directory: 05inclusion_volcano_ma does not exist, will attempt to
## create it.
hw_p15_dlgn_volcano[["plot"]]
plotted <- dev.off()
hw_p15_dlgn_volcano[["plot"]]
hw_p15_dlgn_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = allc[[denom]], color_high = allc[[num]],
p_col = "deseq_adjp", label_column = "mgi_symbol", outline = outline,
label = 10)
pp(file = "05inclusion_volcano_ma/hw_p15_dlgn_ma.pdf", width = 9, height = 9)
hw_p15_dlgn_ma[["plot"]]
plotted <- dev.off()
hw_p15_dlgn_ma[["plot"]]
table_name <- "p08_het_dlgn"
table <- inclusion_tables[["data"]][[table_name]]
num <- "p08_het_dlgn"
denom <- "p08_wt_dlgn"
hw_p08_dlgn_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp", fill = "black",
color_low = allc[[denom]], color_high = allc[[num]],
label_column = "mgi_symbol", label = 10, alpha = 1.0,
size = 4)
pp(file = "05inclusion_volcano_ma/hw_p08_dlgn_volcano.pdf", width = 9, height = 9)
hw_p08_dlgn_volcano[["plot"]]
plotted <- dev.off()
hw_p08_dlgn_volcano[["plot"]]
hw_p08_dlgn_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = allc[[denom]], color_high = allc[[num]],
p_col = "deseq_adjp", label_column = "mgi_symbol", outline = outline,
label = 10)
pp(file = "05inclusion_volcano_ma/hw_p08_dlgn_ma.pdf", width = 9, height = 9)
hw_p08_dlgn_ma[["plot"]]
plotted <- dev.off()
hw_p08_dlgn_ma[["plot"]]
table_name <- "p15_ko_dlgn"
table <- inclusion_tables[["data"]][[table_name]]
num <- "p15_ko_dlgn"
denom <- "p15_wt_dlgn"
kw_p15_dlgn_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp", fill = "black",
color_low = allc[[denom]], color_high = allc[[num]],
label_column = "mgi_symbol", label = 10, alpha = 1.0,
size = 4)
pp(file = "05inclusion_volcano_ma/kw_p15_dlgn_volcano.pdf", width = 9, height = 9)
kw_p15_dlgn_volcano[["plot"]]
plotted <- dev.off()
kw_p15_dlgn_volcano[["plot"]]
kw_p15_dlgn_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = allc[[denom]], color_high = allc[[num]],
p_col = "deseq_adjp", label_column = "mgi_symbol", outline = outline,
label = 10)
pp(file = "05inclusion_volcano_ma/kw_p15_dlgn_ma.pdf", width = 9, height = 9)
kw_p15_dlgn_ma[["plot"]]
plotted <- dev.off()
kw_p15_dlgn_ma[["plot"]]
table_name <- "p08_ko_dlgn"
table <- inclusion_tables[["data"]][[table_name]]
num <- "p08_ko_dlgn"
denom <- "p08_wt_dlgn"
kw_p08_dlgn_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp", fill = "black",
color_low = allc[[denom]], color_high = allc[[num]],
label_column = "mgi_symbol", label = 10, alpha = 1.0,
size = 4)
pp(file = "05inclusion_volcano_ma/kw_p08_dlgn_volcano.pdf", width = 9, height = 9)
kw_p08_dlgn_volcano[["plot"]]
plotted <- dev.off()
kw_p08_dlgn_volcano[["plot"]]
kw_p08_dlgn_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = allc[[denom]], color_high = allc[[num]],
p_col = "deseq_adjp", label_column = "mgi_symbol", outline = outline,
label = 10)
pp(file = "05inclusion_volcano_ma/kw_p08_dlgn_ma.pdf", width = 9, height = 9)
kw_p08_dlgn_ma[["plot"]]
plotted <- dev.off()
kw_p08_dlgn_ma[["plot"]]
table_name <- "p15_het_retina"
table <- inclusion_tables[["data"]][[table_name]]
num <- "p15_het_retina"
denom <- "p15_wt_retina"
hw_p15_retina_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp", fill = "black",
color_low = allc[[denom]], color_high = allc[[num]],
label_column = "mgi_symbol", label = 10, alpha = 1.0,
size = 4)
pp(file = "05inclusion_volcano_ma/hw_p15_retina_volcano.pdf", width = 9, height = 9)
hw_p15_retina_volcano[["plot"]]
plotted <- dev.off()
hw_p15_retina_volcano[["plot"]]
hw_p15_retina_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = allc[[denom]], color_high = allc[[num]],
p_col = "deseq_adjp", label_column = "mgi_symbol", outline = outline,
label = 10)
pp(file = "05inclusion_volcano_ma/hw_p15_retina_ma.pdf", width = 9, height = 9)
hw_p15_retina_ma[["plot"]]
plotted <- dev.off()
hw_p15_retina_ma[["plot"]]
table_name <- "p08_het_retina"
table <- inclusion_tables[["data"]][[table_name]]
num <- "p08_het_retina"
denom <- "p08_wt_retina"
hw_p08_retina_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp", fill = "black",
color_low = allc[[denom]], color_high = allc[[num]],
label_column = "mgi_symbol", label = 10, alpha = 1.0,
size = 4)
pp(file = "05inclusion_volcano_ma/hw_p08_retina_volcano.pdf", width = 9, height = 9)
hw_p08_retina_volcano[["plot"]]
plotted <- dev.off()
hw_p08_retina_volcano[["plot"]]
hw_p08_retina_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = allc[[denom]], color_high = allc[[num]],
p_col = "deseq_adjp", label_column = "mgi_symbol", outline = outline,
label = 10)
pp(file = "05inclusion_volcano_ma/hw_p08_retina_ma.pdf", width = 9, height = 9)
hw_p08_retina_ma[["plot"]]
plotted <- dev.off()
hw_p08_retina_ma[["plot"]]
table_name <- "p15_ko_retina"
table <- inclusion_tables[["data"]][[table_name]]
num <- "p15_ko_retina"
denom <- "p15_wt_retina"
kw_p15_retina_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp", fill = "black",
color_low = allc[[denom]], color_high = allc[[num]],
label_column = "mgi_symbol", label = 10, alpha = 1.0,
size = 4)
pp(file = "05inclusion_volcano_ma/kw_p15_retina_volcano.pdf", width = 9, height = 9)
kw_p15_retina_volcano[["plot"]]
plotted <- dev.off()
kw_p15_retina_volcano[["plot"]]
kw_p15_retina_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = allc[[denom]], color_high = allc[[num]],
p_col = "deseq_adjp", label_column = "mgi_symbol", outline = outline,
label = 10)
pp(file = "05inclusion_volcano_ma/kw_p15_retina_ma.pdf", width = 9, height = 9)
kw_p15_retina_ma[["plot"]]
plotted <- dev.off()
kw_p15_retina_ma[["plot"]]
table_name <- "p08_ko_retina"
table <- inclusion_tables[["data"]][[table_name]]
num <- "p08_ko_retina"
denom <- "p08_wt_retina"
kw_p08_retina_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp", fill = "black",
color_low = allc[[denom]], color_high = allc[[num]],
label_column = "mgi_symbol", label = 10, alpha = 1.0,
size = 4)
pp(file = "05inclusion_volcano_ma/kw_p08_retina_volcano.pdf", width = 9, height = 9)
kw_p08_retina_volcano[["plot"]]
plotted <- dev.off()
kw_p08_retina_volcano[["plot"]]
kw_p08_retina_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = allc[[denom]], color_high = allc[[num]],
p_col = "deseq_adjp", label_column = "mgi_symbol", outline = outline,
label = 10)
pp(file = "05inclusion_volcano_ma/kw_p08_retina_ma.pdf", width = 9, height = 9)
kw_p08_retina_ma[["plot"]]
plotted <- dev.off()
kw_p08_retina_ma[["plot"]]
table_name <- "p15_het_scn"
table <- inclusion_tables[["data"]][[table_name]]
num <- "p15_het_scn"
denom <- "p15_wt_scn"
hw_p15_scn_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp", fill = "black",
color_low = allc[[denom]], color_high = allc[[num]],
label_column = "mgi_symbol", label = 10, alpha = 1.0,
size = 4)
pp(file = "05inclusion_volcano_ma/hw_p15_scn_volcano.pdf", width = 9, height = 9)
hw_p15_scn_volcano[["plot"]]
plotted <- dev.off()
hw_p15_scn_volcano[["plot"]]
hw_p15_scn_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = allc[[denom]], color_high = allc[[num]],
p_col = "deseq_adjp", label_column = "mgi_symbol", outline = outline,
label = 10)
pp(file = "05inclusion_volcano_ma/hw_p15_scn_ma.pdf", width = 9, height = 9)
hw_p15_scn_ma[["plot"]]
plotted <- dev.off()
hw_p15_scn_ma[["plot"]]
table_name <- "p08_het_scn"
table <- inclusion_tables[["data"]][[table_name]]
num <- "p08_het_scn"
denom <- "p08_wt_scn"
hw_p08_scn_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp", fill = "black",
color_low = allc[[denom]], color_high = allc[[num]],
label_column = "mgi_symbol", label = 10, alpha = 1.0,
size = 4)
pp(file = "05inclusion_volcano_ma/hw_p08_scn_volcano.pdf", width = 9, height = 9)
hw_p08_scn_volcano[["plot"]]
plotted <- dev.off()
hw_p08_scn_volcano[["plot"]]
hw_p08_scn_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = allc[[denom]], color_high = allc[[num]],
p_col = "deseq_adjp", label_column = "mgi_symbol", outline = outline,
label = 10)
pp(file = "05inclusion_volcano_ma/hw_p08_scn_ma.pdf", width = 9, height = 9)
hw_p08_scn_ma[["plot"]]
plotted <- dev.off()
hw_p08_scn_ma[["plot"]]
table_name <- "p15_ko_scn"
table <- inclusion_tables[["data"]][[table_name]]
num <- "p15_ko_scn"
denom <- "p15_wt_scn"
kw_p15_scn_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp", fill = "black",
color_low = allc[[denom]], color_high = allc[[num]],
label_column = "mgi_symbol", label = 10, alpha = 1.0,
size = 4)
pp(file = "05inclusion_volcano_ma/kw_p15_scn_volcano.pdf", width = 9, height = 9)
kw_p15_scn_volcano[["plot"]]
plotted <- dev.off()
kw_p15_scn_volcano[["plot"]]
kw_p15_scn_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = allc[[denom]], color_high = allc[[num]],
p_col = "deseq_adjp", label_column = "mgi_symbol", outline = outline,
label = 10)
pp(file = "05inclusion_volcano_ma/kw_p15_scn_ma.pdf", width = 9, height = 9)
kw_p15_scn_ma[["plot"]]
plotted <- dev.off()
kw_p15_scn_ma[["plot"]]
table_name <- "p08_ko_scn"
table <- inclusion_tables[["data"]][[table_name]]
num <- "p08_ko_scn"
denom <- "p08_wt_scn"
kw_p08_scn_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp", fill = "black",
color_low = allc[[denom]], color_high = allc[[num]],
label_column = "mgi_symbol", label = 10, alpha = 1.0,
size = 4)
pp(file = "05inclusion_volcano_ma/kw_p08_scn_volcano.pdf", width = 9, height = 9)
kw_p08_scn_volcano[["plot"]]
plotted <- dev.off()
kw_p08_scn_volcano[["plot"]]
kw_p08_scn_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = allc[[denom]], color_high = allc[[num]],
p_col = "deseq_adjp", label_column = "mgi_symbol", outline = outline,
label = 10)
pp(file = "05inclusion_volcano_ma/kw_p08_scn_ma.pdf", width = 9, height = 9)
kw_p08_scn_ma[["plot"]]
plotted <- dev.off()
kw_p08_scn_ma[["plot"]]
See the shared/unique genes in these sets.
inclusion_upsets <- upsetr_sig(inclusion_sig)
inclusion_intersects <- write_upset_groups(
inclusion_upsets, excel = "04inclusion_comparison/inclusion_gene_groups.xlsx")
Now, using that function, pull out the gene IDs of genes we do not trust because they were too high in wt for every contrast we are likely to perform.
The following was a modified version of the inclusion function which is somewhat more restrictive.
extract_inclusions_strict <- function(inclusion_sig, inclusion_tables, inclusions, keepers,
                                      all_genes, according_to = "deseq", which = "ups") {
  retlist <- list()
  table_names <- names(inclusion_sig[[according_to]][[which]])
  for (c_num in seq_along(keepers)) {
    contrast <- names(keepers)[c_num]
    numerator_name <- keepers[[c_num]][1]
    denominator_name <- keepers[[c_num]][2]
    ## In my new branch I cleaned up the sanitizer function for contrasts so this is not needed.
    ## The following two lines are no longer needed because of the cleanups I performed.
    ##numerator_name <- gsub(x = numerator_name, pattern = "(het|ko|wt)", replacement = "_\\1_")
    ##denominator_name <- gsub(x = denominator_name, pattern = "(het|ko|wt)", replacement = "_\\1_")
    numerator_table <- inclusion_sig[[according_to]][[which]][[numerator_name]]
    numerator_genes <- rownames(numerator_table)
    denominator_table <- inclusion_sig[[according_to]][[which]][[denominator_name]]
    denominator_genes <- rownames(denominator_table)
    df_columns <- paste0("deseq_", c("logfc", "adjp", "den"))
    included_num <- inclusion_tables[["data"]][[numerator_name]][, df_columns]
    colnames(included_num) <- c("numerator_vs_wt_logfc", "numerator_vs_wt_adjp", "num_wt_mean_exprs")
    included_den <- inclusion_tables[["data"]][[denominator_name]][, df_columns]
    colnames(included_den) <- c("denominator_vs_wt_logfc", "denominator_vs_wt_adjp", "den_wt_mean_exprs")
    ## I think this is where things went wrong,
    ## compare this modified line to the original to prove it.
    included_df <- merge(included_num, included_den, by = "row.names")
    ## Previously, I did not specify the merge action, all = FALSE by default.
    ## This then will result in a difference in the rows observed
    ## included_df <- merge(included_num, included_den, by = "row.names", all = FALSE)
    rownames(included_df) <- included_df[["Row.names"]]
    included_df[["Row.names"]] <- NULL
    concatenated_genes <- c(numerator_genes, denominator_genes)
    both_gene_idx <- duplicated(concatenated_genes)
    genes_in_both <- concatenated_genes[both_gene_idx]
    message("The set of unique genes higher in ", numerator_name,
            " vs. wt is ", length(numerator_genes), ".")
    message("The set of unique genes higher in ", denominator_name,
            " vs. wt is ", length(denominator_genes), ".")
    message("The intersection of them is ", length(genes_in_both), " genes.")
    include_name <- paste0("inc_", contrast)
    include_idx <- all_genes %in% genes_in_both
    include_genes <- all_genes[include_idx]
    df_name <- paste0("df_", contrast)
    retlist[[df_name]] <- included_df
    written_inclusion <- write_xlsx(
      data = included_df,
      excel = glue("07included_strict_genes_excel/{include_name}-v{ver}.xlsx"))
    retlist[[include_name]] <- include_genes
    retlist[[contrast]] <- include_genes
  }
  return(retlist)
}
This is the pre-202505 version of this function.
extract_inclusions <- function(inclusion_sig, inclusion_tables, inclusions, keepers, all_genes,
                               according_to = "deseq", which = "ups") {
  retlist <- list()
  table_names <- names(inclusion_sig[[according_to]][[which]])
  for (c_num in seq_along(keepers)) {
    contrast <- names(keepers)[c_num]
    numerator_name <- keepers[[c_num]][1]
    denominator_name <- keepers[[c_num]][2]
    ## In my new branch I cleaned up the sanitizer function for contrasts so this is not needed.
    ## The following two lines are no longer needed because of the cleanups I performed.
    ##numerator_name <- gsub(x = numerator_name, pattern = "(het|ko|wt)", replacement = "_\\1_")
    ##denominator_name <- gsub(x = denominator_name, pattern = "(het|ko|wt)", replacement = "_\\1_")
    numerator_table <- inclusion_sig[[according_to]][[which]][[numerator_name]]
    numerator_genes <- rownames(numerator_table)
    denominator_table <- inclusion_sig[[according_to]][[which]][[denominator_name]]
    denominator_genes <- rownames(denominator_table)
    df_columns <- paste0("deseq_", c("logfc", "adjp", "den"))
    included_num <- inclusion_tables[["data"]][[numerator_name]][, df_columns]
    colnames(included_num) <- c("numerator_vs_wt_logfc", "numerator_vs_wt_adjp", "num_wt_mean_exprs")
    included_den <- inclusion_tables[["data"]][[denominator_name]][, df_columns]
    colnames(included_den) <- c("denominator_vs_wt_logfc", "denominator_vs_wt_adjp", "den_wt_mean_exprs")
    included_df <- merge(included_num, included_den, by = "row.names")
    rownames(included_df) <- included_df[["Row.names"]]
    included_df[["Row.names"]] <- NULL
    include_genes <- unique(c(numerator_genes, denominator_genes))
    message("The set of unique genes higher in ", numerator_name,
            " vs. wt is ", length(numerator_genes), ".")
    message("The set of unique genes higher in ", denominator_name,
            " vs. wt is ", length(denominator_genes), ".")
    message("The unique union of them is ", length(include_genes), " genes.")
    include_name <- paste0("inc_", contrast)
    include_idx <- all_genes %in% include_genes
    include_genes <- all_genes[include_idx]
    df_name <- paste0("df_", contrast)
    retlist[[df_name]] <- included_df
    written_inclusion <- write_xlsx(data = included_df,
                                    excel = glue("included_genes/{include_name}-v{ver}.xlsx"))
    retlist[[include_name]] <- include_genes
    retlist[[contrast]] <- include_genes
  }
  return(retlist)
}
Here is the full set of gene IDs
In the following blocks I am including the union of genes observed higher than wt in either of the numerator or denominator for each contrast.
time_inclusions <- extract_inclusions(inclusion_sig, inclusion_tables, inclusions,
time_keepers, all_genes)
## The set of unique genes higher in p15_het_dlgn vs. wt is 2067.
## The set of unique genes higher in p08_het_dlgn vs. wt is 180.
## The unique union of them is 2113 genes.
## The set of unique genes higher in p15_ko_dlgn vs. wt is 2544.
## The set of unique genes higher in p08_ko_dlgn vs. wt is 397.
## The unique union of them is 2716 genes.
## The set of unique genes higher in p15_het_retina vs. wt is 402.
## The set of unique genes higher in p08_het_retina vs. wt is 976.
## The unique union of them is 1086 genes.
## The set of unique genes higher in p15_ko_retina vs. wt is 347.
## The set of unique genes higher in p08_ko_retina vs. wt is 1551.
## The unique union of them is 1664 genes.
## The set of unique genes higher in p15_het_scn vs. wt is 15.
## The set of unique genes higher in p08_het_scn vs. wt is 225.
## The unique union of them is 238 genes.
## The set of unique genes higher in p15_ko_scn vs. wt is 8.
## The set of unique genes higher in p08_ko_scn vs. wt is 112.
## The unique union of them is 120 genes.
location_inclusions <- extract_inclusions(inclusion_sig, inclusion_tables, inclusions,
location_keepers, all_genes)
## The set of unique genes higher in p08_het_dlgn vs. wt is 180.
## The set of unique genes higher in p08_het_retina vs. wt is 976.
## The unique union of them is 1134 genes.
## The set of unique genes higher in p15_het_dlgn vs. wt is 2067.
## The set of unique genes higher in p15_het_retina vs. wt is 402.
## The unique union of them is 2361 genes.
## The set of unique genes higher in p08_ko_dlgn vs. wt is 397.
## The set of unique genes higher in p08_ko_retina vs. wt is 1551.
## The unique union of them is 1883 genes.
## The set of unique genes higher in p15_ko_dlgn vs. wt is 2544.
## The set of unique genes higher in p15_ko_retina vs. wt is 347.
## The unique union of them is 2843 genes.
## The set of unique genes higher in p08_het_scn vs. wt is 225.
## The set of unique genes higher in p08_het_retina vs. wt is 976.
## The unique union of them is 1188 genes.
## The set of unique genes higher in p15_het_scn vs. wt is 15.
## The set of unique genes higher in p15_het_retina vs. wt is 402.
## The unique union of them is 417 genes.
## The set of unique genes higher in p08_ko_scn vs. wt is 112.
## The set of unique genes higher in p08_ko_retina vs. wt is 1551.
## The unique union of them is 1624 genes.
## The set of unique genes higher in p15_ko_scn vs. wt is 8.
## The set of unique genes higher in p15_ko_retina vs. wt is 347.
## The unique union of them is 355 genes.
## The set of unique genes higher in p08_het_dlgn vs. wt is 180.
## The set of unique genes higher in p08_het_scn vs. wt is 225.
## The unique union of them is 402 genes.
## The set of unique genes higher in p15_het_dlgn vs. wt is 2067.
## The set of unique genes higher in p15_het_scn vs. wt is 15.
## The unique union of them is 2080 genes.
## The set of unique genes higher in p08_ko_dlgn vs. wt is 397.
## The set of unique genes higher in p08_ko_scn vs. wt is 112.
## The unique union of them is 493 genes.
## The set of unique genes higher in p15_ko_dlgn vs. wt is 2544.
## The set of unique genes higher in p15_ko_scn vs. wt is 8.
## The unique union of them is 2550 genes.
genotype_inclusions <- extract_inclusions(inclusion_sig, inclusion_tables, inclusions,
genotype_keepers, all_genes)
## The set of unique genes higher in p08_ko_dlgn vs. wt is 397.
## The set of unique genes higher in p08_het_dlgn vs. wt is 180.
## The unique union of them is 501 genes.
## The set of unique genes higher in p15_ko_dlgn vs. wt is 2544.
## The set of unique genes higher in p15_het_dlgn vs. wt is 2067.
## The unique union of them is 2773 genes.
## The set of unique genes higher in p08_ko_retina vs. wt is 1551.
## The set of unique genes higher in p08_het_retina vs. wt is 976.
## The unique union of them is 1760 genes.
## The set of unique genes higher in p15_ko_retina vs. wt is 347.
## The set of unique genes higher in p15_het_retina vs. wt is 402.
## The unique union of them is 571 genes.
## The set of unique genes higher in p08_ko_scn vs. wt is 112.
## The set of unique genes higher in p08_het_scn vs. wt is 225.
## The unique union of them is 312 genes.
## The set of unique genes higher in p15_ko_scn vs. wt is 8.
## The set of unique genes higher in p15_het_scn vs. wt is 15.
## The unique union of them is 18 genes.
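Both versions of the function join the numerator and denominator tables with merge(..., by = "row.names"), and the comment in the strict version notes that all = FALSE is the default. As a minimal sketch of what that default means, here is a toy pair of tables with invented gene IDs and values; the column name logfc is only illustrative. With the default, only rows present in both tables survive; with all = TRUE, every row is kept and the missing side is filled with NA.

```r
## Two toy tables keyed by rowname; IDs and values are invented.
num <- data.frame(logfc = c(1.2, 0.4, 0.9),
                  row.names = c("geneA", "geneB", "geneC"))
den <- data.frame(logfc = c(0.3, 2.0),
                  row.names = c("geneB", "geneD"))

## Default (all = FALSE): an inner join, keeping only shared rownames.
inner <- merge(num, den, by = "row.names")
## all = TRUE: an outer join, keeping every rowname from either table.
outer <- merge(num, den, by = "row.names", all = TRUE)

## merge() moves the rownames into a 'Row.names' column; restore them
## the same way the functions above do.
rownames(outer) <- outer[["Row.names"]]
outer[["Row.names"]] <- NULL
print(inner)
print(outer)
```

If the two tables come from the same differential expression run and therefore share identical rownames, the two joins should give the same rows; a difference would only appear when a gene is missing from one side.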
time_inclusions_strict <- extract_inclusions_strict(inclusion_sig, inclusion_tables, inclusions,
time_keepers, all_genes)
## The set of unique genes higher in p15_het_dlgn vs. wt is 2067.
## The set of unique genes higher in p08_het_dlgn vs. wt is 180.
## The intersection of them is 134 genes.
## The set of unique genes higher in p15_ko_dlgn vs. wt is 2544.
## The set of unique genes higher in p08_ko_dlgn vs. wt is 397.
## The intersection of them is 225 genes.
## The set of unique genes higher in p15_het_retina vs. wt is 402.
## The set of unique genes higher in p08_het_retina vs. wt is 976.
## The intersection of them is 292 genes.
## The set of unique genes higher in p15_ko_retina vs. wt is 347.
## The set of unique genes higher in p08_ko_retina vs. wt is 1551.
## The intersection of them is 234 genes.
## The set of unique genes higher in p15_het_scn vs. wt is 15.
## The set of unique genes higher in p08_het_scn vs. wt is 225.
## The intersection of them is 2 genes.
## The set of unique genes higher in p15_ko_scn vs. wt is 8.
## The set of unique genes higher in p08_ko_scn vs. wt is 112.
## The intersection of them is 0 genes.
location_inclusions_strict <- extract_inclusions_strict(inclusion_sig, inclusion_tables, inclusions,
location_keepers, all_genes)
## The set of unique genes higher in p08_het_dlgn vs. wt is 180.
## The set of unique genes higher in p08_het_retina vs. wt is 976.
## The intersection of them is 22 genes.
## The set of unique genes higher in p15_het_dlgn vs. wt is 2067.
## The set of unique genes higher in p15_het_retina vs. wt is 402.
## The intersection of them is 108 genes.
## The set of unique genes higher in p08_ko_dlgn vs. wt is 397.
## The set of unique genes higher in p08_ko_retina vs. wt is 1551.
## The intersection of them is 65 genes.
## The set of unique genes higher in p15_ko_dlgn vs. wt is 2544.
## The set of unique genes higher in p15_ko_retina vs. wt is 347.
## The intersection of them is 48 genes.
## The set of unique genes higher in p08_het_scn vs. wt is 225.
## The set of unique genes higher in p08_het_retina vs. wt is 976.
## The intersection of them is 13 genes.
## The set of unique genes higher in p15_het_scn vs. wt is 15.
## The set of unique genes higher in p15_het_retina vs. wt is 402.
## The intersection of them is 0 genes.
## The set of unique genes higher in p08_ko_scn vs. wt is 112.
## The set of unique genes higher in p08_ko_retina vs. wt is 1551.
## The intersection of them is 39 genes.
## The set of unique genes higher in p15_ko_scn vs. wt is 8.
## The set of unique genes higher in p15_ko_retina vs. wt is 347.
## The intersection of them is 0 genes.
## The set of unique genes higher in p08_het_dlgn vs. wt is 180.
## The set of unique genes higher in p08_het_scn vs. wt is 225.
## The intersection of them is 3 genes.
## The set of unique genes higher in p15_het_dlgn vs. wt is 2067.
## The set of unique genes higher in p15_het_scn vs. wt is 15.
## The intersection of them is 2 genes.
## The set of unique genes higher in p08_ko_dlgn vs. wt is 397.
## The set of unique genes higher in p08_ko_scn vs. wt is 112.
## The intersection of them is 16 genes.
## The set of unique genes higher in p15_ko_dlgn vs. wt is 2544.
## The set of unique genes higher in p15_ko_scn vs. wt is 8.
## The intersection of them is 2 genes.
genotype_inclusions_strict <- extract_inclusions_strict(inclusion_sig, inclusion_tables, inclusions,
genotype_keepers, all_genes)
## The set of unique genes higher in p08_ko_dlgn vs. wt is 397.
## The set of unique genes higher in p08_het_dlgn vs. wt is 180.
## The intersection of them is 76 genes.
## The set of unique genes higher in p15_ko_dlgn vs. wt is 2544.
## The set of unique genes higher in p15_het_dlgn vs. wt is 2067.
## The intersection of them is 1838 genes.
## The set of unique genes higher in p08_ko_retina vs. wt is 1551.
## The set of unique genes higher in p08_het_retina vs. wt is 976.
## The intersection of them is 767 genes.
## The set of unique genes higher in p15_ko_retina vs. wt is 347.
## The set of unique genes higher in p15_het_retina vs. wt is 402.
## The intersection of them is 178 genes.
## The set of unique genes higher in p08_ko_scn vs. wt is 112.
## The set of unique genes higher in p08_het_scn vs. wt is 225.
## The intersection of them is 25 genes.
## The set of unique genes higher in p15_ko_scn vs. wt is 8.
## The set of unique genes higher in p15_het_scn vs. wt is 15.
## The intersection of them is 5 genes.
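The union and intersection counts printed above should satisfy |A| + |B| - |A and B| = |A or B|, which gives a cheap sanity check on the two functions. The sketch below first shows, on invented gene IDs, that the duplicated() approach used in the strict function agrees with intersect() as long as neither input contains repeated IDs (an assumption that holds for rownames), and then checks three rows of counts copied from the messages above. The contrast labels in the data frame are descriptive only, not the real contrast names.

```r
## Toy gene sets; IDs are invented.
a <- c("g1", "g2", "g3", "g4")
b <- c("g3", "g4", "g5")

## Intersection as in the strict function, and via the set helper.
both_dup <- c(a, b)[duplicated(c(a, b))]
both_set <- intersect(a, b)
either <- union(a, b)
same_intersection <- setequal(both_dup, both_set)
identity_holds <- length(a) + length(b) - length(both_set) == length(either)

## Counts copied from the messages printed above.
reported <- data.frame(
  label = c("time, het dlgn", "genotype, p15 dlgn", "location, p08 het dlgn/retina"),
  n_num = c(2067, 2544, 180),
  n_den = c(180, 2067, 976),
  n_union = c(2113, 2773, 1134),
  n_both = c(134, 1838, 22),
  stringsAsFactors = FALSE)
reported[["consistent"]] <- with(reported, n_num + n_den - n_both == n_union)
print(reported)
```

All three rows come out consistent, so for these contrasts the loose (union) and strict (intersection) sets agree with one another.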
Up above Theresa performed a 0.25 log2FC and 0.05 adjp filter which provided a set of 2,640 genes observed higher in the p08 het retinas vs. wt retinas. I should see that in this inclusion_sig data structure.
There is an important caveat though: in Theresa’s filter above, she performed the DE using only the retina samples, whereas I used all samples. I expected this to give basically the same result (I actually assumed I would get a few more genes), but instead it retrieved a significantly smaller number of genes (about half; happily, pretty much all of them appear in the previous filter). As a result, I am going to try relaxing my constraints slightly to see if I can recapitulate her filter (which would match Theresa’s later filter, though I guess that in turn will lead to a smaller set of genes compared to her later, relaxed 0.1 filter).
comparison <- inclusion_sig[["deseq"]][["ups"]][["p08_het_retina"]]
comp <- list(
"taa" = taa_keepers,
"new" = rownames(comparison))
test_comparison <- Vennerable::Venn(comp)
Vennerable::plot(test_comparison)
I want to have a little function which, given a contrast of interest, will extract the gene sets which should be included/excluded given the above.
write_all_cp <- function(all_cp, prefix = "12", suffix = "") {
  all_written <- list()
  for (g in seq_along(all_cp)) {
    name <- names(all_cp)[g]
    datum <- all_cp[[name]]
    filename <- glue("{prefix}enrichment_excel/{name}_cprofiler{suffix}-v{ver}.xlsx")
    written <- sm(write_cp_data(datum, excel = filename))
    all_written[[g]] <- written
  }
  return(all_written)
}
write_all_gp <- function(all_gp, prefix = "13", suffix = "") {
  all_written <- list()
  for (g in seq_along(all_gp)) {
    name <- names(all_gp)[g]
    datum <- all_gp[[name]]
    filename <- glue("{prefix}enrichment_excel/{name}_gprofiler{suffix}-v{ver}.xlsx")
    written <- sm(write_gprofiler_data(datum, excel = filename))
    all_written[[g]] <- written
  }
  return(all_written)
}
write_all_en <- function(all_en, prefix = "14", suffix = "") {
  all_written <- list()
  for (e in seq_along(all_en)) {
    name <- names(all_en)[e]
    datum <- all_en[[name]]
    filename <- glue("{prefix}enrichment_excel/{name}_enricher{suffix}-v{ver}.xlsx")
    written <- sm(write_enricher_data(datum, excel = filename))
    all_written[[e]] <- written
  }
  return(all_written)
}
Now, using that function, pull out the gene IDs of genes we do not trust because they were too high in wt for every contrast we are likely to perform.
all_genes <- rownames(assay(v3_pairwise_input))
time_inclusions <- extract_inclusions(inclusion_sig, inclusion_tables, inclusions,
time_keepers, all_genes)
## The set of unique genes higher in p15_het_dlgn vs. wt is 2067.
## The set of unique genes higher in p08_het_dlgn vs. wt is 180.
## The unique union of them is 2113 genes.
## Deleting the file included_genes/inc_t_het_dlgn-v20260331.xlsx before writing the tables.
## The set of unique genes higher in p15_ko_dlgn vs. wt is 2544.
## The set of unique genes higher in p08_ko_dlgn vs. wt is 397.
## The unique union of them is 2716 genes.
## Deleting the file included_genes/inc_t_ko_dlgn-v20260331.xlsx before writing the tables.
## The set of unique genes higher in p15_het_retina vs. wt is 402.
## The set of unique genes higher in p08_het_retina vs. wt is 976.
## The unique union of them is 1086 genes.
## Deleting the file included_genes/inc_t_het_retina-v20260331.xlsx before writing the tables.
## The set of unique genes higher in p15_ko_retina vs. wt is 347.
## The set of unique genes higher in p08_ko_retina vs. wt is 1551.
## The unique union of them is 1664 genes.
## Deleting the file included_genes/inc_t_ko_retina-v20260331.xlsx before writing the tables.
## The set of unique genes higher in p15_het_scn vs. wt is 15.
## The set of unique genes higher in p08_het_scn vs. wt is 225.
## The unique union of them is 238 genes.
## Deleting the file included_genes/inc_t_het_scn-v20260331.xlsx before writing the tables.
## The set of unique genes higher in p15_ko_scn vs. wt is 8.
## The set of unique genes higher in p08_ko_scn vs. wt is 112.
## The unique union of them is 120 genes.
## Deleting the file included_genes/inc_t_ko_scn-v20260331.xlsx before writing the tables.
location_inclusions <- extract_inclusions(inclusion_sig, inclusion_tables, inclusions,
location_keepers, all_genes)
## The set of unique genes higher in p08_het_dlgn vs. wt is 180.
## The set of unique genes higher in p08_het_retina vs. wt is 976.
## The unique union of them is 1134 genes.
## Deleting the file included_genes/inc_dr_p08_het-v20260331.xlsx before writing the tables.
## The set of unique genes higher in p15_het_dlgn vs. wt is 2067.
## The set of unique genes higher in p15_het_retina vs. wt is 402.
## The unique union of them is 2361 genes.
## Deleting the file included_genes/inc_dr_p15_het-v20260331.xlsx before writing the tables.
## The set of unique genes higher in p08_ko_dlgn vs. wt is 397.
## The set of unique genes higher in p08_ko_retina vs. wt is 1551.
## The unique union of them is 1883 genes.
## Deleting the file included_genes/inc_dr_p08_ko-v20260331.xlsx before writing the tables.
## The set of unique genes higher in p15_ko_dlgn vs. wt is 2544.
## The set of unique genes higher in p15_ko_retina vs. wt is 347.
## The unique union of them is 2843 genes.
## Deleting the file included_genes/inc_dr_p15_ko-v20260331.xlsx before writing the tables.
## The set of unique genes higher in p08_het_scn vs. wt is 225.
## The set of unique genes higher in p08_het_retina vs. wt is 976.
## The unique union of them is 1188 genes.
## Deleting the file included_genes/inc_sr_p08_het-v20260331.xlsx before writing the tables.
## The set of unique genes higher in p15_het_scn vs. wt is 15.
## The set of unique genes higher in p15_het_retina vs. wt is 402.
## The unique union of them is 417 genes.
## Deleting the file included_genes/inc_sr_p15_het-v20260331.xlsx before writing the tables.
## The set of unique genes higher in p08_ko_scn vs. wt is 112.
## The set of unique genes higher in p08_ko_retina vs. wt is 1551.
## The unique union of them is 1624 genes.
## Deleting the file included_genes/inc_sr_p08_ko-v20260331.xlsx before writing the tables.
## The set of unique genes higher in p15_ko_scn vs. wt is 8.
## The set of unique genes higher in p15_ko_retina vs. wt is 347.
## The unique union of them is 355 genes.
## Deleting the file included_genes/inc_sr_p15_ko-v20260331.xlsx before writing the tables.
## The set of unique genes higher in p08_het_dlgn vs. wt is 180.
## The set of unique genes higher in p08_het_scn vs. wt is 225.
## The unique union of them is 402 genes.
## Deleting the file included_genes/inc_ds_p08_het-v20260331.xlsx before writing the tables.
## The set of unique genes higher in p15_het_dlgn vs. wt is 2067.
## The set of unique genes higher in p15_het_scn vs. wt is 15.
## The unique union of them is 2080 genes.
## Deleting the file included_genes/inc_ds_p15_het-v20260331.xlsx before writing the tables.
## The set of unique genes higher in p08_ko_dlgn vs. wt is 397.
## The set of unique genes higher in p08_ko_scn vs. wt is 112.
## The unique union of them is 493 genes.
## Deleting the file included_genes/inc_ds_p08_ko-v20260331.xlsx before writing the tables.
## The set of unique genes higher in p15_ko_dlgn vs. wt is 2544.
## The set of unique genes higher in p15_ko_scn vs. wt is 8.
## The unique union of them is 2550 genes.
## Deleting the file included_genes/inc_ds_p15_ko-v20260331.xlsx before writing the tables.
genotype_inclusions <- extract_inclusions(inclusion_sig, inclusion_tables, inclusions,
genotype_keepers, all_genes)
## The set of unique genes higher in p08_ko_dlgn vs. wt is 397.
## The set of unique genes higher in p08_het_dlgn vs. wt is 180.
## The unique union of them is 501 genes.
## Deleting the file included_genes/inc_kh_p08_dlgn-v20260331.xlsx before writing the tables.
## The set of unique genes higher in p15_ko_dlgn vs. wt is 2544.
## The set of unique genes higher in p15_het_dlgn vs. wt is 2067.
## The unique union of them is 2773 genes.
## Deleting the file included_genes/inc_kh_p15_dlgn-v20260331.xlsx before writing the tables.
## The set of unique genes higher in p08_ko_retina vs. wt is 1551.
## The set of unique genes higher in p08_het_retina vs. wt is 976.
## The unique union of them is 1760 genes.
## Deleting the file included_genes/inc_kh_p08_retina-v20260331.xlsx before writing the tables.
## The set of unique genes higher in p15_ko_retina vs. wt is 347.
## The set of unique genes higher in p15_het_retina vs. wt is 402.
## The unique union of them is 571 genes.
## Deleting the file included_genes/inc_kh_p15_retina-v20260331.xlsx before writing the tables.
## The set of unique genes higher in p08_ko_scn vs. wt is 112.
## The set of unique genes higher in p08_het_scn vs. wt is 225.
## The unique union of them is 312 genes.
## Deleting the file included_genes/inc_kh_p08_scn-v20260331.xlsx before writing the tables.
## The set of unique genes higher in p15_ko_scn vs. wt is 8.
## The set of unique genes higher in p15_het_scn vs. wt is 15.
## The unique union of them is 18 genes.
## Deleting the file included_genes/inc_kh_p15_scn-v20260331.xlsx before writing the tables.
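The messages above report, for each pair of contrasts, how many genes are higher than wt in each one and the size of their union. A minimal sketch of that set logic in base R follows. The toy tables, the column names `logfc` and `adjp`, and the helper `higher_than_wt()` are assumptions made for illustration; they are not the internals of `extract_inclusions()`. The cutoffs reuse the `lfc_cutoff` and `adjp_cutoff` values set at the top of the document.

```r
lfc_cutoff <- 0.1
adjp_cutoff <- 0.1

## Hypothetical helper: IDs of genes higher in a mutant than in wt.
higher_than_wt <- function(tbl, lfc = lfc_cutoff, adjp = adjp_cutoff) {
  keep <- !is.na(tbl[["adjp"]]) & tbl[["logfc"]] > lfc & tbl[["adjp"]] <= adjp
  unique(rownames(tbl)[keep])
}

## Toy mutant-vs-wt tables standing in for two contrasts.
p08_vs_wt <- data.frame(
  logfc = c(1.2, 0.05, 0.8, -0.5),
  adjp = c(0.01, 0.01, 0.20, 0.01),
  row.names = c("geneA", "geneB", "geneC", "geneD"))
p15_vs_wt <- data.frame(
  logfc = c(0.9, 0.4, 0.8, 0.3),
  adjp = c(0.02, 0.03, 0.04, 0.50),
  row.names = c("geneA", "geneB", "geneC", "geneD"))

p08_higher <- higher_than_wt(p08_vs_wt)
p15_higher <- higher_than_wt(p15_vs_wt)
## The inclusion set is the union, so shared genes are counted once.
included <- union(p08_higher, p15_higher)
message("The unique union of them is ", length(included), " genes.")
```

The same arithmetic explains the reported numbers: for het dLGN, 2067 + 180 - 2113 = 134 genes appear in both the p15 and p08 sets.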
genotype_de <- all_pairwise(v3_pairwise_input, filter = TRUE, model_fstring = default_fstring,
keepers = genotype_keepers, model_svs = "svaseq")
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## Removing 10162 low-count genes (15263 remaining).
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## I think this is failing? SummarizedExperiment
## Basic step 0/3: Transforming data.
## Setting 109425 entries to zero.
## This received a matrix of SVs.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## conditions
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## conditions
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## conditions
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 6 comparisons.
location_de <- all_pairwise(v3_pairwise_input, filter = TRUE, model_fstring = default_fstring,
keepers = location_keepers, model_svs = "svaseq")
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## Removing 10162 low-count genes (15263 remaining).
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## I think this is failing? SummarizedExperiment
## Basic step 0/3: Transforming data.
## Setting 109425 entries to zero.
## This received a matrix of SVs.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## conditions
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## conditions
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## conditions
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 12 comparisons.
time_de <- all_pairwise(v3_pairwise_input, filter = TRUE, model_fstring = default_fstring,
keepers = time_keepers, model_svs = "svaseq")
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## Removing 10162 low-count genes (15263 remaining).
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## I think this is failing? SummarizedExperiment
## Basic step 0/3: Transforming data.
## Setting 109425 entries to zero.
## This received a matrix of SVs.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## conditions
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## conditions
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## conditions
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 6 comparisons.
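Each `all_pairwise()` call above uses `default_fstring`, `"~ 0 + condition"`, which gives one coefficient per condition so that a contrast such as ko vs. het is a simple difference of two columns. The sketch below shows that encoding with base R only. The five toy samples and the `contrast` vector are illustrative assumptions, and the surrogate variables estimated by svaseq are left out.

```r
## Toy sample sheet: labels follow the time_genotype_location pattern.
condition <- factor(c("p08_het_scn", "p08_het_scn", "p08_ko_scn",
                      "p08_ko_scn", "p08_wt_scn"))
## "~ 0 + condition" drops the intercept: one indicator column per condition.
design <- model.matrix(~ 0 + condition)
colnames(design) <- sub("^condition", "", colnames(design))
## Column sums are the number of samples in each condition.
print(colSums(design))

## A ko-vs-het contrast is +1 on ko, -1 on het, and 0 elsewhere.
contrast <- setNames(rep(0, ncol(design)), colnames(design))
contrast[["p08_ko_scn"]] <- 1
contrast[["p08_het_scn"]] <- -1
print(contrast)
```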
It is near here that the computer sometimes fails, reporting no more tempfiles. In another window I am messing with tempfile() in R to try to understand where it is going off the rails.
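One low-tech way to watch for this is sketched below, under the assumption that the failures come from files piling up in the session's temporary directory: count the entries in `tempdir()` before and after a step, and remove scratch files once they are no longer needed. The helper name `count_tempfiles()` is hypothetical.

```r
## Hypothetical helper: number of entries in this session's tempdir().
count_tempfiles <- function() {
  length(list.files(tempdir(), all.files = TRUE, no.. = TRUE))
}

before <- count_tempfiles()
## Stand-in for a step that leaves three scratch files behind.
scratch <- tempfile(pattern = paste0("demo", 1:3, "_"), fileext = ".txt")
for (f in scratch) {
  writeLines("scratch", f)
}
during <- count_tempfiles()
## unlink() deletes the files and returns 0 on success.
removed <- unlink(scratch)
after <- count_tempfiles()
message("tempdir entries before/during/after: ", before, "/", during, "/", after)
```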
I will start with the tables and no inclusions so I can check my work.
In this first block I will explain a little more thoroughly what is going on:
genotype_tables_full <- combine_de_tables(
genotype_de, keepers = genotype_keepers, label_column = label_column,
fancy = TRUE,
excel = glue("08full_contrasts_excel/genotype_full_tables-v{ver}.xlsx"))
## Looking for subscript invalid names, start of extract_keepers.
## Looking for subscript invalid names, end of extract_keepers.
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 p08_ko_dlgn_vs_p08_het_dlgn 24 2 42
## 2 p15_ko_dlgn_vs_p15_het_dlgn 50 2 81
## 3 p08_ko_retina_vs_p08_het_retina 6 2 9
## 4 p15_ko_retina_vs_p15_het_retina 9 5 6
## 5 p08_ko_scn_vs_p08_het_scn 54 135 80
## 6 p15_ko_scn_vs_p15_het_scn 0 16 3
## edger_sigdown limma_sigup limma_sigdown
## 1 1 41 3
## 2 3 0 0
## 3 2 3 1
## 4 4 0 3
## 5 139 32 28
## 6 29 0 1
## Plot describing unique/shared genes in a differential expression table.
genotype_sig_full <- extract_significant_genes(
genotype_tables_full, according_to = "deseq",
excel = glue("08full_contrasts_excel/genotype_full_sig-v{ver}.xlsx"))
genotype_sig_full
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## kh_p08_dlgn 24 2
## kh_p15_dlgn 50 2
## kh_p08_retina 6 2
## kh_p15_retina 9 5
## kh_p08_scn 54 135
## kh_p15_scn 0 16
In this run, we will search the full set of genes; next, we will only do the inclusions.
genotype_full_gp <- all_gprofiler(genotype_sig_full, species = "mmusculus",
excel = "09full_contrasts_enrich/genotype_full_gprofiler.xlsx")
genotype_full_cp <- all_cprofiler(genotype_sig_full, genotype_tables_full,
orgdb = "org.Mm.eg.db", go_level = go_level, organism = "mouse",
orgdb_from = orgdb_from, max_groupsize = max_groupsize,
excel = "09full_contrasts_enrich/genotype_full_cprofiler.xlsx")
## Error : unable to find an inherited method for function 'metadata' for signature 'x = "NULL"'
## Error : unable to find an inherited method for function 'metadata' for signature 'x = "NULL"'
## Error in `simple_cl[["kegg_universe"]]`:
## ! subscript out of bounds
genotype_full_upset <- upsetr_sig(genotype_sig_full)
genotype_full_intersects <- write_upset_groups(genotype_full_upset,
excel = "09full_contrasts_intersections/genotype_full_gene_groups.xlsx")
Now separate the various genotype tables and perform the inclusions of the genes with relatively low wt values.
genotype_tables <- list()
genotype_sig <- list()
genotype_gp <- list()
genotype_cp <- list()
genotype_en <- list()
for (k in seq_along(genotype_keepers)) {
name <- names(genotype_keepers)[k]
message("Examining ", name)
keeper <- genotype_keepers[name]
include_name <- paste0("inc_", name)
include_df_name <- paste0("df_", name)
include_df <- genotype_inclusions[[include_df_name]]
includes <- genotype_inclusions[[include_name]]
summary(rownames(genotype_sig_full[["deseq"]][["ups"]][[name]]) %in% includes)
include_filename <- glue("10genotype_contrasts_excel/{name}_including_wt_{lfc_cutoff}_decreased_table-v{ver}.xlsx")
include_sig_filename <- glue("10genotype_contrasts_excel/{name}_including_wt_{lfc_cutoff}_decreased_sig-v{ver}.xlsx")
genotype_tables[[name]] <- combine_de_tables(
genotype_de, extra_annot = include_df,
keepers = keeper, label_column = label_column,
excel = include_filename, wanted_genes = includes)
print(genotype_tables[[name]])
genotype_sig[[name]] <- extract_significant_genes(
genotype_tables[[name]], according_to = "deseq",
excel = include_sig_filename)
print(genotype_sig[[name]])
num_rows <- nrow(genotype_sig[[name]][["deseq"]][["ups"]][[name]]) +
nrow(genotype_sig[[name]][["deseq"]][["downs"]][[name]])
message("There are ", num_rows, " significant up and down genes.")
if (num_rows >= 10) {
message("Performing gprofiler/clusterProfiler.")
genotype_gp[[name]] <- all_gprofiler(genotype_sig[[name]], species = "mmusculus")
genotype_cp[[name]] <- all_cprofiler(
genotype_sig[[name]], genotype_tables[[name]],
orgdb = "org.Mm.eg.db", orgdb_from = orgdb_from,
go_level = go_level, max_groupsize = max_groupsize, organism = "mouse")
#if (!is.null(get0("m2_gsc"))) {
# genotype_en[[name]] <- all_enricher(genotype_sig[[name]], gsc = m2_gsc,
# orgdb = "org.Mm.eg.db", from = "ENSEMBL", to = "SYMBOL")
#}
gp_written <- write_all_gp(genotype_gp[[name]], prefix = "11")
cp_written <- write_all_cp(genotype_cp[[name]], prefix = "11")
#en_written <- write_all_en(genotype_en[[name]])
} else {
warning("There are less than 10 genes up and down in the ", name, " comparison.")
message("There are less than 10 genes up and down in the ", name, " comparison.")
}
}
## Examining kh_p08_dlgn
## Looking for subscript invalid names, start of extract_keepers.
## Looking for subscript invalid names, end of extract_keepers.
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 p08_ko_dlgn_vs_p08_het_dlgn 23 1 30
## edger_sigdown limma_sigup limma_sigdown
## 1 0 26 0
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## kh_p08_dlgn 23 1
## There are 24 significant up and down genes.
## Performing gprofiler/clusterProfiler.
## Error : unable to find an inherited method for function 'metadata' for signature 'x = "NULL"'
## Error : unable to find an inherited method for function 'metadata' for signature 'x = "NULL"'
## Error in `simple_cl[["kegg_universe"]]`:
## ! subscript out of bounds
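The loop above stopped at its first contrast because the error raised inside `all_cprofiler()` was not caught. A hedged sketch of isolating each contrast with `tryCatch()` follows, so that one failure is recorded and the remaining contrasts still run. `run_enrichment()` is a hypothetical stand-in for the enrichment calls, not an hpgltools function.

```r
## Hypothetical stand-in for the enrichment step; it fails for one contrast.
run_enrichment <- function(name) {
  if (name == "kh_p08_dlgn") {
    stop("subscript out of bounds")
  }
  paste0(name, "_enriched")
}

contrast_names <- c("kh_p08_dlgn", "kh_p15_dlgn", "kh_p08_retina")
results <- list()
failures <- character()
for (name in contrast_names) {
  ## An error becomes a message plus a NULL result instead of ending the loop.
  result <- tryCatch(
    run_enrichment(name),
    error = function(e) {
      message("Enrichment failed for ", name, ": ", conditionMessage(e))
      NULL
    })
  if (is.null(result)) {
    failures <- c(failures, name)
  } else {
    results[[name]] <- result
  }
}
message("Finished ", length(results), " contrasts; ", length(failures), " failed.")
```

Keeping the failed names in `failures` makes it easy to see afterwards which contrasts still need attention.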
Plot the results separately.
for (k in seq_along(genotype_keepers)) {
name <- names(genotype_keepers)[k]
message("Examining ", name)
keeper <- genotype_keepers[name]
include_name <- paste0("inc_", name)
include_df_name <- paste0("df_", name)
include_df <- genotype_inclusions[[include_df_name]]
includes <- genotype_inclusions[[include_name]]
summary(rownames(genotype_sig_full[["deseq"]][["ups"]][[name]]) %in% includes)
num_rows <- nrow(genotype_sig[[name]][["deseq"]][["ups"]][[name]]) +
nrow(genotype_sig[[name]][["deseq"]][["downs"]][[name]])
nrow(genotype_sig[[name]][["deseq"]][["ups"]][[name]])
nrow(genotype_sig[[name]][["deseq"]][["downs"]][[name]])
message("There are ", num_rows, " significant up and down genes.")
## #1 is up and #2 is down, avoiding typos here.
num_objects <- length(genotype_cp[[name]])
if (num_objects == 0) {
warning("Something went wrong in all_cprofiler.")
} else {
upp <- which(grepl(x = names(genotype_cp[[name]]), pattern = "_up$"))
downp <- which(grepl(x = names(genotype_cp[[name]]), pattern = "_down$"))
if (length(upp) > 0) {
mf_sig <- genotype_cp[[name]][[upp]][["go_data"]][["MF_enrich"]]
cc_sig <- genotype_cp[[name]][[upp]][["go_data"]][["CC_enrich"]]
bp_sig <- genotype_cp[[name]][[upp]][["go_data"]][["BP_enrich"]]
mf_plots_up <- plot_enrichresult(mf_sig)
mf_tree_up_filename <- glue("12clusterProfiler_plots/{name}_up_mf_sig_tree.pdf")
pp(file = mf_tree_up_filename)
try(print(mf_plots_up[["tree"]]), silent = TRUE)
plotted <- dev.off()
mf_bar_up_filename <- glue("12clusterProfiler_plots/{name}_up_mf_sig_bar.pdf")
pp(file = mf_bar_up_filename)
try(print(mf_plots_up[["bar"]]), silent = TRUE)
plotted <- dev.off()
cc_plots_up <- plot_enrichresult(cc_sig, showCategory = go_categories)
cc_tree_up_filename <- glue("12clusterProfiler_plots/{name}_up_cc_sig_tree.pdf")
pp(file = cc_tree_up_filename)
try(print(cc_plots_up[["tree"]]), silent = TRUE)
plotted <- dev.off()
cc_bar_up_filename <- glue("12clusterProfiler_plots/{name}_up_cc_sig_bar.pdf")
pp(file = cc_bar_up_filename)
try(print(cc_plots_up[["bar"]]), silent = TRUE)
plotted <- dev.off()
bp_plots_up <- plot_enrichresult(bp_sig, showCategory = go_categories)
bp_tree_up_filename <- glue("12clusterProfiler_plots/{name}_up_bp_sig_tree.pdf")
pp(file = bp_tree_up_filename)
try(print(bp_plots_up[["tree"]]), silent = TRUE)
plotted <- dev.off()
bp_plots_up <- plot_enrichresult(bp_sig, showCategory = go_categories)
bp_bar_up_filename <- glue("12clusterProfiler_plots/{name}_up_bp_sig_bar.pdf")
pp(file = bp_bar_up_filename)
try(print(bp_plots_up[["bar"]]), silent = TRUE)
plotted <- dev.off()
}
if (length(downp) > 0) {
mf_sig <- genotype_cp[[name]][[downp]][["go_data"]][["MF_enrich"]]
cc_sig <- genotype_cp[[name]][[downp]][["go_data"]][["CC_enrich"]]
bp_sig <- genotype_cp[[name]][[downp]][["go_data"]][["BP_enrich"]]
mf_plots_down <- plot_enrichresult(mf_sig, showCategory = go_categories)
mf_tree_down_filename <- glue("12clusterProfiler_plots/{name}_down_mf_sig_tree.pdf")
pp(file = mf_tree_down_filename)
try(print(mf_plots_down[["tree"]]), silent = TRUE)
plotted <- dev.off()
mf_bar_down_filename <- glue("12clusterProfiler_plots/{name}_down_mf_sig_bar.pdf")
pp(file = mf_bar_down_filename)
try(print(mf_plots_down[["bar"]]), silent = TRUE)
plotted <- dev.off()
cc_plots_down <- plot_enrichresult(cc_sig, showCategory = go_categories)
cc_tree_down_filename <- glue("12clusterProfiler_plots/{name}_down_cc_sig_tree.pdf")
pp(file = cc_tree_down_filename)
try(print(cc_plots_down[["tree"]]), silent = TRUE)
plotted <- dev.off()
cc_bar_down_filename <- glue("12clusterProfiler_plots/{name}_down_cc_sig_bar.pdf")
pp(file = cc_bar_down_filename)
try(print(cc_plots_down[["bar"]]), silent = TRUE)
plotted <- dev.off()
bp_plots_down <- plot_enrichresult(bp_sig, showCategory = go_categories)
bp_tree_down_filename <- glue("12clusterProfiler_plots/{name}_down_bp_sig_tree.pdf")
pp(file = bp_tree_down_filename)
try(print(bp_plots_down[["tree"]]), silent = TRUE)
plotted <- dev.off()
bp_plots_down <- plot_enrichresult(bp_sig, showCategory = go_categories)
bp_bar_down_filename <- glue("12clusterProfiler_plots/{name}_down_bp_sig_bar.pdf")
pp(file = bp_bar_down_filename)
try(print(bp_plots_down[["bar"]]), silent = TRUE)
plotted <- dev.off()
}
}
}
## Examining kh_p08_dlgn
## There are 24 significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining kh_p15_dlgn
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining kh_p08_retina
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining kh_p15_retina
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining kh_p08_scn
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining kh_p15_scn
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
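The messages above lose their count for every contrast after the first, most likely because `nrow()` of a missing table is `NULL` and adding two `NULL` values gives a zero-length result that `message()` prints as nothing. A small sketch of a NULL-safe count follows; `safe_nrow()` is a hypothetical helper and the `sig` list is a toy stand-in for `genotype_sig`.

```r
## Hypothetical helper: treat a missing (NULL) table as having zero rows.
safe_nrow <- function(x) {
  if (is.null(x)) 0L else nrow(x)
}

## Toy stand-in: only the first contrast was ever filled in.
sig <- list(
  kh_p08_dlgn = list(
    ups = data.frame(logfc = c(1.5, 2.0)),
    downs = data.frame(logfc = -1.2)))

for (name in c("kh_p08_dlgn", "kh_p15_dlgn")) {
  num_rows <- safe_nrow(sig[[name]][["ups"]]) + safe_nrow(sig[[name]][["downs"]])
  message("There are ", num_rows, " significant up and down genes in ", name, ".")
}
```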
A few specific plots of interest: Colenso asked me to label a few genes for the knockout/het p08_retina, p08_scn, and p08_dlgn contrasts, either the top 15 or all significant genes. I am pretty sure that if I ask for 15 and there are not that many, it will just label the significant ones. Let us find out!
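As a reference point for that expectation, here is a sketch of the "top n or all significant, whichever is fewer" choice in base R. Whether `plot_volcano_condition_de()` behaves the same way is exactly the open question, so this is not a description of its internals. The helper `top_labels()` and the toy table are assumptions; only the column name `deseq_adjp` is borrowed from the calls in this document.

```r
## Hypothetical helper: up to n gene IDs, most significant first,
## restricted to genes passing the adjusted p-value cutoff.
top_labels <- function(tbl, n = 15, p_col = "deseq_adjp", adjp_cutoff = 0.05) {
  adjp <- tbl[[p_col]]
  sig <- tbl[!is.na(adjp) & adjp <= adjp_cutoff, , drop = FALSE]
  sig <- sig[order(sig[[p_col]]), , drop = FALSE]
  head(rownames(sig), n)
}

## Toy table: two significant genes, one not significant, one with NA.
toy <- data.frame(
  deseq_logfc = c(2.1, -1.4, 0.2, 1.1),
  deseq_adjp = c(0.001, 0.03, 0.60, NA),
  row.names = c("geneA", "geneB", "geneC", "geneD"))

## Asking for 15 labels when only 2 genes are significant returns those 2.
labels <- top_labels(toy, n = 15)
print(labels)
```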
table_name <- "kh_p08_retina"
table_input <- genotype_tables[[table_name]]
table <- table_input[["data"]][[table_name]]
interesting <- c("Opn4", "Gm9008", "Lrr1", "Cnbd1")
kh_p08_retina_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp", fill = "black",
color_low = colors[["het_retina"]], color_high = colors[["ko_retina"]],
label_column = "mgi_symbol", label = interesting, alpha = 1.0,
outline = outline, size = 4)
## Error in `plot_volcano_condition_de()`:
## ! Column: deseq_logfc is not in the table.
## Warning in pp(file = "13genotype_ma_volcano/kh_p08_retina_volcano.pdf", : The
## directory: 13genotype_ma_volcano does not exist, will attempt to create it.
## Error:
## ! object 'kh_p08_retina_volcano' not found
## Error:
## ! object 'kh_p08_retina_volcano' not found
kh_p08_retina_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = colors[["ko_retina"]], color_high = colors[["het_retina"]],
p_col = "deseq_adjp", label_column = "mgi_symbol", outline = outline,
label = interesting)
## The column: mgi_symbol is not in the data, using rownames.
## Warning in max(newdf[["avg"]]): no non-missing arguments to max; returning -Inf
## Warning in plot_ma_condition_de(table, table_name, expr_col = "deseq_basemean",
## : NAs introduced by coercion
## Error in `[[<-.data.frame`:
## ! replacement has 1 row, data has 0
pp(file = "13genotype_ma_volcano/kh_p08_retina_ma.pdf", width = 9, height = 9)
kh_p08_retina_ma[["plot"]]
## Error:
## ! object 'kh_p08_retina_ma' not found
## Error:
## ! object 'kh_p08_retina_ma' not found
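These plotting errors probably follow from the table never having been built: the inclusion loop stopped at its first contrast, so `genotype_tables[["kh_p08_retina"]]` is most likely `NULL` and therefore has no `deseq_logfc` column. A sketch of a check that reports this more clearly is below. `check_de_table()` is a hypothetical helper, not part of hpgltools, and it looks at a flat list of data frames rather than the nested `[["data"]]` structure used in this document.

```r
## Hypothetical helper: confirm a table exists and has the needed columns.
check_de_table <- function(tables, table_name, required) {
  tbl <- tables[[table_name]]
  if (is.null(tbl)) {
    return(paste0("No table named ", table_name, "; was it ever created?"))
  }
  missing_cols <- setdiff(required, colnames(tbl))
  if (length(missing_cols) > 0) {
    return(paste0("Missing columns: ", toString(missing_cols)))
  }
  "ok"
}

required <- c("deseq_logfc", "deseq_adjp", "deseq_basemean")
## Toy stand-in: only one contrast has a table.
toy_tables <- list(
  kh_p08_dlgn = data.frame(deseq_logfc = 1.3, deseq_adjp = 0.01,
                           deseq_basemean = 250))
print(check_de_table(toy_tables, "kh_p08_dlgn", required))
print(check_de_table(toy_tables, "kh_p08_retina", required))
```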
I am going to make an executive decision for this plot: 15 labels is too many and makes it crazy cluttered.
table_name <- "kh_p08_scn"
table_input <- genotype_tables[[table_name]]
table <- table_input[["data"]][[table_name]]
interesting_genes <- c("Fign", "Nrn1", "Dpysl2", "Actb", "Fgf9", "Otx2", "Sec23",
"Ncam1", "Map4", "Sec22b", "Nlgn3", "Marcks", "Cd47",
"Dpysl3", "Lin7c", "Cadm1", "Snx12", "Rhoa", "Inpp5f",
"Atg12", "Set", "Gsk3b", "Pdcd4", "Gabra2", "Tmco1", "Anapc16")
kh_p08_scn_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp",
label_column = "mgi_symbol", label = interesting_genes, size = 4, alpha = 1.0,
outline = outline, color_low = colors[["ko_scn"]], color_high = colors[["het_scn"]])
## Error in `plot_volcano_condition_de()`:
## ! Column: deseq_logfc is not in the table.
pp(file = "13genotype_ma_volcano/kh_p08_scn_volcano.pdf", width = 9, height = 9)
kh_p08_scn_volcano[["plot"]]
## Error:
## ! object 'kh_p08_scn_volcano' not found
## Error:
## ! object 'kh_p08_scn_volcano' not found
## why in the crap is it double-labelling!?
## My MA plotter isn't as smart as the volcano plotter, the genes are:
kh_p08_scn_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = colors[["ko_scn"]], color_high = colors[["het_scn"]],
outline = outline, p_col = "deseq_adjp", label_column = "mgi_symbol", label = interesting_genes)
## The column: mgi_symbol is not in the data, using rownames.
## Warning in max(newdf[["avg"]]): no non-missing arguments to max; returning -Inf
## Warning in plot_ma_condition_de(table, table_name, expr_col = "deseq_basemean",
## : NAs introduced by coercion
## Error in `[[<-.data.frame`:
## ! replacement has 1 row, data has 0
## Error:
## ! object 'kh_p08_scn_ma' not found
## Error:
## ! object 'kh_p08_scn_ma' not found
table_name <- "kh_p08_scn"
table_input <- genotype_tables[[table_name]]
table <- table_input[["data"]][[table_name]]
interesting_genes <- c(
"Anapc16", "Gabra2", "Tmco1", "Sod2", "Fgf9", "Pdcd4", "Rhoa", "Gsk3b", "Foxp1",
"Ncam1", "Marcks", "Fign", "Dpysl3", "Inpp5f", "Cadm1", "Map4", "Ugcg", "Elovl4",
"Elavl1", "Cfl2", "Tnnt1", "Gnb1", "Impact", "Nrn1", "Nlgn3", "Actb", "Cd47",
"Sec22b", "Slc17a7", "Vglut1", "Actb", "B4galt5", "Foxp1", "Otx2", "Lin7c",
"Snx12", "Atg12", "Set")
kh_p08_scn_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp",
color_low = colors[["ko_scn"]], color_high = colors[["het_scn"]],
outline = outline, label_column = "mgi_symbol", label = interesting_genes, size = 4, alpha = 1.0)
## Error in `plot_volcano_condition_de()`:
## ! Column: deseq_logfc is not in the table.
pp(file = "13genotype_ma_volcano/kh_p08_scn_volcano_v2.pdf", width = 9, height = 9)
kh_p08_scn_volcano[["plot"]]
## Error:
## ! object 'kh_p08_scn_volcano' not found
## Error:
## ! object 'kh_p08_scn_volcano' not found
## why in the crap is it double-labelling!?
## My MA plotter isn't as smart as the volcano plotter, the genes are:
kh_p08_scn_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = colors[["ko_scn"]], color_high = colors[["het_scn"]],
outline = outline, p_col = "deseq_adjp", label_column = "mgi_symbol", label = interesting_genes)
## The column: mgi_symbol is not in the data, using rownames.
## Warning in max(newdf[["avg"]]): no non-missing arguments to max; returning -Inf
## Warning in plot_ma_condition_de(table, table_name, expr_col = "deseq_basemean",
## : NAs introduced by coercion
## Error in `[[<-.data.frame`:
## ! replacement has 1 row, data has 0
pp(file = "13genotype_ma_volcano/kh_p08_scn_ma_v2.pdf", width = 9, height = 9)
kh_p08_scn_ma[["plot"]]
## Error:
## ! object 'kh_p08_scn_ma' not found
## Error:
## ! object 'kh_p08_scn_ma' not found
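On the double-labelling question: the second `interesting_genes` vector above lists "Actb" and "Foxp1" twice each. Whether that is what makes the plotter label a gene twice has not been verified, but repeats are cheap to find and remove before plotting, as sketched here with base R. The names `repeated` and `deduplicated` are introduced only for this example.

```r
## The second hand-curated label list, copied from the chunk above.
interesting_genes <- c(
  "Anapc16", "Gabra2", "Tmco1", "Sod2", "Fgf9", "Pdcd4", "Rhoa", "Gsk3b", "Foxp1",
  "Ncam1", "Marcks", "Fign", "Dpysl3", "Inpp5f", "Cadm1", "Map4", "Ugcg", "Elovl4",
  "Elavl1", "Cfl2", "Tnnt1", "Gnb1", "Impact", "Nrn1", "Nlgn3", "Actb", "Cd47",
  "Sec22b", "Slc17a7", "Vglut1", "Actb", "B4galt5", "Foxp1", "Otx2", "Lin7c",
  "Snx12", "Atg12", "Set")

## duplicated() flags the second and later occurrences of each entry.
repeated <- unique(interesting_genes[duplicated(interesting_genes)])
message("Repeated labels: ", toString(repeated))

## unique() keeps the first occurrence of each gene, in the original order.
deduplicated <- unique(interesting_genes)
message(length(interesting_genes), " labels reduced to ", length(deduplicated), ".")
```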
table_name <- "kh_p08_dlgn"
table_input <- genotype_tables[[table_name]]
table <- table_input[["data"]][[table_name]]
kh_p08_dlgn_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp",
color_low = colors[["ko_dlgn"]], color_high = colors[["het_dlgn"]],
outline = outline, label_column = "mgi_symbol", label = 10, size = 4, alpha = 1.0)
## Warning in ggrepel::geom_text_repel(data = df_subset, nudge_x = nudge_x, :
## Ignoring unknown parameters: `outline`
pp(file = "13genotype_ma_volcano/kh_p08_dlgn_volcano.pdf", width = 9, height = 9)
kh_p08_dlgn_volcano[["plot"]]
plotted <- dev.off()
kh_p08_dlgn_volcano[["plot"]]
## My MA plotter isn't as smart as the volcano plotter, the genes are:
kh_p08_dlgn_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = colors[["ko_dlgn"]], color_high = colors[["het_dlgn"]],
p_col = "deseq_adjp", label_column = "mgi_symbol", label = 10, outline = outline)
pp(file = "13genotype_ma_volcano/kh_p08_dlgn_ma.pdf", width = 9, height = 9)
kh_p08_dlgn_ma[["plot"]]
plotted <- dev.off()
kh_p08_dlgn_ma[["plot"]]
When last I ran this manually, it did not double-label; hopefully that remains true in the container.
table_name <- "kh_p15_retina"
table_input <- genotype_tables[[table_name]]
table <- table_input[["data"]][[table_name]]
interesting <- c("Opn4", "Gm9008", "Lrr1", "Cnbd1")
kh_p15_retina_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp", fill = "black",
color_low = colors[["ko_retina"]], color_high = colors[["het_retina"]],
label_column = "mgi_symbol", label = interesting, alpha = 1.0,
outline = outline, size = 4)
## Error in `plot_volcano_condition_de()`:
## ! Column: deseq_logfc is not in the table.
pp(file = "13genotype_ma_volcano/kh_p15_retina_volcano.pdf", width = 9, height = 9)
kh_p15_retina_volcano[["plot"]]
## Error:
## ! object 'kh_p15_retina_volcano' not found
## Error:
## ! object 'kh_p15_retina_volcano' not found
## why in the crap is it double-labelling!?
## My MA plotter isn't as smart as the volcano plotter, the genes are:
kh_p15_retina_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = colors[["ko_retina"]], color_high = colors[["het_retina"]],
p_col = "deseq_adjp", label_column = "mgi_symbol", label = interesting, outline = outline)
## The column: mgi_symbol is not in the data, using rownames.
## Warning in max(newdf[["avg"]]): no non-missing arguments to max; returning -Inf
## Warning in plot_ma_condition_de(table, table_name, expr_col = "deseq_basemean",
## : NAs introduced by coercion
## Error in `[[<-.data.frame`:
## ! replacement has 1 row, data has 0
pp(file = "13genotype_ma_volcano/kh_p15_retina_ma.pdf", width = 9, height = 9)
kh_p15_retina_ma[["plot"]]
## Error:
## ! object 'kh_p15_retina_ma' not found
## Error:
## ! object 'kh_p15_retina_ma' not found
table_name <- "kh_p15_scn"
table_input <- genotype_tables[[table_name]]
table <- table_input[["data"]][[table_name]]
interesting_genes <- c("Fign", "Nrn1", "Dpysl2", "Actb", "Fgf9", "Otx2", "Sec23",
"Ncam1", "Map4", "Sec22b", "Nlgn3", "Marcks", "Cd47",
"Dpysl3", "Lin7c", "Cadm1", "Snx12", "Rhoa", "Inpp5f",
"Atg12", "Set", "Gsk3b", "Pdcd4", "Gabra2", "Tmco1", "Anapc16")
kh_p15_scn_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp",
label_column = "mgi_symbol", label = interesting_genes, size = 4, alpha = 1.0,
outline = outline, color_low = colors[["ko_scn"]], color_high = colors[["het_scn"]])
## Error in `plot_volcano_condition_de()`:
## ! Column: deseq_logfc is not in the table.
pp(file = "13genotype_ma_volcano/kh_p15_scn_volcano.pdf", width = 9, height = 9)
kh_p15_scn_volcano[["plot"]]
## Error:
## ! object 'kh_p15_scn_volcano' not found
## Error:
## ! object 'kh_p15_scn_volcano' not found
kh_p15_scn_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = colors[["ko_scn"]], color_high = colors[["het_scn"]],
p_col = "deseq_adjp", label_column = "mgi_symbol", label = interesting_genes, outline = outline)
## The column: mgi_symbol is not in the data, using rownames.
## Warning in max(newdf[["avg"]]): no non-missing arguments to max; returning -Inf
## Warning in plot_ma_condition_de(table, table_name, expr_col = "deseq_basemean",
## : NAs introduced by coercion
## Error in `[[<-.data.frame`:
## ! replacement has 1 row, data has 0
## Error:
## ! object 'kh_p15_scn_ma' not found
## Error:
## ! object 'kh_p15_scn_ma' not found
Round 2 with a separate gene set.
table_name <- "kh_p15_scn"
table_input <- genotype_tables[[table_name]]
table <- table_input[["data"]][[table_name]]
## Each symbol appears once; repeated entries (Actb, Foxp1) were dropped.
interesting_genes <- c(
"Anapc16", "Gabra2", "Tmco1", "Sod2", "Fgf9", "Pdcd4", "Rhoa", "Gsk3b", "Foxp1",
"Ncam1", "Marcks", "Fign", "Dpysl3", "Inpp5f", "Cadm1", "Map4", "Ugcg", "Elovl4",
"Elavl1", "Cfl2", "Tnnt1", "Gnb1", "Impact", "Nrn1", "Nlgn3", "Actb", "Cd47",
"Sec22b", "Slc17a7", "Vglut1", "B4galt5", "Otx2", "Lin7c",
"Snx12", "Atg12", "Set")
kh_p15_scn_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp",
color_low = colors[["ko_scn"]], color_high = colors[["het_scn"]],
outline = outline, label_column = "mgi_symbol", label = interesting_genes, size = 4, alpha = 1.0)
## Error in `plot_volcano_condition_de()`:
## ! Column: deseq_logfc is not in the table.
pp(file = "13genotype_ma_volcano/kh_p15_scn_volcano_v2.pdf", width = 9, height = 9)
kh_p15_scn_volcano[["plot"]]
## Error:
## ! object 'kh_p15_scn_volcano' not found
## Error:
## ! object 'kh_p15_scn_volcano' not found
kh_p15_scn_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = colors[["ko_scn"]], color_high = colors[["het_scn"]],
p_col = "deseq_adjp", label_column = "mgi_symbol", label = interesting_genes, outline = outline)
## The column: mgi_symbol is not in the data, using rownames.
## Warning in max(newdf[["avg"]]): no non-missing arguments to max; returning -Inf
## Warning in plot_ma_condition_de(table, table_name, expr_col = "deseq_basemean",
## : NAs introduced by coercion
## Error in `[[<-.data.frame`:
## ! replacement has 1 row, data has 0
pp(file = "13genotype_ma_volcano/kh_p15_scn_ma_v2.pdf", width = 9, height = 9)
kh_p15_scn_ma[["plot"]]
## Error:
## ! object 'kh_p15_scn_ma' not found
## Error:
## ! object 'kh_p15_scn_ma' not found
table_name <- "kh_p15_dlgn"
table_input <- genotype_tables[[table_name]]
table <- table_input[["data"]][[table_name]]
kh_p15_dlgn_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp",
color_low = colors[["ko_dlgn"]], color_high = colors[["het_dlgn"]],
outline = outline, label_column = "mgi_symbol", label = 10, size = 4, alpha = 1.0)
## Error in `plot_volcano_condition_de()`:
## ! Column: deseq_logfc is not in the table.
pp(file = "13genotype_ma_volcano/kh_p15_dlgn_volcano.pdf", width = 9, height = 9)
kh_p15_dlgn_volcano[["plot"]]
## Error:
## ! object 'kh_p15_dlgn_volcano' not found
## Error:
## ! object 'kh_p15_dlgn_volcano' not found
## My MA plotter isn't as smart as the volcano plotter, the genes are:
kh_p15_dlgn_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = colors[["ko_dlgn"]], color_high = colors[["het_dlgn"]],
p_col = "deseq_adjp", label_column = "mgi_symbol", label = 10, outline = outline)
## The column: mgi_symbol is not in the data, using rownames.
## Warning in max(newdf[["avg"]]): no non-missing arguments to max; returning -Inf
## Warning in plot_ma_condition_de(table, table_name, expr_col = "deseq_basemean",
## : NAs introduced by coercion
## Error in `[[<-.data.frame`:
## ! replacement has 1 row, data has 0
pp(file = "13genotype_ma_volcano/kh_p15_dlgn_ma.pdf", width = 9, height = 9)
kh_p15_dlgn_ma[["plot"]]
## Error:
## ! object 'kh_p15_dlgn_ma' not found
## Error:
## ! object 'kh_p15_dlgn_ma' not found
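Every volcano and MA call in this block fails the same way: the table handed to the plotter does not contain `deseq_logfc`. Checking the columns once per table, before plotting, would surface that immediately instead of once per plot. The following is only a sketch in base R; `missing_columns()` is a hypothetical helper (not a function from the plotting library used above), and the toy data frame stands in for `table`.

```r
## Hypothetical helper: report which of the wanted columns are absent.
## A NULL table (e.g. a failed list lookup) is treated as missing everything.
missing_columns <- function(df, wanted) {
  if (is.null(df)) {
    return(wanted)
  }
  setdiff(wanted, colnames(df))
}

## Toy stand-in for one of the combined tables; not real data.
example_table <- data.frame(
  deseq_basemean = c(10, 200),
  deseq_adjp = c(0.01, 0.5))
wanted <- c("deseq_logfc", "deseq_adjp", "mgi_symbol")

absent <- missing_columns(example_table, wanted)
if (length(absent) > 0) {
  message("Skipping plots, missing columns: ", toString(absent))
}
```

Running the same check on `table` right after it is extracted would show whether the lookup returned nothing at all or a table with differently named columns.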
A query from Rashmi:
“I was discussing with Dr. Speer about the dLGN data and we found mostly retinal genes in dLGN Het/KO or time point comparison. Please check if those are not retina samples.”
I checked the samples and everything looks OK to me; perhaps I can use the results to look at this question in another way.
I will therefore load the p08_het_dlgn/p08_ko_dlgn table and compare it directly to the p08_het_retina/p08_ko_retina table. If the logFC values in the two tables turn out to be (nearly) identical, that would support the hypothesis suggested by this query.
Note, in order to do this, I must use the full tables, not the post-inclusion tables because I cannot guarantee that they will have identical gene IDs.
retina_table <- genotype_tables_full[["data"]][["kh_p08_retina"]]
dlgn_table <- genotype_tables_full[["data"]][["kh_p08_dlgn"]]
retina_subset <- retina_table[, c("ensembl_gene_id", "deseq_logfc")]
colnames(retina_subset) <- c("ID", "retina_logfc")
dlgn_subset <- dlgn_table[, c("ensembl_gene_id", "deseq_logfc")]
colnames(dlgn_subset) <- c("ID", "dlgn_logfc")
merged <- merge(retina_subset, dlgn_subset, by = "ID")
rownames(merged) <- make.names(merged[["ID"]], unique = TRUE)
merged[["ID"]] <- NULL
plotted <- plot_linear_scatter(merged)
pp(file = "images/kh_p08_retina_vs_dlgn_deseq_logfc_values.png")
## Warning in pp(file = "images/kh_p08_retina_vs_dlgn_deseq_logfc_values.png"):
## The directory: images does not exist, will attempt to create it.
plotted[["scatter"]]
dev.off()
## png
## 2
Rashmi asked if we could also do the p15 for this comparison:
retina_table <- genotype_tables_full[["data"]][["kh_p15_retina"]]
dlgn_table <- genotype_tables_full[["data"]][["kh_p15_dlgn"]]
retina_subset <- retina_table[, c("ensembl_gene_id", "deseq_logfc")]
colnames(retina_subset) <- c("ID", "retina_logfc")
dlgn_subset <- dlgn_table[, c("ensembl_gene_id", "deseq_logfc")]
colnames(dlgn_subset) <- c("ID", "dlgn_logfc")
merged <- merge(retina_subset, dlgn_subset, by = "ID")
rownames(merged) <- make.names(merged[["ID"]], unique = TRUE)
merged[["ID"]] <- NULL
plotted <- plot_linear_scatter(merged)
pp(file = "images/kh_p15_retina_vs_dlgn_deseq_logfc_values.png")
plotted[["scatter"]]
dev.off()
## png
## 2
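A single number alongside the scatter plot may make the retina/dLGN comparison easier to report. The sketch below uses toy values only (the IDs and logFC values are invented for illustration) and base R's `merge()` and `cor()`; the column names mirror the ones used above.

```r
## Toy tables standing in for retina_subset and dlgn_subset; values are invented.
retina_toy <- data.frame(
  ID = c("g1", "g2", "g3", "g4"),
  retina_logfc = c(1.5, -0.5, 0.2, 2.0))
dlgn_toy <- data.frame(
  ID = c("g2", "g3", "g4", "g5"),
  dlgn_logfc = c(-0.4, 0.1, 1.8, 0.3))

## merge() keeps only IDs present in both tables (an inner join by default).
merged_toy <- merge(retina_toy, dlgn_toy, by = "ID")

## Rank-based correlation is less sensitive to a few extreme logFC values.
rho <- cor(merged_toy[["retina_logfc"]], merged_toy[["dlgn_logfc"]],
           method = "spearman")
message("Shared genes: ", nrow(merged_toy), ", Spearman rho: ", rho)
```

A correlation near 1 across the full tables would be consistent with the samples being the same tissue; a value near 0 would argue against it.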
genotype_strict_tables <- list()
genotype_strict_sig <- list()
genotype_strict_gp <- list()
genotype_strict_cp <- list()
genotype_strict_en <- list()
for (k in seq_along(genotype_keepers)) {
name <- names(genotype_keepers)[k]
message("Examining ", name)
keeper <- genotype_keepers[name]
include_name <- paste0("inc_", name)
include_df_name <- paste0("df_", name)
include_strict_df <- genotype_inclusions_strict[[include_df_name]]
includes_strict <- genotype_inclusions_strict[[include_name]]
summary(rownames(genotype_sig_full[["deseq"]][["ups"]][[name]]) %in% includes_strict)
include_filename <- glue("14genotype_strict_contrasts_excel/{name}_including_wt_{lfc_cutoff}_decreased_table-v{ver}.xlsx")
include_sig_filename <- glue("14genotype_strict_contrasts_excel/{name}_including_wt_{lfc_cutoff}_decreased_sig-v{ver}.xlsx")
genotype_strict_tables[[name]] <- combine_de_tables(
genotype_de, extra_annot = include_strict_df,
keepers = keeper, label_column = label_column,
excel = include_filename, wanted_genes = includes_strict)
print(genotype_strict_tables[[name]])
genotype_strict_sig[[name]] <- extract_significant_genes(
genotype_strict_tables[[name]], according_to = "deseq",
excel = include_sig_filename)
print(genotype_strict_sig[[name]])
num_rows <- nrow(genotype_strict_sig[[name]][["deseq"]][["ups"]][[name]]) +
nrow(genotype_strict_sig[[name]][["deseq"]][["downs"]][[name]])
message("There are ", num_rows, " significant up and down genes.")
if (num_rows >= 10) {
message("Performing gprofiler/clusterProfiler.")
genotype_strict_gp[[name]] <- all_gprofiler(genotype_strict_sig[[name]], species = "mmusculus")
genotype_strict_cp[[name]] <- all_cprofiler(
genotype_strict_sig[[name]], genotype_strict_tables[[name]],
orgdb = "org.Mm.eg.db", go_level = go_level,
orgdb_from = orgdb_from, max_groupsize = max_groupsize, organism = "mouse")
#if (!is.null(get0("m2_gsc"))) {
# genotype_strict_en[[name]] <- all_enricher(genotype_strict_sig[[name]], gsc = m2_gsc,
# orgdb = "org.Mm.eg.db", from = "ENSEMBL", to = "SYMBOL")
#}
gp_written <- write_all_gp(genotype_strict_gp[[name]], prefix = "15", suffix = "strict")
cp_written <- write_all_cp(genotype_strict_cp[[name]], prefix = "15", suffix = "strict")
#en_written <- write_all_en(genotype_strict_en[[name]])
} else {
warning("There are less than 10 genes up and down in the ", name, " comparison.")
message("There are less than 10 genes up and down in the ", name, " comparison.")
}
}
## Examining kh_p08_dlgn
## Looking for subscript invalid names, start of extract_keepers.
## Looking for subscript invalid names, end of extract_keepers.
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 p08_ko_dlgn_vs_p08_het_dlgn 1 0 1
## edger_sigdown limma_sigup limma_sigdown
## 1 0 1 0
## Only kh_p08_dlgn_up has information, cannot create an UpSet.
## Plot describing unique/shared genes in a differential expression table.
## NULL
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## kh_p08_dlgn 1 0
## There are 1 significant up and down genes.
## Warning: There are less than 10 genes up and down in the kh_p08_dlgn
## comparison.
## There are less than 10 genes up and down in the kh_p08_dlgn comparison.
## Examining kh_p15_dlgn
## Looking for subscript invalid names, start of extract_keepers.
## Looking for subscript invalid names, end of extract_keepers.
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 p15_ko_dlgn_vs_p15_het_dlgn 0 0 0
## edger_sigdown limma_sigup limma_sigdown
## 1 0 0 0
## Only has information, cannot create an UpSet.
## Plot describing unique/shared genes in a differential expression table.
## NULL
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## kh_p15_dlgn 0 0
## There are 0 significant up and down genes.
## Warning: There are less than 10 genes up and down in the kh_p15_dlgn
## comparison.
## There are less than 10 genes up and down in the kh_p15_dlgn comparison.
## Examining kh_p08_retina
## Looking for subscript invalid names, start of extract_keepers.
## Looking for subscript invalid names, end of extract_keepers.
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 p08_ko_retina_vs_p08_het_retina 0 0 0
## edger_sigdown limma_sigup limma_sigdown
## 1 0 0 0
## Only has information, cannot create an UpSet.
## Plot describing unique/shared genes in a differential expression table.
## NULL
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## kh_p08_retina 0 0
## There are 0 significant up and down genes.
## Warning: There are less than 10 genes up and down in the kh_p08_retina
## comparison.
## There are less than 10 genes up and down in the kh_p08_retina comparison.
## Examining kh_p15_retina
## Looking for subscript invalid names, start of extract_keepers.
## Looking for subscript invalid names, end of extract_keepers.
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 p15_ko_retina_vs_p15_het_retina 0 0 0
## edger_sigdown limma_sigup limma_sigdown
## 1 0 0 2
## Only has information, cannot create an UpSet.
## Plot describing unique/shared genes in a differential expression table.
## NULL
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## kh_p15_retina 0 0
## There are 0 significant up and down genes.
## Warning: There are less than 10 genes up and down in the kh_p15_retina
## comparison.
## There are less than 10 genes up and down in the kh_p15_retina comparison.
## Examining kh_p08_scn
## Looking for subscript invalid names, start of extract_keepers.
## Looking for subscript invalid names, end of extract_keepers.
## The result table is too small for meaningful comparisons.
## The first table has only: 25.
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 p08_ko_scn_vs_p08_het_scn 0 0 0 0
## limma_sigup limma_sigdown
## 1 0 0
## Only has information, cannot create an UpSet.
## Plot describing unique/shared genes in a differential expression table.
## NULL
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## kh_p08_scn 0 0
## There are 0 significant up and down genes.
## Warning: There are less than 10 genes up and down in the kh_p08_scn comparison.
## There are less than 10 genes up and down in the kh_p08_scn comparison.
## Examining kh_p15_scn
## Looking for subscript invalid names, start of extract_keepers.
## Looking for subscript invalid names, end of extract_keepers.
## The result table is too small for meaningful comparisons.
## The first table has only: 3.
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 p15_ko_scn_vs_p15_het_scn 0 0 0 0
## limma_sigup limma_sigdown
## 1 0 0
## Only has information, cannot create an UpSet.
## Plot describing unique/shared genes in a differential expression table.
## NULL
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## kh_p15_scn 0 0
## There are 0 significant up and down genes.
## Warning: There are less than 10 genes up and down in the kh_p15_scn comparison.
## There are less than 10 genes up and down in the kh_p15_scn comparison.
for (k in seq_along(genotype_keepers)) {
name <- names(genotype_keepers)[k]
message("Examining ", name)
keeper <- genotype_keepers[name]
include_name <- paste0("inc_", name)
include_df_name <- paste0("df_", name)
include_strict_df <- genotype_inclusions_strict[[include_df_name]]
includes_strict <- genotype_inclusions_strict[[include_name]]
summary(rownames(genotype_sig_full[["deseq"]][["ups"]][[name]]) %in% includes_strict)
num_rows <- nrow(genotype_strict_sig[[name]][["deseq"]][["ups"]][[name]]) +
nrow(genotype_strict_sig[[name]][["deseq"]][["downs"]][[name]])
nrow(genotype_strict_sig[[name]][["deseq"]][["ups"]][[name]])
nrow(genotype_strict_sig[[name]][["deseq"]][["downs"]][[name]])
message("There are ", num_rows, " significant up and down genes.")
## #1 is up and #2 is down, avoiding typos here.
num_objects <- length(genotype_strict_cp[[name]])
if (num_objects == 0) {
warning("Something went wrong in all_cprofiler.")
} else {
upp <- which(grepl(x = names(genotype_strict_cp[[name]]), pattern = "_up$"))
downp <- which(grepl(x = names(genotype_strict_cp[[name]]), pattern = "_down$"))
if (length(upp) > 0) {
mf_sig <- genotype_strict_cp[[name]][[upp]][["go_data"]][["MF_enrich"]]
cc_sig <- genotype_strict_cp[[name]][[upp]][["go_data"]][["CC_enrich"]]
bp_sig <- genotype_strict_cp[[name]][[upp]][["go_data"]][["BP_enrich"]]
mf_plots_up <- plot_enrichresult(mf_sig, showCategory = go_categories)
mf_tree_up_filename <- glue("16clusterProfiler_plots/{name}_up_mf_sig_tree.pdf")
pp(file = mf_tree_up_filename)
try(print(mf_plots_up[["tree"]]), silent = TRUE)
plotted <- dev.off()
mf_bar_up_filename <- glue("16clusterProfiler_plots/{name}_up_mf_sig_bar.pdf")
pp(file = mf_bar_up_filename)
try(print(mf_plots_up[["bar"]]), silent = TRUE)
plotted <- dev.off()
cc_plots_up <- plot_enrichresult(cc_sig, showCategory = go_categories)
cc_tree_up_filename <- glue("16clusterProfiler_plots/{name}_up_cc_sig_tree.pdf")
pp(file = cc_tree_up_filename)
try(print(cc_plots_up[["tree"]]), silent = TRUE)
plotted <- dev.off()
cc_bar_up_filename <- glue("16clusterProfiler_plots/{name}_up_cc_sig_bar.pdf")
pp(file = cc_bar_up_filename)
try(print(cc_plots_up[["bar"]]), silent = TRUE)
plotted <- dev.off()
bp_plots_up <- plot_enrichresult(bp_sig, showCategory = go_categories)
bp_tree_up_filename <- glue("16clusterProfiler_plots/{name}_up_bp_sig_tree.pdf")
pp(file = bp_tree_up_filename)
try(print(bp_plots_up[["tree"]]), silent = TRUE)
plotted <- dev.off()
bp_plots_up <- plot_enrichresult(bp_sig, showCategory = go_categories)
bp_bar_up_filename <- glue("16clusterProfiler_plots/{name}_up_bp_sig_bar.pdf")
pp(file = bp_bar_up_filename)
try(print(bp_plots_up[["bar"]]), silent = TRUE)
plotted <- dev.off()
}
if (length(downp) > 0) {
mf_sig <- genotype_strict_cp[[name]][[downp]][["go_data"]][["MF_enrich"]]
cc_sig <- genotype_strict_cp[[name]][[downp]][["go_data"]][["CC_enrich"]]
bp_sig <- genotype_strict_cp[[name]][[downp]][["go_data"]][["BP_enrich"]]
mf_plots_down <- plot_enrichresult(mf_sig, showCategory = go_categories)
mf_tree_down_filename <- glue("16clusterProfiler_plots/{name}_down_mf_sig_tree.pdf")
pp(file = mf_tree_down_filename)
try(print(mf_plots_down[["tree"]]), silent = TRUE)
plotted <- dev.off()
mf_bar_down_filename <- glue("16clusterProfiler_plots/{name}_down_mf_sig_bar.pdf")
pp(file = mf_bar_down_filename)
try(print(mf_plots_down[["bar"]]), silent = TRUE)
plotted <- dev.off()
cc_plots_down <- plot_enrichresult(cc_sig, showCategory = go_categories)
cc_tree_down_filename <- glue("16clusterProfiler_plots/{name}_down_cc_sig_tree.pdf")
pp(file = cc_tree_down_filename)
try(print(cc_plots_down[["tree"]]), silent = TRUE)
plotted <- dev.off()
cc_bar_down_filename <- glue("16clusterProfiler_plots/{name}_down_cc_sig_bar.pdf")
pp(file = cc_bar_down_filename)
try(print(cc_plots_down[["bar"]]), silent = TRUE)
plotted <- dev.off()
bp_plots_down <- plot_enrichresult(bp_sig, showCategory = go_categories)
bp_tree_down_filename <- glue("16clusterProfiler_plots/{name}_down_bp_sig_tree.pdf")
pp(file = bp_tree_down_filename)
try(print(bp_plots_down[["tree"]]), silent = TRUE)
plotted <- dev.off()
bp_plots_down <- plot_enrichresult(bp_sig, showCategory = go_categories)
bp_bar_down_filename <- glue("16clusterProfiler_plots/{name}_down_bp_sig_bar.pdf")
pp(file = bp_bar_down_filename)
try(print(bp_plots_down[["bar"]]), silent = TRUE)
plotted <- dev.off()
}
}
}
## Examining kh_p08_dlgn
## There are 1 significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining kh_p15_dlgn
## There are 0 significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining kh_p08_retina
## There are 0 significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining kh_p15_retina
## There are 0 significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining kh_p08_scn
## There are 0 significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining kh_p15_scn
## There are 0 significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
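Given this stricter filter, almost no genes pass in the genotype comparisons: only kh_p08_dlgn retains a single significant gene, so no contrast reaches the 10-gene minimum and the enrichment step is never run (hence the all_cprofiler warnings above).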
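The plotting block above repeats the same four lines for every direction, ontology, and plot type, which is how a one-off argument typo can slip into a single copy. One way to reduce that risk is to enumerate the combinations once and loop over them. The following is a sketch only; `enrich_plot_jobs()` is a hypothetical helper, and the directory and filename pattern are copied from the block above.

```r
## Hypothetical helper: one row per direction/ontology/plot-type combination,
## with the output filename built from the same pattern used above.
enrich_plot_jobs <- function(name, prefix = "16clusterProfiler_plots") {
  jobs <- expand.grid(
    type = c("tree", "bar"),
    ontology = c("mf", "cc", "bp"),
    direction = c("up", "down"),
    stringsAsFactors = FALSE)
  jobs[["filename"]] <- file.path(
    prefix,
    paste0(name, "_", jobs[["direction"]], "_", jobs[["ontology"]],
           "_sig_", jobs[["type"]], ".pdf"))
  jobs
}

jobs <- enrich_plot_jobs("kh_p08_dlgn")
print(head(jobs, n = 3))
```

Each row could then drive the existing `pp()`, `print()`, `dev.off()` sequence, so the category count is passed in exactly one place.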
We will now repeat the above tasks seeking location differences instead of genotype; essentially I copy/pasted the above with s/genotype/location/g.
location_tables_full <- combine_de_tables(
location_de, keepers = location_keepers, label_column = label_column,
excel = glue("17full_location_contrasts/location_full_tables-v{ver}.xlsx"))
## Looking for subscript invalid names, start of extract_keepers.
## Looking for subscript invalid names, end of extract_keepers.
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 p08_het_dlgn_vs_p08_het_retina 2165 1562 2212
## 2 p15_het_dlgn_vs_p15_het_retina 2437 3369 2587
## 3 p08_ko_dlgn_vs_p08_ko_retina 2180 1868 2144
## 4 p15_ko_dlgn_vs_p15_ko_retina 2715 3942 2934
## 5 p08_het_scn_vs_p08_het_retina 2634 1707 2586
## 6 p15_het_scn_vs_p15_het_retina 2841 2395 2716
## 7 p08_ko_scn_vs_p08_ko_retina 2728 1705 2644
## 8 p15_ko_scn_vs_p15_ko_retina 2613 3005 2612
## 9 p08_het_dlgn_vs_p08_het_scn 648 788 751
## 10 p15_het_dlgn_vs_p15_het_scn 1708 2796 1984
## 11 p08_ko_dlgn_vs_p08_ko_scn 1002 1342 1115
## 12 p15_ko_dlgn_vs_p15_ko_scn 1829 2529 2158
## edger_sigdown limma_sigup limma_sigdown
## 1 1632 1886 1660
## 2 3339 2748 2639
## 3 2077 2104 1963
## 4 3847 3236 2962
## 5 1882 2226 1889
## 6 2616 2730 2366
## 7 1922 2386 2160
## 8 3143 2812 2623
## 9 780 647 766
## 10 2623 1979 2034
## 11 1439 1169 1317
## 12 2396 1878 2000
## Plot describing unique/shared genes in a differential expression table.
location_sig_full <- extract_significant_genes(
location_tables_full, according_to = "deseq",
excel = glue("17full_location_contrasts/location_full_sig-v{ver}.xlsx"))
location_sig_full
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## dr_p08_het 2165 1562
## dr_p15_het 2437 3369
## dr_p08_ko 2180 1868
## dr_p15_ko 2715 3942
## sr_p08_het 2634 1707
## sr_p15_het 2841 2395
## sr_p08_ko 2728 1705
## sr_p15_ko 2613 3005
## ds_p08_het 648 788
## ds_p15_het 1708 2796
## ds_p08_ko 1002 1342
## ds_p15_ko 1829 2529
location_full_upset <- upsetr_sig(location_sig_full)
location_full_intersects <- write_upset_groups(
location_full_upset,
excel = "excel/17full_location_contrasts/location_full_gene_groups.xlsx")
location_tables <- list()
location_sig <- list()
location_gp <- list()
location_cp <- list()
for (k in seq_along(location_keepers)) {
name <- names(location_keepers)[k]
message("Examining ", name)
keeper <- location_keepers[name]
includes <- location_inclusions[[name]]
include_name <- paste0("inc_", name)
include_df_name <- paste0("df_", name)
include_df <- location_inclusions[[include_df_name]]
includes <- location_inclusions[[include_name]]
summary(rownames(location_sig_full[["deseq"]][["ups"]][[name]]) %in% includes)
include_filename <- glue("18location_contrasts/{name}_including_wt_{lfc_cutoff}_decreased_table-v{ver}.xlsx")
include_sig_filename <- glue("18location_contrasts/{name}_including_wt_{lfc_cutoff}_decreased_sig-v{ver}.xlsx")
location_tables[[name]] <- combine_de_tables(
location_de, extra_annot = include_df,
keepers = keeper, label_column = label_column,
excel = include_filename, wanted_genes = includes)
print(location_tables[[name]])
location_sig[[name]] <- extract_significant_genes(
location_tables[[name]], according_to = "deseq",
excel = include_sig_filename)
print(location_sig[[name]])
num_rows <- nrow(location_sig[[name]][["deseq"]][["ups"]][[name]]) +
nrow(location_sig[[name]][["deseq"]][["downs"]][[name]])
message("There are ", num_rows, " significant up and down genes.")
if (num_rows >= 10) {
location_gp[[name]] <- all_gprofiler(location_sig[[name]], species = "mmusculus")
location_cp[[name]] <- all_cprofiler(
location_sig[[name]], location_tables[[name]],
orgdb = "org.Mm.eg.db", go_level = go_level, orgdb_from = orgdb_from,
max_groupsize = max_groupsize, organism = "mouse")
cp_written <- write_all_cp(location_cp[[name]], prefix = "19")
gp_written <- write_all_gp(location_gp[[name]], prefix = "19")
}
}
## Examining dr_p08_het
## Looking for subscript invalid names, start of extract_keepers.
## Looking for subscript invalid names, end of extract_keepers.
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 p08_het_dlgn_vs_p08_het_retina 259 81 259
## edger_sigdown limma_sigup limma_sigdown
## 1 85 240 81
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## dr_p08_het 259 81
## There are 340 significant up and down genes.
## Error : unable to find an inherited method for function 'metadata' for signature 'x = "NULL"'
## Error : unable to find an inherited method for function 'metadata' for signature 'x = "NULL"'
## Error in `simple_cl[["kegg_universe"]]`:
## ! subscript out of bounds
Print out all the plots in a separate block.
for (k in seq_along(location_keepers)) {
name <- names(location_keepers)[k]
message("Examining ", name)
keeper <- location_keepers[name]
includes <- location_inclusions[[name]]
include_name <- paste0("inc_", name)
include_df_name <- paste0("df_", name)
include_df <- location_inclusions[[include_df_name]]
includes <- location_inclusions[[include_name]]
summary(rownames(location_sig_full[["deseq"]][["ups"]][[name]]) %in% includes)
num_rows <- nrow(location_sig[[name]][["deseq"]][["ups"]][[name]]) +
nrow(location_sig[[name]][["deseq"]][["downs"]][[name]])
nrow(location_sig[[name]][["deseq"]][["ups"]][[name]])
nrow(location_sig[[name]][["deseq"]][["downs"]][[name]])
message("There are ", num_rows, " significant up and down genes.")
num_objects <- length(location_cp[[name]])
if (num_objects == 0) {
warning("Something went wrong in all_cprofiler.")
} else {
upp <- which(grepl(x = names(location_cp[[name]]), pattern = "_up$"))
downp <- which(grepl(x = names(location_cp[[name]]), pattern = "_down$"))
if (length(upp) > 0) {
mf_sig <- location_cp[[name]][[upp]][["go_data"]][["MF_enrich"]]
cc_sig <- location_cp[[name]][[upp]][["go_data"]][["CC_enrich"]]
bp_sig <- location_cp[[name]][[upp]][["go_data"]][["BP_enrich"]]
mf_plots_up <- plot_enrichresult(mf_sig, showCategory = go_categories)
mf_tree_up_filename <- glue("19clusterProfiler_plots/{name}_up_mf_sig_tree.pdf")
pp(file = mf_tree_up_filename)
try(print(mf_plots_up[["tree"]]), silent = TRUE)
plotted <- dev.off()
mf_bar_up_filename <- glue("19clusterProfiler_plots/{name}_up_mf_sig_bar.pdf")
pp(file = mf_bar_up_filename)
try(print(mf_plots_up[["bar"]]), silent = TRUE)
plotted <- dev.off()
cc_plots_up <- plot_enrichresult(cc_sig, showCategory = go_categories)
cc_tree_up_filename <- glue("19clusterProfiler_plots/{name}_up_cc_sig_tree.pdf")
pp(file = cc_tree_up_filename)
try(print(cc_plots_up[["tree"]]), silent = TRUE)
plotted <- dev.off()
cc_bar_up_filename <- glue("19clusterProfiler_plots/{name}_up_cc_sig_bar.pdf")
pp(file = cc_bar_up_filename)
try(print(cc_plots_up[["bar"]]), silent = TRUE)
plotted <- dev.off()
bp_plots_up <- plot_enrichresult(bp_sig, showCategory = go_categories)
bp_tree_up_filename <- glue("19clusterProfiler_plots/{name}_up_bp_sig_tree.pdf")
pp(file = bp_tree_up_filename)
try(print(bp_plots_up[["tree"]]), silent = TRUE)
plotted <- dev.off()
bp_plots_up <- plot_enrichresult(bp_sig, showCategory = go_categories)
bp_bar_up_filename <- glue("19clusterProfiler_plots/{name}_up_bp_sig_bar.pdf")
pp(file = bp_bar_up_filename)
try(print(bp_plots_up[["bar"]]), silent = TRUE)
plotted <- dev.off()
}
if (length(downp) > 0) {
mf_sig <- location_cp[[name]][[downp]][["go_data"]][["MF_enrich"]]
cc_sig <- location_cp[[name]][[downp]][["go_data"]][["CC_enrich"]]
bp_sig <- location_cp[[name]][[downp]][["go_data"]][["BP_enrich"]]
mf_plots_down <- plot_enrichresult(mf_sig, showCategory = go_categories)
mf_tree_down_filename <- glue("19clusterProfiler_plots/{name}_down_mf_sig_tree.pdf")
pp(file = mf_tree_down_filename)
try(print(mf_plots_down[["tree"]]), silent = TRUE)
plotted <- dev.off()
mf_bar_down_filename <- glue("19clusterProfiler_plots/{name}_down_mf_sig_bar.pdf")
pp(file = mf_bar_down_filename)
try(print(mf_plots_down[["bar"]]), silent = TRUE)
plotted <- dev.off()
cc_plots_down <- plot_enrichresult(cc_sig, showCategory = go_categories)
cc_tree_down_filename <- glue("19clusterProfiler_plots/{name}_down_cc_sig_tree.pdf")
pp(file = cc_tree_down_filename)
try(print(cc_plots_down[["tree"]]), silent = TRUE)
plotted <- dev.off()
cc_bar_down_filename <- glue("19clusterProfiler_plots/{name}_down_cc_sig_bar.pdf")
pp(file = cc_bar_down_filename)
try(print(cc_plots_down[["bar"]]), silent = TRUE)
plotted <- dev.off()
bp_plots_down <- plot_enrichresult(bp_sig, showCategory = go_categories)
bp_tree_down_filename <- glue("19clusterProfiler_plots/{name}_down_bp_sig_tree.pdf")
pp(file = bp_tree_down_filename)
try(print(bp_plots_down[["tree"]]), silent = TRUE)
plotted <- dev.off()
bp_plots_down <- plot_enrichresult(bp_sig, showCategory = go_categories)
bp_bar_down_filename <- glue("19clusterProfiler_plots/{name}_down_bp_sig_bar.pdf")
pp(file = bp_bar_down_filename)
try(print(bp_plots_down[["bar"]]), silent = TRUE)
plotted <- dev.off()
}
}
}
## Examining dr_p08_het
## There are 340 significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining dr_p15_het
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining dr_p08_ko
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining dr_p15_ko
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining sr_p08_het
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining sr_p15_het
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining sr_p08_ko
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining sr_p15_ko
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining ds_p08_het
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining ds_p15_het
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining ds_p08_ko
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining ds_p15_ko
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
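Most of the messages above read "There are significant up and down genes." with no number. That is what base R produces when both `nrow()` calls receive `NULL` (the entry was never filled because the earlier loop stopped on its first contrast): `nrow(NULL)` is `NULL`, and `NULL + NULL` has length zero. A small counting helper that treats a missing table as zero rows would make the message explicit. This is a sketch; `count_sig()` is a hypothetical name and the nested list layout is copied from the loops above.

```r
## Hypothetical helper: count up + down genes, treating absent tables as 0 rows.
count_sig <- function(sig, name, method = "deseq") {
  ups <- sig[[method]][["ups"]][[name]]
  downs <- sig[[method]][["downs"]][[name]]
  n_up <- if (is.null(ups)) 0L else nrow(ups)
  n_down <- if (is.null(downs)) 0L else nrow(downs)
  n_up + n_down
}

## Toy structure mimicking the ups/downs layout; not real results.
toy_sig <- list(deseq = list(
  ups = list(a = data.frame(x = 1:3)),
  downs = list(a = data.frame(x = 1:2))))

message("There are ", count_sig(toy_sig, "a"), " significant up and down genes.")
message("There are ", count_sig(NULL, "b"), " significant up and down genes.")
```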
Colenso sent a specific query of interest: a comparison of SCN vs. retina at p08 in the heterozygotes, including a set of genes of particular interest. Perhaps I can use some of these as markers to quality control my work in the future?
Here are the genes:
Opn4, Eomes, Trpc7, Oprm1, Nr4a3, Tbx20, Irx6, AW551984, Pcdh19, Adcyap1, Baiap3, Chl1, Grin3a, Igf1, Gria1, Grin2d, Grin3a, Chrna6, Chrna3, Htr5a, Htr2a, Htr7, Irx4, PlxnC1, Sema6d, Sema4f, Sema4a, Sema6b, Lrrc4b, Lrrc58, Lrrc3b, Wnt4, Wnt9b, Ctxn3, Tenm1, Gna14, Rgs4, Rgs6, Rgs5
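Since the list arrives as comma-separated text, it can be parsed rather than retyped, which also flags repeated symbols. The sketch below uses only a short excerpt of the list above as its input string, and relies on base R's `strsplit()`, `trimws()`, and `duplicated()`.

```r
## Excerpt of the query text, used here only for illustration.
raw_genes <- "Opn4, Eomes, Grin3a, Igf1, Gria1, Grin2d, Grin3a, Chrna6"

## Split on commas and strip surrounding whitespace.
parsed <- trimws(strsplit(raw_genes, ",")[[1]])

## Report symbols listed more than once, then keep one copy of each.
repeated <- unique(parsed[duplicated(parsed)])
genes <- unique(parsed)
if (length(repeated) > 0) {
  message("Listed more than once: ", toString(repeated))
}
```

Comparing `genes` against the label column of the table with `setdiff()` would additionally show which requested symbols are absent or spelled differently in the annotation.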
table_input <- location_tables[["sr_p08_het"]]
table_name <- "sr_p08_het"
table <- table_input[["data"]][[table_name]]
interesting_genes <- c("Opn4", "Eomes", "Trpc7", "Oprm1", "Nr4a3", "Tbx20",
"Irx6", "AW551984", "Pcdh19", "Adcyap1r1", "Baiap3",
"Chl1", "Grin3a", "Igf1", "Gria1", "Grin2d",
"Chrna6", "Chrna3", "Htr5a", "Htr2a", "Htr7", "Irx4",
"Plxnc1", "Sema6d", "Sema4f", "Sema4a", "Sema6b", "Lrrc4b",
"Lrrc58", "Lrrc3b", "Wnt4", "Wnt9b", "Ctxn3", "Tenm1", "Gna14",
"Rgs4", "Rgs6", "Rgs5", "Pou4f2", "Chrnb3", "Bcan")
sr_p08_het_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp",
color_low = colors[["het_retina"]], color_high = colors[["het_scn"]],
label_column = "mgi_symbol", label = interesting_genes, alpha = 1.0,
size = 4)
## Error in `plot_volcano_condition_de()`:
## ! Column: deseq_logfc is not in the table.
## Warning in pp(file = "20location_ma_volcano/sr_p08_het_volcano.pdf", width = 9,
## : The directory: 20location_ma_volcano does not exist, will attempt to create
## it.
## Error:
## ! object 'sr_p08_het_volcano' not found
## Error:
## ! object 'sr_p08_het_volcano' not found
sr_p08_het_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = colors[["het_retina"]], color_high = colors[["het_scn"]],
p_col = "deseq_adjp", label_column = "mgi_symbol", label = interesting_genes, outline = outline)
## The column: mgi_symbol is not in the data, using rownames.
## Warning in max(newdf[["avg"]]): no non-missing arguments to max; returning -Inf
## Warning in plot_ma_condition_de(table, table_name, expr_col = "deseq_basemean",
## : NAs introduced by coercion
## Error in `[[<-.data.frame`:
## ! replacement has 1 row, data has 0
## Error:
## ! object 'sr_p08_het_ma' not found
## Error:
## ! object 'sr_p08_het_ma' not found
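The errors above suggest that `table` is either NULL or lacks the DESeq2 columns, presumably because the corresponding entry in `location_tables` was never filled; I have not confirmed which. A small check before plotting would make that explicit. This is a sketch using only base R, and `missing_columns()` is a hypothetical helper, not something provided by the plotting functions used here.

```r
## Hypothetical helper: which of the wanted columns are absent?
## A NULL table counts as missing every column.
missing_columns <- function(df, wanted) {
  if (is.null(df)) {
    return(wanted)
  }
  setdiff(wanted, colnames(df))
}

needed <- c("deseq_basemean", "deseq_logfc", "deseq_adjp")
## Toy table which has only one of the three columns.
toy <- data.frame(deseq_basemean = c(10, 20))
absent <- missing_columns(toy, needed)
absent_null <- missing_columns(NULL, needed)
print(absent)
```

If `absent` is non-empty, it is probably better to skip the volcano/MA plots for that contrast with a message than to let the later `pp()` and print calls fail one after another.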
table_input <- location_tables[["sr_p08_ko"]]
table_name <- "sr_p08_ko"
table <- table_input[["data"]][[table_name]]
sr_p08_ko_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp",
color_low = colors[["ko_retina"]], color_high = colors[["ko_scn"]],
label_column = "mgi_symbol", label = interesting_genes, alpha = 1.0,
size = 4)
## Error in `plot_volcano_condition_de()`:
## ! Column: deseq_logfc is not in the table.
pp(file = "20location_ma_volcano/sr_p08_ko_volcano.pdf", width = 9, height = 9)
sr_p08_ko_volcano[["plot"]]
## Error:
## ! object 'sr_p08_ko_volcano' not found
## Error:
## ! object 'sr_p08_ko_volcano' not found
sr_p08_ko_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = colors[["ko_retina"]], color_high = colors[["ko_scn"]],
p_col = "deseq_adjp", label_column = "mgi_symbol", label = interesting_genes, outline = outline)
## The column: mgi_symbol is not in the data, using rownames.
## Warning in max(newdf[["avg"]]): no non-missing arguments to max; returning -Inf
## Warning in plot_ma_condition_de(table, table_name, expr_col = "deseq_basemean",
## : NAs introduced by coercion
## Error in `[[<-.data.frame`:
## ! replacement has 1 row, data has 0
## Error:
## ! object 'sr_p08_ko_ma' not found
## Error:
## ! object 'sr_p08_ko_ma' not found
table_input <- location_tables[["sr_p15_het"]]
table_name <- "sr_p15_het"
table <- table_input[["data"]][[table_name]]
interesting_genes <- c("Opn4", "Eomes", "Trpc7", "Oprm1", "Nr4a3", "Tbx20",
"Irx6", "AW551984", "Pcdh19", "Adcyap1r1", "Baiap3",
"Chl1", "Grin3a", "Igf1", "Gria1", "Grin2d", "Grin3a",
"Chrna6", "Chrna3", "Htr5a", "Htr2a", "Htr7", "Irx4",
"PlxnC1", "Sema6d", "Sema4f", "Sema4a", "Sema6b", "Lrrc4b",
"Lrrc58", "Lrrc3b", "Wnt4", "Wnt9b", "Ctxn3", "Tenm1", "Gna14",
"Rgs4", "Rgs6", "Rgs5", "Pou4f2", "Chrnb3", "Bcan")
sr_p15_het_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp",
color_low = colors[["het_retina"]], color_high = colors[["het_scn"]],
label_column = "mgi_symbol", label = interesting_genes, alpha = 1.0,
size = 4)
## Error in `plot_volcano_condition_de()`:
## ! Column: deseq_logfc is not in the table.
pp(file = "20location_ma_volcano/sr_p15_het_volcano.pdf", width = 9, height = 9)
sr_p15_het_volcano[["plot"]]
## Error:
## ! object 'sr_p15_het_volcano' not found
## Error:
## ! object 'sr_p15_het_volcano' not found
sr_p15_het_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = colors[["het_retina"]], color_high = colors[["het_scn"]],
p_col = "deseq_adjp", label_column = "mgi_symbol", label = interesting_genes, outline = outline)
## The column: mgi_symbol is not in the data, using rownames.
## Warning in max(newdf[["avg"]]): no non-missing arguments to max; returning -Inf
## Warning in plot_ma_condition_de(table, table_name, expr_col = "deseq_basemean",
## : NAs introduced by coercion
## Error in `[[<-.data.frame`:
## ! replacement has 1 row, data has 0
## Error:
## ! object 'sr_p15_het_ma' not found
## Error:
## ! object 'sr_p15_het_ma' not found
table_input <- location_tables[["sr_p15_ko"]]
table_name <- "sr_p15_ko"
table <- table_input[["data"]][[table_name]]
sr_p15_ko_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp",
color_low = colors[["ko_retina"]], color_high = colors[["ko_scn"]],
label_column = "mgi_symbol", label = interesting_genes, alpha = 1.0,
size = 4, min.segment.length = 0, point.padding = 0.2)
## Error in `plot_volcano_condition_de()`:
## ! Column: deseq_logfc is not in the table.
pp(file = "20location_ma_volcano/sr_p15_ko_volcano.pdf", width = 12, height = 12)
sr_p15_ko_volcano[["plot"]]
## Error:
## ! object 'sr_p15_ko_volcano' not found
## Error:
## ! object 'sr_p15_ko_volcano' not found
sr_p15_ko_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = colors[["ko_retina"]], color_high = colors[["ko_scn"]],
p_col = "deseq_adjp", label_column = "mgi_symbol", label = interesting_genes, outline = outline)
## The column: mgi_symbol is not in the data, using rownames.
## Warning in max(newdf[["avg"]]): no non-missing arguments to max; returning -Inf
## Warning in plot_ma_condition_de(table, table_name, expr_col = "deseq_basemean",
## : NAs introduced by coercion
## Error in `[[<-.data.frame`:
## ! replacement has 1 row, data has 0
## Error:
## ! object 'sr_p15_ko_ma' not found
## Error:
## ! object 'sr_p15_ko_ma' not found
Let us see whether any Ensembl gene IDs and/or MGI symbols are shared between the up and down sheets of the worksheet location_sr_p08_ko_including_wt_0.1_decreased_sig.
test_table_up <- location_sig[["sr_p08_ko"]][["deseq"]][["ups"]][[1]]
test_table_down <- location_sig[["sr_p08_ko"]][["deseq"]][["downs"]][[1]]
query <- list("up" = rownames(test_table_up),
"down" = rownames(test_table_down))
query_upset <- UpSetR::fromList(query)
UpSetR::upset(query_upset)
## Error in `start_col:end_col`:
## ! argument of length 0
query <- list("up" = test_table_up[["mgi_symbol"]],
"down" = test_table_down[["mgi_symbol"]])
query_upset <- UpSetR::fromList(query)
UpSetR::upset(query_upset)
## Error in `start_col:end_col`:
## ! argument of length 0
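The "argument of length 0" error is what I would expect if the up and down tables were empty or missing, though I have not verified that this is the cause here. The question itself (are any IDs shared?) does not need a plot; a guarded `intersect()` answers it and fails gently on empty input. This is a base R sketch, and `shared_between()` is a hypothetical helper name.

```r
## Hypothetical helper: IDs found in both the up and down sets.
## NULL, NA, and duplicated entries are dropped before comparing.
shared_between <- function(up, down) {
  up <- unique(up[!is.na(up)])
  down <- unique(down[!is.na(down)])
  if (length(up) == 0 || length(down) == 0) {
    message("At least one set is empty; nothing to compare.")
    return(character(0))
  }
  intersect(up, down)
}

## Toy IDs, not taken from the real tables.
shared_toy <- shared_between(c("geneA", "geneB"), c("geneB", "geneC"))
shared_empty <- shared_between(NULL, c("geneA"))
print(shared_toy)
```

Only when both sets are non-empty does it make sense to hand them to `UpSetR::fromList()` and `UpSetR::upset()`.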
location_strict_tables <- list()
location_strict_sig <- list()
location_strict_gp <- list()
location_strict_cp <- list()
for (k in seq_along(location_keepers)) {
name <- names(location_keepers)[k]
message("Examining ", name)
keeper <- location_keepers[name]
## The inclusion set is stored under "inc_<name>" and its annotations under "df_<name>".
include_name <- paste0("inc_", name)
include_df_name <- paste0("df_", name)
include_df <- location_inclusions_strict[[include_df_name]]
includes <- location_inclusions_strict[[include_name]]
found_includes <- rownames(location_sig_full[["deseq"]][["ups"]][[name]]) %in% includes
summary(found_includes)
if (sum(found_includes) == 0) {
next
}
include_filename <- glue("21location_strict_contrasts/{name}_including_wt_{lfc_cutoff}_decreased_table-v{ver}.xlsx")
include_sig_filename <- glue("21location_strict_contrasts/{name}_including_wt_{lfc_cutoff}_decreased_sig-v{ver}.xlsx")
location_strict_tables[[name]] <- combine_de_tables(
location_de, extra_annot = include_df,
keepers = keeper, label_column = label_column,
excel = include_filename, wanted_genes = includes)
print(location_strict_tables[[name]])
location_strict_sig[[name]] <- extract_significant_genes(
location_strict_tables[[name]], according_to = "deseq",
excel = include_sig_filename)
print(location_strict_sig[[name]])
num_rows <- nrow(location_strict_sig[[name]][["deseq"]][["ups"]][[name]]) +
nrow(location_strict_sig[[name]][["deseq"]][["downs"]][[name]])
message("There are ", num_rows, " significant up and down genes.")
if (num_rows > 10) {
location_strict_gp[[name]] <- all_gprofiler(location_strict_sig[[name]], species = "mmusculus")
location_strict_cp[[name]] <- all_cprofiler(
location_strict_sig[[name]], location_strict_tables[[name]],
orgdb = "org.Mm.eg.db", go_level = go_level, orgdb_from = orgdb_from,
max_groupsize = max_groupsize, organism = "mouse")
cp_written <- write_all_cp(location_strict_cp[[name]], prefix = "22", suffix = "strict")
gp_written <- write_all_gp(location_strict_gp[[name]], prefix = "22", suffix = "strict")
}
}
## Examining dr_p08_het
## Looking for subscript invalid names, start of extract_keepers.
## Looking for subscript invalid names, end of extract_keepers.
## The result table is too small for meaningful comparisons.
## The first table has only: 22.
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 p08_het_dlgn_vs_p08_het_retina 7 0 8
## edger_sigdown limma_sigup limma_sigdown
## 1 0 5 0
## Only dr_p08_het_up has information, cannot create an UpSet.
## Plot describing unique/shared genes in a differential expression table.
## NULL
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## dr_p08_het 7 0
## There are 7 significant up and down genes.
## Examining dr_p15_het
## Looking for subscript invalid names, start of extract_keepers.
## Looking for subscript invalid names, end of extract_keepers.
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 p15_het_dlgn_vs_p15_het_retina 65 5 65
## edger_sigdown limma_sigup limma_sigdown
## 1 5 57 4
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## dr_p15_het 65 5
## There are 70 significant up and down genes.
## Error : unable to find an inherited method for function 'metadata' for signature 'x = "NULL"'
## Error : unable to find an inherited method for function 'metadata' for signature 'x = "NULL"'
## Error in `simple_cl[["kegg_universe"]]`:
## ! subscript out of bounds
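The "There are significant up and down genes." messages with no number in them, seen elsewhere in this document, are consistent with `nrow()` being called on a table that does not exist: `nrow(NULL)` is NULL, and adding NULL to anything gives a zero-length result which `message()` prints as nothing. A minimal sketch of a NULL-safe row count follows; `safe_nrow()` is a hypothetical helper.

```r
## Hypothetical helper: treat a missing (NULL) table as having 0 rows.
safe_nrow <- function(df) {
  if (is.null(df)) {
    return(0L)
  }
  nrow(df)
}

## The failure mode: the sum silently becomes zero-length.
broken_count <- nrow(NULL) + nrow(NULL)

## The guarded version always gives a single number.
toy_ups <- data.frame(logfc = c(1.2, 2.5, 3.1))
fixed_count <- safe_nrow(toy_ups) + safe_nrow(NULL)
message("There are ", fixed_count, " significant up and down genes.")
```

A zero-length count is also a problem for `if (num_rows > 10)`, which stops with an error when the condition has length zero, so the guard helps in both places.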
Print out all the plots in a separate block.
for (k in seq_along(location_keepers)) {
name <- names(location_keepers)[k]
message("Examining ", name)
keeper <- location_keepers[name]
## The inclusion set is stored under "inc_<name>" and its annotations under "df_<name>".
include_name <- paste0("inc_", name)
include_df_name <- paste0("df_", name)
include_df <- location_inclusions_strict[[include_df_name]]
includes <- location_inclusions_strict[[include_name]]
summary(rownames(location_sig_full[["deseq"]][["ups"]][[name]]) %in% includes)
num_rows <- nrow(location_strict_sig[[name]][["deseq"]][["ups"]][[name]]) +
nrow(location_strict_sig[[name]][["deseq"]][["downs"]][[name]])
message("There are ", num_rows, " significant up and down genes.")
num_objects <- length(location_strict_cp[[name]])
if (num_objects == 0) {
warning("Something went wrong in all_cprofiler.")
} else {
upp <- which(grepl(x = names(location_strict_cp[[name]]), pattern = "_up$"))
downp <- which(grepl(x = names(location_strict_cp[[name]]), pattern = "_down$"))
if (length(upp) > 0) {
mf_sig <- location_strict_cp[[name]][[upp]][["go_data"]][["MF_enrich"]]
cc_sig <- location_strict_cp[[name]][[upp]][["go_data"]][["CC_enrich"]]
bp_sig <- location_strict_cp[[name]][[upp]][["go_data"]][["BP_enrich"]]
mf_plots_up <- plot_enrichresult(mf_sig, showCategory = go_categories)
mf_tree_up_filename <- glue("23clusterProfiler_plots/{name}_up_mf_sig_tree.pdf")
pp(file = mf_tree_up_filename)
try(print(mf_plots_up[["tree"]]), silent = TRUE)
plotted <- dev.off()
mf_bar_up_filename <- glue("23clusterProfiler_plots/{name}_up_mf_sig_bar.pdf")
pp(file = mf_bar_up_filename)
try(print(mf_plots_up[["bar"]]), silent = TRUE)
plotted <- dev.off()
cc_plots_up <- plot_enrichresult(cc_sig, showCategory = go_categories)
cc_tree_up_filename <- glue("23clusterProfiler_plots/{name}_up_cc_sig_tree.pdf")
pp(file = cc_tree_up_filename)
try(print(cc_plots_up[["tree"]]), silent = TRUE)
plotted <- dev.off()
cc_bar_up_filename <- glue("23clusterProfiler_plots/{name}_up_cc_sig_bar.pdf")
pp(file = cc_bar_up_filename)
try(print(cc_plots_up[["bar"]]), silent = TRUE)
plotted <- dev.off()
bp_plots_up <- plot_enrichresult(bp_sig, showCategory = go_categories)
bp_tree_up_filename <- glue("23clusterProfiler_plots/{name}_up_bp_sig_tree.pdf")
pp(file = bp_tree_up_filename)
try(print(bp_plots_up[["tree"]]), silent = TRUE)
plotted <- dev.off()
bp_bar_up_filename <- glue("23clusterProfiler_plots/{name}_up_bp_sig_bar.pdf")
pp(file = bp_bar_up_filename)
try(print(bp_plots_up[["bar"]]), silent = TRUE)
plotted <- dev.off()
}
if (length(downp) > 0) {
mf_sig <- location_strict_cp[[name]][[downp]][["go_data"]][["MF_enrich"]]
cc_sig <- location_strict_cp[[name]][[downp]][["go_data"]][["CC_enrich"]]
bp_sig <- location_strict_cp[[name]][[downp]][["go_data"]][["BP_enrich"]]
mf_plots_down <- plot_enrichresult(mf_sig, showCategory = go_categories)
mf_tree_down_filename <- glue("23clusterProfiler_plots/{name}_down_mf_sig_tree.pdf")
pp(file = mf_tree_down_filename)
try(print(mf_plots_down[["tree"]]), silent = TRUE)
plotted <- dev.off()
mf_bar_down_filename <- glue("23clusterProfiler_plots/{name}_down_mf_sig_bar.pdf")
pp(file = mf_bar_down_filename)
try(print(mf_plots_down[["bar"]]), silent = TRUE)
plotted <- dev.off()
cc_plots_down <- plot_enrichresult(cc_sig, showCategory = go_categories)
cc_tree_down_filename <- glue("23clusterProfiler_plots/{name}_down_cc_sig_tree.pdf")
pp(file = cc_tree_down_filename)
try(print(cc_plots_down[["tree"]]), silent = TRUE)
plotted <- dev.off()
cc_bar_down_filename <- glue("23clusterProfiler_plots/{name}_down_cc_sig_bar.pdf")
pp(file = cc_bar_down_filename)
try(print(cc_plots_down[["bar"]]), silent = TRUE)
plotted <- dev.off()
bp_plots_down <- plot_enrichresult(bp_sig, showCategory = go_categories)
bp_tree_down_filename <- glue("23clusterProfiler_plots/{name}_down_bp_sig_tree.pdf")
pp(file = bp_tree_down_filename)
try(print(bp_plots_down[["tree"]]), silent = TRUE)
plotted <- dev.off()
bp_bar_down_filename <- glue("23clusterProfiler_plots/{name}_down_bp_sig_bar.pdf")
pp(file = bp_bar_down_filename)
try(print(bp_plots_down[["bar"]]), silent = TRUE)
plotted <- dev.off()
}
}
}
## Examining dr_p08_het
## There are 7 significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining dr_p15_het
## There are 70 significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining dr_p08_ko
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining dr_p15_ko
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining sr_p08_het
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining sr_p15_het
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining sr_p08_ko
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining sr_p15_ko
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining ds_p08_het
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining ds_p15_het
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining ds_p08_ko
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
## Examining ds_p15_ko
## There are significant up and down genes.
## Warning: Something went wrong in all_cprofiler.
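The plotting loop above repeats the same open/print/close triplet twelve times per contrast, differing only in ontology (mf/cc/bp), plot type (tree/bar), and direction. One way to shorten it would be to generate the filenames from a small grid and loop over the rows. The sketch below only builds the filenames, with base R; `enrich_plot_files()` is a hypothetical helper and the directory name is copied from the loop above.

```r
## Hypothetical helper: one row per ontology/plot-type combination,
## with the output filename in the same pattern used by the loop.
enrich_plot_files <- function(name, direction, outdir) {
  combos <- expand.grid(
    type = c("tree", "bar"), ontology = c("mf", "cc", "bp"),
    stringsAsFactors = FALSE)
  combos[["file"]] <- file.path(
    outdir,
    paste0(name, "_", direction, "_", combos[["ontology"]],
           "_sig_", combos[["type"]], ".pdf"))
  combos
}

files <- enrich_plot_files("dr_p15_het", "up", "23clusterProfiler_plots")
print(files)
```

Each row could then drive one `pp()` / `try(print(...))` / `dev.off()` triplet, with the plot picked out of a list keyed by ontology and type; I have left that part out because it depends on the plot objects.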
Colenso sent a specific query of interest: a comparison of SCN vs. retina at p08 in the heterozygotes, along with a set of genes of particular interest. Perhaps I can use some of these as markers for quality control in future work?
Here are the genes:
Opn4, Eomes, Trpc7, Oprm1, Nr4a3, Tbx20, Irx6, AW551984, Pcdh19, Adcyap1, Baiap3, Chl1, Grin3a, Igf1, Gria1, Grin2d, Grin3a, Chrna6, Chrna3, Htr5a, Htr2a, Htr7, Irx4, PlxnC1, Sema6d, Sema4f, Sema4a, Sema6b, Lrrc4b, Lrrc58, Lrrc3b, Wnt4, Wnt9b, Ctxn3, Tenm1, Gna14, Rgs4, Rgs6, Rgs5
table_input <- location_strict_tables[["sr_p08_het"]]
table_name <- "sr_p08_het"
table <- table_input[["data"]][[table_name]]
interesting_genes <- c("Opn4", "Eomes", "Trpc7", "Oprm1", "Nr4a3", "Tbx20",
"Irx6", "AW551984", "Pcdh19", "Adcyap1r1", "Baiap3",
"Chl1", "Grin3a", "Igf1", "Gria1", "Grin2d", "Grin3a",
"Chrna6", "Chrna3", "Htr5a", "Htr2a", "Htr7", "Irx4",
"PlxnC1", "Sema6d", "Sema4f", "Sema4a", "Sema6b", "Lrrc4b",
"Lrrc58", "Lrrc3b", "Wnt4", "Wnt9b", "Ctxn3", "Tenm1", "Gna14",
"Rgs4", "Rgs6", "Rgs5", "Pou4f2", "Chrnb3", "Bcan")
sr_p08_het_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp",
color_low = colors[["het_retina"]], color_high = colors[["het_scn"]],
label_column = "mgi_symbol", label = interesting_genes, alpha = 1.0,
size = 4)
## Error in `plot_volcano_condition_de()`:
## ! Column: deseq_logfc is not in the table.
## Warning in pp(file = "24location_ma_volcano_strict/sr_p08_het_volcano.pdf", :
## The directory: 24location_ma_volcano_strict does not exist, will attempt to
## create it.
## Error:
## ! object 'sr_p08_het_volcano' not found
## Error:
## ! object 'sr_p08_het_volcano' not found
sr_p08_het_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = colors[["het_retina"]], color_high = colors[["het_scn"]],
p_col = "deseq_adjp", label_column = "mgi_symbol", label = interesting_genes, outline = outline)
## The column: mgi_symbol is not in the data, using rownames.
## Warning in max(newdf[["avg"]]): no non-missing arguments to max; returning -Inf
## Warning in plot_ma_condition_de(table, table_name, expr_col = "deseq_basemean",
## : NAs introduced by coercion
## Error in `[[<-.data.frame`:
## ! replacement has 1 row, data has 0
pp(file = "24location_ma_volcano_strict/sr_p08_het_ma.pdf", width = 9, height = 9)
sr_p08_het_ma[["plot"]]
## Error:
## ! object 'sr_p08_het_ma' not found
## Error:
## ! object 'sr_p08_het_ma' not found
table_input <- location_strict_tables[["sr_p08_ko"]]
table_name <- "sr_p08_ko"
table <- table_input[["data"]][[table_name]]
sr_p08_ko_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp",
color_low = colors[["ko_retina"]], color_high = colors[["ko_scn"]],
label_column = "mgi_symbol", label = interesting_genes, alpha = 1.0,
size = 4)
## Error in `plot_volcano_condition_de()`:
## ! Column: deseq_logfc is not in the table.
pp(file = "24location_ma_volcano_strict/sr_p08_ko_volcano.pdf", width = 9, height = 9)
sr_p08_ko_volcano[["plot"]]
## Error:
## ! object 'sr_p08_ko_volcano' not found
## Error:
## ! object 'sr_p08_ko_volcano' not found
sr_p08_ko_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = colors[["ko_retina"]], color_high = colors[["ko_scn"]],
p_col = "deseq_adjp", label_column = "mgi_symbol", label = interesting_genes, outline = outline)
## The column: mgi_symbol is not in the data, using rownames.
## Warning in max(newdf[["avg"]]): no non-missing arguments to max; returning -Inf
## Warning in plot_ma_condition_de(table, table_name, expr_col = "deseq_basemean",
## : NAs introduced by coercion
## Error in `[[<-.data.frame`:
## ! replacement has 1 row, data has 0
pp(file = "24location_ma_volcano_strict/sr_p08_ko_ma.pdf", width = 9, height = 9)
sr_p08_ko_ma[["plot"]]
## Error:
## ! object 'sr_p08_ko_ma' not found
## Error:
## ! object 'sr_p08_ko_ma' not found
table_input <- location_strict_tables[["sr_p15_ko"]]
table_name <- "sr_p15_ko"
table <- table_input[["data"]][[table_name]]
sr_p15_ko_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp",
color_low = colors[["ko_retina"]], color_high = colors[["ko_scn"]],
label_column = "mgi_symbol", label = interesting_genes, alpha = 1.0,
size = 4, min.segment.length = 0, point.padding = 0.2)
## Error in `plot_volcano_condition_de()`:
## ! Column: deseq_logfc is not in the table.
pp(file = "24location_ma_volcano_strict/sr_p15_ko_volcano.pdf", width = 12, height = 12)
sr_p15_ko_volcano[["plot"]]
## Error:
## ! object 'sr_p15_ko_volcano' not found
## Error:
## ! object 'sr_p15_ko_volcano' not found
sr_p15_ko_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = colors[["ko_retina"]], color_high = colors[["ko_scn"]],
p_col = "deseq_adjp", label_column = "mgi_symbol", label = interesting_genes, outline = outline)
## The column: mgi_symbol is not in the data, using rownames.
## Warning in max(newdf[["avg"]]): no non-missing arguments to max; returning -Inf
## Warning in plot_ma_condition_de(table, table_name, expr_col = "deseq_basemean",
## : NAs introduced by coercion
## Error in `[[<-.data.frame`:
## ! replacement has 1 row, data has 0
pp(file = "24location_ma_volcano_strict/sr_p15_ko_ma.pdf", width = 9, height = 9)
sr_p15_ko_ma[["plot"]]
## Error:
## ! object 'sr_p15_ko_ma' not found
## Error:
## ! object 'sr_p15_ko_ma' not found
time_tables_full <- combine_de_tables(
time_de, keepers = time_keepers,
label_column = label_column,
excel = glue("25full_contrasts_time/full_tables-v{ver}.xlsx"))
## Looking for subscript invalid names, start of extract_keepers.
## Looking for subscript invalid names, end of extract_keepers.
time_sig_full <- extract_significant_genes(
time_tables_full, according_to = "deseq",
excel = glue("25full_contrasts_time/full_sig-v{ver}.xlsx"))
time_tables <- list()
time_sig <- list()
time_gp <- list()
time_cp <- list()
for (k in seq_along(time_keepers)) {
name <- names(time_keepers)[k]
message("Examining ", name)
keeper <- time_keepers[name]
## The inclusion set is stored under "inc_<name>" and its annotations under "df_<name>".
include_name <- paste0("inc_", name)
include_df_name <- paste0("df_", name)
include_df <- time_inclusions[[include_df_name]]
includes <- time_inclusions[[include_name]]
summary(rownames(time_sig_full[["deseq"]][["ups"]][[name]]) %in% includes)
include_filename <- glue("26time_contrasts/{name}_including_wt_{lfc_cutoff}_decreased_table-v{ver}.xlsx")
include_sig_filename <- glue("26time_contrasts/{name}_including_wt_{lfc_cutoff}_decreased_sig-v{ver}.xlsx")
time_tables[[name]] <- combine_de_tables(
time_de, extra_annot = include_df,
keepers = keeper, label_column = label_column,
excel = include_filename, wanted_genes = includes)
print(time_tables[[name]])
time_sig[[name]] <- extract_significant_genes(
time_tables[[name]], according_to = "deseq",
excel = include_sig_filename)
print(time_sig[[name]])
num_rows <- nrow(time_sig[[name]][["deseq"]][["ups"]][[name]]) +
nrow(time_sig[[name]][["deseq"]][["downs"]][[name]])
message("There are ", num_rows, " significant up and down genes.")
if (num_rows > 10) {
time_gp[[name]] <- all_gprofiler(time_sig[[name]], species = "mmusculus")
time_cp[[name]] <- all_cprofiler(
time_sig[[name]], time_tables[[name]], orgdb = "org.Mm.eg.db", organism = "mouse",
orgdb_from = orgdb_from, go_level = go_level, max_groupsize = max_groupsize)
cp_written <- write_all_cp(time_cp[[name]], prefix = "27")
gp_written <- write_all_gp(time_gp[[name]], prefix = "27")
}
}
## Examining t_het_dlgn
## Looking for subscript invalid names, start of extract_keepers.
## Looking for subscript invalid names, end of extract_keepers.
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 p15_het_dlgn_vs_p08_het_dlgn 397 14 431
## edger_sigdown limma_sigup limma_sigdown
## 1 14 359 13
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## t_het_dlgn 397 14
## There are 411 significant up and down genes.
## Error : unable to find an inherited method for function 'metadata' for signature 'x = "NULL"'
## Error : unable to find an inherited method for function 'metadata' for signature 'x = "NULL"'
## Error in `simple_cl[["kegg_universe"]]`:
## ! subscript out of bounds
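Because the error above is raised inside the `for` loop, the loop stops at the first contrast which fails, and the remaining contrasts never get tables at all. That is presumably why the later plotting blocks complain about missing columns for every contrast except the first. Wrapping the fragile step in `tryCatch()` lets the loop carry on. This is a sketch with a toy failing step standing in for the enrichment call; `try_step()` is a hypothetical helper.

```r
## Hypothetical helper: run a step, and on error report it and return NULL
## instead of aborting the enclosing loop.
try_step <- function(step_fun, label) {
  tryCatch(
    step_fun(),
    error = function(e) {
      message("Skipping ", label, ": ", conditionMessage(e))
      NULL
    })
}

results <- list()
for (name in c("a", "b", "c")) {
  ## Toy stand-in for the enrichment call: fails for "b" only.
  results[[name]] <- try_step(function() {
    if (name == "b") {
      stop("subscript out of bounds")
    }
    toupper(name)
  }, label = name)
}
print(names(results))
```

Assigning NULL into a list element leaves that element absent, so a later `length(results[[name]]) == 0` check, like the one in the plotting loops, still identifies the contrasts which failed.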
Send the plots separately.
for (k in seq_along(time_keepers)) {
name <- names(time_keepers)[k]
message("Examining ", name)
keeper <- time_keepers[name]
## The inclusion set is stored under "inc_<name>" and its annotations under "df_<name>".
include_name <- paste0("inc_", name)
include_df_name <- paste0("df_", name)
include_df <- time_inclusions[[include_df_name]]
includes <- time_inclusions[[include_name]]
num_rows <- nrow(time_sig[[name]][["deseq"]][["ups"]][[name]]) +
nrow(time_sig[[name]][["deseq"]][["downs"]][[name]])
message("There are ", num_rows, " significant up and down genes.")
num_objects <- length(time_cp[[name]])
if (num_objects == 0) {
warning("Something failed in all_cprofiler.")
} else {
upp <- which(grepl(x = names(time_cp[[name]]), pattern = "_up$"))
downp <- which(grepl(x = names(time_cp[[name]]), pattern = "_down$"))
if (length(upp) > 0) {
mf_sig <- time_cp[[name]][[upp]][["go_data"]][["MF_enrich"]]
cc_sig <- time_cp[[name]][[upp]][["go_data"]][["CC_enrich"]]
bp_sig <- time_cp[[name]][[upp]][["go_data"]][["BP_enrich"]]
mf_plots_up <- plot_enrichresult(mf_sig, showCategory = go_categories)
mf_tree_up_filename <- glue("28clusterProfiler_plots/{name}_up_mf_sig_tree.pdf")
pp(file = mf_tree_up_filename)
try(print(mf_plots_up[["tree"]]), silent = TRUE)
plotted <- dev.off()
mf_bar_up_filename <- glue("28clusterProfiler_plots/{name}_up_mf_sig_bar.pdf")
pp(file = mf_bar_up_filename)
try(print(mf_plots_up[["bar"]]), silent = TRUE)
plotted <- dev.off()
cc_plots_up <- plot_enrichresult(cc_sig, showCategory = go_categories)
cc_tree_up_filename <- glue("28clusterProfiler_plots/{name}_up_cc_sig_tree.pdf")
pp(file = cc_tree_up_filename)
try(print(cc_plots_up[["tree"]]), silent = TRUE)
plotted <- dev.off()
cc_bar_up_filename <- glue("28clusterProfiler_plots/{name}_up_cc_sig_bar.pdf")
pp(file = cc_bar_up_filename)
try(print(cc_plots_up[["bar"]]), silent = TRUE)
plotted <- dev.off()
bp_plots_up <- plot_enrichresult(bp_sig, showCategory = go_categories)
bp_tree_up_filename <- glue("28clusterProfiler_plots/{name}_up_bp_sig_tree.pdf")
pp(file = bp_tree_up_filename)
try(print(bp_plots_up[["tree"]]), silent = TRUE)
plotted <- dev.off()
bp_bar_up_filename <- glue("28clusterProfiler_plots/{name}_up_bp_sig_bar.pdf")
pp(file = bp_bar_up_filename)
try(print(bp_plots_up[["bar"]]), silent = TRUE)
plotted <- dev.off()
}
if (length(downp) > 0) {
mf_sig <- time_cp[[name]][[downp]][["go_data"]][["MF_enrich"]]
cc_sig <- time_cp[[name]][[downp]][["go_data"]][["CC_enrich"]]
bp_sig <- time_cp[[name]][[downp]][["go_data"]][["BP_enrich"]]
mf_plots_down <- plot_enrichresult(mf_sig, showCategory = go_categories)
mf_tree_down_filename <- glue("28clusterProfiler_plots/{name}_down_mf_sig_tree.pdf")
pp(file = mf_tree_down_filename)
try(print(mf_plots_down[["tree"]]), silent = TRUE)
plotted <- dev.off()
mf_bar_down_filename <- glue("28clusterProfiler_plots/{name}_down_mf_sig_bar.pdf")
pp(file = mf_bar_down_filename)
try(print(mf_plots_down[["bar"]]), silent = TRUE)
plotted <- dev.off()
cc_plots_down <- plot_enrichresult(cc_sig, showCategory = go_categories)
cc_tree_down_filename <- glue("28clusterProfiler_plots/{name}_down_cc_sig_tree.pdf")
pp(file = cc_tree_down_filename)
try(print(cc_plots_down[["tree"]]), silent = TRUE)
plotted <- dev.off()
cc_bar_down_filename <- glue("28clusterProfiler_plots/{name}_down_cc_sig_bar.pdf")
pp(file = cc_bar_down_filename)
try(print(cc_plots_down[["bar"]]), silent = TRUE)
plotted <- dev.off()
bp_plots_down <- plot_enrichresult(bp_sig, showCategory = go_categories)
bp_tree_down_filename <- glue("28clusterProfiler_plots/{name}_down_bp_sig_tree.pdf")
pp(file = bp_tree_down_filename)
try(print(bp_plots_down[["tree"]]), silent = TRUE)
plotted <- dev.off()
bp_bar_down_filename <- glue("28clusterProfiler_plots/{name}_down_bp_sig_bar.pdf")
pp(file = bp_bar_down_filename)
try(print(bp_plots_down[["bar"]]), silent = TRUE)
plotted <- dev.off()
}
}
}
## Examining t_het_dlgn
## There are 411 significant up and down genes.
## Warning: Something failed in all_cprofiler.
## Examining t_ko_dlgn
## There are significant up and down genes.
## Warning: Something failed in all_cprofiler.
## Examining t_het_retina
## There are significant up and down genes.
## Warning: Something failed in all_cprofiler.
## Examining t_ko_retina
## There are significant up and down genes.
## Warning: Something failed in all_cprofiler.
## Examining t_het_scn
## There are significant up and down genes.
## Warning: Something failed in all_cprofiler.
## Examining t_ko_scn
## There are significant up and down genes.
## Warning: Something failed in all_cprofiler.
table_name <- "t_het_dlgn"
table_input <- time_tables[[table_name]]
table <- table_input[["data"]][[table_name]]
t_het_dlgn_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp",
color_low = time_colors[["p08_het_dlgn"]], color_high = time_colors[["p15_het_dlgn"]],
label_column = "mgi_symbol", label = interesting_genes, alpha = 1.0,
size = 4, min.segment.length = 0, point.padding = 0.2)
pp(file = "29time_ma_volcano/t_het_dlgn_volcano.pdf", width = 12, height = 12)
## Warning in pp(file = "29time_ma_volcano/t_het_dlgn_volcano.pdf", width = 12, :
## The directory: 29time_ma_volcano does not exist, will attempt to create it.
t_het_dlgn_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = time_colors[["p08_het_dlgn"]], color_high = time_colors[["p15_het_dlgn"]],
p_col = "deseq_adjp", label_column = "mgi_symbol", label = interesting_genes, outline = outline)
pp(file = "29time_ma_volcano/t_het_dlgn_ma.pdf", width = 9, height = 9)
t_het_dlgn_ma[["plot"]]
plotted <- dev.off()
t_het_dlgn_ma[["plot"]]
table_name <- "t_ko_dlgn"
table_input <- time_tables[[table_name]]
table <- table_input[["data"]][[table_name]]
t_ko_dlgn_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp",
color_low = time_colors[["p08_het_dlgn"]], color_high = time_colors[["p15_het_dlgn"]],
label_column = "mgi_symbol", label = interesting_genes, alpha = 1.0,
size = 4, min.segment.length = 0, point.padding = 0.2)
## Error in `plot_volcano_condition_de()`:
## ! Column: deseq_logfc is not in the table.
pp(file = "29time_ma_volcano/t_ko_dlgn_volcano.pdf", width = 12, height = 12)
t_ko_dlgn_volcano[["plot"]]
## Error:
## ! object 't_ko_dlgn_volcano' not found
## Error:
## ! object 't_ko_dlgn_volcano' not found
t_ko_dlgn_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = time_colors[["p08_het_dlgn"]], color_high = time_colors[["p15_het_dlgn"]],
p_col = "deseq_adjp", label_column = "mgi_symbol", label = interesting_genes, outline = outline)
## The column: mgi_symbol is not in the data, using rownames.
## Warning in max(newdf[["avg"]]): no non-missing arguments to max; returning -Inf
## Warning in plot_ma_condition_de(table, table_name, expr_col = "deseq_basemean",
## : NAs introduced by coercion
## Error in `[[<-.data.frame`:
## ! replacement has 1 row, data has 0
## Error:
## ! object 't_ko_dlgn_ma' not found
## Error:
## ! object 't_ko_dlgn_ma' not found
table_name <- "t_het_retina"
table_input <- time_tables[[table_name]]
table <- table_input[["data"]][[table_name]]
t_het_retina_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp",
color_low = time_colors[["p08_het_retina"]], color_high = time_colors[["p15_het_retina"]],
label_column = "mgi_symbol", label = interesting_genes, alpha = 1.0,
size = 4, min.segment.length = 0, point.padding = 0.2)
## Error in `plot_volcano_condition_de()`:
## ! Column: deseq_logfc is not in the table.
pp(file = "29time_ma_volcano/t_het_retina_volcano.pdf", width = 12, height = 12)
t_het_retina_volcano[["plot"]]
## Error:
## ! object 't_het_retina_volcano' not found
## Error:
## ! object 't_het_retina_volcano' not found
t_het_retina_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = time_colors[["p08_het_retina"]], color_high = time_colors[["p15_het_retina"]],
p_col = "deseq_adjp", label_column = "mgi_symbol", label = interesting_genes, outline = outline)
## The column: mgi_symbol is not in the data, using rownames.
## Warning in max(newdf[["avg"]]): no non-missing arguments to max; returning -Inf
## Warning in plot_ma_condition_de(table, table_name, expr_col = "deseq_basemean",
## : NAs introduced by coercion
## Error in `[[<-.data.frame`:
## ! replacement has 1 row, data has 0
## Error:
## ! object 't_het_retina_ma' not found
## Error:
## ! object 't_het_retina_ma' not found
table_name <- "t_ko_retina"
table_input <- time_tables[[table_name]]
table <- table_input[["data"]][[table_name]]
t_ko_retina_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp",
color_low = time_colors[["p08_het_dlgn"]], color_high = time_colors[["p15_het_dlgn"]],
label_column = "mgi_symbol", label = interesting_genes, alpha = 1.0,
size = 4, min.segment.length = 0, point.padding = 0.2)
## Error in `plot_volcano_condition_de()`:
## ! Column: deseq_logfc is not in the table.
pp(file = "29time_ma_volcano/t_ko_retina_volcano.pdf", width = 12, height = 12)
t_ko_retina_volcano[["plot"]]
## Error:
## ! object 't_ko_retina_volcano' not found
## Error:
## ! object 't_ko_retina_volcano' not found
t_ko_retina_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = time_colors[["p08_het_dlgn"]], color_high = time_colors[["p15_het_dlgn"]],
p_col = "deseq_adjp", label_column = "mgi_symbol", label = interesting_genes, outline = outline)
## The column: mgi_symbol is not in the data, using rownames.
## Warning in max(newdf[["avg"]]): no non-missing arguments to max; returning -Inf
## Warning in plot_ma_condition_de(table, table_name, expr_col = "deseq_basemean",
## : NAs introduced by coercion
## Error in `[[<-.data.frame`:
## ! replacement has 1 row, data has 0
## Error:
## ! object 't_ko_retina_ma' not found
## Error:
## ! object 't_ko_retina_ma' not found
table_name <- "t_het_scn"
table_input <- time_tables[[table_name]]
table <- table_input[["data"]][[table_name]]
t_het_scn_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp",
color_low = time_colors[["p08_het_dlgn"]], color_high = time_colors[["p15_het_dlgn"]],
label_column = "mgi_symbol", label = interesting_genes, alpha = 1.0,
size = 4, min.segment.length = 0, point.padding = 0.2)
## Error in `plot_volcano_condition_de()`:
## ! Column: deseq_logfc is not in the table.
pp(file = "29time_ma_volcano/t_het_scn_volcano.pdf", width = 12, height = 12)
t_het_scn_volcano[["plot"]]
## Error:
## ! object 't_het_scn_volcano' not found
## Error:
## ! object 't_het_scn_volcano' not found
t_het_scn_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = time_colors[["p08_het_dlgn"]], color_high = time_colors[["p15_het_dlgn"]],
p_col = "deseq_adjp", label_column = "mgi_symbol", label = interesting_genes, outline = outline)
## The column: mgi_symbol is not in the data, using rownames.
## Warning in max(newdf[["avg"]]): no non-missing arguments to max; returning -Inf
## Warning in plot_ma_condition_de(table, table_name, expr_col = "deseq_basemean",
## : NAs introduced by coercion
## Error in `[[<-.data.frame`:
## ! replacement has 1 row, data has 0
## Error:
## ! object 't_het_scn_ma' not found
## Error:
## ! object 't_het_scn_ma' not found
table_name <- "t_ko_scn"
table_input <- time_tables[[table_name]]
table <- table_input[["data"]][[table_name]]
t_ko_scn_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp",
color_low = time_colors[["p08_het_dlgn"]], color_high = time_colors[["p15_het_dlgn"]],
label_column = "mgi_symbol", label = interesting_genes, alpha = 1.0,
size = 4, min.segment.length = 0, point.padding = 0.2)
## Error in `plot_volcano_condition_de()`:
## ! Column: deseq_logfc is not in the table.
pp(file = "29time_ma_volcano/t_ko_scn_volcano.pdf", width = 12, height = 12)
t_ko_scn_volcano[["plot"]]
## Error:
## ! object 't_ko_scn_volcano' not found
## Error:
## ! object 't_ko_scn_volcano' not found
t_ko_scn_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
color_low = time_colors[["p08_het_dlgn"]], color_high = time_colors[["p15_het_dlgn"]],
p_col = "deseq_adjp", label_column = "mgi_symbol", label = interesting_genes, outline = outline)
## The column: mgi_symbol is not in the data, using rownames.
## Warning in max(newdf[["avg"]]): no non-missing arguments to max; returning -Inf
## Warning in plot_ma_condition_de(table, table_name, expr_col = "deseq_basemean",
## : NAs introduced by coercion
## Error in `[[<-.data.frame`:
## ! replacement has 1 row, data has 0
## Error:
## ! object 't_ko_scn_ma' not found
## Error:
## ! object 't_ko_scn_ma' not found
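Every table after t_het_dlgn fails with "Column: deseq_logfc is not in the table", which suggests (this is a guess from the messages, not something the output confirms) that the extracted table is either NULL or lacks the DESeq2 columns. A minimal sketch of a pre-flight check follows; `missing_columns()` is a hypothetical helper, not an hpgltools function, and the toy data frames only stand in for `table_input[["data"]][[table_name]]`.

```r
## Hypothetical helper (not part of hpgltools): report which required
## columns are absent from a table so a plot call can be skipped cleanly.
missing_columns <- function(df, required) {
  if (is.null(df)) {
    return(required)
  }
  setdiff(required, colnames(df))
}

required <- c("deseq_logfc", "deseq_adjp", "deseq_basemean")
## Toy tables standing in for table_input[["data"]][[table_name]].
toy_ok <- data.frame(deseq_logfc = c(1.2, -0.4), deseq_adjp = c(0.01, 0.6),
                     deseq_basemean = c(150, 32))
toy_bad <- data.frame(limma_logfc = c(1.1, -0.3))

absent_ok <- missing_columns(toy_ok, required)
absent_bad <- missing_columns(toy_bad, required)
absent_null <- missing_columns(NULL, required)
if (length(absent_bad) > 0) {
  message("Skipping plots; missing: ", toString(absent_bad))
}
```

Running such a check before each volcano/MA call would replace the cascade of "object not found" errors with a single message naming the absent columns.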
time_strict_tables <- list()
time_strict_sig <- list()
time_strict_gp <- list()
time_strict_cp <- list()
time_strict_en <- list()
for (k in seq_along(time_keepers)) {
name <- names(time_keepers)[k]
message("Examining ", name)
keeper <- time_keepers[name]
include_name <- paste0("inc_", name)
include_df_name <- paste0("df_", name)
include_df <- time_inclusions_strict[[include_df_name]]
includes <- time_inclusions_strict[[include_name]]
found_includes <- rownames(time_sig_full[["deseq"]][["ups"]][[name]]) %in% includes
summary(found_includes)
if (sum(found_includes) == 0) {
next
}
include_filename <- glue("30time_strict_contrasts_excel/{name}_including_wt_{lfc_cutoff}_decreased_table-v{ver}.xlsx")
include_sig_filename <- glue("30time_strict_contrasts_excel/{name}_including_wt_{lfc_cutoff}_decreased_sig-v{ver}.xlsx")
time_strict_tables[[name]] <- combine_de_tables(
time_de, extra_annot = include_df,
keepers = keeper, label_column = label_column,
excel = include_filename, wanted_genes = includes)
print(time_strict_tables[[name]])
time_strict_sig[[name]] <- extract_significant_genes(
time_strict_tables[[name]], according_to = "deseq",
excel = include_sig_filename)
print(time_strict_sig[[name]])
num_rows <- nrow(time_strict_sig[[name]][["deseq"]][["ups"]][[name]]) +
nrow(time_strict_sig[[name]][["deseq"]][["downs"]][[name]])
message("There are ", num_rows, " significant up and down genes.")
if (num_rows >= 10) {
message("Performing gprofiler/clusterProfiler.")
time_strict_gp[[name]] <- all_gprofiler(time_strict_sig[[name]], species = "mmusculus")
time_strict_cp[[name]] <- all_cprofiler(
time_strict_sig[[name]], time_strict_tables[[name]],
orgdb = "org.Mm.eg.db", go_level = go_level, orgdb_from = orgdb_from,
max_groupsize = max_groupsize, organism = "mouse")
#if (!is.null(get0("m2_gsc"))) {
# time_strict_en[[name]] <- all_enricher(time_strict_sig[[name]], gsc = m2_gsc,
# orgdb = "org.Mm.eg.db", from = "ENSEMBL", to = "SYMBOL")
#}
gp_written <- write_all_gp(time_strict_gp[[name]], prefix = "31", suffix = "strict")
cp_written <- write_all_cp(time_strict_cp[[name]], prefix = "31", suffix = "strict")
#en_written <- write_all_en(time_strict_en[[name]])
} else {
warning("There are fewer than 10 genes up and down in the ", name, " comparison.")
message("There are fewer than 10 genes up and down in the ", name, " comparison.")
}
}
## Examining t_het_dlgn
## Looking for subscript invalid names, start of extract_keepers.
## Looking for subscript invalid names, end of extract_keepers.
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup
## 1 p15_het_dlgn_vs_p08_het_dlgn 19 2 20
## edger_sigdown limma_sigup limma_sigdown
## 1 2 12 3
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## t_het_dlgn 19 2
## There are 21 significant up and down genes.
## Performing gprofiler/clusterProfiler.
## Error : unable to find an inherited method for function 'metadata' for signature 'x = "NULL"'
## Error : unable to find an inherited method for function 'metadata' for signature 'x = "NULL"'
## Error in `simple_cl[["kegg_universe"]]`:
## ! subscript out of bounds
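The output above shows only the first contrast, so the loop appears to stop at the `kegg_universe` error rather than moving on to the remaining contrasts. A sketch of one way to keep going is to wrap the enrichment step in `tryCatch()`; `run_enrichment()` below is a hypothetical stand-in for the `all_gprofiler()`/`all_cprofiler()` calls and fails deliberately for one name.

```r
## Hypothetical stand-in for the enrichment step; it fails for one name
## to imitate the 'subscript out of bounds' error above.
run_enrichment <- function(name) {
  if (name == "t_het_dlgn") {
    stop("subscript out of bounds")
  }
  paste0("enrichment for ", name)
}

contrast_names <- c("t_het_dlgn", "t_ko_dlgn", "t_het_retina")
results <- list()
failures <- character(0)
for (name in contrast_names) {
  ## Convert an error into NULL so the loop can record it and continue.
  outcome <- tryCatch(
    run_enrichment(name),
    error = function(e) {
      message("Enrichment failed for ", name, ": ", conditionMessage(e))
      NULL
    })
  if (is.null(outcome)) {
    failures <- c(failures, name)
    next
  }
  results[[name]] <- outcome
}
```

Collecting the failed names in `failures` keeps a record of which contrasts still need attention once the underlying error is fixed.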
Send the plots separately.
for (k in seq_along(time_keepers)) {
name <- names(time_keepers)[k]
message("Examining ", name)
keeper <- time_keepers[name]
includes <- time_inclusions[[name]]
include_name <- paste0("inc_", name)
include_df_name <- paste0("df_", name)
include_df <- time_inclusions[[include_df_name]]
includes <- time_inclusions[[include_name]]
num_rows <- nrow(time_sig[[name]][["deseq"]][["ups"]][[name]]) +
nrow(time_sig[[name]][["deseq"]][["downs"]][[name]])
message("There are ", num_rows, " significant up and down genes.")
num_objects <- length(time_cp[[name]])
if (num_objects == 0) {
warning("Something failed in all_cprofiler.")
} else {
upp <- which(grepl(x = names(time_cp[[name]]), pattern = "_up$"))
downp <- which(grepl(x = names(time_cp[[name]]), pattern = "_down$"))
if (length(upp) > 0) {
mf_sig <- time_cp[[name]][[upp]][["go_data"]][["MF_enrich"]]
cc_sig <- time_cp[[name]][[upp]][["go_data"]][["CC_enrich"]]
bp_sig <- time_cp[[name]][[upp]][["go_data"]][["BP_enrich"]]
mf_plots_up <- plot_enrichresult(mf_sig, showCategory = go_categories)
mf_tree_up_filename <- glue("32cp_trees/{name}_up_mf_sig_tree.pdf")
pp(file = mf_tree_up_filename)
try(print(mf_plots_up[["tree"]]), silent = TRUE)
plotted <- dev.off()
mf_bar_up_filename <- glue("32cp_bar/{name}_up_mf_sig_bar.pdf")
pp(file = mf_bar_up_filename)
try(print(mf_plots_up[["bar"]]), silent = TRUE)
plotted <- dev.off()
cc_plots_up <- plot_enrichresult(cc_sig, showCategory = go_categories)
cc_tree_up_filename <- glue("32cp_trees/{name}_up_cc_sig_tree.pdf")
pp(file = cc_tree_up_filename)
try(print(cc_plots_up[["tree"]]), silent = TRUE)
plotted <- dev.off()
cc_bar_up_filename <- glue("32cp_bar/{name}_up_cc_sig_bar.pdf")
pp(file = cc_bar_up_filename)
try(print(cc_plots_up[["bar"]]), silent = TRUE)
plotted <- dev.off()
bp_plots_up <- plot_enrichresult(bp_sig, showCategory = go_categories)
bp_tree_up_filename <- glue("32cp_trees/{name}_up_bp_sig_tree.pdf")
pp(file = bp_tree_up_filename)
try(print(bp_plots_up[["tree"]]), silent = TRUE)
plotted <- dev.off()
bp_plots_up <- plot_enrichresult(bp_sig, showCategory = go_categories)
bp_bar_up_filename <- glue("32cp_bar/{name}_up_bp_sig_bar.pdf")
pp(file = bp_bar_up_filename)
try(print(bp_plots_up[["bar"]]), silent = TRUE)
plotted <- dev.off()
}
if (length(downp) > 0) {
mf_sig <- time_cp[[name]][[downp]][["go_data"]][["MF_enrich"]]
cc_sig <- time_cp[[name]][[downp]][["go_data"]][["CC_enrich"]]
bp_sig <- time_cp[[name]][[downp]][["go_data"]][["BP_enrich"]]
mf_plots_down <- plot_enrichresult(mf_sig, showCategory = go_categories)
mf_tree_down_filename <- glue("32cp_trees/{name}_down_mf_sig_tree.pdf")
pp(file = mf_tree_down_filename)
try(print(mf_plots_down[["tree"]]), silent = TRUE)
plotted <- dev.off()
mf_bar_down_filename <- glue("32cp_bar/{name}_down_mf_sig_bar.pdf")
pp(file = mf_bar_down_filename)
try(print(mf_plots_down[["bar"]]), silent = TRUE)
plotted <- dev.off()
cc_plots_down <- plot_enrichresult(cc_sig, showCategory = go_categories)
cc_tree_down_filename <- glue("32cp_trees/{name}_down_cc_sig_tree.pdf")
pp(file = cc_tree_down_filename)
try(print(cc_plots_down[["tree"]]), silent = TRUE)
plotted <- dev.off()
cc_bar_down_filename <- glue("32cp_bar/{name}_down_cc_sig_bar.pdf")
pp(file = cc_bar_down_filename)
try(print(cc_plots_down[["bar"]]), silent = TRUE)
plotted <- dev.off()
bp_plots_down <- plot_enrichresult(bp_sig, showCategory = go_categories)
bp_tree_down_filename <- glue("32cp_trees/{name}_down_bp_sig_tree.pdf")
pp(file = bp_tree_down_filename)
try(print(bp_plots_down[["tree"]]), silent = TRUE)
plotted <- dev.off()
bp_plots_down <- plot_enrichresult(bp_sig, showCategory = go_categories)
bp_bar_down_filename <- glue("32cp_bar/{name}_down_bp_sig_bar.pdf")
pp(file = bp_bar_down_filename)
try(print(bp_plots_down[["bar"]]), silent = TRUE)
plotted <- dev.off()
}
}
}
## Examining t_het_dlgn
## There are 411 significant up and down genes.
## Warning: Something failed in all_cprofiler.
## Examining t_ko_dlgn
## There are significant up and down genes.
## Warning: Something failed in all_cprofiler.
## Examining t_het_retina
## There are significant up and down genes.
## Warning: Something failed in all_cprofiler.
## Examining t_ko_retina
## There are significant up and down genes.
## Warning: Something failed in all_cprofiler.
## Examining t_het_scn
## There are significant up and down genes.
## Warning: Something failed in all_cprofiler.
## Examining t_ko_scn
## There are significant up and down genes.
## Warning: Something failed in all_cprofiler.
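The plotting loop above repeats the same open/print/close steps for three ontologies, two directions and two plot types. As a sketch only, the twelve output files could be enumerated once and then iterated over; `make_plot_jobs()` is a hypothetical helper, and the directory and file name pattern are copied from the loop above.

```r
## Enumerate every ontology/direction/plot-type combination once, rather
## than writing each block by hand. Directory names follow the loop above.
make_plot_jobs <- function(name) {
  jobs <- expand.grid(
    ontology = c("mf", "cc", "bp"),
    direction = c("up", "down"),
    type = c("tree", "bar"),
    stringsAsFactors = FALSE)
  dirs <- c(tree = "32cp_trees", bar = "32cp_bar")
  jobs[["file"]] <- sprintf(
    "%s/%s_%s_%s_sig_%s.pdf",
    unname(dirs[jobs[["type"]]]), name, jobs[["direction"]],
    jobs[["ontology"]], jobs[["type"]])
  jobs
}

jobs <- make_plot_jobs("t_het_dlgn")
```

Each row of `jobs` could then drive one `pp()` / `print()` / `dev.off()` sequence inside a single inner loop, which would also remove the duplicated `plot_enrichresult()` calls for the BP ontology.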
In conversation with Colenso, he spoke about a series of contrasts which would be interesting to attempt in order to query the changes across both locations and genotypes and/or both locations and time, thus:
(p08_het_scn / p08_het_retina) / (p08_ko_scn / p08_ko_retina)
as an example. We can definitely do these, but they do not work for all methods employed (I think they work best with limma and edgeR).
Let's find out!
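Because the models work on a log2 scale, the ratio of ratios above becomes a difference of differences between model coefficients. The following sketch checks that arithmetic for a single gene using invented log2 group means; the coefficient names assume the "~ 0 + condition" design used throughout.

```r
## Invented log2-scale group means for one gene; real values would come
## from the fitted "~ 0 + condition" model coefficients.
log2_means <- c(
  conditionp08_het_scn = 7.0,
  conditionp08_het_retina = 5.0,
  conditionp08_ko_scn = 6.0,
  conditionp08_ko_retina = 5.5)

## Weights for (het_scn - het_retina) - (ko_scn - ko_retina).
weights <- c(
  conditionp08_het_scn = 1,
  conditionp08_het_retina = -1,
  conditionp08_ko_scn = -1,
  conditionp08_ko_retina = 1)

difference_of_differences <- sum(weights * log2_means[names(weights)])

## The same quantity written as a ratio of ratios on the linear scale.
linear <- 2^log2_means
ratio_of_ratios <-
  (linear[["conditionp08_het_scn"]] / linear[["conditionp08_het_retina"]]) /
  (linear[["conditionp08_ko_scn"]] / linear[["conditionp08_ko_retina"]])
```

Here `log2(ratio_of_ratios)` and `difference_of_differences` agree, which is why the extra contrast strings below are written as subtractions of condition coefficients.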
scn_extra <- glue("\\
p08het = (conditionp08_het_scn - conditionp08_het_retina), \\
p08ko = (conditionp08_ko_scn - conditionp08_ko_retina), \\
p08het_vs_p08ko = (conditionp08_het_scn - conditionp08_het_retina) - (conditionp08_ko_scn - conditionp08_ko_retina), \\
p15het = (conditionp15_het_scn - conditionp15_het_retina), \\
p15ko = (conditionp15_ko_scn - conditionp15_ko_retina), \\
p15het_vs_p15ko = (conditionp15_het_scn - conditionp15_het_retina) - (conditionp15_ko_scn - conditionp15_ko_retina)")
scn_translatome_de_keepers <- list(
"p08het" = c("p08_het_scn", "p08_het_retina"),
"p08ko" = c("p08_ko_scn", "p08_ko_retina"),
"p15het" = c("p15_het_scn", "p15_het_retina"),
"p15ko" = c("p15_ko_scn", "p15_ko_retina"))
scn_translatome_keepers <- list(
"p08het" = c("p08_het_scn", "p08_het_retina"),
"p08ko" = c("p08_ko_scn", "p08_ko_retina"),
"p08_scn_translatome" = c("p08het", "p08ko"),
"p15het" = c("p15_het_scn", "p15_het_retina"),
"p15ko" = c("p15_ko_scn", "p15_ko_retina"),
"p15_scn_translatome" = c("p15het", "p15ko"))
filt <- normalize(v3_pairwise_input, filter = TRUE)
## Removing 10162 low-count genes (15263 remaining).
limma_test <- limma_pairwise(filt,
keepers = scn_translatome_de_keepers,
model_fstring = "~ 0 + condition",
model_svs = FALSE, extra_contrasts = scn_extra)
## conditions
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
edger_test <- edger_pairwise(filt,
keepers = scn_translatome_de_keepers,
model_fstring = "~ 0 + condition",
model_svs = FALSE, extra_contrasts = scn_extra)
## conditions
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
scn_translatome_de <- all_pairwise(v3_pairwise_input, filter = TRUE,
keepers = scn_translatome_de_keepers,
model_svs = FALSE,
model_fstring = "~ 0 + condition",
do_basic = FALSE, do_dream = FALSE,
do_noiseq = FALSE, do_ebseq = FALSE,
extra_contrasts = scn_extra)
## Warning in all_pairwise(v3_pairwise_input, filter = TRUE, keepers =
## scn_translatome_de_keepers, : This will likely fail because of how the keepers
## and extra contrasts are evaluated.
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## Removing 10162 low-count genes (15263 remaining).
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The contrast p08het is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast p08ko is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast p08het is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast p15het is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast p15ko is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast p15het is not in the results.
## If this is not an extra contrast, then this is an error.
## conditions
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## conditions
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
scn_combined_test <- combine_de_tables(
scn_translatome_de, keepers = scn_translatome_keepers,
excel = glue("33translatome_xlsx/test_scn_translatome_unfiltered_nosva-v{ver}.xlsx"))
## Looking for subscript invalid names, start of extract_keepers.
## coefficient edger did not find conditionp08ko or conditionp08het.
## coefficient limma did not find p08ko or p08het.
## coefficient edger did not find conditionp15ko or conditionp15het.
## coefficient limma did not find p15ko or p15het.
## Looking for subscript invalid names, end of extract_keepers.
scn_translatome_de_sva <- all_pairwise(v3_pairwise_input, filter = TRUE,
keepers = scn_translatome_de_keepers,
model_svs = "svaseq",
model_fstring = "~ 0 + condition",
do_basic = FALSE, do_dream = FALSE,
do_noiseq = FALSE, do_ebseq = FALSE,
extra_contrasts = scn_extra)
## Warning in all_pairwise(v3_pairwise_input, filter = TRUE, keepers =
## scn_translatome_de_keepers, : This will likely fail because of how the keepers
## and extra contrasts are evaluated.
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## Removing 10162 low-count genes (15263 remaining).
## Error in `adjuster_svs()`:
## ! unused arguments (do_basic = FALSE, do_dream = FALSE, do_noiseq = FALSE, do_ebseq = FALSE)
scn_combined_test_sva <- combine_de_tables(
scn_translatome_de_sva, keepers = scn_translatome_keepers,
excel = glue("33translatome_xlsx/test_scn_translatome_unfiltered_sva-v{ver}.xlsx"))
## Error:
## ! object 'scn_translatome_de_sva' not found
p08_scn_combined_deseq <- subtract_deseq_results(
first_table = scn_combined_test[["data"]][["p08het"]],
second_table = scn_combined_test[["data"]][["p08ko"]],
first_lfc = "deseq_logfc", second_lfc = "deseq_logfc",
first_p = "deseq_adjp", second_p = "deseq_adjp",
first_name = "het", second_name = "ko",
excel = glue("33translatome_xlsx/translatome_p08_scn_combined_deseq-v{ver}.xlsx"))
## Error in `subtract_deseq_results()`:
## ! could not find function "subtract_deseq_results"
p15_scn_combined_deseq <- subtract_deseq_results(
first_table = scn_combined_test[["data"]][["p15het"]],
second_table = scn_combined_test[["data"]][["p15ko"]],
first_lfc = "deseq_logfc", second_lfc = "deseq_logfc",
first_p = "deseq_adjp", second_p = "deseq_adjp",
first_name = "het", second_name = "ko",
excel = glue("34translatome_deseqsub_xlsx/translatome_p15_scn_combined_deseq-v{ver}.xlsx"))
## Error in `subtract_deseq_results()`:
## ! could not find function "subtract_deseq_results"
p08_dlgn_extra <- "p08het_vs_p08ko = (conditionp08_het_dlgn - conditionp08_het_retina) - (conditionp08_ko_dlgn - conditionp08_ko_retina)"
p08_dlgn_translatome_de_keepers <- list(
"p08het" = c("p08_het_dlgn", "p08_het_retina"),
"p08ko" = c("p08_ko_dlgn", "p08_ko_retina"))
p08_dlgn_translatome_keepers <- list(
"p08_het_dlgn_vs_retina" = c("p08_het_dlgn", "p08_het_retina"),
"p08_ko_dlgn_vs_retina" = c("p08_ko_dlgn", "p08_ko_retina"),
"p08_dlgn_translatome" = c("p08het", "p08ko"))
p08_dlgn_translatome_de <- all_pairwise(v3_pairwise_input, filter = TRUE,
keepers = p08_dlgn_translatome_de_keepers,
model_svs = FALSE,
model_fstring = "~ 0 + condition",
do_basic = FALSE, do_dream = FALSE,
do_noiseq = FALSE, do_ebseq = FALSE,
extra_contrasts = p08_dlgn_extra)
## Warning in all_pairwise(v3_pairwise_input, filter = TRUE, keepers =
## p08_dlgn_translatome_de_keepers, : This will likely fail because of how the
## keepers and extra contrasts are evaluated.
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## Removing 10162 low-count genes (15263 remaining).
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The contrast p08het is not in the results.
## If this is not an extra contrast, then this is an error.
## conditions
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## conditions
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
p08_dlgn_combined_test <- combine_de_tables(
p08_dlgn_translatome_de, keepers = p08_dlgn_translatome_keepers,
label_column = label_column,
excel = glue("33translatome_xlsx/test_p08_dlgn_translatome_unfiltered_nosva-v{ver}.xlsx"))
## Looking for subscript invalid names, start of extract_keepers.
## coefficient edger did not find conditionp08ko or conditionp08het.
## coefficient limma did not find p08ko or p08het.
## Looking for subscript invalid names, end of extract_keepers.
p08_dlgn_translatome_de_sva <- all_pairwise(v3_pairwise_input, filter = TRUE,
keepers = p08_dlgn_translatome_de_keepers,
model_svs = "svaseq",
model_fstring = "~ 0 + condition",
do_basic = FALSE, do_dream = FALSE,
do_noiseq = FALSE, do_ebseq = FALSE,
extra_contrasts = p08_dlgn_extra)
## Warning in all_pairwise(v3_pairwise_input, filter = TRUE, keepers =
## p08_dlgn_translatome_de_keepers, : This will likely fail because of how the
## keepers and extra contrasts are evaluated.
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## Removing 10162 low-count genes (15263 remaining).
## Error in `adjuster_svs()`:
## ! unused arguments (do_basic = FALSE, do_dream = FALSE, do_noiseq = FALSE, do_ebseq = FALSE)
p08_dlgn_combined_test_sva <- combine_de_tables(
p08_dlgn_translatome_de_sva, keepers = p08_dlgn_translatome_keepers,
label_column = label_column,
excel = glue("33translatome_xlsx/test_p08_dlgn_translatome_unfiltered_sva-v{ver}.xlsx"))
## Error:
## ! object 'p08_dlgn_translatome_de_sva' not found
time_scn_extra <- glue("\\
p15het = (conditionp15_het_scn - conditionp15_het_retina), \\
p08het = (conditionp08_het_scn - conditionp08_het_retina), \\
p15het_vs_p08het = (conditionp15_het_scn - conditionp15_het_retina) - (conditionp08_het_scn - conditionp08_het_retina), \\
p15ko = (conditionp15_ko_scn - conditionp15_ko_retina), \\
p08ko = (conditionp08_ko_scn - conditionp08_ko_retina), \\
p15ko_vs_p08ko = (conditionp15_ko_scn - conditionp15_ko_retina) - (conditionp08_ko_scn - conditionp08_ko_retina)")
time_scn_translatome_de_keepers <- list(
"p15het" = c("p15_het_scn", "p15_het_retina"),
"p08het" = c("p08_het_scn", "p08_het_retina"),
"p15ko" = c("p15_ko_scn", "p15_ko_retina"),
"p08ko" = c("p08_ko_scn", "p08_ko_retina"))
time_scn_translatome_keepers <- list(
"p15het" = c("p15_het_scn", "p15_het_retina"),
"p08het" = c("p08_het_scn", "p08_het_retina"),
"p15ko" = c("p15_ko_scn", "p15_ko_retina"),
"p08ko" = c("p08_ko_scn", "p08_ko_retina"),
"p15_het_sc_vs_retina" = c("p15_het_scn", "p15_het_retina"),
"p08_het_sc_vs_retina" = c("p08_het_scn", "p08_het_retina"),
"scn_het_translatome" = c("p15het", "p08het"),
"scn_ko_translatome" = c("p15ko", "p08ko"))
time_scn_translatome_de <- all_pairwise(v3_pairwise_input, filter = TRUE,
keepers = time_scn_translatome_de_keepers,
model_svs = FALSE,
model_fstring = "~ 0 + condition",
do_basic = FALSE, do_dream = FALSE,
do_noiseq = FALSE, do_ebseq = FALSE,
extra_contrasts = time_scn_extra)
## Warning in all_pairwise(v3_pairwise_input, filter = TRUE, keepers =
## time_scn_translatome_de_keepers, : This will likely fail because of how the
## keepers and extra contrasts are evaluated.
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## Removing 10162 low-count genes (15263 remaining).
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The contrast p15het is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast p08het is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast p15het is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast p15ko is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast p08ko is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast p15ko is not in the results.
## If this is not an extra contrast, then this is an error.
## conditions
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## conditions
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
time_scn_translatome_test <- combine_de_tables(
time_scn_translatome_de,
keepers = time_scn_translatome_keepers,
label_column = label_column,
excel = glue("33translatome_xlsx/test_time_scn_translatome_unfiltered_nosva-v{ver}.xlsx"))
## Looking for subscript invalid names, start of extract_keepers.
## coefficient edger did not find conditionp08het or conditionp15het.
## coefficient limma did not find p08het or p15het.
## coefficient edger did not find conditionp08ko or conditionp15ko.
## coefficient limma did not find p08ko or p15ko.
## Looking for subscript invalid names, end of extract_keepers.
time_scn_translatome_de_sva <- all_pairwise(v3_pairwise_input, filter = TRUE,
keepers = time_scn_translatome_de_keepers,
model_svs = "svaseq",
model_fstring = "~ 0 + condition",
do_basic = FALSE, do_dream = FALSE,
do_noiseq = FALSE, do_ebseq = FALSE,
extra_contrasts = time_scn_extra)
## Warning in all_pairwise(v3_pairwise_input, filter = TRUE, keepers =
## time_scn_translatome_de_keepers, : This will likely fail because of how the
## keepers and extra contrasts are evaluated.
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina
## 3 3 3 3 3
## p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn
## 4 3 3 3 3
## p15_wt_dlgn p15_wt_retina p15_wt_scn
## 5 5 2
## Removing 10162 low-count genes (15263 remaining).
## Error in `adjuster_svs()`:
## ! unused arguments (do_basic = FALSE, do_dream = FALSE, do_noiseq = FALSE, do_ebseq = FALSE)
time_scn_translatome_test_sva <- combine_de_tables(
time_scn_translatome_de_sva,
keepers = time_scn_translatome_keepers,
label_column = label_column,
excel = glue("33translatome_xlsx/test_time_scn_translatome_unfiltered_sva-v{ver}.xlsx"))
## Error:
## ! object 'time_scn_translatome_de_sva' not found
Next step: perform the retina filter. I need to think about the proper union/intersection of the retina/x expression values.
In the previous block, we are making 2 global comparisons, here is one of them:
(p15hetscn/p15hetret)/(p08hetscn/p08hetret)
I therefore want to extract the most logical set of genes higher in some/all of these conditions with respect to the corresponding wt conditions. Previously, in section ‘Extract genes included for each set of contrasts’, I attempted to perform this operation for 2 specific wt conditions. When this was performed, it took the unique(union) of the two sets. Thus it stands to reason that I want to take the unique(union) of all 4 in this instance? e.g.:
(p15hetscn > p15wtscn) | (p15hetret > p15wtret) | (p08hetscn > p08wtscn) | (p08hetret > p08wtret)
I kind of think it should be:
((p15hetscn > p15wtscn) | (p15hetret > p15wtret)) & ((p08hetscn > p08wtscn) | (p08hetret > p08wtret))
Gross; perhaps I should just do this manually, given that there are only a few putative translatomes to query?
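The two candidate rules can give quite different gene sets. A toy sketch with invented gene IDs (each vector stands for the genes that are higher than wt in one condition) makes the difference concrete:

```r
## Invented gene IDs; each vector stands for "higher than wt" in one condition.
p15_scn <- c("g1", "g2", "g3")
p15_ret <- c("g3", "g4")
p08_scn <- c("g2", "g5")
p08_ret <- c("g6")

## Option 1: a gene qualifies if it passes in any of the four comparisons.
any_of_four <- Reduce(union, list(p15_scn, p15_ret, p08_scn, p08_ret))

## Option 2: a gene must pass at p15 (either tissue) and at p08 (either tissue).
p15_either <- union(p15_scn, p15_ret)
p08_either <- union(p08_scn, p08_ret)
both_times <- intersect(p15_either, p08_either)
```

With these toy sets the first rule keeps all six genes while the second keeps only `g2`, so the second rule is always a subset of the first and is the stricter filter.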
In a fashion similar to how Hector handled the effect of phagocytosis with Laura and Najib a long time ago, I propose to do a simple subtraction of the results of the two contrasts which comprise the translatome query (I was thinking about this last week, thus their inclusion in the de tables above). As with the phagocytosis effect, I will simply take the worse (larger) of the two adjusted p-values. I will repeat this with limma/edgeR and see how similar the final results are to what those methods provide in the (a/b)/(c/d) comparisons. I am reasonably certain that DESeq2’s results() function has the ability to perform these odd contrasts, but I have never figured out how; perhaps I will use this as a chance to revisit that…
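As a rough illustration of the proposed subtraction, here is a minimal sketch under stated assumptions: `subtract_lfc_sketch()` is a hypothetical function written for this note, not the `subtract_deseq_results()` called elsewhere in this document, and the two toy tables are invented. It subtracts the logFC values for the genes the tables share and keeps the larger adjusted p-value.

```r
## Hypothetical sketch; this is not the subtract_deseq_results() called in
## this document, whose definition is not shown here.
subtract_lfc_sketch <- function(first, second, lfc = "deseq_logfc",
                                adjp = "deseq_adjp") {
  shared <- intersect(rownames(first), rownames(second))
  data.frame(
    logfc = first[shared, lfc] - second[shared, lfc],
    ## The "worst" adjusted p-value is the larger of the two.
    adjp = pmax(first[shared, adjp], second[shared, adjp]),
    row.names = shared)
}

het <- data.frame(deseq_logfc = c(2.0, 0.5, -1.0),
                  deseq_adjp = c(0.001, 0.20, 0.04),
                  row.names = c("geneA", "geneB", "geneC"))
ko <- data.frame(deseq_logfc = c(0.5, 0.4, 1.0),
                 deseq_adjp = c(0.030, 0.01, 0.50),
                 row.names = c("geneA", "geneB", "geneD"))
het_vs_ko <- subtract_lfc_sketch(het, ko)
```

On the DESeq2 side, `results()` does accept a numeric `contrast` vector with one weight per entry of `resultsNames(dds)`, so weights of 1, -1, -1 and 1 on the four relevant coefficients may be one route to the same (a/b)/(c/d) query; that has not been tried in this document.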
Let us test this idea with the p08 dlgn query, which seeks to compare:
(p08_het_dlgn / p08_het_retina) / (p08_ko_dlgn / p08_ko_retina)
These are maintained in the de_table with the names ‘p08_het_dlgn_vs_retina’ and ‘p08_ko_dlgn_vs_retina’
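As a rough illustration of the subtraction idea, here is a sketch on toy data. The function `subtract_lfc_sketch()` is hypothetical and is not the `subtract_deseq_results()` called in this document. The output column name `het_vs_ko_logfc` is taken from the later `cor.test()` call, while `het_vs_ko_adjp` is an assumed name.

```r
## Hypothetical sketch: subtract logFCs of two contrasts and keep the larger adjusted p-value.
subtract_lfc_sketch <- function(first_table, second_table,
                                lfc = "deseq_logfc", adjp = "deseq_adjp",
                                first_name = "het", second_name = "ko") {
  ## Only genes present in both tables can be compared.
  shared <- intersect(rownames(first_table), rownames(second_table))
  first <- first_table[shared, , drop = FALSE]
  second <- second_table[shared, , drop = FALSE]
  prefix <- paste0(first_name, "_vs_", second_name)
  result <- data.frame(row.names = shared)
  ## log(a/b) - log(c/d) is the log of (a/b)/(c/d).
  result[[paste0(prefix, "_logfc")]] <- first[[lfc]] - second[[lfc]]
  ## 'Worst' adjusted p-value: the element-wise maximum of the two.
  result[[paste0(prefix, "_adjp")]] <- pmax(first[[adjp]], second[[adjp]])
  result
}

## Toy tables with made-up values.
toy_het <- data.frame(deseq_logfc = c(2.0, 1.0, -0.5), deseq_adjp = c(0.01, 0.20, 0.03),
                      row.names = c("g1", "g2", "g3"))
toy_ko <- data.frame(deseq_logfc = c(0.5, 1.5), deseq_adjp = c(0.04, 0.001),
                     row.names = c("g1", "g2"))
toy_sub <- subtract_lfc_sketch(toy_het, toy_ko)
toy_sub
```

Taking the maximum p-value is conservative by construction: a gene is only called significant when both underlying contrasts are.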
p08_dlgn_combined_deseq <- subtract_deseq_results(
first_table = p08_dlgn_combined_test[["data"]][["p08_het_dlgn_vs_retina"]],
second_table = p08_dlgn_combined_test[["data"]][["p08_ko_dlgn_vs_retina"]],
first_lfc = "deseq_logfc", second_lfc = "deseq_logfc",
first_p = "deseq_adjp", second_p = "deseq_adjp",
first_name = "het", second_name = "ko",
excel = glue("34translatome_deseqsub_xlsx/translatome_p08_dlgn_combined_deseq-v{ver}.xlsx"))
## Error in `subtract_deseq_results()`:
## ! could not find function "subtract_deseq_results"
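On the side question of asking DESeq2 for the (a/b)/(c/d) comparison directly: `results()` accepts a numeric contrast vector with one element per coefficient. The sketch below only builds such a vector in base R. The condition level names are assumptions modelled on the table names above, and the `results()` call is left as a comment because no fitted object is constructed here.

```r
## Toy condition factor; the level names are assumptions modelled on the table names above.
condition <- factor(c("p08_het_dlgn", "p08_het_retina", "p08_ko_dlgn", "p08_ko_retina"))
design <- model.matrix(~ 0 + condition)
coefs <- colnames(design)

## On the log scale, (het_dlgn / het_retina) / (ko_dlgn / ko_retina) becomes
## (het_dlgn - het_retina) - (ko_dlgn - ko_retina).
contrast_vec <- setNames(rep(0, length(coefs)), coefs)
contrast_vec["conditionp08_het_dlgn"] <- 1
contrast_vec["conditionp08_het_retina"] <- -1
contrast_vec["conditionp08_ko_dlgn"] <- -1
contrast_vec["conditionp08_ko_retina"] <- 1
contrast_vec

## With a DESeqDataSet 'dds' fitted using a '~ 0 + condition' design (not built here),
## and assuming resultsNames(dds) is in the same order as 'coefs', the call would be:
## res <- DESeq2::results(dds, contrast = unname(contrast_vec))
```

In the real experiment the design has more than four conditions, so the vector would need a zero for every coefficient not involved in the comparison.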
See how similar these results are to those obtained from limma/edger.
test_columns <- c("edger_logfc", "limma_logfc", "edger_adjp", "limma_adjp")
test_df <- p08_dlgn_combined_test[["data"]][["p08_dlgn_translatome"]][, test_columns]
test_df <- merge(test_df, p08_dlgn_combined_deseq, by = "row.names")
## Error in `h()`:
## ! error in evaluating the argument 'y' in selecting a method for function 'merge': object 'p08_dlgn_combined_deseq' not found
rownames(test_df) <- test_df[["Row.names"]]
test_df[["Row.names"]] <- NULL
cor.test(test_df[["limma_logfc"]], test_df[["het_vs_ko_logfc"]])
## Error in `cor.test.default()`:
## ! 'y' must be a numeric vector
## Error in `cor.test.default()`:
## ! 'y' must be a numeric vector
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': subscript contains invalid names
## NULL
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': subscript contains invalid names
## NULL
## So, using the maximum p-value is a complete failure; but the extreme similarities
## between this and edgeR suggest to me that it is likely possible to use the results
## from edgeR without concern (or limma for that matter, it was also extremely similar)
## Or I can spend a little time and collect the numbers on each side of the division
## and calculate a t statistic myself.
I have on hand
I have gene sets up above which define the genes suitable for each of these pieces. There are only 5 comparisons, let us step through them.
The data for this contrast resides in scn_combined_test$data$p08_scn_translatome or the same slot of scn_combined_test_sva.
Thus, the inclusion_sig portions to extract are found in inclusion_sig[["deseq"]][["ups"]], and are named exactly as written above!
p08_het_vs_ko_translatome_unfilt <- scn_combined_test[["data"]][["p08_scn_translatome"]]
num_union <- unique(c(rownames(inclusion_sig[["deseq"]][["ups"]][["p08_het_scn"]]),
rownames(inclusion_sig[["deseq"]][["ups"]][["p08_het_retina"]])))
length(num_union)
## [1] 1188
den_union <- unique(c(rownames(inclusion_sig[["deseq"]][["ups"]][["p08_ko_scn"]]),
rownames(inclusion_sig[["deseq"]][["ups"]][["p08_ko_retina"]])))
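length(den_union)
## [1] 1624
## The union of the numerator and denominator sets; used as 'wanted_genes' below.
both_union <- unique(c(num_union, den_union))
length(both_union)
## [1] 2005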
both_inter_idx <- num_union %in% den_union
both_inter <- num_union[both_inter_idx]
length(both_inter)
## [1] 807
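The printed lengths can be sanity-checked with inclusion-exclusion: 1188 + 1624 - 807 = 2005. A minimal sketch of the same check on toy vectors (hypothetical gene IDs), which also shows that `%in%` indexing and `intersect()` agree for de-duplicated input:

```r
## Toy stand-ins for num_union and den_union (hypothetical gene IDs).
num_toy <- c("g1", "g2", "g3", "g4")
den_toy <- c("g3", "g4", "g5")

union_toy <- union(num_toy, den_toy)
inter_toy <- intersect(num_toy, den_toy)

## Inclusion-exclusion: |A union B| = |A| + |B| - |A intersect B|
length(union_toy) == length(num_toy) + length(den_toy) - length(inter_toy)

## For vectors without duplicates, %in% indexing gives the same genes as intersect().
identical(num_toy[num_toy %in% den_toy], inter_toy)
```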
keeper <- list("p08_scn_translatome" = c("p08het", "p08ko"))
p08_scn_translatome_union_filtered <- combine_de_tables(
scn_translatome_de, keepers = keeper,
label_column = label_column,
excel = glue("35translatome_union/p08_scn_translatome_union_filtered_nosva-v{ver}.xlsx"),
wanted_genes = both_union)
## Looking for subscript invalid names, start of extract_keepers.
## coefficient edger did not find conditionp08ko or conditionp08het.
## coefficient limma did not find p08ko or p08het.
## Looking for subscript invalid names, end of extract_keepers.
p08_scn_translatome_inter_filtered <- combine_de_tables(
scn_translatome_de, keepers = keeper,
label_column = label_column,
excel = glue("35translatome_union/p08_scn_translatome_intersect_filtered_nosva-v{ver}.xlsx"),
wanted_genes = both_inter)
## Looking for subscript invalid names, start of extract_keepers.
## coefficient edger did not find conditionp08ko or conditionp08het.
## coefficient limma did not find p08ko or p08het.
## Looking for subscript invalid names, end of extract_keepers.
p08_scn_translatome_union_filtered_sva <- combine_de_tables(
scn_translatome_de_sva, keepers = keeper,
label_column = label_column,
excel = glue("35translatome_union/p08_scn_translatome_union_filtered_sva-v{ver}.xlsx"),
wanted_genes = both_union)
## Error:
## ! object 'scn_translatome_de_sva' not found
p08_scn_translatome_union_filtered <- combine_de_tables(
scn_translatome_de, keepers = keeper,
label_column = label_column,
excel = glue("35translatome_union/p08_scn_translatome_intersect_filtered_sva-v{ver}.xlsx"),
wanted_genes = both_inter)
## Looking for subscript invalid names, start of extract_keepers.
## coefficient edger did not find conditionp08ko or conditionp08het.
## coefficient limma did not find p08ko or p08het.
## Looking for subscript invalid names, end of extract_keepers.
Here is a snippet from Rashmi which nicely expresses the DE-result comparisons she is most interested in:
Since, I want to know the number of DEG expressed in Retina, SCN and dLGN with respect to genotype, Location and time. I prepared the venn diagram for these comparison:
Since I was interested in understanding the change in local translatome according to Location for different developmental time points for Het and KO. Hence, I tried to generate a venn diagram for Location (Ret and SCN) at developmental time points P8 and P15 for genotype het and KO. So the venn diagram / upset plot will be for location where some genes will be shared/unique for P8_Ret_het, P8_SCN_Het, P15_Ret_HET, P15_SCN_HET. We can prepare an upset plot for P8_Ret_KO, P8_SCN_KO, P15_Ret_KO and P15_SCN_KO also. Or can generate an upset plot by combining both P8_Ret_het, P8_SCN_Het, P15_Ret_HET and P15_SCN_HET and P8_Ret_KO, P8_SCN_KO, P15_Ret_KO and P15_SCN_KO.
Ok, let us see if I can implement this, starting with the genotype query
## The appropriate data structure is 'genotype_sig',
## and the tables of interest are:
table_names <- c("kh_p08_retina", "kh_p15_retina", "kh_p08_scn",
"kh_p15_scn", "kh_p08_dlgn", "kh_p15_dlgn")
table_names %in% names(genotype_sig)
## [1] FALSE FALSE FALSE FALSE TRUE FALSE
newsig <- genotype_sig[[1]]
for (sig in 2:length(table_names)) {
name <- table_names[sig]
newsig[["deseq"]][["ups"]][[name]] <- genotype_sig[[name]][["deseq"]][["ups"]][[name]]
newsig[["deseq"]][["downs"]][[name]] <- genotype_sig[[name]][["deseq"]][["downs"]][[name]]
}
genotype_upsetr <- upsetr_sig(newsig)
## Error in `1:ncol(data)`:
## ! argument of length 0
genotype_upset_written <- write_upset_groups(genotype_upsetr, excel = "36upset_genotype/genotype_upset_groups.xlsx")
## Error:
## ! object 'genotype_upsetr' not found
## Error:
## ! object 'genotype_upsetr' not found
## Warning in pp(file = "36upset_genotype/test_genotype_upset.pdf"): The
## directory: 36upset_genotype does not exist, will attempt to create it.
## Error:
## ! object 'genotype_upsetr' not found
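Most of the requested table names were absent from `genotype_sig` (five of the six `%in%` tests were FALSE), so the loop above copies `NULL` entries and the upset step fails. The following is a sketch of a more defensive collection step. `collect_sig_sketch()` is a hypothetical helper, and the toy list only mimics the `[[name]][["deseq"]][["ups"]][[name]]` nesting used above.

```r
## Hypothetical helper: gather ups/downs for several contrasts, skipping
## (with a warning) any names that are absent from the significance list.
collect_sig_sketch <- function(sig_list, table_names, method = "deseq") {
  present <- table_names %in% names(sig_list)
  if (!any(present)) {
    stop("None of the requested tables are in the significance list.")
  }
  if (!all(present)) {
    warning("Skipping missing tables: ", toString(table_names[!present]))
  }
  collected <- list(ups = list(), downs = list())
  ## Iterating over the names themselves avoids the 2:length() pitfall
  ## when only one table is requested.
  for (name in table_names[present]) {
    collected[["ups"]][[name]] <- sig_list[[name]][[method]][["ups"]][[name]]
    collected[["downs"]][[name]] <- sig_list[[name]][[method]][["downs"]][[name]]
  }
  collected
}

## Toy significance list with a single contrast and made-up values.
toy_sig <- list(
  kh_p08_dlgn = list(deseq = list(
    ups = list(kh_p08_dlgn = data.frame(logfc = c(1.2, 2.3), row.names = c("g1", "g2"))),
    downs = list(kh_p08_dlgn = data.frame(logfc = -1.1, row.names = "g3")))))

toy_collected <- suppressWarnings(
  collect_sig_sketch(toy_sig, c("kh_p08_retina", "kh_p08_dlgn")))
names(toy_collected[["ups"]])
```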
Now let us try the location-specific comparisons
## The appropriate data structure is 'location_sig',
## and the tables of interest are:
table_names <- c("sr_p08_het", "sr_p08_ko")
table_names %in% names(location_sig)
## [1] FALSE FALSE
location_upset_input <- list()
first_table <- table_names[1]
newsig <- location_sig[[first_table]]
for (sig in 2:length(table_names)) {
name <- table_names[sig]
newsig[["deseq"]][["ups"]][[name]] <- location_sig[[name]][["deseq"]][["ups"]][[name]]
newsig[["deseq"]][["downs"]][[name]] <- location_sig[[name]][["deseq"]][["downs"]][[name]]
}
location_upsetr <- upsetr_sig(newsig)
## Error in `xtfrm.data.frame()`:
## ! cannot xtfrm data frames
location_upset_written <- write_upset_groups(location_upsetr, excel = "36upset_genotype/sr_p08_hetko_upset_groups.xlsx")
## Error:
## ! object 'location_upsetr' not found
## Error:
## ! object 'location_upsetr' not found
pp(file = "36upset_genotype/test_location_sr_p08_hetko_upset.pdf")
print(location_upsetr[["all_plot"]])
## Error:
## ! object 'location_upsetr' not found
I am reasonably certain that Rashmi would like a table of the genes shared among increased scn ko and het in the above plot along with the increased retina (e.g. the 269 and 103 gene sets).
## [1] FALSE FALSE
location_upset_input <- list()
first_table <- table_names[1]
newsig <- location_sig[[first_table]]
for (sig in 2:length(table_names)) {
name <- table_names[sig]
newsig[["deseq"]][["ups"]][[name]] <- location_sig[[name]][["deseq"]][["ups"]][[name]]
newsig[["deseq"]][["downs"]][[name]] <- location_sig[[name]][["deseq"]][["downs"]][[name]]
}
location_upsetr <- upsetr_sig(newsig)
## Error in `xtfrm.data.frame()`:
## ! cannot xtfrm data frames
location_upset_written <- write_upset_groups(location_upsetr, excel = "36upset_genotype/sr_p15_hetko_upset_groups.xlsx")
## Error:
## ! object 'location_upsetr' not found
## Error:
## ! object 'location_upsetr' not found
## Error:
## ! object 'scn_retina_p15_upset_result' not found
pp(file = "36upset_genotype/test_location_sr_p15_hetko_upset.pdf")
print(location_upsetr[["all_plot"]])
## Error:
## ! object 'location_upsetr' not found
## The appropriate data structure is 'location_sig',
## and the tables of interest are:
table_names <- c("dr_p08_het", "dr_p08_ko")
location_upset_input <- list()
first_table <- table_names[1]
newsig <- location_sig[[first_table]]
for (sig in 2:length(table_names)) {
name <- table_names[sig]
newsig[["deseq"]][["ups"]][[name]] <- location_sig[[name]][["deseq"]][["ups"]][[name]]
newsig[["deseq"]][["downs"]][[name]] <- location_sig[[name]][["deseq"]][["downs"]][[name]]
}
location_upsetr <- upsetr_sig(newsig)
## Error in `1:ncol(data)`:
## ! argument of length 0
location_upset_written <- write_upset_groups(location_upsetr, excel = "36upset_genotype/dr_p08_hetko_upset_groups.xlsx")
## Error:
## ! object 'location_upsetr' not found
## Error:
## ! object 'location_upsetr' not found
pp(file = "36upset_genotype/test_location_dr_p08_hetko_upset.pdf")
print(location_upsetr[["all_plot"]])
## Error:
## ! object 'location_upsetr' not found
## The appropriate data structure is 'location_sig',
## and the tables of interest are:
table_names <- c("dr_p15_het", "dr_p15_ko")
location_upset_input <- list()
first_table <- table_names[1]
newsig <- location_sig[[first_table]]
for (sig in 2:length(table_names)) {
name <- table_names[sig]
newsig[["deseq"]][["ups"]][[name]] <- location_sig[[name]][["deseq"]][["ups"]][[name]]
newsig[["deseq"]][["downs"]][[name]] <- location_sig[[name]][["deseq"]][["downs"]][[name]]
}
location_upsetr <- upsetr_sig(newsig)
## Error in `xtfrm.data.frame()`:
## ! cannot xtfrm data frames
location_upset_written <- write_upset_groups(location_upsetr, excel = "37upset_location/dr_p15_hetko_upset_groups.xlsx")
## Error:
## ! object 'location_upsetr' not found
## Error:
## ! object 'location_upsetr' not found
## Warning in pp(file = "37upset_location/test_location_dr_p15_hetko_upset.pdf"):
## The directory: 37upset_location does not exist, will attempt to create it.
## Error:
## ! object 'location_upsetr' not found
table_names <- c("ds_p08_het", "ds_p08_ko")
location_upset_input <- list()
first_table <- table_names[1]
newsig <- location_sig[[first_table]]
for (sig in 2:length(table_names)) {
name <- table_names[sig]
newsig[["deseq"]][["ups"]][[name]] <- location_sig[[name]][["deseq"]][["ups"]][[name]]
newsig[["deseq"]][["downs"]][[name]] <- location_sig[[name]][["deseq"]][["downs"]][[name]]
}
location_upsetr <- upsetr_sig(newsig)
## Error in `xtfrm.data.frame()`:
## ! cannot xtfrm data frames
location_upset_written <- write_upset_groups(location_upsetr, excel = "37upset_location/ds_p08_hetko_upset_groups.xlsx")
## Error:
## ! object 'location_upsetr' not found
## Error:
## ! object 'location_upsetr' not found
pp(file = "37upset_location/test_location_ds_p08_hetko_upset.pdf")
print(location_upsetr[["all_plot"]])
## Error:
## ! object 'location_upsetr' not found
table_names <- c("ds_p15_het", "ds_p15_ko")
location_upset_input <- list()
first_table <- table_names[1]
newsig <- location_sig[[first_table]]
for (sig in 2:length(table_names)) {
name <- table_names[sig]
newsig[["deseq"]][["ups"]][[name]] <- location_sig[[name]][["deseq"]][["ups"]][[name]]
newsig[["deseq"]][["downs"]][[name]] <- location_sig[[name]][["deseq"]][["downs"]][[name]]
}
location_upsetr <- upsetr_sig(newsig)
## Error in `xtfrm.data.frame()`:
## ! cannot xtfrm data frames
location_upset_written <- write_upset_groups(location_upsetr, excel = "37upset_location/ds_p15_hetko_upset_groups.xlsx")
## Error:
## ! object 'location_upsetr' not found
## Error:
## ! object 'location_upsetr' not found
pp(file = "37upset_location/test_location_ds_p15_hetko_upset.pdf")
print(location_upsetr[["all_plot"]])
## Error:
## ! object 'location_upsetr' not found
msigdb <- "reference/msigdb_v2024.1.Mm.db"
if (file.exists(msigdb)) {
v3_h_gsva <- simple_gsva(v3_pairwise_input, orgdb = "org.Mm.eg.db", signature_category = "mh",
signatures = msigdb, id_source = "fdata",
required_id = "mgi_symbol")
v3_h_gsva
v3_h_gsva_sig <- get_sig_gsva_categories(
v3_h_gsva, excel = "38msigdb/gsva_sig_hallmark_categories.xlsx")
v3_h_gsva_sig
v3_m1_gsva <- simple_gsva(v3_pairwise_input, orgdb = "org.Mm.eg.db", signature_category = "m1",
signatures = msigdb, id_source = "fdata",
required_id = "mgi_symbol")
v3_m1_gsva
v3_m1_gsva_sig <- get_sig_gsva_categories(
v3_m1_gsva, excel = "38msigdb/gsva_sig_positional_categories.xlsx")
v3_m1_gsva_sig
v3_m2_gsva <- simple_gsva(v3_pairwise_input, orgdb = "org.Mm.eg.db", signature_category = "m2",
signatures = msigdb, id_source = "fdata",
required_id = "mgi_symbol")
v3_m2_gsva
v3_m2_gsva_sig <- get_sig_gsva_categories(
v3_m2_gsva, excel = "38msigdb/gsva_sig_curated_categories.xlsx")
v3_m2_gsva_sig
v3_m3_gsva <- simple_gsva(v3_pairwise_input, orgdb = "org.Mm.eg.db", signature_category = "m3",
signatures = msigdb, id_source = "fdata",
required_id = "mgi_symbol")
v3_m3_gsva
v3_m3_gsva_sig <- get_sig_gsva_categories(
v3_m3_gsva, excel = "38msigdb/gsva_sig_regulatory_categories.xlsx")
v3_m3_gsva_sig
v3_m5_gsva <- simple_gsva(v3_pairwise_input, orgdb = "org.Mm.eg.db", signature_category = "m5",
signatures = msigdb, id_source = "fdata",
required_id = "mgi_symbol")
v3_m5_gsva
v3_m5_gsva_sig <- get_sig_gsva_categories(
v3_m5_gsva, excel = "38msigdb/gsva_sig_ontology_categories.xlsx")
v3_m5_gsva_sig
v3_m8_gsva <- simple_gsva(v3_pairwise_input, orgdb = "org.Mm.eg.db", signature_category = "m8",
signatures = msigdb, id_source = "fdata",
required_id = "mgi_symbol")
v3_m8_gsva
v3_m8_gsva_sig <- get_sig_gsva_categories(
v3_m8_gsva, excel = "38msigdb/gsva_sig_celltype_categories.xlsx")
v3_m8_gsva_sig
}
## Error:
## ! no such table: gene_set
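Because all six categories run inside one `if` block, the first failure (here 'no such table: gene_set') stops every later category as well. A sketch of the same sequence as a loop with per-category error handling follows. The category codes and file labels come from the calls above, while `run_gsva_categories()` and its `run_one` argument are hypothetical stand-ins for the `simple_gsva()` plus `get_sig_gsva_categories()` pair.

```r
## Category codes and file labels are taken from the calls above.
category_labels <- c(mh = "hallmark", m1 = "positional", m2 = "curated",
                     m3 = "regulatory", m5 = "ontology", m8 = "celltype")

## 'run_one' is a hypothetical stand-in for the simple_gsva() +
## get_sig_gsva_categories() pair; it receives the category code and xlsx path.
run_gsva_categories <- function(run_one, labels = category_labels, outdir = "38msigdb") {
  results <- list()
  for (category in names(labels)) {
    excel <- file.path(outdir, paste0("gsva_sig_", labels[[category]], "_categories.xlsx"))
    ## A failed category returns NULL, so it is simply absent from 'results'.
    results[[category]] <- tryCatch(
      run_one(category, excel),
      error = function(e) {
        message("Category ", category, " failed: ", conditionMessage(e))
        NULL
      })
  }
  results
}

## Demonstration with a stub that fails for one category and otherwise returns the path.
demo <- run_gsva_categories(function(category, excel) {
  if (category == "m1") stop("no such table: gene_set")
  excel
})
names(demo)
```

This does not address the missing `gene_set` table itself; it only keeps one failing category from hiding the others.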
Up above I created a fairly large set of enrichment/GSEA analyses. Let us pull some of the most interesting results here and look at them.
Here are the specific queries from Rashmi:
Let us take a moment and see for which contrasts I acquired results:
I need to make a little summary for clusterprofiler too so that I can easily see how many hits there are for each contrast.
## Length Class Mode
## kh_p08_dlgn_up 20 gprofiler_result list
## kh_p15_dlgn_up 27 gprofiler_result list
## kh_p08_scn_up 20 gprofiler_result list
## kh_p08_scn_down 24 gprofiler_result list
## kh_p15_scn_down 25 gprofiler_result list
## [1] "kh_p08_dlgn_up"
## BP CC CORUM HP KEGG MF REAC TF WP
## 0 0 0 0 0 0 0 0 0
## [1] "kh_p15_dlgn_up"
## BP CC CORUM HP KEGG MF REAC TF WP
## 63 32 0 118 1 21 6 3 0
## [1] "kh_p08_scn_up"
## BP CC CORUM HP KEGG MF REAC TF WP
## 0 0 0 0 0 0 0 0 0
## [1] "kh_p08_scn_down"
## BP CC CORUM HP KEGG MF REAC TF WP
## 58 21 0 0 0 3 0 113 0
## [1] "kh_p15_scn_down"
## BP CC CORUM HP KEGG MF REAC TF WP
## 49 9 0 0 1 4 0 1 0
## Error:
## ! object 'genotype_full_cp' not found
## Error:
## ! object 'genotype_full_cp' not found
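For the clusterProfiler summary wished for above, one option is to count the rows passing the adjusted p-value cutoff in each result. This is a sketch on toy data frames: `count_hits_sketch()` is hypothetical, and the toy tables only assume the `p.adjust` column that clusterProfiler result tables carry once converted with `as.data.frame()`.

```r
## Toy stand-ins: each element mimics as.data.frame() of one enrichment result (made-up values).
toy_cp <- list(
  kh_p15_dlgn_up = data.frame(ID = c("GO:1", "GO:2", "GO:3"),
                              p.adjust = c(0.001, 0.02, 0.2)),
  kh_p08_scn_up = data.frame(ID = character(0), p.adjust = numeric(0)))

## Hypothetical helper: number of categories at or below the cutoff, per contrast.
count_hits_sketch <- function(result_list, cutoff = 0.1) {
  vapply(result_list, function(df) sum(df[["p.adjust"]] <= cutoff), integer(1))
}

count_hits_sketch(toy_cp)
```

The default cutoff of 0.1 mirrors the `adjp_cutoff` set at the top of this document.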
This contrast, even before filtering away the high-wt genes, only has 8 genes in the set of up and down genes combined. As a result, my function which performs gProfiler/clusterProfiler skips it, and also skips the p15 het/ko for retina samples.
This has a bunch more genes: 51 up and 128 down. Unfortunately, gProfiler sees no significant over-representation in the up category of genes. The down category has
The up/down sets from clusterProfiler have enrich_go, gse_go, and go_data to look at.
## BP CC CORUM HP KEGG MF REAC TF WP
## 0 0 0 0 0 0 0 0 0
## BP CC CORUM HP KEGG MF REAC TF WP
## 58 21 0 0 0 3 0 113 0
## Warning in (function (model, data, ...) : Arguments in `...` must be used.
## x Problematic argument:
## * by = "Count"
## i Did you misspell an argument name?
Perhaps I should just ask the question: for which categories did I get results back?
## Length Class Mode
## kh_p08_dlgn_up 20 gprofiler_result list
## kh_p15_dlgn_up 27 gprofiler_result list
## kh_p08_scn_up 20 gprofiler_result list
## kh_p08_scn_down 24 gprofiler_result list
## kh_p15_scn_down 25 gprofiler_result list
kh_p08_dlgn_up: No significant gProfiler results.
kh_p15_dlgn_up: Significant BP, HP, KEGG, MF, REAC, TF.
kh_p08_scn_up: No significant gProfiler results.
kh_p08_scn_down: Significant BP, MiRNA, MF, TF.
kh_p15_scn_down: Significant BP, MF.
## Warning in (function (model, data, ...) : Arguments in `...` must be used.
## x Problematic argument:
## * by = "Count"
## i Did you misspell an argument name?
plots <- plot_enrichresult(location_gp[["sr_p08_ko"]][["sr_p08_ko_up"]][["BP_enrich"]])
plots[["dot"]]
## NULL
plots <- plot_enrichresult(location_gp[["sr_p08_ko"]][["sr_p08_ko_down"]][["BP_enrich"]])
plots[["dot"]]
## NULL
Enriched groups: BP, KEGG, MF, TF, CC
## Length Class Mode
## 0 NULL NULL
plots <- plot_enrichresult(location_gp[["sr_p08_het"]][["sr_p08_het_up"]][["BP_enrich"]])
plots[["dot"]]
## NULL
plots <- plot_enrichresult(location_gp[["sr_p08_het"]][["sr_p08_het_up"]][["CC_enrich"]])
plots[["dot"]]
## NULL
plots <- plot_enrichresult(location_gp[["sr_p08_het"]][["sr_p08_het_down"]][["BP_enrich"]])
plots[["dot"]]
## NULL
## Error in `if (nrow(gse) < topn) ...`:
## ! argument is of length zero
Ups: significant results for BP, MF, TF.
Downs: significant results for BP, MF, REAC, TF, WP.
plots <- plot_enrichresult(time_gp[["t_het_retina"]][["t_het_retina_up"]][["BP_enrich"]])
plots[["dot"]]
## NULL
plots <- plot_enrichresult(time_gp[["t_het_retina"]][["t_het_retina_down"]][["BP_enrich"]])
plots[["dot"]]
## NULL
Up: BP, MiRNA, MF.
Down: BP, MF, REAC, TF.
plots <- plot_enrichresult(time_gp[["t_ko_retina"]][["t_ko_retina_up"]][["BP_enrich"]])
plots[["dot"]]
## NULL
plots <- plot_enrichresult(time_gp[["t_ko_retina"]][["t_ko_retina_down"]][["BP_enrich"]])
plots[["dot"]]
## NULL
Neither of the SCN gProfiler queries provided any results.
pander::pander(sessionInfo())
message(paste0("This is hpgltools commit: ", get_git_commit()))
message(paste0("Saving to ", savefile))
tmp <- sm(saveme(filename = savefile))