This document visualizes the TMRC2 samples before the various differential expression and variant analyses are completed, in the hope of understanding how the samples relate to each other.
Start with the library sizes of the original dataset. The main thing to note is that coverage varies widely across samples. A few of these samples are highly likely to be removed shortly (looking at you, TMRC20001 and TMRC20095).
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_libsize': object 'lp_expt' not found
## Error: object 'libsizes' not found
Library sizes of the protein coding gene counts observed per sample. The samples were mapped with the EuPathDB revision 36 of the Leishmania (Viannia) panamensis strain MHOM/COL/81L13 genome; the alignments were sorted, indexed, and counted via htseq using the gene features, and non-protein coding features were excluded. The per-sample sums of the remaining matrix were plotted to check that the relative sample coverage is sufficient and not too divergent across samples. Bars are colored according to strain/zymodeme annotation: red: zymodeme 2.3; blue: zymodeme 2.2; purple: Leishmania braziliensis-like strains b2904, z1.0, and z1.5; light brown: z2.4, the zymodeme most similar to 2.3; light gray, dark gray, dark brown, and gray: z3.0, z2.0, z2.1, and z3.2, respectively, the zymodemes most similar to 2.2.
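As a rough illustration of what the library-size plot summarizes, here is a minimal base-R sketch. It does not use hpgltools; the count matrix, the sample names, and the `min_reads` cutoff are all made up for the example.

```r
# Toy count matrix: rows are genes, columns are samples (values are invented).
counts <- matrix(c(10, 0, 5,
                   200, 30, 70,
                   150, 25, 60),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("s1", "s2", "s3")))

# The library size of a sample is the sum of its column.
libsizes <- colSums(counts)

# Hypothetical coverage cutoff; the real choice is a judgment call.
min_reads <- 100
low_coverage <- names(libsizes)[libsizes < min_reads]

print(libsizes)
print(low_coverage)
```

Passing `libsizes` to `barplot()` with one color per condition would give a picture in the same spirit as the figure described above.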
This plot is usually our primary arbiter for sample removal based on coverage. We pick a semi-arbitrary cutoff based on both coverage and genes observed. In this instance 8,600 genes seems likely?
The cutoff argument prints out samples whose gene coverage falls below that proportion. I think we already dropped the most problematic samples in the sample sheet, so it may not actually print anything.
## I think samples 7,10 should be removed at minimum, probably also 9,11
nonzero <- plot_nonzero(lp_expt, cutoff = 0.7)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_nonzero': object 'lp_expt' not found
## Error: object 'nonzero' not found
Differences in relative gene content with respect to sequencing coverage. The per-sample number of observed genes was plotted with respect to the relative CPM coverage in order to check that the samples are sufficiently and similarly diverse. Many samples were observed near or at the putative asymptote of likely gene content; no samples were observed with fewer than 65% of the Leishmania panamensis genes included. Note that the range of genes observed is quite small (8500 <= x < 8700 genes); however, this was plotted after excluding samples with fewer than 8500 genes observed (of which there were 2) and any samples with fewer than 5 million protein-coding mapped reads (2 samples had more than 8500 genes observed in fewer than 5 million reads).
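The idea behind the cutoff can be sketched in base R as follows. This is only an approximation of what a function like plot_nonzero reports, not its implementation; the matrix and sample names are invented, and the 0.7 proportion simply mirrors the `cutoff` argument used above.

```r
# Toy count matrix: rows are genes, columns are samples (values are invented).
counts <- matrix(c(0, 0, 9, 4,
                   3, 1, 8, 2,
                   5, 0, 7, 6),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))

# Number of genes with at least one read in each sample.
genes_observed <- colSums(counts > 0)

# Fraction of all genes observed per sample.
proportion_observed <- genes_observed / nrow(counts)

# Flag samples falling below the chosen proportion.
cutoff <- 0.7
flagged <- names(proportion_observed)[proportion_observed < cutoff]

print(data.frame(genes_observed, proportion_observed))
print(flagged)
```

Plotting `genes_observed` against the per-sample library size would reproduce the general shape of the figure: samples approach an asymptote as coverage increases.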
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_boxplot': object 'lp_expt' not found
## Error: object 'lp_box' not found
The distribution of observed counts per gene for all samples was plotted as a boxplot on the log2 scale (it looks like log10, but I checked). In contrast to the host transcriptome distribution, the parasite distribution of reads per gene is log-normal.
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_libsize': object 'lp_expt' not found
## Error: object 'filter_plot' not found
The number of genes removed by low-count filtering is drastically lower in parasite samples than in human samples. Thus, even though the range of coverage for the parasite samples is from near 0 to ~ 150 CPM, the number of genes removed by the default low-count filter ranges only from 40 to 129, and the number of reads associated with them ranges only from 100 to 3168.
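To make the bookkeeping concrete, here is a small base-R sketch of how one might tally the genes and reads lost to a low-count filter. The filter shown (drop genes whose total count is below a threshold) is a simple stand-in and is not claimed to be the default filter used above; the data and the threshold are invented.

```r
# Toy count matrix: rows are genes, columns are samples (values are invented).
counts <- matrix(c(0, 1, 50, 80,
                   1, 0, 60, 90,
                   0, 2, 55, 70),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))

# Stand-in filter: keep genes with at least `min_total` reads across all samples.
min_total <- 5
keep <- rowSums(counts) >= min_total

# How many genes are dropped, and how many reads go with them in each sample?
genes_removed <- sum(!keep)
reads_removed <- colSums(counts[!keep, , drop = FALSE])

filtered <- counts[keep, , drop = FALSE]
print(genes_removed)
print(reads_removed)
```

In this toy case the dropped genes account for only a handful of reads per sample, which is the same pattern described for the parasite data.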
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'pData': object 'lp_expt' not found
Najib’s favorite plots are of course the PCA/t-SNE. These are nice to look at in order to get a sense of the relationships between samples. They also provide a good opportunity to see what happens when one applies different normalizations, surrogate analyses, filters, etc. In addition, one may set different experimental factors as the primary ‘condition’ (usually the color of plots) and surrogate ‘batches’.
Make a categorical version of column ‘Q’ in the sample sheet with these parameters:
strain_norm <- normalize_expt(lp_strain, norm = "quant", transform = "log2",
                              convert = "cpm", filter = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'lp_strain' not found
zymo_pca <- plot_pca(strain_norm, plot_title = "PCA of parasite expression values",
                     plot_labels = FALSE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'strain_norm' not found
## Error: object 'zymo_pca' not found
closed <- dev.off()
lp_strain_known <- subset_expt(lp_strain, subset = "clinicalcategorical!='unknown'")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'subset_expt': object 'lp_strain' not found
strain_known_norm <- normalize_expt(lp_strain_known, norm = "quant", transform = "log2",
                                    convert = "cpm", filter = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'lp_strain_known' not found
zymo_known_pca <- plot_pca(strain_known_norm, plot_title = "PCA of parasite expression values",
                           plot_labels = FALSE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'strain_known_norm' not found
## Error: object 'zymo_known_pca' not found
dev <- pp(file = "figures/promastigote_zymocol_sensshape_z21_to_z24_only_known_clinical.pdf")
zymo_known_pca$plot
## Error: object 'zymo_known_pca' not found
only_three_types <- subset_expt(lp_strain, subset = "condition=='z2.1'|condition=='z2.3'|condition=='z2.2'")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'subset_expt': object 'lp_strain' not found
only_three_norm <- normalize_expt(only_three_types, norm = "quant", transform = "log2",
                                  convert = "cpm", batch = FALSE, filter = TRUE) %>%
  set_expt_batches(fact = "phase")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'set_expt_batches': error in evaluating the argument 'input' in selecting a method for function 'state': object 'only_three_types' not found
onlythree_pca <- plot_pca(only_three_norm, plot_title = "PCA of z2.1, z2.2 and z2.3 parasite expression values",
                          plot_labels = FALSE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'only_three_norm' not found
## Error: object 'onlythree_pca' not found
## png
## 2
## Error: object 'onlythree_pca' not found
I added the result from my kmer classifier to the sample sheet; let us see how that looks.
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'pData': object 'lp_strain' not found
strain_norm_knn <- normalize_expt(lp_strain_knn, norm = "quant", transform = "log2",
                                  convert = "cpm", filter = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'lp_strain_knn' not found
zymo_pca_knn <- plot_pca(strain_norm_knn, plot_title = "PCA of parasite expression values",
                         plot_labels = FALSE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'strain_norm_knn' not found
## Error: object 'zymo_pca_knn' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'set_expt_batches': object 'strain_norm' not found
zymo_pcav2 <- plot_pca(strain_nobatch, plot_title = "PCA of parasite expression values",
                       plot_labels = FALSE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'strain_nobatch' not found
## Error: object 'zymo_pcav2' not found
strain_nb <- normalize_expt(lp_strain, convert = "cpm", transform = "log2",
                            filter = TRUE, batch = "svaseq")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'lp_strain' not found
strain_nb_pca <- plot_pca(strain_nb, plot_title = "PCA of parasite expression values",
                          plot_labels = FALSE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'strain_nb' not found
## Error: object 'strain_nb_pca' not found
Add explicit labels for a few reference strains:
** NOTE ** These samples were all removed from examination in the sample_sheet in 202404 and so will not appear in this plot. Thus I am turning off the following block.
samples_to_label <- tolower(c("TMRC20023", "TMRC20006", "TMRC20029", "TMRC20007", "TMRC20034",
                              "TMRC20008", "TMRC20027", "TMRC20028", "TMRC20032", "TMRC20040"))
label_entries <- zymo_pca$table[samples_to_label, ]
zymo_pca$plot +
  geom_text(mapping = aes_string("x" = "PC1", "y" = "PC2", label = "sampleid"),
            data = label_entries)
Some likely text for a figure legend might include something like the following (paraphrased from Najib’s 2016 dual transcriptome profiling paper (10.1128/mBio.00027-16)):
Expression profiles of the promastigote samples across multiple strains. Each glyph represents one sample; colors delineate the various strains, which fall into two primary clades. Red samples are zymodeme 2.3, blue samples are zymodeme 2.2. The difference between these two primary groups makes up approximately 17% of the variance in the PCA. Purple samples are Leishmania braziliensis or zymodeme 1.0/1.5 samples, orange are z2.4, browns and greys are z2.1, z2.0, z3.0, and z3.2 respectively. This analysis was performed following a low-count filter, cpm conversion, quantile normalization, and a log2 transformation. No batch factor was used, nor was a surrogate variable estimation performed.
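For readers who want to see the steps named in the legend spelled out, here is a base-R sketch under several assumptions: the counts are simulated, the low-count filter is a simple row-sum threshold, the step order follows the legend's wording rather than any particular package's internals, and `quantile_normalize` is a hypothetical helper written for this example.

```r
# Simulated counts: 200 genes by 6 samples in two hypothetical groups.
set.seed(42)
n_genes <- 200
baseline <- rgamma(n_genes, shape = 2, scale = 50) + 1
group <- rep(c("z2.2", "z2.3"), each = 3)
counts <- sapply(group, function(g) {
  mu <- baseline
  if (g == "z2.3") mu[1:20] <- mu[1:20] * 4  # group-specific shift in 20 genes
  rpois(n_genes, lambda = mu)
})
dimnames(counts) <- list(paste0("gene", seq_len(n_genes)), paste0("s", 1:6))

# 1. Low-count filter (simple stand-in): keep genes with >= 10 reads in total.
filtered <- counts[rowSums(counts) >= 10, , drop = FALSE]

# 2. Counts per million: scale each column by its library size.
cpm <- t(t(filtered) / colSums(filtered)) * 1e6

# 3. Quantile normalization: give every sample the same empirical distribution.
quantile_normalize <- function(mat) {
  ranks <- apply(mat, 2, rank, ties.method = "first")
  target <- rowMeans(apply(mat, 2, sort))
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(mat)
  out
}
qn <- quantile_normalize(cpm)

# 4. log2 transform with a pseudocount of 1.
logged <- log2(qn + 1)

# PCA with samples as rows; report the percent variance per component.
pca <- prcomp(t(logged), center = TRUE, scale. = FALSE)
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)
print(round(pct_var, 2))
```

The first entry of `pct_var` plays the role of the percentage quoted for the difference between the two primary groups; `pca$x[, 1:2]` holds the coordinates one would plot and color by condition.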
Some interpretation for this figure might include:
When PCA was performed on the promastigote samples, the dominant component (which still explains a relatively small amount of variance) coincided with the two primary strain groups, zymodeme 2.2 and 2.3. With the exception of some Leishmania braziliensis samples, all promastigote samples assayed fell into one of these two categories.
When surrogate variable estimation was performed on the entire set of samples, it increased the apparent strain-dependent variance but had some potentially problematic effects for a couple of samples (one z2.3 sample now lies with the z2.2 samples). It is assumed that this is because sva attempted to estimate surrogate values for the less-represented strains, with some unintended consequences for sample TMRC20095 (which, along with TMRC20008, is one of the two least-covered samples by a significant margin). This hypothesis may be tested by excluding the braziliensis and non-z2.2/2.3 samples and repeating the analysis (when this is performed later in the document, the difference between the two primary clades increases to 49.33% of the variance and there are no odd samples).
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'strain_norm' not found
## Error: object 'zymo_tsne' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'strain_nb' not found
## Error: object 'strain_nb_tsne' not found
corheat <- plot_corheat(strain_norm, plot_title = "Correlation heatmap of parasite\nexpression values\n")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input_data' in selecting a method for function 'plot_heatmap': object 'strain_norm' not found
## Error: object 'corheat' not found
disheat <- plot_disheat(strain_norm, plot_title = "Distance heatmap of parasite\nexpression values\n")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input_data' in selecting a method for function 'plot_heatmap': object 'strain_norm' not found
## Error: object 'disheat' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_sm': object 'strain_norm' not found
Potential start for a figure legend:
Global relationships among the promastigote transcriptional profiles. Pairwise Pearson correlations and Euclidean distances were calculated using the normalized expression matrices. Colors along the top row delineate the experimental conditions (same colors as the PCA). Samples were clustered by nearest neighbor clustering, and each colored tile describes either one correlation value between two samples (red to white delineates Pearson correlation values of the 8,710 normalized gene values between two samples, ranging from <= 0.7 to >= 1.0) or the Euclidean distance between two samples (dark blue to white delineates identical to a normalized Euclidean distance of >= 110).
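The quantities behind these heatmaps can be sketched in base R. The expression matrix below is invented, and reading "nearest neighbor clustering" as single-linkage hierarchical clustering is my assumption, not something confirmed by the text above.

```r
# Toy normalized expression matrix: rows are genes, columns are samples.
expr <- matrix(c(1,   2,   3,   4,
                 1.1, 2.1, 2.9, 4.2,
                 4,   3,   2,   1),
               nrow = 4,
               dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))

# Pairwise Pearson correlation between samples (columns).
pearson <- cor(expr, method = "pearson")

# Pairwise Euclidean distance between samples; dist() works on rows, so transpose.
sample_dist <- dist(t(expr), method = "euclidean")
euclid <- as.matrix(sample_dist)

# Single-linkage ("nearest neighbor") hierarchical clustering of the samples.
tree <- hclust(sample_dist, method = "single")

print(round(pearson, 3))
print(round(euclid, 3))
print(tree$labels[tree$order])
```

Either matrix can be handed to `heatmap()` to draw the tiles; here s1 and s2 cluster together while s3 is perfectly anti-correlated with s1.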
Some interpretation for this figure might include:
When the global relationships among the samples were distilled down to individual Euclidean distances or Pearson correlation coefficients between pairs of samples, the primary clustering observed among samples was according to strain. The primary significant outlier sample (TMRC20095) is explicitly due to low coverage. The other outlier strains are either braziliensis (purple) or a series of strains which, when viewed in IGV, appear to have genetic variants which bridge the differences between the two primary zymodemes, particularly on the known aneuploid chromosomes.
lp_two_strains_norm <- sm(normalize_expt(lp_zymo, norm = "quant", transform = "log2",
                                         convert = "cpm", batch = FALSE, filter = TRUE))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'lp_zymo' not found
onlytwo_pca <- plot_pca(lp_two_strains_norm, plot_title = "PCA of z2.2 and z2.3 parasite expression values",
                        plot_labels = FALSE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'lp_two_strains_norm' not found
## Error: object 'onlytwo_pca' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'subset_expt': object 'lp_zymo' not found
lp_two_strains_known_norm <- sm(normalize_expt(lp_two_strains_known, norm = "quant", transform = "log2",
                                               convert = "cpm", batch = FALSE, filter = TRUE))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'lp_two_strains_known' not found
onlytwo_known_pca <- plot_pca(lp_two_strains_known_norm, plot_title = "PCA of z2.2 and z2.3 parasite expression values",
                              plot_labels = FALSE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'lp_two_strains_known_norm' not found
## Error: object 'onlytwo_pca' not found
lp_two_strains_nb <- normalize_expt(lp_zymo, norm = "quant", transform = "log2",
                                    convert = "cpm", batch = "svaseq", filter = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'lp_zymo' not found
onlytwo_pca_nb <- plot_pca(lp_two_strains_nb, plot_title = "PCA of z2.2 and z2.3 parasite expression values",
                           plot_labels = FALSE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'lp_two_strains_nb' not found
## Error: object 'onlytwo_pca_nb' not found
This is by far the most problematic comparison. I think the only interpretation of the following images is that the parasite has little effect on the likelihood that a person will successfully end treatment. There does appear to be some variance associated with cure/fail, but only in a few samples (visible in ~10 fail samples and perhaps ~8 cure samples when sva is applied to the data).
cf_norm <- normalize_expt(lp_cf, convert = "cpm", transform = "log2",
                          norm = "quant", filter = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'lp_cf' not found
start_cf <- plot_pca(cf_norm, plot_title = "PCA of parasite expression values",
                     plot_labels = FALSE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'cf_norm' not found
## Error: object 'start_cf' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'subset_expt': object 'lp_cf' not found
cf_known_norm <- normalize_expt(lp_cf_known, convert = "cpm", transform = "log2",
                                norm = "quant", filter = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'lp_cf_known' not found
start_cf_known <- plot_pca(cf_known_norm, plot_title = "PCA of parasite expression values",
                           plot_labels = FALSE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'cf_known_norm' not found
## Error: object 'start_cf_known' not found
only_two_cf <- set_expt_conditions(lp_zymo, fact = "clinicalcategorical") %>%
  set_expt_batches(fact = "sus_category_current") %>%
  set_expt_colors(color_choices[["cf"]])
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'pData': error in evaluating the argument 'expt' in selecting a method for function 'set_expt_batches': error in evaluating the argument 'object' in selecting a method for function 'pData': object 'lp_zymo' not found
only_two_cf_norm <- normalize_expt(only_two_cf, norm = "quant", transform = "log2",
                                   convert = "cpm", batch = FALSE, filter = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'only_two_cf' not found
only_two_cf_pca <- plot_pca(only_two_cf_norm, plot_title = "PCA of z2.2 and z2.3 parasite expression values",
                            plot_labels = FALSE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'only_two_cf_norm' not found
## Error: object 'only_two_cf_pca' not found
## png
## 2
## Error: object 'only_two_cf_pca' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'subset_expt': object 'only_two_cf' not found
only_two_cf_known_norm <- normalize_expt(only_two_cf_known, norm = "quant", transform = "log2",
                                         convert = "cpm", batch = FALSE, filter = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'only_two_cf_known' not found
only_two_cf_known_pca <- plot_pca(only_two_cf_known_norm,
                                  plot_title = "PCA of z2.2 and z2.3 parasite expression values",
                                  plot_labels = FALSE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'only_two_cf_known_norm' not found
## Error: object 'only_two_cf_known_pca' not found
## png
## 2
## Error: object 'only_two_cf_known_pca' not found
cf_nb <- normalize_expt(lp_cf, convert = "cpm", transform = "log2",
                        filter = TRUE, batch = "svaseq")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'lp_cf' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'cf_nb' not found
## Error: object 'cf_nb_pca' not found
cf_norm <- normalize_expt(lp_cf, transform = "log2", convert = "cpm",
                          filter = TRUE, norm = "quant")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'lp_cf' not found
test <- pca_information(cf_norm,
                        expt_factors = c("clinicalcategorical", "zymodemecategorical",
                                         "pathogenstrain", "passagenumber"),
                        num_components = 6, plot_pcas = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'pca_information': object 'cf_norm' not found
## Error: object 'test' not found
We have two competing metrics of antimonial sensitivity: one historical and one current. In both cases there is a reasonable expectation that resistant strains tend to be zymodeme 2.3 and sensitive strains tend to be zymodeme 2.2. There appear to be more exceptions to this rule of thumb in the current data than in the historical data.
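One simple way to count exceptions to that rule of thumb is a cross-tabulation. The sketch below uses base R and a completely invented metadata table; the column names `zymodeme`, `current`, and `historical` are hypothetical and the values are not taken from the sample sheet, so the counts it prints say nothing about the real data.

```r
# Invented metadata; none of these values come from the real sample sheet.
meta <- data.frame(
  zymodeme   = c("z2.3", "z2.3", "z2.3", "z2.2", "z2.2", "z2.2"),
  current    = c("resistant", "resistant", "sensitive",
                 "sensitive", "sensitive", "resistant"),
  historical = c("resistant", "resistant", "resistant",
                 "sensitive", "sensitive", "sensitive"),
  stringsAsFactors = FALSE)

# Rule of thumb: z2.3 is expected to be resistant, z2.2 sensitive.
expected <- ifelse(meta$zymodeme == "z2.3", "resistant", "sensitive")

# Count the samples that break the rule under each metric.
exceptions <- c(current    = sum(meta$current != expected),
                historical = sum(meta$historical != expected))

print(table(zymodeme = meta$zymodeme, current = meta$current))
print(exceptions)
```

Running the same tabulation on the real sample sheet columns would put a number on the "more exceptions" impression.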
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'exprs': object 'lp_susceptibility' not found
sus_norm <- normalize_expt(lp_susceptibility, transform = "log2", convert = "cpm",
                           norm = "quant", filter = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'lp_susceptibility' not found
sus_pca <- plot_pca(sus_norm, plot_title = "PCA of parasite expression values",
                    plot_labels = FALSE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'sus_norm' not found
## Error: object 'sus_pca' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'subset_expt': object 'lp_susceptibility' not found
sus_known_norm <- normalize_expt(lp_susceptibility_known, transform = "log2", convert = "cpm",
                                 norm = "quant", filter = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'lp_susceptibility_known' not found
sus_known_pca <- plot_pca(sus_known_norm, plot_title = "PCA of parasite expression values",
                          plot_labels = FALSE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'sus_known_norm' not found
## Error: object 'sus_known_pca' not found
lp_sus_two <- subset_expt(lp_susceptibility, subset = "zymodemecategorical!='z21'") %>%
  subset_expt(subset = "zymodemecategorical!='z24'")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'subset_expt': error in evaluating the argument 'expt' in selecting a method for function 'subset_expt': object 'lp_susceptibility' not found
sus_two_norm <- normalize_expt(lp_sus_two, transform = "log2", convert = "cpm",
                               norm = "quant", filter = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'lp_sus_two' not found
sus_two_pca <- plot_pca(sus_two_norm, plot_title = "PCA of parasite expression values",
                        plot_labels = FALSE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'sus_two_norm' not found
## Error: object 'sus_two_pca' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'subset_expt': object 'lp_sus_two' not found
sus_two_known_norm <- normalize_expt(lp_sus_two_known, transform = "log2", convert = "cpm",
                                     norm = "quant", filter = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'lp_sus_two_known' not found
sus_two_known_pca <- plot_pca(sus_two_known_norm, plot_title = "PCA of parasite expression values",
                              plot_labels = FALSE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'sus_two_known_norm' not found
## Error: object 'sus_two_known_pca' not found
sus_nb <- normalize_expt(lp_susceptibility, transform = "log2", convert = "cpm",
                         batch = "svaseq", filter = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'lp_susceptibility' not found
sus_nb_pca <- plot_pca(sus_nb, plot_title = "PCA of parasite expression values",
                       plot_labels = FALSE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'sus_nb' not found
## Error: object 'sus_nb_pca' not found
sus_hist_norm <- normalize_expt(lp_susceptibility_historical, transform = "log2", convert = "cpm",
                                norm = "quant", filter = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'lp_susceptibility_historical' not found
sus_hist_pca <- plot_pca(sus_hist_norm, plot_title = "PCA of parasite expression values",
                         plot_labels = FALSE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'sus_hist_norm' not found
## Error: object 'sus_hist_pca' not found
sus_hist_nb <- normalize_expt(lp_susceptibility_historical, transform = "log2", convert = "cpm",
                              batch = "svaseq", filter = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'lp_susceptibility_historical' not found
sus_hist_nb_pca <- plot_pca(sus_hist_nb, plot_title = "PCA of parasite expression values",
                            plot_labels = FALSE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'sus_hist_nb' not found
## Error: object 'sus_hist_nb_pca' not found
Najib read me an email listing off the gene names associated with the zymodeme classification. I took those names and cross referenced them against the Leishmania panamensis gene annotations and found the following:
They are:
Given these 6 gene IDs (NH has two gene IDs associated with it), I can do some looking for specific differences among the various samples.
The following creates a colorspace (red to green) heatmap showing the observed expression of these genes in every sample.
my_genes <- c("LPAL13_120010900", "LPAL13_340013000", "LPAL13_000054100",
              "LPAL13_140006100", "LPAL13_180018500", "LPAL13_320022300",
              "other")
my_names <- c("ALAT", "ASAT", "G6PD", "NHv1", "NHv2", "MPI", "other")
zymo_expt <- exclude_genes_expt(strain_norm, ids = my_genes, method = "keep")
## Error in exclude_genes_expt(strain_norm, ids = my_genes, method = "keep"): could not find function "exclude_genes_expt"
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_sample_heatmap': object 'zymo_expt' not found
## Error: object 'zymo_heatmap' not found
A recent suggestion raised the question of how our amastigote TMRC2 samples, which resulted from infecting a set of macrophages, relate to these promastigote samples.
So far, we have kept these two experiments separate; now let us merge them.
tmrc2_macrophage_norm <- normalize_expt(lp_macrophage, transform = "log2", convert = "cpm",
                                        norm = "quant", filter = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'lp_macrophage' not found
## Hey you, this annotation call should be made automatic for the container!
annotation(lp_expt) <- "org.Lpanamensis.MHOMCOL81L13.v46.eg.db"
## Error: object 'lp_expt' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'annotation': object 'lp_expt' not found
## Error: object 'lp_expt' not found
Before we can use the combined data, we must reconcile a few aspects of it; notably, we need to specify which samples are amastigotes and which are promastigotes.
## Error: object 'all_tmrc2' not found
## Error: object 'all_nosb' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'pData': object 'all_nosb' not found
## Error: object 'all_nosb' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'subset_expt': object 'all_nosb' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'pData': object 'all_nosb' not found
## Error: object 'all_nosb' not found
## Make sure that the zymodeme does not have the inf_ prefix.
zymodeme_char <- gsub(x = pData(all_nosb)[["condition"]], pattern = "^inf_", replacement = "")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'gsub': error in evaluating the argument 'object' in selecting a method for function 'pData': object 'all_nosb' not found
## Error: object 'zymodeme_char' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'pData': object 'all_nosb' not found
all_norm <- normalize_expt(all_nosb, convert = "cpm", norm = "quant",
                           transform = "log2", filter = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'all_nosb' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'all_norm' not found
## Error: object 'pro_ama_pca' not found
I think the above picture is sort of the opposite of what we want to compare in a DE analysis for this set of data; that is, do we want to compare promastigotes to amastigotes?
two_nosb <- set_expt_batches(all_nosb, fact = "condition") %>%
set_expt_conditions(fact = "stage") %>%
subset_expt(subset = "batch=='z2.2'|batch=='z2.3'")## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'subset_expt': error in evaluating the argument 'object' in selecting a method for function 'pData': error in evaluating the argument 'expt' in selecting a method for function 'set_expt_batches': object 'all_nosb' not found
two_norm <- normalize_expt(two_nosb, convert = "cpm", norm = "quant",
                           transform = "log2", filter = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'two_nosb' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'two_norm' not found
## Error: object 'pro_ama_two_pca' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'pData': object 'two_nosb' not found
## Error: object 'zy_stage_factor' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'pData': object 'two_nosb' not found
zystage_norm <- normalize_expt(zystage, filter = TRUE, norm = "quant",
                               convert = "cpm", transform = "log2")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input' in selecting a method for function 'state': object 'zystage' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'zystage_norm' not found
zystage_keepers <- list(
"z2322_ama" = c("z23amastigote", "z22amastigote"),
"z2322_pro" = c("z23promastigote", "z22promastigote"),
"proama_z23" = c("z23amastigote", "z23promastigote"),
"proama_z22" = c("z22amastigote", "z22promastigote"))
zystage_de <- all_pairwise(zystage, filter = TRUE, model_batch = "svaseq")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'pData': object 'zystage' not found
zystage_tables <- combine_de_tables(
  zystage_de, keepers = zystage_keepers,
  excel = glue("excel/zymodeme_stage_table-v{ver}.xlsx"))
## Error: object 'zystage_de' not found
I want to make a plot where the x-axis is the number of genes on a chromosome and the y-axis is the mean of the expression of those genes.
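As a rough sketch of what that plot needs, here is the summary computed on toy data. Everything in it (the gene names, chromosome labels, counts, and the object names `toy_annot`, `toy_exprs`, and `chr_summary`) is made up for illustration and is not taken from the lp_expt data; the real version would pull the chromosome column from the gene annotations.

```r
## Toy data: hypothetical gene-to-chromosome annotations and a small count matrix.
set.seed(1)
toy_annot <- data.frame(
  gene = paste0("gene", 1:12),
  chromosome = rep(c("chr01", "chr02", "chr03"), times = c(2, 4, 6)))
toy_exprs <- matrix(rpois(12 * 3, lambda = 50), nrow = 12,
                    dimnames = list(toy_annot[["gene"]], paste0("s", 1:3)))
## Mean expression of each gene across the samples.
toy_annot[["mean_exprs"]] <- rowMeans(toy_exprs)[toy_annot[["gene"]]]
## One row per chromosome: the number of genes and the mean of the per-gene means.
genes_per_chr <- as.data.frame(table(toy_annot[["chromosome"]]),
                               stringsAsFactors = FALSE)
colnames(genes_per_chr) <- c("chromosome", "num_genes")
mean_per_chr <- aggregate(mean_exprs ~ chromosome, data = toy_annot, FUN = mean)
chr_summary <- merge(genes_per_chr, mean_per_chr, by = "chromosome")
## Plot genes per chromosome (x) against mean expression (y) when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  toy_chr_plot <- ggplot2::ggplot(
    chr_summary,
    ggplot2::aes(x = num_genes, y = mean_exprs, label = chromosome)) +
    ggplot2::geom_point() +
    ggplot2::geom_text(vjust = -1)
}
```

Taking the mean of the per-gene means weights every gene equally; a median might be less sensitive to a handful of very highly expressed genes.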
## Error in plot_exprs_by_chromosome(lp_zymo): could not find function "plot_exprs_by_chromosome"
## Error: object 'exprs_by_chr_plot' not found
One potentially interesting aspect of the variant data: it may be able to help us define the zymodeme state of previous, untested samples.
In order to test this, I am loading some of the 2016 data alongside the new TMRC2 data to see if they fit together.
This uses an older dataset which I am not sure we have permission to include in the container, so I am turning these blocks off for now.
old_expt <- create_expt("sample_sheets/tmrc2_samples_20191203.xlsx",
file_column = "tophat2file")
tt <- old_expt$expressionset
rownames(tt) <- gsub(pattern = "^exon_", replacement = "", x = rownames(tt))
rownames(tt) <- gsub(pattern = "\\.1$", replacement = "", x = rownames(tt))
old_expt$expressionset <- tt
rm(tt)
One other important caveat, we have a group of new samples which have not yet run through the variant search pipeline, so I need to remove them from consideration. Though it looks like they finished overnight…
In the non-containerized version of this document, the following block combines an older dataset with the current data.
both_norm <- normalize_expt(new_snps, transform = "log2", norm = "quant") %>%
  set_expt_conditions(fact = "pathogenstrain")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'pData': error in evaluating the argument 'input' in selecting a method for function 'state': object 'new_snps' not found
The data structure ‘both_norm’ now contains our 2016 data along with the newer data collected since 2019.
The following plot shows the SNP profiles of all samples (old and new) where the colors at the top show either the 2.2 strains (orange), 2.3 strains (green), the previous samples (purple), or the various lab strains (pink etc).
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input_data' in selecting a method for function 'plot_heatmap': object 'new_snps' not found
## Error: object 'new_variant_heatmap' not found
## Error: object 'new_variant_heatmap' not found
The function get_snp_sets() takes the provided metadata factor (in this case ‘condition’) and looks for variants which are exclusive to each element in it. In this case, this is looking for differences between 2.2 and 2.3, as well as the set shared among them.
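To make the idea of exclusive and shared sets concrete, here is a minimal sketch using plain set operations on toy data. It is not how get_snp_sets() is implemented; the variant identifiers and the names `toy_variants` and `toy_sets` are invented for illustration.

```r
## Hypothetical variant identifiers observed in each group (not real data).
toy_variants <- list(
  "z22" = c("chr_01_pos_100_ref_A_alt_G",
            "chr_01_pos_250_ref_C_alt_T",
            "chr_02_pos_40_ref_G_alt_A"),
  "z23" = c("chr_01_pos_250_ref_C_alt_T",
            "chr_02_pos_90_ref_T_alt_C"))
## Variants exclusive to each group, and the set shared by both.
toy_sets <- list(
  "z22_only" = setdiff(toy_variants[["z22"]], toy_variants[["z23"]]),
  "z23_only" = setdiff(toy_variants[["z23"]], toy_variants[["z22"]]),
  "shared" = intersect(toy_variants[["z22"]], toy_variants[["z23"]]))
## Number of variants in each set.
lengths(toy_sets)
```

With more than two levels in the factor the number of possible intersections grows quickly, which is presumably why the real function returns named intersections rather than a fixed trio of sets.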
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'colData': object 'new_snps' not found
## Error: object 'snp_sets' not found
##Biobase::annotation(old_expt$expressionset) = Biobase::annotation(lp_expt$expressionset)
##both_expt <- combine_expts(lp_expt, old_expt)
snp_genes <- sm(snps_vs_genes(lp_expt, snp_sets, expt_name_col = "chromosome"))
## Error in snps_vs_genes(lp_expt, snp_sets, expt_name_col = "chromosome"): unused argument (expt_name_col = "chromosome")
## Error: object 'snp_genes' not found
## I think we have some metrics here we can plot...
snp_subset <- snp_subset_genes(
  lp_expt, new_snps,
  genes = c("LPAL13_120010900", "LPAL13_340013000", "LPAL13_000054100",
            "LPAL13_140006100", "LPAL13_180018500", "LPAL13_320022300"))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'rowData': object 'lp_expt' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_sample_heatmap': object 'snp_subset' not found
## Error: object 'zymo_heat' not found
Najib has asked a few times about the relationship between variants and DE genes. In subsequent conversations I figured out that what he really wants to learn about is variants in the UTR (most likely 5’) which might affect the expression of genes. The following explicitly does not address that question, but asks a parallel one: is there a relationship between variants in the CDS and differential expression?
In order to do this comparison, we need to reload some of the DE results.
These blocks need to be moved to post-differential analyses
rda <- glue("rda/zymo_tables_sva-v{ver}.rda")
varname <- gsub(x = basename(rda), pattern = "\\.rda", replacement = "")
loaded <- load(file = rda)
zy_df <- get0(varname)[["data"]][["zymodeme"]]
vars_df <- data.frame(ID = names(snp_genes$summary_by_gene), variants = as.numeric(snp_genes$summary_by_gene))
vars_df[["variants"]] <- log2(vars_df[["variants"]] + 1)
vars_by_de_gene <- merge(zy_df, vars_df, by.x = "row.names", by.y = "ID")
cor.test(vars_by_de_gene$deseq_logfc, vars_by_de_gene$variants)
variants_wrt_logfc <- plot_linear_scatter(vars_by_de_gene[, c("deseq_logfc", "variants")])
variants_wrt_logfc$scatter
## It looks like there might be some genes of interest, even though this is not actually
## the question of interest.
Didn’t I create a set of densities by chromosome? Oh I think they come in from get_snp_sets()
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'colData': object 'new_snps' not found
## Error: object 'clinical_sets' not found
## Error: object 'clinical_sets' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'grep': object 'density_vec' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'density_vec' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'density_df' not found
## Error: object 'density_df' not found
var_den_chr <- ggplot(density_df, aes(x = chr, y = density_vec)) +
  ggplot2::geom_col() +
  ggplot2::theme(axis.text = ggplot2::element_text(size = 10, colour = "black"),
                 axis.text.x = ggplot2::element_text(angle = 90, vjust = 0.5))
## Error: object 'density_df' not found
## Error: object 'var_den_chr' not found
## Error: object 'var_den_chr' not found
## png
## 2
## oops, forgot to export write_snps... fixed.
clinical_written <- write_snps(new_snps, output_file = "clinical_variants.aln")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'assay': object 'new_snps' not found
## Error in snps_vs_genes(lp_expt, clinical_sets, expt_name_col = "chromosome"): unused argument (expt_name_col = "chromosome")
snp_density <- merge(as.data.frame(clinical_genes[["summary"]]),
                     as.data.frame(fData(lp_expt)),
                     by = "row.names")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'merge': error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'clinical_genes' not found
## Error: object 'snp_density' not found
## Error: object 'snp_density' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'tolower': object 'snp_density' not found
## Error: object 'snp_density' not found
## Error: object 'snp_density' not found
## Error: object 'snp_density' not found
## Error: object 'snp_density' not found
removers <- c("amastin", "gp63", "leishmanolysin")
for (r in removers) {
  drop_idx <- grepl(pattern = r, x = snp_density[["product"]])
  snp_density <- snp_density[!drop_idx, ]
}
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'grepl': object 'snp_density' not found
Let us grab out the number of variants/gene for the cure/fail samples, merge them into a dataframe, and add that to the gene annotations for the lp_expt datastructure.
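The merge described here can be sketched on toy data. The gene IDs, counts, product names, and object names (`toy_cure`, `toy_fail`, `toy_counts`, `toy_gene_annot`) below are all hypothetical, and treating a gene which is missing from one group as having 0 variants is my assumption rather than something the real data guarantees.

```r
## Hypothetical per-gene variant counts for each outcome group (toy values).
toy_cure <- c("geneA" = 3, "geneB" = 0, "geneC" = 7)
toy_fail <- c("geneB" = 2, "geneC" = 7, "geneD" = 1)
## Outer merge on the gene IDs so that genes seen in only one group are kept.
toy_counts <- merge(data.frame(cure_variants = toy_cure),
                    data.frame(fail_variants = toy_fail),
                    by = "row.names", all = TRUE)
rownames(toy_counts) <- toy_counts[["Row.names"]]
toy_counts[["Row.names"]] <- NULL
## Assumption: a gene absent from a group had no variants recorded there.
toy_counts[is.na(toy_counts)] <- 0
## Hypothetical annotation table keyed by gene ID, standing in for the gene annotations.
toy_gene_annot <- data.frame(
  product = c("hypothetical protein", "kinase", "amastin", "transporter"),
  row.names = c("geneA", "geneB", "geneC", "geneD"))
toy_gene_annot <- merge(toy_gene_annot, toy_counts, by = "row.names", all.x = TRUE)
toy_gene_annot
```

Merging by row names leaves the IDs in a `Row.names` column, so they have to be moved back into the row names before the result can replace an annotation table.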
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'rowData': object 'lp_expt' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'clinical_snps' not found
## Error: object 'fail_ref_snps' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'clinical_snps' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'head': object 'fail_ref_snps' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'head': object 'cure_snps' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'cure_snps' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'fail_ref_snps' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'fData': object 'lp_expt' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'clinical_snps' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'summary': object 'clinical_interest_cure' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'clinical_snps' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'summary': object 'clinical_interest_fail' not found
clinical_interest <- merge(clinical_interest_cure,
                           clinical_interest_fail,
                           by = "row.names", all = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'merge': object 'clinical_interest_cure' not found
## Error: object 'clinical_interest' not found
## Error: object 'clinical_interest' not found
## Error: object 'clinical_interest' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'merge': object 'annot' not found
## Error: object 'annot' not found
## Error: object 'annot' not found
## Error: object 'annot' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'fData': object 'lp_expt' not found
## Error: object 'annot' not found
The heatmap produced here should show the variants only for the zymodeme genes.
I am thinking that if we find clusters of locations which are variant, that might provide some PCR testing possibilities.
## Drop the 2.1, 2.4, unknown, and null
pruned_snps <- subset_expt(new_snps, subset = "condition=='z2.2'|condition=='z2.3'")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'subset_expt': object 'new_snps' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'colData': object 'pruned_snps' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'summary': object 'new_sets' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'summary': object 'new_sets' not found
## Error in eval(expr, p): object 'new_sets' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'summary': object 'new_sets' not found
## Error in eval(expr, p): object 'new_sets' not found
Thus we see that there are 3,553 variants associated with 2.2 and 81,589 associated with 2.3.
The following function uses the positional data to look for sequential mismatches associated with zymodeme in the hopes that there will be some regions which would provide good potential targets for a PCR-based assay.
sequential_variants <- function(snp_sets, conditions = NULL, minimum = 3, maximum_separation = 3) {
if (is.null(conditions)) {
conditions <- 1
}
intersection_sets <- snp_sets[["intersections"]]
intersection_names <- snp_sets[["set_names"]]
chosen_intersection <- 1
if (is.numeric(conditions)) {
chosen_intersection <- conditions
} else {
intersection_idx <- intersection_names == conditions
chosen_intersection <- names(intersection_names)[intersection_idx]
}
possible_positions <- intersection_sets[[chosen_intersection]]
position_table <- data.frame(row.names = possible_positions)
pat <- "^chr_(.+)_pos_(.+)_ref_.*$"
position_table[["chr"]] <- gsub(pattern = pat, replacement = "\\1", x = rownames(position_table))
position_table[["pos"]] <- as.numeric(gsub(pattern = pat, replacement = "\\2", x = rownames(position_table)))
position_idx <- order(position_table[, "chr"], position_table[, "pos"])
position_table <- position_table[position_idx, ]
position_table[["dist"]] <- 0
last_chr <- ""
for (r in seq_len(nrow(position_table))) {
this_chr <- position_table[r, "chr"]
if (r == 1) {
position_table[r, "dist"] <- position_table[r, "pos"]
last_chr <- this_chr
next
}
if (this_chr == last_chr) {
position_table[r, "dist"] <- position_table[r, "pos"] - position_table[r - 1, "pos"]
} else {
position_table[r, "dist"] <- position_table[r, "pos"]
}
last_chr <- this_chr
}
## Working interactively here.
doubles <- position_table[["dist"]] == 1
doubles <- position_table[doubles, ]
write.csv(doubles, "doubles.csv")
one_away <- position_table[["dist"]] == 2
one_away <- position_table[one_away, ]
write.csv(one_away, "one_away.csv")
two_away <- position_table[["dist"]] == 3
two_away <- position_table[two_away, ]
write.csv(two_away, "two_away.csv")
combined <- rbind(doubles, one_away)
combined <- rbind(combined, two_away)
position_idx <- order(combined[, "chr"], combined[, "pos"])
combined <- combined[position_idx, ]
last_chr <- ""
for (r in seq_len(nrow(combined))) {
this_chr <- combined[r, "chr"]
if (r == 1) {
combined[r, "dist_pair"] <- combined[r, "pos"]
last_chr <- this_chr
next
}
if (this_chr == last_chr) {
combined[r, "dist_pair"] <- combined[r, "pos"] - combined[r - 1, "pos"]
} else {
combined[r, "dist_pair"] <- combined[r, "pos"]
}
last_chr <- this_chr
}
dist_pair_maximum <- 1000
dist_pair_minimum <- 200
dist_pair_idx <- combined[["dist_pair"]] <= dist_pair_maximum &
combined[["dist_pair"]] >= dist_pair_minimum
remaining <- combined[dist_pair_idx, ]
no_weak_idx <- grepl(pattern = "ref_(G|C)", x = rownames(remaining))
remaining <- remaining[no_weak_idx, ]
print(head(table(position_table[["dist"]])))
sequentials <- position_table[["dist"]] <= maximum_separation
message("There are ", sum(sequentials), " candidate regions.")
## The following can tell me how many runs of each length occurred, that is not quite what I want.
## Now use run length encoding to find the set of sequential sequentials!
rle_result <- rle(sequentials)
rle_values <- rle_result[["values"]]
## The following line is equivalent to just leaving values alone:
## true_values <- rle_result[["values"]] == TRUE
rle_lengths <- rle_result[["lengths"]]
true_sequentials <- rle_lengths[rle_values]
rle_idx <- cumsum(rle_lengths)[which(rle_values)]
position_table[["last_sequential"]] <- 0
count <- 0
for (r in rle_idx) {
count <- count + 1
position_table[r, "last_sequential"] <- true_sequentials[count]
}
message("The maximum sequential set is: ", max(position_table[["last_sequential"]]), ".")
wanted_idx <- position_table[["last_sequential"]] >= minimum
wanted <- position_table[wanted_idx, c("chr", "pos")]
return(wanted)
}
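The run length encoding step at the end of sequential_variants() is the least obvious part, so here is the same trick on a toy vector of distances. The values and the names `toy_dist`, `run_lengths`, and `run_ends` are made up for illustration, and all of the distances are assumed to come from a single chromosome.

```r
## Toy distances between consecutive variants on one chromosome (hypothetical values).
toy_dist <- c(500, 1, 2, 900, 1, 1, 3, 4000, 2)
## TRUE when a variant is within 3 nt of the previous one.
close_enough <- toy_dist <= 3
runs <- rle(close_enough)
## Length of each run of closely spaced variants.
run_lengths <- runs[["lengths"]][runs[["values"]]]
## Index of the last variant in each of those runs.
run_ends <- cumsum(runs[["lengths"]])[runs[["values"]]]
data.frame(run_end = run_ends, run_length = run_lengths)
```

In this toy vector the runs end at elements 3, 7, and 9 and have lengths 2, 3, and 1; filtering on the run length is what the `minimum` argument does in the function above.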
zymo22_sequentials <- sequential_variants(new_sets, conditions = "z22",
minimum = 1, maximum_separation = 2)
dim(zymo22_sequentials)
## 7 candidate regions for zymodeme 2.2 -- thus I am betting that the reference strain is a 2.2
zymo23_sequentials <- sequential_variants(new_sets, conditions = "z23",
minimum = 2, maximum_separation = 2)
dim(zymo23_sequentials)
## In contrast, there are lots (587) of interesting regions for 2.3!
The first 4 candidate regions from my set of remaining:
* Chr Pos. Distance
* LpaL13-15 238433 448
* LpaL13-18 142844 613
* LpaL13-29 830342 252
* LpaL13-33 1331507 843
Let us define a couple of terms:
* Third: Each of the 4 above positions.
* Second: Third - Distance
* End: Third + PrimerLen
* Start: Second - PrimerLen
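The arithmetic behind those terms can be written as a small helper. This is only a sketch: `primer_coords()` is a hypothetical function which does not exist in the analysis, and the default primer length of 22 is an assumption for the example.

```r
## Hypothetical helper: compute the four coordinates from a position and a distance.
primer_coords <- function(third, distance, primer_length = 22) {
  second <- third - distance
  list(
    "start" = second - primer_length,
    "second" = second,
    "third" = third,
    "end" = third + primer_length)
}
## First candidate from the list above: LpaL13-15, position 238433, distance 448.
first_coords <- primer_coords(third = 238433, distance = 448)
unlist(first_coords)
```

For the first candidate this gives Start 237963, Second 237985, Third 238433, and End 238455.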
In each instance, these are the last positions, so we want to grab three things:
## * LpaL13-15 238433 448
first_candidate_chr <- lp_genome[["LpaL13_15"]]
primer_length <- 22
amplicon_length <- 448
first_candidate_third <- 238433
first_candidate_second <- first_candidate_third - amplicon_length
first_candidate_start <- first_candidate_second - primer_length
first_candidate_end <- first_candidate_third + primer_length
first_candidate_region <- subseq(first_candidate_chr, first_candidate_start, first_candidate_end)
first_candidate_region
first_candidate_5p <- subseq(first_candidate_chr, first_candidate_start, first_candidate_second)
as.character(first_candidate_5p)
first_candidate_3p <- spgs::reverseComplement(subseq(first_candidate_chr, first_candidate_third, first_candidate_end))
first_candidate_3p
## * LpaL13-18 142844 613
second_candidate_chr <- lp_genome[["LpaL13_18"]]
primer_length <- 22
amplicon_length <- 613
second_candidate_third <- 142844
second_candidate_second <- second_candidate_third - amplicon_length
second_candidate_start <- second_candidate_second - primer_length
second_candidate_end <- second_candidate_third + primer_length
second_candidate_region <- subseq(second_candidate_chr, second_candidate_start, second_candidate_end)
second_candidate_region
second_candidate_5p <- subseq(second_candidate_chr, second_candidate_start, second_candidate_second)
as.character(second_candidate_5p)
second_candidate_3p <- spgs::reverseComplement(subseq(second_candidate_chr, second_candidate_third, second_candidate_end))
second_candidate_3p
## * LpaL13-29 830342 252
third_candidate_chr <- lp_genome[["LpaL13_29"]]
primer_length <- 22
amplicon_length <- 252
third_candidate_third <- 830342
third_candidate_second <- third_candidate_third - amplicon_length
third_candidate_start <- third_candidate_second - primer_length
third_candidate_end <- third_candidate_third + primer_length
third_candidate_region <- subseq(third_candidate_chr, third_candidate_start, third_candidate_end)
third_candidate_region
third_candidate_5p <- subseq(third_candidate_chr, third_candidate_start, third_candidate_second)
as.character(third_candidate_5p)
third_candidate_3p <- spgs::reverseComplement(subseq(third_candidate_chr, third_candidate_third, third_candidate_end))
third_candidate_3p
## You are a garbage polypyrimidine tract.
## Which is actually interesting if the mutations mess it up.
## * LpaL13-33 1331507 843
fourth_candidate_chr <- lp_genome[["LpaL13_33"]]
primer_length <- 22
amplicon_length <- 843
fourth_candidate_third <- 1331507
fourth_candidate_second <- fourth_candidate_third - amplicon_length
fourth_candidate_start <- fourth_candidate_second - primer_length
fourth_candidate_end <- fourth_candidate_third + primer_length
fourth_candidate_region <- subseq(fourth_candidate_chr, fourth_candidate_start, fourth_candidate_end)
fourth_candidate_region
fourth_candidate_5p <- subseq(fourth_candidate_chr, fourth_candidate_start, fourth_candidate_second)
as.character(fourth_candidate_5p)
fourth_candidate_3p <- spgs::reverseComplement(subseq(fourth_candidate_chr, fourth_candidate_third, fourth_candidate_end))
fourth_candidate_3p
I made a fun little function which should find regions which have lots of variants associated with a given experimental factor.
## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'subset_expt': object 'lp_expt' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'subset_expt': object 'pheno' not found
## Error in count_expt_snps(pheno, annot_column = "freebayessummary", snp_column = "PAIRED"): could not find function "count_expt_snps"
I cannot run the following block in the container unless/until I copy the gff into it…
fun_stuff <- snp_density_primers(
pheno_snps,
bsgenome = "BSGenome.Leishmania.panamensis.MHOMCOL81L13.v53",
gff = "reference/TriTrypDB-53_LpanamensisMHOMCOL81L13.gff")
drop_scaffolds <- grepl(x = rownames(fun_stuff$favorites), pattern = "SCAF")
favorite_primer_regions <- fun_stuff[["favorites"]][!drop_scaffolds, ]
favorite_primer_regions[["bin"]] <- rownames(favorite_primer_regions)
favorite_primer_regions <- favorite_primer_regions %>%
  relocate(bin)
Here is my note from our meeting:
Cross reference primers to DE genes of 2.2/2.3 and/or resistance/susceptible, add a column to the primer spreadsheet with the DE genes (in retrospect I am guessing this actually means to put the logFC as a column).
One nice thing, I did a semantic removal on the lp_expt, so the set of logFC/pvalues should not have any of the offending types; thus I should be able to automagically get rid of them in the merge.
This block needs to go after differential expression analyses.
logfc <- zy_table_sva[["data"]][["z23_vs_z22"]]
logfc_columns <- logfc[, c("deseq_logfc", "deseq_adjp")]
colnames(logfc_columns) <- c("z23_logfc", "z23_adjp")
new_table <- merge(favorite_primer_regions, logfc_columns,
by.x = "closest_gene_before_id", by.y = "row.names")
sus <- sus_table_sva[["data"]][["sensitive_vs_resistant"]]
sus_columns <- sus[, c("deseq_logfc", "deseq_adjp")]
colnames(sus_columns) <- c("sus_logfc", "sus_adjp")
new_table <- merge(new_table, sus_columns,
by.x = "closest_gene_before_id", by.y = "row.names") %>%
relocate(bin)
written <- write_xlsx(data = new_table,
                      excel = "excel/favorite_primers_xref_zy_sus.xlsx")
We can cross reference the variants against the zymodeme status and plot a heatmap of the results and hopefully see how they separate.
## Error in snps_vs_genes(lp_expt, new_sets, expt_name_col = "chromosome"): unused argument (expt_name_col = "chromosome")
clinical_colors_v2 <- list(
"z22" = "#0000cc",
"z23" = "#cc0000")
new_zymo_norm <- normalize_expt(pruned_snps, norm = "quant") %>%
  set_expt_conditions(fact = "zymodemecategorical") %>%
  set_expt_colors(clinical_colors_v2)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'pData': error in evaluating the argument 'object' in selecting a method for function 'pData': error in evaluating the argument 'input' in selecting a method for function 'state': object 'pruned_snps' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input_data' in selecting a method for function 'plot_heatmap': object 'new_zymo_norm' not found
## Error: object 'zymo_heat' not found
## Error: object 'zymo_heat' not found
Now let us try to make a heatmap which includes some of the annotation data.
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'pData': object 'both_norm' not found
## Error: object 'des' not found
## Error: object 'des' not found
##hmcols <- colorRampPalette(c("yellow","black","darkblue"))(256)
correlations <- hpgl_cor(exprs(both_norm))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'exprs': object 'both_norm' not found
## Error: object 'correlations' not found
## Error: object 'correlations' not found
## Make an initial heatmap via plot_disheat, which may get used as the figure:
initial_snps <- set_expt_conditions(both_norm, fact = "zymodemereference", colors = color_choices[["strain"]])
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'pData': object 'both_norm' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'input_data' in selecting a method for function 'plot_heatmap': object 'both_norm' not found
dev <- pp(file = "figures/initial_snp_heatmap.pdf", width = 20, height = 20)
initial_disheat[["plot"]]
## Error: object 'initial_disheat' not found
## Error: object 'zymo_heat' not found
## Error: object 'des' not found
## Error: object 'des' not found
## Error: object 'des' not found
## Error: object 'des' not found
mydendro <- list(
"clustfun" = hclust,
"lwd" = 2.0)
col_data <- as.data.frame(des[, c("zymodemecategorical")])
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'des' not found
## Error: object 'des' not found
## Error: object 'col_data' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'des' not found
## Error: object 'row_data' not found
## Error: object 'row_data' not found
## Error: object 'col_data' not found
myclust <- list("cuth" = 1.0,
"col" = BrewerClusterCol)
mylabs <- list(
"Row" = list("nrow" = 4),
"Col" = list("nrow" = 4))
hmcols <- colorRampPalette(c("darkblue", "beige"))(240)
zymo_annot_heat <- annHeatmap2(
  correlations,
  dendrogram = mydendro,
  annotation = myannot,
  cluster = myclust,
  labels = mylabs,
  ## The following controls if the picture is symmetric
  scale = "none",
  col = hmcols)
## Error: object 'correlations' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'plot': object 'zymo_annot_heat' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'plot': object 'zymo_annot_heat' not found
Print the larger heatmap so that all the labels appear. Keep in mind that as we get more samples, this image needs to continue getting bigger.
I cannot run the following block until/unless I install cmplot in the container. Oh, I did! Let us run it and see what happens.
## Error: object 'pheno_snps' not found
## Error: object 'pheno_snps' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'exprs': object 'pheno_snps' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'rownames': error in evaluating the argument 'object' in selecting a method for function 'exprs': object 'pheno_snps' not found
for (n in names(xref_prop)) {
  new_tbl[[n]] <- 0
  idx_cols <- which(pheno_snps[["conditions"]] == n)
  prop_col <- rowSums(idx_tbl[, idx_cols]) / xref_prop[n]
  new_tbl[n] <- prop_col
}
## Error: object 'xref_prop' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'grepl': error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'new_tbl' not found
## Error: object 'new_tbl' not found
## Error: object 'new_tbl' not found
## Error: object 'new_tbl' not found
## Error: object 'new_tbl' not found
## Error: object 'new_tbl' not found
## Error: object 'new_tbl' not found
## Error: object 'new_tbl' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'new_tbl' not found
new_tbl[["Chromosome"]] <- gsub(x = new_tbl[["SNP"]], pattern = "chr_(.*)_pos_.*", replacement = "\\1")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'gsub': object 'new_tbl' not found
new_tbl[["Position"]] <- gsub(x = new_tbl[["SNP"]], pattern = ".*_pos_(\\d+)_.*", replacement = "\\1")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'gsub': object 'new_tbl' not found
## Error: object 'new_tbl' not found
## Error: object 'new_tbl' not found
## Error: object 'simplify' not found
CMplot(new_tbl, bin.size = 10000, threshold = c(0.01, 0.05), plot.type = "d",
       file.name = "variant_density_10k")
## Error: object 'new_tbl' not found
CMplot(new_tbl, bin.size = 1000, threshold = c(0.01, 0.05), plot.type = "d",
       file.name = "variant_density_1k")
## Error: object 'new_tbl' not found
CMplot(new_tbl, bin.size = 100000, threshold = c(0.01, 0.05), plot.type = "d",
       file.name = "variant_density_100k")
## Error: object 'new_tbl' not found
CMplot(new_tbl, plot.type = "m", multracks = TRUE, threshold = c(0.01, 0.05),
       threshold.lwd = c(1,1), threshold.col = c("black","grey"),
       amplify = TRUE, bin.size = 1000,
       chr.den.col = c("darkgreen", "yellow", "red"),
       signal.col = c("red", "green", "blue"),
       signal.cex = 1, file = "jpg", dpi = 300, file.output = TRUE, verbose = TRUE)
## Error: object 'new_tbl' not found
I have been a bit frustrated with the clunkiness of cmplot, so I did some reading and found autoplot. It makes use of g/iranges to plot arbitrary data and as such has the potential to be significantly more generally useful than cmplot. I think I will be able to use it to view a lot of interesting different data types. In this instance I want to plot density of variants associated with various conditions in the data (z2.3/z2.2, cure/fail, whatever). In addition, it might be nice to have the ORFs displayed in some fashion (space permitting).
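Before involving any plotting package, the density itself is just a count of variants per fixed-width bin per condition. Here is a base R sketch of that counting on toy data; the table `toy_snps`, its columns, and the 1,000 nt bin size are assumptions made for the example, and it deliberately uses no g/iranges objects.

```r
## Hypothetical variant positions with the condition each one is associated with.
toy_snps <- data.frame(
  chr = c("chr01", "chr01", "chr01", "chr02", "chr02"),
  pos = c(150, 980, 1200, 40, 5100),
  condition = c("z23", "z23", "z22", "z23", "z23"))
bin_size <- 1000
## Start coordinate of the bin containing each position (1-based coordinates).
toy_snps[["bin_start"]] <- ((toy_snps[["pos"]] - 1) %/% bin_size) * bin_size + 1
## Count the variants in each chromosome/bin/condition combination.
bin_counts <- aggregate(pos ~ chr + bin_start + condition,
                        data = toy_snps, FUN = length)
colnames(bin_counts)[colnames(bin_counts) == "pos"] <- "num_variants"
bin_counts
```

Only bins which contain at least one variant appear in the result; bins with zero variants would need to be filled in from the chromosome lengths before plotting a continuous track.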
## Error in get_eupath_entry(species = "MHOM/COL", metadata = meta): could not find function "get_eupath_entry"
## These lines cannot run in the container because it cannot write
##txdb_pkgname <- make_eupath_txdb(lp_entry)
##grange_name <- make_eupath_granges(lp_entry)
grange_name <- gsub(x = lp_entry[["GrangesPkg"]], pattern = "\\.rda$", replacement = "")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'gsub': object 'lp_entry' not found
## Error: object 'lp_entry' not found
if (file.exists(grange_filename)) {
  load(grange_filename)
} else {
  created <- dir.create("build/gff", recursive = TRUE)
  grange_build <- make_eupath_granges(lp_entry)
  grange_filename <- grange_build[["rda"]]
  load(grange_filename)
}
## Error: object 'grange_filename' not found
## Error: object 'grange_name' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'grepl': error in evaluating the argument 'x' in selecting a method for function 'seqnames': object 'grange_data' not found
## Error: object 'grange_data' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'grepl': error in evaluating the argument 'x' in selecting a method for function 'seqinfo': object 'grange_data' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'seqinfo': object 'grange_data' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'seqinfo': object 'grange_data' not found
## Error: object 'new_tbl' not found
## Error: object 'auto_tbl' not found
## Error: object 'auto_tbl' not found
## Error: object 'auto_tbl' not found
tilesize <- 1000
bins_1k <- GenomicRanges::tileGenome(seqlengths(no_scaffolds), tilewidth = 1000,
                                     cut.last.tile.in.chrom = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'seqlengths': object 'no_scaffolds' not found
bins_5k <- GenomicRanges::tileGenome(seqlengths(no_scaffolds), tilewidth = 5000,
                                     cut.last.tile.in.chrom = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'seqlengths': object 'no_scaffolds' not found
bins_10k <- GenomicRanges::tileGenome(seqlengths(no_scaffolds), tilewidth = 10000,
                                      cut.last.tile.in.chrom = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'seqlengths': object 'no_scaffolds' not found
bins_1nt <- GenomicRanges::tileGenome(seqlengths(no_scaffolds), tilewidth = 1,
                                      cut.last.tile.in.chrom = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'seqlengths': object 'no_scaffolds' not found
## Error: object 'auto_tbl' not found
## I want to calculate the number of intersecting positions between my auto_tbl and the 1k bins.
start <- auto_tbl[, c("Chromosome", "Position", "position2", "strand", "strong23")]
## Error: object 'auto_tbl' not found
## Error in `colnames<-`(`*tmp*`, value = c("chr", "start", "end", "strand", : attempt to set 'colnames' on an object with less than two dimensions
## Error in (function (cond) : error in evaluating the argument 'x' in selecting a method for function 'gsub': object of type 'closure' is not subsettable
## Error: object 'no_scaffolds' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'query' in selecting a method for function 'findOverlaps': object 'bins_1k' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'bins_1k' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'vars_per_bin_numeric' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'vars_per_bin' not found
## Error: object 'count_per_bin' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'merge': object 'vars_per_bin_numeric' not found
## Error: object 'vars_per_bin_numeric' not found
## Error: object 'vars_per_bin_numeric' not found
## Error: object 'vars_per_bin_numeric' not found
vpb_grange <- makeGRangesFromDataFrame(vars_per_bin, seqinfo = no_scaffolds, keep.extra.columns = TRUE)
## Error: object 'vars_per_bin' not found
kary <- autoplot(vpb_grange, layout = "karyogram", aes(color = num, fill = num)) +
  scale_color_gradient(low = "blue", high = "red") +
  scale_fill_gradient(low = "blue", high = "red")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'autoplot': object 'vpb_grange' not found
## Error: object 'kary' not found
## png
## 2
var_kary <- ggbio() +
  layout_karyogram(vpb_grange, aes(color = num, fill = num)) +
  scale_fill_gradient(low = "blue", high = "white") +
  scale_color_gradient(low = "blue", high = "white") +
  theme_bw(base_size = 10)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'layout_karyogram': object 'vpb_grange' not found
## Error: object 'var_kary' not found
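The karyogram calls above color each bin by a `num` column, the number of variants falling in that bin. A minimal sketch of that binning step on toy data follows. It assumes made-up chromosome lengths and variant positions, and it uses countOverlaps() as a shortcut rather than the findOverlaps()-based bookkeeping the hidden chunk appears to use.

```r
library(GenomicRanges)

## Toy chromosome lengths; the real values would come from seqlengths(no_scaffolds).
chr_lengths <- c(chr1 = 5000, chr2 = 3000)

## Tile each chromosome into 1 kb bins, truncating the last bin at the chromosome end.
bins_1k <- tileGenome(chr_lengths, tilewidth = 1000,
                      cut.last.tile.in.chrom = TRUE)

## Four made-up single-nucleotide variants.
variants <- GRanges(seqnames = c("chr1", "chr1", "chr1", "chr2"),
                    ranges = IRanges(start = c(10, 950, 1001, 2999), width = 1))

## Count the variants overlapping each bin and store the result as 'num',
## the column name used in the aes() mappings.
bins_1k$num <- countOverlaps(bins_1k, variants)

as.data.frame(bins_1k)[, c("seqnames", "start", "end", "num")]
```

Because the counts live in the metadata of the bin GRanges, the same object can be handed to autoplot() or layout_karyogram() without an intermediate merge.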
MatrixEQTL looks a little opaque, but it provides sample data that makes sense to me and should be fairly easy to recapitulate with our data.
## For this, let us use the 'new_snps' data structure.
## Caveat here: these need to be coerced to numbers.
my_covariates <- pData(new_snps)[, c("zymodemecategorical", "clinicalcategorical")]
for (col in colnames(my_covariates)) {
my_covariates[[col]] <- as.numeric(as.factor(my_covariates[[col]]))
}
my_covariates <- t(my_covariates)
my_geneloc <- fData(lp_expt)[, c("gid", "chromosome", "start", "end")]
colnames(my_geneloc) <- c("geneid", "chr", "left", "right")
my_ge <- exprs(normalize_expt(lp_expt, transform = "log2", filter = TRUE, convert = "cpm"))
used_samples <- tolower(colnames(my_ge)) %in% colnames(exprs(new_snps))
my_ge <- my_ge[, used_samples]
my_snpsloc <- data.frame(rownames = rownames(exprs(new_snps)))
## Oh, caveat here: Because of the way I stored the data,
## I could have duplicate rows which presumably will make matrixEQTL sad
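## The variant IDs live in the 'rownames' column (the data frame's actual
## row names are just 1..n), so parse the chromosome and position from there.
my_snpsloc[["chr"]] <- gsub(pattern = "^chr_(.+)_pos(.+)_ref_.*$", replacement = "\\1",
                            x = my_snpsloc[["rownames"]])
my_snpsloc[["pos"]] <- gsub(pattern = "^chr_(.+)_pos(.+)_ref_.*$", replacement = "\\2",
                            x = my_snpsloc[["rownames"]])
## Compare only chromosome and position; the ID column is unique by construction.
test <- duplicated(my_snpsloc[, c("chr", "pos")])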
## Each duplicated row would be another variant at that position;
## so in theory we would do a rle to number them I am guessing
## However, I do not have different variants so I think I can ignore this for the moment
## but will need to make my matrix either 0 or 1.
if (sum(test) > 0) {
  ## Report the number of flagged rows; 'duplicated' itself is a function and cannot be summed.
  message("There are: ", sum(test), " duplicated entries.")
  keep_idx <- ! test
  my_snpsloc <- my_snpsloc[keep_idx, ]
}
my_snps <- exprs(new_snps)
one_idx <- my_snps > 0
my_snps[one_idx] <- 1
## Ok, at this point I think I have all the pieces which this method wants...
## Oh, no I guess not; it actually wants the data as a set of filenames...
library(MatrixEQTL)
write.table(my_snps, "eqtl/snps.tsv", na = "NA", col.names = TRUE, row.names = TRUE, sep = "\t", quote = TRUE)
## readr::write_tsv(my_snps, "eqtl/snps.tsv", )
write.table(my_snpsloc, "eqtl/snpsloc.tsv", na = "NA", col.names = TRUE, row.names = TRUE, sep = "\t", quote = TRUE)
## readr::write_tsv(my_snpsloc, "eqtl/snpsloc.tsv")
write.table(as.data.frame(my_ge), "eqtl/ge.tsv", na = "NA", col.names = TRUE, row.names = TRUE, sep = "\t", quote = TRUE)
## readr::write_tsv(as.data.frame(my_ge), "eqtl/ge.tsv")
write.table(as.data.frame(my_geneloc), "eqtl/geneloc.tsv", na = "NA", col.names = TRUE, row.names = TRUE, sep = "\t", quote = TRUE)
## readr::write_tsv(as.data.frame(my_geneloc), "eqtl/geneloc.tsv")
write.table(as.data.frame(my_covariates), "eqtl/covariates.tsv", na = "NA", col.names = TRUE, row.names = TRUE, sep = "\t", quote = TRUE)
## readr::write_tsv(as.data.frame(my_covariates), "eqtl/covariates.tsv")
useModel = modelLINEAR # modelANOVA, modelLINEAR, or modelLINEAR_CROSS
# Genotype file name
SNP_file_name = "eqtl/snps.tsv"
snps_location_file_name = "eqtl/snpsloc.tsv"
expression_file_name = "eqtl/ge.tsv"
gene_location_file_name = "eqtl/geneloc.tsv"
covariates_file_name = "eqtl/covariates.tsv"
# Output file name
output_file_name_cis = tempfile()
output_file_name_tra = tempfile()
# Only associations significant at this level will be saved
pvOutputThreshold_cis = 0.1
pvOutputThreshold_tra = 0.1
# Error covariance matrix
# Set to numeric() for identity.
errorCovariance = numeric()
# errorCovariance = read.table("Sample_Data/errorCovariance.txt");
# Distance for local gene-SNP pairs
cisDist = 1e6
## Load genotype data
snps = SlicedData$new()
snps$fileDelimiter = "\t" # the TAB character
snps$fileOmitCharacters = "NA" # denote missing values;
snps$fileSkipRows = 1 # one row of column labels
snps$fileSkipColumns = 1 # one column of row labels
snps$fileSliceSize = 2000 # read file in slices of 2,000 rows
snps$LoadFile(SNP_file_name)
## Load gene expression data
gene = SlicedData$new()
gene$fileDelimiter = "\t" # the TAB character
gene$fileOmitCharacters = "NA" # denote missing values;
gene$fileSkipRows = 1 # one row of column labels
gene$fileSkipColumns = 1 # one column of row labels
gene$fileSliceSize = 2000 # read file in slices of 2,000 rows
gene$LoadFile(expression_file_name)
## Load covariates
cvrt = SlicedData$new()
cvrt$fileDelimiter = "\t" # the TAB character
cvrt$fileOmitCharacters = "NA" # denote missing values;
cvrt$fileSkipRows = 1 # one row of column labels
cvrt$fileSkipColumns = 1 # one column of row labels
if(length(covariates_file_name) > 0) {
cvrt$LoadFile(covariates_file_name)
}
## Run the analysis
snpspos = read.table(snps_location_file_name, header = TRUE, stringsAsFactors = FALSE)
genepos = read.table(gene_location_file_name, header = TRUE, stringsAsFactors = FALSE)
me = Matrix_eQTL_main(
snps = snps,
gene = gene,
cvrt = cvrt,
output_file_name = output_file_name_tra,
pvOutputThreshold = pvOutputThreshold_tra,
useModel = useModel,
errorCovariance = errorCovariance,
verbose = TRUE,
output_file_name.cis = output_file_name_cis,
pvOutputThreshold.cis = pvOutputThreshold_cis,
snpspos = snpspos,
genepos = genepos,
cisDist = cisDist,
pvalue.hist = "qqplot",
min.pv.by.genesnp = FALSE,
noFDRsaveMemory = FALSE);
## Warning: Your system is mis-configured: '/etc/localtime' is not a symlink
## Warning: It is strongly recommended to set envionment variable TZ to
## 'America/New_York' (or equivalent)
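The location tables handed to Matrix_eQTL_main() above are read back from TSV files, and MatrixEQTL documents `snpspos` as a three-column data frame (SNP name, chromosome, position). The following base-R sketch shows one way to build, deduplicate, and round-trip such a table. The variant IDs, sample names, and counts are invented, and the ID format is only assumed to match the regular expression used earlier.

```r
## Hypothetical variant IDs following the "chr_<chromosome>_pos<position>_ref_..." pattern.
variant_ids <- c("chr_LpaL13-01_pos100_ref_A_alt_G",
                 "chr_LpaL13-01_pos100_ref_A_alt_T",
                 "chr_LpaL13-02_pos2500_ref_C_alt_T")
id_pattern <- "^chr_(.+)_pos(.+)_ref_.*$"

## Keep the ID as an explicit first column so the file has three named columns.
snp_loc <- data.frame(
  snpid = variant_ids,
  chr = sub(id_pattern, "\\1", variant_ids),
  pos = as.numeric(sub(id_pattern, "\\2", variant_ids)),
  stringsAsFactors = FALSE)

## Drop additional variants at an already-seen chromosome/position pair.
dup <- duplicated(snp_loc[, c("chr", "pos")])
snp_loc <- snp_loc[!dup, ]

## Toy count matrix (variants x samples), reduced to the kept rows and to 0/1.
snp_counts <- matrix(c(0, 3, 12, 0, 0, 7), nrow = 3,
                     dimnames = list(variant_ids, c("s1", "s2")))
snp_binary <- (snp_counts[snp_loc[["snpid"]], , drop = FALSE] > 0) * 1

## Write without row names so the header and data rows have the same width,
## then read the table back the way the analysis chunk does.
loc_file <- tempfile(fileext = ".tsv")
write.table(snp_loc, loc_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = TRUE)
snp_loc_read <- read.table(loc_file, header = TRUE, sep = "\t",
                           stringsAsFactors = FALSE)
```

Filtering the count matrix with the same IDs as the location table is a choice made for this sketch; it keeps the two inputs in sync after deduplication.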
R version 4.5.0 (2025-04-11)
Platform: x86_64-pc-linux-gnu
locale: C
attached base packages: stats4, stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: ggbio(v.1.56.1), GenomicRanges(v.1.60.0), GenomeInfoDb(v.1.44.2), IRanges(v.2.42.0), S4Vectors(v.0.46.0), BiocGenerics(v.0.54.0), generics(v.0.1.4), glue(v.1.8.0), CMplot(v.4.5.1), dplyr(v.1.1.4), ggplot2(v.4.0.0), hpgltools(v.1.2) and Heatplus(v.3.16.0)
loaded via a namespace (and not attached): splines(v.4.5.0), later(v.1.4.3), BiocIO(v.1.18.0), bitops(v.1.0-9), filelock(v.1.0.3), tibble(v.3.3.0), R.oo(v.1.27.1), graph(v.1.86.0), XML(v.3.99-0.19), rpart(v.4.1.24), lifecycle(v.1.0.4), httr2(v.1.2.1), lattice(v.0.22-7), ensembldb(v.2.32.0), OrganismDbi(v.1.50.0), backports(v.1.5.0), magrittr(v.2.0.4), openxlsx(v.4.2.8), Hmisc(v.5.2-4), plotly(v.4.11.0), sass(v.0.4.10), rmarkdown(v.2.29), jquerylib(v.0.1.4), yaml(v.2.3.10), httpuv(v.1.6.16), zip(v.2.3.3), cowplot(v.1.2.0), DBI(v.1.2.3), RColorBrewer(v.1.1-3), abind(v.1.4-8), purrr(v.1.1.0), R.utils(v.2.13.0), AnnotationFilter(v.1.32.0), biovizBase(v.1.56.0), RCurl(v.1.98-1.17), yulab.utils(v.0.2.1), nnet(v.7.3-20), VariantAnnotation(v.1.54.1), rappdirs(v.0.3.3), GenomeInfoDbData(v.1.2.14), annotate(v.1.86.1), codetools(v.0.2-20), DelayedArray(v.0.34.1), DOSE(v.4.2.0), xml2(v.1.4.0), tidyselect(v.1.2.1), UCSC.utils(v.1.4.0), farver(v.2.1.2), matrixStats(v.1.5.0), BiocFileCache(v.2.16.1), base64enc(v.0.1-3), GenomicAlignments(v.1.44.0), jsonlite(v.2.0.0), Formula(v.1.2-5), iterators(v.1.0.14), foreach(v.1.5.2), tools(v.4.5.0), progress(v.1.2.3), Rcpp(v.1.1.0), gridExtra(v.2.3), SparseArray(v.1.8.1), xfun(v.0.53), qvalue(v.2.40.0), MatrixGenerics(v.1.20.0), withr(v.3.0.2), BiocManager(v.1.30.26), fastmap(v.1.2.0), GGally(v.2.4.0), digest(v.0.6.37), R6(v.2.6.1), mime(v.0.13), colorspace(v.2.1-2), GO.db(v.3.21.0), dichromat(v.2.0-0.1), biomaRt(v.2.64.0), RSQLite(v.2.4.3), R.methodsS3(v.1.8.2), tidyr(v.1.3.1), data.table(v.1.17.8), rtracklayer(v.1.68.0), prettyunits(v.1.2.0), httr(v.1.4.7), htmlwidgets(v.1.6.4), S4Arrays(v.1.8.1), ggstats(v.0.11.0), pkgconfig(v.2.0.3), gtable(v.0.3.6), blob(v.1.2.4), S7(v.0.2.0), XVector(v.0.48.0), htmltools(v.0.5.8.1), fgsea(v.1.34.2), RBGL(v.1.84.0), GSEABase(v.1.70.0), ProtGenerics(v.1.40.0), scales(v.1.4.0), Biobase(v.2.68.0), png(v.0.1-8), knitr(v.1.50), rstudioapi(v.0.17.1), reshape2(v.1.4.4), rjson(v.0.2.23), checkmate(v.2.3.3), 
curl(v.7.0.0), cachem(v.1.1.0), stringr(v.1.5.1), parallel(v.4.5.0), foreign(v.0.8-90), AnnotationDbi(v.1.70.0), restfulr(v.0.0.16), pillar(v.1.11.0), grid(v.4.5.0), vctrs(v.0.6.5), promises(v.1.3.3), dbplyr(v.2.5.0), xtable(v.1.8-4), cluster(v.2.1.8.1), htmlTable(v.2.4.3), evaluate(v.1.0.4), GenomicFeatures(v.1.60.0), cli(v.3.6.5), compiler(v.4.5.0), Rsamtools(v.2.24.0), rlang(v.1.1.6), crayon(v.1.5.3), plyr(v.1.8.9), fs(v.1.6.6), pander(v.0.6.6), stringi(v.1.8.7), viridisLite(v.0.4.2), BiocParallel(v.1.42.1), txdbmaker(v.1.4.2), Biostrings(v.2.76.0), lazyeval(v.0.2.2), GOSemSim(v.2.34.0), Matrix(v.1.7-3), BSgenome(v.1.76.0), hms(v.1.1.3), bit64(v.4.6.0-1), KEGGREST(v.1.48.1), shiny(v.1.11.1), SummarizedExperiment(v.1.38.1), broom(v.1.0.10), memoise(v.2.0.1), bslib(v.0.9.0), fastmatch(v.1.1-6) and bit(v.4.6.0)
## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset 5043179aa73cd71040a7ba82276b0cf26cc661bd
## This is hpgltools commit: Mon Oct 6 12:01:25 2025 -0400: 5043179aa73cd71040a7ba82276b0cf26cc661bd