This document collects all of the visualization and diagnostic tasks; the metadata and gene annotation collection tasks now live in tmrc3_data_structures.Rmd. The reasons for some of the data structures created in that document are made clear here.
Thus lesion size is the more inclusive metric, though ulcer size may be more informative. Any inflammation of the skin causes the patient to be classified as a treatment failure.
These samples are from patients who either did or did not clear a Leishmania panamensis infection following treatment. They include biopsies from each patient along with purified monocytes, neutrophils, and eosinophils. When possible, this process was repeated over three visits, but some patients did not return for the second or third visit.
The over-arching goal is to look for attributes (most likely genes) that distinguish patients who do and do not cure the infection after treatment. Ideally, these will be apparent at the first visit.
plot_legend(hs_expt)
## The colors used in the expressionset are: #7570B3, #1B9E77, #D95F02.
plot_nonzero(hs_expt)
## The following samples have less than 12968.8 genes.
## [1] "TMRC30010" "TMRC30140" "TMRC30280" "TMRC30284" "TMRC30050" "TMRC30056"
## [7] "TMRC30052" "TMRC30058" "TMRC30031" "TMRC30038" "TMRC30265"
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## A non-zero genes plot of 210 samples.
## These samples have an average 45.18 CPM coverage and 14385 genes observed, ranging from 7647 to
## 16739.
## Warning: ggrepel: 194 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
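To make the "less than N genes" report concrete, here is a minimal base-R sketch of the underlying idea: count the genes with at least one read in each sample and flag the samples that fall below a cutoff. The toy matrix is simulated, and the cutoff (80% of the best-covered sample) is an assumption for illustration only; I have not checked how plot_nonzero() arrives at 12968.8.

```r
set.seed(1)
# Simulated counts: 2000 genes (rows) x 6 samples (columns).
counts <- matrix(rpois(2000 * 6, lambda = 2), nrow = 2000,
                 dimnames = list(paste0("gene", 1:2000), paste0("sample", 1:6)))
# Zero out most of one sample so that it behaves like a poorly sequenced library.
counts[sample(2000, 1200), "sample6"] <- 0

# Number of genes observed (at least one read) in each sample.
nonzero <- colSums(counts > 0)
# Library size in millions of reads, for plotting against the gene count.
libsize_millions <- colSums(counts) / 1e6

# Hypothetical cutoff: 80% of the best-covered sample.
cutoff <- 0.8 * max(nonzero)
low_samples <- names(nonzero)[nonzero < cutoff]
print(low_samples)
```

Plotting `nonzero` against `libsize_millions` gives the general shape of a non-zero genes plot: poorly covered libraries sit low and to the left.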
The following plot is essentially identical to the previous one with two exceptions: it uses only the samples in tc_valid (184 rather than 210), and it omits the sample labels.
plot_nonzero(tc_valid, plot_labels = FALSE)
## The following samples have less than 12968.8 genes.
## [1] "TMRC30140" "TMRC30280" "TMRC30284" "TMRC30056" "TMRC30058" "TMRC30031"
## [7] "TMRC30265"
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Not putting labels on the plot.
##
## A non-zero genes plot of 184 samples.
## These samples have an average 48.79 CPM coverage and 14466 genes observed, ranging from 11448 to
## 16739.
Maria Adelaida’s quote: “I would like one picture of all samples including the miltefosine so that I can keep in my mind why we removed them.”
The following block will illustrate why we chose to remove the samples which were treated with miltefosine. The short reason: too few samples. The slightly longer reason: miltefosine has a different mode of action.
tc_expt_norm <- normalize_expt(hs_expt, filter = TRUE, norm = "quant",
convert = "cpm", transform = "log2") %>%
set_expt_batches(fact = "drug")
## Removing 5168 low-count genes (14784 remaining).
## transform_counts: Found 858 values equal to 0, adding 1 to the matrix.
## The number of samples by batch are:
##
## antimony miltefosine
## 202 8
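For readers unfamiliar with normalize_expt(), the following sketch walks through the same four operations in base R on simulated counts: a low-count filter, quantile normalization, conversion to CPM, and log2(x + 1). The filter rule (at least 1 CPM in at least 2 samples) and the helper names cpm() and quantile_normalize() are hypothetical; normalize_expt()'s own defaults and internals may differ.

```r
set.seed(2)
# Simulated counts: 500 genes x 4 samples; the first 100 genes are unexpressed.
counts <- matrix(rnbinom(500 * 4, mu = 50, size = 2), nrow = 500,
                 dimnames = list(paste0("gene", 1:500), paste0("s", 1:4)))
counts[1:100, ] <- 0

# Counts per million: divide each sample (column) by its total, times 1e6.
cpm <- function(mat) t(t(mat) / colSums(mat)) * 1e6

# 1. Low-count filter (assumed rule: >= 1 CPM in >= 2 samples).
keep <- rowSums(cpm(counts) >= 1) >= 2
filtered <- counts[keep, , drop = FALSE]

# 2. Quantile normalization: every sample receives the same value distribution,
#    namely the mean of the per-sample sorted values, assigned by rank.
quantile_normalize <- function(mat) {
  sorted_means <- rowMeans(apply(mat, 2, sort))
  ranks <- apply(mat, 2, rank, ties.method = "first")
  out <- apply(ranks, 2, function(r) sorted_means[r])
  dimnames(out) <- dimnames(mat)
  out
}
normed <- quantile_normalize(filtered)

# 3. Convert to CPM, then 4. log2 with a pseudocount of 1
#    (compare the "adding 1 to the matrix" message above).
log_cpm <- log2(cpm(normed) + 1)
print(dim(log_cpm))
```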
tc_expt_drug_pca <- plot_pca(tc_expt_norm)
tc_expt_drug_pca
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure, lost
## Shapes are defined by antimony, miltefosine.
tc_expt_nb <- normalize_expt(hs_expt, filter = TRUE, convert = "cpm",
transform = "log2", batch = "svaseq") %>%
set_expt_batches(fact = "drug")
## Removing 5168 low-count genes (14784 remaining).
## Setting 35726 low elements to zero.
## transform_counts: Found 35726 values equal to 0, adding 1 to the matrix.
## The number of samples by batch are:
##
## antimony miltefosine
## 202 8
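svaseq (from the sva package, reached here through normalize_expt(batch = "svaseq")) estimates surrogate variables for unmodeled variation and removes them from the data. The following is a much-simplified sketch of that idea on simulated log-scale data, not the svaseq algorithm itself (which works on log-transformed counts and by default uses an iterative reweighting step): take the residuals after fitting the known condition, use their leading singular vector as a single surrogate variable, and regress it out while keeping the condition effect. Using exactly one surrogate is an assumption, and the final clamp to zero is my guess at what the "Setting N low elements to zero" message refers to.

```r
set.seed(3)
n_genes <- 300
condition <- factor(rep(c("cure", "failure"), each = 6))
hidden_batch <- rep(c(0, 1), times = 6)  # not known to the analysis
# Simulated log-scale expression: genes 1-30 respond to condition,
# genes 31-130 carry an unmodeled batch shift.
y <- matrix(rnorm(n_genes * 12, mean = 5), nrow = n_genes)
failed <- condition == "failure"
y[1:30, failed] <- y[1:30, failed] + 2
y[31:130, ] <- y[31:130, ] + outer(rep(3, 100), hidden_batch)

# 1. Remove the modeled (condition) signal from every gene.
design <- model.matrix(~ condition)
proj <- design %*% solve(crossprod(design)) %*% t(design)
residual_mat <- y - y %*% proj
# 2. The leading right singular vector of the residuals is the surrogate variable.
sv <- svd(residual_mat)$v[, 1, drop = FALSE]
# 3. Fit each gene on condition + surrogate; subtract only the surrogate term.
full <- cbind(design, sv)
coefs <- y %*% full %*% solve(crossprod(full))
adjusted <- y - coefs[, ncol(full), drop = FALSE] %*% t(sv)
# 4. Clamp negative values to zero.
adjusted[adjusted < 0] <- 0
```

Because the surrogate is estimated from residuals orthogonal to the design, removing it leaves the cure/failure difference intact while flattening the hidden batch shift.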
tc_expt_drug_nb_pca <- plot_pca(tc_expt_nb)
tc_expt_drug_nb_pca
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure, lost
## Shapes are defined by antimony, miltefosine.
t_expt_drug <- subset_expt(hs_expt, subset = "clinic=='tumaco'")
## The samples excluded are
## subset_expt(): There were 210, now there are 143 samples.
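subset_expt() takes its selection as a string such as "clinic=='tumaco'". A minimal base-R sketch of how such a string can be applied to sample metadata, assuming the metadata is a plain data.frame (the table below is hypothetical, and I have not checked subset_expt()'s internals):

```r
# Hypothetical sample metadata standing in for the expressionset's pData.
meta <- data.frame(
  sample = paste0("s", 1:6),
  clinic = c("tumaco", "cali", "tumaco", "tumaco", "cali", "tumaco"),
  drug = c("antimony", "antimony", "miltefosine", "antimony", "antimony", "antimony")
)
# Parse the string and evaluate it against the metadata columns.
subset_string <- "clinic=='tumaco'"
keep <- eval(parse(text = subset_string), envir = meta)
tumaco_meta <- meta[keep, ]
message("There were ", nrow(meta), ", now there are ", nrow(tumaco_meta), " samples.")
```

The same pattern accepts compound conditions, e.g. "clinic=='tumaco' & drug=='antimony'".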
t_expt_norm <- normalize_expt(t_expt_drug, filter = TRUE, norm = "quant",
convert = "cpm", transform = "log2") %>%
set_expt_batches(fact = "drug")
## Removing 5698 low-count genes (14254 remaining).
## transform_counts: Found 388 values equal to 0, adding 1 to the matrix.
## The number of samples by batch are:
##
## antimony miltefosine
## 135 8
t_expt_drug_pca <- plot_pca(t_expt_norm)
t_expt_drug_pca
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure, lost
## Shapes are defined by antimony, miltefosine.
t_expt_nb <- normalize_expt(t_expt_drug, filter = TRUE, convert = "cpm",
transform = "log2", batch = "svaseq") %>%
set_expt_batches(fact = "drug")
## Removing 5698 low-count genes (14254 remaining).
## Setting 18887 low elements to zero.
## transform_counts: Found 18887 values equal to 0, adding 1 to the matrix.
## The number of samples by batch are:
##
## antimony miltefosine
## 135 8
t_expt_drug_nb_pca <- plot_pca(t_expt_nb)
t_expt_drug_nb_pca
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure, lost
## Shapes are defined by antimony, miltefosine.
The sets of samples used to visualize the data will also comprise the sets used when later performing the various differential expression analyses.
Start with some initial metrics of all samples. The most obvious are plots of the number of non-zero genes observed, heatmaps showing the relationships among the samples, the relative library sizes, and some PCA. It might be smart to split the library size plots across subsets of the data, because with this many samples they have grown too wide to read well on a computer screen.
The most likely factors to query when considering the entire dataset are cure/fail, visit, and cell type. This is the level at which we will choose samples to exclude from future analyses.
plot_legend(tc_biopsies)
## The colors used in the expressionset are: #1B9E77, #7670B3, #E7298A.
plot_libsize(tc_biopsies)
## Library sizes of 18 samples,
## ranging from 3,592,709 to 35,274,577.
plot_nonzero(tc_biopsies)
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## A non-zero genes plot of 18 samples.
## These samples have an average 14.35 CPM coverage and 15702 genes observed, ranging from 15246 to
## 16366.
plot_libsize_prepost() gives an idea of how much data is lost when the low-count genes are filtered out.
The first plot it produces is a barplot of the number of reads the filter removed from each sample. The second plot has two bars: the top bar is labeled with the number of low-count genes before the filter, and the lower bar represents the number after the filter, which is expected to be quite low.
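The two quantities described above can be sketched in a few lines of base R. The simulated counts and the filter (keep genes with at least 10 reads in at least 3 samples, with "low-count" meaning fewer than 10 reads) are assumptions for illustration, not plot_libsize_prepost()'s actual defaults.

```r
set.seed(4)
# Simulated overdispersed counts: 1000 genes x 5 samples.
counts <- matrix(rnbinom(1000 * 5, mu = 20, size = 0.5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:5)))

# Hypothetical filter: keep genes with >= min_count reads in >= min_samples samples.
min_count <- 10
min_samples <- 3
keep <- rowSums(counts >= min_count) >= min_samples
filtered <- counts[keep, , drop = FALSE]

# First plot: reads removed from each sample by the filter.
reads_removed <- colSums(counts) - colSums(filtered)
# Second plot: low-count genes per sample before and after filtering.
low_before <- colSums(counts < min_count)
low_after <- colSums(filtered < min_count)
barplot(rbind(before = low_before, after = low_after), beside = TRUE,
        legend.text = TRUE, main = "Low-count genes per sample")
```

Genes that survive such a filter can still have a few low-count samples, which is why the "after" bars need not reach zero.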
biopsy_prepost <- plot_libsize_prepost(tc_biopsies)
biopsy_prepost
## A comparison of the counts before and after filtering.
## The number of genes with low coverage changes by NA-NA genes.
## Warning: Using alpha for a discrete variable is not advised.
#biopsy_prepost$count_plot
#biopsy_prepost$lowgene_plot
## Minimum number of biopsy genes: ~ 14,000
plot_libsize(tc_eosinophils)
## Library sizes of 41 samples,
## ranging from 7,223,543 to 252,496,897.
plot_nonzero(tc_eosinophils)
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## A non-zero genes plot of 41 samples.
## These samples have an average 51.77 CPM coverage and 14599 genes observed, ranging from 13052 to
## 16739.
## Warning: ggrepel: 25 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
eosinophil_prepost <- plot_libsize_prepost(tc_eosinophils)
eosinophil_prepost[["count_plot"]]
eosinophil_prepost[["lowgene_plot"]]
## Warning: Using alpha for a discrete variable is not advised.
## Minimum number of eosinophil genes: ~ 13,500
plot_libsize(tc_monocytes)
## Library sizes of 63 samples,
## ranging from 2,922,176 to 260,933,745.
plot_nonzero(tc_monocytes)
## The following samples have less than 12968.8 genes.
## [1] "TMRC30056"
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## A non-zero genes plot of 63 samples.
## These samples have an average 51.28 CPM coverage and 14542 genes observed, ranging from 11448 to
## 16512.
## Warning: ggrepel: 47 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
monocyte_prepost <- plot_libsize_prepost(tc_monocytes)
monocyte_prepost[["count_plot"]]
monocyte_prepost[["lowgene_plot"]]
## Warning: Using alpha for a discrete variable is not advised.
## Minimum number of monocyte genes: ~ 7,500 before setting the minimum.
plot_libsize(tc_neutrophils)
## Library sizes of 62 samples,
## ranging from 4,642,715 to 224,886,922.
plot_nonzero(tc_neutrophils)
## The following samples have less than 12968.8 genes.
## [1] "TMRC30140" "TMRC30280" "TMRC30284" "TMRC30058" "TMRC30031" "TMRC30265"
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## A non-zero genes plot of 62 samples.
## These samples have an average 54.28 CPM coverage and 13941 genes observed, ranging from 11759 to
## 16401.
## Warning: ggrepel: 38 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
neutrophil_prepost <- plot_libsize_prepost(tc_neutrophils)
neutrophil_prepost[["count_plot"]]
neutrophil_prepost[["lowgene_plot"]]
## Warning: Using alpha for a discrete variable is not advised.
## Minimum number of neutrophil genes: ~ 10,000 before setting minimum coverage.
The above blocks repeat the same two plots for each cell type: the number of reads observed per sample, and the number of observed genes with respect to coverage. I added comments with my observations about the number of genes.
Now that those ‘global’ metrics are out of the way, let's look at some global metrics of the data following normalization: mostly PCA, but also a couple of heatmaps.
Over time the preference for which samples to include in a ‘global’ PCA has changed, as well as preferences for how to arrange/label/color them. The following shows a couple of perspectives.
tc_type <- set_expt_conditions(tc_valid, fact = "typeofcells") %>%
set_expt_batches(fact = "finaloutcome") %>%
set_expt_colors(color_choices[["type"]])
## The numbers of samples by condition are:
##
## biopsy eosinophils monocytes neutrophils
## 18 41 63 62
## The number of samples by batch are:
##
## cure failure
## 122 62
tc_norm <- sm(normalize_expt(tc_type, transform = "log2", norm = "quant",
convert = "cpm", filter = TRUE))
tc_pca <- plot_pca(tc_norm, plot_labels = FALSE,
plot_title = "PCA - Cell type", size_column = "visitnumber")
pp(file = "figures/tc_pca_sized.pdf")
tc_pca
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by biopsy, eosinophils, monocytes, neutrophils
## Shapes are defined by cure, failure.
dev.off()
## png
## 2
tc_pca
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by biopsy, eosinophils, monocytes, neutrophils
## Shapes are defined by cure, failure.
tc_pca <- plot_pca(tc_norm, plot_labels = FALSE,
plot_title = "PCA - Cell type")
pp(file = "figures/tc_pca_nosize.pdf")
tc_pca
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by biopsy, eosinophils, monocytes, neutrophils
## Shapes are defined by cure, failure.
dev.off()
## png
## 2
tc_pca
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by biopsy, eosinophils, monocytes, neutrophils
## Shapes are defined by cure, failure.
write.csv(tc_pca[["table"]], file = "excel/tc_donor_pca_coords.csv")
tc_cf_norm <- set_expt_batches(tc_norm, fact = "visitnumber")
## The number of samples by batch are:
##
## 3 2 1
## 51 50 83
tc_cf_corheat <- plot_corheat(tc_cf_norm, plot_title = "Hierarchical clustering:
cell types")
tc_cf_corheat
## A heatmap of pairwise sample correlations ranging from:
## 0.51183235766239 to 0.998356314777661.
tc_cf_disheat <- plot_disheat(tc_cf_norm, plot_title = "Hierarchical clustering:
cell types")
tc_cf_disheat
## A heatmap of pairwise sample distances ranging from:
## 18.6829431312211 to 322.033876692148.
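For reference, the correlation and distance heatmaps summarize pairwise sample similarity. The following base-R sketch computes the same kinds of quantities on simulated data; Pearson correlation, Euclidean distance, and average-linkage clustering on 1 - correlation are my assumptions, not confirmed defaults of plot_corheat() or plot_disheat().

```r
set.seed(5)
# Simulated log-scale expression: 200 genes x 6 samples; samples 4-6 share
# an extra per-gene component, so they should correlate with one another.
log_expr <- matrix(rnorm(200 * 6, mean = 6), nrow = 200,
                   dimnames = list(NULL, paste0("s", 1:6)))
log_expr[, 4:6] <- log_expr[, 4:6] + rnorm(200, sd = 2)

# Pairwise Pearson correlations and Euclidean distances between samples (columns).
cors <- cor(log_expr)
dists <- dist(t(log_expr))
print(range(cors[upper.tri(cors)]))
print(range(dists))

# Cluster samples on 1 - correlation and draw the correlation heatmap.
tree <- hclust(as.dist(1 - cors), method = "average")
groups <- cutree(tree, k = 2)
heatmap(cors, Rowv = as.dendrogram(tree), symm = TRUE)
```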
A potential figure legend for the following images might read:
The observed counts per gene for all of the clinical samples were low-count filtered, quantile normalized, converted to CPM, and log2 transformed. Colors are defined by cell type and shapes by treatment outcome (cure or failure). When the first two principal components were plotted, the samples clustered by cell type, and the biopsy samples were clearly separated from the innate immune cell types.
fig1v2_norm <- normalize_expt(tc_type, transform = "log2",
convert = "cpm", norm = "quant", filter = TRUE)
## Removing 5654 low-count genes (14298 remaining).
## transform_counts: Found 677 values equal to 0, adding 1 to the matrix.
fig1v2_pca <- plot_pca(fig1v2_norm, cis = FALSE)
pp(file = "figures/tc_type_v2.pdf")
fig1v2_pca
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by biopsy, eosinophils, monocytes, neutrophils
## Shapes are defined by cure, failure.
dev.off()
## png
## 2
fig1v2_pca
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by biopsy, eosinophils, monocytes, neutrophils
## Shapes are defined by cure, failure.
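plot_pca() reports a fast_svd reduction; a plain prcomp() call is a reasonable stand-in for seeing what the first two components capture. The sketch below uses simulated data in which cell type dominates, the pattern described in the legend above. It is an illustration, not a reproduction of plot_pca().

```r
set.seed(6)
cell_type <- factor(rep(c("biopsy", "monocytes", "neutrophils"), each = 5))
# Simulated log-scale expression: 300 genes x 15 samples, with a per-gene mean
# that depends on cell type.
type_means <- matrix(rnorm(300 * 3, sd = 2), nrow = 300)
log_expr <- matrix(rnorm(300 * 15, mean = 6), nrow = 300) +
  type_means[, as.integer(cell_type)]

# PCA with samples as observations, hence the transpose.
pca <- prcomp(t(log_expr), center = TRUE, scale. = FALSE)
percent_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)
coords <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2], type = cell_type)
print(round(percent_var[1:2], 1))
plot(coords$PC1, coords$PC2, col = as.integer(coords$type), pch = 19,
     xlab = sprintf("PC1: %.1f%%", percent_var[1]),
     ylab = sprintf("PC2: %.1f%%", percent_var[2]))
```

With three groups, most of the between-group variance lands in the first two components, so the cell types separate cleanly in the PC1/PC2 plane.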
Spoiler alert: This section will eventually suggest pretty strongly that we will not easily be able to use the Cali samples. Thus, after finishing it, we will likely exclude those samples.
Take a moment to view the biopsy samples. We separated them by clinic (Cali or Tumaco), and this view of the samples is the only one which does not suggest a strong difference between the two clinics. However, it also suggests that the biopsy samples will not prove very helpful.
There are too few biopsy samples to get a strong view of cure/fail. In addition, these are ‘messier’ than any other sample type, so it is difficult to discern a pattern in them that helps distinguish cure from fail. If we play with the various parameters used to modify the counts via ruv/sva, we get slightly different views, some more evocative than others; but the following is our most canonical view.
tc_biopsies_norm <- normalize_expt(tc_biopsies, transform = "log2",
convert = "cpm", norm = "quant", filter = TRUE)
## Removing 6337 low-count genes (13615 remaining).
## transform_counts: Found 206 values equal to 0, adding 1 to the matrix.
tc_biopsies_pca <- plot_pca(tc_biopsies_norm)
tc_biopsies_pca
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cali_cure, tumaco_cure, tumaco_failure
## Shapes are defined by 1.
tc_biopsies_nb <- normalize_expt(tc_biopsies, transform = "log2",
convert = "cpm", batch = "svaseq", filter = TRUE)
## Removing 6337 low-count genes (13615 remaining).
## Setting 289 low elements to zero.
## transform_counts: Found 289 values equal to 0, adding 1 to the matrix.
tc_biopsies_nb_pca <- plot_pca(tc_biopsies_nb)
pp(file = "figures/figure3E_biopsies.svg")
tc_biopsies_nb_pca$plot
dev.off()
## png
## 2
tc_biopsies_nb_pca
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cali_cure, tumaco_cure, tumaco_failure
## Shapes are defined by 1.
I worry that we rely too heavily on PCA.
How strong is the effect of ethnicity, or of ethnicity plus clinic? In the worst-case scenario, these surrogates could make the results difficult to interpret. The following blocks explore that question a little, and I think come to the general conclusion that race and/or clinic are not significant problems.
Compared to the cell type effect, the clinic/race effect is, as we already know, negligible. The question remains: how large is it? There does appear to be a race-related effect in the data. I think that to explore it fully, we would need more people.
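One way to put a number on "how large" (and to lean a little less on eyeballing PCA) would be to ask how much of each principal component's variance a factor such as ethnicity explains. This is not something done in this document; the following is a hypothetical base-R sketch on simulated data. Because the PCs are derived from the same samples being tested, the p-values are a rough guide rather than a formal test.

```r
set.seed(7)
ethnicity <- factor(rep(c("afrocol", "indigena", "mestiza"), times = c(8, 4, 4)))
# Simulated log-scale expression: 400 genes x 16 samples, with 100 genes
# shifted in one group to stand in for a modest ethnicity effect.
log_expr <- matrix(rnorm(400 * 16, mean = 6), nrow = 400)
indigena <- ethnicity == "indigena"
log_expr[1:100, indigena] <- log_expr[1:100, indigena] + 2

# For each of the first five PCs, the fraction of its variance explained by ethnicity.
pcs <- prcomp(t(log_expr))$x[, 1:5]
pc_r2 <- apply(pcs, 2, function(pc) summary(lm(pc ~ ethnicity))$r.squared)
pc_p <- apply(pcs, 2, function(pc) anova(lm(pc ~ ethnicity))[["Pr(>F)"]][1])
print(round(rbind(r_squared = pc_r2, p_value = pc_p), 3))
```

Applied to the real data, the same table computed for cell type, clinic, ethnicity, and sex would give a side-by-side comparison of effect sizes.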
etnia_expt <- set_expt_conditions(tc_valid, fact = "clinic_etnia") %>%
set_expt_colors(color_choices[["clinic_etnia"]])
## The numbers of samples by condition are:
##
## cali_afrocol cali_indigena cali_mestiza tumaco_afrocol tumaco_indigena
## 15 27 19 76 19
## tumaco_mestiza
## 28
etnia_norm <- normalize_expt(etnia_expt, transform = "log2", convert = "cpm",
filter = TRUE, norm = "quant")
## Removing 5654 low-count genes (14298 remaining).
## transform_counts: Found 677 values equal to 0, adding 1 to the matrix.
plot_pca(etnia_norm)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cali_afrocol, cali_indigena, cali_mestiza, tumaco_afrocol, tumaco_indigena, tumaco_mestiza
## Shapes are defined by 1, 2, 3.
etnia_nb <- normalize_expt(etnia_expt, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq")
## Removing 5654 low-count genes (14298 remaining).
## Setting 26479 low elements to zero.
## transform_counts: Found 26479 values equal to 0, adding 1 to the matrix.
plot_pca(etnia_nb)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cali_afrocol, cali_indigena, cali_mestiza, tumaco_afrocol, tumaco_indigena, tumaco_mestiza
## Shapes are defined by 1, 2, 3.
There is an imbalance in the identity of people who attended each clinic. Given that we are focusing on the Tumaco samples, here is the distribution of race/cell type:
t_etnia_norm <- normalize_expt(t_etnia_expt, transform = "log2", convert = "cpm",
filter = TRUE, norm = "quant")
## Removing 5796 low-count genes (14156 remaining).
## transform_counts: Found 299 values equal to 0, adding 1 to the matrix.
plot_pca(t_etnia_norm)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by afrocol, indigena, mestiza
## Shapes are defined by biopsy, eosinophils, monocytes, neutrophils.
t_etnia_nb <- normalize_expt(t_etnia_expt, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq")
## Removing 5796 low-count genes (14156 remaining).
## Setting 15870 low elements to zero.
## transform_counts: Found 15870 values equal to 0, adding 1 to the matrix.
plot_pca(t_etnia_nb)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by afrocol, indigena, mestiza
## Shapes are defined by biopsy, eosinophils, monocytes, neutrophils.
The biopsy samples include very few people of indigenous origin: only one from Cali and two from Tumaco.
tc_bp_ec <- set_expt_conditions(tc_biopsies, fact = "clinic_etnia") %>%
set_expt_colors(color_choices[["clinic_etnia"]])
## The numbers of samples by condition are:
##
## cali_afrocol cali_indigena cali_mestiza tumaco_afrocol tumaco_indigena
## 1 1 2 8 2
## tumaco_mestiza
## 4
etnia_bp_norm <- normalize_expt(tc_bp_ec, transform = "log2", convert = "cpm",
filter = TRUE, norm = "quant")
## Removing 6337 low-count genes (13615 remaining).
## transform_counts: Found 206 values equal to 0, adding 1 to the matrix.
plot_pca(etnia_bp_norm)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cali_afrocol, cali_indigena, cali_mestiza, tumaco_afrocol, tumaco_indigena, tumaco_mestiza
## Shapes are defined by 1.
The biopsy samples are by far the ‘messiest’, and that remains true when considering the ethnicity of the individual patients.
# Note: this uses tc_biopsies, i.e. all 18 biopsies from both clinics, not a Tumaco-only subset.
t_bp_ec <- set_expt_conditions(tc_biopsies, fact = "etnia") %>%
set_expt_colors(color_choices[["ethnicity"]])
## The numbers of samples by condition are:
##
## afrocol indigena mestiza
## 9 3 6
t_etnia_bp_norm <- normalize_expt(t_bp_ec, transform = "log2", convert = "cpm",
filter = TRUE, norm = "quant")
## Removing 6337 low-count genes (13615 remaining).
## transform_counts: Found 206 values equal to 0, adding 1 to the matrix.
plot_pca(t_etnia_bp_norm)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by afrocol, indigena, mestiza
## Shapes are defined by 1.
I think there are not enough samples to try sva with this.
When we ask the same question of the clinical cell types, it is possible to see more samples, but not a significantly clearer view of the race effect on the transcriptional profile.
tc_eo_ec <- set_expt_conditions(tc_eosinophils, fact = "clinic_etnia") %>%
set_expt_colors(color_choices[["clinic_etnia"]])
## The numbers of samples by condition are:
##
## cali_afrocol cali_indigena cali_mestiza tumaco_afrocol tumaco_indigena
## 2 8 5 14 5
## tumaco_mestiza
## 7
etnia_eo_norm <- normalize_expt(tc_eo_ec, transform = "log2", convert = "cpm",
filter = TRUE, norm = "quant")
## Removing 9085 low-count genes (10867 remaining).
## transform_counts: Found 5 values equal to 0, adding 1 to the matrix.
plot_pca(etnia_eo_norm)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cali_afrocol, cali_indigena, cali_mestiza, tumaco_afrocol, tumaco_indigena, tumaco_mestiza
## Shapes are defined by 3, 2, 1.
etnia_eo_nb <- normalize_expt(tc_eo_ec, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq")
## Removing 9085 low-count genes (10867 remaining).
## Setting 1079 low elements to zero.
## transform_counts: Found 1079 values equal to 0, adding 1 to the matrix.
plot_pca(etnia_eo_nb)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cali_afrocol, cali_indigena, cali_mestiza, tumaco_afrocol, tumaco_indigena, tumaco_mestiza
## Shapes are defined by 3, 2, 1.
The eosinophils are our least-abundant cell type, so the view of ethnicity using them is particularly problematic; still, we do have a few samples from each group. With that in mind, these appear to show some difference among the three groups.
t_eo_ec <- set_expt_conditions(t_eosinophils, fact = "etnia") %>%
set_expt_colors(color_choices[["ethnicity"]])
## The numbers of samples by condition are:
##
## afrocol indigena mestiza
## 14 5 7
t_etnia_eo_norm <- normalize_expt(t_eo_ec, transform = "log2", convert = "cpm",
filter = TRUE, norm = "quant")
## Removing 9420 low-count genes (10532 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
plot_pca(t_etnia_eo_norm)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by afrocol, indigena, mestiza
## Shapes are defined by 3, 2, 1.
t_etnia_eo_nb <- normalize_expt(t_eo_ec, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq")
## Removing 9420 low-count genes (10532 remaining).
## Setting 326 low elements to zero.
## transform_counts: Found 326 values equal to 0, adding 1 to the matrix.
plot_pca(t_etnia_eo_nb)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by afrocol, indigena, mestiza
## Shapes are defined by 3, 2, 1.
In general, the monocytes show the strongest differences in any comparison we have performed. This is true in the context of race as well. Thus, even before applying sva, we see some separation among the monocyte samples with respect to ethnicity.
tc_mo_ec <- set_expt_conditions(tc_monocytes, fact = "clinic_etnia") %>%
set_expt_colors(color_choices[["clinic_etnia"]])
## The numbers of samples by condition are:
##
## cali_afrocol cali_indigena cali_mestiza tumaco_afrocol tumaco_indigena
## 6 9 6 27 6
## tumaco_mestiza
## 9
etnia_mo_norm <- normalize_expt(tc_mo_ec, transform = "log2", convert = "cpm",
filter = TRUE, norm = "quant")
## Removing 8844 low-count genes (11108 remaining).
## transform_counts: Found 12 values equal to 0, adding 1 to the matrix.
plot_pca(etnia_mo_norm)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cali_afrocol, cali_indigena, cali_mestiza, tumaco_afrocol, tumaco_indigena, tumaco_mestiza
## Shapes are defined by 3, 2, 1.
etnia_mo_nb <- normalize_expt(tc_mo_ec, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq")
## Removing 8844 low-count genes (11108 remaining).
## Setting 1590 low elements to zero.
## transform_counts: Found 1590 values equal to 0, adding 1 to the matrix.
plot_pca(etnia_mo_nb)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cali_afrocol, cali_indigena, cali_mestiza, tumaco_afrocol, tumaco_indigena, tumaco_mestiza
## Shapes are defined by 3, 2, 1.
The ability to see some separation by ethnicity among the monocyte samples remains, at least slightly, true when considering only the Tumaco samples.
t_mo_ec <- set_expt_conditions(t_monocytes, fact = "etnia") %>%
set_expt_colors(color_choices[["ethnicity"]])
## The numbers of samples by condition are:
##
## afrocol indigena mestiza
## 27 6 9
t_etnia_mo_norm <- normalize_expt(t_mo_ec, transform = "log2", convert = "cpm",
filter = TRUE, norm = "quant")
## Removing 9090 low-count genes (10862 remaining).
## transform_counts: Found 5 values equal to 0, adding 1 to the matrix.
plot_pca(t_etnia_mo_norm)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by afrocol, indigena, mestiza
## Shapes are defined by 3, 2, 1.
t_etnia_mo_nb <- normalize_expt(t_mo_ec, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq")
## Removing 9090 low-count genes (10862 remaining).
## Setting 765 low elements to zero.
## transform_counts: Found 765 values equal to 0, adding 1 to the matrix.
plot_pca(t_etnia_mo_nb)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by afrocol, indigena, mestiza
## Shapes are defined by 3, 2, 1.
As with the other effects, the neutrophils are intermediate here.
tc_ne_ec <- set_expt_conditions(tc_neutrophils, fact = "clinic_etnia") %>%
set_expt_colors(color_choices[["clinic_etnia"]])
## The numbers of samples by condition are:
##
## cali_afrocol cali_indigena cali_mestiza tumaco_afrocol tumaco_indigena
## 6 9 6 27 6
## tumaco_mestiza
## 8
etnia_ne_norm <- normalize_expt(tc_ne_ec, transform = "log2", convert = "cpm",
filter = TRUE, norm = "quant")
## Removing 10708 low-count genes (9244 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
plot_pca(etnia_ne_norm)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cali_afrocol, cali_indigena, cali_mestiza, tumaco_afrocol, tumaco_indigena, tumaco_mestiza
## Shapes are defined by 3, 2, 1.
etnia_ne_nb <- normalize_expt(tc_ne_ec, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq")
## Removing 10708 low-count genes (9244 remaining).
## Setting 1628 low elements to zero.
## transform_counts: Found 1628 values equal to 0, adding 1 to the matrix.
plot_pca(etnia_ne_nb)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cali_afrocol, cali_indigena, cali_mestiza, tumaco_afrocol, tumaco_indigena, tumaco_mestiza
## Shapes are defined by 3, 2, 1.
The Tumaco-only neutrophils are something of a counterexample to the previous statement: to me, the most easily discerned race effect comes from the Tumaco neutrophils.
t_ne_ec <- set_expt_conditions(t_neutrophils, fact = "etnia") %>%
set_expt_colors(color_choices[["ethnicity"]])
## The numbers of samples by condition are:
##
## afrocol indigena mestiza
## 27 6 8
t_etnia_ne_norm <- normalize_expt(t_ne_ec, transform = "log2", convert = "cpm",
filter = TRUE, norm = "quant")
## Removing 10851 low-count genes (9101 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
plot_pca(t_etnia_ne_norm)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by afrocol, indigena, mestiza
## Shapes are defined by 3, 2, 1.
t_etnia_ne_nb <- normalize_expt(t_ne_ec, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq")
## Removing 10851 low-count genes (9101 remaining).
## Setting 823 low elements to zero.
## transform_counts: Found 823 values equal to 0, adding 1 to the matrix.
plot_pca(t_etnia_ne_nb)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by afrocol, indigena, mestiza
## Shapes are defined by 3, 2, 1.
The imbalances with respect to clinic/race are far less profound than the imbalance with respect to the sex of the patients who participated in the study (28 female vs. 156 male samples). It is almost certainly possible to see some degree of sex-based effect in the available transcriptomes.
sex_expt <- set_expt_conditions(tc_valid, fact = "sex") %>%
set_expt_colors(color_choices[["sex"]])
## The numbers of samples by condition are:
##
## female male
## 28 156
sex_norm <- normalize_expt(sex_expt, transform = "log2", convert = "cpm",
filter = TRUE, norm = "quant")
## Removing 5654 low-count genes (14298 remaining).
## transform_counts: Found 677 values equal to 0, adding 1 to the matrix.
plot_pca(sex_norm)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by female, male
## Shapes are defined by 1, 2, 3.
sex_nb <- normalize_expt(sex_expt, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq")
## Removing 5654 low-count genes (14298 remaining).
## Setting 26368 low elements to zero.
## transform_counts: Found 26368 values equal to 0, adding 1 to the matrix.
plot_pca(sex_nb)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by female, male
## Shapes are defined by 1, 2, 3.
clinic_sex_expt <- set_expt_conditions(tc_valid, fact = "clinic_sex") %>%
set_expt_colors(color_choices[["clinic_sex"]])
## The numbers of samples by condition are:
##
## cali_female cali_male tumaco_female tumaco_male
## 6 55 22 101
clinic_sex_norm <- normalize_expt(clinic_sex_expt, transform = "log2", convert = "cpm",
filter = TRUE, norm = "quant")
## Removing 5654 low-count genes (14298 remaining).
## transform_counts: Found 677 values equal to 0, adding 1 to the matrix.
plot_pca(clinic_sex_norm)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cali_female, cali_male, tumaco_female, tumaco_male
## Shapes are defined by 1, 2, 3.
clinic_sex_nb <- normalize_expt(clinic_sex_expt, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq")
## Removing 5654 low-count genes (14298 remaining).
## Setting 29063 low elements to zero.
## transform_counts: Found 29063 values equal to 0, adding 1 to the matrix.
plot_pca(clinic_sex_nb)## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cali_female, cali_male, tumaco_female, tumaco_male
## Shapes are defined by 1, 2, 3.
tc_bp_sc <- set_expt_conditions(tc_biopsies, fact = "clinic_sex") %>%
set_expt_colors(color_choices[["clinic_sex"]])## The numbers of samples by condition are:
##
## cali_male tumaco_female tumaco_male
## 4 3 11
## Warning in set_expt_colors(., color_choices[["clinic_sex"]]): Colors for the
## following categories are not being used: cali_female.
clinic_sex_bp_norm <- normalize_expt(tc_bp_sc, transform = "log2", convert = "cpm",
filter = TRUE, norm = "quant")## Removing 6337 low-count genes (13615 remaining).
## transform_counts: Found 206 values equal to 0, adding 1 to the matrix.
plot_pca(clinic_sex_bp_norm)## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cali_male, tumaco_female, tumaco_male
## Shapes are defined by 1.
I suspect there are too few biopsy samples (18) to attempt SVA with this subset.
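The sketch below shows the degrees-of-freedom bookkeeping behind that concern. The 18 biopsy samples fall into three clinic/sex groups (4 cali_male, 3 tumaco_female, 11 tumaco_male). A model containing only the condition factor already uses three parameters, and I am assuming each surrogate variable added to the model consumes one more residual degree of freedom. This is only arithmetic; it does not reproduce how sva decides how many surrogates to estimate.

```r
# Condition labels for the 18 biopsy samples, from the counts printed above.
condition <- factor(rep(c("cali_male", "tumaco_female", "tumaco_male"),
                        times = c(4, 3, 11)))

design <- model.matrix(~ condition)

# Residual degrees of freedom left after fitting the design plus n_sv surrogates.
residual_df <- function(design, n_sv = 0) nrow(design) - ncol(design) - n_sv

sapply(0:4, function(k) residual_df(design, n_sv = k))
# [1] 15 14 13 12 11
```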
tc_eo_sc <- set_expt_conditions(tc_eosinophils, fact = "clinic_sex") %>%
set_expt_colors(color_choices[["clinic_sex"]])
clinic_sex_eo_norm <- normalize_expt(tc_eo_sc, transform = "log2", convert = "cpm",
filter = TRUE, norm = "quant")
plot_pca(clinic_sex_eo_norm)
tc_mo_clinic_sex <- set_expt_conditions(tc_monocytes, fact = "clinic_sex") %>%
set_expt_colors(color_choices[["clinic_sex"]])## The numbers of samples by condition are:
##
## cali_female cali_male tumaco_female tumaco_male
## 2 19 7 35
tc_mo_clinic_sex_norm <- normalize_expt(tc_mo_clinic_sex, transform = "log2", convert = "cpm",
filter = TRUE, norm = "quant")## Removing 8844 low-count genes (11108 remaining).
## transform_counts: Found 12 values equal to 0, adding 1 to the matrix.
plot_pca(tc_mo_clinic_sex_norm)## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cali_female, cali_male, tumaco_female, tumaco_male
## Shapes are defined by 3, 2, 1.
tc_mo_clinic_sex_nb <- normalize_expt(tc_mo_clinic_sex, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq")## Removing 8844 low-count genes (11108 remaining).
## Setting 1425 low elements to zero.
## transform_counts: Found 1425 values equal to 0, adding 1 to the matrix.
plot_pca(tc_mo_clinic_sex_nb)## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cali_female, cali_male, tumaco_female, tumaco_male
## Shapes are defined by 3, 2, 1.
tc_ne_clinic_sex <- set_expt_conditions(tc_neutrophils, fact = "clinic_sex") %>%
set_expt_colors(color_choices[["clinic_sex"]])## The numbers of samples by condition are:
##
## cali_female cali_male tumaco_female tumaco_male
## 2 19 7 34
tc_ne_clinic_sex_norm <- normalize_expt(tc_ne_clinic_sex, transform = "log2", convert = "cpm",
filter = TRUE, norm = "quant")## Removing 10708 low-count genes (9244 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
plot_pca(tc_ne_clinic_sex_norm)## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cali_female, cali_male, tumaco_female, tumaco_male
## Shapes are defined by 3, 2, 1.
In contrast, the eosinophil samples do have a substantial amount of variance which discriminates between the two clinics. At the time of this writing, there are fewer eosinophil samples than monocyte and neutrophil samples; as a result, none of the eosinophil samples come from a Cali patient who failed treatment. This is somewhat limiting if we wish to look for differences between the cure and fail samples from the two clinics.
tc_eosinophils_norm <- normalize_expt(tc_eosinophils, transform = "log2",
convert = "cpm", norm = "quant", filter = TRUE)## Removing 9085 low-count genes (10867 remaining).
## transform_counts: Found 5 values equal to 0, adding 1 to the matrix.
tc_eosinophils_pca <- plot_pca(tc_eosinophils_norm, plot_labels = FALSE)
tc_eosinophils_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cali_cure, tumaco_cure, tumaco_failure
## Shapes are defined by 3, 2, 1.
tc_eosinophils_nb <- normalize_expt(tc_eosinophils, transform = "log2",
convert = "cpm", batch = "svaseq", filter = TRUE)## Removing 9085 low-count genes (10867 remaining).
## Setting 1048 low elements to zero.
## transform_counts: Found 1048 values equal to 0, adding 1 to the matrix.
tc_eosinophils_nb_pca <- plot_pca(tc_eosinophils_nb, plot_labels = FALSE)
pp(file = "figures/figure3B_eosinophils.svg")
tc_eosinophils_nb_pca$plot
dev.off()## png
## 2
tc_eosinophils_nb_pca$plot
In contrast with the eosinophil samples, the monocyte and neutrophil samples include those of one patient from Cali who did not cure. As we will see, this person's monocyte transcriptome is not noticeably different from those of the other people from Cali.
tc_monocytes_norm <- normalize_expt(tc_monocytes, transform = "log2",
convert = "cpm", norm = "quant", filter = TRUE)## Removing 8844 low-count genes (11108 remaining).
## transform_counts: Found 12 values equal to 0, adding 1 to the matrix.
tc_monocytes_pca <- plot_pca(tc_monocytes_norm, plot_labels = FALSE)
tc_monocytes_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cali_cure, cali_failure, tumaco_cure, tumaco_failure
## Shapes are defined by 3, 2, 1.
tc_monocytes_nb <- normalize_expt(tc_monocytes, transform = "log2",
convert = "cpm", batch = "svaseq", filter = TRUE)## Removing 8844 low-count genes (11108 remaining).
## Setting 1455 low elements to zero.
## transform_counts: Found 1455 values equal to 0, adding 1 to the matrix.
tc_monocytes_nb_pca <- plot_pca(tc_monocytes_nb, plot_labels = FALSE)
pp(file = "figures/figure3C_monocytes.svg")
tc_monocytes_nb_pca$plot
dev.off()## png
## 2
tc_monocytes_nb_pca$plot
Finally, when looking at the neutrophils, that same person does appear to differ from the others from Cali.
tc_neutrophils_norm <- normalize_expt(tc_neutrophils, transform = "log2",
convert = "cpm", norm = "quant", filter = TRUE)## Removing 10708 low-count genes (9244 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
tc_neutrophils_pca <- plot_pca(tc_neutrophils_norm, plot_labels = FALSE)
tc_neutrophils_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cali_cure, cali_failure, tumaco_cure, tumaco_failure
## Shapes are defined by 3, 2, 1.
tc_neutrophils_nb <- normalize_expt(tc_neutrophils, transform = "log2",
convert = "cpm", batch = "svaseq", filter = TRUE)## Removing 10708 low-count genes (9244 remaining).
## Setting 1539 low elements to zero.
## transform_counts: Found 1539 values equal to 0, adding 1 to the matrix.
tc_neutrophils_nb_pca <- plot_pca(tc_neutrophils_nb, plot_labels = FALSE)
pp(file = "figures/figure3D_neutrophils.svg")
tc_neutrophils_nb_pca$plot
dev.off()## png
## 2
tc_neutrophils_nb_pca$plot
Now that we have examined these subsets, we perform an explicit comparison of the samples from the two clinics.
tc_clinic_type <- tc_valid %>%
set_expt_conditions(fact = "clinic") %>%
set_expt_batches(fact = "typeofcells")## The numbers of samples by condition are:
##
## cali tumaco
## 61 123
## The number of samples by batch are:
##
## biopsy eosinophils monocytes neutrophils
## 18 41 63 62
tc_clinic_type_norm <- normalize_expt(tc_clinic_type, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)## Removing 5654 low-count genes (14298 remaining).
## transform_counts: Found 677 values equal to 0, adding 1 to the matrix.
tc_clinic_type_pca <- plot_pca(tc_clinic_type_norm)
tc_clinic_type_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cali, tumaco
## Shapes are defined by biopsy, eosinophils, monocytes, neutrophils.
tc_clinic_type_nb <- normalize_expt(tc_clinic_type, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)## Removing 5654 low-count genes (14298 remaining).
## Setting 31394 low elements to zero.
## transform_counts: Found 31394 values equal to 0, adding 1 to the matrix.
tc_clinic_type_nb_pca <- plot_pca(tc_clinic_type_nb)
tc_clinic_type_nb_pca$plot
pp(file = "figures/figure3a_all_samples.svg")
tc_clinic_type_nb_pca$plot
dev.off()## png
## 2
tc_clinical_norm <- sm(normalize_expt(tc_clinical, filter = "simple", transform = "log2",
norm = "quant", convert = "cpm"))
clinical_pca <- plot_pca(tc_clinical_norm, plot_labels = FALSE,
cis = NULL,
plot_title = "PCA - clinical samples")
clinical_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure
## Shapes are defined by biopsy, eosinophils, monocytes, neutrophils.
tc_clinical_nb <- normalize_expt(tc_clinical, filter = "simple", transform = "log2",
batch = "svaseq", convert = "cpm")## Removing 1881 low-count genes (18071 remaining).
## Setting 157339 low elements to zero.
## transform_counts: Found 157339 values equal to 0, adding 1 to the matrix.
tc_clinical_nb_pca <- plot_pca(tc_clinical_nb)
tc_clinical_nb_pca$plot
clinical_pca_info <- pca_information(
  tc_clinical_norm, plot_pcas = TRUE, num_components = 30,
  expt_factors = c("visitnumber", "typeofcells", "finaloutcome",
                   "clinic", "donor"))
clinical_pca_info$anova_neglogp_heatmap
clinical_pca_info$pca_plots$PC4_PC7
## Warning: ggrepel: 113 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
clinical_scores <- pca_highscores(tc_clinical_norm)
clinical_scores[["highest"]][,"Comp.4"]## [1] "15.73:ENSG00000168329" "14.97:ENSG00000133574" "14.03:ENSG00000204389"
## [4] "14.02:ENSG00000171115" "13.9:ENSG00000163563" "13.47:ENSG00000179144"
## [7] "13.18:ENSG00000004799" "13.12:ENSG00000180871" "13:ENSG00000172086"
## [10] "12.77:ENSG00000091106" "12.62:ENSG00000121858" "12.37:ENSG00000123405"
## [13] "12.36:ENSG00000175538" "12.04:ENSG00000138449" "12.02:ENSG00000109971"
## [16] "11.84:ENSG00000165118" "11.6:ENSG00000088986" "11.59:ENSG00000135828"
## [19] "11.38:ENSG00000038274" "11.17:ENSG00000130150"
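pca_highscores() returns genes ranked by their score on each component, formatted as `score:gene`. The `Comp.4` column name suggests princomp-style output with genes treated as observations; that is my reading, not something documented here.

The sketch below is a hypothetical base-R analogue built on that assumption. It runs stats::prcomp with genes as rows, so each gene receives a score on every component, and then ranks genes on one component. It is not guaranteed to match the scores pca_highscores() computes internally.

```r
# Hypothetical analogue: genes are observations (rows) and samples are variables,
# so every gene gets a score on each component; rank genes on one component.
top_gene_scores <- function(log_mat, component = 4, n = 20) {
  pca <- prcomp(log_mat, center = TRUE, scale. = FALSE)
  scores <- pca$x[, component]
  top <- head(sort(scores, decreasing = TRUE), n)
  # Format like the output above: "score:gene".
  paste0(round(top, 2), ":", names(top))
}

set.seed(3)
toy_log <- matrix(rnorm(300, mean = 8), nrow = 50,
                  dimnames = list(sprintf("gene%02d", 1:50), paste0("s", 1:6)))
top_gene_scores(toy_log, component = 4, n = 5)
```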
test_factors <- c("visitnumber", "typeofcells", "finaloutcome",
"clinic", "sex", "etnia")
clinical_varpart <- simple_varpart(tc_clinical, factors = test_factors)## Subsetting on features.
## remove_genes_expt(), before removal, there were 18071 genes, now there are 17909.
clinical_varpart## The result of using variancePartition with the model:
## ~ visitnumber + typeofcells + finaloutcome + clinic + sex + etnia
Another way to explore the effect of SVA is to iteratively increase the number of surrogate variables (SVs) it removes and look at some simple plots of the resulting data. Ideally, this complements the comparison of individual SVs vs. PCs performed by Theresa (spoiler alert: I think it does).
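The `surrogates` argument sets how many SVs are estimated and removed. The sketch below shows only the removal step.

It assumes a samples-by-k matrix of surrogate variables has already been estimated, for example by sva::svaseq(). Each gene is regressed on the condition of interest plus the surrogates, and only the surrogate part of the fit is subtracted, so the condition effect is retained. This follows the general approach of limma::removeBatchEffect() with covariates. It is not necessarily what normalize_expt() does internally. The data are simulated.

```r
# Remove the contribution of k surrogate variables from a genes x samples log matrix,
# while keeping the biological condition in the model so its effect is preserved.
remove_surrogates <- function(log_mat, condition, sv) {
  sv <- as.matrix(sv)
  design <- cbind(model.matrix(~ condition), sv)
  # Least-squares fit for all genes at once: coef is (design columns) x genes.
  coef <- qr.solve(design, t(log_mat))
  sv_rows <- (ncol(design) - ncol(sv) + 1):ncol(design)
  sv_effect <- sv %*% coef[sv_rows, , drop = FALSE]  # samples x genes
  log_mat - t(sv_effect)
}

# Simulated example: 30 genes, 12 samples, one strong gene-specific surrogate effect.
set.seed(4)
n <- 12
condition <- factor(rep(c("cure", "failure"), each = 6))
sv <- matrix(rnorm(n), ncol = 1)
base <- matrix(rnorm(30 * n, mean = 6, sd = 0.1), nrow = 30)
loadings <- rnorm(30, sd = 2)
toy <- base + outer(loadings, sv[, 1])
cleaned <- remove_surrogates(toy, condition, sv)
```

Refitting the same model to `cleaned` gives surrogate coefficients of (numerically) zero, which is the sense in which the SV has been removed.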
first <- normalize_expt(tc_clinical, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq", surrogates = 1)## Removing 5654 low-count genes (14298 remaining).
## Setting 193257 low elements to zero.
## transform_counts: Found 193257 values equal to 0, adding 1 to the matrix.
first_info <- pca_information(
first, plot_pcas = TRUE, num_components = 30,
expt_factors = c("visitnumber", "typeofcells",
"finaloutcome", "clinic"))
first_info$anova_neglogp_heatmap
first_info$pca_plots[["PC1_PC2"]]
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Warning: ggrepel: 175 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
second <- normalize_expt(tc_clinical, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq", surrogates = 2) %>%
set_expt_batches(fact = "clinic")## Removing 5654 low-count genes (14298 remaining).
## Setting 31359 low elements to zero.
## transform_counts: Found 31359 values equal to 0, adding 1 to the matrix.
## The number of samples by batch are:
##
## cali tumaco
## 61 123
second_info <- pca_information(
second, plot_pcas = TRUE, num_components = 30,
expt_factors = c("visitnumber", "typeofcells",
"finaloutcome", "clinic"))
second_info$anova_neglogp_heatmap
third <- normalize_expt(tc_clinical, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq", surrogates = 3) %>%
set_expt_batches(fact = "clinic")## Removing 5654 low-count genes (14298 remaining).
## Setting 27378 low elements to zero.
## transform_counts: Found 27378 values equal to 0, adding 1 to the matrix.
## The number of samples by batch are:
##
## cali tumaco
## 61 123
third_info <- pca_information(
third, plot_pcas = TRUE, num_components = 30,
expt_factors = c("visitnumber", "typeofcells",
"finaloutcome", "clinic"))
third_info$anova_neglogp_heatmap
fourth <- normalize_expt(tc_clinical, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq", surrogates = 4) %>%
set_expt_batches(fact = "clinic")## Removing 5654 low-count genes (14298 remaining).
## Setting 26043 low elements to zero.
## transform_counts: Found 26043 values equal to 0, adding 1 to the matrix.
## The number of samples by batch are:
##
## cali tumaco
## 61 123
fourth_info <- pca_information(
fourth, plot_pcas = TRUE, num_components = 30,
expt_factors = c("visitnumber", "typeofcells",
"finaloutcome", "clinic"))
fourth_info$anova_neglogp_heatmap
fourth_info[["pca_plots"]][["PC1_PC2"]]
## Warning: ggrepel: 107 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
fifth <- normalize_expt(tc_clinical, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq", surrogates = 5) %>%
set_expt_batches(fact = "clinic")## Removing 5654 low-count genes (14298 remaining).
## Setting 27144 low elements to zero.
## transform_counts: Found 27144 values equal to 0, adding 1 to the matrix.
## The number of samples by batch are:
##
## cali tumaco
## 61 123
fifth_info <- pca_information(
fifth, plot_pcas = TRUE, num_components = 30,
expt_factors = c("visitnumber", "typeofcells",
"finaloutcome", "clinic"))
fifth_info$anova_neglogp_heatmap
fifth_info[["pca_plots"]][["PC1_PC12"]]
## Warning: ggrepel: 106 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
sixth <- normalize_expt(tc_clinical, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq", surrogates = 6) %>%
set_expt_batches(fact = "clinic")## Removing 5654 low-count genes (14298 remaining).
## Setting 24054 low elements to zero.
## transform_counts: Found 24054 values equal to 0, adding 1 to the matrix.
## The number of samples by batch are:
##
## cali tumaco
## 61 123
sixth_info <- pca_information(
sixth, plot_pcas = TRUE, num_components = 30,
expt_factors = c("visitnumber", "typeofcells",
"finaloutcome", "clinic"))
sixth_info$anova_neglogp_heatmap
seventh <- normalize_expt(tc_clinical, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq", surrogates = 7) %>%
set_expt_batches(fact = "clinic")## Removing 5654 low-count genes (14298 remaining).
## Setting 24579 low elements to zero.
## transform_counts: Found 24579 values equal to 0, adding 1 to the matrix.
## The number of samples by batch are:
##
## cali tumaco
## 61 123
seventh_info <- pca_information(
seventh, plot_pcas = TRUE, num_components = 30,
expt_factors = c("visitnumber", "typeofcells",
"finaloutcome", "clinic"))
seventh_info$anova_neglogp_heatmap
eighth <- normalize_expt(tc_clinical, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq", surrogates = 8)## Removing 5654 low-count genes (14298 remaining).
## Setting 24194 low elements to zero.
## transform_counts: Found 24194 values equal to 0, adding 1 to the matrix.
eighth_info <- pca_information(
eighth, plot_pcas = TRUE, num_components = 30,
expt_factors = c("visitnumber", "typeofcells",
"finaloutcome", "clinic"))
eighth_info$anova_neglogp_heatmap
variancePartition (Hoffman and Schadt 2016) provides a useful toolbox for examining how the variance observed in a dataset's expression relates to its various metadata factors. We usually use it as a quick way to gauge how likely it is that a differential expression analysis of a given factor will provide useful output.
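variancePartition fits a linear mixed model per gene and reports the fraction of expression variance attributable to each factor. The sketch below is a much simpler, fixed-effects illustration of the same idea and is not variancePartition's method. It computes, for a single gene, the fraction of the total sum of squares assigned to each factor by a sequential ANOVA.

Sequential (type I) sums of squares depend on the order in which factors enter the model. That is one reason to check, as the comment in the next chunk mentions, that reordering factors does not change the result. The metadata column names mirror this document's (`typeofcells`, `finaloutcome`), but the values are simulated.

```r
# Fraction of one gene's variance attributed to each factor, from sequential (type I)
# sums of squares in a fixed-effects linear model. Illustrative only.
variance_fractions <- function(expression, metadata, factors) {
  df <- data.frame(expression = expression, metadata[, factors, drop = FALSE])
  fit <- lm(reformulate(factors, response = "expression"), data = df)
  tab <- anova(fit)
  ss <- tab[["Sum Sq"]]
  setNames(ss / sum(ss), trimws(rownames(tab)))
}

set.seed(5)
meta <- data.frame(
  typeofcells = factor(rep(c("monocytes", "neutrophils", "eosinophils"), each = 8)),
  finaloutcome = factor(rep(c("cure", "failure"), times = 12))
)
# Simulated gene: large cell-type effect, small outcome effect, plus noise.
cell_means <- c(monocytes = 2, neutrophils = 6, eosinophils = 4)
gene <- unname(cell_means[as.character(meta$typeofcells)]) +
  ifelse(meta$finaloutcome == "failure", 0.5, 0) + rnorm(24, sd = 0.5)
round(variance_fractions(gene, meta, c("typeofcells", "finaloutcome")), 3)
```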
## Mostly running twice to make sure that reordering the factors does not affect the end result.
tc_varpart <- simple_varpart(
tc_clinical_nobiop, factors = c("visitnumber", "typeofcells",
"finaloutcome", "clinic", "sex", "etnia"))## Subsetting on features.
## remove_genes_expt(), before removal, there were 17871 genes, now there are 17730.
tc_varpart## The result of using variancePartition with the model:
## ~ visitnumber + typeofcells + finaloutcome + clinic + sex + etnia
tc_varpartv2 <- simple_varpart(
tc_clinical_nobiop, factors = c("donor", "visitnumber", "typeofcells",
"finaloutcome"))## Subsetting on features.
## remove_genes_expt(), before removal, there were 17871 genes, now there are 17730.
pp(file = "images/tc_donor_visit_type_finaloutcome_varpart.pdf")
tc_varpartv2## The result of using variancePartition with the model:
## ~ donor + visitnumber + typeofcells + finaloutcome
dev.off()## png
## 2
tc_varpartv2## The result of using variancePartition with the model:
## ~ donor + visitnumber + typeofcells + finaloutcome
tc_varpartv3 <- simple_varpart(
tc_clinical_nobiop, factors = c("donor", "visitnumber", "typeofcells"))## Subsetting on features.
## remove_genes_expt(), before removal, there were 17871 genes, now there are 17730.
pp(file = "images/tc_donor_visit_type_varpart.pdf")
tc_varpartv3## The result of using variancePartition with the model:
## ~ donor + visitnumber + typeofcells
dev.off()## png
## 2
tc_varpartv3## The result of using variancePartition with the model:
## ~ donor + visitnumber + typeofcells
tc_varpartv4 <- simple_varpart(
tc_clinical_nobiop, factors = c("finaloutcome", "sex", "Ethnicity"))## Subsetting on features.
## remove_genes_expt(), before removal, there were 17871 genes, now there are 17730.
pp(file = "images/tc_final_sex_ethnicity_varpart.pdf")
tc_varpartv4## The result of using variancePartition with the model:
## ~ finaloutcome + sex + Ethnicity
dev.off()## png
## 2
tc_varpartv4## The result of using variancePartition with the model:
## ~ finaloutcome + sex + Ethnicity
t_varpartv3 <- simple_varpart(
t_clinical_nobiop, factors = c("donor", "visitnumber", "typeofcells"))## Subsetting on features.
## remove_genes_expt(), before removal, there were 17801 genes, now there are 17633.
pp(file = "images/t_donor_visit_type_varpart.pdf")
t_varpartv3## The result of using variancePartition with the model:
## ~ donor + visitnumber + typeofcells
dev.off()## png
## 2
t_varpartv3## The result of using variancePartition with the model:
## ~ donor + visitnumber + typeofcells
c_varpartv3 <- simple_varpart(
c_clinical, factors = c("donor", "visitnumber", "typeofcells"))## Subsetting on features.
## remove_genes_expt(), before removal, there were 17545 genes, now there are 16170.
pp(file = "images/c_donor_visit_type_varpart.pdf")
c_varpartv3## The result of using variancePartition with the model:
## ~ donor + visitnumber + typeofcells
dev.off()## png
## 2
c_varpartv3## The result of using variancePartition with the model:
## ~ donor + visitnumber + typeofcells
Maria Adelaida asked about using variancePartition to query a few other factors in the Cali, Tumaco, and combined datasets. These factors include sex, age, ethnicity, and clinic, and potentially adherence, time of evolution, and previous diagnosis.
I am not sure whether all of those factors are already in the expressionset metadata, but if not we can certainly bring them back. In the following blocks I repeat a simple variancePartition analysis using first the full dataset (Tumaco+Cali), then each clinic alone. In each instance I do one round with sex, ethnicity, age, and (for the combined data only) clinic, followed by a second round that adds finaloutcome as a reference point to something we are already looking at.
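The plan above amounts to six models. The helper below is hypothetical and builds the corresponding formulas. It drops `clinic` for the single-clinic subsets, where that factor is constant. The factor names match the metadata columns used in the chunks that follow; the dataset ids (`tc`, `t`, `c`) follow the object-name prefixes used here.

```r
# Build the model formulas described above for each dataset.
base_factors <- c("sex", "etnia", "Age")
dataset_ids <- c("tc", "t", "c")  # both clinics, Tumaco only, Cali only

model_formulas <- lapply(dataset_ids, function(id) {
  # clinic only varies in the combined dataset.
  factors <- if (id == "tc") c(base_factors, "clinic") else base_factors
  list(without_outcome = reformulate(factors),
       with_outcome = reformulate(c(factors, "finaloutcome")))
})
names(model_formulas) <- dataset_ids

all.vars(model_formulas$tc$with_outcome)
```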
table(pData(tc_clinical_nobiop)$typeofcells)##
## eosinophils monocytes neutrophils
## 41 63 62
table(pData(t_clinical_nobiop)$typeofcells)##
## eosinophils monocytes neutrophils
## 26 42 41
table(pData(c_clinical_nobiop)$typeofcells)##
## eosinophils monocytes neutrophils
## 15 21 21
tc_fun_varpart <- simple_varpart(tc_clinical_nobiop,
factors = c("sex", "etnia", "Age", "clinic"))## Subsetting on features.
## remove_genes_expt(), before removal, there were 17871 genes, now there are 17730.
pp(file = "images/tc_fun_varpart.pdf")
tc_fun_varpart## The result of using variancePartition with the model:
## ~ sex + etnia + Age + clinic
dev.off()## png
## 2
tc_fun_varpart## The result of using variancePartition with the model:
## ~ sex + etnia + Age + clinic
tc_fun_outcome_varpart <- simple_varpart(tc_clinical_nobiop,
factors = c("sex", "etnia", "Age", "clinic", "finaloutcome"))## Subsetting on features.
## remove_genes_expt(), before removal, there were 17871 genes, now there are 17730.
pp(file = "images/tc_fun_outcome_varpart.pdf")
tc_fun_outcome_varpart## The result of using variancePartition with the model:
## ~ sex + etnia + Age + clinic + finaloutcome
dev.off()## png
## 2
tc_fun_outcome_varpart## The result of using variancePartition with the model:
## ~ sex + etnia + Age + clinic + finaloutcome
c_fun_varpart <- simple_varpart(c_clinical_nobiop,
factors = c("sex", "etnia", "Age"))## Subsetting on features.
## remove_genes_expt(), before removal, there were 17142 genes, now there are 16146.
pp(file = "images/c_fun_varpart.pdf")
c_fun_varpart## The result of using variancePartition with the model:
## ~ sex + etnia + Age
dev.off()## png
## 2
c_fun_outcome_varpart <- simple_varpart(c_clinical_nobiop,
factors = c("sex", "etnia", "Age", "finaloutcome"))## Subsetting on features.
## remove_genes_expt(), before removal, there were 17142 genes, now there are 16146.
pp(file = "images/c_fun_outcome_varpart.pdf")
c_fun_outcome_varpart## The result of using variancePartition with the model:
## ~ sex + etnia + Age + finaloutcome
dev.off()## png
## 2
c_fun_outcome_varpart## The result of using variancePartition with the model:
## ~ sex + etnia + Age + finaloutcome
t_fun_varpart <- simple_varpart(t_clinical_nobiop,
factors = c("sex", "etnia", "Age"))## Subsetting on features.
## remove_genes_expt(), before removal, there were 17801 genes, now there are 17633.
pp(file = "images/t_fun_varpart.pdf")
t_fun_varpart## The result of using variancePartition with the model:
## ~ sex + etnia + Age
dev.off()## png
## 2
t_fun_varpart## The result of using variancePartition with the model:
## ~ sex + etnia + Age
t_fun_outcome_varpart <- simple_varpart(t_clinical_nobiop,
factors = c("sex", "etnia", "Age", "finaloutcome"))## Subsetting on features.
## remove_genes_expt(), before removal, there were 17801 genes, now there are 17633.
pp(file = "images/t_fun_outcome_varpart.pdf")
t_fun_outcome_varpart## The result of using variancePartition with the model:
## ~ sex + etnia + Age + finaloutcome
dev.off()## png
## 2
t_fun_outcome_varpart## The result of using variancePartition with the model:
## ~ sex + etnia + Age + finaloutcome
The following largely repeats the PCA analyses above (with and without SVA) using the non-biopsy samples, first from Tumaco alone and then from both clinics.
t_clinical_nobiop_norm <- normalize_expt(t_clinical_nobiop, filter = TRUE, norm = "quant",
convert = "cpm", transform = "log2")## Removing 8042 low-count genes (11910 remaining).
## transform_counts: Found 93 values equal to 0, adding 1 to the matrix.
t_clinical_nobiop_pca <- plot_pca(t_clinical_nobiop_norm, plot_labels = FALSE)
pp(file = "figures/t_clinical_nobiop_figxxa.pdf")
t_clinical_nobiop_pca[["plot"]]
dev.off()## png
## 2
t_clinical_nobiop_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure
## Shapes are defined by eosinophils, monocytes, neutrophils.
t_clinical_nobiop_nb <- normalize_expt(t_clinical_nobiop, filter = TRUE, convert = "cpm",
transform = "log2", batch = "svaseq")## Removing 8042 low-count genes (11910 remaining).
## Setting 9605 low elements to zero.
## transform_counts: Found 9605 values equal to 0, adding 1 to the matrix.
t_clinical_nobiop_nb_pca <- plot_pca(t_clinical_nobiop_nb, plot_labels = FALSE)
pp(file = "figures/t_clinical_nobiop_sva_figxxb.pdf")
t_clinical_nobiop_nb_pca[["plot"]]
dev.off()## png
## 2
t_clinical_nobiop_nb_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure
## Shapes are defined by eosinophils, monocytes, neutrophils.
tc_clinical_nobiop_norm <- normalize_expt(tc_clinical_nobiop, filter = TRUE, norm = "quant",
convert = "cpm", transform = "log2")## Removing 7790 low-count genes (12162 remaining).
## transform_counts: Found 124 values equal to 0, adding 1 to the matrix.
tc_clinical_nobiop_pca <- plot_pca(tc_clinical_nobiop_norm, plot_labels = FALSE)
pp(file = "figures/tc_clinical_nobiop_figxxc.pdf")
tc_clinical_nobiop_pca[["plot"]]## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
dev.off()## png
## 2
tc_clinical_nobiop_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure
## Shapes are defined by eosinophils, monocytes, neutrophils.
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
tc_clinical_nobiop_nb <- normalize_expt(tc_clinical_nobiop, filter = TRUE, convert = "cpm",
transform = "log2", batch = "svaseq")## Removing 7790 low-count genes (12162 remaining).
## Setting 17777 low elements to zero.
## transform_counts: Found 17777 values equal to 0, adding 1 to the matrix.
tc_clinical_nobiop_nb_pca <- plot_pca(tc_clinical_nobiop_nb, plot_labels = FALSE)
pp(file = "figures/tc_clinical_nobiop_sva_figxxd.pdf")
tc_clinical_nobiop_nb_pca[["plot"]]
dev.off()## png
## 2
tc_clinical_nobiop_nb_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure
## Shapes are defined by eosinophils, monocytes, neutrophils.
Now we have a new, smaller set of primary samples which are categorized by cell type.
The biopsy samples remain essentially impenetrable. It would be particularly nice if we could judge cure/fail from a visit 1 biopsy.
t_biopsies_norm <- normalize_expt(t_biopsies, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)## Removing 6439 low-count genes (13513 remaining).
## transform_counts: Found 136 values equal to 0, adding 1 to the matrix.
t_biopsies_pca <- plot_pca(t_biopsies_norm,
plot_labels = FALSE)
t_biopsies_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by tumaco_cure, tumaco_failure
## Shapes are defined by 1.
t_biopsies_nb <- normalize_expt(t_biopsies, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)## Removing 6439 low-count genes (13513 remaining).
## Setting 146 low elements to zero.
## transform_counts: Found 146 values equal to 0, adding 1 to the matrix.
t_biopsies_nb_pca <- plot_pca(t_biopsies_nb, plot_labels = FALSE)
t_biopsies_nb_pca$plot
In contrast, I suspect that we can get meaningful data from the other cell types. The monocyte samples are still a bit messy.
t_monocyte_norm <- normalize_expt(t_monocytes, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)## Removing 9090 low-count genes (10862 remaining).
## transform_counts: Found 5 values equal to 0, adding 1 to the matrix.
t_monocyte_pca <- plot_pca(t_monocyte_norm,
plot_labels = FALSE)
t_monocyte_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by tumaco_cure, tumaco_failure
## Shapes are defined by 3, 2, 1.
t_monocyte_nb <- normalize_expt(t_monocytes, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)## Removing 9090 low-count genes (10862 remaining).
## Setting 736 low elements to zero.
## transform_counts: Found 736 values equal to 0, adding 1 to the matrix.
t_monocyte_nb_pca <- plot_pca(t_monocyte_nb, plot_labels = FALSE)
pp(file = "figures/figure4A_monocytes.svg")
t_monocyte_nb_pca$plot
dev.off()## png
## 2
t_monocyte_nb_pca$plot
t_neutrophil_norm <- normalize_expt(t_neutrophils, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)## Removing 10851 low-count genes (9101 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
t_neutrophil_pca <- plot_pca(t_neutrophil_norm,
plot_labels = FALSE)
t_neutrophil_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by tumaco_cure, tumaco_failure
## Shapes are defined by 3, 2, 1.
t_neutrophil_nb <- normalize_expt(t_neutrophils, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)## Removing 10851 low-count genes (9101 remaining).
## Setting 754 low elements to zero.
## transform_counts: Found 754 values equal to 0, adding 1 to the matrix.
t_neutrophil_nb_pca <- plot_pca(t_neutrophil_nb, plot_labels = FALSE)
pp(file = "figures/figure4A_neutrophils.svg")
t_neutrophil_nb_pca$plot
dev.off()## png
## 2
t_neutrophil_nb_pca$plot
t_eosinophil_norm <- normalize_expt(t_eosinophils, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)## Removing 9420 low-count genes (10532 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
t_eosinophil_pca <- plot_pca(t_eosinophil_norm,
plot_labels = FALSE)
t_eosinophil_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by tumaco_cure, tumaco_failure
## Shapes are defined by 3, 2, 1.
t_eosinophil_nb <- normalize_expt(t_eosinophils, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)## Removing 9420 low-count genes (10532 remaining).
## Setting 327 low elements to zero.
## transform_counts: Found 327 values equal to 0, adding 1 to the matrix.
t_eosinophil_nb_pca <- plot_pca(t_eosinophil_nb, plot_labels = FALSE)
pp(file = "figures/figure4A_eosinophils.svg")
t_eosinophil_nb_pca$plot
dev.off()## png
## 2
t_eosinophil_nb_pca$plot
t_monocyte_v1 <- subset_expt(t_monocytes, subset = "visitnumber=='1'")
## The samples excluded are: TMRC30056, TMRC30105, TMRC30082, TMRC30169, TMRC30096, TMRC30115, TMRC30030, TMRC30037, TMRC30194, TMRC30049, TMRC30055, TMRC30171, TMRC30139, TMRC30157, TMRC30183, TMRC30072, TMRC30078, TMRC30129, TMRC30172, TMRC30142, TMRC30145, TMRC30199, TMRC30201, TMRC30205, TMRC30217, TMRC30219.
## subset_expt(): There were 42, now there are 16 samples.
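subset_expt() keeps the samples whose metadata satisfy the given expression. The base-R sketch below applies the same filtering idea to a hypothetical sample sheet; only the per-visit counts come from this document. Splitting by visit should partition the samples, and the counts reported in this section agree: 16 + 13 + 13 = 42 Tumaco monocyte samples.

```r
# Hypothetical sample sheet; only the per-visit counts (16, 13, 13) come from this document.
visit_counts <- c(`1` = 16, `2` = 13, `3` = 13)
sample_sheet <- data.frame(
  sample = sprintf("sample%02d", seq_len(sum(visit_counts))),
  visitnumber = rep(names(visit_counts), times = visit_counts)
)

# Same idea as subset = "visitnumber=='1'": keep rows matching the condition.
visit1 <- sample_sheet[sample_sheet$visitnumber == "1", ]

# Splitting by visit partitions the samples, so the group sizes sum to the total.
by_visit <- split(sample_sheet$sample, sample_sheet$visitnumber)
sapply(by_visit, length)
```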
t_monocyte_v1_norm <- normalize_expt(t_monocyte_v1, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)## Removing 9470 low-count genes (10482 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
t_monocyte_v1_pca <- plot_pca(t_monocyte_v1_norm, plot_labels = FALSE)
t_monocyte_v1_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by tumaco_cure, tumaco_failure
## Shapes are defined by 1.
t_monocyte_v1_nb <- normalize_expt(t_monocyte_v1, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")## Removing 9470 low-count genes (10482 remaining).
## Setting 190 low elements to zero.
## transform_counts: Found 190 values equal to 0, adding 1 to the matrix.
t_monocyte_v1_nb_pca <- plot_pca(t_monocyte_v1_nb, plot_labels = FALSE)
t_monocyte_v1_nb_pca$plot
t_monocyte_v2 <- subset_expt(t_monocytes, subset = "visitnumber=='2'")## The samples excluded are: TMRC30105, TMRC30080, TMRC30169, TMRC30107, TMRC30115, TMRC30014, TMRC30037, TMRC30165, TMRC30046, TMRC30055, TMRC30191, TMRC30041, TMRC30139, TMRC30132, TMRC30183, TMRC30123, TMRC30078, TMRC30184, TMRC30172, TMRC30174, TMRC30145, TMRC30197, TMRC30201, TMRC30203, TMRC30205, TMRC30237, TMRC30207, TMRC30219, TMRC30264.
## subset_expt(): There were 42, now there are 13 samples.
t_monocyte_v2_norm <- normalize_expt(t_monocyte_v2, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)## Removing 9429 low-count genes (10523 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
t_monocyte_v2_pca <- plot_pca(t_monocyte_v2_norm, plot_labels = FALSE)
t_monocyte_v2_pca$plot
t_monocyte_v2_nb <- normalize_expt(t_monocyte_v2, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")## Removing 9429 low-count genes (10523 remaining).
## Setting 117 low elements to zero.
## transform_counts: Found 117 values equal to 0, adding 1 to the matrix.
t_monocyte_v2_nb_pca <- plot_pca(t_monocyte_v2_nb, plot_labels = FALSE)
t_monocyte_v2_nb_pca$plot
t_monocyte_v3 <- subset_expt(t_monocytes, subset = "visitnumber=='3'")## The samples excluded are: TMRC30056, TMRC30080, TMRC30082, TMRC30107, TMRC30096, TMRC30014, TMRC30030, TMRC30165, TMRC30194, TMRC30046, TMRC30049, TMRC30191, TMRC30041, TMRC30171, TMRC30132, TMRC30157, TMRC30123, TMRC30072, TMRC30184, TMRC30129, TMRC30174, TMRC30142, TMRC30197, TMRC30199, TMRC30203, TMRC30237, TMRC30207, TMRC30217, TMRC30264.
## subset_expt(): There were 42, now there are 13 samples.
t_monocyte_v3_norm <- normalize_expt(t_monocyte_v3, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)## Removing 9575 low-count genes (10377 remaining).
## transform_counts: Found 16 values equal to 0, adding 1 to the matrix.
t_monocyte_v3_pca <- plot_pca(t_monocyte_v3_norm, plot_labels = FALSE)
t_monocyte_v3_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by tumaco_cure, tumaco_failure
## Shapes are defined by 3.
t_monocyte_v3_nb <- normalize_expt(t_monocyte_v3, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")## Removing 9575 low-count genes (10377 remaining).
## Setting 58 low elements to zero.
## transform_counts: Found 58 values equal to 0, adding 1 to the matrix.
t_monocyte_v3_nb_pca <- plot_pca(t_monocyte_v3_nb, plot_labels = FALSE)
t_monocyte_v3_nb_pca$plot
#### Neutrophils Visit 1
t_neutrophil_v1 <- subset_expt(t_neutrophils, subset = "visitnumber=='1'")
t_neutrophil_v1_norm <- normalize_expt(t_neutrophil_v1, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)
t_neutrophil_v1_pca <- plot_pca(t_neutrophil_v1_norm, plot_labels = FALSE)
t_neutrophil_v1_pca$plot
t_neutrophil_v1_nb <- normalize_expt(t_neutrophil_v1, convert = "cpm",
transform = "log2", filter = TRUE, batch = "ruvg")
t_neutrophil_v1_nb_pca <- plot_pca(t_neutrophil_v1_nb, plot_labels = FALSE)
t_neutrophil_v1_nb_pca$plot
#### Neutrophils Visit 2
```r
t_neutrophil_v2 <- subset_expt(t_neutrophils, subset = "visitnumber=='2'")
## The samples excluded are: TMRC30094, TMRC30103, TMRC30170, TMRC30083, TMRC30121, TMRC30021, TMRC30027, TMRC30166, TMRC30047, TMRC30068, TMRC30192, TMRC30042, TMRC30160, TMRC30167, TMRC30133, TMRC30116, TMRC30088, TMRC30134, TMRC30175, TMRC30146, TMRC30198, TMRC30202, TMRC30204, TMRC30206, TMRC30238, TMRC30208, TMRC30220, TMRC30265.
## subset_expt(): There were 41, now there are 13 samples.
t_neutrophil_v2_norm <- normalize_expt(t_neutrophil_v2, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)## Removing 11500 low-count genes (8452 remaining).
## transform_counts: Found 2 values equal to 0, adding 1 to the matrix.
t_neutrophil_v2_pca <- plot_pca(t_neutrophil_v2_norm, plot_labels = FALSE)
t_neutrophil_v2_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by tumaco_cure, tumaco_failure
## Shapes are defined by 2.
t_neutrophil_v2_nb <- normalize_expt(t_neutrophil_v2, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")## Removing 11500 low-count genes (8452 remaining).
## Setting 78 low elements to zero.
## transform_counts: Found 78 values equal to 0, adding 1 to the matrix.
t_neutrophil_v2_nb_pca <- plot_pca(t_neutrophil_v2_nb, plot_labels = FALSE)
t_neutrophil_v2_nb_pca$plot
t_neutrophil_v3 <- subset_expt(t_neutrophils, subset = "visitnumber=='3'")## The samples excluded are: TMRC30058, TMRC30103, TMRC30093, TMRC30083, TMRC30118, TMRC30021, TMRC30031, TMRC30166, TMRC30195, TMRC30047, TMRC30053, TMRC30192, TMRC30042, TMRC30158, TMRC30167, TMRC30181, TMRC30116, TMRC30076, TMRC30134, TMRC30137, TMRC30175, TMRC30143, TMRC30198, TMRC30200, TMRC30204, TMRC30238, TMRC30208, TMRC30218, TMRC30265.
## subset_expt(): There were 41, now there are 12 samples.
t_neutrophil_v3_norm <- normalize_expt(t_neutrophil_v3, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)## Removing 11447 low-count genes (8505 remaining).
## transform_counts: Found 2 values equal to 0, adding 1 to the matrix.
t_neutrophil_v3_pca <- plot_pca(t_neutrophil_v3_norm, plot_labels = FALSE)
t_neutrophil_v3_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by tumaco_cure, tumaco_failure
## Shapes are defined by 3.
t_neutrophil_v3_nb <- normalize_expt(t_neutrophil_v3, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")## Removing 11447 low-count genes (8505 remaining).
## Setting 83 low elements to zero.
## transform_counts: Found 83 values equal to 0, adding 1 to the matrix.
t_neutrophil_v3_nb_pca <- plot_pca(t_neutrophil_v3_nb, plot_labels = FALSE)
t_neutrophil_v3_nb_pca$plot
t_eosinophil_v1 <- subset_expt(t_eosinophils, subset = "visitnumber=='1'")## The samples excluded are: TMRC30113, TMRC30164, TMRC30119, TMRC30122, TMRC30032, TMRC30028, TMRC30196, TMRC30054, TMRC30070, TMRC30159, TMRC30161, TMRC30182, TMRC30136, TMRC30077, TMRC30079, TMRC30173, TMRC30144, TMRC30147.
## subset_expt(): There were 26, now there are 8 samples.
t_eosinophil_v1_norm <- normalize_expt(t_eosinophil_v1, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)## Removing 9973 low-count genes (9979 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
t_eosinophil_v1_pca <- plot_pca(t_eosinophil_v1_norm, plot_labels = FALSE)
t_eosinophil_v1_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by tumaco_cure, tumaco_failure
## Shapes are defined by 1.
t_eosinophil_v1_nb <- normalize_expt(t_eosinophil_v1, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")## Removing 9973 low-count genes (9979 remaining).
## Setting 57 low elements to zero.
## transform_counts: Found 57 values equal to 0, adding 1 to the matrix.
t_eosinophil_v1_nb_pca <- plot_pca(t_eosinophil_v1_nb, plot_labels = FALSE)
t_eosinophil_v1_nb_pca$plot
t_eosinophil_v2 <- subset_expt(t_eosinophils, subset = "visitnumber=='2'")## The samples excluded are: TMRC30071, TMRC30164, TMRC30122, TMRC30029, TMRC30028, TMRC30180, TMRC30048, TMRC30070, TMRC30043, TMRC30161, TMRC30168, TMRC30136, TMRC30074, TMRC30079, TMRC30135, TMRC30173, TMRC30147.
## subset_expt(): There were 26, now there are 9 samples.
t_eosinophil_v2_norm <- normalize_expt(t_eosinophil_v2, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)## Removing 9835 low-count genes (10117 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
t_eosinophil_v2_pca <- plot_pca(t_eosinophil_v2_norm, plot_labels = FALSE)
t_eosinophil_v2_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by tumaco_cure, tumaco_failure
## Shapes are defined by 2.
t_eosinophil_v2_nb <- normalize_expt(t_eosinophil_v2, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")## Removing 9835 low-count genes (10117 remaining).
## Setting 90 low elements to zero.
## transform_counts: Found 90 values equal to 0, adding 1 to the matrix.
t_eosinophil_v2_nb_pca <- plot_pca(t_eosinophil_v2_nb, plot_labels = FALSE)
t_eosinophil_v2_nb_pca$plot
t_eosinophil_v3 <- subset_expt(t_eosinophils, subset = "visitnumber=='3'")## The samples excluded are: TMRC30071, TMRC30113, TMRC30119, TMRC30029, TMRC30032, TMRC30180, TMRC30196, TMRC30048, TMRC30054, TMRC30043, TMRC30159, TMRC30168, TMRC30182, TMRC30074, TMRC30077, TMRC30135, TMRC30144.
## subset_expt(): There were 26, now there are 9 samples.
t_eosinophil_v3_norm <- normalize_expt(t_eosinophil_v3, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)## Removing 9872 low-count genes (10080 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
t_eosinophil_v3_pca <- plot_pca(t_eosinophil_v3_norm, plot_labels = FALSE)
t_eosinophil_v3_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by tumaco_cure, tumaco_failure
## Shapes are defined by 3.
t_eosinophil_v3_nb <- normalize_expt(t_eosinophil_v3, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")## Removing 9872 low-count genes (10080 remaining).
## Setting 48 low elements to zero.
## transform_counts: Found 48 values equal to 0, adding 1 to the matrix.
t_eosinophil_v3_nb_pca <- plot_pca(t_eosinophil_v3_nb, plot_labels = FALSE)
t_eosinophil_v3_nb_pca$plot
In the following block, the experimental condition is reset to the concatenation of clinical outcome and cell type. There are too few biopsy samples for them to be useful in this visualization, so they are excluded.
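The level bookkeeping in the block below can be illustrated with a small base-R sketch. The toy metadata here are hypothetical; only the column roles mirror `pData(t_clinical)` (`condition` holds the outcome, `batch` holds the cell type). After the biopsies are removed, `cure_biopsy` and `failure_biopsy` no longer exist, and asking for them as levels is what triggers the "unknown levels" warning from `fct_relevel()`. One way to keep the desired order without requesting absent levels is to intersect the desired order with the labels actually present.

```r
# Hypothetical toy metadata standing in for pData(t_clinical):
# "condition" is the clinical outcome and "batch" is the cell type.
meta <- data.frame(
  condition = c("cure", "failure", "cure", "failure", "cure", "failure"),
  batch = c("biopsy", "biopsy", "monocytes", "monocytes",
            "neutrophils", "neutrophils"),
  stringsAsFactors = FALSE)

desired_levels <- c("cure_biopsy", "failure_biopsy",
                    "cure_monocytes", "failure_monocytes",
                    "cure_neutrophils", "failure_neutrophils")

# Concatenate outcome and cell type into a single label per sample.
combined <- paste0(meta[["condition"]], "_", meta[["batch"]])

# Drop the biopsies, analogous to subset = "typeofcells!='biopsy'".
keep <- meta[["batch"]] != "biopsy"
combined <- combined[keep]

# intersect() preserves the order of desired_levels but keeps only labels
# that are still present, so no absent level is requested.
present_levels <- intersect(desired_levels, combined)
combined_fact <- factor(combined, levels = present_levels)
print(levels(combined_fact))
```

Building the levels from the data that survive the subset, rather than from a fixed list, keeps the legend order stable while avoiding empty or unknown categories.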
desired_levels <- c("cure_biopsy", "failure_biopsy", "cure_eosinophils", "failure_eosinophils",
"cure_monocytes", "failure_monocytes", "cure_neutrophils", "failure_neutrophils")
new_fact <- factor(
paste0(pData(t_clinical)[["condition"]], "_",
pData(t_clinical)[["batch"]]),
levels = desired_levels)
t_clinical_concat <- set_expt_conditions(t_clinical, fact = new_fact) %>%
set_expt_batches(fact = "visitnumber") %>%
set_expt_colors(color_choices[["cf_type"]]) %>%
subset_expt(subset="typeofcells!='biopsy'")## The numbers of samples by condition are:
##
## cure_biopsy failure_biopsy cure_eosinophils failure_eosinophils
## 9 5 17 9
## cure_monocytes failure_monocytes cure_neutrophils failure_neutrophils
## 21 21 20 21
## The number of samples by batch are:
##
## 3 2 1
## 34 35 54
## The samples excluded are: TMRC30016, TMRC30017, TMRC30018, TMRC30019, TMRC30020, TMRC30022, TMRC30026, TMRC30044, TMRC30045, TMRC30152, TMRC30177, TMRC30155, TMRC30154, TMRC30241.
## subset_expt(): There were 123, now there are 109 samples.
## Try to ensure that the levels stay in the order I want
meta <- pData(t_clinical_concat) %>%
mutate(condition = fct_relevel(condition, desired_levels))## Warning: There was 1 warning in `mutate()`.
## i In argument: `condition = fct_relevel(condition, desired_levels)`.
## Caused by warning:
## ! 2 unknown levels in `f`: cure_biopsy and failure_biopsy
pData(t_clinical_concat) <- meta
The following block is striking to my eyes: the variance introduced by cell type appears to wipe out the apparent differences between cure and failure that we were able to see previously.
This is perhaps not entirely surprising, but when we had the Cali samples it at least looked like there were differences between cure and failure that held across cell types. I suspect this means those differences were actually driven by clinic, because the two clinics are unbalanced with respect to clinical outcome.
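The imbalance between clinics can be made concrete with counts already printed in this section: `tc_visit_expt` (Tumaco plus Cali) reports 122 cure and 62 failure samples, while the Tumaco-only `t_visit_expt` reports 67 cure and 56 failure. The sketch below assumes the 61 non-Tumaco samples are the Cali samples. These are sample counts that pool cell types and visits (one patient contributes several samples), so the comparison is descriptive only.

```r
# Sample counts by final outcome, taken from the set_expt_batches() messages in this section.
tumaco_and_cali <- c(cure = 122, failure = 62)  # tc_visit_expt
tumaco <- c(cure = 67, failure = 56)            # t_visit_expt
# Assumption: the remaining (non-Tumaco) samples are the Cali samples.
cali <- tumaco_and_cali - tumaco
outcome_by_clinic <- rbind(tumaco = tumaco, cali = cali)
print(outcome_by_clinic)
# Fraction of each clinic's samples that come from treatment failures.
failure_frac <- prop.table(outcome_by_clinic, margin = 1)[, "failure"]
print(round(failure_frac, 2))
```

Roughly 46% of the Tumaco samples are from failures versus about 10% of the Cali samples, which is consistent with clinic acting as a confounder of outcome when the two clinics are combined.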
t_clinical_concat_norm <- normalize_expt(t_clinical_concat, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)## Removing 8042 low-count genes (11910 remaining).
## transform_counts: Found 93 values equal to 0, adding 1 to the matrix.
t_clinical_concat_norm_pca <- plot_pca(t_clinical_concat_norm)
t_clinical_concat_norm_pca$plot
t_clinical_concat_nb <- normalize_expt(t_clinical_concat, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)## Removing 8042 low-count genes (11910 remaining).
## Setting 9595 low elements to zero.
## transform_counts: Found 9595 values equal to 0, adding 1 to the matrix.
t_clinical_concat_nb_pca <- plot_pca(t_clinical_concat_nb)
t_clinical_concat_nb_pca$plot
Let us shift the focus from cell type and cure/failure to the visit number. The three visits are widely spaced in time, following each patient's course of clinical treatment, so we now separate the samples by visit to more easily see what new patterns emerge.
Now let us shift the view slightly to focus on changes observed over time.
Maria Adelaida would like this section fleshed out with PDF versions of various pre- and post-SVA plots. If I understood and noted her goals correctly:
tc_visit_expt <- set_expt_conditions(tc_clinical, fact = "visitnumber") %>%
set_expt_batches(fact = "finaloutcome") %>%
set_expt_colors(color_choices[["visit2"]])## The numbers of samples by condition are:
##
## 3 2 1
## 51 50 83
## The number of samples by batch are:
##
## cure failure
## 122 62
tc_visit_norm <- normalize_expt(tc_visit_expt, filter = TRUE, transform = "log2",
convert = "cpm", norm = "quant")## Removing 5654 low-count genes (14298 remaining).
## transform_counts: Found 677 values equal to 0, adding 1 to the matrix.
tc_visit_norm_pca <- plot_pca(tc_visit_norm)
pp(file = "images/tc_visit_norm_alltypes.pdf")
tc_visit_norm_pca$plot
dev.off()## png
## 2
tc_visit_norm_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by 3, 2, 1
## Shapes are defined by cure, failure.
tc_visit_nb <- normalize_expt(tc_visit_expt, filter = TRUE, transform = "log2",
convert = "cpm", batch = "svaseq")## Removing 5654 low-count genes (14298 remaining).
## Setting 39181 low elements to zero.
## transform_counts: Found 39181 values equal to 0, adding 1 to the matrix.
tc_visit_nb_pca <- plot_pca(tc_visit_nb)
pp(file = "images/tc_visit_sva_alltypes.pdf")
tc_visit_nb_pca$plot
dev.off()## png
## 2
tc_visit_nb_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by 3, 2, 1
## Shapes are defined by cure, failure.
## Repeat for only Tumaco
t_visit_expt <- subset_expt(tc_clinical, subset = "clinic=='tumaco'") %>%
set_expt_conditions(fact = "visitnumber") %>%
set_expt_batches(fact = "finaloutcome") %>%
set_expt_colors(color_choices[["visit2"]])## The samples excluded are
## subset_expt(): There were 184, now there are 123 samples.
## The numbers of samples by condition are:
##
## 3 2 1
## 34 35 54
## The number of samples by batch are:
##
## cure failure
## 67 56
t_visit_norm <- normalize_expt(t_visit_expt, filter = TRUE, transform = "log2",
convert = "cpm", norm = "quant")## Removing 5796 low-count genes (14156 remaining).
## transform_counts: Found 299 values equal to 0, adding 1 to the matrix.
t_visit_norm_pca <- plot_pca(t_visit_norm)
pp(file = "images/t_visit_norm_alltypes.pdf")
t_visit_norm_pca$plot
dev.off()## png
## 2
t_visit_norm_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by 3, 2, 1
## Shapes are defined by cure, failure.
t_visit_nb <- normalize_expt(t_visit_expt, filter = TRUE, transform = "log2",
convert = "cpm", batch = "svaseq")## Removing 5796 low-count genes (14156 remaining).
## Setting 19869 low elements to zero.
## transform_counts: Found 19869 values equal to 0, adding 1 to the matrix.
t_visit_nb_pca <- plot_pca(t_visit_nb)
pp(file = "images/t_visit_sva_alltypes.pdf")
t_visit_nb_pca$plot
dev.off()## png
## 2
t_visit_nb_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by 3, 2, 1
## Shapes are defined by cure, failure.
## Finally, limit to only the clinical celltypes
t_visit_clinical_expt <- subset_expt(t_visit_expt, subset = "typeofcells!='biopsy'")## The samples excluded are: TMRC30016, TMRC30017, TMRC30018, TMRC30019, TMRC30020, TMRC30022, TMRC30026, TMRC30044, TMRC30045, TMRC30152, TMRC30177, TMRC30155, TMRC30154, TMRC30241.
## subset_expt(): There were 123, now there are 109 samples.
t_visit_clinical_norm <- normalize_expt(t_visit_clinical_expt, filter = TRUE, transform = "log2",
convert = "cpm", norm = "quant")## Removing 8042 low-count genes (11910 remaining).
## transform_counts: Found 93 values equal to 0, adding 1 to the matrix.
t_visit_clinical_norm_pca <- plot_pca(t_visit_clinical_norm)
pp(file = "images/t_visit_clinical_norm_alltypes.pdf")
t_visit_clinical_norm_pca$plot
dev.off()## png
## 2
t_visit_clinical_norm_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by 3, 2, 1
## Shapes are defined by cure, failure.
t_visit_clinical_nb <- normalize_expt(t_visit_clinical_expt, filter = TRUE,
transform = "log2", convert = "cpm", batch = "svaseq")## Removing 8042 low-count genes (11910 remaining).
## Setting 9636 low elements to zero.
## transform_counts: Found 9636 values equal to 0, adding 1 to the matrix.
t_visit_clinical_nb_pca <- plot_pca(t_visit_clinical_nb)
pp(file = "images/t_visit_nobiop_sva_alltypes.pdf")
t_visit_clinical_nb_pca$plot
dev.off()## png
## 2
t_visit_clinical_nb_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by 3, 2, 1
## Shapes are defined by cure, failure.
When looking at all cell types, it is quite difficult to see differences among the three visits.
When we had both Cali and Tumaco samples, there appeared to be variance suggesting differences between cure and failure at visit 1. I think the following block suggests pretty strongly that this was not true.
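The following toy simulation illustrates why a large cell-type signal can hide a smaller cure/failure signal in a PCA. All sizes and effect magnitudes are arbitrary assumptions, and it uses base R's `prcomp()` rather than the `fast_svd` reduction reported by `plot_pca()`. With a strong cell-type shift, PC1 separates the cell types, while the much smaller outcome effect contributes little to that component.

```r
set.seed(42)
n_genes <- 500
# 2 cell types x 2 outcomes x 5 samples (hypothetical, balanced design).
cell_type <- rep(c("monocytes", "neutrophils"), each = 10)
outcome <- rep(rep(c("cure", "failure"), each = 5), times = 2)

# Baseline noise on a log2-like scale; rows are genes, columns are samples.
expr <- matrix(rnorm(n_genes * 20, mean = 5, sd = 0.5), nrow = n_genes)
# A large cell-type effect on 200 genes...
is_neut <- cell_type == "neutrophils"
expr[1:200, is_neut] <- expr[1:200, is_neut] + 4
# ...and a much smaller outcome effect on 50 other genes.
is_fail <- outcome == "failure"
expr[201:250, is_fail] <- expr[201:250, is_fail] + 0.5

# prcomp() expects samples in rows, so transpose the genes x samples matrix.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
pc1 <- pca$x[, 1]
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained[1:2], 2))
# Mean PC1 score by cell type and by outcome.
print(tapply(pc1, cell_type, mean))
print(tapply(pc1, outcome, mean))
```

The same logic motivates splitting the samples by cell type before looking for outcome-related structure.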
tv1_samples <- set_expt_batches(tv1_samples, fact = "typeofcells")## The number of samples by batch are:
##
## biopsy eosinophils monocytes neutrophils
## 14 8 16 16
tv1_norm <- normalize_expt(tv1_samples, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)## Removing 5929 low-count genes (14023 remaining).
## transform_counts: Found 272 values equal to 0, adding 1 to the matrix.
tv1_pca <- plot_pca(tv1_norm)
pp(file = "images/tv1_pca.pdf")
tv1_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure
## Shapes are defined by biopsy, eosinophils, monocytes, neutrophils.
dev.off()## png
## 2
tv1_pca$plot
tv1_nb <- normalize_expt(tv1_samples, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq")## Removing 5929 low-count genes (14023 remaining).
## Setting 7655 low elements to zero.
## transform_counts: Found 7655 values equal to 0, adding 1 to the matrix.
tv1_nb_pca <- plot_pca(tv1_nb, plot_labels = FALSE)
pp(file = "images/tv1_sva_pca.pdf")
tv1_nb_pca$plot
dev.off()## png
## 2
tv1_nb_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure
## Shapes are defined by biopsy, eosinophils, monocytes, neutrophils.
tv2_samples <- set_expt_batches(tv2_samples, fact = "typeofcells")## The number of samples by batch are:
##
## eosinophils monocytes neutrophils
## 9 13 13
tv2_norm <- normalize_expt(tv2_samples, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)## Removing 8390 low-count genes (11562 remaining).
## transform_counts: Found 14 values equal to 0, adding 1 to the matrix.
tv2_pca <- plot_pca(tv2_norm)
pp(file = "images/tv2_pca.pdf")
tv2_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure
## Shapes are defined by eosinophils, monocytes, neutrophils.
dev.off()## png
## 2
tv2_pca$plot
tv2_nb <- normalize_expt(tv2_samples, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq")## Removing 8390 low-count genes (11562 remaining).
## Setting 2857 low elements to zero.
## transform_counts: Found 2857 values equal to 0, adding 1 to the matrix.
tv2_nb_pca <- plot_pca(tv2_nb, plot_labels = FALSE)
pp(file = "images/tv2_sva_pca.pdf")
tv2_nb_pca$plot
dev.off()## png
## 2
tv2_nb_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure
## Shapes are defined by eosinophils, monocytes, neutrophils.
tv3_samples <- set_expt_batches(tv3_samples, fact = "typeofcells")## The number of samples by batch are:
##
## eosinophils monocytes neutrophils
## 9 13 12
tv3_norm <- normalize_expt(tv3_samples, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)## Removing 8500 low-count genes (11452 remaining).
## transform_counts: Found 35 values equal to 0, adding 1 to the matrix.
tv3_pca <- plot_pca(tv3_norm)
pp(file = "images/tv3_pca.pdf")
tv3_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure
## Shapes are defined by eosinophils, monocytes, neutrophils.
dev.off()## png
## 2
tv3_pca$plot
tv3_nb <- normalize_expt(tv3_samples, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq")## Removing 8500 low-count genes (11452 remaining).
## Setting 1887 low elements to zero.
## transform_counts: Found 1887 values equal to 0, adding 1 to the matrix.
tv3_nb_pca <- plot_pca(tv3_nb, plot_labels = FALSE)
pp(file = "images/tv3_sva_pca.pdf")
tv3_nb_pca$plot
dev.off()## png
## 2
tv3_nb_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by cure, failure
## Shapes are defined by eosinophils, monocytes, neutrophils.
Separate the samples by cell type in order to more easily observe patterns with respect to visit and clinical outcome.
In the following few blocks, we color the samples by the combination of visit and final outcome, and handle the three primary cell types of interest separately. If I understand correctly, Maria Adelaida would like a polished version of each of these six plots (normalized PCA before and after SVA for each cell type).
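The six groups in the legends below (`v1cure` through `v3failure`) combine visit number and final outcome. A minimal base-R sketch of how such labels can be derived, assuming metadata columns named `visitnumber` and `finaloutcome` as used elsewhere in this document; the toy values are hypothetical, and the `t_visitcf_*` expressionsets themselves are built outside this section, possibly differently.

```r
# Hypothetical toy metadata; the real columns live in pData() of each expressionset.
meta <- data.frame(
  visitnumber = c(1, 2, 3, 1, 2, 3),
  finaloutcome = c("cure", "cure", "cure", "failure", "failure", "failure"),
  stringsAsFactors = FALSE)

# Combine visit and outcome into one label per sample, e.g. "v1cure".
visitcf <- paste0("v", meta[["visitnumber"]], meta[["finaloutcome"]])

# Sorting the unique labels groups them by visit first, then by outcome.
visitcf_fact <- factor(visitcf, levels = sort(unique(visitcf)))
print(levels(visitcf_fact))
```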
t_visitcf_monocyte_norm <- normalize_expt(t_visitcf_monocyte, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)## Removing 9090 low-count genes (10862 remaining).
## transform_counts: Found 5 values equal to 0, adding 1 to the matrix.
t_visitcf_monocyte_pca <- plot_pca(t_visitcf_monocyte_norm, plot_labels = FALSE)
pp(file = "images/t_monocyte_visitcf_norm_pca.pdf")
t_visitcf_monocyte_pca$plot
dev.off()## png
## 2
t_visitcf_monocyte_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by v1cure, v1failure, v2cure, v2failure, v3cure, v3failure
## Shapes are defined by monocytes.
t_visitcf_monocyte_disheat <- plot_disheat(t_visitcf_monocyte_norm)
t_visitcf_monocyte_disheat$plot
t_visitcf_monocyte_nb <- normalize_expt(t_visitcf_monocyte, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")## Removing 9090 low-count genes (10862 remaining).
## Setting 700 low elements to zero.
## transform_counts: Found 700 values equal to 0, adding 1 to the matrix.
t_visitcf_monocyte_nb_pca <- plot_pca(t_visitcf_monocyte_nb, plot_labels = FALSE)
pp(file = "images/t_monocyte_visitcf_sva_pca.pdf")
t_visitcf_monocyte_nb_pca$plot
dev.off()## png
## 2
t_visitcf_monocyte_nb_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by v1cure, v1failure, v2cure, v2failure, v3cure, v3failure
## Shapes are defined by monocytes.
Repeat the above with eosinophils; because there are fewer eosinophil samples (26, versus 42 monocytes), the plots will have fewer glyphs.
t_visitcf_eosinophil_norm <- normalize_expt(t_visitcf_eosinophil, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)## Removing 9420 low-count genes (10532 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
t_visitcf_eosinophil_pca <- plot_pca(t_visitcf_eosinophil_norm, plot_labels = FALSE)
pp(file = "images/t_eosinophil_visitcf_norm_pca.pdf")
t_visitcf_eosinophil_pca$plot
dev.off()## png
## 2
t_visitcf_eosinophil_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by v1cure, v1failure, v2cure, v2failure, v3cure, v3failure
## Shapes are defined by eosinophils.
t_visitcf_eosinophil_disheat <- plot_disheat(t_visitcf_eosinophil_norm)
t_visitcf_eosinophil_disheat$plot
t_visitcf_eosinophil_nb <- normalize_expt(t_visitcf_eosinophil, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")## Removing 9420 low-count genes (10532 remaining).
## Setting 373 low elements to zero.
## transform_counts: Found 373 values equal to 0, adding 1 to the matrix.
t_visitcf_eosinophil_nb_pca <- plot_pca(t_visitcf_eosinophil_nb, plot_labels = FALSE)
pp(file = "images/t_eosinophil_visitcf_sva_pca.pdf")
t_visitcf_eosinophil_nb_pca$plot
dev.off()## png
## 2
t_visitcf_eosinophil_nb_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by v1cure, v1failure, v2cure, v2failure, v3cure, v3failure
## Shapes are defined by eosinophils.
t_visitcf_neutrophil_norm <- normalize_expt(t_visitcf_neutrophil, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)## Removing 10851 low-count genes (9101 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
t_visitcf_neutrophil_pca <- plot_pca(t_visitcf_neutrophil_norm, plot_labels = FALSE)
pp(file = "images/t_neutrophil_visitcf_norm_pca.pdf")
t_visitcf_neutrophil_pca$plot
dev.off()## png
## 2
t_visitcf_neutrophil_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by v1cure, v1failure, v2cure, v2failure, v3cure, v3failure
## Shapes are defined by neutrophils.
t_visitcf_neutrophil_disheat <- plot_disheat(t_visitcf_neutrophil_norm)
t_visitcf_neutrophil_disheat$plot
t_visitcf_neutrophil_nb <- normalize_expt(t_visitcf_neutrophil, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")## Removing 10851 low-count genes (9101 remaining).
## Setting 685 low elements to zero.
## transform_counts: Found 685 values equal to 0, adding 1 to the matrix.
t_visitcf_neutrophil_nb_pca <- plot_pca(t_visitcf_neutrophil_nb, plot_labels = FALSE)
pp(file = "images/t_neutrophil_visitcf_sva_pca.pdf")
t_visitcf_neutrophil_nb_pca$plot
dev.off()## png
## 2
t_visitcf_neutrophil_nb_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by v1cure, v1failure, v2cure, v2failure, v3cure, v3failure
## Shapes are defined by neutrophils.
In the following blocks we back off from the granular view of visit and cure/failure and consider only the three visits. Previously this section showed only the normalized result; now we add the SVA-adjusted result and write PDFs of both. Once again, this is repeated three times, once for each cell type.
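To make the "SVA-adjusted" result concrete, the sketch below shows one common way a surrogate variable is used: fit each gene on a design containing both the condition of interest and the surrogate variable, then subtract only the fitted surrogate-variable term. This is a simplified, hypothetical illustration in base R, not necessarily how `normalize_expt(batch = "svaseq")` applies its adjustment; the surrogate variable is simulated here, whereas `sva::svaseq()` would estimate it from the counts.

```r
set.seed(7)
n_genes <- 100
n_samples <- 12
condition <- factor(rep(c("cure", "failure"), each = 6))
# Hypothetical surrogate variable (in practice estimated, e.g. by sva::svaseq()).
sv <- rnorm(n_samples)

# Simulated log2 expression (genes x samples): a condition effect on 20 genes
# plus a nuisance effect of the surrogate variable on every gene.
expr <- matrix(rnorm(n_genes * n_samples, mean = 6, sd = 0.3), nrow = n_genes)
expr[1:20, condition == "failure"] <- expr[1:20, condition == "failure"] + 1
expr <- expr + outer(rnorm(n_genes, sd = 1), sv)

# Least-squares fit of all genes at once: design is samples x coefficients.
design <- model.matrix(~ condition + sv)
coefs <- solve(crossprod(design), crossprod(design, t(expr)))
rownames(coefs) <- colnames(design)

# Remove only the fitted surrogate-variable term; intercept and condition stay.
adjusted <- expr - t(outer(design[, "sv"], coefs["sv", ]))
```

Keeping `condition` in the design while estimating the surrogate-variable coefficients is what protects the cure/failure signal from being removed along with the nuisance variation.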
t_visit_monocyte <- set_expt_conditions(t_visitcf_monocyte, prefix = "v",
fact = "visitnumber") %>%
set_expt_batches("finaloutcome") %>%
set_expt_colors(color_choices[["visit"]])## The numbers of samples by condition are:
##
## v1 v2 v3
## 16 13 13
## The number of samples by batch are:
##
## cure failure
## 21 21
t_visit_monocyte_norm <- normalize_expt(t_visit_monocyte,
transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)## Removing 9090 low-count genes (10862 remaining).
## transform_counts: Found 5 values equal to 0, adding 1 to the matrix.
t_visit_monocyte_norm_pca <- plot_pca(t_visit_monocyte_norm, plot_labels = FALSE)
pp(file = "images/t_monocyte_visit_norm_pca.pdf")
t_visit_monocyte_norm_pca$plot
dev.off()## png
## 2
t_visit_monocyte_norm_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by v1, v2, v3
## Shapes are defined by cure, failure.
t_visit_monocyte_nb <- normalize_expt(t_visit_monocyte,
transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)## Removing 9090 low-count genes (10862 remaining).
t_visit_monocyte_nb_pca <- plot_pca(t_visit_monocyte_nb, plot_labels = FALSE)
pp(file = "images/t_monocyte_visit_sva_pca.pdf")
t_visit_monocyte_nb_pca$plot
dev.off()## png
## 2
t_visit_monocyte_nb_pca
t_visit_eosinophil <- set_expt_conditions(t_visitcf_eosinophil, prefix = "v",
fact = "visitnumber") %>%
set_expt_batches("finaloutcome") %>%
set_expt_colors(color_choices[["visit"]])## The numbers of samples by condition are:
##
## v1 v2 v3
## 8 9 9
## The number of samples by batch are:
##
## cure failure
## 17 9
t_visit_eosinophil_norm <- normalize_expt(t_visit_eosinophil,
transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)## Removing 9420 low-count genes (10532 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
t_visit_eosinophil_norm_pca <- plot_pca(t_visit_eosinophil_norm, plot_labels = FALSE)
pp(file = "images/t_eosinophil_visit_norm_pca.pdf")
t_visit_eosinophil_norm_pca$plot
dev.off()## png
## 2
t_visit_eosinophil_norm_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by v1, v2, v3
## Shapes are defined by cure, failure.
t_visit_eosinophil_nb <- normalize_expt(t_visit_eosinophil,
transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)## Removing 9420 low-count genes (10532 remaining).
## Setting 271 low elements to zero.
## transform_counts: Found 271 values equal to 0, adding 1 to the matrix.
t_visit_eosinophil_nb_pca <- plot_pca(t_visit_eosinophil_nb, plot_labels = FALSE)
pp(file = "images/t_eosinophil_visit_sva_pca.pdf")
t_visit_eosinophil_nb_pca$plot
dev.off()## png
## 2
t_visit_eosinophil_nb_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by v1, v2, v3
## Shapes are defined by cure, failure.
t_visit_neutrophil <- set_expt_conditions(t_visitcf_neutrophil, prefix = "v",
fact = "visitnumber") %>%
set_expt_batches("finaloutcome") %>%
set_expt_colors(color_choices[["visit"]])## The numbers of samples by condition are:
##
## v1 v2 v3
## 16 13 12
## The number of samples by batch are:
##
## cure failure
## 20 21
t_visit_neutrophil_norm <- normalize_expt(t_visit_neutrophil,
transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)## Removing 10851 low-count genes (9101 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
t_visit_neutrophil_norm_pca <- plot_pca(t_visit_neutrophil_norm, plot_labels = FALSE)
pp(file = "images/t_neutrophil_visit_norm_pca.pdf")
t_visit_neutrophil_norm_pca$plot
dev.off()## png
## 2
t_visit_neutrophil_norm_pca## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by v1, v2, v3
## Shapes are defined by cure, failure.
t_visit_neutrophil_nb <- normalize_expt(t_visit_neutrophil,
transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)
## Removing 10851 low-count genes (9101 remaining).
## Setting 593 low elements to zero.
## transform_counts: Found 593 values equal to 0, adding 1 to the matrix.
t_visit_neutrophil_nb_pca <- plot_pca(t_visit_neutrophil_nb, plot_labels = FALSE)
pp(file = "images/t_neutrophil_visit_sva_pca.pdf")
t_visit_neutrophil_nb_pca$plot
dev.off()
## png
## 2
t_visit_neutrophil_nb_pca
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by v1, v2, v3
## Shapes are defined by cure, failure.
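Each cell type above gets two `normalize_expt()` passes: one with `norm = "quant"` and one with `batch = "svaseq"`. Both passes also use `convert = "cpm"` and `transform = "log2"`. As a rough illustration of two of those ingredients, here is a minimal base R sketch of quantile normalization and a log2(CPM + 1) transform. The helper names are hypothetical and this is not the hpgltools code. It skips the low-count filter and makes no claim about the order in which `normalize_expt()` applies its steps. The pseudocount of 1 mirrors the "adding 1 to the matrix" messages in the logs, although those messages suggest hpgltools only adds it when zeros are present, whereas this sketch always adds it. svaseq is not sketched.

```r
## Hypothetical helpers, not the hpgltools implementation.
quantile_normalize <- function(mat) {
  ## Rank each sample (column), then replace every value with the mean,
  ## across samples, of the values holding that rank.
  ranks <- apply(mat, 2, rank, ties.method = "first")
  rank_means <- rowMeans(apply(mat, 2, sort))
  out <- apply(ranks, 2, function(r) rank_means[r])
  dimnames(out) <- dimnames(mat)
  out
}

log2_cpm <- function(counts, pseudocount = 1) {
  ## Scale each sample to counts per million, then add a pseudocount so
  ## that zero counts become log2(1) = 0 instead of -Inf.
  cpm <- sweep(counts, 2, colSums(counts) / 1e6, "/")
  log2(cpm + pseudocount)
}

toy_counts <- matrix(c(5, 2, 3, 4, 1, 6), nrow = 3,
                     dimnames = list(paste0("gene", 1:3), c("s1", "s2")))
toy_qn <- quantile_normalize(toy_counts)
toy_qn  # both columns now contain exactly 1.5, 3.5 and 5.5
log2_cpm(toy_qn)
```

After quantile normalization every sample has the same distribution of values. This is why the quant-normalized PCAs mostly show differences in how genes are ranked within each sample.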
Next, check whether the persistence samples show any patterns that look usable.
## All
t_persistence_norm <- normalize_expt(t_persistence, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)
plot_pca(t_persistence_norm)$plot
t_persistence_nb <- normalize_expt(t_persistence, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)
plot_pca(t_persistence_nb)$plot
## Biopsies
##persistence_biopsy_norm <- normalize_expt(persistence_biopsy, transform = "log2", convert = "cpm",
## norm = "quant", filter = TRUE)
##plot_pca(persistence_biopsy_norm)$plot
## Insufficient data
## Monocytes
t_persistence_monocyte_norm <- normalize_expt(t_persistence_monocyte, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)
plot_pca(t_persistence_monocyte_norm)$plot
t_persistence_monocyte_nb <- normalize_expt(t_persistence_monocyte, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)
plot_pca(t_persistence_monocyte_nb)$plot
## Neutrophils
t_persistence_neutrophil_norm <- normalize_expt(t_persistence_neutrophil, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)
plot_pca(t_persistence_neutrophil_norm)$plot
t_persistence_neutrophil_nb <- normalize_expt(t_persistence_neutrophil, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)
plot_pca(t_persistence_neutrophil_nb)$plot
## Eosinophils
t_persistence_eosinophil_norm <- normalize_expt(t_persistence_eosinophil, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)
plot_pca(t_persistence_eosinophil_norm)$plot
t_persistence_eosinophil_nb <- normalize_expt(t_persistence_eosinophil, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)
plot_pca(t_persistence_eosinophil_nb)$plot
I wrote all of the z2.2- and z2.3-specific variants to a couple of files; I want to see whether I can classify a human sample as infected with z2.2 or z2.3.
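z22 <- read.csv("csv/variants_22.csv")
z23 <- read.csv("csv/variants_23.csv")
cure <- read.csv("csv/cure_variants.txt")
fail <- read.csv("csv/fail_variants.txt")
## Replace "-" with "_" in the variant IDs so they match the sample tag IDs.
z22_vec <- gsub(pattern = "\\-", replacement = "_", x = z22[["x"]])
z23_vec <- gsub(pattern = "\\-", replacement = "_", x = z23[["x"]])
## gsub() needs a character vector, not a data frame; this assumes the cure/fail
## files share the single "x" column layout of the z22/z23 files.
cure_vec <- gsub(pattern = "\\-", replacement = "_", x = cure[["x"]])
fail_vec <- gsub(pattern = "\\-", replacement = "_", x = fail[["x"]])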
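## Report the fraction of the z2.2- and z2.3-specific variants seen in one sample.
classify_zymo <- function(sample) {
## sm() quiets the messages from read_tsv(); readr decompresses the .xz file itself.
arbitrary_tags <- sm(readr::read_tsv(sample))
arbitrary_ids <- arbitrary_tags[["position"]]
z22_frac <- sum(arbitrary_ids %in% z22_vec) / length(z22_vec)
z23_frac <- sum(arbitrary_ids %in% z23_vec) / length(z23_vec)
message("Length: ", length(arbitrary_ids), ", z22: ", z22_frac, ", z23: ", z23_frac)
invisible(c(z22 = z22_frac, z23 = z23_frac))
}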
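`classify_zymo()` reports one overlap ratio per zymodeme but does not make a call. Below is a minimal standalone sketch of the same score plus a possible decision rule. The variant IDs are invented placeholders, and the `min_margin` rule and its 0.1 default are assumptions for illustration, not something established in this analysis. `mean(unique(markers) %in% sample_ids)` equals the original `sum(sample %in% markers) / length(markers)` when neither list contains duplicate IDs.

```r
## Hypothetical standalone version of the classify_zymo() score; the IDs
## below are invented placeholders, not real variant positions.
zymo_overlap <- function(sample_ids, marker_sets) {
  ## Fraction of each marker set that is present in the sample.
  vapply(marker_sets, function(markers) mean(unique(markers) %in% sample_ids),
         numeric(1))
}

call_zymo <- function(scores, min_margin = 0.1) {
  ## Assumed decision rule: name the best-scoring set only if it beats the
  ## runner-up by at least min_margin; otherwise call it ambiguous.
  ordered <- sort(scores, decreasing = TRUE)
  if (ordered[[1]] - ordered[[2]] < min_margin) {
    return("ambiguous")
  }
  names(ordered)[1]
}

toy_markers <- list(z22 = c("chr1_100", "chr1_200", "chr2_50", "chr3_10"),
                    z23 = c("chr1_150", "chr2_75", "chr4_20", "chr5_5"))
toy_observed <- c("chr1_100", "chr1_200", "chr2_50", "chr4_20", "chr9_1")
toy_scores <- zymo_overlap(toy_observed, toy_markers)
toy_scores             # z22 = 0.75, z23 = 0.25
call_zymo(toy_scores)  # "z22"
```

Any real threshold would need to be tuned on samples whose zymodeme is already known.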
arbitrary_sample <- "preprocessing/TMRC30156/outputs/40freebayes_lpanamensis_v36/all_tags.txt.xz"
classify_zymo(arbitrary_sample)
First, let's get the gene IDs and colors for these plots.
library(viridis)
## Loading required package: viridisLite
wanted_genes <- c("IFI44L", "IFI27", "PRR5", "PRR5-ARHGAP8", "RHCE",
"FBXO39", "RSAD2", "SMTNL1", "USP18", "AFAP1")
wanted_idx <- fData(tc_valid)[["hgnc_symbol"]] %in% wanted_genes
wanted_ids <- rownames(fData(tc_valid))[wanted_idx]
few <- subset_genes(tc_valid, ids = wanted_ids, method = "keep") %>%
set_expt_conditions(fact = "finaloutcome") %>%
normalize_expt(transform = "log2", convert = "rpkm",
column = "mean_cds_len")
## remove_genes_expt(), before removal, there were 19952 genes, now there are 10.
## There are 184 samples which kept less than 90 percent counts.
## TMRC30156 TMRC30185 TMRC30186 TMRC30178 TMRC30179 TMRC30221 TMRC30222 TMRC30223
## 0.074962 0.005211 0.004192 0.010342 0.012853 0.008657 0.011388 0.012338
## TMRC30224 TMRC30269 TMRC30148 TMRC30149 TMRC30253 TMRC30150 TMRC30140 TMRC30138
## 0.012827 0.113766 0.025329 0.080425 0.092054 0.022887 0.058740 0.013443
## TMRC30176 TMRC30153 TMRC30151 TMRC30234 TMRC30235 TMRC30270 TMRC30225 TMRC30226
## 0.033930 0.056567 0.010845 0.022164 0.036248 0.105898 0.018196 0.066755
## TMRC30227 TMRC30016 TMRC30228 TMRC30229 TMRC30230 TMRC30017 TMRC30231 TMRC30232
## 0.012886 0.053396 0.010446 0.017396 0.007234 0.163526 0.011375 0.015075
## TMRC30233 TMRC30018 TMRC30209 TMRC30210 TMRC30211 TMRC30212 TMRC30213 TMRC30216
## 0.007098 0.198302 0.014238 0.030266 0.013335 0.013080 0.013782 0.013332
## TMRC30214 TMRC30215 TMRC30271 TMRC30273 TMRC30275 TMRC30272 TMRC30274 TMRC30276
## 0.026954 0.037395 0.002904 0.008120 0.004977 0.003982 0.011099 0.009685
## TMRC30254 TMRC30255 TMRC30256 TMRC30277 TMRC30239 TMRC30240 TMRC30278 TMRC30279
## 0.002683 0.004997 0.003678 0.028913 0.011614 0.011747 0.005125 0.009030
## TMRC30280 TMRC30257 TMRC30019 TMRC30258 TMRC30281 TMRC30283 TMRC30284 TMRC30282
## 0.004587 0.056479 0.137228 0.042362 0.006149 0.006348 0.018823 0.007745
## TMRC30285 TMRC30071 TMRC30020 TMRC30056 TMRC30113 TMRC30105 TMRC30058 TMRC30164
## 0.011814 0.003604 0.069015 0.024331 0.003870 0.027699 0.064838 0.004041
## TMRC30080 TMRC30094 TMRC30119 TMRC30082 TMRC30103 TMRC30122 TMRC30022 TMRC30169
## 0.029852 0.057588 0.011127 0.004927 0.250987 0.005859 0.087372 0.007921
## TMRC30093 TMRC30029 TMRC30107 TMRC30170 TMRC30032 TMRC30096 TMRC30083 TMRC30028
## 0.008303 0.013187 0.007179 0.010832 0.010258 0.010897 0.026117 0.010995
## TMRC30115 TMRC30118 TMRC30180 TMRC30014 TMRC30121 TMRC30196 TMRC30030 TMRC30021
## 0.008703 0.021574 0.021135 0.004655 0.016174 0.010626 0.006697 0.011014
## TMRC30026 TMRC30037 TMRC30031 TMRC30165 TMRC30027 TMRC30044 TMRC30194 TMRC30166
## 0.110316 0.009305 0.011136 0.033554 0.010749 0.073281 0.006814 0.164852
## TMRC30195 TMRC30048 TMRC30054 TMRC30045 TMRC30046 TMRC30070 TMRC30049 TMRC30055
## 0.007850 0.019523 0.042559 0.073178 0.031598 0.015248 0.062950 0.026184
## TMRC30047 TMRC30191 TMRC30053 TMRC30041 TMRC30068 TMRC30171 TMRC30192 TMRC30139
## 0.171838 0.004217 0.318379 0.009402 0.055379 0.016379 0.003692 0.023784
## TMRC30042 TMRC30158 TMRC30132 TMRC30160 TMRC30157 TMRC30183 TMRC30167 TMRC30123
## 0.009871 0.018550 0.005344 0.033926 0.009121 0.007546 0.009458 0.129523
## TMRC30181 TMRC30072 TMRC30133 TMRC30043 TMRC30078 TMRC30116 TMRC30184 TMRC30076
## 0.010955 0.074089 0.012120 0.004400 0.077858 0.332468 0.008483 0.280252
## TMRC30159 TMRC30129 TMRC30088 TMRC30172 TMRC30134 TMRC30174 TMRC30137 TMRC30161
## 0.006340 0.023852 0.366218 0.020875 0.008721 0.007188 0.028016 0.008835
## TMRC30142 TMRC30175 TMRC30145 TMRC30143 TMRC30168 TMRC30197 TMRC30146 TMRC30182
## 0.016100 0.006401 0.014411 0.037261 0.002339 0.011999 0.016008 0.003287
## TMRC30199 TMRC30198 TMRC30201 TMRC30200 TMRC30203 TMRC30202 TMRC30205 TMRC30204
## 0.011593 0.040711 0.009860 0.012649 0.019503 0.007974 0.076899 0.070550
## TMRC30152 TMRC30177 TMRC30155 TMRC30154 TMRC30241 TMRC30237 TMRC30206 TMRC30136
## 0.081844 0.136279 0.178394 0.109730 0.191405 0.007216 0.234762 0.002704
## TMRC30207 TMRC30238 TMRC30074 TMRC30217 TMRC30208 TMRC30077 TMRC30219 TMRC30218
## 0.006025 0.015708 0.054751 0.008582 0.014484 0.043051 0.004538 0.006692
## TMRC30079 TMRC30220 TMRC30135 TMRC30173 TMRC30264 TMRC30144 TMRC30147 TMRC30265
## 0.065252 0.002911 0.010142 0.013178 0.062792 0.008604 0.007424 0.255306
## The numbers of samples by condition are:
##
## cure failure
## 122 62
## transform_counts: Found 102 values equal to 0, adding 1 to the matrix.
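`wanted_ids` is built with `%in%`, so the kept IDs, and therefore the rows of `few`, follow the row order of `fData(tc_valid)` rather than the order of `wanted_genes`. If `plot_sample_heatmap()` applies `row_label` positionally to the rows, the gene labels on these heatmaps could be shuffled. That is an assumption worth confirming in its documentation. The toy base R example below uses an invented annotation table and placeholder IDs to show the mismatch. With the real data, the row-ordered labels would presumably be `fData(few)[["hgnc_symbol"]]`.

```r
## Toy annotation standing in for fData(); the IDs are placeholders.
toy_annot <- data.frame(
  hgnc_symbol = c("USP18", "IFI27", "RSAD2", "IFI44L"),
  row.names = c("ID_A", "ID_B", "ID_C", "ID_D"),
  stringsAsFactors = FALSE
)
toy_wanted <- c("IFI44L", "IFI27", "RSAD2", "USP18")

## %in% keeps the annotation's row order, not the order of toy_wanted.
toy_kept <- rownames(toy_annot)[toy_annot[["hgnc_symbol"]] %in% toy_wanted]
labels_in_row_order <- toy_annot[toy_kept, "hgnc_symbol"]
labels_in_row_order  # "USP18" "IFI27" "RSAD2" "IFI44L", not toy_wanted's order

## Symbols with no match, or matching more than one ID, are also worth checking.
setdiff(toy_wanted, toy_annot[["hgnc_symbol"]])
table(toy_annot[toy_kept, "hgnc_symbol"])
```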
shp <- plot_sample_heatmap(few, heatmap_colors = viridis, row_label = wanted_genes)
shp
few <- subset_genes(t_clinical, ids = wanted_ids, method = "keep") %>%
set_expt_conditions(fact = "finaloutcome") %>%
normalize_expt(transform = "log2", convert = "rpkm",
column = "mean_cds_len")
## remove_genes_expt(), before removal, there were 19952 genes, now there are 10.
## There are 123 samples which kept less than 90 percent counts.
## TMRC30016 TMRC30017 TMRC30018 TMRC30019 TMRC30071 TMRC30020 TMRC30056 TMRC30113
## 0.053396 0.163526 0.198302 0.137228 0.003604 0.069015 0.024331 0.003870
## TMRC30105 TMRC30058 TMRC30164 TMRC30080 TMRC30094 TMRC30119 TMRC30082 TMRC30103
## 0.027699 0.064838 0.004041 0.029852 0.057588 0.011127 0.004927 0.250987
## TMRC30122 TMRC30022 TMRC30169 TMRC30093 TMRC30029 TMRC30107 TMRC30170 TMRC30032
## 0.005859 0.087372 0.007921 0.008303 0.013187 0.007179 0.010832 0.010258
## TMRC30096 TMRC30083 TMRC30028 TMRC30115 TMRC30118 TMRC30180 TMRC30014 TMRC30121
## 0.010897 0.026117 0.010995 0.008703 0.021574 0.021135 0.004655 0.016174
## TMRC30196 TMRC30030 TMRC30021 TMRC30026 TMRC30037 TMRC30031 TMRC30165 TMRC30027
## 0.010626 0.006697 0.011014 0.110316 0.009305 0.011136 0.033554 0.010749
## TMRC30044 TMRC30194 TMRC30166 TMRC30195 TMRC30048 TMRC30054 TMRC30045 TMRC30046
## 0.073281 0.006814 0.164852 0.007850 0.019523 0.042559 0.073178 0.031598
## TMRC30070 TMRC30049 TMRC30055 TMRC30047 TMRC30191 TMRC30053 TMRC30041 TMRC30068
## 0.015248 0.062950 0.026184 0.171838 0.004217 0.318379 0.009402 0.055379
## TMRC30171 TMRC30192 TMRC30139 TMRC30042 TMRC30158 TMRC30132 TMRC30160 TMRC30157
## 0.016379 0.003692 0.023784 0.009871 0.018550 0.005344 0.033926 0.009121
## TMRC30183 TMRC30167 TMRC30123 TMRC30181 TMRC30072 TMRC30133 TMRC30043 TMRC30078
## 0.007546 0.009458 0.129523 0.010955 0.074089 0.012120 0.004400 0.077858
## TMRC30116 TMRC30184 TMRC30076 TMRC30159 TMRC30129 TMRC30088 TMRC30172 TMRC30134
## 0.332468 0.008483 0.280252 0.006340 0.023852 0.366218 0.020875 0.008721
## TMRC30174 TMRC30137 TMRC30161 TMRC30142 TMRC30175 TMRC30145 TMRC30143 TMRC30168
## 0.007188 0.028016 0.008835 0.016100 0.006401 0.014411 0.037261 0.002339
## TMRC30197 TMRC30146 TMRC30182 TMRC30199 TMRC30198 TMRC30201 TMRC30200 TMRC30203
## 0.011999 0.016008 0.003287 0.011593 0.040711 0.009860 0.012649 0.019503
## TMRC30202 TMRC30205 TMRC30204 TMRC30152 TMRC30177 TMRC30155 TMRC30154 TMRC30241
## 0.007974 0.076899 0.070550 0.081844 0.136279 0.178394 0.109730 0.191405
## TMRC30237 TMRC30206 TMRC30136 TMRC30207 TMRC30238 TMRC30074 TMRC30217 TMRC30208
## 0.007216 0.234762 0.002704 0.006025 0.015708 0.054751 0.008582 0.014484
## TMRC30077 TMRC30219 TMRC30218 TMRC30079 TMRC30220 TMRC30135 TMRC30173 TMRC30264
## 0.043051 0.004538 0.006692 0.065252 0.002911 0.010142 0.013178 0.062792
## TMRC30144 TMRC30147 TMRC30265
## 0.008604 0.007424 0.255306
## The numbers of samples by condition are:
##
## cure failure
## 67 56
## transform_counts: Found 39 values equal to 0, adding 1 to the matrix.
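Each heatmap chunk uses `convert = "rpkm"` with `column = "mean_cds_len"`, which presumably takes the mean CDS length annotation as the gene length. The usual RPKM formula is count × 10^9 / (gene length in bp × library size). The minimal base R sketch below uses a hypothetical helper, not hpgltools' code, and makes the library size an explicit argument. This matters here because the values change scale depending on whether the library sizes come from before or after `subset_genes()` keeps only 10 genes. I have not confirmed which one `normalize_expt()` uses.

```r
## Hypothetical helper, not the hpgltools implementation.
## RPKM = count * 1e9 / (length_bp * library_size)
rpkm <- function(counts, lengths, lib_sizes = colSums(counts)) {
  ## counts: genes x samples; lengths: bp per gene; lib_sizes: reads per sample
  stopifnot(nrow(counts) == length(lengths),
            ncol(counts) == length(lib_sizes))
  per_million <- sweep(counts, 2, lib_sizes / 1e6, "/")
  sweep(per_million, 1, lengths / 1e3, "/")
}

toy_counts <- matrix(c(100, 300), nrow = 2, dimnames = list(c("g1", "g2"), "s1"))
rpkm(toy_counts, lengths = c(1000, 2000), lib_sizes = 1e6)  # g1 = 100, g2 = 150
rpkm(toy_counts, lengths = c(1000, 2000))  # library = these 2 genes only: 250000, 375000
log2(rpkm(toy_counts, c(1000, 2000), 1e6) + 1)
```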
shp <- plot_sample_heatmap(few, heatmap_colors = viridis, row_label = wanted_genes)
shp
few <- subset_genes(tc_clinical, ids = wanted_ids, method = "keep") %>%
set_expt_conditions(fact = "finaloutcome") %>%
subset_expt(subset = "visitnumber=='1'") %>%
normalize_expt(transform = "log2", convert = "rpkm",
column = "mean_cds_len")
## remove_genes_expt(), before removal, there were 19952 genes, now there are 10.
## There are 184 samples which kept less than 90 percent counts.
## TMRC30156 TMRC30185 TMRC30186 TMRC30178 TMRC30179 TMRC30221 TMRC30222 TMRC30223
## 0.074962 0.005211 0.004192 0.010342 0.012853 0.008657 0.011388 0.012338
## TMRC30224 TMRC30269 TMRC30148 TMRC30149 TMRC30253 TMRC30150 TMRC30140 TMRC30138
## 0.012827 0.113766 0.025329 0.080425 0.092054 0.022887 0.058740 0.013443
## TMRC30176 TMRC30153 TMRC30151 TMRC30234 TMRC30235 TMRC30270 TMRC30225 TMRC30226
## 0.033930 0.056567 0.010845 0.022164 0.036248 0.105898 0.018196 0.066755
## TMRC30227 TMRC30016 TMRC30228 TMRC30229 TMRC30230 TMRC30017 TMRC30231 TMRC30232
## 0.012886 0.053396 0.010446 0.017396 0.007234 0.163526 0.011375 0.015075
## TMRC30233 TMRC30018 TMRC30209 TMRC30210 TMRC30211 TMRC30212 TMRC30213 TMRC30216
## 0.007098 0.198302 0.014238 0.030266 0.013335 0.013080 0.013782 0.013332
## TMRC30214 TMRC30215 TMRC30271 TMRC30273 TMRC30275 TMRC30272 TMRC30274 TMRC30276
## 0.026954 0.037395 0.002904 0.008120 0.004977 0.003982 0.011099 0.009685
## TMRC30254 TMRC30255 TMRC30256 TMRC30277 TMRC30239 TMRC30240 TMRC30278 TMRC30279
## 0.002683 0.004997 0.003678 0.028913 0.011614 0.011747 0.005125 0.009030
## TMRC30280 TMRC30257 TMRC30019 TMRC30258 TMRC30281 TMRC30283 TMRC30284 TMRC30282
## 0.004587 0.056479 0.137228 0.042362 0.006149 0.006348 0.018823 0.007745
## TMRC30285 TMRC30071 TMRC30020 TMRC30056 TMRC30113 TMRC30105 TMRC30058 TMRC30164
## 0.011814 0.003604 0.069015 0.024331 0.003870 0.027699 0.064838 0.004041
## TMRC30080 TMRC30094 TMRC30119 TMRC30082 TMRC30103 TMRC30122 TMRC30022 TMRC30169
## 0.029852 0.057588 0.011127 0.004927 0.250987 0.005859 0.087372 0.007921
## TMRC30093 TMRC30029 TMRC30107 TMRC30170 TMRC30032 TMRC30096 TMRC30083 TMRC30028
## 0.008303 0.013187 0.007179 0.010832 0.010258 0.010897 0.026117 0.010995
## TMRC30115 TMRC30118 TMRC30180 TMRC30014 TMRC30121 TMRC30196 TMRC30030 TMRC30021
## 0.008703 0.021574 0.021135 0.004655 0.016174 0.010626 0.006697 0.011014
## TMRC30026 TMRC30037 TMRC30031 TMRC30165 TMRC30027 TMRC30044 TMRC30194 TMRC30166
## 0.110316 0.009305 0.011136 0.033554 0.010749 0.073281 0.006814 0.164852
## TMRC30195 TMRC30048 TMRC30054 TMRC30045 TMRC30046 TMRC30070 TMRC30049 TMRC30055
## 0.007850 0.019523 0.042559 0.073178 0.031598 0.015248 0.062950 0.026184
## TMRC30047 TMRC30191 TMRC30053 TMRC30041 TMRC30068 TMRC30171 TMRC30192 TMRC30139
## 0.171838 0.004217 0.318379 0.009402 0.055379 0.016379 0.003692 0.023784
## TMRC30042 TMRC30158 TMRC30132 TMRC30160 TMRC30157 TMRC30183 TMRC30167 TMRC30123
## 0.009871 0.018550 0.005344 0.033926 0.009121 0.007546 0.009458 0.129523
## TMRC30181 TMRC30072 TMRC30133 TMRC30043 TMRC30078 TMRC30116 TMRC30184 TMRC30076
## 0.010955 0.074089 0.012120 0.004400 0.077858 0.332468 0.008483 0.280252
## TMRC30159 TMRC30129 TMRC30088 TMRC30172 TMRC30134 TMRC30174 TMRC30137 TMRC30161
## 0.006340 0.023852 0.366218 0.020875 0.008721 0.007188 0.028016 0.008835
## TMRC30142 TMRC30175 TMRC30145 TMRC30143 TMRC30168 TMRC30197 TMRC30146 TMRC30182
## 0.016100 0.006401 0.014411 0.037261 0.002339 0.011999 0.016008 0.003287
## TMRC30199 TMRC30198 TMRC30201 TMRC30200 TMRC30203 TMRC30202 TMRC30205 TMRC30204
## 0.011593 0.040711 0.009860 0.012649 0.019503 0.007974 0.076899 0.070550
## TMRC30152 TMRC30177 TMRC30155 TMRC30154 TMRC30241 TMRC30237 TMRC30206 TMRC30136
## 0.081844 0.136279 0.178394 0.109730 0.191405 0.007216 0.234762 0.002704
## TMRC30207 TMRC30238 TMRC30074 TMRC30217 TMRC30208 TMRC30077 TMRC30219 TMRC30218
## 0.006025 0.015708 0.054751 0.008582 0.014484 0.043051 0.004538 0.006692
## TMRC30079 TMRC30220 TMRC30135 TMRC30173 TMRC30264 TMRC30144 TMRC30147 TMRC30265
## 0.065252 0.002911 0.010142 0.013178 0.062792 0.008604 0.007424 0.255306
## The numbers of samples by condition are:
##
## cure failure
## 122 62
## The samples excluded are
## subset_expt(): There were 184, now there are 83 samples.
## transform_counts: Found 43 values equal to 0, adding 1 to the matrix.
shp <- plot_sample_heatmap(few, heatmap_colors = viridis, row_label = wanted_genes)
shp
few <- subset_genes(t_clinical, ids = wanted_ids, method = "keep") %>%
set_expt_conditions(fact = "finaloutcome") %>%
subset_expt(subset = "visitnumber=='1'") %>%
normalize_expt(transform = "log2", convert = "rpkm",
column = "mean_cds_len")
## remove_genes_expt(), before removal, there were 19952 genes, now there are 10.
## There are 123 samples which kept less than 90 percent counts.
## TMRC30016 TMRC30017 TMRC30018 TMRC30019 TMRC30071 TMRC30020 TMRC30056 TMRC30113
## 0.053396 0.163526 0.198302 0.137228 0.003604 0.069015 0.024331 0.003870
## TMRC30105 TMRC30058 TMRC30164 TMRC30080 TMRC30094 TMRC30119 TMRC30082 TMRC30103
## 0.027699 0.064838 0.004041 0.029852 0.057588 0.011127 0.004927 0.250987
## TMRC30122 TMRC30022 TMRC30169 TMRC30093 TMRC30029 TMRC30107 TMRC30170 TMRC30032
## 0.005859 0.087372 0.007921 0.008303 0.013187 0.007179 0.010832 0.010258
## TMRC30096 TMRC30083 TMRC30028 TMRC30115 TMRC30118 TMRC30180 TMRC30014 TMRC30121
## 0.010897 0.026117 0.010995 0.008703 0.021574 0.021135 0.004655 0.016174
## TMRC30196 TMRC30030 TMRC30021 TMRC30026 TMRC30037 TMRC30031 TMRC30165 TMRC30027
## 0.010626 0.006697 0.011014 0.110316 0.009305 0.011136 0.033554 0.010749
## TMRC30044 TMRC30194 TMRC30166 TMRC30195 TMRC30048 TMRC30054 TMRC30045 TMRC30046
## 0.073281 0.006814 0.164852 0.007850 0.019523 0.042559 0.073178 0.031598
## TMRC30070 TMRC30049 TMRC30055 TMRC30047 TMRC30191 TMRC30053 TMRC30041 TMRC30068
## 0.015248 0.062950 0.026184 0.171838 0.004217 0.318379 0.009402 0.055379
## TMRC30171 TMRC30192 TMRC30139 TMRC30042 TMRC30158 TMRC30132 TMRC30160 TMRC30157
## 0.016379 0.003692 0.023784 0.009871 0.018550 0.005344 0.033926 0.009121
## TMRC30183 TMRC30167 TMRC30123 TMRC30181 TMRC30072 TMRC30133 TMRC30043 TMRC30078
## 0.007546 0.009458 0.129523 0.010955 0.074089 0.012120 0.004400 0.077858
## TMRC30116 TMRC30184 TMRC30076 TMRC30159 TMRC30129 TMRC30088 TMRC30172 TMRC30134
## 0.332468 0.008483 0.280252 0.006340 0.023852 0.366218 0.020875 0.008721
## TMRC30174 TMRC30137 TMRC30161 TMRC30142 TMRC30175 TMRC30145 TMRC30143 TMRC30168
## 0.007188 0.028016 0.008835 0.016100 0.006401 0.014411 0.037261 0.002339
## TMRC30197 TMRC30146 TMRC30182 TMRC30199 TMRC30198 TMRC30201 TMRC30200 TMRC30203
## 0.011999 0.016008 0.003287 0.011593 0.040711 0.009860 0.012649 0.019503
## TMRC30202 TMRC30205 TMRC30204 TMRC30152 TMRC30177 TMRC30155 TMRC30154 TMRC30241
## 0.007974 0.076899 0.070550 0.081844 0.136279 0.178394 0.109730 0.191405
## TMRC30237 TMRC30206 TMRC30136 TMRC30207 TMRC30238 TMRC30074 TMRC30217 TMRC30208
## 0.007216 0.234762 0.002704 0.006025 0.015708 0.054751 0.008582 0.014484
## TMRC30077 TMRC30219 TMRC30218 TMRC30079 TMRC30220 TMRC30135 TMRC30173 TMRC30264
## 0.043051 0.004538 0.006692 0.065252 0.002911 0.010142 0.013178 0.062792
## TMRC30144 TMRC30147 TMRC30265
## 0.008604 0.007424 0.255306
## The numbers of samples by condition are:
##
## cure failure
## 67 56
## The samples excluded are
## subset_expt(): There were 123, now there are 54 samples.
## transform_counts: Found 16 values equal to 0, adding 1 to the matrix.
shp <- plot_sample_heatmap(few, heatmap_colors = viridis, row_label = wanted_genes)
shp
few <- subset_genes(tc_eosinophils, ids = wanted_ids, method = "keep") %>%
set_expt_conditions(fact = "finaloutcome") %>%
normalize_expt(transform = "log2", convert = "rpkm",
column = "mean_cds_len")
## remove_genes_expt(), before removal, there were 19952 genes, now there are 10.
## There are 41 samples which kept less than 90 percent counts.
## TMRC30138 TMRC30151 TMRC30227 TMRC30230 TMRC30233 TMRC30211 TMRC30216 TMRC30271
## 0.013443 0.010845 0.012886 0.007234 0.007098 0.013335 0.013332 0.002904
## TMRC30272 TMRC30254 TMRC30277 TMRC30278 TMRC30257 TMRC30281 TMRC30282 TMRC30071
## 0.003982 0.002683 0.028913 0.005125 0.056479 0.006149 0.007745 0.003604
## TMRC30113 TMRC30164 TMRC30119 TMRC30122 TMRC30029 TMRC30032 TMRC30028 TMRC30180
## 0.003870 0.004041 0.011127 0.005859 0.013187 0.010258 0.010995 0.021135
## TMRC30196 TMRC30048 TMRC30054 TMRC30070 TMRC30043 TMRC30159 TMRC30161 TMRC30168
## 0.010626 0.019523 0.042559 0.015248 0.004400 0.006340 0.008835 0.002339
## TMRC30182 TMRC30136 TMRC30074 TMRC30077 TMRC30079 TMRC30135 TMRC30173 TMRC30144
## 0.003287 0.002704 0.054751 0.043051 0.065252 0.010142 0.013178 0.008604
## TMRC30147
## 0.007424
## The numbers of samples by condition are:
##
## cure failure
## 32 9
## transform_counts: Found 40 values equal to 0, adding 1 to the matrix.
shp <- plot_sample_heatmap(few, heatmap_colors = viridis, row_label = wanted_genes)
shp
few <- subset_genes(t_eosinophils, ids = wanted_ids, method = "keep") %>%
set_expt_conditions(fact = "finaloutcome") %>%
normalize_expt(transform = "log2", convert = "rpkm",
column = "mean_cds_len")
## remove_genes_expt(), before removal, there were 19952 genes, now there are 10.
## There are 26 samples which kept less than 90 percent counts.
## TMRC30071 TMRC30113 TMRC30164 TMRC30119 TMRC30122 TMRC30029 TMRC30032 TMRC30028
## 0.003604 0.003870 0.004041 0.011127 0.005859 0.013187 0.010258 0.010995
## TMRC30180 TMRC30196 TMRC30048 TMRC30054 TMRC30070 TMRC30043 TMRC30159 TMRC30161
## 0.021135 0.010626 0.019523 0.042559 0.015248 0.004400 0.006340 0.008835
## TMRC30168 TMRC30182 TMRC30136 TMRC30074 TMRC30077 TMRC30079 TMRC30135 TMRC30173
## 0.002339 0.003287 0.002704 0.054751 0.043051 0.065252 0.010142 0.013178
## TMRC30144 TMRC30147
## 0.008604 0.007424
## The numbers of samples by condition are:
##
## cure failure
## 17 9
## transform_counts: Found 15 values equal to 0, adding 1 to the matrix.
shp <- plot_sample_heatmap(few, heatmap_colors = viridis, row_label = wanted_genes)
shp
few <- subset_genes(tc_eosinophils, ids = wanted_ids, method = "keep") %>%
set_expt_conditions(fact = "finaloutcome") %>%
subset_expt(subset = "visitnumber=='1'") %>%
normalize_expt(transform = "log2", convert = "rpkm",
column = "mean_cds_len")
## remove_genes_expt(), before removal, there were 19952 genes, now there are 10.
## There are 41 samples which kept less than 90 percent counts.
## TMRC30138 TMRC30151 TMRC30227 TMRC30230 TMRC30233 TMRC30211 TMRC30216 TMRC30271
## 0.013443 0.010845 0.012886 0.007234 0.007098 0.013335 0.013332 0.002904
## TMRC30272 TMRC30254 TMRC30277 TMRC30278 TMRC30257 TMRC30281 TMRC30282 TMRC30071
## 0.003982 0.002683 0.028913 0.005125 0.056479 0.006149 0.007745 0.003604
## TMRC30113 TMRC30164 TMRC30119 TMRC30122 TMRC30029 TMRC30032 TMRC30028 TMRC30180
## 0.003870 0.004041 0.011127 0.005859 0.013187 0.010258 0.010995 0.021135
## TMRC30196 TMRC30048 TMRC30054 TMRC30070 TMRC30043 TMRC30159 TMRC30161 TMRC30168
## 0.010626 0.019523 0.042559 0.015248 0.004400 0.006340 0.008835 0.002339
## TMRC30182 TMRC30136 TMRC30074 TMRC30077 TMRC30079 TMRC30135 TMRC30173 TMRC30144
## 0.003287 0.002704 0.054751 0.043051 0.065252 0.010142 0.013178 0.008604
## TMRC30147
## 0.007424
## The numbers of samples by condition are:
##
## cure failure
## 32 9
## The samples excluded are: TMRC30151, TMRC30230, TMRC30233, TMRC30216, TMRC30272, TMRC30254, TMRC30277, TMRC30278, TMRC30282, TMRC30113, TMRC30164, TMRC30119, TMRC30122, TMRC30032, TMRC30028, TMRC30196, TMRC30054, TMRC30070, TMRC30159, TMRC30161, TMRC30182, TMRC30136, TMRC30077, TMRC30079, TMRC30173, TMRC30144, TMRC30147.
## subset_expt(): There were 41, now there are 14 samples.
## transform_counts: Found 15 values equal to 0, adding 1 to the matrix.
shp <- plot_sample_heatmap(few, heatmap_colors = viridis, row_label = wanted_genes)
shp
few <- subset_genes(t_eosinophils, ids = wanted_ids, method = "keep") %>%
set_expt_conditions(fact = "finaloutcome") %>%
subset_expt(subset = "visitnumber=='1'") %>%
normalize_expt(transform = "log2", convert = "rpkm",
column = "mean_cds_len")
## remove_genes_expt(), before removal, there were 19952 genes, now there are 10.
## There are 26 samples which kept less than 90 percent counts.
## TMRC30071 TMRC30113 TMRC30164 TMRC30119 TMRC30122 TMRC30029 TMRC30032 TMRC30028
## 0.003604 0.003870 0.004041 0.011127 0.005859 0.013187 0.010258 0.010995
## TMRC30180 TMRC30196 TMRC30048 TMRC30054 TMRC30070 TMRC30043 TMRC30159 TMRC30161
## 0.021135 0.010626 0.019523 0.042559 0.015248 0.004400 0.006340 0.008835
## TMRC30168 TMRC30182 TMRC30136 TMRC30074 TMRC30077 TMRC30079 TMRC30135 TMRC30173
## 0.002339 0.003287 0.002704 0.054751 0.043051 0.065252 0.010142 0.013178
## TMRC30144 TMRC30147
## 0.008604 0.007424
## The numbers of samples by condition are:
##
## cure failure
## 17 9
## The samples excluded are: TMRC30113, TMRC30164, TMRC30119, TMRC30122, TMRC30032, TMRC30028, TMRC30196, TMRC30054, TMRC30070, TMRC30159, TMRC30161, TMRC30182, TMRC30136, TMRC30077, TMRC30079, TMRC30173, TMRC30144, TMRC30147.
## subset_expt(): There were 26, now there are 8 samples.
## transform_counts: Found 6 values equal to 0, adding 1 to the matrix.
shp <- plot_sample_heatmap(few, heatmap_colors = viridis, row_label = wanted_genes)
shp
few <- subset_genes(tc_monocytes, ids = wanted_ids, method = "keep") %>%
set_expt_conditions(fact = "finaloutcome") %>%
normalize_expt(transform = "log2", convert = "rpkm",
column = "mean_cds_len")
## remove_genes_expt(), before removal, there were 19952 genes, now there are 10.
## There are 63 samples which kept less than 90 percent counts.
## TMRC30185 TMRC30178 TMRC30221 TMRC30223 TMRC30148 TMRC30150 TMRC30176 TMRC30234
## 0.005211 0.010342 0.008657 0.012338 0.025329 0.022887 0.033930 0.022164
## TMRC30225 TMRC30228 TMRC30231 TMRC30209 TMRC30212 TMRC30214 TMRC30273 TMRC30274
## 0.018196 0.010446 0.011375 0.014238 0.013080 0.026954 0.008120 0.011099
## TMRC30255 TMRC30239 TMRC30279 TMRC30258 TMRC30283 TMRC30056 TMRC30105 TMRC30080
## 0.004997 0.011614 0.009030 0.042362 0.006348 0.024331 0.027699 0.029852
## TMRC30082 TMRC30169 TMRC30107 TMRC30096 TMRC30115 TMRC30014 TMRC30030 TMRC30037
## 0.004927 0.007921 0.007179 0.010897 0.008703 0.004655 0.006697 0.009305
## TMRC30165 TMRC30194 TMRC30046 TMRC30049 TMRC30055 TMRC30191 TMRC30041 TMRC30171
## 0.033554 0.006814 0.031598 0.062950 0.026184 0.004217 0.009402 0.016379
## TMRC30139 TMRC30132 TMRC30157 TMRC30183 TMRC30123 TMRC30072 TMRC30078 TMRC30184
## 0.023784 0.005344 0.009121 0.007546 0.129523 0.074089 0.077858 0.008483
## TMRC30129 TMRC30172 TMRC30174 TMRC30142 TMRC30145 TMRC30197 TMRC30199 TMRC30201
## 0.023852 0.020875 0.007188 0.016100 0.014411 0.011999 0.011593 0.009860
## TMRC30203 TMRC30205 TMRC30237 TMRC30207 TMRC30217 TMRC30219 TMRC30264
## 0.019503 0.076899 0.007216 0.006025 0.008582 0.004538 0.062792
## The numbers of samples by condition are:
##
## cure failure
## 39 24
## transform_counts: Found 10 values equal to 0, adding 1 to the matrix.
shp <- plot_sample_heatmap(few, heatmap_colors = viridis, row_label = wanted_genes)
shp
few <- subset_genes(t_monocytes, ids = wanted_ids, method = "keep") %>%
set_expt_conditions(fact = "finaloutcome") %>%
normalize_expt(transform = "log2", convert = "rpkm",
column = "mean_cds_len")
## remove_genes_expt(), before removal, there were 19952 genes, now there are 10.
## There are 42 samples which kept less than 90 percent counts.
## TMRC30056 TMRC30105 TMRC30080 TMRC30082 TMRC30169 TMRC30107 TMRC30096 TMRC30115
## 0.024331 0.027699 0.029852 0.004927 0.007921 0.007179 0.010897 0.008703
## TMRC30014 TMRC30030 TMRC30037 TMRC30165 TMRC30194 TMRC30046 TMRC30049 TMRC30055
## 0.004655 0.006697 0.009305 0.033554 0.006814 0.031598 0.062950 0.026184
## TMRC30191 TMRC30041 TMRC30171 TMRC30139 TMRC30132 TMRC30157 TMRC30183 TMRC30123
## 0.004217 0.009402 0.016379 0.023784 0.005344 0.009121 0.007546 0.129523
## TMRC30072 TMRC30078 TMRC30184 TMRC30129 TMRC30172 TMRC30174 TMRC30142 TMRC30145
## 0.074089 0.077858 0.008483 0.023852 0.020875 0.007188 0.016100 0.014411
## TMRC30197 TMRC30199 TMRC30201 TMRC30203 TMRC30205 TMRC30237 TMRC30207 TMRC30217
## 0.011999 0.011593 0.009860 0.019503 0.076899 0.007216 0.006025 0.008582
## TMRC30219 TMRC30264
## 0.004538 0.062792
## The numbers of samples by condition are:
##
## cure failure
## 21 21
## transform_counts: Found 4 values equal to 0, adding 1 to the matrix.
shp <- plot_sample_heatmap(few, heatmap_colors = viridis, row_label = wanted_genes)
shp
few <- subset_genes(tc_monocytes, ids = wanted_ids, method = "keep") %>%
set_expt_conditions(fact = "finaloutcome") %>%
subset_expt(subset = "visitnumber=='1'") %>%
normalize_expt(transform = "log2", convert = "rpkm",
column = "mean_cds_len")
## remove_genes_expt(), before removal, there were 19952 genes, now there are 10.
## There are 63 samples which kept less than 90 percent counts.
## TMRC30185 TMRC30178 TMRC30221 TMRC30223 TMRC30148 TMRC30150 TMRC30176 TMRC30234
## 0.005211 0.010342 0.008657 0.012338 0.025329 0.022887 0.033930 0.022164
## TMRC30225 TMRC30228 TMRC30231 TMRC30209 TMRC30212 TMRC30214 TMRC30273 TMRC30274
## 0.018196 0.010446 0.011375 0.014238 0.013080 0.026954 0.008120 0.011099
## TMRC30255 TMRC30239 TMRC30279 TMRC30258 TMRC30283 TMRC30056 TMRC30105 TMRC30080
## 0.004997 0.011614 0.009030 0.042362 0.006348 0.024331 0.027699 0.029852
## TMRC30082 TMRC30169 TMRC30107 TMRC30096 TMRC30115 TMRC30014 TMRC30030 TMRC30037
## 0.004927 0.007921 0.007179 0.010897 0.008703 0.004655 0.006697 0.009305
## TMRC30165 TMRC30194 TMRC30046 TMRC30049 TMRC30055 TMRC30191 TMRC30041 TMRC30171
## 0.033554 0.006814 0.031598 0.062950 0.026184 0.004217 0.009402 0.016379
## TMRC30139 TMRC30132 TMRC30157 TMRC30183 TMRC30123 TMRC30072 TMRC30078 TMRC30184
## 0.023784 0.005344 0.009121 0.007546 0.129523 0.074089 0.077858 0.008483
## TMRC30129 TMRC30172 TMRC30174 TMRC30142 TMRC30145 TMRC30197 TMRC30199 TMRC30201
## 0.023852 0.020875 0.007188 0.016100 0.014411 0.011999 0.011593 0.009860
## TMRC30203 TMRC30205 TMRC30237 TMRC30207 TMRC30217 TMRC30219 TMRC30264
## 0.019503 0.076899 0.007216 0.006025 0.008582 0.004538 0.062792
## The numbers of samples by condition are:
##
## cure failure
## 39 24
## The samples excluded are: TMRC30178, TMRC30223, TMRC30150, TMRC30176, TMRC30228, TMRC30231, TMRC30212, TMRC30214, TMRC30274, TMRC30255, TMRC30279, TMRC30056, TMRC30105, TMRC30082, TMRC30169, TMRC30096, TMRC30115, TMRC30030, TMRC30037, TMRC30194, TMRC30049, TMRC30055, TMRC30171, TMRC30139, TMRC30157, TMRC30183, TMRC30072, TMRC30078, TMRC30129, TMRC30172, TMRC30142, TMRC30145, TMRC30199, TMRC30201, TMRC30205, TMRC30217, TMRC30219.
## subset_expt(): There were 63, now there are 26 samples.
## transform_counts: Found 3 values equal to 0, adding 1 to the matrix.
shp <- plot_sample_heatmap(few, heatmap_colors = viridis, row_label = wanted_genes)
shp
few <- subset_genes(t_monocytes, ids = wanted_ids, method = "keep") %>%
set_expt_conditions(fact = "finaloutcome") %>%
subset_expt(subset = "visitnumber=='1'") %>%
normalize_expt(transform = "log2", convert = "rpkm",
column = "mean_cds_len")
## remove_genes_expt(), before removal, there were 19952 genes, now there are 10.
## There are 42 samples which kept less than 90 percent counts.
## TMRC30056 TMRC30105 TMRC30080 TMRC30082 TMRC30169 TMRC30107 TMRC30096 TMRC30115
## 0.024331 0.027699 0.029852 0.004927 0.007921 0.007179 0.010897 0.008703
## TMRC30014 TMRC30030 TMRC30037 TMRC30165 TMRC30194 TMRC30046 TMRC30049 TMRC30055
## 0.004655 0.006697 0.009305 0.033554 0.006814 0.031598 0.062950 0.026184
## TMRC30191 TMRC30041 TMRC30171 TMRC30139 TMRC30132 TMRC30157 TMRC30183 TMRC30123
## 0.004217 0.009402 0.016379 0.023784 0.005344 0.009121 0.007546 0.129523
## TMRC30072 TMRC30078 TMRC30184 TMRC30129 TMRC30172 TMRC30174 TMRC30142 TMRC30145
## 0.074089 0.077858 0.008483 0.023852 0.020875 0.007188 0.016100 0.014411
## TMRC30197 TMRC30199 TMRC30201 TMRC30203 TMRC30205 TMRC30237 TMRC30207 TMRC30217
## 0.011999 0.011593 0.009860 0.019503 0.076899 0.007216 0.006025 0.008582
## TMRC30219 TMRC30264
## 0.004538 0.062792
## The numbers of samples by condition are:
##
## cure failure
## 21 21
## The samples excluded are: TMRC30056, TMRC30105, TMRC30082, TMRC30169, TMRC30096, TMRC30115, TMRC30030, TMRC30037, TMRC30194, TMRC30049, TMRC30055, TMRC30171, TMRC30139, TMRC30157, TMRC30183, TMRC30072, TMRC30078, TMRC30129, TMRC30172, TMRC30142, TMRC30145, TMRC30199, TMRC30201, TMRC30205, TMRC30217, TMRC30219.
## subset_expt(): There were 42, now there are 16 samples.
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
shp <- plot_sample_heatmap(few, heatmap_colors = viridis, row_label = wanted_genes)
shp
few <- subset_genes(tc_neutrophils, ids = wanted_ids, method = "keep") %>%
set_expt_conditions(fact = "finaloutcome") %>%
normalize_expt(transform = "log2", convert = "rpkm",
column = "mean_cds_len")
## remove_genes_expt(), before removal, there were 19952 genes, now there are 10.
## There are 62 samples which kept less than 90 percent counts.
## TMRC30186 TMRC30179 TMRC30222 TMRC30224 TMRC30149 TMRC30140 TMRC30153 TMRC30235
## 0.004192 0.012853 0.011388 0.012827 0.080425 0.058740 0.056567 0.036248
## TMRC30226 TMRC30229 TMRC30232 TMRC30210 TMRC30213 TMRC30215 TMRC30275 TMRC30276
## 0.066755 0.017396 0.015075 0.030266 0.013782 0.037395 0.004977 0.009685
## TMRC30256 TMRC30240 TMRC30280 TMRC30284 TMRC30285 TMRC30058 TMRC30094 TMRC30103
## 0.003678 0.011747 0.004587 0.018823 0.011814 0.064838 0.057588 0.250987
## TMRC30093 TMRC30170 TMRC30083 TMRC30118 TMRC30121 TMRC30021 TMRC30031 TMRC30027
## 0.008303 0.010832 0.026117 0.021574 0.016174 0.011014 0.011136 0.010749
## TMRC30166 TMRC30195 TMRC30047 TMRC30053 TMRC30068 TMRC30192 TMRC30042 TMRC30158
## 0.164852 0.007850 0.171838 0.318379 0.055379 0.003692 0.009871 0.018550
## TMRC30160 TMRC30167 TMRC30181 TMRC30133 TMRC30116 TMRC30076 TMRC30088 TMRC30134
## 0.033926 0.009458 0.010955 0.012120 0.332468 0.280252 0.366218 0.008721
## TMRC30137 TMRC30175 TMRC30143 TMRC30146 TMRC30198 TMRC30200 TMRC30202 TMRC30204
## 0.028016 0.006401 0.037261 0.016008 0.040711 0.012649 0.007974 0.070550
## TMRC30206 TMRC30238 TMRC30208 TMRC30218 TMRC30220 TMRC30265
## 0.234762 0.015708 0.014484 0.006692 0.002911 0.255306
## The numbers of samples by condition are:
##
## cure failure
## 38 24
## transform_counts: Found 52 values equal to 0, adding 1 to the matrix.
shp <- plot_sample_heatmap(few, heatmap_colors = viridis, row_label = wanted_genes)
shp
few <- subset_genes(t_neutrophils, ids = wanted_ids, method = "keep") %>%
set_expt_conditions(fact = "finaloutcome") %>%
normalize_expt(transform = "log2", convert = "rpkm",
column = "mean_cds_len")
## remove_genes_expt(), before removal, there were 19952 genes, now there are 10.
## There are 41 samples which kept less than 90 percent counts.
## TMRC30058 TMRC30094 TMRC30103 TMRC30093 TMRC30170 TMRC30083 TMRC30118 TMRC30121
## 0.064838 0.057588 0.250987 0.008303 0.010832 0.026117 0.021574 0.016174
## TMRC30021 TMRC30031 TMRC30027 TMRC30166 TMRC30195 TMRC30047 TMRC30053 TMRC30068
## 0.011014 0.011136 0.010749 0.164852 0.007850 0.171838 0.318379 0.055379
## TMRC30192 TMRC30042 TMRC30158 TMRC30160 TMRC30167 TMRC30181 TMRC30133 TMRC30116
## 0.003692 0.009871 0.018550 0.033926 0.009458 0.010955 0.012120 0.332468
## TMRC30076 TMRC30088 TMRC30134 TMRC30137 TMRC30175 TMRC30143 TMRC30146 TMRC30198
## 0.280252 0.366218 0.008721 0.028016 0.006401 0.037261 0.016008 0.040711
## TMRC30200 TMRC30202 TMRC30204 TMRC30206 TMRC30238 TMRC30208 TMRC30218 TMRC30220
## 0.012649 0.007974 0.070550 0.234762 0.015708 0.014484 0.006692 0.002911
## TMRC30265
## 0.255306
## The numbers of samples by condition are:
##
## cure failure
## 20 21
## transform_counts: Found 20 values equal to 0, adding 1 to the matrix.
shp <- plot_sample_heatmap(few, heatmap_colors = viridis, row_label = wanted_genes)
shp
few <- subset_genes(tc_neutrophils, ids = wanted_ids, method = "keep") %>%
set_expt_conditions(fact = "finaloutcome") %>%
subset_expt(subset = "visitnumber=='1'") %>%
normalize_expt(transform = "log2", convert = "rpkm",
column = "mean_cds_len")
## remove_genes_expt(), before removal, there were 19952 genes, now there are 10.
## There are 62 samples which kept less than 90 percent counts.
## TMRC30186 TMRC30179 TMRC30222 TMRC30224 TMRC30149 TMRC30140 TMRC30153 TMRC30235
## 0.004192 0.012853 0.011388 0.012827 0.080425 0.058740 0.056567 0.036248
## TMRC30226 TMRC30229 TMRC30232 TMRC30210 TMRC30213 TMRC30215 TMRC30275 TMRC30276
## 0.066755 0.017396 0.015075 0.030266 0.013782 0.037395 0.004977 0.009685
## TMRC30256 TMRC30240 TMRC30280 TMRC30284 TMRC30285 TMRC30058 TMRC30094 TMRC30103
## 0.003678 0.011747 0.004587 0.018823 0.011814 0.064838 0.057588 0.250987
## TMRC30093 TMRC30170 TMRC30083 TMRC30118 TMRC30121 TMRC30021 TMRC30031 TMRC30027
## 0.008303 0.010832 0.026117 0.021574 0.016174 0.011014 0.011136 0.010749
## TMRC30166 TMRC30195 TMRC30047 TMRC30053 TMRC30068 TMRC30192 TMRC30042 TMRC30158
## 0.164852 0.007850 0.171838 0.318379 0.055379 0.003692 0.009871 0.018550
## TMRC30160 TMRC30167 TMRC30181 TMRC30133 TMRC30116 TMRC30076 TMRC30088 TMRC30134
## 0.033926 0.009458 0.010955 0.012120 0.332468 0.280252 0.366218 0.008721
## TMRC30137 TMRC30175 TMRC30143 TMRC30146 TMRC30198 TMRC30200 TMRC30202 TMRC30204
## 0.028016 0.006401 0.037261 0.016008 0.040711 0.012649 0.007974 0.070550
## TMRC30206 TMRC30238 TMRC30208 TMRC30218 TMRC30220 TMRC30265
## 0.234762 0.015708 0.014484 0.006692 0.002911 0.255306
## The numbers of samples by condition are:
##
## cure failure
## 38 24
## The samples excluded are: TMRC30179, TMRC30224, TMRC30140, TMRC30153, TMRC30229, TMRC30232, TMRC30213, TMRC30215, TMRC30276, TMRC30256, TMRC30280, TMRC30285, TMRC30058, TMRC30094, TMRC30093, TMRC30170, TMRC30118, TMRC30121, TMRC30031, TMRC30027, TMRC30195, TMRC30053, TMRC30068, TMRC30158, TMRC30160, TMRC30181, TMRC30133, TMRC30076, TMRC30088, TMRC30137, TMRC30143, TMRC30146, TMRC30200, TMRC30202, TMRC30206, TMRC30218, TMRC30220.
## subset_expt(): There were 62, now there are 25 samples.
## transform_counts: Found 25 values equal to 0, adding 1 to the matrix.
shp <- plot_sample_heatmap(few, heatmap_colors = viridis, row_label = wanted_genes)
shp
few <- subset_genes(t_neutrophils, ids = wanted_ids, method = "keep") %>%
set_expt_conditions(fact = "finaloutcome") %>%
subset_expt(subset = "visitnumber=='1'") %>%
normalize_expt(transform = "log2", convert = "rpkm",
column = "mean_cds_len")
## remove_genes_expt(), before removal, there were 19952 genes, now there are 10.
## There are 41 samples which kept less than 90 percent counts.
## TMRC30058 TMRC30094 TMRC30103 TMRC30093 TMRC30170 TMRC30083 TMRC30118 TMRC30121
## 0.064838 0.057588 0.250987 0.008303 0.010832 0.026117 0.021574 0.016174
## TMRC30021 TMRC30031 TMRC30027 TMRC30166 TMRC30195 TMRC30047 TMRC30053 TMRC30068
## 0.011014 0.011136 0.010749 0.164852 0.007850 0.171838 0.318379 0.055379
## TMRC30192 TMRC30042 TMRC30158 TMRC30160 TMRC30167 TMRC30181 TMRC30133 TMRC30116
## 0.003692 0.009871 0.018550 0.033926 0.009458 0.010955 0.012120 0.332468
## TMRC30076 TMRC30088 TMRC30134 TMRC30137 TMRC30175 TMRC30143 TMRC30146 TMRC30198
## 0.280252 0.366218 0.008721 0.028016 0.006401 0.037261 0.016008 0.040711
## TMRC30200 TMRC30202 TMRC30204 TMRC30206 TMRC30238 TMRC30208 TMRC30218 TMRC30220
## 0.012649 0.007974 0.070550 0.234762 0.015708 0.014484 0.006692 0.002911
## TMRC30265
## 0.255306
## The numbers of samples by condition are:
##
## cure failure
## 20 21
## The samples excluded are: TMRC30058, TMRC30094, TMRC30093, TMRC30170, TMRC30118, TMRC30121, TMRC30031, TMRC30027, TMRC30195, TMRC30053, TMRC30068, TMRC30158, TMRC30160, TMRC30181, TMRC30133, TMRC30076, TMRC30088, TMRC30137, TMRC30143, TMRC30146, TMRC30200, TMRC30202, TMRC30206, TMRC30218, TMRC30220.
## subset_expt(): There were 41, now there are 16 samples.
## transform_counts: Found 9 values equal to 0, adding 1 to the matrix.
shp <- plot_sample_heatmap(few, heatmap_colors = viridis, row_label = wanted_genes)
shp
Let us look at a moderately similar biopsy dataset of braziliensis-infected individuals. First, let's do a quick plot of their data and of our biopsies, then combine them.
## Load the scott-only and the scott+tmrc3 data
load(glue("rda/tmrc3_external_cf-v{ver}.rda"))
load(glue("rda/tmrc3_external-v{ver}.rda"))
our_biopsies <- set_expt_conditions(t_biopsies, "finaloutcome") %>%
set_expt_colors(color_choices[["cf"]])
## The numbers of samples by condition are:
##
## cure failure
## 9 5
our_biopsies_norm <- normalize_expt(our_biopsies, filter = TRUE, transform = "log2",
convert = "cpm", batch = "svaseq")
## Removing 6439 low-count genes (13513 remaining).
## Setting 146 low elements to zero.
## transform_counts: Found 146 values equal to 0, adding 1 to the matrix.
plot_pca(our_biopsies_norm)$plot
scott_biopsies_norm <- normalize_expt(external_cf, filter = TRUE, transform = "log2",
convert = "cpm", batch = "svaseq")
## Removing 7327 low-count genes (14154 remaining).
## Setting 171 low elements to zero.
## transform_counts: Found 171 values equal to 0, adding 1 to the matrix.
plot_pca(scott_biopsies_norm)$plot
both_norm <- normalize_expt(tmrc3_external, filter = TRUE, transform = "log2",
convert = "cpm", norm = "quant")
## Removing 6904 low-count genes (14577 remaining).
## transform_counts: Found 18 values equal to 0, adding 1 to the matrix.
plot_pca(both_norm)$plot
both_nb <- normalize_expt(tmrc3_external, filter = TRUE, transform = "log2",
convert = "cpm", batch = "svaseq")
## Removing 6904 low-count genes (14577 remaining).
## Setting 3653 low elements to zero.
## transform_counts: Found 3653 values equal to 0, adding 1 to the matrix.
plot_pca(both_nb)$plot
external_species <- set_expt_conditions(tmrc3_external, fact = "ParasiteSpecies") %>%
subset_expt(subset = "ParasiteSpecies!='notapplicable'") %>%
set_expt_batches(fact = "lab")
## The numbers of samples by condition are:
##
## lvbraziliensis lvpanamensis notapplicable
## 22 14 3
## The samples excluded are: TMRC30018, TMRC30045, TMRC30155.
## subset_expt(): There were 39, now there are 36 samples.
## The number of samples by batch are:
##
## Brazil Colombia
## 21 15
tt <- normalize_expt(external_species, transform = "log2", convert = "cpm", norm = "quant",
filter = TRUE)
## Removing 6955 low-count genes (14526 remaining).
## transform_counts: Found 17 values equal to 0, adding 1 to the matrix.
plot_pca(tt)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by lvbraziliensis, lvpanamensis
## Shapes are defined by Brazil, Colombia.
ttt <- normalize_expt(external_species, transform = "log2", convert = "cpm", batch = "svaseq",
filter = TRUE)
## Removing 6955 low-count genes (14526 remaining).
## Setting 2874 low elements to zero.
## transform_counts: Found 2874 values equal to 0, adding 1 to the matrix.
plot_pca(ttt)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by lvbraziliensis, lvpanamensis
## Shapes are defined by Brazil, Colombia.
I am resurrecting some of the comparisons of the parasite transcriptome in the host data.
lp_cf <- set_expt_conditions(lp_expt, fact = "finaloutcome")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'pData': object 'lp_expt' not found
table(pData(lp_cf)[["typeofcells"]])
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'pData': object 'lp_cf' not found
lp_cf_norm <- normalize_expt(lp_cf, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'normalize_expt': object 'lp_cf' not found
lp_cf_sm <- plot_sm(lp_cf_norm)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_sm': object 'lp_cf_norm' not found
lp_cf_sm
## Error in eval(expr, envir, enclos): object 'lp_cf_sm' not found
lp_cf_corheat <- plot_corheat(lp_cf_norm)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt_data' in selecting a method for function 'plot_heatmap': object 'lp_cf_norm' not found
lp_cf_corheat
## Error in eval(expr, envir, enclos): object 'lp_cf_corheat' not found
lp_cf_norm_pca <- plot_pca(lp_cf_norm)
## Error in eval(expr, envir, enclos): object 'lp_cf_norm' not found
lp_cf_norm_pca
## Error in eval(expr, envir, enclos): object 'lp_cf_norm_pca' not found
lp_cf_nb <- normalize_expt(lp_cf, transform = "log2", convert = "cpm",
batch = "svaseq", filter = "simple")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'normalize_expt': object 'lp_cf' not found
lp_cf_nb_pca <- plot_pca(lp_cf_nb)
## Error in eval(expr, envir, enclos): object 'lp_cf_nb' not found
lp_cf_nb_pca
## Error in eval(expr, envir, enclos): object 'lp_cf_nb_pca' not found
Note that the previous task includes visits 2 and 3 and multiple cell types; as a result, it is likely to include only the most profoundly infected people (those in whom we observe >30,000 parasite reads covering >3,000 parasite genes). Thus, even though it sort of looks like there might be a cure/fail difference, the SVA-corrected data show that difference to be illusory.
Nonetheless, we can make this clearer by excluding visits 2/3 and/or the non-biopsy samples.
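The parasite-read criterion mentioned above (>30,000 parasite reads and >3,000 parasite genes observed) amounts to a simple per-sample filter on a count matrix. The base-R sketch below illustrates it on a toy matrix; the helper `keep_infected_samples()` and its arguments are hypothetical and not part of hpgltools, and the actual analysis may apply the thresholds differently.

```r
## Hypothetical helper: keep samples (columns) with enough parasite signal.
## counts: a genes x samples matrix of parasite read counts.
keep_infected_samples <- function(counts, min_reads = 30000, min_genes = 3000) {
  total_reads <- colSums(counts)      # parasite reads per sample
  genes_seen <- colSums(counts > 0)   # parasite genes with at least one read
  keep <- total_reads > min_reads & genes_seen > min_genes
  counts[, keep, drop = FALSE]
}

## Toy matrix (filled by column) with small thresholds so it stays readable.
toy <- matrix(c(10, 0, 5,
                20, 30, 40,
                0, 0, 1),
              nrow = 3,
              dimnames = list(paste0("gene", 1:3), c("s1", "s2", "s3")))
kept <- keep_infected_samples(toy, min_reads = 10, min_genes = 1)
colnames(kept)  # "s1" "s2"; s3 has only one read
```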
lp_cf_biop <- subset_expt(lp_cf, subset = "typeofcells=='Biopsy'")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'subset_expt': object 'lp_cf' not found
lp_cf_biop_norm <- normalize_expt(lp_cf_biop, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'normalize_expt': object 'lp_cf_biop' not found
lp_cf_biop_sm <- plot_sm(lp_cf_biop_norm)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'data' in selecting a method for function 'plot_sm': object 'lp_cf_biop_norm' not found
lp_cf_biop_sm
## Error in eval(expr, envir, enclos): object 'lp_cf_biop_sm' not found
lp_cf_biop_corheat <- plot_corheat(lp_cf_biop_norm)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt_data' in selecting a method for function 'plot_heatmap': object 'lp_cf_biop_norm' not found
lp_cf_biop_corheat
## Error in eval(expr, envir, enclos): object 'lp_cf_biop_corheat' not found
lp_cf_biop_norm_pca <- plot_pca(lp_cf_biop_norm)
## Error in eval(expr, envir, enclos): object 'lp_cf_biop_norm' not found
lp_cf_biop_norm_pca
## Error in eval(expr, envir, enclos): object 'lp_cf_biop_norm_pca' not found
lp_cf_biop_nb <- normalize_expt(lp_cf_biop, transform = "log2", convert = "cpm",
batch = "svaseq", filter = "simple")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'expt' in selecting a method for function 'normalize_expt': object 'lp_cf_biop' not found
lp_cf_biop_nb_pca <- plot_pca(lp_cf_biop_nb)
## Error in eval(expr, envir, enclos): object 'lp_cf_biop_nb' not found
lp_cf_biop_nb_pca
## Error in eval(expr, envir, enclos): object 'lp_cf_biop_nb_pca' not found
This comes out of the 09varcor_regression document and was initially performed by Theresa. If this works well and is sufficient, I might remove that document and therefore have that much less to check for correctness.
Note from atb: I need to make a few changes to this section; primarily, we need to be able to generate the tables of F-statistics automatically in case the data change (which they have since Theresa performed this; I think one sample was removed). With that caveat, the following comes directly out of her SVA_V3_Tumaco document. I would also like to compare the SV F-statistics to similar metrics I computed for the PCs vs. metadata factors. My assumption (if I understand the math in sva at all) is that they should largely complement/agree with each other.
We would like to know what the heck SVA is actually correcting for when we do an SVA correction. Are there any metadata factors with which these SVs are correlated?
To do this, I will run SVA to get the SV loadings. I will then do something akin to a PC loadings analysis to see how these individual SVs (and combinations of SVs) are associated with any of the metadata factors.
I will use a computed F-statistic for this association to measure the ratio of between-cluster to within-cluster variance in a model (which tells us whether that factor is a “good” indicator of separation based on that SV loading).
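\[\begin{equation} F = \frac{(TSS - RSS)/(k - 1)}{RSS/(n - k)} \end{equation}\]
where TSS is the total sum of squares, RSS is the residual (within-group) sum of squares, \(n\) is the number of samples, and \(k\) is the number of levels of the metadata factor; the numerator and denominator are thus the between-group and within-group variances.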
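As a sanity check, the one-way F-statistic for a model `sv ~ factor` divides the explained sum of squares, TSS − RSS, by its k − 1 degrees of freedom and the residual sum of squares, RSS, by its n − k degrees of freedom. The sketch below computes it by hand and compares it with what base R's `anova()` reports for the same `lm()` fit. The data are a hypothetical toy vector, not values from this study.

```r
## Toy data: one surrogate variable measured on 6 samples in 3 groups.
sv <- c(1.0, 1.2, 2.9, 3.1, 5.0, 5.4)
group <- factor(c("a", "a", "b", "b", "c", "c"))

fit <- lm(sv ~ group)
n <- length(sv)
k <- nlevels(group)

tss <- sum((sv - mean(sv))^2)  # total sum of squares
rss <- sum(residuals(fit)^2)   # residual (within-group) sum of squares

## Between-group mean square over within-group mean square.
f_manual <- ((tss - rss) / (k - 1)) / (rss / (n - k))
p_manual <- pf(f_manual, k - 1, n - k, lower.tail = FALSE)

## The first row of the ANOVA table is the group term.
f_anova <- anova(fit)[["F value"]][1]
all.equal(f_manual, f_anova)
```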
So for this, I will use a series of linear regressions which model each SVA dimension as a function of the observed variables that describe the known underlying group structure (clinic, visit, patient, …).
\[\begin{equation} \underbrace{X_i}_\text{dimension i of SVA} = \underbrace{B_0 + B_1 \cdot \text{celltype/visit/clinic/donor}}_\text{underlying group structure} \end{equation}\]
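A minimal base-R sketch of this series of regressions: each SV column is regressed on each metadata factor separately, and the resulting F-statistics and p-values are collected into SV × factor tables. The helper `sv_factor_fstats()` and the simulated `svs` data frame are hypothetical; this shows the idea and is not a reimplementation of `svpc_fstats()`.

```r
## Hypothetical helper: fit sv ~ factor for every SV/factor pair.
sv_factor_fstats <- function(sv_df, sv_cols, factors) {
  f_mat <- matrix(NA_real_, nrow = length(sv_cols), ncol = length(factors),
                  dimnames = list(sv_cols, factors))
  p_mat <- f_mat
  for (sv in sv_cols) {
    for (fact in factors) {
      fit <- lm(sv_df[[sv]] ~ as.factor(sv_df[[fact]]))
      tab <- anova(fit)  # first row is the metadata factor
      f_mat[sv, fact] <- tab[["F value"]][1]
      p_mat[sv, fact] <- tab[["Pr(>F)"]][1]
    }
  }
  list(f = f_mat, p = p_mat)
}

## Simulated example: sv_1 tracks cell type, sv_2 is pure noise.
set.seed(1)
svs <- data.frame(
  typeofcells = rep(c("Monocytes", "Neutrophils", "Eosinophils"), each = 4),
  visitnumber = rep(c("1", "2", "3"), times = 4))
svs[["sv_1"]] <- as.numeric(as.factor(svs[["typeofcells"]])) + rnorm(12, sd = 0.1)
svs[["sv_2"]] <- rnorm(12)
res <- sv_factor_fstats(svs, c("sv_1", "sv_2"), c("typeofcells", "visitnumber"))
round(res[["p"]], 3)
```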
We can do this breakdown in a few ways to answer different questions which I will explore further below.
We have decided that the Cali samples don’t offer much extra information for us, and there is a significant clinic batch effect, so we are going to remove the Cali samples and evaluate the SV loadings.
The first thing to do is the actual SVA to get the loadings.
I may have changed a few of Theresa’s variable names when I first copy/pasted this document together without noting the modifications, but I am reasonably certain that the intended data structures are the same.
clinic_sva <- normalize_expt(t_clinical, filter = TRUE)
## Removing 5796 low-count genes (14156 remaining).
pheno <- pData(clinic_sva)
edata <- exprs(clinic_sva)
mod <- model.matrix(~as.factor(finaloutcome), data = pheno)
mod0 <- model.matrix(~1, data = pheno)
svobj <- sva::svaseq(edata, mod, mod0)
## Number of significant surrogate variables is: 4
## Iteration (out of 5 ):1 2 3 4 5
SVA found 4 SVs. We can plot them individually to visually inspect their separation with respect to some of the metadata.
svs <- as.data.frame(svobj$sv)
## Name one column per surrogate variable, however many svaseq() found.
colnames(svs) <- paste0("sv_", seq_len(ncol(svs)))
svs <- cbind(svs, pheno)
sv1_typeofcells <- ggplot(svs, aes(y = sv_1, x = typeofcells, fill = typeofcells)) +
geom_violin() +
geom_point(alpha = 0.75) +
xlab("Type of Cells") +
ylab("SV 1") +
theme_classic() +
theme(legend.position = "none")
sv1_visit <- ggplot(svs, aes(y = sv_1, x = visitnumber, fill = visitnumber)) +
geom_violin() +
geom_point(alpha = 0.75) +
xlab("Visit Number") +
ylab("SV 1") +
theme_classic() +
theme(legend.position = "none")
sv1_donor <- ggplot(svs, aes(y = sv_1, x = donor, fill = donor)) +
geom_violin() +
geom_point(alpha = 0.75) +
xlab("Donor") +
ylab("SV 1") +
theme_classic() +
theme(legend.position = "none",
axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 0.5))
sv1_typeofcells
sv1_visit
sv1_donor
## Warning: Groups with fewer than two datapoints have been dropped.
## i Set `drop = FALSE` to consider such groups for position adjustment purposes.
## Groups with fewer than two datapoints have been dropped.
## i Set `drop = FALSE` to consider such groups for position adjustment purposes.
##grid.arrange(sv1_typeofcells, sv1_visit, sv1_donor, nrow = 2)
sv2_typeofcells <- ggplot(svs, aes(y = sv_2, x = typeofcells, fill = typeofcells)) +
geom_violin() +
geom_point(alpha = 0.75) +
xlab("Type of Cells") +
ylab("SV 2") +
theme_classic() +
theme(legend.position = "none")
sv2_visit <- ggplot(svs, aes(y = sv_2, x = visitnumber, fill = visitnumber)) +
geom_violin() +
geom_point(alpha = 0.75) +
xlab("Visit Number") +
ylab("SV 2") +
theme_classic() +
theme(legend.position = "none")
sv2_donor <- ggplot(svs, aes(y = sv_2, x = donor, fill = donor)) +
geom_violin() +
geom_point(alpha = 0.75) +
xlab("Donor") +
ylab("SV 2") +
theme_classic() +
theme(legend.position = "none",
axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 0.5))
#grid.arrange(sv2_typeofcells, sv2_visit, sv2_donor, nrow = 2)
sv2_typeofcells
sv2_visit
sv2_donor
## Warning: Groups with fewer than two datapoints have been dropped.
## i Set `drop = FALSE` to consider such groups for position adjustment purposes.
## Groups with fewer than two datapoints have been dropped.
## i Set `drop = FALSE` to consider such groups for position adjustment purposes.
I spent a little time simplifying the reasoning above and trying to make it more robust, so that I can regenerate Theresa’s xlsx table of F-statistics and add a little more information. The following block attempts this.
Najib correctly pointed out that I left off clinic in this first invocation.
queries <- c("typeofcells", "visitnumber", "clinic", "donor")
tc_clinical_fpstats <- svpc_fstats(tc_clinical, num_pcs = 5, queries = queries)
## The input appears raw, performing default normalization.
## Removing 5654 low-count genes (14298 remaining).
## transform_counts: Found 222365 values equal to 0, adding 1 to the matrix.
## Removing 5654 low-count genes (14298 remaining).
## Setting 27144 low elements to zero.
## transform_counts: Found 27144 values equal to 0, adding 1 to the matrix.
queries <- c("typeofcells", "visitnumber", "donor")
t_clinical_fpstats <- svpc_fstats(t_clinical, num_pcs = 5, queries = queries)
## The input appears raw, performing default normalization.
## Removing 5796 low-count genes (14156 remaining).
## transform_counts: Found 126745 values equal to 0, adding 1 to the matrix.
## Removing 5796 low-count genes (14156 remaining).
## Setting 17331 low elements to zero.
## transform_counts: Found 17331 values equal to 0, adding 1 to the matrix.
c_clinical_fpstats <- svpc_fstats(c_clinical, num_pcs = 5, queries = queries)
## The input appears raw, performing default normalization.
## Removing 6553 low-count genes (13399 remaining).
## transform_counts: Found 66487 values equal to 0, adding 1 to the matrix.
## Removing 6553 low-count genes (13399 remaining).
## Setting 5038 low elements to zero.
## transform_counts: Found 5038 values equal to 0, adding 1 to the matrix.
I am going to add a little code in this section to send this to an xlsx file. I might need to add a bit more code as well, because I am not certain that any document contains this calculation for each data subset.
I put together a quick function which writes the results of one of these analyses to an xlsx file, but it very much assumes a single dataset and is not easily amenable to multiple datasets; therefore I will strip the code out here into a new function which repeats the process for the Tumaco/Cali/Both data, or an arbitrary combination of them.
Query from Maria Adelaida: Perform a similar F/P-statistics plot and xlsx table, but using the first 5 PCs and SVs; perhaps also include the amount of variance remaining (I forget its name: the residuals).
But also make slightly different plots, two in total: one with the PCs before SVA followed by the SVs, and one with the SVs followed by the post-SVA PCs.
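The rowname relabeling in `write_combined_fpstats()` guards against a quirk of `rbind()` on data frames: duplicated rownames are silently made unique by appending a counter, so a pre-SVA `PC1` and a post-SVA `PC1` would come out as `PC1` and `PC11`. This sketch assumes the pre/post tables are data frames with rownames like `PC1`; the toy tables below are hypothetical.

```r
## Hypothetical pre-SVA, SV, and post-SVA tables; pre and post share rownames.
pre <- data.frame(clinic = 10, donor = 2, row.names = "PC1")
sv_tab <- data.frame(clinic = 5, donor = 4, row.names = "SV1")
post <- data.frame(clinic = 1, donor = 3, row.names = "PC1")

naive <- rbind(pre, sv_tab, post)
rownames(naive)  # "PC1" "SV1" "PC11": the second PC1 gets a counter appended

## Relabel before binding so every row records where it came from.
rownames(pre) <- paste0("PrePC", seq_len(nrow(pre)))
rownames(post) <- paste0("PostPC", seq_len(nrow(post)))
fixed <- rbind(pre, sv_tab, post)
rownames(fixed)  # "PrePC1" "SV1" "PostPC1"
```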
Given this, perform this task with: Clinic, Donor, Visit, Celltype using the clinical samples (no biopsies).
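For the residual-variance part of this request, one simple option is to report the proportion of total variance explained by each of the first five PCs plus whatever remains after them. The base-R sketch below uses `prcomp()` on a simulated samples × genes matrix; with real data one would pass the normalized, log-transformed expression matrix transposed so that samples are rows. The helper name `pc_variance_table()` is hypothetical.

```r
## Hypothetical helper: variance explained by the first num_pcs PCs, plus the remainder.
pc_variance_table <- function(mat, num_pcs = 5) {
  ## prcomp() expects observations (samples) in rows.
  pca <- prcomp(mat, center = TRUE, scale. = FALSE)
  prop <- pca[["sdev"]]^2 / sum(pca[["sdev"]]^2)
  num_pcs <- min(num_pcs, length(prop))
  explained <- prop[seq_len(num_pcs)]
  data.frame(component = c(paste0("PC", seq_len(num_pcs)), "Residual"),
             prop_variance = c(explained, 1 - sum(explained)))
}

set.seed(42)
samples_by_genes <- matrix(rnorm(20 * 50), nrow = 20)  # 20 samples, 50 genes
vt <- pc_variance_table(samples_by_genes, num_pcs = 5)
vt
```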
write_combined_fpstats <- function(both = tc_clinical_fpstats, tumaco = t_clinical_fpstats,
                                   cali = c_clinical_fpstats,
                                   excel = "excel/combined_svpc_fstats.xlsx") {
  xlsx <- init_xlsx(excel)
  wb <- xlsx[["wb"]]

  ## Stack the pre-SVA PCs, the SVs, and the post-SVA PCs into one table.
  ## Changing the rownames first due to rbind rownames shenanigans.
  stack_stats <- function(stats, type) {
    pre <- stats[[paste0("pre_", type)]]
    sv_tab <- stats[[paste0("sv_", type)]]
    post <- stats[[paste0("post_", type)]]
    rownames(pre) <- paste0("PrePC", seq_len(nrow(pre)))
    rownames(post) <- paste0("PostPC", seq_len(nrow(post)))
    rbind(pre, sv_tab, post)
  }

  datasets <- list("Both clinics" = both, "Tumaco" = tumaco, "Cali" = cali)
  current_row <- 1
  image_files <- c()
  for (name in names(datasets)) {
    allf <- stack_stats(datasets[[name]], "f")
    allp <- stack_stats(datasets[[name]], "p")
    ## The F and P tables have the same shape, so both sheets share current_row.
    xlsx_result <- write_xlsx(data = allf, wb = wb, sheet = "Fvalues", start_row = current_row,
                              title = paste0(name, ", SVA and PC analysis, F-values"))
    xlsx_result <- write_xlsx(data = allp, wb = wb, sheet = "Pvalues", start_row = current_row,
                              title = paste0(name, ", SVA and PC analysis, P-values"))
    current_row <- xlsx_result[["end_row"]] + 2
    if (name == "Both clinics") {
      ## Heatmap of the combined-clinic p-values, placed to the right of the tables.
      heatmap.3(as.matrix(allp), dendrogram = "none", scale = "none", trace = "none",
                Colv = FALSE, Rowv = FALSE)
      image <- grDevices::recordPlot()
      ## Wrap in try() so that a failed png insertion does not abort the workbook.
      try_result <- try(xlsx_insert_png(a_plot = image, wb = wb, sheet = "Pvalues",
                                        start_col = ncol(allp) + 2))
      if (! "try-error" %in% class(try_result)) {
        image_files <- try_result[["filename"]]
      }
    }
  }
  excel_ret <- try(openxlsx::saveWorkbook(wb, excel, overwrite = TRUE))
  ## Clean up the temporary png written for the heatmap.
  removed <- try(suppressWarnings(file.remove(image_files)), silent = TRUE)
  return(invisible(wb))
}
clinical_fpstats <- write_combined_fpstats(
both = tc_clinical_fpstats, tumaco = t_clinical_fpstats, cali = c_clinical_fpstats,
excel = glue("excel/clinical_fpstats-v{ver}.xlsx"))
The F-statistics resulting from an ANOVA of the model sv ~ metadata_factor show that the main thing we are correcting for with an SVA correction (with cure/fail as the model factor) is cell type. Donor contributes the next-highest separation, with clinic third. Visit contributes essentially no variance in these data, which we already knew from the DE results.