The R markdown documents in this directory are intended to provide a complete accounting of the analyses performed in the preparation of
“Innate biosignature of treatment failure in patients with cutaneous leishmaniasis.”
I assume that if anyone ever reads this, s/he is looking for the source of the data, figures, and tables from that paper and wants to see if the methods employed to generate them are good, bad, indifferent, or garbage.
These documents are generated by a singularity container which provides the input count tables (once we have accessions, I will make a companion which is able to create them), sample sheets, input documents, and all the software used to process them. The overall configuration of the container is contained in the toplevel .yml file, which provides the base OS (Debian stable) and the scripts used to install the software.
‘local/bin/setup_debian.sh’ handles the base container; ‘local/bin/setup_hpgltools.sh’ sets up R, bioconductor, and my package ‘hpgltools’; and ‘local/bin/runscript’ is the script run if one invokes the container without any arguments. When run, it uses the various installed R packages to ‘render’ the Rmd files in /data to freshly created html. This README is the first document rendered.
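For anyone curious what "render the Rmd files in /data" amounts to, here is a rough sketch of that step. It is not the actual contents of ‘local/bin/runscript’: the function name `render_all_rmd` and its arguments are hypothetical, and the only real call assumed is `rmarkdown::render()`. The real script also renders this README first, which the sketch makes no attempt to reproduce.

```r
# Hypothetical helper, NOT the container's runscript: render every Rmd file
# found in a directory to html. 'render_fun' defaults to rmarkdown::render
# but may be swapped out, e.g. for a dry run.
render_all_rmd <- function(input_dir = "/data",
                           render_fun = rmarkdown::render) {
  # Collect the Rmd files in a stable, alphabetical order.
  rmd_files <- sort(list.files(input_dir, pattern = "\\.Rmd$",
                               full.names = TRUE))
  for (rmd in rmd_files) {
    # Each document gets a freshly created html file next to its source.
    render_fun(rmd, output_format = "html_document")
  }
  invisible(rmd_files)
}
```

Calling `render_all_rmd()` inside the container would then (under these assumptions) rebuild every html log; passing a different `input_dir` lets one try it on a scratch copy instead.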
As of 20240919, Maria Adelaida kindly sent me the putatively final set of figures/tables. I am going to lay out the location in the html logs and their Rmd parents where these may be found. If you wish to play along, make sure you run the entire 01datastructures.Rmd.
Also, this container is automatically regenerated and rerun when I make changes, so one may just go here to play along (that is where I am going to hunt down the figures/tables):
https://containerbook.umiacs.io/
These numbers come from 01datastructures. When considering the samples from the perspective of how many people fall into each category, the counts come directly from section 4.1 “Metadata Sources” and the demographics xlsx file. Together these provide the number of people who fall into each category of panel A/B.
TODO: Clarify v1/v2/v3 vs. Pre-Tx, Mid-Tx, End-Tx in the notebook
This is also derived from 01datastructures, but from the perspective of the numbers of samples which survive our various filters. Thus, to arrive at these numbers, one should start at section 6.1 “Create Expressionset.” and wander through the document; however, doing so will likely make most people sad because it is a long journey. Instead, you may skip down to section 14 “Summarize: Tabulate sample numbers”. The section labeled ‘Both’ provides these numbers. Note that this table used to be just Tumaco, which follows.
The numbers of samples in each group are restated in the summaries. The only real caveat is that I wrote them as ‘visit1’ or ‘v1’ for Pre-Tx, ‘visit2’ for Mid-Tx, and ‘visit3’ for End-Tx.
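To make the visit naming concrete, here is a minimal sketch of translating my labels into the paper's terms and counting samples per stage. The data frame `samples` and its columns are made up for illustration (the real sample sheet lives in the container), and I am assuming ‘v2’/‘v3’ are used the same way as ‘v1’.

```r
# Hypothetical sample sheet; the ids and visit values are invented.
samples <- data.frame(
  sampleid = c("s1", "s2", "s3", "s4", "s5"),
  visit = c("v1", "visit1", "visit2", "visit3", "v3"),
  stringsAsFactors = FALSE)

# Lookup from the notebook's visit labels to the paper's treatment stages.
# The v2/v3 entries are an assumption by analogy with v1.
visit_to_tx <- c(
  v1 = "Pre-Tx", visit1 = "Pre-Tx",
  v2 = "Mid-Tx", visit2 = "Mid-Tx",
  v3 = "End-Tx", visit3 = "End-Tx")

# Recode and keep the stages in chronological order.
samples$treatment_stage <- factor(
  unname(visit_to_tx[samples$visit]),
  levels = c("Pre-Tx", "Mid-Tx", "End-Tx"))

# Tabulate how many samples fall into each stage.
stage_counts <- table(samples$treatment_stage)
stage_counts
```

The same lookup could be applied to the real metadata before comparing the summaries against the table in the paper.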
Another way to recapitulate these numbers is to check out the sankey plots in section 8 ‘Visualize the sample breakdown’. An example invocation looks like:
clinic_type_outcome_sankey <- plot_meta_sankey(
  tc_valid, factors = c("clinic", "typeofcells", "finaloutcome"),
  drill_down = TRUE, color_choices = color_choices)
clinic_type_outcome_sankey

clinic_ethnicity_outcome_sankey <- plot_meta_sankey(
  tc_valid, factors = c("clinic", "etnia", "finaloutcome"),
  drill_down = TRUE, color_choices = color_choices)
clinic_ethnicity_outcome_sankey

clinic_sex_outcome_sankey <- plot_meta_sankey(
  tc_valid, factors = c("clinic", "sex", "finaloutcome"),
  drill_down = TRUE, color_choices = color_choices)
clinic_sex_outcome_sankey
Found in 02visualization, section 8 “Global views of all cell types”. Hey, check it, different types of immune cells are different!
One invocation looks like this (there are a few versions):
tc_pca <- plot_pca(tc_norm, plot_labels = FALSE,
                   plot_title = "PCA - Cell type", size_column = "visitnumber")
tc_pca
PCA showing celltypes
Ibid. Both panels are actually a little further down in section 8.1.
Here is the invocation:
tc_cf_corheat <- plot_corheat(tc_cf_norm,
                              plot_title = "Hierarchical clustering: cell types")
tc_cf_corheat