This document is intended to create the data structures used to evaluate our TMRC2 samples. In some cases, this includes only those samples starting in 2019; in other instances I am including our previous (2015-2016) samples.
In all cases the processing performed was:
In a couple of important ways the TMRC2 data is much more complex than the TMRC3:
Everything which follows depends on the Existing TriTrypDB annotations revision 46, circa 2019. The following block loads a database of these annotations and turns it into a matrix where the rows are genes and columns are all the annotation types provided by TriTrypDB.
The same database was used to create a matrix of orthologous genes between L.panamensis and all of the other species in the TriTrypDB.
The same database of annotations also provides mappings to the set of annotated GO categories for the L.panamensis genome along with gene lengths.
meta <- download_eupath_metadata(webservice = "tritrypdb", overwrite = FALSE)
panamensis_entry <- get_eupath_entry("MHOM", metadata = meta[["valid"]])
panamensis_db <- make_eupath_orgdb(panamensis_entry)
panamensis_pkg <- panamensis_db[["pkgname"]]
package_name <- panamensis_db[["pkgname"]]
if (is.null(panamensis_pkg)) {
panamensis_pkg <- panamensis_entry[["OrgdbPkg"]]
package_name <- panamensis_pkg
}
tt <- library(panamensis_pkg, character.only = TRUE)
panamensis_env <- get0(panamensis_pkg)
all_fields <- columns(panamensis_env)
all_lp_annot <- sm(load_orgdb_annotations(
panamensis_env,
keytype = "gid",
fields = c("annot_gene_entrez_id", "annot_gene_name",
"annot_strand", "annot_chromosome", "annot_cds_length",
"annot_gene_product")))$genes
## Testing to see just how big the full database is.
## testing <- load_orgdb_annotations(panamensis_pkg, keytype = "gid", fields = "all")
lp_go <- load_orgdb_go(panamensis_pkg)
lp_go <- lp_go[, c("GID", "GO")]
lp_lengths <- all_lp_annot[, c("gid", "annot_cds_length")]
colnames(lp_lengths) <- c("ID", "length")
all_lp_annot[["annot_gene_product"]] <- tolower(all_lp_annot[["annot_gene_product"]])
orthos <- sm(extract_eupath_orthologs(db = panamensis_pkg))
data_structures <- c(data_structures, "lp_lengths", "lp_go", "all_lp_annot")all_installed <- rownames(installed.packages())
candidates <- grepl(pattern = "^org.Lpanamensis.MHOM.*v68.eg.db", x = all_installed)
do_parasite <- FALSE
if (sum(candidates) > 0) {
do_parasite <- TRUE
orgdb_pkg_name <- all_installed[candidates]
tt <- library(orgdb_pkg_name, character.only = TRUE)
panamensis_pkg <- get0(orgdb_pkg_name)
all_fields <- columns(panamensis_pkg)
all_lp_annot <- sm(load_orgdb_annotations(
panamensis_pkg,
keytype = "gid",
fields = c("annot_gene_entrez_id", "annot_gene_name",
"annot_strand", "annot_chromosome", "annot_cds_length",
"annot_gene_product")))$genes
lp_go <- load_orgdb_go(panamensis_pkg)
lp_go <- lp_go[, c("GID", "GO")]
lp_lengths <- all_lp_annot[, c("gid", "annot_cds_length")]
colnames(lp_lengths) <- c("ID", "length")
all_lp_annot[["annot_gene_product"]] <- tolower(all_lp_annot[["annot_gene_product"]])
data_structures <- c(data_structures, "lp_lengths", "lp_go", "all_lp_annot")
}## Loading required package: AnnotationDbi
## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: generics
##
## Attaching package: 'generics'
## The following object is masked from 'package:dplyr':
##
## explain
## The following objects are masked from 'package:base':
##
## as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
## setequal, union
##
## Attaching package: 'BiocGenerics'
## The following object is masked from 'package:dplyr':
##
## combine
## The following objects are masked from 'package:hpgltools':
##
## conditions, conditions<-, normalize
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
## as.data.frame, basename, cbind, colnames, dirname, do.call,
## duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
## mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
## rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
## unsplit, which.max, which.min
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
## Loading required package: IRanges
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:dplyr':
##
## first, rename
## The following object is masked from 'package:utils':
##
## findMatches
## The following objects are masked from 'package:base':
##
## I, expand.grid, unname
##
## Attaching package: 'IRanges'
## The following objects are masked from 'package:dplyr':
##
## collapse, desc, slice
## The following object is masked from 'package:glue':
##
## trim
##
## Attaching package: 'AnnotationDbi'
## The following object is masked from 'package:dplyr':
##
## select
##
## The chosen keytype was not available. Using 'GID'.
## This is an orgdb, good.
## 'select()' returned 1:many mapping between keys and columns
## 'select()' returned 1:many mapping between keys and columns
The following block loads the full genome sequence for panamensis. We may use this later to attempt to estimate PCR primers to discern strains.
I am not sure how to increase the number of open files in a container, as a result this does not work.
The process of sample estimation takes two primary inputs:
An expressionSet(or summarizedExperiment) is a data structure used in R to examine RNASeq data. It is comprised of annotations, metadata, and expression data. In the case of our processing pipeline, the location of the expression data is provided by the filenames in the metadata.
The following list contains the colors we have chosen to use when plotting the various ways of discerning the data.
color_choices <- list(
"strain" = list(
## "z1.0" = "#333333", ## Changed this to 'braz' to make it easier to find them.
"z2.0" = "#555555",
"z3.0" = "#777777",
"z2.1" = "#874400",
"z2.2" = "#0000cc",
"z2.3" = "#cc0000",
"z2.4" = "#df7000",
"z3.2" = "#888888",
"z1.0" = "#cc00cc",
"z1.5" = "#cc00cc",
"b2904" = "#cc00cc",
"unknown" = "#cbcbcb"),
## "null" = "#000000"),
"zymo" = list(
"none" = "#000000",
"z22" = "#0000cc",
"z23" = "#cc0000"),
"cf" = list(
"cure" = "#006f00",
"fail" = "#9dffa0",
"unknown" = "#cbcbcb",
"notapplicable" = "#000000"),
"condition" = list(
"inf" = "#199c75",
"inf_sb" = "#d65d00",
"uninf" = "#6e6ea3",
"uninf_sb" = "#d83956"),
"significance" = list(
"lt0" = "#ffe0e0",
"lt1" = "#ffa0a0",
"lt2" = "#f94040",
"lt4" = "#a00000",
"gt0" = "#eeccf9",
"gt1" = "#de8bf9",
"gt2" = "#ad07e3",
"gt4" = "#410257"),
"drug" = list(
"none" = "#989898",
"antimony" = "#088b64"),
"oldnew" = list(
"previous" = "#2233aa",
"current" = "#9c0303"),
"infectedp" = list(
"uninfected" = "#676767",
"infected" = "#ac06e2"),
"treatment_zymo" = list(
"inf_sb_z23" = "#E7298A",
"inf_z23" = "#D95F02",
"uninf_none" = "#66A61E",
"uninf_sb_none" = "#E6AB02",
"inf_z22" = "#1B9E77",
"inf_sb_z22" = "#7570B3"),
"donor" = list(
"d01" = "#8d0000",
"d02" = "#E6AB02",
"d09" = "#7570B3",
"d81" = "#2233aa"),
"susceptibility" = list(
"resistant" = "#8563a7",
"sensitive" = "#8d0000",
"ambiguous" = "#cbcbcb",
"unknown" = "#555555"))
data_structures <- c(data_structures, "color_choices")All of the above focused entire on the parasite samples, now let us pull up the macrophage infected samples. This will comprise two datasets, one of the human and one of the parasite.
The metadata for the macrophage samples contains a couple of columns for mapped human and parasite reads. We will therefore use them separately to create two expressionsets, one for each species.
** Note **: I forgot to commit the addition of plot_metadata factors() in the last run of this. In addition, I need to add an explicit month to load_biomart_annotations() or change the function to search a couple more months before it stops trying to find an archive.
## Using mart: ENSEMBL_MART_ENSEMBL from host: apr2020.archive.ensembl.org.
## Successfully connected to the hsapiens_gene_ensembl database.
## Finished downloading ensembl gene annotations.
## Finished downloading ensembl structure annotations.
## Including symbols, there are 67159 vs the 249740 gene annotations.
## Not dropping haplotype chromosome annotations, set drop_haplotypes = TRUE if this is bad.
## Saving annotations to hsapiens_biomart_annotations.rda.
## Finished save().
hs_annot <- hs_annot[["annotation"]]
hs_annot[["transcript"]] <- paste0(rownames(hs_annot), ".", hs_annot[["transcript_version"]])
rownames(hs_annot) <- make.names(hs_annot[["ensembl_gene_id"]], unique = TRUE)
rownames(hs_annot) <- paste0("gene:", rownames(hs_annot))
tx_gene_map <- hs_annot[, c("transcript", "ensembl_gene_id")]
sanitize_columns <- c("drug", "macrophagetreatment", "macrophagezymodeme")
macr_annot <- hs_annot
rownames(macr_annot) <- gsub(x = rownames(macr_annot),
pattern = "^gene:",
replacement = "")
hs_macrophage <- create_se(sample_sheet, gene_info = macr_annot,
file_column = "hg38100hisatfile") %>%
set_conditions(fact = "macrophagetreatment") %>%
set_batches(fact = "macrophagezymodeme") %>%
sanitize_metadata(columns = sanitize_columns) %>%
subset_se(nonzero = 12000)## Reading the sample metadata.
## Did not find the condition column in the sample sheet.
## Filling it in as undefined.
## Did not find the batch column in the sample sheet.
## Filling it in as undefined.
## Checking the state of the condition column.
## Checking the state of the batch column.
## Checking the condition factor.
## The sample definitions comprises: 69 rows(samples) and 86 columns(metadata fields).
## Matched 21481 annotations and counts.
## Some annotations were lost in merging, setting them to 'undefined'.
## The final summarized experiment has 21481 rows and 86 columns.
## The numbers of samples by condition are:
##
## inf inf_sb uninf uninf_sb
## 30 29 5 5
## The number of samples by batch are:
##
## none z2.2 z2.3
## 10 30 29
## Recasting the data.frame to DataFrame.
## sampleid tubelabelorigin samplename numberofvials
## Length:69 Length:69 Length:69 Min. :1
## Class :character Class :character Class :character 1st Qu.:1
## Mode :character Mode :character Mode :character Median :1
## Mean :1
## 3rd Qu.:1
## Max. :1
##
## sourcelab expperson cellssource lvpanamensiszymodeme
## Length:69 Length:69 Length:69 Min. :2.20
## Class :character Class :character Class :character 1st Qu.:2.20
## Mode :character Mode :character Mode :character Median :2.20
## Mean :2.25
## 3rd Qu.:2.30
## Max. :2.30
## NA's :22
## lvpanamensissbvsusceptibility samplecollectiondate experimentalbatch
## Length:69 Min. :20190629 Length:69
## Class :character 1st Qu.:20210813 Class :character
## Mode :character Median :20220827 Mode :character
## Mean :20212685
## 3rd Qu.:20220827
## Max. :20220916
##
## typeofcells donor isolationmethod
## Length:69 Length:69 Length:69
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## cellspurificationmethod selectionmethod rnapreservation
## Length:69 Length:69 Length:69
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## rnaextractiondate rnavolumeul rnaavailableul rnaqctesteddate
## Length:69 Min. :30 Min. : 5.2 Min. : 42382
## Class :character 1st Qu.:30 1st Qu.:22.2 1st Qu.:20200114
## Mode :character Median :30 Median :23.3 Median :20205521
## Mean :30 Mean :23.1 Mean :19485696
## 3rd Qu.:30 3rd Qu.:25.6 3rd Qu.:20211111
## Max. :30 Max. :26.2 Max. :20211221
## NA's :41 NA's :41 NA's :41
## bioanalyzerrnangul rnaqcpassed rin nanodroprnangul
## Length:69 Length:69 Length:69 Length:69
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## x260280 x260230 rnausedtoconstructlibrariesul
## Length:69 Length:69 Min. : 2.20
## Class :character Class :character 1st Qu.: 2.80
## Mode :character Mode :character Median : 3.41
## Mean : 4.61
## 3rd Qu.: 4.83
## Max. :23.30
## NA's :41
## rnausedtoconstructlibrariesng libraryqctesteddate libqcpassed
## Min. : 0.5 Min. :20200115 Length:69
## 1st Qu.: 0.5 1st Qu.:20200115 Class :character
## Median :250.3 Median :20201221 Mode :character
## Mean :275.3 Mean :20205505
## 3rd Qu.:500.0 3rd Qu.:20211219
## Max. :800.0 Max. :20211223
## NA's :41 NA's :42
## index libraryvolumeul libraryvolumesenttonajibslabul
## Min. : 1.0 Min. :28 Min. :15
## 1st Qu.: 7.0 1st Qu.:28 1st Qu.:15
## Median :14.5 Median :28 Median :15
## Mean :13.9 Mean :28 Mean :15
## 3rd Qu.:20.2 3rd Qu.:28 3rd Qu.:15
## Max. :27.0 Max. :28 Max. :15
## NA's :41 NA's :41 NA's :41
## shipmentdate oldnew countersampleatcideimul drug
## Min. :20200217 Length:69 Min. :13 antimony:34
## 1st Qu.:20200217 Class :character 1st Qu.:13 none :35
## Median :20210558 Mode :character Median :13
## Mean :20210188 Mean :13
## 3rd Qu.:20220103 3rd Qu.:13
## Max. :20220103 Max. :13
## NA's :41 NA's :56
## descriptonandremarks observation
## Length:69 Length:69
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
## librarybioanalyzerprofileelsayedlabfilenamewelllane libraryconcnm
## Length:69 Min. : 22.1
## Class :character 1st Qu.: 57.8
## Mode :character Median : 98.7
## Mean : 96.2
## 3rd Qu.:112.8
## Max. :218.0
## NA's :55
## samplefor100ul2or4nmsequencing waterfor100ul2or4nmsequencing
## Min. :0.917 Min. :92.0
## 1st Qu.:1.754 1st Qu.:96.9
## Median :1.942 Median :98.1
## Mean :2.995 Mean :97.0
## 3rd Qu.:3.082 3rd Qu.:98.2
## Max. :8.000 Max. :99.1
## NA's :56 NA's :56
## sequencingorderno seqorderdate seqcompletedate totalreads
## Length:69 Min. :20200901 Min. :20200910 Min. :12254196
## Class :character 1st Qu.:20200901 1st Qu.:20200910 1st Qu.:21496945
## Mode :character Median :20200901 Median :20200910 Median :23391483
## Mean :20202287 Mean :20202296 Mean :29119440
## 3rd Qu.:20200901 3rd Qu.:20200910 3rd Qu.:27612540
## Max. :20210601 Max. :20210610 Max. :86048061
## NA's :55 NA's :55
## trimmedreads percentkept hg38100salmonfile hg38100hisatfile
## Min. :10400801 Min. :0.735 Length:69 Length:69
## 1st Qu.:19375511 1st Qu.:0.887 Class :character Class :character
## Median :21401552 Median :0.902 Mode :character Mode :character
## Mean :26367248 Mean :0.903
## 3rd Qu.:25048428 3rd Qu.:0.940
## Max. :79775170 Max. :0.945
##
## hisatsinglemappedhg38 hisatmultimappedhg38 hisatmappingratehg38
## Min. : 501881 Min. : 328180 Min. :0.0418
## 1st Qu.:17620997 1st Qu.: 605198 1st Qu.:0.9114
## Median :18986346 Median : 682134 Median :0.9606
## Mean :23074608 Mean :1339401 Mean :0.9223
## 3rd Qu.:22472518 3rd Qu.: 830271 3rd Qu.:0.9787
## Max. :70100660 Max. :7030835 Max. :0.9885
##
## lpanamensisv36hisatfile hisatlpsinglemapped hisatlpmultimapped
## Length:69 Min. : 199 Min. : 14
## Class :character 1st Qu.: 11504 1st Qu.: 770
## Mode :character Median : 284271 Median : 17386
## Mean : 909574 Mean : 70729
## 3rd Qu.:1006681 3rd Qu.: 81008
## Max. :7086371 Max. :620799
##
## parasitemappingrate parasitehostratio x68 macrophagetreatment
## Min. :0.000011 Min. :0.000204 Length:69 inf :30
## 1st Qu.:0.000487 1st Qu.:0.003393 Class :character inf_sb :29
## Median :0.010078 Median :0.003393 Mode :character uninf : 5
## Mean :0.036938 Mean :0.010336 uninf_sb: 5
## 3rd Qu.:0.048453 3rd Qu.:0.003393
## Max. :0.292395 Max. :0.155351
##
## macrophagezymodeme strainid slr1fwd slr1rc
## none:10 Length:69 Min. : 0.00 Min. : 0
## z22 :30 Class :character 1st Qu.: 0.00 1st Qu.: 0
## z23 :29 Mode :character Median : 0.00 Median : 6
## Mean : 1.16 Mean : 31
## 3rd Qu.: 1.00 3rd Qu.: 27
## Max. :11.00 Max. :641
##
## slr2fwd slr2rc slsum slvsreads
## Min. : 2 Min. : 0.000 Min. : 2 Min. :1.03e-07
## 1st Qu.: 224 1st Qu.: 0.000 1st Qu.: 225 1st Qu.:7.81e-06
## Median : 4249 Median : 0.000 Median : 4300 Median :1.83e-04
## Mean : 17256 Mean : 0.667 Mean : 17289 Mean :6.68e-04
## 3rd Qu.: 22089 3rd Qu.: 0.000 3rd Qu.: 22188 3rd Qu.:8.17e-04
## Max. :168826 Max. :13.000 Max. :169487 Max. :6.57e-03
##
## slvshuman trimomatic_input trimomatic_output trimomatic_percent
## Min. :1.05e-07 Length:69 Length:69 Length:69
## 1st Qu.:1.06e-05 Class :character Class :character Class :character
## Median :1.87e-04 Mode :character Mode :character Mode :character
## Mean :8.21e-04
## 3rd Qu.:8.79e-04
## Max. :1.02e-02
##
## fastqc_pct_gc hisat_genome_single_concordant
## Length:69 Length:69
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
## hisat_genome_multi_concordant hisat_genome_single_all hisat_genome_multi_all
## Length:69 Length:69 Length:69
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## hisat_genome_percent hisat_count_table manualzymodeme
## Length:69 Length:69 Length:69
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## zymodemevariantcounts zymodemescore condition batch
## Length:69 Length:69 Length:69 Length:69
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## The samples (and read coverage) removed when filtering 12000 non-zero genes are:
## TMRC30162
## 521145
## TMRC30162
## 10208
## Samples removed: 10208
fixed_genenames <- gsub(x = rownames(assay(hs_macrophage)), pattern = "^gene:",
replacement = "")
hs_macrophage <- set_genenames(hs_macrophage, ids = fixed_genenames)
table(colData(hs_macrophage)$condition)##
## inf inf_sb uninf uninf_sb
## 29 29 5 5
## Added to make a simplified PCA plot.
colData(hs_macrophage)[["experiment"]] <- "macrophage"
## The following 3 lines were copy/pasted to datastructures and should be removed soon.
nostrain <- is.na(colData(hs_macrophage)[["strainid"]])
colData(hs_macrophage)[nostrain, "strainid"] <- "none"
colData(hs_macrophage)[["strain_zymo"]] <- paste0("s", colData(hs_macrophage)[["strainid"]],
"_", colData(hs_macrophage)[["macrophagezymodeme"]])
uninfected <- colData(hs_macrophage)[["strain_zymo"]] == "snone_none"
colData(hs_macrophage)[uninfected, "strain_zymo"] <- "uninfected"
colData(hs_macrophage)[["infectedp"]] <- "infected"
colData(hs_macrophage)[uninfected, "infectedp"] <- "uninfected"
data_structures <- c(data_structures, "hs_macrophage")1 sample has been excluded from the analysis but is in the sample sheet. I am reasonably certain I know which, but will double-check here.
sample_sheet_ids = c("TMRC30051","TMRC30057","TMRC30059","TMRC30060","TMRC30061","TMRC30062",
"TMRC30063","TMRC30064","TMRC30065","TMRC30066","TMRC30067","TMRC30069",
"TMRC30117","TMRC30162","TMRC30243","TMRC30244","TMRC30245","TMRC30246",
"TMRC30247","TMRC30248","TMRC30249","TMRC30250","TMRC30251","TMRC30252",
"TMRC30266","TMRC30267","TMRC30268","TMRC30286","TMRC30326","TMRC30316",
"TMRC30317","TMRC30322","TMRC30323","TMRC30328","TMRC30318","TMRC30319",
"TMRC30324","TMRC30325","TMRC30320","TMRC30321","TMRC30327","TMRC30312",
"TMRC30297","TMRC30298","TMRC30299","TMRC30300","TMRC30295","TMRC30296",
"TMRC30303","TMRC30304","TMRC30301","TMRC30302","TMRC30314","TMRC30315",
"TMRC30313")
found <- sample_sheet_ids %in% colnames(assay(hs_macrophage))
colnames(assay(hs_macrophage))[!found]all_human <- sanitize_metadata(hs_macrophage, columns = "drug") %>%
set_conditions(fact = "drug") %>%
set_batches(fact = "typeofcells")## Recasting the data.frame to DataFrame.
## sampleid tubelabelorigin samplename numberofvials
## Length:68 Length:68 Length:68 Min. :1
## Class :character Class :character Class :character 1st Qu.:1
## Mode :character Mode :character Mode :character Median :1
## Mean :1
## 3rd Qu.:1
## Max. :1
##
## sourcelab expperson cellssource lvpanamensiszymodeme
## Length:68 Length:68 Length:68 Min. :2.20
## Class :character Class :character Class :character 1st Qu.:2.20
## Mode :character Mode :character Mode :character Median :2.25
## Mean :2.25
## 3rd Qu.:2.30
## Max. :2.30
## NA's :22
## lvpanamensissbvsusceptibility samplecollectiondate experimentalbatch
## Length:68 Min. :20190629 Length:68
## Class :character 1st Qu.:20210813 Class :character
## Mode :character Median :20220827 Mode :character
## Mean :20213009
## 3rd Qu.:20220827
## Max. :20220916
##
## typeofcells donor isolationmethod
## Length:68 Length:68 Length:68
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## cellspurificationmethod selectionmethod rnapreservation
## Length:68 Length:68 Length:68
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## rnaextractiondate rnavolumeul rnaavailableul rnaqctesteddate
## Length:68 Min. :30 Min. : 5.2 Min. : 42382
## Class :character 1st Qu.:30 1st Qu.:22.1 1st Qu.:20200114
## Mode :character Median :30 Median :23.3 Median :20210914
## Mean :30 Mean :23.0 Mean :19459236
## 3rd Qu.:30 3rd Qu.:25.4 3rd Qu.:20211111
## Max. :30 Max. :26.2 Max. :20211221
## NA's :41 NA's :41 NA's :41
## bioanalyzerrnangul rnaqcpassed rin nanodroprnangul
## Length:68 Length:68 Length:68 Length:68
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## x260280 x260230 rnausedtoconstructlibrariesul
## Length:68 Length:68 Min. : 2.20
## Class :character Class :character 1st Qu.: 2.80
## Mode :character Mode :character Median : 3.50
## Mean : 4.67
## 3rd Qu.: 4.95
## Max. :23.30
## NA's :41
## rnausedtoconstructlibrariesng libraryqctesteddate libqcpassed
## Min. : 0.5 Min. :20200115 Length:68
## 1st Qu.: 0.5 1st Qu.:20200115 Class :character
## Median :500.0 Median :20205670 Mode :character
## Mean :285.4 Mean :20205669
## 3rd Qu.:500.0 3rd Qu.:20211221
## Max. :800.0 Max. :20211223
## NA's :41 NA's :42
## index libraryvolumeul libraryvolumesenttonajibslabul
## Min. : 1.0 Min. :28 Min. :15
## 1st Qu.: 7.5 1st Qu.:28 1st Qu.:15
## Median :15.0 Median :28 Median :15
## Mean :14.2 Mean :28 Mean :15
## 3rd Qu.:20.5 3rd Qu.:28 3rd Qu.:15
## Max. :27.0 Max. :28 Max. :15
## NA's :41 NA's :41 NA's :41
## shipmentdate oldnew countersampleatcideimul drug
## Min. :20200217 Length:68 Min. :13 antimony:34
## 1st Qu.:20200217 Class :character 1st Qu.:13 none :34
## Median :20211012 Mode :character Median :13
## Mean :20210192 Mean :13
## 3rd Qu.:20220103 3rd Qu.:13
## Max. :20220103 Max. :13
## NA's :41 NA's :55
## descriptonandremarks observation
## Length:68 Length:68
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
## librarybioanalyzerprofileelsayedlabfilenamewelllane libraryconcnm
## Length:68 Min. : 25.0
## Class :character 1st Qu.: 64.9
## Mode :character Median :103.0
## Mean :101.9
## 3rd Qu.:114.0
## Max. :218.0
## NA's :55
## samplefor100ul2or4nmsequencing waterfor100ul2or4nmsequencing
## Min. :0.917 Min. :92.0
## 1st Qu.:1.754 1st Qu.:96.9
## Median :1.942 Median :98.1
## Mean :2.995 Mean :97.0
## 3rd Qu.:3.082 3rd Qu.:98.2
## Max. :8.000 Max. :99.1
## NA's :55 NA's :55
## sequencingorderno seqorderdate seqcompletedate totalreads
## Length:68 Min. :20200901 Min. :20200910 Min. :12254196
## Class :character 1st Qu.:20200901 1st Qu.:20200910 1st Qu.:21574725
## Mode :character Median :20200901 Median :20200910 Median :23459956
## Mean :20201647 Mean :20201656 Mean :29237101
## 3rd Qu.:20200901 3rd Qu.:20200910 3rd Qu.:27632525
## Max. :20210601 Max. :20210610 Max. :86048061
## NA's :55 NA's :55
## trimmedreads percentkept hg38100salmonfile hg38100hisatfile
## Min. :10400801 Min. :0.735 Length:68 Length:68
## 1st Qu.:19451334 1st Qu.:0.889 Class :character Class :character
## Median :21493351 Median :0.902 Mode :character Mode :character
## Mean :26501589 Mean :0.904
## 3rd Qu.:25238786 3rd Qu.:0.940
## Max. :79775170 Max. :0.945
##
## hisatsinglemappedhg38 hisatmultimappedhg38 hisatmappingratehg38
## Min. : 501881 Min. : 328180 Min. :0.0418
## 1st Qu.:17798143 1st Qu.: 601364 1st Qu.:0.9194
## Median :19059690 Median : 684508 Median :0.9608
## Mean :23215121 Mean :1350199 Mean :0.9238
## 3rd Qu.:22491433 3rd Qu.: 830880 3rd Qu.:0.9791
## Max. :70100660 Max. :7030835 Max. :0.9885
##
## lpanamensisv36hisatfile hisatlpsinglemapped hisatlpmultimapped
## Length:68 Min. : 199 Min. : 14
## Class :character 1st Qu.: 10778 1st Qu.: 737
## Mode :character Median : 240454 Median : 16033
## Mean : 893249 Mean : 69200
## 3rd Qu.: 968384 3rd Qu.: 74954
## Max. :7086371 Max. :620799
##
## parasitemappingrate parasitehostratio x68 macrophagetreatment
## Min. :0.000011 Min. :0.000204 Length:68 inf :29
## 1st Qu.:0.000467 1st Qu.:0.003393 Class :character inf_sb :29
## Median :0.009909 Median :0.003393 Mode :character uninf : 5
## Mean :0.035609 Mean :0.008203 uninf_sb: 5
## 3rd Qu.:0.048448 3rd Qu.:0.003393
## Max. :0.292395 Max. :0.110394
##
## macrophagezymodeme strainid slr1fwd slr1rc
## none:10 Length:68 Min. : 0.00 Min. : 0.0
## z22 :29 Class :character 1st Qu.: 0.00 1st Qu.: 0.0
## z23 :29 Mode :character Median : 0.00 Median : 5.5
## Mean : 1.18 Mean : 29.8
## 3rd Qu.: 1.00 3rd Qu.: 27.0
## Max. :11.00 Max. :641.0
##
## slr2fwd slr2rc slsum slvsreads
## Min. : 2 Min. : 0.000 Min. : 2 Min. :1.03e-07
## 1st Qu.: 203 1st Qu.: 0.000 1st Qu.: 204 1st Qu.:7.72e-06
## Median : 3885 Median : 0.000 Median : 3916 Median :1.52e-04
## Mean : 16917 Mean : 0.647 Mean : 16948 Mean :6.43e-04
## 3rd Qu.: 20858 3rd Qu.: 0.000 3rd Qu.: 20900 3rd Qu.:7.98e-04
## Max. :168826 Max. :13.000 Max. :169487 Max. :6.57e-03
##
## slvshuman trimomatic_input trimomatic_output trimomatic_percent
## Min. :1.05e-07 Length:68 Length:68 Length:68
## 1st Qu.:9.93e-06 Class :character Class :character Class :character
## Median :1.56e-04 Mode :character Mode :character Mode :character
## Mean :7.91e-04
## 3rd Qu.:8.74e-04
## Max. :1.02e-02
##
## fastqc_pct_gc hisat_genome_single_concordant
## Length:68 Length:68
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
## hisat_genome_multi_concordant hisat_genome_single_all hisat_genome_multi_all
## Length:68 Length:68 Length:68
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## hisat_genome_percent hisat_count_table manualzymodeme
## Length:68 Length:68 Length:68
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## zymodemevariantcounts zymodemescore condition batch
## Length:68 Length:68 inf :29 none:10
## Class :character Class :character inf_sb :29 z2.2:29
## Mode :character Mode :character uninf : 5 z2.3:29
## uninf_sb: 5
##
##
##
## num_nonzero experiment strain_zymo infectedp
## Min. :13978 Length:68 Length:68 Length:68
## 1st Qu.:14558 Class :character Class :character Class :character
## Median :14828 Mode :character Mode :character Mode :character
## Mean :14871
## 3rd Qu.:15159
## Max. :16421
##
## The numbers of samples by condition are:
##
## antimony none
## 34 34
## The number of samples by batch are:
##
## Macrophages U937
## 54 14
data_structures <- c(data_structures, "all_human")
## The following 3 lines were copy/pasted to datastructures and should be removed soon.
no_strain_idx <- colData(all_human)[["strainid"]] == "none"
##colData(all_human)[["strainid"]] <- paste0("s", colData(all_human)[["strainid"]],
## "_", colData(all_human)[["macrophagezymodeme"]])
colData(all_human)[no_strain_idx, "strainid"] <- "none"
table(colData(all_human)[["strainid"]])##
## 10763 10772 10977 11026 11075 11126 12251 12309 12355 12367 2169 7158 none
## 2 8 2 2 2 8 7 8 2 7 8 2 10
## The numbers of samples by condition are:
##
## Macrophages U937
## 54 14
## The number of samples by batch are:
##
## antimony none
## 34 34
data_structures <- c(data_structures, "all_human_types")
type_zymo_fact <- paste0(colData(all_human_types)[["condition"]], "_",
colData(all_human_types)[["macrophagezymodeme"]])
type_zymo <- set_conditions(all_human_types, fact = type_zymo_fact)## The numbers of samples by condition are:
##
## Macrophages_none Macrophages_z22 Macrophages_z23 U937_none
## 8 23 23 2
## U937_z22 U937_z23
## 6 6
data_structures <- c(data_structures, "type_zymo")
type_drug_fact <- paste0(colData(all_human_types)[["condition"]], "_",
colData(all_human_types)[["drug"]])
type_drug <- set_conditions(all_human_types, fact = type_drug_fact)## The numbers of samples by condition are:
##
## Macrophages_antimony Macrophages_none U937_antimony
## 27 27 7
## U937_none
## 7
data_structures <- c(data_structures, "type_drug")
strain_fact <- colData(all_human_types)[["strainid"]]
table(strain_fact)## strain_fact
## 10763 10772 10977 11026 11075 11126 12251 12309 12355 12367 2169 7158 none
## 2 8 2 2 2 8 7 8 2 7 8 2 10
new_conditions <- paste0(colData(hs_macrophage)[["macrophagetreatment"]], "_",
colData(hs_macrophage)[["macrophagezymodeme"]])
## Note the sanitize() call is redundant with the addition of sanitize() in the
## datastructures file, but I don't want to wait to rerun that.
hs_macr <- set_conditions(hs_macrophage, fact = new_conditions) %>%
sanitize_metadata(column = "drug") %>%
set_se_colors(color_choices[["treatment_zymo"]]) %>%
subset_se(subset = "typeofcells!='U937'")## The numbers of samples by condition are:
##
## inf_sb_z22 inf_sb_z23 inf_z22 inf_z23 uninf_none
## 15 14 14 15 5
## uninf_sb_none
## 5
## Recasting the data.frame to DataFrame.
## sampleid tubelabelorigin samplename numberofvials
## Length:68 Length:68 Length:68 Min. :1
## Class :character Class :character Class :character 1st Qu.:1
## Mode :character Mode :character Mode :character Median :1
## Mean :1
## 3rd Qu.:1
## Max. :1
##
## sourcelab expperson cellssource lvpanamensiszymodeme
## Length:68 Length:68 Length:68 Min. :2.20
## Class :character Class :character Class :character 1st Qu.:2.20
## Mode :character Mode :character Mode :character Median :2.25
## Mean :2.25
## 3rd Qu.:2.30
## Max. :2.30
## NA's :22
## lvpanamensissbvsusceptibility samplecollectiondate experimentalbatch
## Length:68 Min. :20190629 Length:68
## Class :character 1st Qu.:20210813 Class :character
## Mode :character Median :20220827 Mode :character
## Mean :20213009
## 3rd Qu.:20220827
## Max. :20220916
##
## typeofcells donor isolationmethod
## Length:68 Length:68 Length:68
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## cellspurificationmethod selectionmethod rnapreservation
## Length:68 Length:68 Length:68
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## rnaextractiondate rnavolumeul rnaavailableul rnaqctesteddate
## Length:68 Min. :30 Min. : 5.2 Min. : 42382
## Class :character 1st Qu.:30 1st Qu.:22.1 1st Qu.:20200114
## Mode :character Median :30 Median :23.3 Median :20210914
## Mean :30 Mean :23.0 Mean :19459236
## 3rd Qu.:30 3rd Qu.:25.4 3rd Qu.:20211111
## Max. :30 Max. :26.2 Max. :20211221
## NA's :41 NA's :41 NA's :41
## bioanalyzerrnangul rnaqcpassed rin nanodroprnangul
## Length:68 Length:68 Length:68 Length:68
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## x260280 x260230 rnausedtoconstructlibrariesul
## Length:68 Length:68 Min. : 2.20
## Class :character Class :character 1st Qu.: 2.80
## Mode :character Mode :character Median : 3.50
## Mean : 4.67
## 3rd Qu.: 4.95
## Max. :23.30
## NA's :41
## rnausedtoconstructlibrariesng libraryqctesteddate libqcpassed
## Min. : 0.5 Min. :20200115 Length:68
## 1st Qu.: 0.5 1st Qu.:20200115 Class :character
## Median :500.0 Median :20205670 Mode :character
## Mean :285.4 Mean :20205669
## 3rd Qu.:500.0 3rd Qu.:20211221
## Max. :800.0 Max. :20211223
## NA's :41 NA's :42
## index libraryvolumeul libraryvolumesenttonajibslabul
## Min. : 1.0 Min. :28 Min. :15
## 1st Qu.: 7.5 1st Qu.:28 1st Qu.:15
## Median :15.0 Median :28 Median :15
## Mean :14.2 Mean :28 Mean :15
## 3rd Qu.:20.5 3rd Qu.:28 3rd Qu.:15
## Max. :27.0 Max. :28 Max. :15
## NA's :41 NA's :41 NA's :41
## shipmentdate oldnew countersampleatcideimul drug
## Min. :20200217 Length:68 Min. :13 antimony:34
## 1st Qu.:20200217 Class :character 1st Qu.:13 none :34
## Median :20211012 Mode :character Median :13
## Mean :20210192 Mean :13
## 3rd Qu.:20220103 3rd Qu.:13
## Max. :20220103 Max. :13
## NA's :41 NA's :55
## descriptonandremarks observation
## Length:68 Length:68
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
## librarybioanalyzerprofileelsayedlabfilenamewelllane libraryconcnm
## Length:68 Min. : 25.0
## Class :character 1st Qu.: 64.9
## Mode :character Median :103.0
## Mean :101.9
## 3rd Qu.:114.0
## Max. :218.0
## NA's :55
## samplefor100ul2or4nmsequencing waterfor100ul2or4nmsequencing
## Min. :0.917 Min. :92.0
## 1st Qu.:1.754 1st Qu.:96.9
## Median :1.942 Median :98.1
## Mean :2.995 Mean :97.0
## 3rd Qu.:3.082 3rd Qu.:98.2
## Max. :8.000 Max. :99.1
## NA's :55 NA's :55
## sequencingorderno seqorderdate seqcompletedate totalreads
## Length:68 Min. :20200901 Min. :20200910 Min. :12254196
## Class :character 1st Qu.:20200901 1st Qu.:20200910 1st Qu.:21574725
## Mode :character Median :20200901 Median :20200910 Median :23459956
## Mean :20201647 Mean :20201656 Mean :29237101
## 3rd Qu.:20200901 3rd Qu.:20200910 3rd Qu.:27632525
## Max. :20210601 Max. :20210610 Max. :86048061
## NA's :55 NA's :55
## trimmedreads percentkept hg38100salmonfile hg38100hisatfile
## Min. :10400801 Min. :0.735 Length:68 Length:68
## 1st Qu.:19451334 1st Qu.:0.889 Class :character Class :character
## Median :21493351 Median :0.902 Mode :character Mode :character
## Mean :26501589 Mean :0.904
## 3rd Qu.:25238786 3rd Qu.:0.940
## Max. :79775170 Max. :0.945
##
## hisatsinglemappedhg38 hisatmultimappedhg38 hisatmappingratehg38
## Min. : 501881 Min. : 328180 Min. :0.0418
## 1st Qu.:17798143 1st Qu.: 601364 1st Qu.:0.9194
## Median :19059690 Median : 684508 Median :0.9608
## Mean :23215121 Mean :1350199 Mean :0.9238
## 3rd Qu.:22491433 3rd Qu.: 830880 3rd Qu.:0.9791
## Max. :70100660 Max. :7030835 Max. :0.9885
##
## lpanamensisv36hisatfile hisatlpsinglemapped hisatlpmultimapped
## Length:68 Min. : 199 Min. : 14
## Class :character 1st Qu.: 10778 1st Qu.: 737
## Mode :character Median : 240454 Median : 16033
## Mean : 893249 Mean : 69200
## 3rd Qu.: 968384 3rd Qu.: 74954
## Max. :7086371 Max. :620799
##
## parasitemappingrate parasitehostratio x68 macrophagetreatment
## Min. :0.000011 Min. :0.000204 Length:68 inf :29
## 1st Qu.:0.000467 1st Qu.:0.003393 Class :character inf_sb :29
## Median :0.009909 Median :0.003393 Mode :character uninf : 5
## Mean :0.035609 Mean :0.008203 uninf_sb: 5
## 3rd Qu.:0.048448 3rd Qu.:0.003393
## Max. :0.292395 Max. :0.110394
##
## macrophagezymodeme strainid slr1fwd slr1rc
## none:10 Length:68 Min. : 0.00 Min. : 0.0
## z22 :29 Class :character 1st Qu.: 0.00 1st Qu.: 0.0
## z23 :29 Mode :character Median : 0.00 Median : 5.5
## Mean : 1.18 Mean : 29.8
## 3rd Qu.: 1.00 3rd Qu.: 27.0
## Max. :11.00 Max. :641.0
##
## slr2fwd slr2rc slsum slvsreads
## Min. : 2 Min. : 0.000 Min. : 2 Min. :1.03e-07
## 1st Qu.: 203 1st Qu.: 0.000 1st Qu.: 204 1st Qu.:7.72e-06
## Median : 3885 Median : 0.000 Median : 3916 Median :1.52e-04
## Mean : 16917 Mean : 0.647 Mean : 16948 Mean :6.43e-04
## 3rd Qu.: 20858 3rd Qu.: 0.000 3rd Qu.: 20900 3rd Qu.:7.98e-04
## Max. :168826 Max. :13.000 Max. :169487 Max. :6.57e-03
##
## slvshuman trimomatic_input trimomatic_output trimomatic_percent
## Min. :1.05e-07 Length:68 Length:68 Length:68
## 1st Qu.:9.93e-06 Class :character Class :character Class :character
## Median :1.56e-04 Mode :character Mode :character Mode :character
## Mean :7.91e-04
## 3rd Qu.:8.74e-04
## Max. :1.02e-02
##
## fastqc_pct_gc hisat_genome_single_concordant
## Length:68 Length:68
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
## hisat_genome_multi_concordant hisat_genome_single_all hisat_genome_multi_all
## Length:68 Length:68 Length:68
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## hisat_genome_percent hisat_count_table manualzymodeme
## Length:68 Length:68 Length:68
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## zymodemevariantcounts zymodemescore condition batch
## Length:68 Length:68 Length:68 none:10
## Class :character Class :character Class :character z2.2:29
## Mode :character Mode :character Mode :character z2.3:29
##
##
##
##
## num_nonzero experiment strain_zymo infectedp
## Min. :13978 Length:68 Length:68 Length:68
## 1st Qu.:14558 Class :character Class :character Class :character
## Median :14828 Mode :character Mode :character Mode :character
## Mean :14871
## 3rd Qu.:15159
## Max. :16421
##
data_structures <- c(data_structures, "hs_macr")
ggstats_parasite <- plot_metadata_factors(hs_macr, column = "parasitemappingrate",
type = "ggstats", scale = "log2")
pp(file = "images/ggstats_parasiterate_all_macrophage_drug_treatment.png")
ggstats_parasite
dev.off()## png
## 2
## The numbers of samples by condition are:
##
## antimony none
## 27 27
hs_macr_strain_se <- set_conditions(hs_macr, fact = "macrophagezymodeme") %>%
subset_se(subset = "macrophagezymodeme != 'none'")## The numbers of samples by condition are:
##
## none z22 z23
## 8 23 23
##
## 10763 10772 10977 11026 11075 11126 12251 12309 12355 12367 2169 7158 none
## 2 6 2 2 2 6 5 6 2 5 6 2 8
Let us see if the sankey plot of these samples looks useful…
ggstats_slreads <- plot_metadata_factors(hs_macrophage, column = "hisatlpsinglemapped",
type = "ggstats", scale = "log2")
pp(file = "images/ggstats_slreads_all_macrophage.png")
ggstats_slreads
dev.off()## png
## 2
ggstats_violin <- plot_metadata_factors(hs_macrophage, column = "hisatlpsinglemapped",
scale = "log2")
ggstats_violinmacr_sankey <- plot_meta_sankey(hs_macrophage, color_choices = color_choices,
factors = c("oldnew", "drug", "infectedp", "macrophagezymodeme"))## Warning: attributes are not identical across measure variables; they will be
## dropped
## Warning: The `size` argument of `element_rect()` is deprecated as of ggplot2 3.4.0.
## i Please use the `linewidth` argument instead.
## i The deprecated feature was likely used in the ggsankey package.
## Please report the issue at <https://github.com/davidsjoberg/ggsankey/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## A sankey plot describing the metadata of 68 samples,
## including 26 out of 0 nodes and traversing metadata factors:
## oldnew, drug, infectedp, macrophagezymodeme.
Finally, split off the U937 samples.
In the previous block, we used a new invocation of ensembl-derived annotation data, this time we can just use our existing parasite gene annotations.
if (isTRUE(do_parasite)) {
lp_macrophage <- create_se(
sample_sheet, file_column = "lpanamensisv36hisatfile", gene_info = all_lp_annot,
savefile = glue("rda/lp_macrophage-v{ver}.rda"),
annotation = "org.Lpanamensis.MHOMCOL81L13.v46.eg.db") %>%
set_conditions(fact = "macrophagezymodeme") %>%
set_batches(fact = "macrophagetreatment")
unfilt_written <- write_se(
lp_macrophage,
excel = glue("analyses/macrophage_de/{ver}/read_counts/lp_macrophage_reads_unfiltered-v{ver}.xlsx"))
lp_macrophage_filt <- subset_se(lp_macrophage, nonzero = 2500) %>%
semantic_filter(semantic = c("amastin", "gp63", "leishmanolysin"),
semantic_column = "annot_gene_product")
data_structures <- c(data_structures, "lp_macrophage", "lp_macrophage_filt")
filt_written <- write_se(lp_macrophage_filt,
excel = glue("analyses/macrophage_de/{ver}/read_counts/lp_macrophage_reads_filtered-v{ver}.xlsx"))
lp_macrophage <- lp_macrophage_filt
lp_macrophage_nosb <- subset_se(lp_macrophage, subset="batch!='inf_sb'")
lp_nosb_write <- write_se(
lp_macrophage_nosb,
excel = glue("analyses/macrophage_de/{ver}/read_counts/lp_macrophage_nosb_reads-v{ver}.xlsx"))
data_structures <- c(data_structures, "lp_macrophage_nosb")
spec <- make_rnaseq_spec()
test <- gather_preprocessing_metadata(sample_sheet, specification = spec)
}## Reading the sample metadata.
## Did not find the condition column in the sample sheet.
## Filling it in as undefined.
## Did not find the batch column in the sample sheet.
## Filling it in as undefined.
## Checking the state of the condition column.
## Checking the state of the batch column.
## Checking the condition factor.
## The sample definitions comprises: 69 rows(samples) and 86 columns(metadata fields).
## Warning in create_se(sample_sheet, file_column = "lpanamensisv36hisatfile", :
## Some samples were removed when cross referencing the samples against the count
## data.
## Matched 8778 annotations and counts.
## The final summarized experiment has 8778 rows and 86 columns.
## The numbers of samples by condition are:
##
## none z2.2 z2.3
## 8 29 29
## The number of samples by batch are:
##
## inf inf_sb uninf uninf_sb
## 29 29 4 4
## Writing the first sheet, containing a legend and some summary data.
## Warning in .local(x, row.names, optional, ...): arguments in '...' ignored
## The following samples have less than 5705.7 genes.
## [1] "TMRC30066" "TMRC30117" "TMRC30244" "TMRC30246" "TMRC30249" "TMRC30266"
## [7] "TMRC30268" "TMRC30292" "TMRC30300" "TMRC30302" "TMRC30304" "TMRC30309"
## [13] "TMRC30312" "TMRC30313" "TMRC30319" "TMRC30323" "TMRC30325" "TMRC30326"
## [19] "TMRC30327" "TMRC30330" "TMRC30331" "TMRC30332"
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## i Please use `linewidth` instead.
## i The deprecated feature was likely used in the hpgltools package.
## Please report the issue to the authors.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning in fortify(data, ...): Arguments in `...` must be used.
## x Problematic argument:
## * colour = colors
## i Did you misspell an argument name?
## 175550 entries are 0. We are on a log scale, adding 1 to the data.
## Naively calculating coefficient of variation/dispersion with respect to condition.
## Finished calculating dispersion estimates.
## `geom_smooth()` using formula = 'y ~ x'This dataset does not support lmer with condition+batch
## Removing 0 low-count genes (8778 remaining).
## transform_counts: Found 175550 values equal to 0, adding 1 to the matrix.
## `geom_smooth()` using formula = 'y ~ x'The factor none has 8 rows.
## The factor z2.2 has 29 rows.
## The factor z2.3 has 29 rows.
## The samples (and read coverage) removed when filtering 2500 non-zero genes are:
## TMRC30066 TMRC30117 TMRC30244 TMRC30246 TMRC30266 TMRC30268 TMRC30304 TMRC30309
## 3080 1147 1662 2834 822 3444 289 188
## TMRC30312 TMRC30313 TMRC30319 TMRC30323 TMRC30325 TMRC30326 TMRC30327 TMRC30330
## 76 96 374 84 356 375 129 181
## TMRC30066 TMRC30117 TMRC30244 TMRC30246 TMRC30266 TMRC30268 TMRC30304 TMRC30309
## 1890 888 1135 1796 649 1915 207 166
## TMRC30312 TMRC30313 TMRC30319 TMRC30323 TMRC30325 TMRC30326 TMRC30327 TMRC30330
## 76 84 270 74 279 303 123 135
## Samples removed: 1890, 888, 1135, 1796, 649, 1915, 207, 166, 76, 84, 270, 74, 279, 303, 123, 135
## semantic_expt_filter(): Removed 68 genes.
## Writing the first sheet, containing a legend and some summary data.
## Warning in .local(x, row.names, optional, ...): arguments in '...' ignored
## The following samples have less than 5661.5 genes.
## [1] "TMRC30249" "TMRC30292" "TMRC30300" "TMRC30302" "TMRC30331" "TMRC30332"
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Warning in fortify(data, ...): Arguments in `...` must be used.
## x Problematic argument:
## * colour = colors
## i Did you misspell an argument name?
## 44583 entries are 0. We are on a log scale, adding 1 to the data.
## Naively calculating coefficient of variation/dispersion with respect to condition.
## Finished calculating dispersion estimates.
## `geom_smooth()` using formula = 'y ~ x'Removing 0 low-count genes (8710 remaining).
## transform_counts: Found 44583 values equal to 0, adding 1 to the matrix.
## `geom_smooth()` using formula = 'y ~ x'The factor z2.2 has 21 rows.
## The factor z2.3 has 29 rows.
## Writing the first sheet, containing a legend and some summary data.
## Warning in .local(x, row.names, optional, ...): arguments in '...' ignored
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Warning in fortify(data, ...): Arguments in `...` must be used.
## x Problematic argument:
## * colour = colors
## i Did you misspell an argument name?
## 6396 entries are 0. We are on a log scale, adding 1 to the data.
## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## i Please use tidy evaluation idioms with `aes()`.
## i See also `vignette("ggplot2-in-packages")` for more information.
## i The deprecated feature was likely used in the directlabels package.
## Please report the issue at <https://github.com/tdhock/directlabels/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Naively calculating coefficient of variation/dispersion with respect to condition.
## Finished calculating dispersion estimates.
## `geom_smooth()` using formula = 'y ~ x'The dataset has a minimal or missing set of conditions/batches.
## Removing 119 low-count genes (8591 remaining).
## transform_counts: Found 3163 values equal to 0, adding 1 to the matrix.
## `geom_smooth()` using formula = 'y ~ x'The factor z2.2 has 14 rows.
## The factor z2.3 has 15 rows.
## Did not find the condition column in the sample sheet.
## Filling it in as undefined.
## Did not find the batch column in the sample sheet.
## Filling it in as undefined.
## Checking the state of the condition column.
## Checking the state of the batch column.
## Checking the condition factor.
## Warning in gather_preprocessing_metadata(sample_sheet, specification = spec):
## Column: trimomatic_input already exists, replacing it.
## Warning in gather_preprocessing_metadata(sample_sheet, specification = spec):
## Column: trimomatic_output already exists, replacing it.
## Warning in gather_preprocessing_metadata(sample_sheet, specification = spec):
## Column: trimomatic_percent already exists, replacing it.
## Warning in gather_preprocessing_metadata(sample_sheet, specification = spec):
## Column: fastqc_pct_gc already exists, replacing it.
## Warning in gather_preprocessing_metadata(sample_sheet, specification = spec):
## Column: hisat_genome_single_concordant already exists, replacing it.
## Warning in gather_preprocessing_metadata(sample_sheet, specification = spec):
## Column: hisat_genome_multi_concordant already exists, replacing it.
## Warning in gather_preprocessing_metadata(sample_sheet, specification = spec):
## Column: hisat_genome_single_all already exists, replacing it.
## Warning in gather_preprocessing_metadata(sample_sheet, specification = spec):
## Column: hisat_genome_multi_all already exists, replacing it.
## Warning in gather_preprocessing_metadata(sample_sheet, specification = spec):
## Column: hisat_count_table already exists, replacing it.
## preprocessing/TMRC30051/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30057/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30059/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30060/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30061/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30062/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30063/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30064/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30065/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30066/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30067/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30069/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30117/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30162/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30243/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30244/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30245/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30246/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30247/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30248/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30249/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30250/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30251/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30252/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30266/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30267/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30268/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30286/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30291/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30292/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30293/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30294/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30295/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30296/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30297/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30298/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30299/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30300/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30301/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30302/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30303/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30304/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30305/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30306/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30307/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30308/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30309/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30310/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30311/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30312/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30313/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30314/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30315/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30316/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30317/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30318/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30319/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30320/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30321/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30322/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30323/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30324/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30325/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30326/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30327/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30328/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30330/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30331/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## preprocessing/TMRC30332/outputs/*hisat*_*/*_*genome*_gene_ID_fcounts.csv.xz
## Writing new metadata to: sample_sheets/macrophage_samples_modified.xlsx
found_idx <- data_structures %in% ls()
if (sum(!found_idx) > 0) {
not_found <- data_structures[!found_idx]
warning("Some datastructures were not generated: ", toString(not_found), ".")
data_structures <- data_structures[found_idx]
}
save(list = data_structures, file = glue("rda/tmrc2_data_structures-v{ver}.rda"))## Warning: Your system is mis-configured: '/etc/localtime' is not a symlink
## Warning: It is strongly recommended to set envionment variable TZ to
## 'America/New_York' (or equivalent)
R version 4.5.0 (2025-04-11)
Platform: x86_64-pc-linux-gnu
locale: C
attached base packages: stats4, stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: ruv(v.0.9.7.1), BiocParallel(v.1.42.1), variancePartition(v.1.38.1), org.Lpanamensis.MHOMCOL81L13.v68.eg.db(v.2024.09), AnnotationDbi(v.1.70.0), IRanges(v.2.42.0), S4Vectors(v.0.46.0), Biobase(v.2.68.0), BiocGenerics(v.0.54.0), generics(v.0.1.4), dplyr(v.1.1.4), Heatplus(v.3.16.0), ggplot2(v.4.0.0), glue(v.1.8.0) and hpgltools(v.1.2)
loaded via a namespace (and not attached): fs(v.1.6.6), matrixStats(v.1.5.0), bitops(v.1.0-9), doParallel(v.1.0.17), httr(v.1.4.7), RColorBrewer(v.1.1-3), insight(v.1.4.2), numDeriv(v.2016.8-1.1), tools(v.4.5.0), backports(v.1.5.0), R6(v.2.6.1), statsExpressions(v.1.7.1), lazyeval(v.0.2.2), mgcv(v.1.9-3), withr(v.3.0.2), gridExtra(v.2.3), prettyunits(v.1.2.0), cli(v.3.6.5), performance(v.0.15.1), labeling(v.0.4.3), sass(v.0.4.10), prismatic(v.1.1.2), BWStest(v.0.2.3), mvtnorm(v.1.3-3), S7(v.0.2.0), readr(v.2.1.5), genefilter(v.1.90.0), pbapply(v.1.7-4), Rsamtools(v.2.24.0), yulab.utils(v.0.2.1), DOSE(v.4.2.0), R.utils(v.2.13.0), dichromat(v.2.0-0.1), limma(v.3.64.3), RSQLite(v.2.4.3), BiocIO(v.1.18.0), vroom(v.1.6.5), gtools(v.3.9.5), zip(v.2.3.3), GO.db(v.3.21.0), Matrix(v.1.7-3), abind(v.1.4-8), R.methodsS3(v.1.8.2), lifecycle(v.1.0.4), yaml(v.2.3.10), edgeR(v.4.6.3), SummarizedExperiment(v.1.38.1), gplots(v.3.2.0), qvalue(v.2.40.0), SparseArray(v.1.8.1), BiocFileCache(v.2.16.1), Rtsne(v.0.17), paletteer(v.1.6.0), grid(v.4.5.0), blob(v.1.2.4), promises(v.1.3.3), crayon(v.1.5.3), lattice(v.0.22-7), cowplot(v.1.2.0), GenomicFeatures(v.1.60.0), annotate(v.1.86.1), KEGGREST(v.1.48.1), zeallot(v.0.2.0), pillar(v.1.11.0), knitr(v.1.50), varhandle(v.2.0.6), fgsea(v.1.34.2), GenomicRanges(v.1.60.0), rjson(v.0.2.23), boot(v.1.3-31), corpcor(v.1.6.10), kSamples(v.1.2-12), codetools(v.0.2-20), fastmatch(v.1.1-6), data.table(v.1.17.8), vctrs(v.0.6.5), png(v.0.1-8), Rdpack(v.2.6.4), gtable(v.0.3.6), rematch2(v.2.1.2), datawizard(v.1.2.0), cachem(v.1.1.0), xfun(v.0.53), openxlsx(v.4.2.8), rbibutils(v.2.3), S4Arrays(v.1.8.1), mime(v.0.13), correlation(v.0.8.8), reformulas(v.0.4.1), coda(v.0.19-4.1), survival(v.3.8-3), iterators(v.1.0.14), statmod(v.1.5.0), gmp(v.0.7-5), directlabels(v.2025.6.24), nlme(v.3.1-168), pbkrtest(v.0.5.5), bit64(v.4.6.0-1), EnvStats(v.3.1.0), progress(v.1.2.3), filelock(v.1.0.3), GenomeInfoDb(v.1.44.2), bslib(v.0.9.0), KernSmooth(v.2.23-26), DBI(v.1.2.3), tidyselect(v.1.2.1), bit(v.4.6.0), compiler(v.4.5.0), curl(v.7.0.0), httr2(v.1.2.1), graph(v.1.86.0), xml2(v.1.4.0), DelayedArray(v.0.34.1), plotly(v.4.11.0), bayestestR(v.0.17.0), rtracklayer(v.1.68.0), scales(v.1.4.0), caTools(v.1.18.3), remaCor(v.0.0.20), quadprog(v.1.5-8), multcompView(v.0.1-10), rappdirs(v.0.3.3), stringr(v.1.5.1), digest(v.0.6.37), ggsankey(v.0.0.99999), minqa(v.1.2.8), rmarkdown(v.2.29), aod(v.1.3.3), XVector(v.0.48.0), RhpcBLASctl(v.0.23-42), htmltools(v.0.5.8.1), pkgconfig(v.2.0.3), lme4(v.1.1-37), MatrixGenerics(v.1.20.0), dbplyr(v.2.5.0), fastmap(v.1.2.0), rlang(v.1.1.6), htmlwidgets(v.1.6.4), UCSC.utils(v.1.4.0), shiny(v.1.11.1), SuppDists(v.1.1-9.9), farver(v.2.1.2), jquerylib(v.0.1.4), jsonlite(v.2.0.0), GOSemSim(v.2.34.0), R.oo(v.1.27.1), RCurl(v.1.98-1.17), magrittr(v.2.0.3), GenomeInfoDbData(v.1.2.14), patchwork(v.1.3.2), parameters(v.0.28.2), Rcpp(v.1.1.0), stringi(v.1.8.7), MASS(v.7.3-65), plyr(v.1.8.9), parallel(v.4.5.0), ggrepel(v.0.9.6), Biostrings(v.2.76.0), PMCMRplus(v.1.9.12), splines(v.4.5.0), pander(v.0.6.6), hms(v.1.1.3), locfit(v.1.5-9.12), fastcluster(v.1.3.0), ggsignif(v.0.6.4), effectsize(v.1.0.1), reshape2(v.1.4.4), biomaRt(v.2.64.0), rstantools(v.2.5.0), XML(v.3.99-0.19), evaluate(v.1.0.4), RcppParallel(v.5.1.11-1), tzdb(v.0.5.0), nloptr(v.2.2.1), foreach(v.1.5.2), httpuv(v.1.6.16), MatrixModels(v.0.5-4), BayesFactor(v.0.9.12-4.7), tidyr(v.1.3.1), purrr(v.1.1.0), broom(v.1.0.10), xtable(v.1.8-4), restfulr(v.0.0.16), Rmpfr(v.1.1-1), fANCOVA(v.0.6-1), later(v.1.4.3), viridisLite(v.0.4.2), tibble(v.3.3.0), lmerTest(v.3.1-3), ggstatsplot(v.0.13.2), memoise(v.2.0.1), GenomicAlignments(v.1.44.0), sva(v.3.56.0) and GSEABase(v.1.70.0)
## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset 499a1491ca81e6bda52768411a1ba6e2912d2e74
## This is hpgltools commit: Sat Sep 27 13:11:43 2025 -0400: 499a1491ca81e6bda52768411a1ba6e2912d2e74
## Saving to 01datasets.rda.xz